Training: 2022-04-27 01:27:04,685-rank_id: 0
Training: 2022-04-27 01:27:18,502-: margin_list              [1.0, 0.5, 0.0]
Training: 2022-04-27 01:27:18,503-: network                  r50
Training: 2022-04-27 01:27:18,503-: resume                   False
Training: 2022-04-27 01:27:18,503-: output                   work_dirs/ms1mv2_r50
Training: 2022-04-27 01:27:18,503-: embedding_size           512
Training: 2022-04-27 01:27:18,503-: sample_rate              1.0
Training: 2022-04-27 01:27:18,503-: interclass_filtering_threshold 0
Training: 2022-04-27 01:27:18,503-: fp16                     True
Training: 2022-04-27 01:27:18,503-: batch_size               128
Training: 2022-04-27 01:27:18,503-: optimizer                sgd
Training: 2022-04-27 01:27:18,503-: lr                       0.1
Training: 2022-04-27 01:27:18,503-: momentum                 0.9
Training: 2022-04-27 01:27:18,503-: weight_decay             0.0005
Training: 2022-04-27 01:27:18,503-: verbose                  2000
Training: 2022-04-27 01:27:18,504-: frequent                 10
Training: 2022-04-27 01:27:18,504-: dali                     False
Training: 2022-04-27 01:27:18,504-: rec                      /train_tmp/faces_emore
Training: 2022-04-27 01:27:18,504-: num_classes              85742
Training: 2022-04-27 01:27:18,504-: num_image                5822653
Training: 2022-04-27 01:27:18,504-: num_epoch                20
Training: 2022-04-27 01:27:18,504-: warmup_epoch             0
Training: 2022-04-27 01:27:18,504-: val_targets              ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-27 01:27:18,504-: total_batch_size         1024
Training: 2022-04-27 01:27:18,504-: warmup_step              0
Training: 2022-04-27 01:27:18,504-: total_step               113720
Training: 2022-04-27 01:28:27,042-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-27 01:28:30,477-Speed 5360.42 samples/sec   Loss 46.3456   LearningRate 0.1000   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-27 01:28:32,319-Speed 5564.04 samples/sec   Loss 47.0751   LearningRate 0.0999   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 01:28:35,036-Speed 3769.84 samples/sec   Loss 47.3905   LearningRate 0.0999   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 01:28:36,888-Speed 5534.23 samples/sec   Loss 47.9100   LearningRate 0.0999   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 01:28:38,706-Speed 5636.48 samples/sec   Loss 47.6235   LearningRate 0.0999   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 01:28:40,521-Speed 5644.10 samples/sec   Loss 47.0028   LearningRate 0.0999   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 01:28:42,345-Speed 5616.69 samples/sec   Loss 46.8146   LearningRate 0.0999   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 01:28:44,170-Speed 5615.92 samples/sec   Loss 46.6020   LearningRate 0.0998   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 01:28:45,961-Speed 5719.33 samples/sec   Loss 46.3931   LearningRate 0.0998   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 01:28:47,757-Speed 5705.15 samples/sec   Loss 46.2863   LearningRate 0.0998   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 01:28:49,557-Speed 5692.15 samples/sec   Loss 45.9337   LearningRate 0.0998   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 01:28:51,358-Speed 5689.73 samples/sec   Loss 45.7566   LearningRate 0.0998   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 01:28:53,167-Speed 5665.25 samples/sec   Loss 45.5772   LearningRate 0.0998   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 01:28:54,962-Speed 5708.78 samples/sec   Loss 45.3376   LearningRate 0.0997   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 01:28:56,789-Speed 5606.88 samples/sec   Loss 45.1236   LearningRate 0.0997   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 01:28:58,619-Speed 5597.91 samples/sec   Loss 45.0515   LearningRate 0.0997   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 01:29:00,464-Speed 5553.50 samples/sec   Loss 44.8643   LearningRate 0.0997   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 01:29:02,338-Speed 5468.18 samples/sec   Loss 44.5414   LearningRate 0.0997   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 01:29:04,150-Speed 5653.29 samples/sec   Loss 44.4042   LearningRate 0.0996   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 01:29:05,952-Speed 5685.72 samples/sec   Loss 44.2332   LearningRate 0.0996   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:29:07,740-Speed 5728.02 samples/sec   Loss 43.8737   LearningRate 0.0996   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:29:09,561-Speed 5626.85 samples/sec   Loss 43.8407   LearningRate 0.0996   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:29:11,384-Speed 5623.59 samples/sec   Loss 43.6009   LearningRate 0.0996   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:29:13,271-Speed 5428.23 samples/sec   Loss 43.2571   LearningRate 0.0996   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:29:15,101-Speed 5599.94 samples/sec   Loss 43.2277   LearningRate 0.0995   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:29:16,901-Speed 5691.68 samples/sec   Loss 43.0493   LearningRate 0.0995   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:29:18,728-Speed 5608.91 samples/sec   Loss 42.8201   LearningRate 0.0995   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:29:20,572-Speed 5556.54 samples/sec   Loss 42.6947   LearningRate 0.0995   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:29:22,408-Speed 5579.72 samples/sec   Loss 42.4567   LearningRate 0.0995   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:29:24,208-Speed 5691.81 samples/sec   Loss 42.2410   LearningRate 0.0995   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:29:25,993-Speed 5740.73 samples/sec   Loss 42.0434   LearningRate 0.0994   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:29:27,778-Speed 5739.99 samples/sec   Loss 41.8529   LearningRate 0.0994   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:29:29,586-Speed 5666.01 samples/sec   Loss 41.7013   LearningRate 0.0994   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:29:31,392-Speed 5674.10 samples/sec   Loss 41.4476   LearningRate 0.0994   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:29:33,226-Speed 5593.65 samples/sec   Loss 41.1463   LearningRate 0.0994   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:29:35,066-Speed 5567.77 samples/sec   Loss 41.0893   LearningRate 0.0994   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:29:36,898-Speed 5592.77 samples/sec   Loss 40.8740   LearningRate 0.0993   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:29:38,704-Speed 5673.70 samples/sec   Loss 40.7653   LearningRate 0.0993   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:29:40,504-Speed 5691.46 samples/sec   Loss 40.5158   LearningRate 0.0993   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:29:42,311-Speed 5671.69 samples/sec   Loss 40.3706   LearningRate 0.0993   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-27 01:29:44,124-Speed 5649.65 samples/sec   Loss 40.2503   LearningRate 0.0993   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-27 01:29:45,925-Speed 5692.03 samples/sec   Loss 39.9941   LearningRate 0.0992   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:29:47,749-Speed 5615.71 samples/sec   Loss 39.8943   LearningRate 0.0992   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:29:49,582-Speed 5588.60 samples/sec   Loss 39.5720   LearningRate 0.0992   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:29:51,412-Speed 5598.68 samples/sec   Loss 39.4801   LearningRate 0.0992   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:29:53,228-Speed 5642.47 samples/sec   Loss 39.3382   LearningRate 0.0992   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:29:55,025-Speed 5701.10 samples/sec   Loss 39.0744   LearningRate 0.0992   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:29:56,840-Speed 5645.23 samples/sec   Loss 38.9429   LearningRate 0.0991   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:29:58,656-Speed 5641.35 samples/sec   Loss 38.7262   LearningRate 0.0991   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:00,477-Speed 5626.98 samples/sec   Loss 38.5426   LearningRate 0.0991   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:02,287-Speed 5662.42 samples/sec   Loss 38.3403   LearningRate 0.0991   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:04,097-Speed 5661.05 samples/sec   Loss 38.1049   LearningRate 0.0991   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 01:30:05,876-Speed 5756.98 samples/sec   Loss 37.8981   LearningRate 0.0991   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:07,704-Speed 5603.79 samples/sec   Loss 37.7569   LearningRate 0.0990   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:09,496-Speed 5718.73 samples/sec   Loss 37.6067   LearningRate 0.0990   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:11,319-Speed 5620.43 samples/sec   Loss 37.3457   LearningRate 0.0990   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:13,129-Speed 5660.87 samples/sec   Loss 37.2947   LearningRate 0.0990   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:14,925-Speed 5700.58 samples/sec   Loss 37.0705   LearningRate 0.0990   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:16,720-Speed 5708.65 samples/sec   Loss 36.8260   LearningRate 0.0989   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:18,507-Speed 5732.65 samples/sec   Loss 36.6641   LearningRate 0.0989   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:20,364-Speed 5516.45 samples/sec   Loss 36.4975   LearningRate 0.0989   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:22,170-Speed 5673.02 samples/sec   Loss 36.2007   LearningRate 0.0989   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:23,995-Speed 5615.45 samples/sec   Loss 36.1533   LearningRate 0.0989   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 01:30:25,820-Speed 5613.67 samples/sec   Loss 35.9954   LearningRate 0.0989   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:27,634-Speed 5646.97 samples/sec   Loss 35.6941   LearningRate 0.0988   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:29,455-Speed 5624.98 samples/sec   Loss 35.5594   LearningRate 0.0988   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:31,275-Speed 5628.94 samples/sec   Loss 35.2871   LearningRate 0.0988   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:33,117-Speed 5563.33 samples/sec   Loss 35.0293   LearningRate 0.0988   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:34,939-Speed 5621.83 samples/sec   Loss 34.9440   LearningRate 0.0988   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:36,781-Speed 5563.18 samples/sec   Loss 34.7333   LearningRate 0.0988   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:38,615-Speed 5583.55 samples/sec   Loss 34.6076   LearningRate 0.0987   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:40,401-Speed 5737.74 samples/sec   Loss 34.5230   LearningRate 0.0987   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:42,220-Speed 5634.43 samples/sec   Loss 34.2310   LearningRate 0.0987   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:44,007-Speed 5730.76 samples/sec   Loss 34.1398   LearningRate 0.0987   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:45,850-Speed 5560.71 samples/sec   Loss 33.7545   LearningRate 0.0987   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:47,681-Speed 5593.58 samples/sec   Loss 33.5612   LearningRate 0.0987   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:49,557-Speed 5459.62 samples/sec   Loss 33.4901   LearningRate 0.0986   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:51,368-Speed 5656.97 samples/sec   Loss 33.3739   LearningRate 0.0986   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:53,175-Speed 5670.17 samples/sec   Loss 33.0390   LearningRate 0.0986   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:55,012-Speed 5576.92 samples/sec   Loss 32.8303   LearningRate 0.0986   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:56,852-Speed 5569.41 samples/sec   Loss 32.7205   LearningRate 0.0986   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:30:58,654-Speed 5684.57 samples/sec   Loss 32.6235   LearningRate 0.0985   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:00,442-Speed 5727.37 samples/sec   Loss 32.3061   LearningRate 0.0985   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:02,246-Speed 5682.62 samples/sec   Loss 32.1047   LearningRate 0.0985   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:04,031-Speed 5738.87 samples/sec   Loss 31.8824   LearningRate 0.0985   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:05,826-Speed 5706.15 samples/sec   Loss 31.8638   LearningRate 0.0985   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:07,624-Speed 5695.89 samples/sec   Loss 31.5507   LearningRate 0.0985   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:09,446-Speed 5624.57 samples/sec   Loss 31.5783   LearningRate 0.0984   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:11,317-Speed 5475.16 samples/sec   Loss 31.2269   LearningRate 0.0984   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:13,196-Speed 5452.76 samples/sec   Loss 31.1127   LearningRate 0.0984   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:15,008-Speed 5653.30 samples/sec   Loss 30.9387   LearningRate 0.0984   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:16,858-Speed 5538.43 samples/sec   Loss 30.8575   LearningRate 0.0984   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:18,688-Speed 5595.83 samples/sec   Loss 30.5190   LearningRate 0.0984   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:20,539-Speed 5535.77 samples/sec   Loss 30.3220   LearningRate 0.0983   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:22,346-Speed 5671.37 samples/sec   Loss 30.2798   LearningRate 0.0983   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:24,160-Speed 5646.44 samples/sec   Loss 30.0458   LearningRate 0.0983   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:25,962-Speed 5686.34 samples/sec   Loss 29.8282   LearningRate 0.0983   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:27,759-Speed 5698.86 samples/sec   Loss 29.8006   LearningRate 0.0983   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:29,556-Speed 5700.82 samples/sec   Loss 29.5134   LearningRate 0.0982   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:31,363-Speed 5670.29 samples/sec   Loss 29.4889   LearningRate 0.0982   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:33,177-Speed 5648.63 samples/sec   Loss 29.1232   LearningRate 0.0982   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:34,961-Speed 5742.95 samples/sec   Loss 29.1206   LearningRate 0.0982   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:36,805-Speed 5556.00 samples/sec   Loss 28.8510   LearningRate 0.0982   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:38,612-Speed 5668.65 samples/sec   Loss 28.5462   LearningRate 0.0982   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 01:31:40,417-Speed 5675.38 samples/sec   Loss 28.3833   LearningRate 0.0981   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:42,254-Speed 5577.57 samples/sec   Loss 28.2652   LearningRate 0.0981   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:44,079-Speed 5614.10 samples/sec   Loss 28.1105   LearningRate 0.0981   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:45,945-Speed 5489.01 samples/sec   Loss 27.9250   LearningRate 0.0981   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:47,819-Speed 5465.89 samples/sec   Loss 27.8872   LearningRate 0.0981   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:49,635-Speed 5641.40 samples/sec   Loss 27.6505   LearningRate 0.0981   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:51,461-Speed 5610.30 samples/sec   Loss 27.4609   LearningRate 0.0980   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:53,295-Speed 5589.52 samples/sec   Loss 27.3734   LearningRate 0.0980   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:55,145-Speed 5541.51 samples/sec   Loss 27.3261   LearningRate 0.0980   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:57,012-Speed 5486.54 samples/sec   Loss 26.8758   LearningRate 0.0980   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:31:58,793-Speed 5749.02 samples/sec   Loss 26.9413   LearningRate 0.0980   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:00,580-Speed 5733.93 samples/sec   Loss 26.9153   LearningRate 0.0980   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:02,415-Speed 5584.69 samples/sec   Loss 26.6429   LearningRate 0.0979   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:04,246-Speed 5595.38 samples/sec   Loss 26.3613   LearningRate 0.0979   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:06,048-Speed 5683.09 samples/sec   Loss 26.3372   LearningRate 0.0979   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:07,856-Speed 5665.79 samples/sec   Loss 26.1279   LearningRate 0.0979   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:09,643-Speed 5733.30 samples/sec   Loss 26.0724   LearningRate 0.0979   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:11,448-Speed 5675.07 samples/sec   Loss 25.8484   LearningRate 0.0978   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:13,253-Speed 5678.31 samples/sec   Loss 25.7585   LearningRate 0.0978   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:15,079-Speed 5608.86 samples/sec   Loss 25.5395   LearningRate 0.0978   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:16,892-Speed 5648.37 samples/sec   Loss 25.3461   LearningRate 0.0978   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:18,725-Speed 5588.87 samples/sec   Loss 25.2559   LearningRate 0.0978   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:20,575-Speed 5538.12 samples/sec   Loss 25.1085   LearningRate 0.0978   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:22,373-Speed 5699.20 samples/sec   Loss 24.9766   LearningRate 0.0977   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:24,189-Speed 5640.46 samples/sec   Loss 24.9944   LearningRate 0.0977   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:25,972-Speed 5746.87 samples/sec   Loss 24.5523   LearningRate 0.0977   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:27,799-Speed 5606.16 samples/sec   Loss 24.6014   LearningRate 0.0977   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:29,653-Speed 5527.19 samples/sec   Loss 24.4833   LearningRate 0.0977   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:31,464-Speed 5656.09 samples/sec   Loss 24.3917   LearningRate 0.0977   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:33,252-Speed 5731.19 samples/sec   Loss 24.1957   LearningRate 0.0976   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:35,044-Speed 5715.23 samples/sec   Loss 24.1251   LearningRate 0.0976   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:36,833-Speed 5726.18 samples/sec   Loss 23.9014   LearningRate 0.0976   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:38,662-Speed 5601.21 samples/sec   Loss 23.7837   LearningRate 0.0976   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:40,466-Speed 5681.01 samples/sec   Loss 23.8481   LearningRate 0.0976   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:42,279-Speed 5649.30 samples/sec   Loss 23.6532   LearningRate 0.0976   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:44,077-Speed 5698.08 samples/sec   Loss 23.3556   LearningRate 0.0975   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:45,858-Speed 5753.03 samples/sec   Loss 23.3261   LearningRate 0.0975   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:47,653-Speed 5708.74 samples/sec   Loss 23.2658   LearningRate 0.0975   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:49,438-Speed 5737.30 samples/sec   Loss 23.0196   LearningRate 0.0975   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:51,216-Speed 5763.30 samples/sec   Loss 22.9705   LearningRate 0.0975   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:53,046-Speed 5598.06 samples/sec   Loss 22.9133   LearningRate 0.0974   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 01:32:54,842-Speed 5703.78 samples/sec   Loss 22.7707   LearningRate 0.0974   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:56,649-Speed 5669.96 samples/sec   Loss 22.5527   LearningRate 0.0974   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:32:58,456-Speed 5668.04 samples/sec   Loss 22.8149   LearningRate 0.0974   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:00,274-Speed 5636.18 samples/sec   Loss 22.4662   LearningRate 0.0974   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:02,124-Speed 5536.62 samples/sec   Loss 22.3601   LearningRate 0.0974   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:03,960-Speed 5582.75 samples/sec   Loss 22.2294   LearningRate 0.0973   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:05,819-Speed 5509.12 samples/sec   Loss 22.1268   LearningRate 0.0973   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:07,634-Speed 5644.80 samples/sec   Loss 22.0232   LearningRate 0.0973   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:09,431-Speed 5702.16 samples/sec   Loss 21.8800   LearningRate 0.0973   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:11,249-Speed 5636.45 samples/sec   Loss 21.6714   LearningRate 0.0973   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:13,044-Speed 5707.05 samples/sec   Loss 21.5479   LearningRate 0.0973   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:14,849-Speed 5676.57 samples/sec   Loss 21.7890   LearningRate 0.0972   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:16,716-Speed 5485.03 samples/sec   Loss 21.4092   LearningRate 0.0972   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:18,555-Speed 5571.77 samples/sec   Loss 21.3742   LearningRate 0.0972   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:20,425-Speed 5477.48 samples/sec   Loss 21.4461   LearningRate 0.0972   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:22,265-Speed 5568.26 samples/sec   Loss 21.3889   LearningRate 0.0972   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:24,064-Speed 5697.26 samples/sec   Loss 21.2927   LearningRate 0.0972   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:25,870-Speed 5672.53 samples/sec   Loss 21.0575   LearningRate 0.0971   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:27,655-Speed 5736.93 samples/sec   Loss 21.1780   LearningRate 0.0971   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:29,499-Speed 5555.67 samples/sec   Loss 20.9113   LearningRate 0.0971   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:31,325-Speed 5608.96 samples/sec   Loss 20.9050   LearningRate 0.0971   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:33,136-Speed 5657.85 samples/sec   Loss 20.6957   LearningRate 0.0971   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:34,958-Speed 5623.96 samples/sec   Loss 20.6057   LearningRate 0.0970   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:36,756-Speed 5697.64 samples/sec   Loss 20.5681   LearningRate 0.0970   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:38,559-Speed 5682.21 samples/sec   Loss 20.6389   LearningRate 0.0970   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:40,342-Speed 5745.56 samples/sec   Loss 20.4193   LearningRate 0.0970   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:42,152-Speed 5662.49 samples/sec   Loss 20.4045   LearningRate 0.0970   Epoch: 0   Global Step: 1730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:43,948-Speed 5703.15 samples/sec   Loss 20.1147   LearningRate 0.0970   Epoch: 0   Global Step: 1740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:33:45,750-Speed 5683.34 samples/sec   Loss 20.2711   LearningRate 0.0969   Epoch: 0   Global Step: 1750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:47,558-Speed 5668.14 samples/sec   Loss 20.1070   LearningRate 0.0969   Epoch: 0   Global Step: 1760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:49,406-Speed 5542.95 samples/sec   Loss 20.0374   LearningRate 0.0969   Epoch: 0   Global Step: 1770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:51,266-Speed 5507.37 samples/sec   Loss 19.8804   LearningRate 0.0969   Epoch: 0   Global Step: 1780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:53,133-Speed 5488.70 samples/sec   Loss 19.8550   LearningRate 0.0969   Epoch: 0   Global Step: 1790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:54,928-Speed 5707.96 samples/sec   Loss 19.8572   LearningRate 0.0969   Epoch: 0   Global Step: 1800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:56,738-Speed 5661.71 samples/sec   Loss 19.7349   LearningRate 0.0968   Epoch: 0   Global Step: 1810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:33:58,558-Speed 5629.07 samples/sec   Loss 19.6512   LearningRate 0.0968   Epoch: 0   Global Step: 1820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:34:00,359-Speed 5688.75 samples/sec   Loss 19.7015   LearningRate 0.0968   Epoch: 0   Global Step: 1830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:34:02,202-Speed 5556.81 samples/sec   Loss 19.5028   LearningRate 0.0968   Epoch: 0   Global Step: 1840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:34:04,020-Speed 5635.30 samples/sec   Loss 19.5194   LearningRate 0.0968   Epoch: 0   Global Step: 1850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:34:05,821-Speed 5690.64 samples/sec   Loss 19.4473   LearningRate 0.0968   Epoch: 0   Global Step: 1860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:34:07,615-Speed 5708.57 samples/sec   Loss 19.5072   LearningRate 0.0967   Epoch: 0   Global Step: 1870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:34:09,455-Speed 5568.59 samples/sec   Loss 19.2960   LearningRate 0.0967   Epoch: 0   Global Step: 1880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:34:11,275-Speed 5629.91 samples/sec   Loss 19.0530   LearningRate 0.0967   Epoch: 0   Global Step: 1890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:34:13,076-Speed 5686.48 samples/sec   Loss 19.1925   LearningRate 0.0967   Epoch: 0   Global Step: 1900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:34:14,879-Speed 5683.96 samples/sec   Loss 18.8920   LearningRate 0.0967   Epoch: 0   Global Step: 1910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:34:16,712-Speed 5587.07 samples/sec   Loss 18.9327   LearningRate 0.0967   Epoch: 0   Global Step: 1920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:34:18,524-Speed 5653.95 samples/sec   Loss 19.0520   LearningRate 0.0966   Epoch: 0   Global Step: 1930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:34:20,333-Speed 5662.96 samples/sec   Loss 18.7798   LearningRate 0.0966   Epoch: 0   Global Step: 1940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:34:22,138-Speed 5677.25 samples/sec   Loss 18.8671   LearningRate 0.0966   Epoch: 0   Global Step: 1950   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 01:34:23,957-Speed 5631.32 samples/sec   Loss 18.8507   LearningRate 0.0966   Epoch: 0   Global Step: 1960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:34:25,784-Speed 5609.18 samples/sec   Loss 18.6976   LearningRate 0.0966   Epoch: 0   Global Step: 1970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:34:27,615-Speed 5593.43 samples/sec   Loss 18.5995   LearningRate 0.0965   Epoch: 0   Global Step: 1980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:34:29,447-Speed 5591.64 samples/sec   Loss 18.7519   LearningRate 0.0965   Epoch: 0   Global Step: 1990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:34:31,241-Speed 5711.45 samples/sec   Loss 18.6280   LearningRate 0.0965   Epoch: 0   Global Step: 2000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:35:00,033-[lfw][2000]XNorm: 20.626767
Training: 2022-04-27 01:35:00,033-[lfw][2000]Accuracy-Flip: 0.98117+-0.00597
Training: 2022-04-27 01:35:00,033-[lfw][2000]Accuracy-Highest: 0.98117
Training: 2022-04-27 01:35:30,987-[cfp_fp][2000]XNorm: 17.467478
Training: 2022-04-27 01:35:30,987-[cfp_fp][2000]Accuracy-Flip: 0.77686+-0.01619
Training: 2022-04-27 01:35:30,988-[cfp_fp][2000]Accuracy-Highest: 0.77686
Training: 2022-04-27 01:35:57,530-[agedb_30][2000]XNorm: 20.021395
Training: 2022-04-27 01:35:57,531-[agedb_30][2000]Accuracy-Flip: 0.89350+-0.02585
Training: 2022-04-27 01:35:57,532-[agedb_30][2000]Accuracy-Highest: 0.89350
Training: 2022-04-27 01:35:59,397-Speed 116.16 samples/sec   Loss 18.3722   LearningRate 0.0965   Epoch: 0   Global Step: 2010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:01,213-Speed 5641.63 samples/sec   Loss 18.2830   LearningRate 0.0965   Epoch: 0   Global Step: 2020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:03,047-Speed 5586.62 samples/sec   Loss 18.2472   LearningRate 0.0965   Epoch: 0   Global Step: 2030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:04,855-Speed 5667.01 samples/sec   Loss 18.3900   LearningRate 0.0964   Epoch: 0   Global Step: 2040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:06,690-Speed 5580.39 samples/sec   Loss 18.1385   LearningRate 0.0964   Epoch: 0   Global Step: 2050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:08,544-Speed 5528.78 samples/sec   Loss 18.3360   LearningRate 0.0964   Epoch: 0   Global Step: 2060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:10,411-Speed 5486.35 samples/sec   Loss 18.5458   LearningRate 0.0964   Epoch: 0   Global Step: 2070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:12,276-Speed 5492.72 samples/sec   Loss 18.2607   LearningRate 0.0964   Epoch: 0   Global Step: 2080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:14,150-Speed 5467.99 samples/sec   Loss 18.0803   LearningRate 0.0964   Epoch: 0   Global Step: 2090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:15,947-Speed 5698.32 samples/sec   Loss 18.2470   LearningRate 0.0963   Epoch: 0   Global Step: 2100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:17,753-Speed 5672.66 samples/sec   Loss 17.9455   LearningRate 0.0963   Epoch: 0   Global Step: 2110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:19,538-Speed 5739.74 samples/sec   Loss 17.7721   LearningRate 0.0963   Epoch: 0   Global Step: 2120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:21,330-Speed 5717.43 samples/sec   Loss 17.7421   LearningRate 0.0963   Epoch: 0   Global Step: 2130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:23,143-Speed 5651.35 samples/sec   Loss 17.6527   LearningRate 0.0963   Epoch: 0   Global Step: 2140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:24,933-Speed 5723.58 samples/sec   Loss 17.5630   LearningRate 0.0963   Epoch: 0   Global Step: 2150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:36:26,727-Speed 5708.87 samples/sec   Loss 17.5829   LearningRate 0.0962   Epoch: 0   Global Step: 2160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:36:28,547-Speed 5632.39 samples/sec   Loss 17.6403   LearningRate 0.0962   Epoch: 0   Global Step: 2170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:36:30,341-Speed 5710.66 samples/sec   Loss 17.4217   LearningRate 0.0962   Epoch: 0   Global Step: 2180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:36:32,126-Speed 5739.03 samples/sec   Loss 17.5170   LearningRate 0.0962   Epoch: 0   Global Step: 2190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:36:33,937-Speed 5660.62 samples/sec   Loss 17.5091   LearningRate 0.0962   Epoch: 0   Global Step: 2200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:36:35,720-Speed 5746.38 samples/sec   Loss 17.4380   LearningRate 0.0962   Epoch: 0   Global Step: 2210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:36:37,523-Speed 5679.62 samples/sec   Loss 17.2893   LearningRate 0.0961   Epoch: 0   Global Step: 2220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:36:39,335-Speed 5657.40 samples/sec   Loss 17.2331   LearningRate 0.0961   Epoch: 0   Global Step: 2230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:36:41,131-Speed 5703.93 samples/sec   Loss 17.2973   LearningRate 0.0961   Epoch: 0   Global Step: 2240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:36:42,942-Speed 5657.25 samples/sec   Loss 17.2356   LearningRate 0.0961   Epoch: 0   Global Step: 2250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:44,730-Speed 5730.21 samples/sec   Loss 17.3028   LearningRate 0.0961   Epoch: 0   Global Step: 2260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:46,557-Speed 5611.03 samples/sec   Loss 17.1295   LearningRate 0.0960   Epoch: 0   Global Step: 2270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:48,372-Speed 5646.54 samples/sec   Loss 17.0288   LearningRate 0.0960   Epoch: 0   Global Step: 2280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:50,199-Speed 5607.84 samples/sec   Loss 17.0225   LearningRate 0.0960   Epoch: 0   Global Step: 2290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:51,996-Speed 5701.16 samples/sec   Loss 16.8710   LearningRate 0.0960   Epoch: 0   Global Step: 2300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:53,785-Speed 5725.42 samples/sec   Loss 17.0499   LearningRate 0.0960   Epoch: 0   Global Step: 2310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:55,576-Speed 5718.00 samples/sec   Loss 16.8542   LearningRate 0.0960   Epoch: 0   Global Step: 2320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:57,374-Speed 5699.16 samples/sec   Loss 17.0558   LearningRate 0.0959   Epoch: 0   Global Step: 2330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:36:59,179-Speed 5678.37 samples/sec   Loss 16.7291   LearningRate 0.0959   Epoch: 0   Global Step: 2340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:00,989-Speed 5661.18 samples/sec   Loss 16.6541   LearningRate 0.0959   Epoch: 0   Global Step: 2350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:02,794-Speed 5673.27 samples/sec   Loss 16.7534   LearningRate 0.0959   Epoch: 0   Global Step: 2360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:04,605-Speed 5657.81 samples/sec   Loss 16.7782   LearningRate 0.0959   Epoch: 0   Global Step: 2370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:06,389-Speed 5741.24 samples/sec   Loss 16.8834   LearningRate 0.0959   Epoch: 0   Global Step: 2380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:08,190-Speed 5689.88 samples/sec   Loss 16.7004   LearningRate 0.0958   Epoch: 0   Global Step: 2390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:09,994-Speed 5677.93 samples/sec   Loss 16.5012   LearningRate 0.0958   Epoch: 0   Global Step: 2400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:11,782-Speed 5730.96 samples/sec   Loss 16.6023   LearningRate 0.0958   Epoch: 0   Global Step: 2410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:13,571-Speed 5726.84 samples/sec   Loss 16.4872   LearningRate 0.0958   Epoch: 0   Global Step: 2420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:15,353-Speed 5750.67 samples/sec   Loss 16.5775   LearningRate 0.0958   Epoch: 0   Global Step: 2430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:17,173-Speed 5629.39 samples/sec   Loss 16.6165   LearningRate 0.0958   Epoch: 0   Global Step: 2440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:18,951-Speed 5761.46 samples/sec   Loss 16.4181   LearningRate 0.0957   Epoch: 0   Global Step: 2450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:20,760-Speed 5663.56 samples/sec   Loss 16.4520   LearningRate 0.0957   Epoch: 0   Global Step: 2460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:22,563-Speed 5680.94 samples/sec   Loss 16.2132   LearningRate 0.0957   Epoch: 0   Global Step: 2470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:24,349-Speed 5736.18 samples/sec   Loss 16.1577   LearningRate 0.0957   Epoch: 0   Global Step: 2480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:26,141-Speed 5716.57 samples/sec   Loss 16.2249   LearningRate 0.0957   Epoch: 0   Global Step: 2490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:27,928-Speed 5733.62 samples/sec   Loss 16.1782   LearningRate 0.0957   Epoch: 0   Global Step: 2500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:29,751-Speed 5621.66 samples/sec   Loss 16.3545   LearningRate 0.0956   Epoch: 0   Global Step: 2510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:31,551-Speed 5693.16 samples/sec   Loss 16.2255   LearningRate 0.0956   Epoch: 0   Global Step: 2520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:33,347-Speed 5702.91 samples/sec   Loss 16.1851   LearningRate 0.0956   Epoch: 0   Global Step: 2530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:35,155-Speed 5690.39 samples/sec   Loss 15.9891   LearningRate 0.0956   Epoch: 0   Global Step: 2540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:36,962-Speed 5671.33 samples/sec   Loss 16.1700   LearningRate 0.0956   Epoch: 0   Global Step: 2550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:38,787-Speed 5612.41 samples/sec   Loss 16.1010   LearningRate 0.0955   Epoch: 0   Global Step: 2560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:40,572-Speed 5739.81 samples/sec   Loss 16.0645   LearningRate 0.0955   Epoch: 0   Global Step: 2570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:42,367-Speed 5706.58 samples/sec   Loss 15.9780   LearningRate 0.0955   Epoch: 0   Global Step: 2580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:44,186-Speed 5632.82 samples/sec   Loss 15.8056   LearningRate 0.0955   Epoch: 0   Global Step: 2590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:45,996-Speed 5659.87 samples/sec   Loss 16.0167   LearningRate 0.0955   Epoch: 0   Global Step: 2600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:47,782-Speed 5735.08 samples/sec   Loss 15.8945   LearningRate 0.0955   Epoch: 0   Global Step: 2610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:49,584-Speed 5687.84 samples/sec   Loss 15.8537   LearningRate 0.0954   Epoch: 0   Global Step: 2620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:51,377-Speed 5713.52 samples/sec   Loss 15.7436   LearningRate 0.0954   Epoch: 0   Global Step: 2630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:53,159-Speed 5749.43 samples/sec   Loss 15.7922   LearningRate 0.0954   Epoch: 0   Global Step: 2640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:54,970-Speed 5655.36 samples/sec   Loss 15.8897   LearningRate 0.0954   Epoch: 0   Global Step: 2650   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-27 01:37:56,783-Speed 5651.12 samples/sec   Loss 15.6177   LearningRate 0.0954   Epoch: 0   Global Step: 2660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:37:58,573-Speed 5721.72 samples/sec   Loss 15.7162   LearningRate 0.0954   Epoch: 0   Global Step: 2670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:00,372-Speed 5695.53 samples/sec   Loss 15.6855   LearningRate 0.0953   Epoch: 0   Global Step: 2680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:02,161-Speed 5726.68 samples/sec   Loss 15.7855   LearningRate 0.0953   Epoch: 0   Global Step: 2690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:03,966-Speed 5689.08 samples/sec   Loss 15.8962   LearningRate 0.0953   Epoch: 0   Global Step: 2700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:05,758-Speed 5716.65 samples/sec   Loss 15.5118   LearningRate 0.0953   Epoch: 0   Global Step: 2710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:07,548-Speed 5723.06 samples/sec   Loss 15.6303   LearningRate 0.0953   Epoch: 0   Global Step: 2720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:09,349-Speed 5688.09 samples/sec   Loss 15.6411   LearningRate 0.0953   Epoch: 0   Global Step: 2730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:11,138-Speed 5726.96 samples/sec   Loss 15.4358   LearningRate 0.0952   Epoch: 0   Global Step: 2740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:12,949-Speed 5698.70 samples/sec   Loss 15.4013   LearningRate 0.0952   Epoch: 0   Global Step: 2750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:14,739-Speed 5724.68 samples/sec   Loss 15.4621   LearningRate 0.0952   Epoch: 0   Global Step: 2760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:16,571-Speed 5590.99 samples/sec   Loss 15.4899   LearningRate 0.0952   Epoch: 0   Global Step: 2770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:38:18,404-Speed 5589.43 samples/sec   Loss 15.4873   LearningRate 0.0952   Epoch: 0   Global Step: 2780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:38:20,216-Speed 5661.39 samples/sec   Loss 15.5077   LearningRate 0.0952   Epoch: 0   Global Step: 2790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:38:22,035-Speed 5630.46 samples/sec   Loss 15.5228   LearningRate 0.0951   Epoch: 0   Global Step: 2800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:38:23,849-Speed 5648.78 samples/sec   Loss 15.3901   LearningRate 0.0951   Epoch: 0   Global Step: 2810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:38:25,636-Speed 5730.20 samples/sec   Loss 15.3192   LearningRate 0.0951   Epoch: 0   Global Step: 2820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:38:27,425-Speed 5727.77 samples/sec   Loss 15.3243   LearningRate 0.0951   Epoch: 0   Global Step: 2830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:38:29,223-Speed 5696.26 samples/sec   Loss 15.4703   LearningRate 0.0951   Epoch: 0   Global Step: 2840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:38:31,026-Speed 5683.16 samples/sec   Loss 15.2768   LearningRate 0.0951   Epoch: 0   Global Step: 2850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:38:32,830-Speed 5704.38 samples/sec   Loss 15.1758   LearningRate 0.0950   Epoch: 0   Global Step: 2860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:38:34,613-Speed 5746.34 samples/sec   Loss 15.0984   LearningRate 0.0950   Epoch: 0   Global Step: 2870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:36,396-Speed 5746.25 samples/sec   Loss 15.1071   LearningRate 0.0950   Epoch: 0   Global Step: 2880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:38,186-Speed 5723.16 samples/sec   Loss 14.8835   LearningRate 0.0950   Epoch: 0   Global Step: 2890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:39,969-Speed 5747.54 samples/sec   Loss 15.0346   LearningRate 0.0950   Epoch: 0   Global Step: 2900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:41,777-Speed 5664.77 samples/sec   Loss 15.3479   LearningRate 0.0949   Epoch: 0   Global Step: 2910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:43,598-Speed 5634.59 samples/sec   Loss 14.8801   LearningRate 0.0949   Epoch: 0   Global Step: 2920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:45,411-Speed 5648.42 samples/sec   Loss 15.0555   LearningRate 0.0949   Epoch: 0   Global Step: 2930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:47,205-Speed 5710.99 samples/sec   Loss 15.1311   LearningRate 0.0949   Epoch: 0   Global Step: 2940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:49,027-Speed 5624.00 samples/sec   Loss 15.0430   LearningRate 0.0949   Epoch: 0   Global Step: 2950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:50,854-Speed 5625.23 samples/sec   Loss 14.9190   LearningRate 0.0949   Epoch: 0   Global Step: 2960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:52,664-Speed 5661.20 samples/sec   Loss 14.9401   LearningRate 0.0948   Epoch: 0   Global Step: 2970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:54,484-Speed 5625.92 samples/sec   Loss 15.1299   LearningRate 0.0948   Epoch: 0   Global Step: 2980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:56,274-Speed 5724.18 samples/sec   Loss 14.9492   LearningRate 0.0948   Epoch: 0   Global Step: 2990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:58,084-Speed 5660.36 samples/sec   Loss 14.9079   LearningRate 0.0948   Epoch: 0   Global Step: 3000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:38:59,927-Speed 5558.86 samples/sec   Loss 14.8103   LearningRate 0.0948   Epoch: 0   Global Step: 3010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:39:01,713-Speed 5735.60 samples/sec   Loss 14.9088   LearningRate 0.0948   Epoch: 0   Global Step: 3020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:39:03,539-Speed 5630.78 samples/sec   Loss 14.9134   LearningRate 0.0947   Epoch: 0   Global Step: 3030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:39:05,341-Speed 5685.98 samples/sec   Loss 15.0781   LearningRate 0.0947   Epoch: 0   Global Step: 3040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:39:07,121-Speed 5754.44 samples/sec   Loss 14.8235   LearningRate 0.0947   Epoch: 0   Global Step: 3050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:39:08,913-Speed 5714.64 samples/sec   Loss 14.8098   LearningRate 0.0947   Epoch: 0   Global Step: 3060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:39:10,720-Speed 5672.00 samples/sec   Loss 14.7361   LearningRate 0.0947   Epoch: 0   Global Step: 3070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:39:12,519-Speed 5693.26 samples/sec   Loss 14.6106   LearningRate 0.0947   Epoch: 0   Global Step: 3080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:39:14,338-Speed 5690.85 samples/sec   Loss 14.6268   LearningRate 0.0946   Epoch: 0   Global Step: 3090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:39:16,135-Speed 5700.38 samples/sec   Loss 14.6599   LearningRate 0.0946   Epoch: 0   Global Step: 3100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:39:17,944-Speed 5660.62 samples/sec   Loss 14.7569   LearningRate 0.0946   Epoch: 0   Global Step: 3110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:39:19,742-Speed 5700.14 samples/sec   Loss 14.7106   LearningRate 0.0946   Epoch: 0   Global Step: 3120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:39:21,564-Speed 5653.16 samples/sec   Loss 14.6159   LearningRate 0.0946   Epoch: 0   Global Step: 3130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:39:23,372-Speed 5664.32 samples/sec   Loss 14.6502   LearningRate 0.0946   Epoch: 0   Global Step: 3140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:39:25,203-Speed 5595.44 samples/sec   Loss 14.4723   LearningRate 0.0945   Epoch: 0   Global Step: 3150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:39:27,015-Speed 5653.28 samples/sec   Loss 14.5233   LearningRate 0.0945   Epoch: 0   Global Step: 3160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:39:28,825-Speed 5660.93 samples/sec   Loss 14.4528   LearningRate 0.0945   Epoch: 0   Global Step: 3170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:39:30,672-Speed 5547.76 samples/sec   Loss 14.6016   LearningRate 0.0945   Epoch: 0   Global Step: 3180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:39:32,482-Speed 5658.35 samples/sec   Loss 14.4854   LearningRate 0.0945   Epoch: 0   Global Step: 3190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:39:34,337-Speed 5626.94 samples/sec   Loss 14.5896   LearningRate 0.0945   Epoch: 0   Global Step: 3200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:39:36,214-Speed 5456.79 samples/sec   Loss 14.4261   LearningRate 0.0944   Epoch: 0   Global Step: 3210   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 01:39:38,025-Speed 5657.21 samples/sec   Loss 14.3774   LearningRate 0.0944   Epoch: 0   Global Step: 3220   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 01:39:39,822-Speed 5701.94 samples/sec   Loss 14.3545   LearningRate 0.0944   Epoch: 0   Global Step: 3230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:39:41,603-Speed 5753.09 samples/sec   Loss 14.2725   LearningRate 0.0944   Epoch: 0   Global Step: 3240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:39:43,426-Speed 5639.92 samples/sec   Loss 14.5015   LearningRate 0.0944   Epoch: 0   Global Step: 3250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:39:45,210-Speed 5743.61 samples/sec   Loss 14.2725   LearningRate 0.0943   Epoch: 0   Global Step: 3260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:39:47,032-Speed 5623.24 samples/sec   Loss 14.2288   LearningRate 0.0943   Epoch: 0   Global Step: 3270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:39:48,831-Speed 5693.74 samples/sec   Loss 14.2481   LearningRate 0.0943   Epoch: 0   Global Step: 3280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:39:50,653-Speed 5671.25 samples/sec   Loss 14.4050   LearningRate 0.0943   Epoch: 0   Global Step: 3290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:39:52,445-Speed 5715.74 samples/sec   Loss 14.3579   LearningRate 0.0943   Epoch: 0   Global Step: 3300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:39:54,239-Speed 5712.47 samples/sec   Loss 14.1832   LearningRate 0.0943   Epoch: 0   Global Step: 3310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:39:56,042-Speed 5682.36 samples/sec   Loss 14.2983   LearningRate 0.0942   Epoch: 0   Global Step: 3320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:39:57,838-Speed 5700.97 samples/sec   Loss 14.1763   LearningRate 0.0942   Epoch: 0   Global Step: 3330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:39:59,649-Speed 5658.60 samples/sec   Loss 14.2785   LearningRate 0.0942   Epoch: 0   Global Step: 3340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:01,481-Speed 5590.40 samples/sec   Loss 14.1675   LearningRate 0.0942   Epoch: 0   Global Step: 3350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:03,284-Speed 5700.67 samples/sec   Loss 14.0374   LearningRate 0.0942   Epoch: 0   Global Step: 3360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:05,107-Speed 5618.99 samples/sec   Loss 14.3008   LearningRate 0.0942   Epoch: 0   Global Step: 3370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:06,905-Speed 5697.36 samples/sec   Loss 14.1350   LearningRate 0.0941   Epoch: 0   Global Step: 3380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:08,718-Speed 5651.04 samples/sec   Loss 14.1431   LearningRate 0.0941   Epoch: 0   Global Step: 3390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:10,540-Speed 5624.11 samples/sec   Loss 14.0206   LearningRate 0.0941   Epoch: 0   Global Step: 3400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:12,321-Speed 5750.66 samples/sec   Loss 14.1414   LearningRate 0.0941   Epoch: 0   Global Step: 3410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:14,136-Speed 5647.43 samples/sec   Loss 14.1572   LearningRate 0.0941   Epoch: 0   Global Step: 3420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:15,939-Speed 5680.12 samples/sec   Loss 13.8228   LearningRate 0.0941   Epoch: 0   Global Step: 3430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:17,732-Speed 5716.05 samples/sec   Loss 14.1407   LearningRate 0.0940   Epoch: 0   Global Step: 3440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:19,533-Speed 5688.07 samples/sec   Loss 14.0596   LearningRate 0.0940   Epoch: 0   Global Step: 3450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:21,336-Speed 5702.70 samples/sec   Loss 13.8486   LearningRate 0.0940   Epoch: 0   Global Step: 3460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:23,123-Speed 5731.07 samples/sec   Loss 14.1734   LearningRate 0.0940   Epoch: 0   Global Step: 3470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:24,920-Speed 5700.47 samples/sec   Loss 14.0695   LearningRate 0.0940   Epoch: 0   Global Step: 3480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:26,719-Speed 5697.25 samples/sec   Loss 13.8443   LearningRate 0.0940   Epoch: 0   Global Step: 3490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:28,502-Speed 5743.63 samples/sec   Loss 14.1173   LearningRate 0.0939   Epoch: 0   Global Step: 3500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:30,291-Speed 5726.08 samples/sec   Loss 14.1101   LearningRate 0.0939   Epoch: 0   Global Step: 3510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:32,081-Speed 5723.42 samples/sec   Loss 13.9457   LearningRate 0.0939   Epoch: 0   Global Step: 3520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:33,884-Speed 5713.08 samples/sec   Loss 14.0378   LearningRate 0.0939   Epoch: 0   Global Step: 3530   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 01:40:35,674-Speed 5722.39 samples/sec   Loss 13.8730   LearningRate 0.0939   Epoch: 0   Global Step: 3540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:37,474-Speed 5693.18 samples/sec   Loss 13.7425   LearningRate 0.0939   Epoch: 0   Global Step: 3550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:39,261-Speed 5734.25 samples/sec   Loss 13.8816   LearningRate 0.0938   Epoch: 0   Global Step: 3560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:41,052-Speed 5719.05 samples/sec   Loss 13.8947   LearningRate 0.0938   Epoch: 0   Global Step: 3570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:42,863-Speed 5657.35 samples/sec   Loss 13.8119   LearningRate 0.0938   Epoch: 0   Global Step: 3580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:44,675-Speed 5671.58 samples/sec   Loss 13.5636   LearningRate 0.0938   Epoch: 0   Global Step: 3590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:46,474-Speed 5720.62 samples/sec   Loss 13.4996   LearningRate 0.0938   Epoch: 0   Global Step: 3600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:48,284-Speed 5660.00 samples/sec   Loss 13.8682   LearningRate 0.0938   Epoch: 0   Global Step: 3610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:50,099-Speed 5643.33 samples/sec   Loss 13.9597   LearningRate 0.0937   Epoch: 0   Global Step: 3620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:51,921-Speed 5656.38 samples/sec   Loss 13.7929   LearningRate 0.0937   Epoch: 0   Global Step: 3630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:53,739-Speed 5635.59 samples/sec   Loss 13.6289   LearningRate 0.0937   Epoch: 0   Global Step: 3640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:55,528-Speed 5725.66 samples/sec   Loss 13.8986   LearningRate 0.0937   Epoch: 0   Global Step: 3650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:57,343-Speed 5643.06 samples/sec   Loss 13.7506   LearningRate 0.0937   Epoch: 0   Global Step: 3660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:40:59,153-Speed 5657.99 samples/sec   Loss 13.7127   LearningRate 0.0936   Epoch: 0   Global Step: 3670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:00,962-Speed 5665.55 samples/sec   Loss 13.7760   LearningRate 0.0936   Epoch: 0   Global Step: 3680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:02,781-Speed 5632.51 samples/sec   Loss 13.7209   LearningRate 0.0936   Epoch: 0   Global Step: 3690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:04,590-Speed 5717.54 samples/sec   Loss 13.7394   LearningRate 0.0936   Epoch: 0   Global Step: 3700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:06,378-Speed 5727.61 samples/sec   Loss 13.6975   LearningRate 0.0936   Epoch: 0   Global Step: 3710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:08,177-Speed 5695.63 samples/sec   Loss 13.7584   LearningRate 0.0936   Epoch: 0   Global Step: 3720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:41:09,972-Speed 5706.28 samples/sec   Loss 13.3949   LearningRate 0.0935   Epoch: 0   Global Step: 3730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:41:11,783-Speed 5655.79 samples/sec   Loss 13.6693   LearningRate 0.0935   Epoch: 0   Global Step: 3740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:41:13,576-Speed 5742.67 samples/sec   Loss 13.5243   LearningRate 0.0935   Epoch: 0   Global Step: 3750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:41:15,361-Speed 5739.10 samples/sec   Loss 13.7714   LearningRate 0.0935   Epoch: 0   Global Step: 3760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:41:17,169-Speed 5664.65 samples/sec   Loss 13.8382   LearningRate 0.0935   Epoch: 0   Global Step: 3770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:41:18,959-Speed 5724.89 samples/sec   Loss 13.6976   LearningRate 0.0935   Epoch: 0   Global Step: 3780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:41:20,786-Speed 5625.79 samples/sec   Loss 13.3342   LearningRate 0.0934   Epoch: 0   Global Step: 3790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:41:22,592-Speed 5670.55 samples/sec   Loss 13.5139   LearningRate 0.0934   Epoch: 0   Global Step: 3800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:41:24,429-Speed 5578.98 samples/sec   Loss 13.4156   LearningRate 0.0934   Epoch: 0   Global Step: 3810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:41:26,257-Speed 5603.64 samples/sec   Loss 13.4666   LearningRate 0.0934   Epoch: 0   Global Step: 3820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:28,057-Speed 5692.22 samples/sec   Loss 13.3318   LearningRate 0.0934   Epoch: 0   Global Step: 3830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:29,886-Speed 5599.71 samples/sec   Loss 13.5268   LearningRate 0.0934   Epoch: 0   Global Step: 3840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:31,698-Speed 5654.19 samples/sec   Loss 13.4212   LearningRate 0.0933   Epoch: 0   Global Step: 3850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:33,491-Speed 5735.62 samples/sec   Loss 13.5047   LearningRate 0.0933   Epoch: 0   Global Step: 3860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:35,287-Speed 5702.77 samples/sec   Loss 13.5762   LearningRate 0.0933   Epoch: 0   Global Step: 3870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:37,084-Speed 5697.91 samples/sec   Loss 13.5844   LearningRate 0.0933   Epoch: 0   Global Step: 3880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:38,876-Speed 5717.36 samples/sec   Loss 13.2985   LearningRate 0.0933   Epoch: 0   Global Step: 3890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:40,694-Speed 5635.47 samples/sec   Loss 13.4698   LearningRate 0.0933   Epoch: 0   Global Step: 3900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:42,483-Speed 5726.21 samples/sec   Loss 13.5311   LearningRate 0.0932   Epoch: 0   Global Step: 3910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:44,290-Speed 5705.43 samples/sec   Loss 13.2960   LearningRate 0.0932   Epoch: 0   Global Step: 3920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:46,087-Speed 5699.78 samples/sec   Loss 13.3494   LearningRate 0.0932   Epoch: 0   Global Step: 3930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:47,872-Speed 5742.96 samples/sec   Loss 13.3692   LearningRate 0.0932   Epoch: 0   Global Step: 3940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:49,720-Speed 5622.20 samples/sec   Loss 13.3668   LearningRate 0.0932   Epoch: 0   Global Step: 3950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:51,524-Speed 5678.99 samples/sec   Loss 13.5243   LearningRate 0.0932   Epoch: 0   Global Step: 3960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:53,342-Speed 5637.41 samples/sec   Loss 13.2178   LearningRate 0.0931   Epoch: 0   Global Step: 3970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:55,137-Speed 5706.86 samples/sec   Loss 13.3711   LearningRate 0.0931   Epoch: 0   Global Step: 3980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:56,974-Speed 5615.31 samples/sec   Loss 13.4024   LearningRate 0.0931   Epoch: 0   Global Step: 3990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:41:58,778-Speed 5679.26 samples/sec   Loss 13.2602   LearningRate 0.0931   Epoch: 0   Global Step: 4000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:42:25,482-[lfw][4000]XNorm: 21.498628
Training: 2022-04-27 01:42:25,506-[lfw][4000]Accuracy-Flip: 0.98983+-0.00425
Training: 2022-04-27 01:42:25,507-[lfw][4000]Accuracy-Highest: 0.98983
Training: 2022-04-27 01:42:56,338-[cfp_fp][4000]XNorm: 18.556288
Training: 2022-04-27 01:42:56,340-[cfp_fp][4000]Accuracy-Flip: 0.86286+-0.01287
Training: 2022-04-27 01:42:56,341-[cfp_fp][4000]Accuracy-Highest: 0.86286
Training: 2022-04-27 01:43:22,866-[agedb_30][4000]XNorm: 21.040543
Training: 2022-04-27 01:43:22,936-[agedb_30][4000]Accuracy-Flip: 0.93167+-0.01412
Training: 2022-04-27 01:43:22,936-[agedb_30][4000]Accuracy-Highest: 0.93167
Training: 2022-04-27 01:43:24,738-Speed 119.13 samples/sec   Loss 13.2713   LearningRate 0.0931   Epoch: 0   Global Step: 4010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:26,553-Speed 5645.25 samples/sec   Loss 13.2576   LearningRate 0.0931   Epoch: 0   Global Step: 4020   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-27 01:43:28,377-Speed 5616.87 samples/sec   Loss 13.2427   LearningRate 0.0930   Epoch: 0   Global Step: 4030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:30,190-Speed 5650.72 samples/sec   Loss 13.2277   LearningRate 0.0930   Epoch: 0   Global Step: 4040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:31,987-Speed 5708.39 samples/sec   Loss 13.2277   LearningRate 0.0930   Epoch: 0   Global Step: 4050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:33,784-Speed 5700.63 samples/sec   Loss 13.1456   LearningRate 0.0930   Epoch: 0   Global Step: 4060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:35,567-Speed 5745.03 samples/sec   Loss 13.1526   LearningRate 0.0930   Epoch: 0   Global Step: 4070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:37,365-Speed 5695.96 samples/sec   Loss 13.0568   LearningRate 0.0930   Epoch: 0   Global Step: 4080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:39,202-Speed 5578.33 samples/sec   Loss 13.2792   LearningRate 0.0929   Epoch: 0   Global Step: 4090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:41,105-Speed 5383.24 samples/sec   Loss 13.1169   LearningRate 0.0929   Epoch: 0   Global Step: 4100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:42,901-Speed 5701.79 samples/sec   Loss 13.1134   LearningRate 0.0929   Epoch: 0   Global Step: 4110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:44,693-Speed 5729.05 samples/sec   Loss 13.3161   LearningRate 0.0929   Epoch: 0   Global Step: 4120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:46,483-Speed 5725.73 samples/sec   Loss 13.0973   LearningRate 0.0929   Epoch: 0   Global Step: 4130   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-27 01:43:48,265-Speed 5746.19 samples/sec   Loss 13.2733   LearningRate 0.0929   Epoch: 0   Global Step: 4140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:50,059-Speed 5731.72 samples/sec   Loss 12.9995   LearningRate 0.0928   Epoch: 0   Global Step: 4150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:51,866-Speed 5669.03 samples/sec   Loss 13.0621   LearningRate 0.0928   Epoch: 0   Global Step: 4160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:53,657-Speed 5719.84 samples/sec   Loss 13.0668   LearningRate 0.0928   Epoch: 0   Global Step: 4170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:55,452-Speed 5708.41 samples/sec   Loss 12.9115   LearningRate 0.0928   Epoch: 0   Global Step: 4180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:57,263-Speed 5686.64 samples/sec   Loss 12.9041   LearningRate 0.0928   Epoch: 0   Global Step: 4190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:43:59,080-Speed 5637.01 samples/sec   Loss 12.9927   LearningRate 0.0927   Epoch: 0   Global Step: 4200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:44:00,875-Speed 5706.98 samples/sec   Loss 13.1281   LearningRate 0.0927   Epoch: 0   Global Step: 4210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:44:02,678-Speed 5682.15 samples/sec   Loss 12.6766   LearningRate 0.0927   Epoch: 0   Global Step: 4220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:44:04,468-Speed 5724.09 samples/sec   Loss 13.0092   LearningRate 0.0927   Epoch: 0   Global Step: 4230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:06,254-Speed 5735.55 samples/sec   Loss 12.9966   LearningRate 0.0927   Epoch: 0   Global Step: 4240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:08,041-Speed 5732.82 samples/sec   Loss 13.0313   LearningRate 0.0927   Epoch: 0   Global Step: 4250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:09,842-Speed 5707.06 samples/sec   Loss 12.9197   LearningRate 0.0926   Epoch: 0   Global Step: 4260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:11,641-Speed 5695.60 samples/sec   Loss 12.8718   LearningRate 0.0926   Epoch: 0   Global Step: 4270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:13,477-Speed 5578.33 samples/sec   Loss 13.0006   LearningRate 0.0926   Epoch: 0   Global Step: 4280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:15,336-Speed 5510.57 samples/sec   Loss 13.0317   LearningRate 0.0926   Epoch: 0   Global Step: 4290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:17,143-Speed 5668.33 samples/sec   Loss 13.0785   LearningRate 0.0926   Epoch: 0   Global Step: 4300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:18,946-Speed 5683.60 samples/sec   Loss 12.7946   LearningRate 0.0926   Epoch: 0   Global Step: 4310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:20,736-Speed 5722.75 samples/sec   Loss 12.9522   LearningRate 0.0925   Epoch: 0   Global Step: 4320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:22,552-Speed 5642.98 samples/sec   Loss 12.8313   LearningRate 0.0925   Epoch: 0   Global Step: 4330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:44:24,363-Speed 5665.25 samples/sec   Loss 12.9213   LearningRate 0.0925   Epoch: 0   Global Step: 4340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:44:26,157-Speed 5710.05 samples/sec   Loss 12.9417   LearningRate 0.0925   Epoch: 0   Global Step: 4350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:44:27,945-Speed 5729.78 samples/sec   Loss 12.7878   LearningRate 0.0925   Epoch: 0   Global Step: 4360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:29,790-Speed 5551.05 samples/sec   Loss 12.8949   LearningRate 0.0925   Epoch: 0   Global Step: 4370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:31,616-Speed 5613.83 samples/sec   Loss 12.8251   LearningRate 0.0924   Epoch: 0   Global Step: 4380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:33,433-Speed 5637.72 samples/sec   Loss 12.9767   LearningRate 0.0924   Epoch: 0   Global Step: 4390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:35,243-Speed 5658.57 samples/sec   Loss 12.7866   LearningRate 0.0924   Epoch: 0   Global Step: 4400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:37,061-Speed 5636.22 samples/sec   Loss 12.7060   LearningRate 0.0924   Epoch: 0   Global Step: 4410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:38,857-Speed 5705.36 samples/sec   Loss 12.7602   LearningRate 0.0924   Epoch: 0   Global Step: 4420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:40,646-Speed 5726.22 samples/sec   Loss 12.8291   LearningRate 0.0924   Epoch: 0   Global Step: 4430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:42,443-Speed 5698.72 samples/sec   Loss 12.7631   LearningRate 0.0923   Epoch: 0   Global Step: 4440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:44,262-Speed 5634.36 samples/sec   Loss 12.7748   LearningRate 0.0923   Epoch: 0   Global Step: 4450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:44:46,081-Speed 5631.35 samples/sec   Loss 12.5993   LearningRate 0.0923   Epoch: 0   Global Step: 4460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:44:47,896-Speed 5644.73 samples/sec   Loss 12.5352   LearningRate 0.0923   Epoch: 0   Global Step: 4470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:44:49,687-Speed 5720.38 samples/sec   Loss 12.6741   LearningRate 0.0923   Epoch: 0   Global Step: 4480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:44:51,496-Speed 5662.56 samples/sec   Loss 12.5767   LearningRate 0.0923   Epoch: 0   Global Step: 4490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:44:53,302-Speed 5671.84 samples/sec   Loss 12.7922   LearningRate 0.0922   Epoch: 0   Global Step: 4500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:44:55,117-Speed 5647.85 samples/sec   Loss 12.7052   LearningRate 0.0922   Epoch: 0   Global Step: 4510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:44:56,937-Speed 5627.01 samples/sec   Loss 12.6796   LearningRate 0.0922   Epoch: 0   Global Step: 4520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:44:58,796-Speed 5510.37 samples/sec   Loss 12.7016   LearningRate 0.0922   Epoch: 0   Global Step: 4530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:45:00,590-Speed 5713.18 samples/sec   Loss 12.6243   LearningRate 0.0922   Epoch: 0   Global Step: 4540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:45:02,394-Speed 5683.33 samples/sec   Loss 12.6991   LearningRate 0.0922   Epoch: 0   Global Step: 4550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:45:04,214-Speed 5630.91 samples/sec   Loss 12.5648   LearningRate 0.0921   Epoch: 0   Global Step: 4560   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-27 01:45:06,003-Speed 5726.63 samples/sec   Loss 12.4926   LearningRate 0.0921   Epoch: 0   Global Step: 4570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:45:07,820-Speed 5637.94 samples/sec   Loss 12.7147   LearningRate 0.0921   Epoch: 0   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:45:09,607-Speed 5731.73 samples/sec   Loss 12.6041   LearningRate 0.0921   Epoch: 0   Global Step: 4590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:11,395-Speed 5728.57 samples/sec   Loss 12.5444   LearningRate 0.0921   Epoch: 0   Global Step: 4600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:13,191-Speed 5705.52 samples/sec   Loss 12.7611   LearningRate 0.0921   Epoch: 0   Global Step: 4610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:14,996-Speed 5675.76 samples/sec   Loss 12.5239   LearningRate 0.0920   Epoch: 0   Global Step: 4620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:16,816-Speed 5628.70 samples/sec   Loss 12.4999   LearningRate 0.0920   Epoch: 0   Global Step: 4630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:18,604-Speed 5730.61 samples/sec   Loss 12.5114   LearningRate 0.0920   Epoch: 0   Global Step: 4640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:20,430-Speed 5608.82 samples/sec   Loss 12.5924   LearningRate 0.0920   Epoch: 0   Global Step: 4650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:22,234-Speed 5680.65 samples/sec   Loss 12.7173   LearningRate 0.0920   Epoch: 0   Global Step: 4660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:24,025-Speed 5719.19 samples/sec   Loss 12.6523   LearningRate 0.0920   Epoch: 0   Global Step: 4670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:25,821-Speed 5704.67 samples/sec   Loss 12.4798   LearningRate 0.0919   Epoch: 0   Global Step: 4680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:27,618-Speed 5700.31 samples/sec   Loss 12.5356   LearningRate 0.0919   Epoch: 0   Global Step: 4690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:29,441-Speed 5621.63 samples/sec   Loss 12.5079   LearningRate 0.0919   Epoch: 0   Global Step: 4700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:31,280-Speed 5572.27 samples/sec   Loss 12.5542   LearningRate 0.0919   Epoch: 0   Global Step: 4710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:33,093-Speed 5649.34 samples/sec   Loss 12.4672   LearningRate 0.0919   Epoch: 0   Global Step: 4720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:34,894-Speed 5687.34 samples/sec   Loss 12.5747   LearningRate 0.0919   Epoch: 0   Global Step: 4730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:36,709-Speed 5646.38 samples/sec   Loss 12.5108   LearningRate 0.0918   Epoch: 0   Global Step: 4740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:38,514-Speed 5672.58 samples/sec   Loss 12.4374   LearningRate 0.0918   Epoch: 0   Global Step: 4750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:40,356-Speed 5564.39 samples/sec   Loss 12.4691   LearningRate 0.0918   Epoch: 0   Global Step: 4760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:42,184-Speed 5602.71 samples/sec   Loss 12.3608   LearningRate 0.0918   Epoch: 0   Global Step: 4770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:43,982-Speed 5698.10 samples/sec   Loss 12.6885   LearningRate 0.0918   Epoch: 0   Global Step: 4780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:45:45,802-Speed 5629.99 samples/sec   Loss 12.3994   LearningRate 0.0918   Epoch: 0   Global Step: 4790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:45:47,612-Speed 5661.06 samples/sec   Loss 12.5682   LearningRate 0.0917   Epoch: 0   Global Step: 4800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:45:49,417-Speed 5673.84 samples/sec   Loss 12.3157   LearningRate 0.0917   Epoch: 0   Global Step: 4810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:45:51,224-Speed 5671.35 samples/sec   Loss 12.5211   LearningRate 0.0917   Epoch: 0   Global Step: 4820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:45:53,025-Speed 5686.19 samples/sec   Loss 12.4565   LearningRate 0.0917   Epoch: 0   Global Step: 4830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:45:54,823-Speed 5700.48 samples/sec   Loss 12.1828   LearningRate 0.0917   Epoch: 0   Global Step: 4840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:45:56,625-Speed 5683.59 samples/sec   Loss 12.4118   LearningRate 0.0917   Epoch: 0   Global Step: 4850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:45:58,457-Speed 5593.95 samples/sec   Loss 12.5142   LearningRate 0.0916   Epoch: 0   Global Step: 4860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:00,264-Speed 5669.75 samples/sec   Loss 12.3388   LearningRate 0.0916   Epoch: 0   Global Step: 4870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:02,083-Speed 5632.20 samples/sec   Loss 12.3123   LearningRate 0.0916   Epoch: 0   Global Step: 4880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:03,899-Speed 5642.17 samples/sec   Loss 12.4451   LearningRate 0.0916   Epoch: 0   Global Step: 4890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:05,721-Speed 5621.89 samples/sec   Loss 12.3494   LearningRate 0.0916   Epoch: 0   Global Step: 4900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:07,529-Speed 5664.86 samples/sec   Loss 12.6278   LearningRate 0.0916   Epoch: 0   Global Step: 4910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:09,329-Speed 5691.85 samples/sec   Loss 12.4653   LearningRate 0.0915   Epoch: 0   Global Step: 4920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:11,124-Speed 5707.87 samples/sec   Loss 12.2095   LearningRate 0.0915   Epoch: 0   Global Step: 4930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:12,932-Speed 5667.65 samples/sec   Loss 12.2476   LearningRate 0.0915   Epoch: 0   Global Step: 4940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:14,787-Speed 5521.33 samples/sec   Loss 12.4260   LearningRate 0.0915   Epoch: 0   Global Step: 4950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:16,591-Speed 5680.63 samples/sec   Loss 12.3417   LearningRate 0.0915   Epoch: 0   Global Step: 4960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:18,383-Speed 5714.36 samples/sec   Loss 12.3979   LearningRate 0.0915   Epoch: 0   Global Step: 4970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:20,176-Speed 5714.20 samples/sec   Loss 12.3522   LearningRate 0.0914   Epoch: 0   Global Step: 4980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:21,967-Speed 5721.50 samples/sec   Loss 12.3673   LearningRate 0.0914   Epoch: 0   Global Step: 4990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:23,760-Speed 5710.88 samples/sec   Loss 12.2655   LearningRate 0.0914   Epoch: 0   Global Step: 5000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:25,553-Speed 5715.73 samples/sec   Loss 12.1534   LearningRate 0.0914   Epoch: 0   Global Step: 5010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:27,346-Speed 5711.02 samples/sec   Loss 12.2785   LearningRate 0.0914   Epoch: 0   Global Step: 5020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:29,132-Speed 5737.79 samples/sec   Loss 12.2490   LearningRate 0.0913   Epoch: 0   Global Step: 5030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:30,922-Speed 5724.79 samples/sec   Loss 12.2661   LearningRate 0.0913   Epoch: 0   Global Step: 5040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:32,729-Speed 5668.91 samples/sec   Loss 12.3663   LearningRate 0.0913   Epoch: 0   Global Step: 5050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:34,514-Speed 5739.03 samples/sec   Loss 12.1940   LearningRate 0.0913   Epoch: 0   Global Step: 5060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:36,299-Speed 5738.11 samples/sec   Loss 12.1720   LearningRate 0.0913   Epoch: 0   Global Step: 5070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:38,101-Speed 5688.02 samples/sec   Loss 12.2789   LearningRate 0.0913   Epoch: 0   Global Step: 5080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:46:39,890-Speed 5725.31 samples/sec   Loss 12.0730   LearningRate 0.0912   Epoch: 0   Global Step: 5090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:46:41,691-Speed 5688.23 samples/sec   Loss 12.1125   LearningRate 0.0912   Epoch: 0   Global Step: 5100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:46:43,488-Speed 5702.38 samples/sec   Loss 12.1093   LearningRate 0.0912   Epoch: 0   Global Step: 5110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:46:45,297-Speed 5660.57 samples/sec   Loss 12.2699   LearningRate 0.0912   Epoch: 0   Global Step: 5120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:46:47,141-Speed 5558.09 samples/sec   Loss 12.1064   LearningRate 0.0912   Epoch: 0   Global Step: 5130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:46:48,926-Speed 5738.09 samples/sec   Loss 12.2239   LearningRate 0.0912   Epoch: 0   Global Step: 5140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:46:50,747-Speed 5627.01 samples/sec   Loss 12.1771   LearningRate 0.0911   Epoch: 0   Global Step: 5150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:46:52,562-Speed 5643.42 samples/sec   Loss 12.2539   LearningRate 0.0911   Epoch: 0   Global Step: 5160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:46:54,403-Speed 5567.21 samples/sec   Loss 12.2019   LearningRate 0.0911   Epoch: 0   Global Step: 5170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:46:56,223-Speed 5627.75 samples/sec   Loss 12.1189   LearningRate 0.0911   Epoch: 0   Global Step: 5180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:58,016-Speed 5713.48 samples/sec   Loss 12.1945   LearningRate 0.0911   Epoch: 0   Global Step: 5190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:46:59,820-Speed 5681.54 samples/sec   Loss 12.2827   LearningRate 0.0911   Epoch: 0   Global Step: 5200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:47:01,638-Speed 5635.89 samples/sec   Loss 12.2747   LearningRate 0.0910   Epoch: 0   Global Step: 5210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:47:03,432-Speed 5708.47 samples/sec   Loss 12.1566   LearningRate 0.0910   Epoch: 0   Global Step: 5220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:47:05,222-Speed 5722.42 samples/sec   Loss 12.1403   LearningRate 0.0910   Epoch: 0   Global Step: 5230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:47:07,032-Speed 5661.71 samples/sec   Loss 12.1642   LearningRate 0.0910   Epoch: 0   Global Step: 5240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:47:08,823-Speed 5719.76 samples/sec   Loss 11.9315   LearningRate 0.0910   Epoch: 0   Global Step: 5250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:10,624-Speed 5690.74 samples/sec   Loss 11.8864   LearningRate 0.0910   Epoch: 0   Global Step: 5260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:12,473-Speed 5538.80 samples/sec   Loss 12.2182   LearningRate 0.0909   Epoch: 0   Global Step: 5270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:14,299-Speed 5612.37 samples/sec   Loss 12.0049   LearningRate 0.0909   Epoch: 0   Global Step: 5280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:16,107-Speed 5666.08 samples/sec   Loss 12.2923   LearningRate 0.0909   Epoch: 0   Global Step: 5290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:17,937-Speed 5598.00 samples/sec   Loss 12.1841   LearningRate 0.0909   Epoch: 0   Global Step: 5300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:19,746-Speed 5664.56 samples/sec   Loss 12.1385   LearningRate 0.0909   Epoch: 0   Global Step: 5310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:21,532-Speed 5735.52 samples/sec   Loss 12.2674   LearningRate 0.0909   Epoch: 0   Global Step: 5320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:47:23,321-Speed 5727.00 samples/sec   Loss 12.1192   LearningRate 0.0908   Epoch: 0   Global Step: 5330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:47:25,135-Speed 5647.00 samples/sec   Loss 12.2730   LearningRate 0.0908   Epoch: 0   Global Step: 5340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:47:26,945-Speed 5660.02 samples/sec   Loss 12.1155   LearningRate 0.0908   Epoch: 0   Global Step: 5350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:47:28,735-Speed 5725.74 samples/sec   Loss 12.1033   LearningRate 0.0908   Epoch: 0   Global Step: 5360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:47:30,545-Speed 5660.89 samples/sec   Loss 12.0922   LearningRate 0.0908   Epoch: 0   Global Step: 5370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:47:32,328-Speed 5744.10 samples/sec   Loss 12.0104   LearningRate 0.0908   Epoch: 0   Global Step: 5380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:47:34,123-Speed 5706.01 samples/sec   Loss 11.9554   LearningRate 0.0907   Epoch: 0   Global Step: 5390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:47:35,948-Speed 5615.88 samples/sec   Loss 12.0398   LearningRate 0.0907   Epoch: 0   Global Step: 5400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:47:37,755-Speed 5668.66 samples/sec   Loss 11.9698   LearningRate 0.0907   Epoch: 0   Global Step: 5410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:47:39,572-Speed 5637.60 samples/sec   Loss 12.0964   LearningRate 0.0907   Epoch: 0   Global Step: 5420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:41,375-Speed 5684.51 samples/sec   Loss 11.9936   LearningRate 0.0907   Epoch: 0   Global Step: 5430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:43,174-Speed 5694.28 samples/sec   Loss 12.0363   LearningRate 0.0907   Epoch: 0   Global Step: 5440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:44,995-Speed 5625.01 samples/sec   Loss 12.0406   LearningRate 0.0906   Epoch: 0   Global Step: 5450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:46,800-Speed 5674.26 samples/sec   Loss 12.0182   LearningRate 0.0906   Epoch: 0   Global Step: 5460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:48,603-Speed 5685.05 samples/sec   Loss 11.9138   LearningRate 0.0906   Epoch: 0   Global Step: 5470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:50,412-Speed 5661.96 samples/sec   Loss 11.9201   LearningRate 0.0906   Epoch: 0   Global Step: 5480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:52,226-Speed 5648.01 samples/sec   Loss 12.0156   LearningRate 0.0906   Epoch: 0   Global Step: 5490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:54,068-Speed 5560.56 samples/sec   Loss 12.0148   LearningRate 0.0906   Epoch: 0   Global Step: 5500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:55,871-Speed 5683.07 samples/sec   Loss 11.9307   LearningRate 0.0905   Epoch: 0   Global Step: 5510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:47:57,716-Speed 5552.68 samples/sec   Loss 11.8723   LearningRate 0.0905   Epoch: 0   Global Step: 5520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:47:59,538-Speed 5624.77 samples/sec   Loss 12.0674   LearningRate 0.0905   Epoch: 0   Global Step: 5530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:48:01,332-Speed 5710.18 samples/sec   Loss 11.9653   LearningRate 0.0905   Epoch: 0   Global Step: 5540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:48:03,145-Speed 5652.94 samples/sec   Loss 12.0083   LearningRate 0.0905   Epoch: 0   Global Step: 5550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:48:04,941-Speed 5701.97 samples/sec   Loss 11.8580   LearningRate 0.0905   Epoch: 0   Global Step: 5560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:48:06,792-Speed 5537.03 samples/sec   Loss 11.9321   LearningRate 0.0904   Epoch: 0   Global Step: 5570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:48:08,645-Speed 5526.59 samples/sec   Loss 11.8531   LearningRate 0.0904   Epoch: 0   Global Step: 5580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:48:10,487-Speed 5561.81 samples/sec   Loss 11.7932   LearningRate 0.0904   Epoch: 0   Global Step: 5590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:48:12,294-Speed 5670.26 samples/sec   Loss 11.9241   LearningRate 0.0904   Epoch: 0   Global Step: 5600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:48:14,110-Speed 5641.69 samples/sec   Loss 11.9544   LearningRate 0.0904   Epoch: 0   Global Step: 5610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:48:15,903-Speed 5714.44 samples/sec   Loss 11.7989   LearningRate 0.0904   Epoch: 0   Global Step: 5620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:48:17,762-Speed 5510.07 samples/sec   Loss 11.7006   LearningRate 0.0903   Epoch: 0   Global Step: 5630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:48:19,568-Speed 5671.63 samples/sec   Loss 11.6925   LearningRate 0.0903   Epoch: 0   Global Step: 5640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:48:21,372-Speed 5679.14 samples/sec   Loss 11.9825   LearningRate 0.0903   Epoch: 0   Global Step: 5650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:48:23,175-Speed 5681.21 samples/sec   Loss 11.7854   LearningRate 0.0903   Epoch: 0   Global Step: 5660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:48:24,984-Speed 5662.55 samples/sec   Loss 11.9105   LearningRate 0.0903   Epoch: 0   Global Step: 5670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:48:26,864-Speed 5450.73 samples/sec   Loss 11.9494   LearningRate 0.0903   Epoch: 0   Global Step: 5680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:48:38,650-Speed 868.88 samples/sec   Loss 11.3573   LearningRate 0.0902   Epoch: 1   Global Step: 5690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:48:40,471-Speed 5628.13 samples/sec   Loss 11.1112   LearningRate 0.0902   Epoch: 1   Global Step: 5700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:48:42,278-Speed 5669.84 samples/sec   Loss 11.0967   LearningRate 0.0902   Epoch: 1   Global Step: 5710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:48:44,229-Speed 5250.25 samples/sec   Loss 11.0840   LearningRate 0.0902   Epoch: 1   Global Step: 5720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:48:46,065-Speed 5580.16 samples/sec   Loss 10.9549   LearningRate 0.0902   Epoch: 1   Global Step: 5730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:48:47,907-Speed 5564.57 samples/sec   Loss 11.1733   LearningRate 0.0902   Epoch: 1   Global Step: 5740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:48:49,697-Speed 5721.58 samples/sec   Loss 11.1006   LearningRate 0.0901   Epoch: 1   Global Step: 5750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:48:51,514-Speed 5639.88 samples/sec   Loss 11.0302   LearningRate 0.0901   Epoch: 1   Global Step: 5760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:48:53,340-Speed 5611.90 samples/sec   Loss 11.0678   LearningRate 0.0901   Epoch: 1   Global Step: 5770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:48:55,143-Speed 5679.78 samples/sec   Loss 11.1510   LearningRate 0.0901   Epoch: 1   Global Step: 5780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:48:56,990-Speed 5548.34 samples/sec   Loss 11.1475   LearningRate 0.0901   Epoch: 1   Global Step: 5790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:48:58,817-Speed 5606.06 samples/sec   Loss 11.2589   LearningRate 0.0901   Epoch: 1   Global Step: 5800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 01:49:00,609-Speed 5717.82 samples/sec   Loss 11.1063   LearningRate 0.0900   Epoch: 1   Global Step: 5810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:49:02,448-Speed 5568.82 samples/sec   Loss 11.1103   LearningRate 0.0900   Epoch: 1   Global Step: 5820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:49:04,297-Speed 5542.76 samples/sec   Loss 11.2272   LearningRate 0.0900   Epoch: 1   Global Step: 5830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:49:06,143-Speed 5549.02 samples/sec   Loss 11.2709   LearningRate 0.0900   Epoch: 1   Global Step: 5840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:49:07,968-Speed 5611.91 samples/sec   Loss 11.2278   LearningRate 0.0900   Epoch: 1   Global Step: 5850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:49:09,813-Speed 5552.06 samples/sec   Loss 11.0519   LearningRate 0.0900   Epoch: 1   Global Step: 5860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:49:11,635-Speed 5623.34 samples/sec   Loss 11.2374   LearningRate 0.0899   Epoch: 1   Global Step: 5870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:49:13,443-Speed 5666.30 samples/sec   Loss 11.2196   LearningRate 0.0899   Epoch: 1   Global Step: 5880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:49:15,302-Speed 5508.52 samples/sec   Loss 11.2742   LearningRate 0.0899   Epoch: 1   Global Step: 5890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:49:17,165-Speed 5500.08 samples/sec   Loss 11.3558   LearningRate 0.0899   Epoch: 1   Global Step: 5900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:49:18,982-Speed 5636.96 samples/sec   Loss 11.2752   LearningRate 0.0899   Epoch: 1   Global Step: 5910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:49:20,805-Speed 5620.66 samples/sec   Loss 11.2280   LearningRate 0.0899   Epoch: 1   Global Step: 5920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:49:22,651-Speed 5550.87 samples/sec   Loss 11.3553   LearningRate 0.0898   Epoch: 1   Global Step: 5930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:49:24,516-Speed 5491.88 samples/sec   Loss 11.2856   LearningRate 0.0898   Epoch: 1   Global Step: 5940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:49:26,303-Speed 5734.21 samples/sec   Loss 11.5056   LearningRate 0.0898   Epoch: 1   Global Step: 5950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:49:28,124-Speed 5625.98 samples/sec   Loss 11.3685   LearningRate 0.0898   Epoch: 1   Global Step: 5960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:49:29,935-Speed 5657.06 samples/sec   Loss 11.2588   LearningRate 0.0898   Epoch: 1   Global Step: 5970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:49:31,720-Speed 5737.30 samples/sec   Loss 11.3085   LearningRate 0.0898   Epoch: 1   Global Step: 5980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:49:33,568-Speed 5545.00 samples/sec   Loss 11.2043   LearningRate 0.0897   Epoch: 1   Global Step: 5990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:49:35,407-Speed 5568.78 samples/sec   Loss 11.4039   LearningRate 0.0897   Epoch: 1   Global Step: 6000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:50:01,839-[lfw][6000]XNorm: 22.373225
Training: 2022-04-27 01:50:01,840-[lfw][6000]Accuracy-Flip: 0.99283+-0.00334
Training: 2022-04-27 01:50:01,840-[lfw][6000]Accuracy-Highest: 0.99283
Training: 2022-04-27 01:50:32,719-[cfp_fp][6000]XNorm: 18.730500
Training: 2022-04-27 01:50:32,720-[cfp_fp][6000]Accuracy-Flip: 0.88843+-0.01585
Training: 2022-04-27 01:50:32,720-[cfp_fp][6000]Accuracy-Highest: 0.88843
Training: 2022-04-27 01:50:59,279-[agedb_30][6000]XNorm: 21.276906
Training: 2022-04-27 01:50:59,280-[agedb_30][6000]Accuracy-Flip: 0.94550+-0.01085
Training: 2022-04-27 01:50:59,281-[agedb_30][6000]Accuracy-Highest: 0.94550
Training: 2022-04-27 01:51:01,112-Speed 119.48 samples/sec   Loss 11.0510   LearningRate 0.0897   Epoch: 1   Global Step: 6010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:02,903-Speed 5721.40 samples/sec   Loss 11.2775   LearningRate 0.0897   Epoch: 1   Global Step: 6020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:04,697-Speed 5708.85 samples/sec   Loss 11.3413   LearningRate 0.0897   Epoch: 1   Global Step: 6030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:06,480-Speed 5744.04 samples/sec   Loss 11.4747   LearningRate 0.0897   Epoch: 1   Global Step: 6040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:08,270-Speed 5723.67 samples/sec   Loss 11.2591   LearningRate 0.0896   Epoch: 1   Global Step: 6050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:10,105-Speed 5584.16 samples/sec   Loss 11.2510   LearningRate 0.0896   Epoch: 1   Global Step: 6060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:11,928-Speed 5618.88 samples/sec   Loss 11.3043   LearningRate 0.0896   Epoch: 1   Global Step: 6070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:13,730-Speed 5684.48 samples/sec   Loss 11.2923   LearningRate 0.0896   Epoch: 1   Global Step: 6080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:15,538-Speed 5672.24 samples/sec   Loss 11.3325   LearningRate 0.0896   Epoch: 1   Global Step: 6090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:17,339-Speed 5687.26 samples/sec   Loss 11.2201   LearningRate 0.0896   Epoch: 1   Global Step: 6100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:19,135-Speed 5706.38 samples/sec   Loss 11.2270   LearningRate 0.0895   Epoch: 1   Global Step: 6110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:20,937-Speed 5683.95 samples/sec   Loss 11.3511   LearningRate 0.0895   Epoch: 1   Global Step: 6120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:22,759-Speed 5621.14 samples/sec   Loss 11.3905   LearningRate 0.0895   Epoch: 1   Global Step: 6130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:24,570-Speed 5657.53 samples/sec   Loss 11.4273   LearningRate 0.0895   Epoch: 1   Global Step: 6140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:26,355-Speed 5739.66 samples/sec   Loss 11.4074   LearningRate 0.0895   Epoch: 1   Global Step: 6150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:28,180-Speed 5614.61 samples/sec   Loss 11.3813   LearningRate 0.0895   Epoch: 1   Global Step: 6160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:29,981-Speed 5685.86 samples/sec   Loss 11.3995   LearningRate 0.0894   Epoch: 1   Global Step: 6170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:31,770-Speed 5726.74 samples/sec   Loss 11.3266   LearningRate 0.0894   Epoch: 1   Global Step: 6180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:33,549-Speed 5761.06 samples/sec   Loss 11.2425   LearningRate 0.0894   Epoch: 1   Global Step: 6190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:35,360-Speed 5656.47 samples/sec   Loss 11.2715   LearningRate 0.0894   Epoch: 1   Global Step: 6200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:37,168-Speed 5665.73 samples/sec   Loss 11.2042   LearningRate 0.0894   Epoch: 1   Global Step: 6210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:38,989-Speed 5627.14 samples/sec   Loss 11.2692   LearningRate 0.0894   Epoch: 1   Global Step: 6220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:40,789-Speed 5691.71 samples/sec   Loss 11.4074   LearningRate 0.0893   Epoch: 1   Global Step: 6230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:42,599-Speed 5657.78 samples/sec   Loss 11.1975   LearningRate 0.0893   Epoch: 1   Global Step: 6240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:44,401-Speed 5686.85 samples/sec   Loss 11.3409   LearningRate 0.0893   Epoch: 1   Global Step: 6250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:46,186-Speed 5737.04 samples/sec   Loss 11.1876   LearningRate 0.0893   Epoch: 1   Global Step: 6260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:47,998-Speed 5654.17 samples/sec   Loss 11.4219   LearningRate 0.0893   Epoch: 1   Global Step: 6270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:49,810-Speed 5654.61 samples/sec   Loss 11.2799   LearningRate 0.0893   Epoch: 1   Global Step: 6280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:51:51,628-Speed 5634.46 samples/sec   Loss 11.1975   LearningRate 0.0892   Epoch: 1   Global Step: 6290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:53,436-Speed 5666.39 samples/sec   Loss 11.2433   LearningRate 0.0892   Epoch: 1   Global Step: 6300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:55,265-Speed 5598.50 samples/sec   Loss 11.2067   LearningRate 0.0892   Epoch: 1   Global Step: 6310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:57,082-Speed 5641.44 samples/sec   Loss 11.3396   LearningRate 0.0892   Epoch: 1   Global Step: 6320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:51:58,897-Speed 5642.95 samples/sec   Loss 11.2254   LearningRate 0.0892   Epoch: 1   Global Step: 6330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:00,749-Speed 5532.07 samples/sec   Loss 11.2959   LearningRate 0.0892   Epoch: 1   Global Step: 6340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:02,588-Speed 5568.11 samples/sec   Loss 11.2941   LearningRate 0.0891   Epoch: 1   Global Step: 6350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:04,379-Speed 5720.28 samples/sec   Loss 11.3416   LearningRate 0.0891   Epoch: 1   Global Step: 6360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:06,172-Speed 5714.77 samples/sec   Loss 11.2192   LearningRate 0.0891   Epoch: 1   Global Step: 6370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:07,983-Speed 5659.68 samples/sec   Loss 11.1959   LearningRate 0.0891   Epoch: 1   Global Step: 6380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:09,818-Speed 5582.61 samples/sec   Loss 11.2934   LearningRate 0.0891   Epoch: 1   Global Step: 6390   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-27 01:52:11,668-Speed 5538.06 samples/sec   Loss 11.1807   LearningRate 0.0891   Epoch: 1   Global Step: 6400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:13,483-Speed 5643.43 samples/sec   Loss 11.3378   LearningRate 0.0890   Epoch: 1   Global Step: 6410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:15,289-Speed 5675.63 samples/sec   Loss 11.2219   LearningRate 0.0890   Epoch: 1   Global Step: 6420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:17,094-Speed 5672.93 samples/sec   Loss 11.1728   LearningRate 0.0890   Epoch: 1   Global Step: 6430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:18,888-Speed 5711.59 samples/sec   Loss 11.2328   LearningRate 0.0890   Epoch: 1   Global Step: 6440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:20,703-Speed 5643.39 samples/sec   Loss 11.5090   LearningRate 0.0890   Epoch: 1   Global Step: 6450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:22,529-Speed 5611.60 samples/sec   Loss 11.3111   LearningRate 0.0890   Epoch: 1   Global Step: 6460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:24,411-Speed 5442.19 samples/sec   Loss 11.2294   LearningRate 0.0889   Epoch: 1   Global Step: 6470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:26,256-Speed 5554.28 samples/sec   Loss 11.2745   LearningRate 0.0889   Epoch: 1   Global Step: 6480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:28,068-Speed 5654.13 samples/sec   Loss 11.2603   LearningRate 0.0889   Epoch: 1   Global Step: 6490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:29,871-Speed 5680.58 samples/sec   Loss 11.3234   LearningRate 0.0889   Epoch: 1   Global Step: 6500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:31,668-Speed 5700.64 samples/sec   Loss 11.1353   LearningRate 0.0889   Epoch: 1   Global Step: 6510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:33,468-Speed 5693.41 samples/sec   Loss 11.2485   LearningRate 0.0889   Epoch: 1   Global Step: 6520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:35,286-Speed 5635.48 samples/sec   Loss 11.1149   LearningRate 0.0888   Epoch: 1   Global Step: 6530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:37,088-Speed 5683.44 samples/sec   Loss 11.2530   LearningRate 0.0888   Epoch: 1   Global Step: 6540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:38,902-Speed 5648.65 samples/sec   Loss 11.2261   LearningRate 0.0888   Epoch: 1   Global Step: 6550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:40,689-Speed 5734.17 samples/sec   Loss 11.3109   LearningRate 0.0888   Epoch: 1   Global Step: 6560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:42,502-Speed 5650.70 samples/sec   Loss 11.2756   LearningRate 0.0888   Epoch: 1   Global Step: 6570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:44,314-Speed 5653.08 samples/sec   Loss 11.3217   LearningRate 0.0888   Epoch: 1   Global Step: 6580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:46,122-Speed 5667.02 samples/sec   Loss 11.3796   LearningRate 0.0887   Epoch: 1   Global Step: 6590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:47,957-Speed 5582.76 samples/sec   Loss 11.4235   LearningRate 0.0887   Epoch: 1   Global Step: 6600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:49,783-Speed 5609.38 samples/sec   Loss 11.2762   LearningRate 0.0887   Epoch: 1   Global Step: 6610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:51,650-Speed 5487.13 samples/sec   Loss 11.0992   LearningRate 0.0887   Epoch: 1   Global Step: 6620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:53,502-Speed 5531.42 samples/sec   Loss 11.3379   LearningRate 0.0887   Epoch: 1   Global Step: 6630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:55,285-Speed 5745.29 samples/sec   Loss 10.9922   LearningRate 0.0887   Epoch: 1   Global Step: 6640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:57,083-Speed 5698.88 samples/sec   Loss 11.2218   LearningRate 0.0886   Epoch: 1   Global Step: 6650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:52:58,920-Speed 5575.93 samples/sec   Loss 11.2464   LearningRate 0.0886   Epoch: 1   Global Step: 6660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:53:00,700-Speed 5754.96 samples/sec   Loss 11.1181   LearningRate 0.0886   Epoch: 1   Global Step: 6670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:53:02,493-Speed 5716.02 samples/sec   Loss 11.2687   LearningRate 0.0886   Epoch: 1   Global Step: 6680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:53:04,279-Speed 5736.06 samples/sec   Loss 11.1788   LearningRate 0.0886   Epoch: 1   Global Step: 6690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:53:06,056-Speed 5762.88 samples/sec   Loss 11.3461   LearningRate 0.0886   Epoch: 1   Global Step: 6700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:53:07,845-Speed 5727.41 samples/sec   Loss 11.1914   LearningRate 0.0885   Epoch: 1   Global Step: 6710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:53:09,616-Speed 5784.44 samples/sec   Loss 11.0404   LearningRate 0.0885   Epoch: 1   Global Step: 6720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:11,417-Speed 5685.84 samples/sec   Loss 11.1649   LearningRate 0.0885   Epoch: 1   Global Step: 6730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:13,200-Speed 5747.24 samples/sec   Loss 11.0962   LearningRate 0.0885   Epoch: 1   Global Step: 6740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:15,024-Speed 5617.70 samples/sec   Loss 10.9999   LearningRate 0.0885   Epoch: 1   Global Step: 6750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:16,817-Speed 5712.29 samples/sec   Loss 11.3177   LearningRate 0.0885   Epoch: 1   Global Step: 6760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:18,608-Speed 5720.47 samples/sec   Loss 11.1720   LearningRate 0.0884   Epoch: 1   Global Step: 6770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:20,390-Speed 5749.18 samples/sec   Loss 11.1120   LearningRate 0.0884   Epoch: 1   Global Step: 6780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:22,184-Speed 5709.58 samples/sec   Loss 11.0454   LearningRate 0.0884   Epoch: 1   Global Step: 6790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:23,985-Speed 5688.51 samples/sec   Loss 11.1783   LearningRate 0.0884   Epoch: 1   Global Step: 6800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:25,766-Speed 5754.15 samples/sec   Loss 11.0639   LearningRate 0.0884   Epoch: 1   Global Step: 6810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:27,562-Speed 5702.45 samples/sec   Loss 11.1266   LearningRate 0.0884   Epoch: 1   Global Step: 6820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:53:29,369-Speed 5669.88 samples/sec   Loss 11.0998   LearningRate 0.0883   Epoch: 1   Global Step: 6830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:53:31,158-Speed 5727.03 samples/sec   Loss 11.1105   LearningRate 0.0883   Epoch: 1   Global Step: 6840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:53:32,952-Speed 5712.56 samples/sec   Loss 11.2177   LearningRate 0.0883   Epoch: 1   Global Step: 6850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:53:34,761-Speed 5660.72 samples/sec   Loss 11.1769   LearningRate 0.0883   Epoch: 1   Global Step: 6860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:53:36,573-Speed 5654.23 samples/sec   Loss 11.1269   LearningRate 0.0883   Epoch: 1   Global Step: 6870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:53:38,359-Speed 5738.37 samples/sec   Loss 11.2548   LearningRate 0.0883   Epoch: 1   Global Step: 6880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:53:40,166-Speed 5667.57 samples/sec   Loss 11.0303   LearningRate 0.0882   Epoch: 1   Global Step: 6890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:53:41,939-Speed 5777.81 samples/sec   Loss 11.0603   LearningRate 0.0882   Epoch: 1   Global Step: 6900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:43,746-Speed 5668.80 samples/sec   Loss 10.9575   LearningRate 0.0882   Epoch: 1   Global Step: 6910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:45,562-Speed 5642.66 samples/sec   Loss 11.2014   LearningRate 0.0882   Epoch: 1   Global Step: 6920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:47,365-Speed 5681.22 samples/sec   Loss 11.1431   LearningRate 0.0882   Epoch: 1   Global Step: 6930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:49,168-Speed 5683.38 samples/sec   Loss 10.9604   LearningRate 0.0882   Epoch: 1   Global Step: 6940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:50,982-Speed 5646.13 samples/sec   Loss 11.1932   LearningRate 0.0882   Epoch: 1   Global Step: 6950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:52,768-Speed 5736.04 samples/sec   Loss 11.0837   LearningRate 0.0881   Epoch: 1   Global Step: 6960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:54,558-Speed 5722.43 samples/sec   Loss 11.1581   LearningRate 0.0881   Epoch: 1   Global Step: 6970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:56,393-Speed 5583.35 samples/sec   Loss 11.0668   LearningRate 0.0881   Epoch: 1   Global Step: 6980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:53:58,190-Speed 5701.18 samples/sec   Loss 11.0263   LearningRate 0.0881   Epoch: 1   Global Step: 6990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:54:00,030-Speed 5569.39 samples/sec   Loss 10.9930   LearningRate 0.0881   Epoch: 1   Global Step: 7000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:54:01,826-Speed 5703.11 samples/sec   Loss 11.4268   LearningRate 0.0881   Epoch: 1   Global Step: 7010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:54:03,611-Speed 5740.19 samples/sec   Loss 11.1477   LearningRate 0.0880   Epoch: 1   Global Step: 7020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:54:05,399-Speed 5727.53 samples/sec   Loss 10.9051   LearningRate 0.0880   Epoch: 1   Global Step: 7030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:54:07,211-Speed 5656.38 samples/sec   Loss 11.1771   LearningRate 0.0880   Epoch: 1   Global Step: 7040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:54:09,007-Speed 5705.14 samples/sec   Loss 11.0153   LearningRate 0.0880   Epoch: 1   Global Step: 7050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:54:10,787-Speed 5754.43 samples/sec   Loss 11.1046   LearningRate 0.0880   Epoch: 1   Global Step: 7060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:54:12,571-Speed 5741.98 samples/sec   Loss 11.1827   LearningRate 0.0880   Epoch: 1   Global Step: 7070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:54:14,374-Speed 5682.24 samples/sec   Loss 11.1415   LearningRate 0.0879   Epoch: 1   Global Step: 7080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:54:16,158-Speed 5742.24 samples/sec   Loss 11.1708   LearningRate 0.0879   Epoch: 1   Global Step: 7090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:54:17,958-Speed 5692.25 samples/sec   Loss 10.9061   LearningRate 0.0879   Epoch: 1   Global Step: 7100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:54:19,781-Speed 5620.42 samples/sec   Loss 11.0422   LearningRate 0.0879   Epoch: 1   Global Step: 7110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:54:21,570-Speed 5727.21 samples/sec   Loss 11.0731   LearningRate 0.0879   Epoch: 1   Global Step: 7120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:54:23,358-Speed 5728.58 samples/sec   Loss 10.9406   LearningRate 0.0879   Epoch: 1   Global Step: 7130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:54:25,158-Speed 5691.45 samples/sec   Loss 11.0847   LearningRate 0.0878   Epoch: 1   Global Step: 7140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:54:26,982-Speed 5617.10 samples/sec   Loss 11.0844   LearningRate 0.0878   Epoch: 1   Global Step: 7150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:54:28,791-Speed 5660.40 samples/sec   Loss 10.9083   LearningRate 0.0878   Epoch: 1   Global Step: 7160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:54:30,600-Speed 5665.63 samples/sec   Loss 11.1871   LearningRate 0.0878   Epoch: 1   Global Step: 7170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:54:32,409-Speed 5661.26 samples/sec   Loss 11.0300   LearningRate 0.0878   Epoch: 1   Global Step: 7180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:54:34,203-Speed 5712.15 samples/sec   Loss 11.0834   LearningRate 0.0878   Epoch: 1   Global Step: 7190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:54:35,995-Speed 5716.40 samples/sec   Loss 11.1207   LearningRate 0.0877   Epoch: 1   Global Step: 7200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:54:37,794-Speed 5695.22 samples/sec   Loss 11.0620   LearningRate 0.0877   Epoch: 1   Global Step: 7210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:54:39,589-Speed 5706.71 samples/sec   Loss 10.7903   LearningRate 0.0877   Epoch: 1   Global Step: 7220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:54:41,375-Speed 5735.56 samples/sec   Loss 11.0704   LearningRate 0.0877   Epoch: 1   Global Step: 7230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:54:43,158-Speed 5745.39 samples/sec   Loss 10.9223   LearningRate 0.0877   Epoch: 1   Global Step: 7240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:54:44,949-Speed 5721.88 samples/sec   Loss 10.9728   LearningRate 0.0877   Epoch: 1   Global Step: 7250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:54:46,745-Speed 5703.59 samples/sec   Loss 10.8449   LearningRate 0.0876   Epoch: 1   Global Step: 7260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:54:48,541-Speed 5703.07 samples/sec   Loss 10.9120   LearningRate 0.0876   Epoch: 1   Global Step: 7270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:54:50,336-Speed 5708.21 samples/sec   Loss 10.9476   LearningRate 0.0876   Epoch: 1   Global Step: 7280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:54:52,133-Speed 5700.02 samples/sec   Loss 10.7995   LearningRate 0.0876   Epoch: 1   Global Step: 7290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:54:53,951-Speed 5635.34 samples/sec   Loss 10.8023   LearningRate 0.0876   Epoch: 1   Global Step: 7300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:54:55,752-Speed 5689.30 samples/sec   Loss 10.8641   LearningRate 0.0876   Epoch: 1   Global Step: 7310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:54:57,555-Speed 5682.63 samples/sec   Loss 10.9109   LearningRate 0.0875   Epoch: 1   Global Step: 7320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:54:59,366-Speed 5657.05 samples/sec   Loss 10.8680   LearningRate 0.0875   Epoch: 1   Global Step: 7330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:55:01,151-Speed 5737.07 samples/sec   Loss 10.7438   LearningRate 0.0875   Epoch: 1   Global Step: 7340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:55:02,940-Speed 5725.04 samples/sec   Loss 10.9618   LearningRate 0.0875   Epoch: 1   Global Step: 7350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:55:04,736-Speed 5707.05 samples/sec   Loss 10.8771   LearningRate 0.0875   Epoch: 1   Global Step: 7360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:55:06,536-Speed 5690.43 samples/sec   Loss 10.8339   LearningRate 0.0875   Epoch: 1   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:08,374-Speed 5575.25 samples/sec   Loss 10.7793   LearningRate 0.0874   Epoch: 1   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:10,199-Speed 5612.60 samples/sec   Loss 11.0669   LearningRate 0.0874   Epoch: 1   Global Step: 7390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:12,015-Speed 5639.77 samples/sec   Loss 10.7874   LearningRate 0.0874   Epoch: 1   Global Step: 7400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:13,821-Speed 5674.65 samples/sec   Loss 10.6885   LearningRate 0.0874   Epoch: 1   Global Step: 7410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:15,614-Speed 5712.77 samples/sec   Loss 10.9283   LearningRate 0.0874   Epoch: 1   Global Step: 7420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:17,452-Speed 5575.00 samples/sec   Loss 10.9000   LearningRate 0.0874   Epoch: 1   Global Step: 7430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:19,261-Speed 5661.69 samples/sec   Loss 10.7843   LearningRate 0.0873   Epoch: 1   Global Step: 7440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:21,063-Speed 5685.94 samples/sec   Loss 10.8089   LearningRate 0.0873   Epoch: 1   Global Step: 7450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:22,860-Speed 5700.75 samples/sec   Loss 10.7141   LearningRate 0.0873   Epoch: 1   Global Step: 7460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:24,658-Speed 5697.78 samples/sec   Loss 10.7804   LearningRate 0.0873   Epoch: 1   Global Step: 7470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:26,442-Speed 5741.61 samples/sec   Loss 10.9757   LearningRate 0.0873   Epoch: 1   Global Step: 7480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:28,227-Speed 5741.66 samples/sec   Loss 10.8494   LearningRate 0.0873   Epoch: 1   Global Step: 7490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:30,022-Speed 5706.96 samples/sec   Loss 10.8837   LearningRate 0.0872   Epoch: 1   Global Step: 7500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:31,824-Speed 5685.85 samples/sec   Loss 10.4707   LearningRate 0.0872   Epoch: 1   Global Step: 7510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:33,616-Speed 5714.91 samples/sec   Loss 10.7889   LearningRate 0.0872   Epoch: 1   Global Step: 7520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:35,404-Speed 5730.44 samples/sec   Loss 10.6933   LearningRate 0.0872   Epoch: 1   Global Step: 7530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:37,191-Speed 5733.98 samples/sec   Loss 10.8974   LearningRate 0.0872   Epoch: 1   Global Step: 7540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:39,043-Speed 5531.22 samples/sec   Loss 10.7362   LearningRate 0.0872   Epoch: 1   Global Step: 7550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:40,833-Speed 5723.38 samples/sec   Loss 10.7996   LearningRate 0.0871   Epoch: 1   Global Step: 7560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:55:42,636-Speed 5682.33 samples/sec   Loss 10.8546   LearningRate 0.0871   Epoch: 1   Global Step: 7570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:55:44,425-Speed 5726.43 samples/sec   Loss 10.9208   LearningRate 0.0871   Epoch: 1   Global Step: 7580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:55:46,209-Speed 5743.13 samples/sec   Loss 10.8701   LearningRate 0.0871   Epoch: 1   Global Step: 7590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:55:48,022-Speed 5650.38 samples/sec   Loss 10.8728   LearningRate 0.0871   Epoch: 1   Global Step: 7600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:55:49,841-Speed 5629.56 samples/sec   Loss 10.8481   LearningRate 0.0871   Epoch: 1   Global Step: 7610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:55:51,665-Speed 5618.80 samples/sec   Loss 10.9321   LearningRate 0.0870   Epoch: 1   Global Step: 7620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:55:53,450-Speed 5739.15 samples/sec   Loss 10.8250   LearningRate 0.0870   Epoch: 1   Global Step: 7630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:55:55,233-Speed 5743.48 samples/sec   Loss 10.7097   LearningRate 0.0870   Epoch: 1   Global Step: 7640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:55:57,023-Speed 5724.91 samples/sec   Loss 10.7733   LearningRate 0.0870   Epoch: 1   Global Step: 7650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:55:58,822-Speed 5693.63 samples/sec   Loss 10.7816   LearningRate 0.0870   Epoch: 1   Global Step: 7660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:56:00,605-Speed 5746.76 samples/sec   Loss 10.6494   LearningRate 0.0870   Epoch: 1   Global Step: 7670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:02,429-Speed 5614.34 samples/sec   Loss 10.8110   LearningRate 0.0869   Epoch: 1   Global Step: 7680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:04,226-Speed 5700.84 samples/sec   Loss 10.6867   LearningRate 0.0869   Epoch: 1   Global Step: 7690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:06,023-Speed 5701.91 samples/sec   Loss 10.9096   LearningRate 0.0869   Epoch: 1   Global Step: 7700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:07,818-Speed 5706.54 samples/sec   Loss 10.8564   LearningRate 0.0869   Epoch: 1   Global Step: 7710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:09,614-Speed 5706.07 samples/sec   Loss 10.8524   LearningRate 0.0869   Epoch: 1   Global Step: 7720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:11,395-Speed 5751.41 samples/sec   Loss 10.7329   LearningRate 0.0869   Epoch: 1   Global Step: 7730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:13,202-Speed 5668.33 samples/sec   Loss 10.7322   LearningRate 0.0869   Epoch: 1   Global Step: 7740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:15,007-Speed 5676.17 samples/sec   Loss 10.6968   LearningRate 0.0868   Epoch: 1   Global Step: 7750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:16,812-Speed 5676.61 samples/sec   Loss 10.9076   LearningRate 0.0868   Epoch: 1   Global Step: 7760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:18,612-Speed 5690.01 samples/sec   Loss 10.7930   LearningRate 0.0868   Epoch: 1   Global Step: 7770   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 01:56:20,404-Speed 5715.49 samples/sec   Loss 10.7475   LearningRate 0.0868   Epoch: 1   Global Step: 7780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:22,223-Speed 5633.08 samples/sec   Loss 10.8395   LearningRate 0.0868   Epoch: 1   Global Step: 7790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:24,010-Speed 5732.59 samples/sec   Loss 10.7463   LearningRate 0.0868   Epoch: 1   Global Step: 7800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:25,811-Speed 5687.06 samples/sec   Loss 10.6187   LearningRate 0.0867   Epoch: 1   Global Step: 7810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:27,595-Speed 5742.51 samples/sec   Loss 10.7630   LearningRate 0.0867   Epoch: 1   Global Step: 7820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:29,374-Speed 5758.30 samples/sec   Loss 10.8406   LearningRate 0.0867   Epoch: 1   Global Step: 7830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:56:31,155-Speed 5752.40 samples/sec   Loss 10.7879   LearningRate 0.0867   Epoch: 1   Global Step: 7840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:56:32,948-Speed 5712.24 samples/sec   Loss 10.6702   LearningRate 0.0867   Epoch: 1   Global Step: 7850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:56:34,741-Speed 5712.38 samples/sec   Loss 10.5769   LearningRate 0.0867   Epoch: 1   Global Step: 7860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:56:36,527-Speed 5735.41 samples/sec   Loss 10.5497   LearningRate 0.0866   Epoch: 1   Global Step: 7870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:56:38,321-Speed 5714.30 samples/sec   Loss 10.9506   LearningRate 0.0866   Epoch: 1   Global Step: 7880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:56:40,114-Speed 5715.38 samples/sec   Loss 10.7795   LearningRate 0.0866   Epoch: 1   Global Step: 7890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:56:41,916-Speed 5684.35 samples/sec   Loss 10.6737   LearningRate 0.0866   Epoch: 1   Global Step: 7900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:56:43,744-Speed 5604.09 samples/sec   Loss 10.6655   LearningRate 0.0866   Epoch: 1   Global Step: 7910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:56:45,590-Speed 5549.52 samples/sec   Loss 10.6377   LearningRate 0.0866   Epoch: 1   Global Step: 7920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 01:56:47,388-Speed 5697.25 samples/sec   Loss 10.8435   LearningRate 0.0865   Epoch: 1   Global Step: 7930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:49,182-Speed 5708.97 samples/sec   Loss 10.6438   LearningRate 0.0865   Epoch: 1   Global Step: 7940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:51,008-Speed 5610.38 samples/sec   Loss 10.5556   LearningRate 0.0865   Epoch: 1   Global Step: 7950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:52,836-Speed 5604.02 samples/sec   Loss 10.6327   LearningRate 0.0865   Epoch: 1   Global Step: 7960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:54,626-Speed 5720.28 samples/sec   Loss 10.6593   LearningRate 0.0865   Epoch: 1   Global Step: 7970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:56,431-Speed 5674.32 samples/sec   Loss 10.7046   LearningRate 0.0865   Epoch: 1   Global Step: 7980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:56:58,215-Speed 5745.13 samples/sec   Loss 10.7072   LearningRate 0.0864   Epoch: 1   Global Step: 7990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:57:00,036-Speed 5624.04 samples/sec   Loss 10.6941   LearningRate 0.0864   Epoch: 1   Global Step: 8000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 01:57:26,523-[lfw][8000]XNorm: 22.133128
Training: 2022-04-27 01:57:26,523-[lfw][8000]Accuracy-Flip: 0.99450+-0.00415
Training: 2022-04-27 01:57:26,524-[lfw][8000]Accuracy-Highest: 0.99450
Training: 2022-04-27 01:57:57,191-[cfp_fp][8000]XNorm: 18.497261
Training: 2022-04-27 01:57:57,191-[cfp_fp][8000]Accuracy-Flip: 0.91071+-0.01325
Training: 2022-04-27 01:57:57,192-[cfp_fp][8000]Accuracy-Highest: 0.91071
Training: 2022-04-27 01:58:23,671-[agedb_30][8000]XNorm: 21.640810
Training: 2022-04-27 01:58:23,672-[agedb_30][8000]Accuracy-Flip: 0.95567+-0.00961
Training: 2022-04-27 01:58:23,672-[agedb_30][8000]Accuracy-Highest: 0.95567
Training: 2022-04-27 01:58:25,486-Speed 119.84 samples/sec   Loss 10.7057   LearningRate 0.0864   Epoch: 1   Global Step: 8010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:27,277-Speed 5721.68 samples/sec   Loss 10.5941   LearningRate 0.0864   Epoch: 1   Global Step: 8020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:29,048-Speed 5783.11 samples/sec   Loss 10.5033   LearningRate 0.0864   Epoch: 1   Global Step: 8030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:30,839-Speed 5720.21 samples/sec   Loss 10.8357   LearningRate 0.0864   Epoch: 1   Global Step: 8040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:32,636-Speed 5699.13 samples/sec   Loss 10.5151   LearningRate 0.0863   Epoch: 1   Global Step: 8050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:34,412-Speed 5769.04 samples/sec   Loss 10.8805   LearningRate 0.0863   Epoch: 1   Global Step: 8060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:36,192-Speed 5757.46 samples/sec   Loss 10.7316   LearningRate 0.0863   Epoch: 1   Global Step: 8070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:37,982-Speed 5722.39 samples/sec   Loss 10.6967   LearningRate 0.0863   Epoch: 1   Global Step: 8080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:39,785-Speed 5682.57 samples/sec   Loss 10.5257   LearningRate 0.0863   Epoch: 1   Global Step: 8090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:41,581-Speed 5705.27 samples/sec   Loss 10.5432   LearningRate 0.0863   Epoch: 1   Global Step: 8100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:43,417-Speed 5579.67 samples/sec   Loss 10.6007   LearningRate 0.0862   Epoch: 1   Global Step: 8110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:45,212-Speed 5705.79 samples/sec   Loss 10.7053   LearningRate 0.0862   Epoch: 1   Global Step: 8120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:46,997-Speed 5741.65 samples/sec   Loss 10.7008   LearningRate 0.0862   Epoch: 1   Global Step: 8130   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-27 01:58:48,797-Speed 5691.32 samples/sec   Loss 10.5550   LearningRate 0.0862   Epoch: 1   Global Step: 8140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:50,588-Speed 5720.07 samples/sec   Loss 10.4912   LearningRate 0.0862   Epoch: 1   Global Step: 8150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:52,382-Speed 5707.77 samples/sec   Loss 10.5521   LearningRate 0.0862   Epoch: 1   Global Step: 8160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:54,175-Speed 5715.68 samples/sec   Loss 10.6199   LearningRate 0.0861   Epoch: 1   Global Step: 8170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:55,986-Speed 5656.21 samples/sec   Loss 10.6788   LearningRate 0.0861   Epoch: 1   Global Step: 8180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:57,767-Speed 5752.51 samples/sec   Loss 10.6006   LearningRate 0.0861   Epoch: 1   Global Step: 8190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:58:59,549-Speed 5748.48 samples/sec   Loss 10.5807   LearningRate 0.0861   Epoch: 1   Global Step: 8200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:01,356-Speed 5669.30 samples/sec   Loss 10.5440   LearningRate 0.0861   Epoch: 1   Global Step: 8210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:03,137-Speed 5752.62 samples/sec   Loss 10.5735   LearningRate 0.0861   Epoch: 1   Global Step: 8220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:04,924-Speed 5731.65 samples/sec   Loss 10.5811   LearningRate 0.0860   Epoch: 1   Global Step: 8230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:06,723-Speed 5693.48 samples/sec   Loss 10.5855   LearningRate 0.0860   Epoch: 1   Global Step: 8240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:08,509-Speed 5738.12 samples/sec   Loss 10.4053   LearningRate 0.0860   Epoch: 1   Global Step: 8250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:10,319-Speed 5658.59 samples/sec   Loss 10.5278   LearningRate 0.0860   Epoch: 1   Global Step: 8260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:12,173-Speed 5526.51 samples/sec   Loss 10.4346   LearningRate 0.0860   Epoch: 1   Global Step: 8270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:13,988-Speed 5645.56 samples/sec   Loss 10.5241   LearningRate 0.0860   Epoch: 1   Global Step: 8280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:15,800-Speed 5655.25 samples/sec   Loss 10.5621   LearningRate 0.0860   Epoch: 1   Global Step: 8290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:17,619-Speed 5629.49 samples/sec   Loss 10.5859   LearningRate 0.0859   Epoch: 1   Global Step: 8300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:19,419-Speed 5691.63 samples/sec   Loss 10.2764   LearningRate 0.0859   Epoch: 1   Global Step: 8310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:21,213-Speed 5710.82 samples/sec   Loss 10.4652   LearningRate 0.0859   Epoch: 1   Global Step: 8320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:23,023-Speed 5661.57 samples/sec   Loss 10.3737   LearningRate 0.0859   Epoch: 1   Global Step: 8330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:24,811-Speed 5729.49 samples/sec   Loss 10.5361   LearningRate 0.0859   Epoch: 1   Global Step: 8340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:26,614-Speed 5680.52 samples/sec   Loss 10.5583   LearningRate 0.0859   Epoch: 1   Global Step: 8350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:28,406-Speed 5718.95 samples/sec   Loss 10.6645   LearningRate 0.0858   Epoch: 1   Global Step: 8360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:30,221-Speed 5641.81 samples/sec   Loss 10.5425   LearningRate 0.0858   Epoch: 1   Global Step: 8370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:32,003-Speed 5749.27 samples/sec   Loss 10.4493   LearningRate 0.0858   Epoch: 1   Global Step: 8380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:33,799-Speed 5706.11 samples/sec   Loss 10.4997   LearningRate 0.0858   Epoch: 1   Global Step: 8390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:35,594-Speed 5707.89 samples/sec   Loss 10.4716   LearningRate 0.0858   Epoch: 1   Global Step: 8400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:37,423-Speed 5600.07 samples/sec   Loss 10.5574   LearningRate 0.0858   Epoch: 1   Global Step: 8410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:39,224-Speed 5689.88 samples/sec   Loss 10.5970   LearningRate 0.0857   Epoch: 1   Global Step: 8420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:41,019-Speed 5704.19 samples/sec   Loss 10.5970   LearningRate 0.0857   Epoch: 1   Global Step: 8430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:42,817-Speed 5699.76 samples/sec   Loss 10.3551   LearningRate 0.0857   Epoch: 1   Global Step: 8440   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-27 01:59:44,598-Speed 5751.20 samples/sec   Loss 10.4548   LearningRate 0.0857   Epoch: 1   Global Step: 8450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:46,385-Speed 5734.36 samples/sec   Loss 10.5520   LearningRate 0.0857   Epoch: 1   Global Step: 8460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:48,175-Speed 5722.34 samples/sec   Loss 10.6178   LearningRate 0.0857   Epoch: 1   Global Step: 8470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 01:59:49,974-Speed 5695.97 samples/sec   Loss 10.5082   LearningRate 0.0856   Epoch: 1   Global Step: 8480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:59:51,768-Speed 5708.56 samples/sec   Loss 10.4952   LearningRate 0.0856   Epoch: 1   Global Step: 8490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:59:53,567-Speed 5694.50 samples/sec   Loss 10.3849   LearningRate 0.0856   Epoch: 1   Global Step: 8500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:59:55,370-Speed 5681.60 samples/sec   Loss 10.5123   LearningRate 0.0856   Epoch: 1   Global Step: 8510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:59:57,178-Speed 5668.93 samples/sec   Loss 10.3219   LearningRate 0.0856   Epoch: 1   Global Step: 8520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 01:59:58,968-Speed 5721.38 samples/sec   Loss 10.4210   LearningRate 0.0856   Epoch: 1   Global Step: 8530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 02:00:00,770-Speed 5685.71 samples/sec   Loss 10.5311   LearningRate 0.0855   Epoch: 1   Global Step: 8540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 02:00:02,570-Speed 5693.35 samples/sec   Loss 10.2872   LearningRate 0.0855   Epoch: 1   Global Step: 8550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 02:00:04,356-Speed 5734.02 samples/sec   Loss 10.3969   LearningRate 0.0855   Epoch: 1   Global Step: 8560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 02:00:06,157-Speed 5688.98 samples/sec   Loss 10.4365   LearningRate 0.0855   Epoch: 1   Global Step: 8570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 02:00:07,963-Speed 5671.68 samples/sec   Loss 10.5898   LearningRate 0.0855   Epoch: 1   Global Step: 8580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 02:00:09,745-Speed 5750.26 samples/sec   Loss 10.2535   LearningRate 0.0855   Epoch: 1   Global Step: 8590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 02:00:11,540-Speed 5704.30 samples/sec   Loss 10.4610   LearningRate 0.0854   Epoch: 1   Global Step: 8600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 02:00:13,324-Speed 5744.72 samples/sec   Loss 10.6841   LearningRate 0.0854   Epoch: 1   Global Step: 8610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:15,118-Speed 5708.91 samples/sec   Loss 10.3825   LearningRate 0.0854   Epoch: 1   Global Step: 8620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:16,906-Speed 5730.46 samples/sec   Loss 10.3186   LearningRate 0.0854   Epoch: 1   Global Step: 8630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:18,704-Speed 5696.57 samples/sec   Loss 10.5790   LearningRate 0.0854   Epoch: 1   Global Step: 8640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:20,497-Speed 5713.86 samples/sec   Loss 10.4686   LearningRate 0.0854   Epoch: 1   Global Step: 8650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:22,332-Speed 5582.00 samples/sec   Loss 10.2604   LearningRate 0.0853   Epoch: 1   Global Step: 8660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:24,142-Speed 5658.85 samples/sec   Loss 10.3869   LearningRate 0.0853   Epoch: 1   Global Step: 8670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:25,915-Speed 5778.69 samples/sec   Loss 10.3344   LearningRate 0.0853   Epoch: 1   Global Step: 8680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:27,764-Speed 5541.05 samples/sec   Loss 10.2645   LearningRate 0.0853   Epoch: 1   Global Step: 8690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:29,553-Speed 5726.45 samples/sec   Loss 10.2884   LearningRate 0.0853   Epoch: 1   Global Step: 8700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:31,347-Speed 5708.01 samples/sec   Loss 10.3509   LearningRate 0.0853   Epoch: 1   Global Step: 8710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:33,146-Speed 5698.75 samples/sec   Loss 10.4638   LearningRate 0.0853   Epoch: 1   Global Step: 8720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:34,926-Speed 5753.36 samples/sec   Loss 10.3509   LearningRate 0.0852   Epoch: 1   Global Step: 8730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:36,729-Speed 5681.60 samples/sec   Loss 10.4426   LearningRate 0.0852   Epoch: 1   Global Step: 8740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:38,540-Speed 5657.33 samples/sec   Loss 10.3474   LearningRate 0.0852   Epoch: 1   Global Step: 8750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:40,339-Speed 5694.30 samples/sec   Loss 10.2945   LearningRate 0.0852   Epoch: 1   Global Step: 8760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:42,129-Speed 5724.34 samples/sec   Loss 10.3960   LearningRate 0.0852   Epoch: 1   Global Step: 8770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:43,906-Speed 5766.14 samples/sec   Loss 10.3700   LearningRate 0.0852   Epoch: 1   Global Step: 8780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:45,729-Speed 5617.85 samples/sec   Loss 10.3692   LearningRate 0.0851   Epoch: 1   Global Step: 8790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:47,541-Speed 5654.96 samples/sec   Loss 10.3639   LearningRate 0.0851   Epoch: 1   Global Step: 8800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:49,321-Speed 5753.53 samples/sec   Loss 10.2020   LearningRate 0.0851   Epoch: 1   Global Step: 8810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:51,122-Speed 5689.16 samples/sec   Loss 10.3628   LearningRate 0.0851   Epoch: 1   Global Step: 8820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:52,913-Speed 5720.25 samples/sec   Loss 10.3381   LearningRate 0.0851   Epoch: 1   Global Step: 8830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:54,697-Speed 5744.30 samples/sec   Loss 10.1951   LearningRate 0.0851   Epoch: 1   Global Step: 8840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:56,525-Speed 5603.18 samples/sec   Loss 10.2233   LearningRate 0.0850   Epoch: 1   Global Step: 8850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:00:58,334-Speed 5662.57 samples/sec   Loss 10.1685   LearningRate 0.0850   Epoch: 1   Global Step: 8860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:00,141-Speed 5669.63 samples/sec   Loss 10.1977   LearningRate 0.0850   Epoch: 1   Global Step: 8870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:01,940-Speed 5692.45 samples/sec   Loss 10.3309   LearningRate 0.0850   Epoch: 1   Global Step: 8880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:03,796-Speed 5523.20 samples/sec   Loss 10.0664   LearningRate 0.0850   Epoch: 1   Global Step: 8890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:05,664-Speed 5482.90 samples/sec   Loss 10.1574   LearningRate 0.0850   Epoch: 1   Global Step: 8900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:07,486-Speed 5621.81 samples/sec   Loss 10.3518   LearningRate 0.0849   Epoch: 1   Global Step: 8910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:09,298-Speed 5655.42 samples/sec   Loss 10.2609   LearningRate 0.0849   Epoch: 1   Global Step: 8920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:11,101-Speed 5680.48 samples/sec   Loss 10.1656   LearningRate 0.0849   Epoch: 1   Global Step: 8930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:12,897-Speed 5707.94 samples/sec   Loss 10.3118   LearningRate 0.0849   Epoch: 1   Global Step: 8940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:14,690-Speed 5711.08 samples/sec   Loss 10.2240   LearningRate 0.0849   Epoch: 1   Global Step: 8950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:16,488-Speed 5697.53 samples/sec   Loss 10.2727   LearningRate 0.0849   Epoch: 1   Global Step: 8960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:18,276-Speed 5729.64 samples/sec   Loss 10.4237   LearningRate 0.0848   Epoch: 1   Global Step: 8970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:20,070-Speed 5712.19 samples/sec   Loss 10.2931   LearningRate 0.0848   Epoch: 1   Global Step: 8980   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:01:21,905-Speed 5581.53 samples/sec   Loss 10.0952   LearningRate 0.0848   Epoch: 1   Global Step: 8990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:23,701-Speed 5706.95 samples/sec   Loss 10.2686   LearningRate 0.0848   Epoch: 1   Global Step: 9000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:25,508-Speed 5668.12 samples/sec   Loss 10.1449   LearningRate 0.0848   Epoch: 1   Global Step: 9010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:27,306-Speed 5697.21 samples/sec   Loss 10.2480   LearningRate 0.0848   Epoch: 1   Global Step: 9020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:29,095-Speed 5727.56 samples/sec   Loss 10.2990   LearningRate 0.0847   Epoch: 1   Global Step: 9030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:30,898-Speed 5680.33 samples/sec   Loss 10.3413   LearningRate 0.0847   Epoch: 1   Global Step: 9040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:32,762-Speed 5496.19 samples/sec   Loss 10.2597   LearningRate 0.0847   Epoch: 1   Global Step: 9050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:34,594-Speed 5592.82 samples/sec   Loss 10.1763   LearningRate 0.0847   Epoch: 1   Global Step: 9060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:36,402-Speed 5664.61 samples/sec   Loss 10.4073   LearningRate 0.0847   Epoch: 1   Global Step: 9070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:38,225-Speed 5622.00 samples/sec   Loss 10.3282   LearningRate 0.0847   Epoch: 1   Global Step: 9080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:40,022-Speed 5700.11 samples/sec   Loss 10.3897   LearningRate 0.0847   Epoch: 1   Global Step: 9090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:41,844-Speed 5623.25 samples/sec   Loss 10.2823   LearningRate 0.0846   Epoch: 1   Global Step: 9100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:43,640-Speed 5704.03 samples/sec   Loss 10.3052   LearningRate 0.0846   Epoch: 1   Global Step: 9110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:45,442-Speed 5685.40 samples/sec   Loss 10.2770   LearningRate 0.0846   Epoch: 1   Global Step: 9120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:47,241-Speed 5692.65 samples/sec   Loss 10.1909   LearningRate 0.0846   Epoch: 1   Global Step: 9130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:49,025-Speed 5743.95 samples/sec   Loss 10.4263   LearningRate 0.0846   Epoch: 1   Global Step: 9140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:50,839-Speed 5649.87 samples/sec   Loss 10.4187   LearningRate 0.0846   Epoch: 1   Global Step: 9150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:52,646-Speed 5667.99 samples/sec   Loss 10.2975   LearningRate 0.0845   Epoch: 1   Global Step: 9160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:54,444-Speed 5697.82 samples/sec   Loss 10.3456   LearningRate 0.0845   Epoch: 1   Global Step: 9170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:56,242-Speed 5695.74 samples/sec   Loss 10.1928   LearningRate 0.0845   Epoch: 1   Global Step: 9180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:01:58,074-Speed 5591.59 samples/sec   Loss 10.1625   LearningRate 0.0845   Epoch: 1   Global Step: 9190   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:01:59,855-Speed 5752.66 samples/sec   Loss 10.2324   LearningRate 0.0845   Epoch: 1   Global Step: 9200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:01,654-Speed 5695.55 samples/sec   Loss 10.2123   LearningRate 0.0845   Epoch: 1   Global Step: 9210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:03,446-Speed 5717.78 samples/sec   Loss 10.3022   LearningRate 0.0844   Epoch: 1   Global Step: 9220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:05,229-Speed 5745.49 samples/sec   Loss 10.1491   LearningRate 0.0844   Epoch: 1   Global Step: 9230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:07,030-Speed 5689.49 samples/sec   Loss 10.0279   LearningRate 0.0844   Epoch: 1   Global Step: 9240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:08,831-Speed 5687.48 samples/sec   Loss 10.1774   LearningRate 0.0844   Epoch: 1   Global Step: 9250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:10,623-Speed 5715.07 samples/sec   Loss 10.1972   LearningRate 0.0844   Epoch: 1   Global Step: 9260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:12,412-Speed 5728.40 samples/sec   Loss 10.3126   LearningRate 0.0844   Epoch: 1   Global Step: 9270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:14,204-Speed 5716.03 samples/sec   Loss 10.2695   LearningRate 0.0843   Epoch: 1   Global Step: 9280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:16,048-Speed 5556.21 samples/sec   Loss 10.2280   LearningRate 0.0843   Epoch: 1   Global Step: 9290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:17,861-Speed 5650.71 samples/sec   Loss 10.1821   LearningRate 0.0843   Epoch: 1   Global Step: 9300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:19,652-Speed 5720.66 samples/sec   Loss 10.1979   LearningRate 0.0843   Epoch: 1   Global Step: 9310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:21,460-Speed 5667.05 samples/sec   Loss 10.4555   LearningRate 0.0843   Epoch: 1   Global Step: 9320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:23,263-Speed 5682.54 samples/sec   Loss 10.2656   LearningRate 0.0843   Epoch: 1   Global Step: 9330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:25,051-Speed 5727.74 samples/sec   Loss 10.2006   LearningRate 0.0842   Epoch: 1   Global Step: 9340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:26,838-Speed 5731.87 samples/sec   Loss 10.2212   LearningRate 0.0842   Epoch: 1   Global Step: 9350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:28,630-Speed 5717.73 samples/sec   Loss 10.0755   LearningRate 0.0842   Epoch: 1   Global Step: 9360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:30,427-Speed 5699.24 samples/sec   Loss 10.1744   LearningRate 0.0842   Epoch: 1   Global Step: 9370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:32,242-Speed 5646.28 samples/sec   Loss 10.1262   LearningRate 0.0842   Epoch: 1   Global Step: 9380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:34,032-Speed 5722.84 samples/sec   Loss 10.1948   LearningRate 0.0842   Epoch: 1   Global Step: 9390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:35,820-Speed 5729.00 samples/sec   Loss 10.1498   LearningRate 0.0842   Epoch: 1   Global Step: 9400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:37,615-Speed 5708.72 samples/sec   Loss 10.2891   LearningRate 0.0841   Epoch: 1   Global Step: 9410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:39,403-Speed 5728.57 samples/sec   Loss 10.2751   LearningRate 0.0841   Epoch: 1   Global Step: 9420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:41,201-Speed 5697.10 samples/sec   Loss 10.1037   LearningRate 0.0841   Epoch: 1   Global Step: 9430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:42,991-Speed 5726.54 samples/sec   Loss 10.2305   LearningRate 0.0841   Epoch: 1   Global Step: 9440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:44,782-Speed 5719.26 samples/sec   Loss 10.1045   LearningRate 0.0841   Epoch: 1   Global Step: 9450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:46,575-Speed 5714.19 samples/sec   Loss 10.1607   LearningRate 0.0841   Epoch: 1   Global Step: 9460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:48,378-Speed 5682.03 samples/sec   Loss 10.1921   LearningRate 0.0840   Epoch: 1   Global Step: 9470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:50,162-Speed 5741.43 samples/sec   Loss 10.0813   LearningRate 0.0840   Epoch: 1   Global Step: 9480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:51,955-Speed 5715.09 samples/sec   Loss 10.1892   LearningRate 0.0840   Epoch: 1   Global Step: 9490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:53,740-Speed 5737.77 samples/sec   Loss 10.2081   LearningRate 0.0840   Epoch: 1   Global Step: 9500   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:02:55,549-Speed 5665.76 samples/sec   Loss 10.0631   LearningRate 0.0840   Epoch: 1   Global Step: 9510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:57,361-Speed 5652.09 samples/sec   Loss 10.0688   LearningRate 0.0840   Epoch: 1   Global Step: 9520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:02:59,153-Speed 5717.71 samples/sec   Loss 10.0985   LearningRate 0.0839   Epoch: 1   Global Step: 9530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:00,956-Speed 5678.50 samples/sec   Loss 10.1688   LearningRate 0.0839   Epoch: 1   Global Step: 9540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:02,758-Speed 5686.62 samples/sec   Loss 10.2486   LearningRate 0.0839   Epoch: 1   Global Step: 9550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:04,573-Speed 5644.08 samples/sec   Loss 10.0515   LearningRate 0.0839   Epoch: 1   Global Step: 9560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:06,433-Speed 5507.73 samples/sec   Loss 10.0551   LearningRate 0.0839   Epoch: 1   Global Step: 9570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:08,242-Speed 5663.88 samples/sec   Loss 10.2417   LearningRate 0.0839   Epoch: 1   Global Step: 9580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:10,037-Speed 5706.83 samples/sec   Loss 10.1305   LearningRate 0.0838   Epoch: 1   Global Step: 9590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:11,848-Speed 5657.79 samples/sec   Loss 10.1608   LearningRate 0.0838   Epoch: 1   Global Step: 9600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:13,665-Speed 5637.04 samples/sec   Loss 10.1411   LearningRate 0.0838   Epoch: 1   Global Step: 9610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:15,505-Speed 5569.39 samples/sec   Loss 10.0835   LearningRate 0.0838   Epoch: 1   Global Step: 9620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:17,312-Speed 5666.91 samples/sec   Loss 10.1027   LearningRate 0.0838   Epoch: 1   Global Step: 9630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:19,102-Speed 5722.19 samples/sec   Loss 10.2561   LearningRate 0.0838   Epoch: 1   Global Step: 9640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:20,892-Speed 5724.06 samples/sec   Loss 10.1797   LearningRate 0.0837   Epoch: 1   Global Step: 9650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:22,688-Speed 5705.95 samples/sec   Loss 10.1944   LearningRate 0.0837   Epoch: 1   Global Step: 9660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:24,493-Speed 5674.51 samples/sec   Loss 10.0968   LearningRate 0.0837   Epoch: 1   Global Step: 9670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:26,288-Speed 5707.61 samples/sec   Loss 9.8984   LearningRate 0.0837   Epoch: 1   Global Step: 9680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:28,083-Speed 5705.98 samples/sec   Loss 10.2155   LearningRate 0.0837   Epoch: 1   Global Step: 9690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:29,899-Speed 5641.49 samples/sec   Loss 10.2128   LearningRate 0.0837   Epoch: 1   Global Step: 9700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:31,690-Speed 5718.95 samples/sec   Loss 10.0413   LearningRate 0.0837   Epoch: 1   Global Step: 9710   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:03:33,467-Speed 5768.29 samples/sec   Loss 9.9978   LearningRate 0.0836   Epoch: 1   Global Step: 9720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:35,264-Speed 5699.63 samples/sec   Loss 10.1456   LearningRate 0.0836   Epoch: 1   Global Step: 9730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:37,054-Speed 5722.61 samples/sec   Loss 9.9853   LearningRate 0.0836   Epoch: 1   Global Step: 9740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:38,852-Speed 5696.38 samples/sec   Loss 10.0471   LearningRate 0.0836   Epoch: 1   Global Step: 9750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:40,655-Speed 5683.22 samples/sec   Loss 10.0188   LearningRate 0.0836   Epoch: 1   Global Step: 9760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:42,469-Speed 5646.80 samples/sec   Loss 10.1233   LearningRate 0.0836   Epoch: 1   Global Step: 9770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:44,274-Speed 5678.05 samples/sec   Loss 10.2043   LearningRate 0.0835   Epoch: 1   Global Step: 9780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:03:46,073-Speed 5694.25 samples/sec   Loss 9.9990   LearningRate 0.0835   Epoch: 1   Global Step: 9790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:03:47,865-Speed 5717.32 samples/sec   Loss 10.0872   LearningRate 0.0835   Epoch: 1   Global Step: 9800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:03:49,648-Speed 5746.15 samples/sec   Loss 9.8705   LearningRate 0.0835   Epoch: 1   Global Step: 9810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:03:51,440-Speed 5717.88 samples/sec   Loss 10.0163   LearningRate 0.0835   Epoch: 1   Global Step: 9820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:03:53,233-Speed 5711.65 samples/sec   Loss 10.0366   LearningRate 0.0835   Epoch: 1   Global Step: 9830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:03:55,042-Speed 5665.23 samples/sec   Loss 10.2559   LearningRate 0.0834   Epoch: 1   Global Step: 9840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:03:56,844-Speed 5684.82 samples/sec   Loss 10.0413   LearningRate 0.0834   Epoch: 1   Global Step: 9850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:03:58,636-Speed 5716.89 samples/sec   Loss 10.1394   LearningRate 0.0834   Epoch: 1   Global Step: 9860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:04:00,436-Speed 5691.20 samples/sec   Loss 9.8882   LearningRate 0.0834   Epoch: 1   Global Step: 9870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:04:02,254-Speed 5635.70 samples/sec   Loss 9.8361   LearningRate 0.0834   Epoch: 1   Global Step: 9880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:04:04,074-Speed 5627.55 samples/sec   Loss 9.9611   LearningRate 0.0834   Epoch: 1   Global Step: 9890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:04:05,880-Speed 5671.31 samples/sec   Loss 10.0429   LearningRate 0.0833   Epoch: 1   Global Step: 9900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:04:07,692-Speed 5654.14 samples/sec   Loss 10.0752   LearningRate 0.0833   Epoch: 1   Global Step: 9910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:04:09,506-Speed 5648.92 samples/sec   Loss 10.1007   LearningRate 0.0833   Epoch: 1   Global Step: 9920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:04:11,317-Speed 5658.16 samples/sec   Loss 9.9505   LearningRate 0.0833   Epoch: 1   Global Step: 9930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:04:13,116-Speed 5694.66 samples/sec   Loss 9.9305   LearningRate 0.0833   Epoch: 1   Global Step: 9940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:04:14,934-Speed 5636.12 samples/sec   Loss 10.1706   LearningRate 0.0833   Epoch: 1   Global Step: 9950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:04:16,717-Speed 5743.72 samples/sec   Loss 10.0291   LearningRate 0.0833   Epoch: 1   Global Step: 9960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:04:18,528-Speed 5659.47 samples/sec   Loss 10.1131   LearningRate 0.0832   Epoch: 1   Global Step: 9970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:04:20,326-Speed 5697.27 samples/sec   Loss 9.9290   LearningRate 0.0832   Epoch: 1   Global Step: 9980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:04:22,110-Speed 5740.74 samples/sec   Loss 10.1092   LearningRate 0.0832   Epoch: 1   Global Step: 9990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:04:23,909-Speed 5696.49 samples/sec   Loss 10.1087   LearningRate 0.0832   Epoch: 1   Global Step: 10000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:04:50,724-[lfw][10000]XNorm: 21.238482
Training: 2022-04-27 02:04:50,724-[lfw][10000]Accuracy-Flip: 0.99517+-0.00311
Training: 2022-04-27 02:04:50,725-[lfw][10000]Accuracy-Highest: 0.99517
Training: 2022-04-27 02:05:21,743-[cfp_fp][10000]XNorm: 18.948939
Training: 2022-04-27 02:05:21,744-[cfp_fp][10000]Accuracy-Flip: 0.90586+-0.01211
Training: 2022-04-27 02:05:21,745-[cfp_fp][10000]Accuracy-Highest: 0.91071
Training: 2022-04-27 02:05:48,436-[agedb_30][10000]XNorm: 21.228598
Training: 2022-04-27 02:05:48,436-[agedb_30][10000]Accuracy-Flip: 0.95350+-0.01050
Training: 2022-04-27 02:05:48,437-[agedb_30][10000]Accuracy-Highest: 0.95567
Training: 2022-04-27 02:05:50,238-Speed 118.62 samples/sec   Loss 9.9019   LearningRate 0.0832   Epoch: 1   Global Step: 10010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:05:52,052-Speed 5647.28 samples/sec   Loss 9.8762   LearningRate 0.0832   Epoch: 1   Global Step: 10020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:05:53,851-Speed 5695.28 samples/sec   Loss 9.9874   LearningRate 0.0831   Epoch: 1   Global Step: 10030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:05:55,661-Speed 5660.07 samples/sec   Loss 9.9149   LearningRate 0.0831   Epoch: 1   Global Step: 10040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:05:57,449-Speed 5733.69 samples/sec   Loss 10.0144   LearningRate 0.0831   Epoch: 1   Global Step: 10050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:05:59,239-Speed 5720.43 samples/sec   Loss 9.9034   LearningRate 0.0831   Epoch: 1   Global Step: 10060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:06:01,026-Speed 5733.58 samples/sec   Loss 9.9557   LearningRate 0.0831   Epoch: 1   Global Step: 10070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:06:02,810-Speed 5741.06 samples/sec   Loss 10.0913   LearningRate 0.0831   Epoch: 1   Global Step: 10080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:06:04,608-Speed 5700.12 samples/sec   Loss 10.1132   LearningRate 0.0830   Epoch: 1   Global Step: 10090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:06:06,392-Speed 5742.16 samples/sec   Loss 10.1382   LearningRate 0.0830   Epoch: 1   Global Step: 10100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:06:08,173-Speed 5751.68 samples/sec   Loss 9.9740   LearningRate 0.0830   Epoch: 1   Global Step: 10110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:06:09,961-Speed 5727.91 samples/sec   Loss 9.9214   LearningRate 0.0830   Epoch: 1   Global Step: 10120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:06:11,740-Speed 5757.61 samples/sec   Loss 10.0323   LearningRate 0.0830   Epoch: 1   Global Step: 10130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:13,523-Speed 5749.70 samples/sec   Loss 10.0066   LearningRate 0.0830   Epoch: 1   Global Step: 10140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:15,314-Speed 5719.77 samples/sec   Loss 10.0964   LearningRate 0.0829   Epoch: 1   Global Step: 10150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:17,125-Speed 5655.35 samples/sec   Loss 10.0723   LearningRate 0.0829   Epoch: 1   Global Step: 10160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:18,915-Speed 5722.94 samples/sec   Loss 9.9286   LearningRate 0.0829   Epoch: 1   Global Step: 10170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:20,705-Speed 5723.21 samples/sec   Loss 9.9190   LearningRate 0.0829   Epoch: 1   Global Step: 10180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:22,496-Speed 5718.35 samples/sec   Loss 9.9333   LearningRate 0.0829   Epoch: 1   Global Step: 10190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:24,285-Speed 5729.13 samples/sec   Loss 9.9572   LearningRate 0.0829   Epoch: 1   Global Step: 10200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:26,081-Speed 5704.70 samples/sec   Loss 9.8279   LearningRate 0.0828   Epoch: 1   Global Step: 10210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:27,918-Speed 5576.27 samples/sec   Loss 9.8614   LearningRate 0.0828   Epoch: 1   Global Step: 10220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:29,732-Speed 5647.80 samples/sec   Loss 9.8638   LearningRate 0.0828   Epoch: 1   Global Step: 10230   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:06:31,505-Speed 5777.46 samples/sec   Loss 9.9583   LearningRate 0.0828   Epoch: 1   Global Step: 10240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:33,304-Speed 5696.64 samples/sec   Loss 9.8560   LearningRate 0.0828   Epoch: 1   Global Step: 10250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:35,089-Speed 5738.93 samples/sec   Loss 9.9179   LearningRate 0.0828   Epoch: 1   Global Step: 10260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:36,880-Speed 5718.83 samples/sec   Loss 9.8349   LearningRate 0.0828   Epoch: 1   Global Step: 10270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:38,681-Speed 5689.05 samples/sec   Loss 9.9059   LearningRate 0.0827   Epoch: 1   Global Step: 10280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:40,474-Speed 5714.30 samples/sec   Loss 9.9685   LearningRate 0.0827   Epoch: 1   Global Step: 10290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:42,257-Speed 5743.13 samples/sec   Loss 10.0370   LearningRate 0.0827   Epoch: 1   Global Step: 10300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:44,062-Speed 5678.44 samples/sec   Loss 9.9204   LearningRate 0.0827   Epoch: 1   Global Step: 10310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:45,847-Speed 5738.48 samples/sec   Loss 10.0502   LearningRate 0.0827   Epoch: 1   Global Step: 10320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:47,631-Speed 5742.98 samples/sec   Loss 9.8838   LearningRate 0.0827   Epoch: 1   Global Step: 10330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:49,413-Speed 5748.62 samples/sec   Loss 9.7684   LearningRate 0.0826   Epoch: 1   Global Step: 10340   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:06:51,193-Speed 5755.97 samples/sec   Loss 9.9217   LearningRate 0.0826   Epoch: 1   Global Step: 10350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:53,002-Speed 5664.85 samples/sec   Loss 10.0376   LearningRate 0.0826   Epoch: 1   Global Step: 10360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:54,784-Speed 5747.20 samples/sec   Loss 10.0428   LearningRate 0.0826   Epoch: 1   Global Step: 10370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:56,613-Speed 5602.38 samples/sec   Loss 10.0122   LearningRate 0.0826   Epoch: 1   Global Step: 10380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:06:58,430-Speed 5637.72 samples/sec   Loss 10.0223   LearningRate 0.0826   Epoch: 1   Global Step: 10390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:07:00,237-Speed 5669.36 samples/sec   Loss 9.9863   LearningRate 0.0825   Epoch: 1   Global Step: 10400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:07:02,029-Speed 5717.39 samples/sec   Loss 9.7396   LearningRate 0.0825   Epoch: 1   Global Step: 10410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:07:03,814-Speed 5742.00 samples/sec   Loss 9.9686   LearningRate 0.0825   Epoch: 1   Global Step: 10420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:07:05,610-Speed 5703.38 samples/sec   Loss 9.8916   LearningRate 0.0825   Epoch: 1   Global Step: 10430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:07:07,403-Speed 5713.63 samples/sec   Loss 9.8129   LearningRate 0.0825   Epoch: 1   Global Step: 10440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:07:09,202-Speed 5694.05 samples/sec   Loss 9.9253   LearningRate 0.0825   Epoch: 1   Global Step: 10450   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:07:10,991-Speed 5726.37 samples/sec   Loss 10.0466   LearningRate 0.0824   Epoch: 1   Global Step: 10460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:07:12,799-Speed 5665.04 samples/sec   Loss 10.0114   LearningRate 0.0824   Epoch: 1   Global Step: 10470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:07:14,622-Speed 5622.13 samples/sec   Loss 9.7493   LearningRate 0.0824   Epoch: 1   Global Step: 10480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:07:16,418-Speed 5701.89 samples/sec   Loss 9.8188   LearningRate 0.0824   Epoch: 1   Global Step: 10490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:07:18,205-Speed 5734.17 samples/sec   Loss 9.8661   LearningRate 0.0824   Epoch: 1   Global Step: 10500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:19,992-Speed 5732.07 samples/sec   Loss 9.7739   LearningRate 0.0824   Epoch: 1   Global Step: 10510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:21,785-Speed 5713.79 samples/sec   Loss 9.7987   LearningRate 0.0824   Epoch: 1   Global Step: 10520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:23,596-Speed 5658.22 samples/sec   Loss 9.7885   LearningRate 0.0823   Epoch: 1   Global Step: 10530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:25,377-Speed 5751.58 samples/sec   Loss 9.8699   LearningRate 0.0823   Epoch: 1   Global Step: 10540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:27,169-Speed 5716.71 samples/sec   Loss 9.9679   LearningRate 0.0823   Epoch: 1   Global Step: 10550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:28,980-Speed 5659.08 samples/sec   Loss 9.7239   LearningRate 0.0823   Epoch: 1   Global Step: 10560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:30,763-Speed 5746.18 samples/sec   Loss 9.8029   LearningRate 0.0823   Epoch: 1   Global Step: 10570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:32,544-Speed 5749.02 samples/sec   Loss 9.6602   LearningRate 0.0823   Epoch: 1   Global Step: 10580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:34,364-Speed 5630.46 samples/sec   Loss 9.8698   LearningRate 0.0822   Epoch: 1   Global Step: 10590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:36,182-Speed 5633.84 samples/sec   Loss 9.6672   LearningRate 0.0822   Epoch: 1   Global Step: 10600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:07:37,992-Speed 5660.46 samples/sec   Loss 9.6954   LearningRate 0.0822   Epoch: 1   Global Step: 10610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:39,780-Speed 5729.60 samples/sec   Loss 9.8189   LearningRate 0.0822   Epoch: 1   Global Step: 10620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:41,584-Speed 5680.10 samples/sec   Loss 9.6169   LearningRate 0.0822   Epoch: 1   Global Step: 10630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:43,367-Speed 5744.79 samples/sec   Loss 9.8236   LearningRate 0.0822   Epoch: 1   Global Step: 10640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:45,182-Speed 5646.63 samples/sec   Loss 9.8467   LearningRate 0.0821   Epoch: 1   Global Step: 10650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:46,965-Speed 5742.65 samples/sec   Loss 9.9139   LearningRate 0.0821   Epoch: 1   Global Step: 10660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:48,747-Speed 5751.36 samples/sec   Loss 9.9436   LearningRate 0.0821   Epoch: 1   Global Step: 10670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:50,547-Speed 5691.52 samples/sec   Loss 9.8468   LearningRate 0.0821   Epoch: 1   Global Step: 10680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:52,358-Speed 5654.64 samples/sec   Loss 9.7217   LearningRate 0.0821   Epoch: 1   Global Step: 10690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:54,159-Speed 5690.17 samples/sec   Loss 9.7731   LearningRate 0.0821   Epoch: 1   Global Step: 10700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:07:55,940-Speed 5753.10 samples/sec   Loss 9.6341   LearningRate 0.0821   Epoch: 1   Global Step: 10710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:07:57,739-Speed 5694.90 samples/sec   Loss 9.6540   LearningRate 0.0820   Epoch: 1   Global Step: 10720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:07:59,524-Speed 5737.70 samples/sec   Loss 9.8116   LearningRate 0.0820   Epoch: 1   Global Step: 10730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:01,344-Speed 5630.40 samples/sec   Loss 9.9032   LearningRate 0.0820   Epoch: 1   Global Step: 10740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:03,136-Speed 5719.31 samples/sec   Loss 9.8360   LearningRate 0.0820   Epoch: 1   Global Step: 10750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:04,922-Speed 5735.40 samples/sec   Loss 9.8885   LearningRate 0.0820   Epoch: 1   Global Step: 10760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:06,758-Speed 5578.42 samples/sec   Loss 9.6768   LearningRate 0.0820   Epoch: 1   Global Step: 10770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:08,560-Speed 5686.02 samples/sec   Loss 9.8426   LearningRate 0.0819   Epoch: 1   Global Step: 10780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:10,356-Speed 5704.01 samples/sec   Loss 9.6664   LearningRate 0.0819   Epoch: 1   Global Step: 10790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:12,136-Speed 5755.49 samples/sec   Loss 9.8293   LearningRate 0.0819   Epoch: 1   Global Step: 10800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:13,934-Speed 5697.87 samples/sec   Loss 9.7266   LearningRate 0.0819   Epoch: 1   Global Step: 10810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:15,711-Speed 5764.57 samples/sec   Loss 9.6517   LearningRate 0.0819   Epoch: 1   Global Step: 10820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:17,492-Speed 5751.54 samples/sec   Loss 9.5228   LearningRate 0.0819   Epoch: 1   Global Step: 10830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:19,298-Speed 5671.88 samples/sec   Loss 9.7144   LearningRate 0.0818   Epoch: 1   Global Step: 10840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:21,118-Speed 5631.21 samples/sec   Loss 9.8718   LearningRate 0.0818   Epoch: 1   Global Step: 10850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:22,948-Speed 5597.10 samples/sec   Loss 9.9407   LearningRate 0.0818   Epoch: 1   Global Step: 10860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:24,813-Speed 5495.01 samples/sec   Loss 9.6680   LearningRate 0.0818   Epoch: 1   Global Step: 10870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:26,632-Speed 5629.16 samples/sec   Loss 9.7796   LearningRate 0.0818   Epoch: 1   Global Step: 10880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:08:28,415-Speed 5747.90 samples/sec   Loss 9.8308   LearningRate 0.0818   Epoch: 1   Global Step: 10890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:08:30,196-Speed 5752.24 samples/sec   Loss 9.7361   LearningRate 0.0817   Epoch: 1   Global Step: 10900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:08:32,001-Speed 5674.32 samples/sec   Loss 9.7923   LearningRate 0.0817   Epoch: 1   Global Step: 10910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:08:33,781-Speed 5755.30 samples/sec   Loss 9.6319   LearningRate 0.0817   Epoch: 1   Global Step: 10920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:08:35,569-Speed 5729.09 samples/sec   Loss 9.8211   LearningRate 0.0817   Epoch: 1   Global Step: 10930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:08:37,388-Speed 5634.16 samples/sec   Loss 9.8293   LearningRate 0.0817   Epoch: 1   Global Step: 10940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:08:39,239-Speed 5534.46 samples/sec   Loss 9.9672   LearningRate 0.0817   Epoch: 1   Global Step: 10950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:08:41,046-Speed 5666.85 samples/sec   Loss 9.7632   LearningRate 0.0817   Epoch: 1   Global Step: 10960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:08:42,847-Speed 5688.68 samples/sec   Loss 9.7668   LearningRate 0.0816   Epoch: 1   Global Step: 10970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:08:44,644-Speed 5702.25 samples/sec   Loss 9.8728   LearningRate 0.0816   Epoch: 1   Global Step: 10980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:46,435-Speed 5721.04 samples/sec   Loss 9.7549   LearningRate 0.0816   Epoch: 1   Global Step: 10990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:48,217-Speed 5746.58 samples/sec   Loss 9.9684   LearningRate 0.0816   Epoch: 1   Global Step: 11000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:50,010-Speed 5713.86 samples/sec   Loss 9.8606   LearningRate 0.0816   Epoch: 1   Global Step: 11010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:51,806-Speed 5702.64 samples/sec   Loss 9.8088   LearningRate 0.0816   Epoch: 1   Global Step: 11020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:53,600-Speed 5711.63 samples/sec   Loss 9.5473   LearningRate 0.0815   Epoch: 1   Global Step: 11030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:55,389-Speed 5727.75 samples/sec   Loss 9.6388   LearningRate 0.0815   Epoch: 1   Global Step: 11040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:57,222-Speed 5586.92 samples/sec   Loss 9.8127   LearningRate 0.0815   Epoch: 1   Global Step: 11050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:08:59,054-Speed 5591.72 samples/sec   Loss 9.7477   LearningRate 0.0815   Epoch: 1   Global Step: 11060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:00,884-Speed 5597.74 samples/sec   Loss 9.6857   LearningRate 0.0815   Epoch: 1   Global Step: 11070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:02,714-Speed 5599.92 samples/sec   Loss 9.6408   LearningRate 0.0815   Epoch: 1   Global Step: 11080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:09:04,537-Speed 5618.70 samples/sec   Loss 9.7677   LearningRate 0.0814   Epoch: 1   Global Step: 11090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:09:06,378-Speed 5564.01 samples/sec   Loss 9.7061   LearningRate 0.0814   Epoch: 1   Global Step: 11100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:09:08,224-Speed 5549.96 samples/sec   Loss 9.6771   LearningRate 0.0814   Epoch: 1   Global Step: 11110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:09:10,025-Speed 5687.94 samples/sec   Loss 9.6602   LearningRate 0.0814   Epoch: 1   Global Step: 11120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:09:11,826-Speed 5688.79 samples/sec   Loss 9.8005   LearningRate 0.0814   Epoch: 1   Global Step: 11130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:09:13,658-Speed 5592.60 samples/sec   Loss 9.7799   LearningRate 0.0814   Epoch: 1   Global Step: 11140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:09:15,527-Speed 5482.37 samples/sec   Loss 9.8196   LearningRate 0.0814   Epoch: 1   Global Step: 11150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:09:17,372-Speed 5550.20 samples/sec   Loss 9.7291   LearningRate 0.0813   Epoch: 1   Global Step: 11160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:09:19,190-Speed 5638.18 samples/sec   Loss 9.8940   LearningRate 0.0813   Epoch: 1   Global Step: 11170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:09:20,996-Speed 5669.80 samples/sec   Loss 9.7434   LearningRate 0.0813   Epoch: 1   Global Step: 11180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:22,791-Speed 5708.78 samples/sec   Loss 9.8405   LearningRate 0.0813   Epoch: 1   Global Step: 11190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:24,587-Speed 5705.59 samples/sec   Loss 9.8786   LearningRate 0.0813   Epoch: 1   Global Step: 11200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:26,419-Speed 5591.23 samples/sec   Loss 9.7683   LearningRate 0.0813   Epoch: 1   Global Step: 11210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:28,232-Speed 5649.76 samples/sec   Loss 9.7171   LearningRate 0.0812   Epoch: 1   Global Step: 11220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:30,033-Speed 5688.78 samples/sec   Loss 9.5069   LearningRate 0.0812   Epoch: 1   Global Step: 11230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:31,825-Speed 5717.90 samples/sec   Loss 9.6315   LearningRate 0.0812   Epoch: 1   Global Step: 11240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:33,656-Speed 5595.38 samples/sec   Loss 9.7714   LearningRate 0.0812   Epoch: 1   Global Step: 11250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:35,448-Speed 5716.26 samples/sec   Loss 9.6129   LearningRate 0.0812   Epoch: 1   Global Step: 11260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:37,245-Speed 5700.77 samples/sec   Loss 9.8072   LearningRate 0.0812   Epoch: 1   Global Step: 11270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:39,039-Speed 5708.82 samples/sec   Loss 9.7780   LearningRate 0.0811   Epoch: 1   Global Step: 11280   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:09:40,832-Speed 5715.37 samples/sec   Loss 9.5819   LearningRate 0.0811   Epoch: 1   Global Step: 11290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:42,690-Speed 5510.79 samples/sec   Loss 9.6404   LearningRate 0.0811   Epoch: 1   Global Step: 11300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:44,511-Speed 5629.84 samples/sec   Loss 9.7320   LearningRate 0.0811   Epoch: 1   Global Step: 11310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:46,352-Speed 5562.19 samples/sec   Loss 9.6451   LearningRate 0.0811   Epoch: 1   Global Step: 11320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:48,142-Speed 5723.60 samples/sec   Loss 9.7155   LearningRate 0.0811   Epoch: 1   Global Step: 11330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:09:49,934-Speed 5716.63 samples/sec   Loss 9.6498   LearningRate 0.0811   Epoch: 1   Global Step: 11340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:09:51,798-Speed 5495.42 samples/sec   Loss 9.6612   LearningRate 0.0810   Epoch: 1   Global Step: 11350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:09:53,605-Speed 5673.48 samples/sec   Loss 9.8116   LearningRate 0.0810   Epoch: 1   Global Step: 11360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:09:55,485-Speed 5449.08 samples/sec   Loss 9.5793   LearningRate 0.0810   Epoch: 1   Global Step: 11370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:08,206-Speed 805.04 samples/sec   Loss 9.2617   LearningRate 0.0810   Epoch: 2   Global Step: 11380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:10,038-Speed 5594.23 samples/sec   Loss 9.0670   LearningRate 0.0810   Epoch: 2   Global Step: 11390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:11,860-Speed 5630.21 samples/sec   Loss 9.1420   LearningRate 0.0810   Epoch: 2   Global Step: 11400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:13,697-Speed 5576.55 samples/sec   Loss 8.9711   LearningRate 0.0809   Epoch: 2   Global Step: 11410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:15,506-Speed 5661.46 samples/sec   Loss 9.0038   LearningRate 0.0809   Epoch: 2   Global Step: 11420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:17,416-Speed 5365.68 samples/sec   Loss 8.9337   LearningRate 0.0809   Epoch: 2   Global Step: 11430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:19,224-Speed 5667.18 samples/sec   Loss 9.0977   LearningRate 0.0809   Epoch: 2   Global Step: 11440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:10:21,022-Speed 5700.53 samples/sec   Loss 8.9034   LearningRate 0.0809   Epoch: 2   Global Step: 11450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:10:22,821-Speed 5692.57 samples/sec   Loss 8.9371   LearningRate 0.0809   Epoch: 2   Global Step: 11460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:10:24,626-Speed 5677.02 samples/sec   Loss 9.0995   LearningRate 0.0808   Epoch: 2   Global Step: 11470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:10:26,441-Speed 5644.92 samples/sec   Loss 9.0033   LearningRate 0.0808   Epoch: 2   Global Step: 11480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:28,243-Speed 5686.07 samples/sec   Loss 9.2327   LearningRate 0.0808   Epoch: 2   Global Step: 11490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:30,050-Speed 5667.80 samples/sec   Loss 9.1152   LearningRate 0.0808   Epoch: 2   Global Step: 11500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:31,851-Speed 5690.09 samples/sec   Loss 9.1559   LearningRate 0.0808   Epoch: 2   Global Step: 11510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:33,657-Speed 5672.29 samples/sec   Loss 9.1925   LearningRate 0.0808   Epoch: 2   Global Step: 11520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:35,453-Speed 5703.94 samples/sec   Loss 9.0908   LearningRate 0.0808   Epoch: 2   Global Step: 11530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:37,267-Speed 5648.04 samples/sec   Loss 9.2335   LearningRate 0.0807   Epoch: 2   Global Step: 11540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:39,073-Speed 5672.00 samples/sec   Loss 9.2894   LearningRate 0.0807   Epoch: 2   Global Step: 11550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:40,886-Speed 5650.14 samples/sec   Loss 9.3186   LearningRate 0.0807   Epoch: 2   Global Step: 11560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:10:42,667-Speed 5751.60 samples/sec   Loss 9.2571   LearningRate 0.0807   Epoch: 2   Global Step: 11570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:10:44,477-Speed 5660.63 samples/sec   Loss 9.1216   LearningRate 0.0807   Epoch: 2   Global Step: 11580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:10:46,274-Speed 5700.41 samples/sec   Loss 9.3299   LearningRate 0.0807   Epoch: 2   Global Step: 11590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:10:48,085-Speed 5658.32 samples/sec   Loss 9.3656   LearningRate 0.0806   Epoch: 2   Global Step: 11600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:10:49,893-Speed 5664.77 samples/sec   Loss 9.3284   LearningRate 0.0806   Epoch: 2   Global Step: 11610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:10:51,697-Speed 5678.89 samples/sec   Loss 9.0962   LearningRate 0.0806   Epoch: 2   Global Step: 11620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:10:53,501-Speed 5680.15 samples/sec   Loss 9.1851   LearningRate 0.0806   Epoch: 2   Global Step: 11630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:10:55,353-Speed 5531.62 samples/sec   Loss 9.1147   LearningRate 0.0806   Epoch: 2   Global Step: 11640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:10:57,185-Speed 5591.61 samples/sec   Loss 9.2534   LearningRate 0.0806   Epoch: 2   Global Step: 11650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:10:58,979-Speed 5712.31 samples/sec   Loss 9.2760   LearningRate 0.0805   Epoch: 2   Global Step: 11660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:11:00,813-Speed 5584.40 samples/sec   Loss 9.2043   LearningRate 0.0805   Epoch: 2   Global Step: 11670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:02,638-Speed 5614.59 samples/sec   Loss 9.2597   LearningRate 0.0805   Epoch: 2   Global Step: 11680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:04,439-Speed 5689.00 samples/sec   Loss 9.2699   LearningRate 0.0805   Epoch: 2   Global Step: 11690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:06,266-Speed 5606.91 samples/sec   Loss 9.0967   LearningRate 0.0805   Epoch: 2   Global Step: 11700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:08,077-Speed 5657.03 samples/sec   Loss 9.2662   LearningRate 0.0805   Epoch: 2   Global Step: 11710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:09,916-Speed 5571.46 samples/sec   Loss 9.1465   LearningRate 0.0805   Epoch: 2   Global Step: 11720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:11,724-Speed 5666.66 samples/sec   Loss 9.3204   LearningRate 0.0804   Epoch: 2   Global Step: 11730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:13,544-Speed 5627.26 samples/sec   Loss 9.5022   LearningRate 0.0804   Epoch: 2   Global Step: 11740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:15,370-Speed 5610.85 samples/sec   Loss 9.5140   LearningRate 0.0804   Epoch: 2   Global Step: 11750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:17,175-Speed 5675.50 samples/sec   Loss 9.3734   LearningRate 0.0804   Epoch: 2   Global Step: 11760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:18,975-Speed 5697.34 samples/sec   Loss 9.3977   LearningRate 0.0804   Epoch: 2   Global Step: 11770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:11:20,813-Speed 5574.06 samples/sec   Loss 9.5220   LearningRate 0.0804   Epoch: 2   Global Step: 11780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:11:22,699-Speed 5431.02 samples/sec   Loss 9.3947   LearningRate 0.0803   Epoch: 2   Global Step: 11790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:11:24,576-Speed 5456.77 samples/sec   Loss 9.2618   LearningRate 0.0803   Epoch: 2   Global Step: 11800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:11:26,451-Speed 5463.43 samples/sec   Loss 9.1862   LearningRate 0.0803   Epoch: 2   Global Step: 11810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:11:28,325-Speed 5465.13 samples/sec   Loss 9.3562   LearningRate 0.0803   Epoch: 2   Global Step: 11820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:11:30,234-Speed 5367.59 samples/sec   Loss 9.1975   LearningRate 0.0803   Epoch: 2   Global Step: 11830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:32,042-Speed 5667.24 samples/sec   Loss 9.4039   LearningRate 0.0803   Epoch: 2   Global Step: 11840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:33,868-Speed 5607.16 samples/sec   Loss 9.2815   LearningRate 0.0802   Epoch: 2   Global Step: 11850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:35,675-Speed 5669.48 samples/sec   Loss 9.3999   LearningRate 0.0802   Epoch: 2   Global Step: 11860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:37,459-Speed 5742.44 samples/sec   Loss 9.3696   LearningRate 0.0802   Epoch: 2   Global Step: 11870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:39,287-Speed 5603.49 samples/sec   Loss 9.2750   LearningRate 0.0802   Epoch: 2   Global Step: 11880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:41,094-Speed 5671.26 samples/sec   Loss 9.1223   LearningRate 0.0802   Epoch: 2   Global Step: 11890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:42,886-Speed 5714.56 samples/sec   Loss 9.2844   LearningRate 0.0802   Epoch: 2   Global Step: 11900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:44,682-Speed 5704.57 samples/sec   Loss 9.3571   LearningRate 0.0802   Epoch: 2   Global Step: 11910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:46,473-Speed 5719.87 samples/sec   Loss 9.2590   LearningRate 0.0801   Epoch: 2   Global Step: 11920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:11:48,263-Speed 5722.84 samples/sec   Loss 9.3360   LearningRate 0.0801   Epoch: 2   Global Step: 11930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:11:50,109-Speed 5551.23 samples/sec   Loss 9.4474   LearningRate 0.0801   Epoch: 2   Global Step: 11940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:11:51,904-Speed 5705.32 samples/sec   Loss 9.4279   LearningRate 0.0801   Epoch: 2   Global Step: 11950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:11:53,709-Speed 5675.03 samples/sec   Loss 9.5476   LearningRate 0.0801   Epoch: 2   Global Step: 11960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:11:55,551-Speed 5562.13 samples/sec   Loss 9.4805   LearningRate 0.0801   Epoch: 2   Global Step: 11970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:11:57,387-Speed 5579.85 samples/sec   Loss 9.3332   LearningRate 0.0800   Epoch: 2   Global Step: 11980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:11:59,223-Speed 5579.55 samples/sec   Loss 9.3122   LearningRate 0.0800   Epoch: 2   Global Step: 11990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:12:01,061-Speed 5570.94 samples/sec   Loss 9.2695   LearningRate 0.0800   Epoch: 2   Global Step: 12000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:12:28,012-[lfw][12000]XNorm: 22.465708
Training: 2022-04-27 02:12:28,013-[lfw][12000]Accuracy-Flip: 0.99583+-0.00344
Training: 2022-04-27 02:12:28,013-[lfw][12000]Accuracy-Highest: 0.99583
Training: 2022-04-27 02:12:58,917-[cfp_fp][12000]XNorm: 19.763358
Training: 2022-04-27 02:12:58,918-[cfp_fp][12000]Accuracy-Flip: 0.91529+-0.01357
Training: 2022-04-27 02:12:58,919-[cfp_fp][12000]Accuracy-Highest: 0.91529
Training: 2022-04-27 02:13:25,781-[agedb_30][12000]XNorm: 22.394243
Training: 2022-04-27 02:13:25,781-[agedb_30][12000]Accuracy-Flip: 0.95533+-0.00974
Training: 2022-04-27 02:13:25,782-[agedb_30][12000]Accuracy-Highest: 0.95567
Training: 2022-04-27 02:13:27,585-Speed 118.35 samples/sec   Loss 9.5076   LearningRate 0.0800   Epoch: 2   Global Step: 12010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:29,385-Speed 5692.04 samples/sec   Loss 9.3575   LearningRate 0.0800   Epoch: 2   Global Step: 12020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:31,176-Speed 5720.54 samples/sec   Loss 9.3931   LearningRate 0.0800   Epoch: 2   Global Step: 12030   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:13:32,960-Speed 5742.65 samples/sec   Loss 9.3345   LearningRate 0.0799   Epoch: 2   Global Step: 12040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:34,771-Speed 5658.50 samples/sec   Loss 9.3602   LearningRate 0.0799   Epoch: 2   Global Step: 12050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:36,591-Speed 5629.37 samples/sec   Loss 9.3711   LearningRate 0.0799   Epoch: 2   Global Step: 12060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:38,400-Speed 5664.71 samples/sec   Loss 9.3314   LearningRate 0.0799   Epoch: 2   Global Step: 12070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:40,214-Speed 5645.95 samples/sec   Loss 9.3767   LearningRate 0.0799   Epoch: 2   Global Step: 12080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:42,022-Speed 5669.45 samples/sec   Loss 9.4185   LearningRate 0.0799   Epoch: 2   Global Step: 12090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:43,813-Speed 5717.25 samples/sec   Loss 9.2434   LearningRate 0.0799   Epoch: 2   Global Step: 12100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:45,603-Speed 5724.28 samples/sec   Loss 9.4975   LearningRate 0.0798   Epoch: 2   Global Step: 12110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:47,421-Speed 5636.64 samples/sec   Loss 9.4370   LearningRate 0.0798   Epoch: 2   Global Step: 12120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:49,227-Speed 5673.87 samples/sec   Loss 9.3191   LearningRate 0.0798   Epoch: 2   Global Step: 12130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:51,017-Speed 5721.97 samples/sec   Loss 9.3566   LearningRate 0.0798   Epoch: 2   Global Step: 12140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:52,810-Speed 5717.46 samples/sec   Loss 9.3580   LearningRate 0.0798   Epoch: 2   Global Step: 12150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:54,610-Speed 5692.95 samples/sec   Loss 9.4980   LearningRate 0.0798   Epoch: 2   Global Step: 12160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:56,407-Speed 5702.37 samples/sec   Loss 9.4481   LearningRate 0.0797   Epoch: 2   Global Step: 12170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:13:58,201-Speed 5710.67 samples/sec   Loss 9.4696   LearningRate 0.0797   Epoch: 2   Global Step: 12180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:00,032-Speed 5595.36 samples/sec   Loss 9.4665   LearningRate 0.0797   Epoch: 2   Global Step: 12190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:01,864-Speed 5594.64 samples/sec   Loss 9.3831   LearningRate 0.0797   Epoch: 2   Global Step: 12200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:03,674-Speed 5660.99 samples/sec   Loss 9.2765   LearningRate 0.0797   Epoch: 2   Global Step: 12210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:05,490-Speed 5640.21 samples/sec   Loss 9.3375   LearningRate 0.0797   Epoch: 2   Global Step: 12220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:07,324-Speed 5588.06 samples/sec   Loss 9.4343   LearningRate 0.0796   Epoch: 2   Global Step: 12230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:09,134-Speed 5660.31 samples/sec   Loss 9.3732   LearningRate 0.0796   Epoch: 2   Global Step: 12240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:10,948-Speed 5647.55 samples/sec   Loss 9.5760   LearningRate 0.0796   Epoch: 2   Global Step: 12250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:12,778-Speed 5600.50 samples/sec   Loss 9.4338   LearningRate 0.0796   Epoch: 2   Global Step: 12260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:14,607-Speed 5601.11 samples/sec   Loss 9.4199   LearningRate 0.0796   Epoch: 2   Global Step: 12270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:16,420-Speed 5652.49 samples/sec   Loss 9.2725   LearningRate 0.0796   Epoch: 2   Global Step: 12280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:18,225-Speed 5676.90 samples/sec   Loss 9.4097   LearningRate 0.0796   Epoch: 2   Global Step: 12290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:20,039-Speed 5646.55 samples/sec   Loss 9.3327   LearningRate 0.0795   Epoch: 2   Global Step: 12300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:21,841-Speed 5684.77 samples/sec   Loss 9.1447   LearningRate 0.0795   Epoch: 2   Global Step: 12310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:23,644-Speed 5691.64 samples/sec   Loss 9.5159   LearningRate 0.0795   Epoch: 2   Global Step: 12320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:25,441-Speed 5700.06 samples/sec   Loss 9.3298   LearningRate 0.0795   Epoch: 2   Global Step: 12330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:27,243-Speed 5686.71 samples/sec   Loss 9.4500   LearningRate 0.0795   Epoch: 2   Global Step: 12340   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:14:29,028-Speed 5740.58 samples/sec   Loss 9.3471   LearningRate 0.0795   Epoch: 2   Global Step: 12350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:30,818-Speed 5721.92 samples/sec   Loss 9.5173   LearningRate 0.0794   Epoch: 2   Global Step: 12360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:32,635-Speed 5641.55 samples/sec   Loss 9.4131   LearningRate 0.0794   Epoch: 2   Global Step: 12370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:34,456-Speed 5625.16 samples/sec   Loss 9.4754   LearningRate 0.0794   Epoch: 2   Global Step: 12380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:36,261-Speed 5676.34 samples/sec   Loss 9.3648   LearningRate 0.0794   Epoch: 2   Global Step: 12390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:38,047-Speed 5734.67 samples/sec   Loss 9.4630   LearningRate 0.0794   Epoch: 2   Global Step: 12400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:39,834-Speed 5736.01 samples/sec   Loss 9.3522   LearningRate 0.0794   Epoch: 2   Global Step: 12410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:41,664-Speed 5595.25 samples/sec   Loss 9.2821   LearningRate 0.0793   Epoch: 2   Global Step: 12420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:43,454-Speed 5723.61 samples/sec   Loss 9.1606   LearningRate 0.0793   Epoch: 2   Global Step: 12430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:45,239-Speed 5741.64 samples/sec   Loss 9.4430   LearningRate 0.0793   Epoch: 2   Global Step: 12440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:47,072-Speed 5588.88 samples/sec   Loss 9.3903   LearningRate 0.0793   Epoch: 2   Global Step: 12450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:48,862-Speed 5723.39 samples/sec   Loss 9.2064   LearningRate 0.0793   Epoch: 2   Global Step: 12460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:50,668-Speed 5673.63 samples/sec   Loss 9.4083   LearningRate 0.0793   Epoch: 2   Global Step: 12470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:52,474-Speed 5671.84 samples/sec   Loss 9.4826   LearningRate 0.0793   Epoch: 2   Global Step: 12480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:14:54,283-Speed 5665.11 samples/sec   Loss 9.2998   LearningRate 0.0792   Epoch: 2   Global Step: 12490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:14:56,080-Speed 5698.33 samples/sec   Loss 9.3792   LearningRate 0.0792   Epoch: 2   Global Step: 12500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:14:57,886-Speed 5673.24 samples/sec   Loss 9.2571   LearningRate 0.0792   Epoch: 2   Global Step: 12510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:14:59,698-Speed 5655.43 samples/sec   Loss 9.2782   LearningRate 0.0792   Epoch: 2   Global Step: 12520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:15:01,486-Speed 5729.99 samples/sec   Loss 9.3029   LearningRate 0.0792   Epoch: 2   Global Step: 12530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:15:03,305-Speed 5630.89 samples/sec   Loss 9.4119   LearningRate 0.0792   Epoch: 2   Global Step: 12540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:15:05,154-Speed 5541.00 samples/sec   Loss 9.2772   LearningRate 0.0791   Epoch: 2   Global Step: 12550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:15:06,955-Speed 5687.72 samples/sec   Loss 9.3303   LearningRate 0.0791   Epoch: 2   Global Step: 12560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:15:08,767-Speed 5653.83 samples/sec   Loss 9.2241   LearningRate 0.0791   Epoch: 2   Global Step: 12570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:15:10,567-Speed 5694.36 samples/sec   Loss 9.2883   LearningRate 0.0791   Epoch: 2   Global Step: 12580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:15:12,375-Speed 5664.79 samples/sec   Loss 9.4877   LearningRate 0.0791   Epoch: 2   Global Step: 12590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:14,189-Speed 5649.67 samples/sec   Loss 9.3395   LearningRate 0.0791   Epoch: 2   Global Step: 12600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:16,003-Speed 5645.89 samples/sec   Loss 9.4158   LearningRate 0.0791   Epoch: 2   Global Step: 12610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:17,811-Speed 5667.52 samples/sec   Loss 9.3832   LearningRate 0.0790   Epoch: 2   Global Step: 12620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:19,636-Speed 5613.35 samples/sec   Loss 9.4274   LearningRate 0.0790   Epoch: 2   Global Step: 12630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:21,455-Speed 5635.22 samples/sec   Loss 9.3548   LearningRate 0.0790   Epoch: 2   Global Step: 12640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:23,267-Speed 5653.65 samples/sec   Loss 9.1487   LearningRate 0.0790   Epoch: 2   Global Step: 12650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:25,085-Speed 5634.44 samples/sec   Loss 9.4753   LearningRate 0.0790   Epoch: 2   Global Step: 12660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:26,879-Speed 5708.99 samples/sec   Loss 9.2618   LearningRate 0.0790   Epoch: 2   Global Step: 12670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:28,670-Speed 5721.46 samples/sec   Loss 9.2798   LearningRate 0.0789   Epoch: 2   Global Step: 12680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:30,464-Speed 5710.73 samples/sec   Loss 9.1937   LearningRate 0.0789   Epoch: 2   Global Step: 12690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:32,250-Speed 5737.35 samples/sec   Loss 9.2828   LearningRate 0.0789   Epoch: 2   Global Step: 12700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:34,049-Speed 5692.49 samples/sec   Loss 9.1053   LearningRate 0.0789   Epoch: 2   Global Step: 12710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:35,831-Speed 5750.61 samples/sec   Loss 9.1628   LearningRate 0.0789   Epoch: 2   Global Step: 12720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:37,614-Speed 5745.39 samples/sec   Loss 9.2784   LearningRate 0.0789   Epoch: 2   Global Step: 12730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:39,403-Speed 5728.15 samples/sec   Loss 9.4009   LearningRate 0.0788   Epoch: 2   Global Step: 12740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:41,202-Speed 5691.74 samples/sec   Loss 9.2501   LearningRate 0.0788   Epoch: 2   Global Step: 12750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:43,009-Speed 5670.09 samples/sec   Loss 9.2483   LearningRate 0.0788   Epoch: 2   Global Step: 12760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:44,812-Speed 5683.32 samples/sec   Loss 9.4020   LearningRate 0.0788   Epoch: 2   Global Step: 12770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:46,616-Speed 5678.56 samples/sec   Loss 9.2484   LearningRate 0.0788   Epoch: 2   Global Step: 12780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:48,417-Speed 5690.43 samples/sec   Loss 9.4391   LearningRate 0.0788   Epoch: 2   Global Step: 12790   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:15:50,224-Speed 5671.48 samples/sec   Loss 9.3372   LearningRate 0.0788   Epoch: 2   Global Step: 12800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:15:52,001-Speed 5764.96 samples/sec   Loss 9.1398   LearningRate 0.0787   Epoch: 2   Global Step: 12810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:15:53,807-Speed 5670.73 samples/sec   Loss 9.2626   LearningRate 0.0787   Epoch: 2   Global Step: 12820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:15:55,605-Speed 5700.43 samples/sec   Loss 9.3898   LearningRate 0.0787   Epoch: 2   Global Step: 12830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:15:57,427-Speed 5620.85 samples/sec   Loss 9.2632   LearningRate 0.0787   Epoch: 2   Global Step: 12840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:15:59,225-Speed 5700.07 samples/sec   Loss 9.4134   LearningRate 0.0787   Epoch: 2   Global Step: 12850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:16:01,027-Speed 5687.02 samples/sec   Loss 9.2101   LearningRate 0.0787   Epoch: 2   Global Step: 12860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:16:02,844-Speed 5636.88 samples/sec   Loss 9.4717   LearningRate 0.0786   Epoch: 2   Global Step: 12870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:16:04,645-Speed 5688.20 samples/sec   Loss 9.2227   LearningRate 0.0786   Epoch: 2   Global Step: 12880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:16:06,441-Speed 5703.72 samples/sec   Loss 9.3224   LearningRate 0.0786   Epoch: 2   Global Step: 12890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:16:08,258-Speed 5640.48 samples/sec   Loss 9.3852   LearningRate 0.0786   Epoch: 2   Global Step: 12900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:16:10,094-Speed 5580.50 samples/sec   Loss 9.4453   LearningRate 0.0786   Epoch: 2   Global Step: 12910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:11,884-Speed 5723.07 samples/sec   Loss 9.3059   LearningRate 0.0786   Epoch: 2   Global Step: 12920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:13,686-Speed 5683.94 samples/sec   Loss 9.2878   LearningRate 0.0786   Epoch: 2   Global Step: 12930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:15,484-Speed 5699.37 samples/sec   Loss 9.4831   LearningRate 0.0785   Epoch: 2   Global Step: 12940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:17,309-Speed 5614.46 samples/sec   Loss 9.3608   LearningRate 0.0785   Epoch: 2   Global Step: 12950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:19,101-Speed 5715.53 samples/sec   Loss 9.2854   LearningRate 0.0785   Epoch: 2   Global Step: 12960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:20,889-Speed 5730.25 samples/sec   Loss 9.1612   LearningRate 0.0785   Epoch: 2   Global Step: 12970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:22,679-Speed 5723.87 samples/sec   Loss 9.2057   LearningRate 0.0785   Epoch: 2   Global Step: 12980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:24,462-Speed 5744.29 samples/sec   Loss 9.4271   LearningRate 0.0785   Epoch: 2   Global Step: 12990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:26,271-Speed 5663.00 samples/sec   Loss 9.3324   LearningRate 0.0784   Epoch: 2   Global Step: 13000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:28,065-Speed 5710.18 samples/sec   Loss 9.3045   LearningRate 0.0784   Epoch: 2   Global Step: 13010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:29,876-Speed 5660.49 samples/sec   Loss 9.3302   LearningRate 0.0784   Epoch: 2   Global Step: 13020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:31,717-Speed 5563.51 samples/sec   Loss 9.3896   LearningRate 0.0784   Epoch: 2   Global Step: 13030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:33,557-Speed 5567.77 samples/sec   Loss 9.2565   LearningRate 0.0784   Epoch: 2   Global Step: 13040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:35,359-Speed 5685.98 samples/sec   Loss 9.4717   LearningRate 0.0784   Epoch: 2   Global Step: 13050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:37,163-Speed 5679.75 samples/sec   Loss 9.2247   LearningRate 0.0784   Epoch: 2   Global Step: 13060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:38,961-Speed 5697.94 samples/sec   Loss 9.2387   LearningRate 0.0783   Epoch: 2   Global Step: 13070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:40,822-Speed 5511.04 samples/sec   Loss 9.3374   LearningRate 0.0783   Epoch: 2   Global Step: 13080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:42,653-Speed 5594.38 samples/sec   Loss 9.2270   LearningRate 0.0783   Epoch: 2   Global Step: 13090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:44,451-Speed 5697.21 samples/sec   Loss 9.4634   LearningRate 0.0783   Epoch: 2   Global Step: 13100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:46,250-Speed 5694.02 samples/sec   Loss 9.3938   LearningRate 0.0783   Epoch: 2   Global Step: 13110   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:16:48,024-Speed 5775.47 samples/sec   Loss 9.3965   LearningRate 0.0783   Epoch: 2   Global Step: 13120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:49,839-Speed 5645.20 samples/sec   Loss 9.2376   LearningRate 0.0782   Epoch: 2   Global Step: 13130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:51,668-Speed 5602.65 samples/sec   Loss 9.2878   LearningRate 0.0782   Epoch: 2   Global Step: 13140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:53,450-Speed 5748.76 samples/sec   Loss 9.2624   LearningRate 0.0782   Epoch: 2   Global Step: 13150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:55,292-Speed 5561.25 samples/sec   Loss 9.2897   LearningRate 0.0782   Epoch: 2   Global Step: 13160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:57,087-Speed 5708.26 samples/sec   Loss 9.3799   LearningRate 0.0782   Epoch: 2   Global Step: 13170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:16:58,876-Speed 5727.82 samples/sec   Loss 9.2809   LearningRate 0.0782   Epoch: 2   Global Step: 13180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:00,663-Speed 5736.44 samples/sec   Loss 9.4861   LearningRate 0.0781   Epoch: 2   Global Step: 13190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:02,472-Speed 5662.59 samples/sec   Loss 9.0390   LearningRate 0.0781   Epoch: 2   Global Step: 13200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:04,255-Speed 5745.27 samples/sec   Loss 9.1706   LearningRate 0.0781   Epoch: 2   Global Step: 13210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:06,060-Speed 5676.23 samples/sec   Loss 9.3322   LearningRate 0.0781   Epoch: 2   Global Step: 13220   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:17:07,840-Speed 5755.00 samples/sec   Loss 9.3250   LearningRate 0.0781   Epoch: 2   Global Step: 13230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:09,654-Speed 5647.51 samples/sec   Loss 9.0229   LearningRate 0.0781   Epoch: 2   Global Step: 13240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:11,443-Speed 5727.21 samples/sec   Loss 9.3978   LearningRate 0.0781   Epoch: 2   Global Step: 13250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:13,255-Speed 5654.38 samples/sec   Loss 9.2418   LearningRate 0.0780   Epoch: 2   Global Step: 13260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:15,041-Speed 5734.44 samples/sec   Loss 9.3671   LearningRate 0.0780   Epoch: 2   Global Step: 13270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:16,854-Speed 5651.64 samples/sec   Loss 9.3084   LearningRate 0.0780   Epoch: 2   Global Step: 13280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:18,661-Speed 5673.09 samples/sec   Loss 9.1307   LearningRate 0.0780   Epoch: 2   Global Step: 13290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:20,476-Speed 5644.08 samples/sec   Loss 9.0579   LearningRate 0.0780   Epoch: 2   Global Step: 13300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:22,283-Speed 5669.70 samples/sec   Loss 9.2165   LearningRate 0.0780   Epoch: 2   Global Step: 13310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:24,103-Speed 5627.38 samples/sec   Loss 9.0822   LearningRate 0.0779   Epoch: 2   Global Step: 13320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:25,905-Speed 5688.35 samples/sec   Loss 9.2273   LearningRate 0.0779   Epoch: 2   Global Step: 13330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:27,704-Speed 5694.72 samples/sec   Loss 9.1711   LearningRate 0.0779   Epoch: 2   Global Step: 13340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:29,512-Speed 5666.64 samples/sec   Loss 9.2171   LearningRate 0.0779   Epoch: 2   Global Step: 13350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:31,314-Speed 5685.80 samples/sec   Loss 9.2651   LearningRate 0.0779   Epoch: 2   Global Step: 13360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:33,122-Speed 5666.24 samples/sec   Loss 9.2197   LearningRate 0.0779   Epoch: 2   Global Step: 13370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:34,927-Speed 5675.40 samples/sec   Loss 9.2391   LearningRate 0.0779   Epoch: 2   Global Step: 13380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:36,716-Speed 5728.60 samples/sec   Loss 9.0824   LearningRate 0.0778   Epoch: 2   Global Step: 13390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:38,540-Speed 5616.25 samples/sec   Loss 9.2153   LearningRate 0.0778   Epoch: 2   Global Step: 13400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:40,330-Speed 5725.94 samples/sec   Loss 9.1652   LearningRate 0.0778   Epoch: 2   Global Step: 13410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:42,138-Speed 5666.60 samples/sec   Loss 9.0979   LearningRate 0.0778   Epoch: 2   Global Step: 13420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:43,931-Speed 5713.59 samples/sec   Loss 9.1539   LearningRate 0.0778   Epoch: 2   Global Step: 13430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:45,724-Speed 5711.73 samples/sec   Loss 9.0078   LearningRate 0.0778   Epoch: 2   Global Step: 13440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:47,526-Speed 5686.92 samples/sec   Loss 9.3867   LearningRate 0.0777   Epoch: 2   Global Step: 13450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:49,325-Speed 5695.61 samples/sec   Loss 9.1632   LearningRate 0.0777   Epoch: 2   Global Step: 13460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:51,137-Speed 5653.24 samples/sec   Loss 9.1073   LearningRate 0.0777   Epoch: 2   Global Step: 13470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:52,982-Speed 5552.04 samples/sec   Loss 9.3281   LearningRate 0.0777   Epoch: 2   Global Step: 13480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:54,766-Speed 5744.63 samples/sec   Loss 9.1881   LearningRate 0.0777   Epoch: 2   Global Step: 13490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:56,567-Speed 5688.65 samples/sec   Loss 9.1681   LearningRate 0.0777   Epoch: 2   Global Step: 13500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:17:58,388-Speed 5626.12 samples/sec   Loss 9.1761   LearningRate 0.0777   Epoch: 2   Global Step: 13510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:00,213-Speed 5614.67 samples/sec   Loss 9.2880   LearningRate 0.0776   Epoch: 2   Global Step: 13520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:02,074-Speed 5505.18 samples/sec   Loss 9.1612   LearningRate 0.0776   Epoch: 2   Global Step: 13530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:03,929-Speed 5525.23 samples/sec   Loss 9.4146   LearningRate 0.0776   Epoch: 2   Global Step: 13540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:05,789-Speed 5507.58 samples/sec   Loss 9.0333   LearningRate 0.0776   Epoch: 2   Global Step: 13550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:07,620-Speed 5596.17 samples/sec   Loss 9.1600   LearningRate 0.0776   Epoch: 2   Global Step: 13560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:09,413-Speed 5713.05 samples/sec   Loss 9.1684   LearningRate 0.0776   Epoch: 2   Global Step: 13570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:11,199-Speed 5740.22 samples/sec   Loss 9.2245   LearningRate 0.0775   Epoch: 2   Global Step: 13580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:12,990-Speed 5719.55 samples/sec   Loss 9.3180   LearningRate 0.0775   Epoch: 2   Global Step: 13590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:14,782-Speed 5716.82 samples/sec   Loss 9.1406   LearningRate 0.0775   Epoch: 2   Global Step: 13600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:16,582-Speed 5692.50 samples/sec   Loss 9.1205   LearningRate 0.0775   Epoch: 2   Global Step: 13610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:18,388-Speed 5673.46 samples/sec   Loss 9.2289   LearningRate 0.0775   Epoch: 2   Global Step: 13620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:20,185-Speed 5698.92 samples/sec   Loss 9.2632   LearningRate 0.0775   Epoch: 2   Global Step: 13630   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:18:21,972-Speed 5735.19 samples/sec   Loss 9.1415   LearningRate 0.0774   Epoch: 2   Global Step: 13640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:23,776-Speed 5678.68 samples/sec   Loss 9.2121   LearningRate 0.0774   Epoch: 2   Global Step: 13650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:25,579-Speed 5684.14 samples/sec   Loss 9.1706   LearningRate 0.0774   Epoch: 2   Global Step: 13660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:27,410-Speed 5594.42 samples/sec   Loss 9.2011   LearningRate 0.0774   Epoch: 2   Global Step: 13670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:29,206-Speed 5704.44 samples/sec   Loss 9.2628   LearningRate 0.0774   Epoch: 2   Global Step: 13680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:31,014-Speed 5668.47 samples/sec   Loss 9.1812   LearningRate 0.0774   Epoch: 2   Global Step: 13690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:32,819-Speed 5675.61 samples/sec   Loss 8.9395   LearningRate 0.0774   Epoch: 2   Global Step: 13700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:34,608-Speed 5727.79 samples/sec   Loss 9.1245   LearningRate 0.0773   Epoch: 2   Global Step: 13710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:36,401-Speed 5713.19 samples/sec   Loss 9.1050   LearningRate 0.0773   Epoch: 2   Global Step: 13720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:38,189-Speed 5729.08 samples/sec   Loss 9.0837   LearningRate 0.0773   Epoch: 2   Global Step: 13730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:40,004-Speed 5646.64 samples/sec   Loss 9.1898   LearningRate 0.0773   Epoch: 2   Global Step: 13740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:41,804-Speed 5691.67 samples/sec   Loss 9.1063   LearningRate 0.0773   Epoch: 2   Global Step: 13750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:43,593-Speed 5724.46 samples/sec   Loss 9.1208   LearningRate 0.0773   Epoch: 2   Global Step: 13760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:45,415-Speed 5625.71 samples/sec   Loss 9.1145   LearningRate 0.0772   Epoch: 2   Global Step: 13770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:47,225-Speed 5660.41 samples/sec   Loss 9.0394   LearningRate 0.0772   Epoch: 2   Global Step: 13780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:49,047-Speed 5623.97 samples/sec   Loss 9.2858   LearningRate 0.0772   Epoch: 2   Global Step: 13790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:50,842-Speed 5707.05 samples/sec   Loss 9.1917   LearningRate 0.0772   Epoch: 2   Global Step: 13800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:52,631-Speed 5726.28 samples/sec   Loss 9.1132   LearningRate 0.0772   Epoch: 2   Global Step: 13810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:54,427-Speed 5705.48 samples/sec   Loss 9.1634   LearningRate 0.0772   Epoch: 2   Global Step: 13820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:56,233-Speed 5673.47 samples/sec   Loss 9.0479   LearningRate 0.0772   Epoch: 2   Global Step: 13830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:18:58,050-Speed 5637.02 samples/sec   Loss 9.0993   LearningRate 0.0771   Epoch: 2   Global Step: 13840   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:18:59,835-Speed 5741.19 samples/sec   Loss 9.1241   LearningRate 0.0771   Epoch: 2   Global Step: 13850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:01,637-Speed 5683.65 samples/sec   Loss 9.3036   LearningRate 0.0771   Epoch: 2   Global Step: 13860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:03,458-Speed 5625.34 samples/sec   Loss 9.0844   LearningRate 0.0771   Epoch: 2   Global Step: 13870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:05,287-Speed 5603.89 samples/sec   Loss 9.1528   LearningRate 0.0771   Epoch: 2   Global Step: 13880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:07,129-Speed 5562.22 samples/sec   Loss 9.0164   LearningRate 0.0771   Epoch: 2   Global Step: 13890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:08,937-Speed 5666.20 samples/sec   Loss 9.1222   LearningRate 0.0770   Epoch: 2   Global Step: 13900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:10,738-Speed 5690.45 samples/sec   Loss 9.1677   LearningRate 0.0770   Epoch: 2   Global Step: 13910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:12,536-Speed 5695.79 samples/sec   Loss 9.0835   LearningRate 0.0770   Epoch: 2   Global Step: 13920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:14,325-Speed 5727.56 samples/sec   Loss 9.2086   LearningRate 0.0770   Epoch: 2   Global Step: 13930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:16,117-Speed 5718.03 samples/sec   Loss 9.1785   LearningRate 0.0770   Epoch: 2   Global Step: 13940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:17,917-Speed 5693.42 samples/sec   Loss 9.0037   LearningRate 0.0770   Epoch: 2   Global Step: 13950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:19,710-Speed 5714.21 samples/sec   Loss 9.0941   LearningRate 0.0770   Epoch: 2   Global Step: 13960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:21,540-Speed 5596.76 samples/sec   Loss 9.0807   LearningRate 0.0769   Epoch: 2   Global Step: 13970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:23,328-Speed 5730.25 samples/sec   Loss 9.1427   LearningRate 0.0769   Epoch: 2   Global Step: 13980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:25,158-Speed 5599.97 samples/sec   Loss 9.1293   LearningRate 0.0769   Epoch: 2   Global Step: 13990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:26,987-Speed 5599.15 samples/sec   Loss 9.2193   LearningRate 0.0769   Epoch: 2   Global Step: 14000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:19:53,711-[lfw][14000]XNorm: 23.086993
Training: 2022-04-27 02:19:53,712-[lfw][14000]Accuracy-Flip: 0.99617+-0.00279
Training: 2022-04-27 02:19:53,713-[lfw][14000]Accuracy-Highest: 0.99617
Training: 2022-04-27 02:20:24,771-[cfp_fp][14000]XNorm: 19.993398
Training: 2022-04-27 02:20:24,772-[cfp_fp][14000]Accuracy-Flip: 0.92343+-0.01574
Training: 2022-04-27 02:20:24,773-[cfp_fp][14000]Accuracy-Highest: 0.92343
Training: 2022-04-27 02:20:51,551-[agedb_30][14000]XNorm: 22.814435
Training: 2022-04-27 02:20:51,552-[agedb_30][14000]Accuracy-Flip: 0.95567+-0.00978
Training: 2022-04-27 02:20:51,552-[agedb_30][14000]Accuracy-Highest: 0.95567
Training: 2022-04-27 02:20:53,408-Speed 118.49 samples/sec   Loss 9.1931   LearningRate 0.0769   Epoch: 2   Global Step: 14010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:20:55,226-Speed 5636.66 samples/sec   Loss 9.2419   LearningRate 0.0769   Epoch: 2   Global Step: 14020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:20:57,027-Speed 5686.30 samples/sec   Loss 9.1542   LearningRate 0.0768   Epoch: 2   Global Step: 14030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:20:58,812-Speed 5740.74 samples/sec   Loss 8.9942   LearningRate 0.0768   Epoch: 2   Global Step: 14040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:00,601-Speed 5728.66 samples/sec   Loss 9.1459   LearningRate 0.0768   Epoch: 2   Global Step: 14050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:02,413-Speed 5653.97 samples/sec   Loss 9.1195   LearningRate 0.0768   Epoch: 2   Global Step: 14060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:04,240-Speed 5608.74 samples/sec   Loss 9.1885   LearningRate 0.0768   Epoch: 2   Global Step: 14070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:06,046-Speed 5673.58 samples/sec   Loss 8.9563   LearningRate 0.0768   Epoch: 2   Global Step: 14080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:07,866-Speed 5628.08 samples/sec   Loss 8.9537   LearningRate 0.0768   Epoch: 2   Global Step: 14090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:09,697-Speed 5596.38 samples/sec   Loss 9.0961   LearningRate 0.0767   Epoch: 2   Global Step: 14100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:11,514-Speed 5638.66 samples/sec   Loss 8.9109   LearningRate 0.0767   Epoch: 2   Global Step: 14110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:13,307-Speed 5713.67 samples/sec   Loss 9.1050   LearningRate 0.0767   Epoch: 2   Global Step: 14120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:15,137-Speed 5595.62 samples/sec   Loss 9.1483   LearningRate 0.0767   Epoch: 2   Global Step: 14130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:16,941-Speed 5680.68 samples/sec   Loss 9.0742   LearningRate 0.0767   Epoch: 2   Global Step: 14140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:18,753-Speed 5653.13 samples/sec   Loss 9.1103   LearningRate 0.0767   Epoch: 2   Global Step: 14150   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:21:20,569-Speed 5644.23 samples/sec   Loss 9.0943   LearningRate 0.0766   Epoch: 2   Global Step: 14160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:22,363-Speed 5708.18 samples/sec   Loss 9.0681   LearningRate 0.0766   Epoch: 2   Global Step: 14170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:24,165-Speed 5687.28 samples/sec   Loss 9.0846   LearningRate 0.0766   Epoch: 2   Global Step: 14180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:25,965-Speed 5689.90 samples/sec   Loss 9.0874   LearningRate 0.0766   Epoch: 2   Global Step: 14190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:27,786-Speed 5626.64 samples/sec   Loss 9.0367   LearningRate 0.0766   Epoch: 2   Global Step: 14200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:29,602-Speed 5643.48 samples/sec   Loss 9.2168   LearningRate 0.0766   Epoch: 2   Global Step: 14210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:31,413-Speed 5655.98 samples/sec   Loss 9.1119   LearningRate 0.0766   Epoch: 2   Global Step: 14220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:33,217-Speed 5679.86 samples/sec   Loss 9.0661   LearningRate 0.0765   Epoch: 2   Global Step: 14230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:35,005-Speed 5728.95 samples/sec   Loss 9.0163   LearningRate 0.0765   Epoch: 2   Global Step: 14240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:21:36,811-Speed 5671.89 samples/sec   Loss 9.0421   LearningRate 0.0765   Epoch: 2   Global Step: 14250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:21:38,608-Speed 5703.13 samples/sec   Loss 8.9782   LearningRate 0.0765   Epoch: 2   Global Step: 14260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:21:40,407-Speed 5694.45 samples/sec   Loss 9.0659   LearningRate 0.0765   Epoch: 2   Global Step: 14270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:21:42,201-Speed 5711.55 samples/sec   Loss 9.0478   LearningRate 0.0765   Epoch: 2   Global Step: 14280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:21:44,109-Speed 5368.71 samples/sec   Loss 8.8857   LearningRate 0.0764   Epoch: 2   Global Step: 14290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:21:45,927-Speed 5633.99 samples/sec   Loss 9.1045   LearningRate 0.0764   Epoch: 2   Global Step: 14300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:21:47,739-Speed 5654.98 samples/sec   Loss 8.9992   LearningRate 0.0764   Epoch: 2   Global Step: 14310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:21:49,551-Speed 5654.82 samples/sec   Loss 9.0477   LearningRate 0.0764   Epoch: 2   Global Step: 14320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:21:51,373-Speed 5619.98 samples/sec   Loss 9.1369   LearningRate 0.0764   Epoch: 2   Global Step: 14330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:21:53,180-Speed 5671.73 samples/sec   Loss 9.1065   LearningRate 0.0764   Epoch: 2   Global Step: 14340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:21:54,986-Speed 5671.68 samples/sec   Loss 8.9842   LearningRate 0.0764   Epoch: 2   Global Step: 14350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:21:56,797-Speed 5655.53 samples/sec   Loss 9.0656   LearningRate 0.0763   Epoch: 2   Global Step: 14360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:21:58,598-Speed 5688.39 samples/sec   Loss 9.0917   LearningRate 0.0763   Epoch: 2   Global Step: 14370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:22:00,416-Speed 5633.96 samples/sec   Loss 9.1244   LearningRate 0.0763   Epoch: 2   Global Step: 14380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:22:02,250-Speed 5585.61 samples/sec   Loss 8.9260   LearningRate 0.0763   Epoch: 2   Global Step: 14390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:22:04,047-Speed 5703.30 samples/sec   Loss 8.9894   LearningRate 0.0763   Epoch: 2   Global Step: 14400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:22:05,832-Speed 5738.13 samples/sec   Loss 8.9637   LearningRate 0.0763   Epoch: 2   Global Step: 14410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:22:07,649-Speed 5637.49 samples/sec   Loss 9.0977   LearningRate 0.0762   Epoch: 2   Global Step: 14420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:22:09,454-Speed 5676.58 samples/sec   Loss 8.9268   LearningRate 0.0762   Epoch: 2   Global Step: 14430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:22:11,263-Speed 5662.47 samples/sec   Loss 9.0390   LearningRate 0.0762   Epoch: 2   Global Step: 14440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:22:13,064-Speed 5688.00 samples/sec   Loss 9.0543   LearningRate 0.0762   Epoch: 2   Global Step: 14450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:14,888-Speed 5617.97 samples/sec   Loss 9.0279   LearningRate 0.0762   Epoch: 2   Global Step: 14460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:16,699-Speed 5657.46 samples/sec   Loss 9.0128   LearningRate 0.0762   Epoch: 2   Global Step: 14470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:18,484-Speed 5737.78 samples/sec   Loss 9.0712   LearningRate 0.0762   Epoch: 2   Global Step: 14480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:20,278-Speed 5710.76 samples/sec   Loss 8.9404   LearningRate 0.0761   Epoch: 2   Global Step: 14490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:22,072-Speed 5709.40 samples/sec   Loss 8.9340   LearningRate 0.0761   Epoch: 2   Global Step: 14500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:23,877-Speed 5675.83 samples/sec   Loss 8.9015   LearningRate 0.0761   Epoch: 2   Global Step: 14510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:25,663-Speed 5735.99 samples/sec   Loss 9.1146   LearningRate 0.0761   Epoch: 2   Global Step: 14520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:27,454-Speed 5719.00 samples/sec   Loss 9.1301   LearningRate 0.0761   Epoch: 2   Global Step: 14530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:29,302-Speed 5544.16 samples/sec   Loss 9.0405   LearningRate 0.0761   Epoch: 2   Global Step: 14540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:31,161-Speed 5511.15 samples/sec   Loss 8.9402   LearningRate 0.0760   Epoch: 2   Global Step: 14550   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:22:33,004-Speed 5558.98 samples/sec   Loss 9.0924   LearningRate 0.0760   Epoch: 2   Global Step: 14560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:34,873-Speed 5481.55 samples/sec   Loss 9.0720   LearningRate 0.0760   Epoch: 2   Global Step: 14570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:36,741-Speed 5482.05 samples/sec   Loss 9.0840   LearningRate 0.0760   Epoch: 2   Global Step: 14580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:38,557-Speed 5640.38 samples/sec   Loss 8.9527   LearningRate 0.0760   Epoch: 2   Global Step: 14590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:40,371-Speed 5650.85 samples/sec   Loss 8.9491   LearningRate 0.0760   Epoch: 2   Global Step: 14600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:42,184-Speed 5650.93 samples/sec   Loss 8.9197   LearningRate 0.0760   Epoch: 2   Global Step: 14610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:43,991-Speed 5667.80 samples/sec   Loss 9.2807   LearningRate 0.0759   Epoch: 2   Global Step: 14620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:45,794-Speed 5684.79 samples/sec   Loss 8.9509   LearningRate 0.0759   Epoch: 2   Global Step: 14630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:47,592-Speed 5696.64 samples/sec   Loss 8.9750   LearningRate 0.0759   Epoch: 2   Global Step: 14640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:49,408-Speed 5641.42 samples/sec   Loss 9.1653   LearningRate 0.0759   Epoch: 2   Global Step: 14650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:51,209-Speed 5690.26 samples/sec   Loss 8.8854   LearningRate 0.0759   Epoch: 2   Global Step: 14660   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:22:53,003-Speed 5709.86 samples/sec   Loss 8.9620   LearningRate 0.0759   Epoch: 2   Global Step: 14670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:54,808-Speed 5674.22 samples/sec   Loss 8.9320   LearningRate 0.0758   Epoch: 2   Global Step: 14680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:56,673-Speed 5492.20 samples/sec   Loss 9.0549   LearningRate 0.0758   Epoch: 2   Global Step: 14690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:22:58,470-Speed 5700.46 samples/sec   Loss 9.0493   LearningRate 0.0758   Epoch: 2   Global Step: 14700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:00,265-Speed 5714.22 samples/sec   Loss 8.9852   LearningRate 0.0758   Epoch: 2   Global Step: 14710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:02,057-Speed 5714.41 samples/sec   Loss 8.9567   LearningRate 0.0758   Epoch: 2   Global Step: 14720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:03,888-Speed 5597.10 samples/sec   Loss 8.8972   LearningRate 0.0758   Epoch: 2   Global Step: 14730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:05,694-Speed 5671.60 samples/sec   Loss 8.9602   LearningRate 0.0758   Epoch: 2   Global Step: 14740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:07,499-Speed 5675.51 samples/sec   Loss 9.0774   LearningRate 0.0757   Epoch: 2   Global Step: 14750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:09,289-Speed 5724.11 samples/sec   Loss 8.8663   LearningRate 0.0757   Epoch: 2   Global Step: 14760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:11,085-Speed 5706.56 samples/sec   Loss 8.8297   LearningRate 0.0757   Epoch: 2   Global Step: 14770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:12,885-Speed 5688.64 samples/sec   Loss 9.1735   LearningRate 0.0757   Epoch: 2   Global Step: 14780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:14,701-Speed 5640.10 samples/sec   Loss 8.9926   LearningRate 0.0757   Epoch: 2   Global Step: 14790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:16,510-Speed 5667.41 samples/sec   Loss 8.9586   LearningRate 0.0757   Epoch: 2   Global Step: 14800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:23:18,303-Speed 5711.57 samples/sec   Loss 9.0358   LearningRate 0.0756   Epoch: 2   Global Step: 14810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:23:20,161-Speed 5514.13 samples/sec   Loss 8.9689   LearningRate 0.0756   Epoch: 2   Global Step: 14820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:23:21,983-Speed 5623.63 samples/sec   Loss 8.9104   LearningRate 0.0756   Epoch: 2   Global Step: 14830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:23:23,797-Speed 5653.50 samples/sec   Loss 8.9449   LearningRate 0.0756   Epoch: 2   Global Step: 14840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:23:25,623-Speed 5610.72 samples/sec   Loss 9.0092   LearningRate 0.0756   Epoch: 2   Global Step: 14850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:23:27,422-Speed 5691.87 samples/sec   Loss 8.9656   LearningRate 0.0756   Epoch: 2   Global Step: 14860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:23:29,228-Speed 5673.85 samples/sec   Loss 9.0277   LearningRate 0.0756   Epoch: 2   Global Step: 14870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:23:31,030-Speed 5685.46 samples/sec   Loss 9.1175   LearningRate 0.0755   Epoch: 2   Global Step: 14880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:23:32,820-Speed 5722.44 samples/sec   Loss 8.9195   LearningRate 0.0755   Epoch: 2   Global Step: 14890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:34,625-Speed 5678.80 samples/sec   Loss 8.9811   LearningRate 0.0755   Epoch: 2   Global Step: 14900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:36,487-Speed 5502.84 samples/sec   Loss 8.7294   LearningRate 0.0755   Epoch: 2   Global Step: 14910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:38,296-Speed 5662.58 samples/sec   Loss 8.9314   LearningRate 0.0755   Epoch: 2   Global Step: 14920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:40,088-Speed 5716.25 samples/sec   Loss 9.1658   LearningRate 0.0755   Epoch: 2   Global Step: 14930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:41,898-Speed 5661.44 samples/sec   Loss 8.9766   LearningRate 0.0755   Epoch: 2   Global Step: 14940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:43,699-Speed 5688.82 samples/sec   Loss 8.9697   LearningRate 0.0754   Epoch: 2   Global Step: 14950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:45,497-Speed 5697.73 samples/sec   Loss 8.8811   LearningRate 0.0754   Epoch: 2   Global Step: 14960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:47,311-Speed 5646.40 samples/sec   Loss 9.2026   LearningRate 0.0754   Epoch: 2   Global Step: 14970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:49,142-Speed 5596.70 samples/sec   Loss 8.8922   LearningRate 0.0754   Epoch: 2   Global Step: 14980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:23:50,953-Speed 5654.40 samples/sec   Loss 8.8725   LearningRate 0.0754   Epoch: 2   Global Step: 14990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:23:52,752-Speed 5696.68 samples/sec   Loss 8.8033   LearningRate 0.0754   Epoch: 2   Global Step: 15000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:23:54,552-Speed 5690.58 samples/sec   Loss 9.0445   LearningRate 0.0753   Epoch: 2   Global Step: 15010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:23:56,358-Speed 5673.74 samples/sec   Loss 9.0373   LearningRate 0.0753   Epoch: 2   Global Step: 15020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:23:58,200-Speed 5561.54 samples/sec   Loss 8.9763   LearningRate 0.0753   Epoch: 2   Global Step: 15030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:23:59,992-Speed 5716.92 samples/sec   Loss 8.8499   LearningRate 0.0753   Epoch: 2   Global Step: 15040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:01,789-Speed 5702.72 samples/sec   Loss 9.1001   LearningRate 0.0753   Epoch: 2   Global Step: 15050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:03,609-Speed 5628.08 samples/sec   Loss 9.0090   LearningRate 0.0753   Epoch: 2   Global Step: 15060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:05,435-Speed 5611.41 samples/sec   Loss 8.9696   LearningRate 0.0753   Epoch: 2   Global Step: 15070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:07,237-Speed 5682.72 samples/sec   Loss 8.9585   LearningRate 0.0752   Epoch: 2   Global Step: 15080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:09,040-Speed 5683.87 samples/sec   Loss 8.9354   LearningRate 0.0752   Epoch: 2   Global Step: 15090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:10,847-Speed 5669.15 samples/sec   Loss 8.7362   LearningRate 0.0752   Epoch: 2   Global Step: 15100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:12,651-Speed 5678.00 samples/sec   Loss 8.8719   LearningRate 0.0752   Epoch: 2   Global Step: 15110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:14,440-Speed 5729.87 samples/sec   Loss 8.9343   LearningRate 0.0752   Epoch: 2   Global Step: 15120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:16,246-Speed 5671.44 samples/sec   Loss 9.0024   LearningRate 0.0752   Epoch: 2   Global Step: 15130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:18,046-Speed 5692.40 samples/sec   Loss 8.8692   LearningRate 0.0751   Epoch: 2   Global Step: 15140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:19,834-Speed 5729.90 samples/sec   Loss 9.0175   LearningRate 0.0751   Epoch: 2   Global Step: 15150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:21,617-Speed 5744.14 samples/sec   Loss 9.0044   LearningRate 0.0751   Epoch: 2   Global Step: 15160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:23,421-Speed 5679.70 samples/sec   Loss 8.9130   LearningRate 0.0751   Epoch: 2   Global Step: 15170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:25,206-Speed 5741.26 samples/sec   Loss 8.8206   LearningRate 0.0751   Epoch: 2   Global Step: 15180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:26,987-Speed 5752.03 samples/sec   Loss 9.1444   LearningRate 0.0751   Epoch: 2   Global Step: 15190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:28,786-Speed 5695.15 samples/sec   Loss 8.9889   LearningRate 0.0751   Epoch: 2   Global Step: 15200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:24:30,584-Speed 5696.91 samples/sec   Loss 8.9567   LearningRate 0.0750   Epoch: 2   Global Step: 15210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:24:32,413-Speed 5602.27 samples/sec   Loss 8.9057   LearningRate 0.0750   Epoch: 2   Global Step: 15220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:24:34,200-Speed 5730.56 samples/sec   Loss 8.8341   LearningRate 0.0750   Epoch: 2   Global Step: 15230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:24:36,011-Speed 5658.78 samples/sec   Loss 8.9357   LearningRate 0.0750   Epoch: 2   Global Step: 15240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:24:37,841-Speed 5598.82 samples/sec   Loss 9.0280   LearningRate 0.0750   Epoch: 2   Global Step: 15250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:24:39,650-Speed 5664.18 samples/sec   Loss 8.8969   LearningRate 0.0750   Epoch: 2   Global Step: 15260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:24:41,449-Speed 5694.81 samples/sec   Loss 8.8034   LearningRate 0.0749   Epoch: 2   Global Step: 15270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:24:43,240-Speed 5720.05 samples/sec   Loss 9.0282   LearningRate 0.0749   Epoch: 2   Global Step: 15280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:24:45,063-Speed 5619.82 samples/sec   Loss 8.9256   LearningRate 0.0749   Epoch: 2   Global Step: 15290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:24:46,913-Speed 5539.76 samples/sec   Loss 8.8917   LearningRate 0.0749   Epoch: 2   Global Step: 15300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:48,710-Speed 5698.22 samples/sec   Loss 8.8706   LearningRate 0.0749   Epoch: 2   Global Step: 15310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:50,510-Speed 5695.44 samples/sec   Loss 8.9793   LearningRate 0.0749   Epoch: 2   Global Step: 15320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:52,306-Speed 5701.84 samples/sec   Loss 8.8632   LearningRate 0.0749   Epoch: 2   Global Step: 15330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:54,110-Speed 5679.59 samples/sec   Loss 8.9912   LearningRate 0.0748   Epoch: 2   Global Step: 15340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:55,919-Speed 5665.63 samples/sec   Loss 8.8514   LearningRate 0.0748   Epoch: 2   Global Step: 15350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:57,715-Speed 5701.95 samples/sec   Loss 8.7462   LearningRate 0.0748   Epoch: 2   Global Step: 15360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:24:59,521-Speed 5673.78 samples/sec   Loss 9.0112   LearningRate 0.0748   Epoch: 2   Global Step: 15370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:01,331-Speed 5659.96 samples/sec   Loss 8.9611   LearningRate 0.0748   Epoch: 2   Global Step: 15380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:03,138-Speed 5670.23 samples/sec   Loss 8.8375   LearningRate 0.0748   Epoch: 2   Global Step: 15390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:04,933-Speed 5707.20 samples/sec   Loss 8.9329   LearningRate 0.0747   Epoch: 2   Global Step: 15400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:06,727-Speed 5710.43 samples/sec   Loss 8.9362   LearningRate 0.0747   Epoch: 2   Global Step: 15410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:08,525-Speed 5696.61 samples/sec   Loss 8.9252   LearningRate 0.0747   Epoch: 2   Global Step: 15420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:10,343-Speed 5637.35 samples/sec   Loss 8.7280   LearningRate 0.0747   Epoch: 2   Global Step: 15430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:12,149-Speed 5673.92 samples/sec   Loss 8.8540   LearningRate 0.0747   Epoch: 2   Global Step: 15440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:13,951-Speed 5685.44 samples/sec   Loss 8.7979   LearningRate 0.0747   Epoch: 2   Global Step: 15450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:15,753-Speed 5685.23 samples/sec   Loss 8.8931   LearningRate 0.0747   Epoch: 2   Global Step: 15460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:17,553-Speed 5691.69 samples/sec   Loss 8.9926   LearningRate 0.0746   Epoch: 2   Global Step: 15470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:19,363-Speed 5659.84 samples/sec   Loss 8.7627   LearningRate 0.0746   Epoch: 2   Global Step: 15480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:21,182-Speed 5634.01 samples/sec   Loss 8.9060   LearningRate 0.0746   Epoch: 2   Global Step: 15490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:22,967-Speed 5740.65 samples/sec   Loss 9.0690   LearningRate 0.0746   Epoch: 2   Global Step: 15500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:24,759-Speed 5716.83 samples/sec   Loss 8.9283   LearningRate 0.0746   Epoch: 2   Global Step: 15510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:26,557-Speed 5697.25 samples/sec   Loss 9.0108   LearningRate 0.0746   Epoch: 2   Global Step: 15520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:28,371-Speed 5646.83 samples/sec   Loss 8.8189   LearningRate 0.0746   Epoch: 2   Global Step: 15530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:30,172-Speed 5690.39 samples/sec   Loss 8.8061   LearningRate 0.0745   Epoch: 2   Global Step: 15540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:31,991-Speed 5632.58 samples/sec   Loss 9.0280   LearningRate 0.0745   Epoch: 2   Global Step: 15550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:33,788-Speed 5700.75 samples/sec   Loss 8.9203   LearningRate 0.0745   Epoch: 2   Global Step: 15560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:35,591-Speed 5681.13 samples/sec   Loss 8.8206   LearningRate 0.0745   Epoch: 2   Global Step: 15570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:37,398-Speed 5673.18 samples/sec   Loss 8.8432   LearningRate 0.0745   Epoch: 2   Global Step: 15580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:39,201-Speed 5679.75 samples/sec   Loss 8.8831   LearningRate 0.0745   Epoch: 2   Global Step: 15590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:41,050-Speed 5543.76 samples/sec   Loss 8.9159   LearningRate 0.0744   Epoch: 2   Global Step: 15600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:42,911-Speed 5509.12 samples/sec   Loss 8.9244   LearningRate 0.0744   Epoch: 2   Global Step: 15610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:44,771-Speed 5505.29 samples/sec   Loss 8.7896   LearningRate 0.0744   Epoch: 2   Global Step: 15620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:46,616-Speed 5558.06 samples/sec   Loss 8.9032   LearningRate 0.0744   Epoch: 2   Global Step: 15630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:48,417-Speed 5687.42 samples/sec   Loss 8.7593   LearningRate 0.0744   Epoch: 2   Global Step: 15640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:50,222-Speed 5676.66 samples/sec   Loss 8.6884   LearningRate 0.0744   Epoch: 2   Global Step: 15650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:52,021-Speed 5695.54 samples/sec   Loss 8.7331   LearningRate 0.0744   Epoch: 2   Global Step: 15660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:53,838-Speed 5640.34 samples/sec   Loss 8.8274   LearningRate 0.0743   Epoch: 2   Global Step: 15670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:55,638-Speed 5692.87 samples/sec   Loss 8.8343   LearningRate 0.0743   Epoch: 2   Global Step: 15680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:57,430-Speed 5719.37 samples/sec   Loss 8.9197   LearningRate 0.0743   Epoch: 2   Global Step: 15690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:25:59,239-Speed 5660.55 samples/sec   Loss 8.9241   LearningRate 0.0743   Epoch: 2   Global Step: 15700   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:26:01,033-Speed 5713.17 samples/sec   Loss 8.8476   LearningRate 0.0743   Epoch: 2   Global Step: 15710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:02,839-Speed 5674.95 samples/sec   Loss 8.9150   LearningRate 0.0743   Epoch: 2   Global Step: 15720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:04,646-Speed 5672.17 samples/sec   Loss 8.7551   LearningRate 0.0742   Epoch: 2   Global Step: 15730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:06,478-Speed 5594.48 samples/sec   Loss 8.7329   LearningRate 0.0742   Epoch: 2   Global Step: 15740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:08,307-Speed 5599.98 samples/sec   Loss 8.8935   LearningRate 0.0742   Epoch: 2   Global Step: 15750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:10,099-Speed 5718.36 samples/sec   Loss 8.8623   LearningRate 0.0742   Epoch: 2   Global Step: 15760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:11,897-Speed 5695.06 samples/sec   Loss 8.7388   LearningRate 0.0742   Epoch: 2   Global Step: 15770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:13,701-Speed 5678.67 samples/sec   Loss 8.8885   LearningRate 0.0742   Epoch: 2   Global Step: 15780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:15,506-Speed 5677.32 samples/sec   Loss 8.6891   LearningRate 0.0742   Epoch: 2   Global Step: 15790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:17,311-Speed 5678.26 samples/sec   Loss 8.7881   LearningRate 0.0741   Epoch: 2   Global Step: 15800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:19,094-Speed 5745.98 samples/sec   Loss 8.8302   LearningRate 0.0741   Epoch: 2   Global Step: 15810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:20,896-Speed 5682.57 samples/sec   Loss 8.8551   LearningRate 0.0741   Epoch: 2   Global Step: 15820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:22,693-Speed 5702.78 samples/sec   Loss 8.8676   LearningRate 0.0741   Epoch: 2   Global Step: 15830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:24,483-Speed 5724.83 samples/sec   Loss 8.9331   LearningRate 0.0741   Epoch: 2   Global Step: 15840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:26,278-Speed 5706.79 samples/sec   Loss 8.8677   LearningRate 0.0741   Epoch: 2   Global Step: 15850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:28,087-Speed 5663.36 samples/sec   Loss 8.8749   LearningRate 0.0741   Epoch: 2   Global Step: 15860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:29,896-Speed 5661.34 samples/sec   Loss 8.8466   LearningRate 0.0740   Epoch: 2   Global Step: 15870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:31,695-Speed 5696.35 samples/sec   Loss 8.9505   LearningRate 0.0740   Epoch: 2   Global Step: 15880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:33,500-Speed 5675.83 samples/sec   Loss 8.8544   LearningRate 0.0740   Epoch: 2   Global Step: 15890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:35,324-Speed 5615.29 samples/sec   Loss 8.8109   LearningRate 0.0740   Epoch: 2   Global Step: 15900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:37,118-Speed 5713.09 samples/sec   Loss 8.7582   LearningRate 0.0740   Epoch: 2   Global Step: 15910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:38,971-Speed 5526.85 samples/sec   Loss 8.5775   LearningRate 0.0740   Epoch: 2   Global Step: 15920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:40,779-Speed 5668.63 samples/sec   Loss 8.6625   LearningRate 0.0739   Epoch: 2   Global Step: 15930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:42,594-Speed 5643.47 samples/sec   Loss 8.6943   LearningRate 0.0739   Epoch: 2   Global Step: 15940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:44,403-Speed 5664.81 samples/sec   Loss 8.7998   LearningRate 0.0739   Epoch: 2   Global Step: 15950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:46,200-Speed 5699.62 samples/sec   Loss 8.7113   LearningRate 0.0739   Epoch: 2   Global Step: 15960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:48,009-Speed 5665.17 samples/sec   Loss 8.7996   LearningRate 0.0739   Epoch: 2   Global Step: 15970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:49,810-Speed 5688.01 samples/sec   Loss 8.7586   LearningRate 0.0739   Epoch: 2   Global Step: 15980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:51,604-Speed 5714.31 samples/sec   Loss 8.8093   LearningRate 0.0739   Epoch: 2   Global Step: 15990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:26:53,400-Speed 5703.64 samples/sec   Loss 8.7692   LearningRate 0.0738   Epoch: 2   Global Step: 16000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:27:20,430-[lfw][16000]XNorm: 21.846058
Training: 2022-04-27 02:27:20,431-[lfw][16000]Accuracy-Flip: 0.99500+-0.00342
Training: 2022-04-27 02:27:20,432-[lfw][16000]Accuracy-Highest: 0.99617
Training: 2022-04-27 02:27:51,431-[cfp_fp][16000]XNorm: 18.664564
Training: 2022-04-27 02:27:51,432-[cfp_fp][16000]Accuracy-Flip: 0.92200+-0.01685
Training: 2022-04-27 02:27:51,433-[cfp_fp][16000]Accuracy-Highest: 0.92343
Training: 2022-04-27 02:28:17,981-[agedb_30][16000]XNorm: 21.657405
Training: 2022-04-27 02:28:17,981-[agedb_30][16000]Accuracy-Flip: 0.96383+-0.00997
Training: 2022-04-27 02:28:17,982-[agedb_30][16000]Accuracy-Highest: 0.96383
Training: 2022-04-27 02:28:19,783-Speed 118.54 samples/sec   Loss 8.8321   LearningRate 0.0738   Epoch: 2   Global Step: 16010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:28:21,596-Speed 5652.78 samples/sec   Loss 8.9230   LearningRate 0.0738   Epoch: 2   Global Step: 16020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:28:23,383-Speed 5733.09 samples/sec   Loss 8.8022   LearningRate 0.0738   Epoch: 2   Global Step: 16030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:28:25,185-Speed 5686.92 samples/sec   Loss 8.7705   LearningRate 0.0738   Epoch: 2   Global Step: 16040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:28:27,024-Speed 5570.84 samples/sec   Loss 8.6551   LearningRate 0.0738   Epoch: 2   Global Step: 16050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:28:28,878-Speed 5525.37 samples/sec   Loss 8.6625   LearningRate 0.0737   Epoch: 2   Global Step: 16060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:28:30,697-Speed 5634.43 samples/sec   Loss 8.5044   LearningRate 0.0737   Epoch: 2   Global Step: 16070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:28:32,531-Speed 5585.52 samples/sec   Loss 8.7866   LearningRate 0.0737   Epoch: 2   Global Step: 16080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:28:34,329-Speed 5696.70 samples/sec   Loss 8.7948   LearningRate 0.0737   Epoch: 2   Global Step: 16090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:28:36,119-Speed 5722.82 samples/sec   Loss 8.9348   LearningRate 0.0737   Epoch: 2   Global Step: 16100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:28:37,917-Speed 5699.16 samples/sec   Loss 8.8102   LearningRate 0.0737   Epoch: 2   Global Step: 16110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:28:39,719-Speed 5684.91 samples/sec   Loss 8.7353   LearningRate 0.0737   Epoch: 2   Global Step: 16120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:28:41,520-Speed 5690.57 samples/sec   Loss 8.8066   LearningRate 0.0736   Epoch: 2   Global Step: 16130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:28:43,315-Speed 5706.69 samples/sec   Loss 8.5475   LearningRate 0.0736   Epoch: 2   Global Step: 16140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:28:45,119-Speed 5678.48 samples/sec   Loss 8.8323   LearningRate 0.0736   Epoch: 2   Global Step: 16150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:28:46,929-Speed 5661.38 samples/sec   Loss 8.8680   LearningRate 0.0736   Epoch: 2   Global Step: 16160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:28:48,751-Speed 5622.05 samples/sec   Loss 8.8023   LearningRate 0.0736   Epoch: 2   Global Step: 16170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:28:50,555-Speed 5680.82 samples/sec   Loss 8.7706   LearningRate 0.0736   Epoch: 2   Global Step: 16180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:28:52,371-Speed 5639.58 samples/sec   Loss 8.8292   LearningRate 0.0736   Epoch: 2   Global Step: 16190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:28:54,168-Speed 5703.83 samples/sec   Loss 8.8139   LearningRate 0.0735   Epoch: 2   Global Step: 16200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:28:55,993-Speed 5611.17 samples/sec   Loss 8.8175   LearningRate 0.0735   Epoch: 2   Global Step: 16210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:28:57,801-Speed 5670.37 samples/sec   Loss 8.6322   LearningRate 0.0735   Epoch: 2   Global Step: 16220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:28:59,622-Speed 5624.00 samples/sec   Loss 8.6983   LearningRate 0.0735   Epoch: 2   Global Step: 16230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:29:01,467-Speed 5553.81 samples/sec   Loss 8.7208   LearningRate 0.0735   Epoch: 2   Global Step: 16240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:29:03,277-Speed 5661.34 samples/sec   Loss 8.7706   LearningRate 0.0735   Epoch: 2   Global Step: 16250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:05,089-Speed 5652.33 samples/sec   Loss 8.7293   LearningRate 0.0734   Epoch: 2   Global Step: 16260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:06,901-Speed 5656.85 samples/sec   Loss 8.6218   LearningRate 0.0734   Epoch: 2   Global Step: 16270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:08,695-Speed 5710.09 samples/sec   Loss 8.6388   LearningRate 0.0734   Epoch: 2   Global Step: 16280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:10,519-Speed 5617.80 samples/sec   Loss 8.8273   LearningRate 0.0734   Epoch: 2   Global Step: 16290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:12,315-Speed 5701.22 samples/sec   Loss 8.7567   LearningRate 0.0734   Epoch: 2   Global Step: 16300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:14,107-Speed 5719.97 samples/sec   Loss 8.6831   LearningRate 0.0734   Epoch: 2   Global Step: 16310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:15,909-Speed 5684.72 samples/sec   Loss 8.7244   LearningRate 0.0734   Epoch: 2   Global Step: 16320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:17,751-Speed 5562.32 samples/sec   Loss 8.7272   LearningRate 0.0733   Epoch: 2   Global Step: 16330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:19,585-Speed 5586.51 samples/sec   Loss 8.8356   LearningRate 0.0733   Epoch: 2   Global Step: 16340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:21,375-Speed 5722.58 samples/sec   Loss 8.8273   LearningRate 0.0733   Epoch: 2   Global Step: 16350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:23,258-Speed 5442.07 samples/sec   Loss 8.8067   LearningRate 0.0733   Epoch: 2   Global Step: 16360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:25,061-Speed 5681.87 samples/sec   Loss 8.8323   LearningRate 0.0733   Epoch: 2   Global Step: 16370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:26,884-Speed 5618.65 samples/sec   Loss 8.7895   LearningRate 0.0733   Epoch: 2   Global Step: 16380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:28,684-Speed 5691.25 samples/sec   Loss 8.6869   LearningRate 0.0733   Epoch: 2   Global Step: 16390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:30,507-Speed 5622.61 samples/sec   Loss 8.7197   LearningRate 0.0732   Epoch: 2   Global Step: 16400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:32,319-Speed 5652.24 samples/sec   Loss 8.5760   LearningRate 0.0732   Epoch: 2   Global Step: 16410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:34,112-Speed 5712.26 samples/sec   Loss 8.8136   LearningRate 0.0732   Epoch: 2   Global Step: 16420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:35,900-Speed 5729.17 samples/sec   Loss 8.5345   LearningRate 0.0732   Epoch: 2   Global Step: 16430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:37,715-Speed 5646.31 samples/sec   Loss 8.7178   LearningRate 0.0732   Epoch: 2   Global Step: 16440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:39,503-Speed 5729.06 samples/sec   Loss 8.5687   LearningRate 0.0732   Epoch: 2   Global Step: 16450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:41,332-Speed 5600.54 samples/sec   Loss 8.8896   LearningRate 0.0731   Epoch: 2   Global Step: 16460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:43,141-Speed 5664.07 samples/sec   Loss 8.7462   LearningRate 0.0731   Epoch: 2   Global Step: 16470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:44,941-Speed 5693.70 samples/sec   Loss 8.5461   LearningRate 0.0731   Epoch: 2   Global Step: 16480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:46,759-Speed 5635.54 samples/sec   Loss 8.8619   LearningRate 0.0731   Epoch: 2   Global Step: 16490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:48,561-Speed 5685.42 samples/sec   Loss 8.6666   LearningRate 0.0731   Epoch: 2   Global Step: 16500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:50,367-Speed 5672.39 samples/sec   Loss 8.7250   LearningRate 0.0731   Epoch: 2   Global Step: 16510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:52,194-Speed 5609.64 samples/sec   Loss 8.6230   LearningRate 0.0731   Epoch: 2   Global Step: 16520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:53,989-Speed 5705.28 samples/sec   Loss 8.6906   LearningRate 0.0730   Epoch: 2   Global Step: 16530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:55,807-Speed 5641.17 samples/sec   Loss 8.7636   LearningRate 0.0730   Epoch: 2   Global Step: 16540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:57,607-Speed 5690.68 samples/sec   Loss 8.7569   LearningRate 0.0730   Epoch: 2   Global Step: 16550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:29:59,411-Speed 5680.09 samples/sec   Loss 8.6869   LearningRate 0.0730   Epoch: 2   Global Step: 16560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:01,204-Speed 5714.24 samples/sec   Loss 8.5304   LearningRate 0.0730   Epoch: 2   Global Step: 16570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:03,024-Speed 5626.68 samples/sec   Loss 8.7424   LearningRate 0.0730   Epoch: 2   Global Step: 16580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:04,823-Speed 5696.78 samples/sec   Loss 8.6832   LearningRate 0.0730   Epoch: 2   Global Step: 16590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:06,629-Speed 5673.35 samples/sec   Loss 8.6407   LearningRate 0.0729   Epoch: 2   Global Step: 16600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:08,453-Speed 5618.68 samples/sec   Loss 8.6473   LearningRate 0.0729   Epoch: 2   Global Step: 16610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:10,266-Speed 5649.67 samples/sec   Loss 8.5312   LearningRate 0.0729   Epoch: 2   Global Step: 16620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:12,105-Speed 5570.59 samples/sec   Loss 8.6958   LearningRate 0.0729   Epoch: 2   Global Step: 16630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:13,912-Speed 5671.96 samples/sec   Loss 8.5922   LearningRate 0.0729   Epoch: 2   Global Step: 16640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:15,707-Speed 5705.36 samples/sec   Loss 8.8744   LearningRate 0.0729   Epoch: 2   Global Step: 16650   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:30:17,538-Speed 5597.40 samples/sec   Loss 8.6529   LearningRate 0.0728   Epoch: 2   Global Step: 16660   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:30:19,331-Speed 5712.47 samples/sec   Loss 8.6671   LearningRate 0.0728   Epoch: 2   Global Step: 16670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:21,145-Speed 5651.25 samples/sec   Loss 8.7929   LearningRate 0.0728   Epoch: 2   Global Step: 16680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:22,951-Speed 5670.42 samples/sec   Loss 8.7745   LearningRate 0.0728   Epoch: 2   Global Step: 16690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:24,750-Speed 5695.38 samples/sec   Loss 8.6930   LearningRate 0.0728   Epoch: 2   Global Step: 16700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:30:26,595-Speed 5552.12 samples/sec   Loss 8.7618   LearningRate 0.0728   Epoch: 2   Global Step: 16710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:30:28,440-Speed 5555.29 samples/sec   Loss 8.7139   LearningRate 0.0728   Epoch: 2   Global Step: 16720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:30:30,239-Speed 5693.93 samples/sec   Loss 8.7709   LearningRate 0.0727   Epoch: 2   Global Step: 16730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:30:32,048-Speed 5664.83 samples/sec   Loss 8.7664   LearningRate 0.0727   Epoch: 2   Global Step: 16740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:30:33,873-Speed 5613.50 samples/sec   Loss 8.7121   LearningRate 0.0727   Epoch: 2   Global Step: 16750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:30:35,687-Speed 5646.87 samples/sec   Loss 8.6044   LearningRate 0.0727   Epoch: 2   Global Step: 16760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:30:37,520-Speed 5590.57 samples/sec   Loss 8.5538   LearningRate 0.0727   Epoch: 2   Global Step: 16770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:30:39,364-Speed 5556.28 samples/sec   Loss 8.6801   LearningRate 0.0727   Epoch: 2   Global Step: 16780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:30:41,198-Speed 5584.43 samples/sec   Loss 8.6808   LearningRate 0.0727   Epoch: 2   Global Step: 16790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:30:43,030-Speed 5592.83 samples/sec   Loss 8.7454   LearningRate 0.0726   Epoch: 2   Global Step: 16800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:44,891-Speed 5504.97 samples/sec   Loss 8.5544   LearningRate 0.0726   Epoch: 2   Global Step: 16810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:46,713-Speed 5624.53 samples/sec   Loss 8.4573   LearningRate 0.0726   Epoch: 2   Global Step: 16820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:48,536-Speed 5621.41 samples/sec   Loss 8.6803   LearningRate 0.0726   Epoch: 2   Global Step: 16830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:50,353-Speed 5638.55 samples/sec   Loss 8.5615   LearningRate 0.0726   Epoch: 2   Global Step: 16840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:52,190-Speed 5575.58 samples/sec   Loss 8.6580   LearningRate 0.0726   Epoch: 2   Global Step: 16850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:54,015-Speed 5615.53 samples/sec   Loss 8.6083   LearningRate 0.0725   Epoch: 2   Global Step: 16860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:55,807-Speed 5716.86 samples/sec   Loss 8.7597   LearningRate 0.0725   Epoch: 2   Global Step: 16870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:57,608-Speed 5688.90 samples/sec   Loss 8.6918   LearningRate 0.0725   Epoch: 2   Global Step: 16880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:30:59,412-Speed 5680.96 samples/sec   Loss 8.8556   LearningRate 0.0725   Epoch: 2   Global Step: 16890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:01,218-Speed 5671.29 samples/sec   Loss 8.5750   LearningRate 0.0725   Epoch: 2   Global Step: 16900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:03,016-Speed 5703.31 samples/sec   Loss 8.5139   LearningRate 0.0725   Epoch: 2   Global Step: 16910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:04,807-Speed 5720.05 samples/sec   Loss 8.5183   LearningRate 0.0725   Epoch: 2   Global Step: 16920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:06,613-Speed 5672.30 samples/sec   Loss 8.7017   LearningRate 0.0724   Epoch: 2   Global Step: 16930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:08,450-Speed 5577.28 samples/sec   Loss 8.6892   LearningRate 0.0724   Epoch: 2   Global Step: 16940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:10,277-Speed 5608.87 samples/sec   Loss 8.6581   LearningRate 0.0724   Epoch: 2   Global Step: 16950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:12,097-Speed 5630.23 samples/sec   Loss 8.6513   LearningRate 0.0724   Epoch: 2   Global Step: 16960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:13,906-Speed 5662.56 samples/sec   Loss 8.6960   LearningRate 0.0724   Epoch: 2   Global Step: 16970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:15,710-Speed 5679.26 samples/sec   Loss 8.5887   LearningRate 0.0724   Epoch: 2   Global Step: 16980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:17,513-Speed 5682.10 samples/sec   Loss 8.6307   LearningRate 0.0724   Epoch: 2   Global Step: 16990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:19,341-Speed 5605.96 samples/sec   Loss 8.6001   LearningRate 0.0723   Epoch: 2   Global Step: 17000   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:31:21,128-Speed 5733.83 samples/sec   Loss 8.6960   LearningRate 0.0723   Epoch: 2   Global Step: 17010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:22,965-Speed 5577.86 samples/sec   Loss 8.5624   LearningRate 0.0723   Epoch: 2   Global Step: 17020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:24,791-Speed 5609.99 samples/sec   Loss 8.4896   LearningRate 0.0723   Epoch: 2   Global Step: 17030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:26,591-Speed 5691.12 samples/sec   Loss 8.5987   LearningRate 0.0723   Epoch: 2   Global Step: 17040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:28,499-Speed 5372.00 samples/sec   Loss 8.8981   LearningRate 0.0723   Epoch: 2   Global Step: 17050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:31:40,164-Speed 877.93 samples/sec   Loss 8.5458   LearningRate 0.0722   Epoch: 3   Global Step: 17060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:31:42,202-Speed 5027.19 samples/sec   Loss 7.9754   LearningRate 0.0722   Epoch: 3   Global Step: 17070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:31:44,023-Speed 5628.05 samples/sec   Loss 7.8362   LearningRate 0.0722   Epoch: 3   Global Step: 17080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:31:45,847-Speed 5619.11 samples/sec   Loss 8.0257   LearningRate 0.0722   Epoch: 3   Global Step: 17090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:31:47,656-Speed 5662.45 samples/sec   Loss 7.9474   LearningRate 0.0722   Epoch: 3   Global Step: 17100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:31:49,601-Speed 5267.12 samples/sec   Loss 7.8727   LearningRate 0.0722   Epoch: 3   Global Step: 17110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:31:51,532-Speed 5308.59 samples/sec   Loss 7.9359   LearningRate 0.0722   Epoch: 3   Global Step: 17120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:31:53,343-Speed 5657.10 samples/sec   Loss 7.8595   LearningRate 0.0721   Epoch: 3   Global Step: 17130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:31:55,147-Speed 5679.26 samples/sec   Loss 8.1134   LearningRate 0.0721   Epoch: 3   Global Step: 17140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:31:56,940-Speed 5714.04 samples/sec   Loss 8.1087   LearningRate 0.0721   Epoch: 3   Global Step: 17150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:31:58,744-Speed 5679.22 samples/sec   Loss 8.2271   LearningRate 0.0721   Epoch: 3   Global Step: 17160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:00,562-Speed 5636.18 samples/sec   Loss 8.1711   LearningRate 0.0721   Epoch: 3   Global Step: 17170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:02,393-Speed 5595.79 samples/sec   Loss 7.9409   LearningRate 0.0721   Epoch: 3   Global Step: 17180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:04,203-Speed 5659.79 samples/sec   Loss 8.1410   LearningRate 0.0721   Epoch: 3   Global Step: 17190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:06,004-Speed 5687.02 samples/sec   Loss 8.0676   LearningRate 0.0720   Epoch: 3   Global Step: 17200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:07,820-Speed 5640.41 samples/sec   Loss 8.1410   LearningRate 0.0720   Epoch: 3   Global Step: 17210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:09,632-Speed 5655.12 samples/sec   Loss 8.1718   LearningRate 0.0720   Epoch: 3   Global Step: 17220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:11,424-Speed 5717.83 samples/sec   Loss 8.1878   LearningRate 0.0720   Epoch: 3   Global Step: 17230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:13,278-Speed 5524.94 samples/sec   Loss 8.1929   LearningRate 0.0720   Epoch: 3   Global Step: 17240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:15,082-Speed 5679.79 samples/sec   Loss 8.1476   LearningRate 0.0720   Epoch: 3   Global Step: 17250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:16,871-Speed 5724.94 samples/sec   Loss 8.2591   LearningRate 0.0719   Epoch: 3   Global Step: 17260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:18,685-Speed 5649.39 samples/sec   Loss 8.3397   LearningRate 0.0719   Epoch: 3   Global Step: 17270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:20,487-Speed 5685.87 samples/sec   Loss 8.2492   LearningRate 0.0719   Epoch: 3   Global Step: 17280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:22,310-Speed 5619.70 samples/sec   Loss 8.2297   LearningRate 0.0719   Epoch: 3   Global Step: 17290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:24,116-Speed 5672.65 samples/sec   Loss 8.2746   LearningRate 0.0719   Epoch: 3   Global Step: 17300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:25,914-Speed 5698.41 samples/sec   Loss 8.2327   LearningRate 0.0719   Epoch: 3   Global Step: 17310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:27,733-Speed 5633.42 samples/sec   Loss 8.3393   LearningRate 0.0719   Epoch: 3   Global Step: 17320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:29,554-Speed 5627.36 samples/sec   Loss 8.1096   LearningRate 0.0718   Epoch: 3   Global Step: 17330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:31,382-Speed 5603.61 samples/sec   Loss 8.1590   LearningRate 0.0718   Epoch: 3   Global Step: 17340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:33,191-Speed 5664.45 samples/sec   Loss 8.3881   LearningRate 0.0718   Epoch: 3   Global Step: 17350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:32:34,995-Speed 5679.57 samples/sec   Loss 8.1392   LearningRate 0.0718   Epoch: 3   Global Step: 17360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:32:36,833-Speed 5573.69 samples/sec   Loss 8.3892   LearningRate 0.0718   Epoch: 3   Global Step: 17370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:32:38,665-Speed 5591.88 samples/sec   Loss 8.3568   LearningRate 0.0718   Epoch: 3   Global Step: 17380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:32:40,469-Speed 5679.94 samples/sec   Loss 8.1845   LearningRate 0.0718   Epoch: 3   Global Step: 17390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:32:42,314-Speed 5554.69 samples/sec   Loss 8.2525   LearningRate 0.0717   Epoch: 3   Global Step: 17400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:32:44,160-Speed 5548.55 samples/sec   Loss 8.2585   LearningRate 0.0717   Epoch: 3   Global Step: 17410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:32:45,978-Speed 5635.00 samples/sec   Loss 8.3618   LearningRate 0.0717   Epoch: 3   Global Step: 17420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:32:47,790-Speed 5655.14 samples/sec   Loss 8.3242   LearningRate 0.0717   Epoch: 3   Global Step: 17430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:32:49,625-Speed 5580.30 samples/sec   Loss 8.2664   LearningRate 0.0717   Epoch: 3   Global Step: 17440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:32:51,447-Speed 5625.29 samples/sec   Loss 8.3507   LearningRate 0.0717   Epoch: 3   Global Step: 17450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:53,275-Speed 5606.04 samples/sec   Loss 8.4257   LearningRate 0.0717   Epoch: 3   Global Step: 17460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:55,085-Speed 5658.67 samples/sec   Loss 8.4090   LearningRate 0.0716   Epoch: 3   Global Step: 17470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:56,906-Speed 5627.62 samples/sec   Loss 8.3149   LearningRate 0.0716   Epoch: 3   Global Step: 17480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:32:58,718-Speed 5657.11 samples/sec   Loss 8.2471   LearningRate 0.0716   Epoch: 3   Global Step: 17490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:00,527-Speed 5663.52 samples/sec   Loss 8.2851   LearningRate 0.0716   Epoch: 3   Global Step: 17500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:02,362-Speed 5584.19 samples/sec   Loss 8.2576   LearningRate 0.0716   Epoch: 3   Global Step: 17510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:04,186-Speed 5616.60 samples/sec   Loss 8.1898   LearningRate 0.0716   Epoch: 3   Global Step: 17520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:05,994-Speed 5664.59 samples/sec   Loss 8.1712   LearningRate 0.0715   Epoch: 3   Global Step: 17530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:07,797-Speed 5683.71 samples/sec   Loss 8.1893   LearningRate 0.0715   Epoch: 3   Global Step: 17540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:09,593-Speed 5702.87 samples/sec   Loss 8.4064   LearningRate 0.0715   Epoch: 3   Global Step: 17550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:11,406-Speed 5650.53 samples/sec   Loss 8.3539   LearningRate 0.0715   Epoch: 3   Global Step: 17560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:13,217-Speed 5657.62 samples/sec   Loss 8.3946   LearningRate 0.0715   Epoch: 3   Global Step: 17570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:15,027-Speed 5661.73 samples/sec   Loss 8.2741   LearningRate 0.0715   Epoch: 3   Global Step: 17580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:16,881-Speed 5524.56 samples/sec   Loss 8.4603   LearningRate 0.0715   Epoch: 3   Global Step: 17590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:18,684-Speed 5681.63 samples/sec   Loss 8.3465   LearningRate 0.0714   Epoch: 3   Global Step: 17600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:20,478-Speed 5711.90 samples/sec   Loss 8.4804   LearningRate 0.0714   Epoch: 3   Global Step: 17610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:22,294-Speed 5640.76 samples/sec   Loss 8.3364   LearningRate 0.0714   Epoch: 3   Global Step: 17620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:24,110-Speed 5643.11 samples/sec   Loss 8.3313   LearningRate 0.0714   Epoch: 3   Global Step: 17630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:25,954-Speed 5554.61 samples/sec   Loss 8.3908   LearningRate 0.0714   Epoch: 3   Global Step: 17640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:27,775-Speed 5649.19 samples/sec   Loss 8.2949   LearningRate 0.0714   Epoch: 3   Global Step: 17650   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:33:29,591-Speed 5646.06 samples/sec   Loss 8.3365   LearningRate 0.0714   Epoch: 3   Global Step: 17660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:31,395-Speed 5676.47 samples/sec   Loss 8.2960   LearningRate 0.0713   Epoch: 3   Global Step: 17670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:33,187-Speed 5718.24 samples/sec   Loss 8.4783   LearningRate 0.0713   Epoch: 3   Global Step: 17680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:34,993-Speed 5672.63 samples/sec   Loss 8.5359   LearningRate 0.0713   Epoch: 3   Global Step: 17690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:36,783-Speed 5723.48 samples/sec   Loss 8.3215   LearningRate 0.0713   Epoch: 3   Global Step: 17700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:38,574-Speed 5719.39 samples/sec   Loss 8.3830   LearningRate 0.0713   Epoch: 3   Global Step: 17710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:40,381-Speed 5670.07 samples/sec   Loss 8.3611   LearningRate 0.0713   Epoch: 3   Global Step: 17720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:42,270-Speed 5425.41 samples/sec   Loss 8.3514   LearningRate 0.0712   Epoch: 3   Global Step: 17730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:44,093-Speed 5618.67 samples/sec   Loss 8.5520   LearningRate 0.0712   Epoch: 3   Global Step: 17740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:45,898-Speed 5674.33 samples/sec   Loss 8.2699   LearningRate 0.0712   Epoch: 3   Global Step: 17750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:47,680-Speed 5749.78 samples/sec   Loss 8.5283   LearningRate 0.0712   Epoch: 3   Global Step: 17760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:49,502-Speed 5624.22 samples/sec   Loss 8.4532   LearningRate 0.0712   Epoch: 3   Global Step: 17770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:51,302-Speed 5690.74 samples/sec   Loss 8.3108   LearningRate 0.0712   Epoch: 3   Global Step: 17780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:53,110-Speed 5667.50 samples/sec   Loss 8.4022   LearningRate 0.0712   Epoch: 3   Global Step: 17790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:54,929-Speed 5631.37 samples/sec   Loss 8.4019   LearningRate 0.0711   Epoch: 3   Global Step: 17800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:56,737-Speed 5667.35 samples/sec   Loss 8.3025   LearningRate 0.0711   Epoch: 3   Global Step: 17810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:33:58,555-Speed 5635.27 samples/sec   Loss 8.4353   LearningRate 0.0711   Epoch: 3   Global Step: 17820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:34:00,399-Speed 5577.09 samples/sec   Loss 8.4107   LearningRate 0.0711   Epoch: 3   Global Step: 17830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:34:02,216-Speed 5637.08 samples/sec   Loss 8.4657   LearningRate 0.0711   Epoch: 3   Global Step: 17840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:34:04,037-Speed 5626.06 samples/sec   Loss 8.3889   LearningRate 0.0711   Epoch: 3   Global Step: 17850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:34:05,820-Speed 5746.32 samples/sec   Loss 8.5415   LearningRate 0.0711   Epoch: 3   Global Step: 17860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:34:07,641-Speed 5625.66 samples/sec   Loss 8.3466   LearningRate 0.0710   Epoch: 3   Global Step: 17870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:34:09,484-Speed 5558.24 samples/sec   Loss 8.3410   LearningRate 0.0710   Epoch: 3   Global Step: 17880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:34:11,284-Speed 5695.05 samples/sec   Loss 8.4210   LearningRate 0.0710   Epoch: 3   Global Step: 17890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:34:13,090-Speed 5671.05 samples/sec   Loss 8.4518   LearningRate 0.0710   Epoch: 3   Global Step: 17900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:34:14,905-Speed 5645.19 samples/sec   Loss 8.5334   LearningRate 0.0710   Epoch: 3   Global Step: 17910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:34:16,728-Speed 5619.69 samples/sec   Loss 8.4335   LearningRate 0.0710   Epoch: 3   Global Step: 17920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:34:18,578-Speed 5537.87 samples/sec   Loss 8.5289   LearningRate 0.0710   Epoch: 3   Global Step: 17930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:34:20,390-Speed 5654.97 samples/sec   Loss 8.5761   LearningRate 0.0709   Epoch: 3   Global Step: 17940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:34:22,227-Speed 5575.77 samples/sec   Loss 8.4816   LearningRate 0.0709   Epoch: 3   Global Step: 17950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:34:24,052-Speed 5613.91 samples/sec   Loss 8.4410   LearningRate 0.0709   Epoch: 3   Global Step: 17960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:34:25,906-Speed 5526.80 samples/sec   Loss 8.2965   LearningRate 0.0709   Epoch: 3   Global Step: 17970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:34:27,693-Speed 5731.42 samples/sec   Loss 8.5643   LearningRate 0.0709   Epoch: 3   Global Step: 17980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:34:29,513-Speed 5629.67 samples/sec   Loss 8.4499   LearningRate 0.0709   Epoch: 3   Global Step: 17990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:34:31,331-Speed 5637.67 samples/sec   Loss 8.5323   LearningRate 0.0708   Epoch: 3   Global Step: 18000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:34:57,837-[lfw][18000]XNorm: 22.767255
Training: 2022-04-27 02:34:57,838-[lfw][18000]Accuracy-Flip: 0.99567+-0.00281
Training: 2022-04-27 02:34:57,839-[lfw][18000]Accuracy-Highest: 0.99617
Training: 2022-04-27 02:35:28,673-[cfp_fp][18000]XNorm: 19.326344
Training: 2022-04-27 02:35:28,674-[cfp_fp][18000]Accuracy-Flip: 0.92057+-0.01528
Training: 2022-04-27 02:35:28,675-[cfp_fp][18000]Accuracy-Highest: 0.92343
Training: 2022-04-27 02:35:55,172-[agedb_30][18000]XNorm: 22.428629
Training: 2022-04-27 02:35:55,173-[agedb_30][18000]Accuracy-Flip: 0.96367+-0.00826
Training: 2022-04-27 02:35:55,173-[agedb_30][18000]Accuracy-Highest: 0.96383
Training: 2022-04-27 02:35:56,985-Speed 119.55 samples/sec   Loss 8.5343   LearningRate 0.0708   Epoch: 3   Global Step: 18010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:35:58,797-Speed 5653.55 samples/sec   Loss 8.2422   LearningRate 0.0708   Epoch: 3   Global Step: 18020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:00,588-Speed 5721.25 samples/sec   Loss 8.3996   LearningRate 0.0708   Epoch: 3   Global Step: 18030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:02,416-Speed 5605.78 samples/sec   Loss 8.4968   LearningRate 0.0708   Epoch: 3   Global Step: 18040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:04,239-Speed 5620.66 samples/sec   Loss 8.3137   LearningRate 0.0708   Epoch: 3   Global Step: 18050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:06,031-Speed 5717.76 samples/sec   Loss 8.4336   LearningRate 0.0708   Epoch: 3   Global Step: 18060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:36:07,881-Speed 5536.69 samples/sec   Loss 8.4393   LearningRate 0.0707   Epoch: 3   Global Step: 18070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:36:09,701-Speed 5631.30 samples/sec   Loss 8.4146   LearningRate 0.0707   Epoch: 3   Global Step: 18080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:36:11,512-Speed 5655.90 samples/sec   Loss 8.5153   LearningRate 0.0707   Epoch: 3   Global Step: 18090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:36:13,319-Speed 5669.39 samples/sec   Loss 8.3470   LearningRate 0.0707   Epoch: 3   Global Step: 18100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:36:15,123-Speed 5680.11 samples/sec   Loss 8.4384   LearningRate 0.0707   Epoch: 3   Global Step: 18110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:36:16,916-Speed 5715.05 samples/sec   Loss 8.4874   LearningRate 0.0707   Epoch: 3   Global Step: 18120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:36:18,706-Speed 5722.12 samples/sec   Loss 8.4564   LearningRate 0.0707   Epoch: 3   Global Step: 18130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:36:20,506-Speed 5692.06 samples/sec   Loss 8.2547   LearningRate 0.0706   Epoch: 3   Global Step: 18140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:36:22,286-Speed 5754.92 samples/sec   Loss 8.4558   LearningRate 0.0706   Epoch: 3   Global Step: 18150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:36:24,088-Speed 5685.60 samples/sec   Loss 8.3630   LearningRate 0.0706   Epoch: 3   Global Step: 18160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:25,921-Speed 5591.06 samples/sec   Loss 8.3808   LearningRate 0.0706   Epoch: 3   Global Step: 18170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:27,711-Speed 5722.93 samples/sec   Loss 8.3710   LearningRate 0.0706   Epoch: 3   Global Step: 18180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:29,515-Speed 5678.55 samples/sec   Loss 8.4228   LearningRate 0.0706   Epoch: 3   Global Step: 18190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:31,309-Speed 5710.30 samples/sec   Loss 8.4983   LearningRate 0.0706   Epoch: 3   Global Step: 18200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:33,118-Speed 5664.38 samples/sec   Loss 8.3403   LearningRate 0.0705   Epoch: 3   Global Step: 18210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:34,906-Speed 5731.45 samples/sec   Loss 8.3803   LearningRate 0.0705   Epoch: 3   Global Step: 18220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:36,693-Speed 5730.52 samples/sec   Loss 8.5510   LearningRate 0.0705   Epoch: 3   Global Step: 18230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:38,513-Speed 5630.06 samples/sec   Loss 8.2846   LearningRate 0.0705   Epoch: 3   Global Step: 18240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:40,335-Speed 5622.66 samples/sec   Loss 8.5901   LearningRate 0.0705   Epoch: 3   Global Step: 18250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:42,140-Speed 5674.76 samples/sec   Loss 8.3483   LearningRate 0.0705   Epoch: 3   Global Step: 18260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:43,993-Speed 5528.69 samples/sec   Loss 8.3887   LearningRate 0.0704   Epoch: 3   Global Step: 18270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:45,782-Speed 5726.92 samples/sec   Loss 8.3605   LearningRate 0.0704   Epoch: 3   Global Step: 18280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:47,579-Speed 5701.85 samples/sec   Loss 8.4801   LearningRate 0.0704   Epoch: 3   Global Step: 18290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:49,405-Speed 5609.62 samples/sec   Loss 8.4864   LearningRate 0.0704   Epoch: 3   Global Step: 18300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:51,217-Speed 5654.81 samples/sec   Loss 8.2974   LearningRate 0.0704   Epoch: 3   Global Step: 18310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:53,025-Speed 5665.25 samples/sec   Loss 8.2172   LearningRate 0.0704   Epoch: 3   Global Step: 18320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:54,845-Speed 5629.71 samples/sec   Loss 8.4339   LearningRate 0.0704   Epoch: 3   Global Step: 18330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:36:56,644-Speed 5695.60 samples/sec   Loss 8.4340   LearningRate 0.0703   Epoch: 3   Global Step: 18340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:36:58,427-Speed 5745.29 samples/sec   Loss 8.3267   LearningRate 0.0703   Epoch: 3   Global Step: 18350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:00,237-Speed 5660.82 samples/sec   Loss 8.5275   LearningRate 0.0703   Epoch: 3   Global Step: 18360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:02,046-Speed 5662.57 samples/sec   Loss 8.6164   LearningRate 0.0703   Epoch: 3   Global Step: 18370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:03,839-Speed 5714.48 samples/sec   Loss 8.3215   LearningRate 0.0703   Epoch: 3   Global Step: 18380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:05,664-Speed 5612.40 samples/sec   Loss 8.4056   LearningRate 0.0703   Epoch: 3   Global Step: 18390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:07,502-Speed 5573.06 samples/sec   Loss 8.2994   LearningRate 0.0703   Epoch: 3   Global Step: 18400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:09,310-Speed 5665.62 samples/sec   Loss 8.4662   LearningRate 0.0702   Epoch: 3   Global Step: 18410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:11,114-Speed 5680.60 samples/sec   Loss 8.3911   LearningRate 0.0702   Epoch: 3   Global Step: 18420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:12,903-Speed 5726.93 samples/sec   Loss 8.4963   LearningRate 0.0702   Epoch: 3   Global Step: 18430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:14,712-Speed 5662.68 samples/sec   Loss 8.3246   LearningRate 0.0702   Epoch: 3   Global Step: 18440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:37:16,500-Speed 5729.30 samples/sec   Loss 8.4406   LearningRate 0.0702   Epoch: 3   Global Step: 18450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:37:18,296-Speed 5702.97 samples/sec   Loss 8.5177   LearningRate 0.0702   Epoch: 3   Global Step: 18460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:37:20,103-Speed 5668.69 samples/sec   Loss 8.3149   LearningRate 0.0702   Epoch: 3   Global Step: 18470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:37:21,926-Speed 5619.76 samples/sec   Loss 8.6009   LearningRate 0.0701   Epoch: 3   Global Step: 18480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:37:23,758-Speed 5591.59 samples/sec   Loss 8.4070   LearningRate 0.0701   Epoch: 3   Global Step: 18490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:37:25,597-Speed 5572.51 samples/sec   Loss 8.4287   LearningRate 0.0701   Epoch: 3   Global Step: 18500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:27,445-Speed 5544.07 samples/sec   Loss 8.4747   LearningRate 0.0701   Epoch: 3   Global Step: 18510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:29,247-Speed 5686.71 samples/sec   Loss 8.5390   LearningRate 0.0701   Epoch: 3   Global Step: 18520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:31,065-Speed 5633.73 samples/sec   Loss 8.4454   LearningRate 0.0701   Epoch: 3   Global Step: 18530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:32,877-Speed 5653.18 samples/sec   Loss 8.4813   LearningRate 0.0701   Epoch: 3   Global Step: 18540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:34,691-Speed 5646.88 samples/sec   Loss 8.4360   LearningRate 0.0700   Epoch: 3   Global Step: 18550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:36,493-Speed 5685.04 samples/sec   Loss 8.4790   LearningRate 0.0700   Epoch: 3   Global Step: 18560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:38,309-Speed 5640.95 samples/sec   Loss 8.4045   LearningRate 0.0700   Epoch: 3   Global Step: 18570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:40,160-Speed 5535.78 samples/sec   Loss 8.4718   LearningRate 0.0700   Epoch: 3   Global Step: 18580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:42,010-Speed 5536.54 samples/sec   Loss 8.4224   LearningRate 0.0700   Epoch: 3   Global Step: 18590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:37:43,805-Speed 5707.99 samples/sec   Loss 8.2933   LearningRate 0.0700   Epoch: 3   Global Step: 18600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:37:45,618-Speed 5649.30 samples/sec   Loss 8.3618   LearningRate 0.0699   Epoch: 3   Global Step: 18610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:37:47,420-Speed 5687.11 samples/sec   Loss 8.3943   LearningRate 0.0699   Epoch: 3   Global Step: 18620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:37:49,229-Speed 5661.28 samples/sec   Loss 8.5983   LearningRate 0.0699   Epoch: 3   Global Step: 18630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:37:51,056-Speed 5608.70 samples/sec   Loss 8.4922   LearningRate 0.0699   Epoch: 3   Global Step: 18640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:37:52,847-Speed 5721.26 samples/sec   Loss 8.4613   LearningRate 0.0699   Epoch: 3   Global Step: 18650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:37:54,678-Speed 5595.36 samples/sec   Loss 8.4857   LearningRate 0.0699   Epoch: 3   Global Step: 18660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:37:56,491-Speed 5650.42 samples/sec   Loss 8.3915   LearningRate 0.0699   Epoch: 3   Global Step: 18670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:37:58,312-Speed 5624.43 samples/sec   Loss 8.1290   LearningRate 0.0698   Epoch: 3   Global Step: 18680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:00,133-Speed 5628.34 samples/sec   Loss 8.4954   LearningRate 0.0698   Epoch: 3   Global Step: 18690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:01,986-Speed 5527.69 samples/sec   Loss 8.3874   LearningRate 0.0698   Epoch: 3   Global Step: 18700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:03,788-Speed 5685.54 samples/sec   Loss 8.2651   LearningRate 0.0698   Epoch: 3   Global Step: 18710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:05,617-Speed 5601.46 samples/sec   Loss 8.3409   LearningRate 0.0698   Epoch: 3   Global Step: 18720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:07,417-Speed 5695.48 samples/sec   Loss 8.3706   LearningRate 0.0698   Epoch: 3   Global Step: 18730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:09,211-Speed 5711.01 samples/sec   Loss 8.3220   LearningRate 0.0698   Epoch: 3   Global Step: 18740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:11,018-Speed 5667.32 samples/sec   Loss 8.3992   LearningRate 0.0697   Epoch: 3   Global Step: 18750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:12,854-Speed 5581.66 samples/sec   Loss 8.3869   LearningRate 0.0697   Epoch: 3   Global Step: 18760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:14,676-Speed 5622.99 samples/sec   Loss 8.3897   LearningRate 0.0697   Epoch: 3   Global Step: 18770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:16,498-Speed 5621.19 samples/sec   Loss 8.4682   LearningRate 0.0697   Epoch: 3   Global Step: 18780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:18,334-Speed 5580.38 samples/sec   Loss 8.2223   LearningRate 0.0697   Epoch: 3   Global Step: 18790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:20,168-Speed 5587.54 samples/sec   Loss 8.3587   LearningRate 0.0697   Epoch: 3   Global Step: 18800   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:38:21,994-Speed 5611.92 samples/sec   Loss 8.3946   LearningRate 0.0697   Epoch: 3   Global Step: 18810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:23,799-Speed 5677.53 samples/sec   Loss 8.3474   LearningRate 0.0696   Epoch: 3   Global Step: 18820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:25,637-Speed 5572.71 samples/sec   Loss 8.3207   LearningRate 0.0696   Epoch: 3   Global Step: 18830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:27,455-Speed 5633.25 samples/sec   Loss 8.3161   LearningRate 0.0696   Epoch: 3   Global Step: 18840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:29,265-Speed 5661.59 samples/sec   Loss 8.2828   LearningRate 0.0696   Epoch: 3   Global Step: 18850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:31,063-Speed 5699.18 samples/sec   Loss 8.3774   LearningRate 0.0696   Epoch: 3   Global Step: 18860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:32,874-Speed 5657.34 samples/sec   Loss 8.4149   LearningRate 0.0696   Epoch: 3   Global Step: 18870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:34,681-Speed 5668.78 samples/sec   Loss 8.3748   LearningRate 0.0696   Epoch: 3   Global Step: 18880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:36,477-Speed 5704.80 samples/sec   Loss 8.3782   LearningRate 0.0695   Epoch: 3   Global Step: 18890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:38,288-Speed 5657.85 samples/sec   Loss 8.3291   LearningRate 0.0695   Epoch: 3   Global Step: 18900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:40,068-Speed 5754.61 samples/sec   Loss 8.4196   LearningRate 0.0695   Epoch: 3   Global Step: 18910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:41,873-Speed 5683.77 samples/sec   Loss 8.3016   LearningRate 0.0695   Epoch: 3   Global Step: 18920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:43,677-Speed 5678.95 samples/sec   Loss 8.3767   LearningRate 0.0695   Epoch: 3   Global Step: 18930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:45,461-Speed 5743.79 samples/sec   Loss 8.3087   LearningRate 0.0695   Epoch: 3   Global Step: 18940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:47,267-Speed 5674.34 samples/sec   Loss 8.3489   LearningRate 0.0694   Epoch: 3   Global Step: 18950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:49,059-Speed 5717.11 samples/sec   Loss 8.3205   LearningRate 0.0694   Epoch: 3   Global Step: 18960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:50,856-Speed 5698.44 samples/sec   Loss 8.3590   LearningRate 0.0694   Epoch: 3   Global Step: 18970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:52,677-Speed 5628.43 samples/sec   Loss 8.2597   LearningRate 0.0694   Epoch: 3   Global Step: 18980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:38:54,484-Speed 5670.38 samples/sec   Loss 8.3710   LearningRate 0.0694   Epoch: 3   Global Step: 18990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:38:56,271-Speed 5729.64 samples/sec   Loss 8.3074   LearningRate 0.0694   Epoch: 3   Global Step: 19000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:38:58,056-Speed 5741.39 samples/sec   Loss 8.4247   LearningRate 0.0694   Epoch: 3   Global Step: 19010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:38:59,878-Speed 5624.51 samples/sec   Loss 8.3125   LearningRate 0.0693   Epoch: 3   Global Step: 19020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:01,686-Speed 5665.41 samples/sec   Loss 8.2646   LearningRate 0.0693   Epoch: 3   Global Step: 19030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:03,479-Speed 5712.99 samples/sec   Loss 8.3093   LearningRate 0.0693   Epoch: 3   Global Step: 19040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:05,285-Speed 5674.44 samples/sec   Loss 8.1821   LearningRate 0.0693   Epoch: 3   Global Step: 19050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:07,099-Speed 5645.09 samples/sec   Loss 8.4694   LearningRate 0.0693   Epoch: 3   Global Step: 19060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:08,900-Speed 5687.58 samples/sec   Loss 8.4165   LearningRate 0.0693   Epoch: 3   Global Step: 19070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:10,698-Speed 5698.57 samples/sec   Loss 8.3309   LearningRate 0.0693   Epoch: 3   Global Step: 19080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:12,496-Speed 5700.70 samples/sec   Loss 8.1671   LearningRate 0.0692   Epoch: 3   Global Step: 19090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:14,304-Speed 5664.36 samples/sec   Loss 8.2899   LearningRate 0.0692   Epoch: 3   Global Step: 19100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:16,099-Speed 5707.72 samples/sec   Loss 8.1756   LearningRate 0.0692   Epoch: 3   Global Step: 19110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:17,892-Speed 5714.10 samples/sec   Loss 8.3599   LearningRate 0.0692   Epoch: 3   Global Step: 19120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:19,681-Speed 5726.47 samples/sec   Loss 8.3337   LearningRate 0.0692   Epoch: 3   Global Step: 19130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:21,475-Speed 5710.64 samples/sec   Loss 8.3660   LearningRate 0.0692   Epoch: 3   Global Step: 19140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:23,306-Speed 5593.16 samples/sec   Loss 8.2427   LearningRate 0.0692   Epoch: 3   Global Step: 19150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:25,121-Speed 5644.02 samples/sec   Loss 8.4933   LearningRate 0.0691   Epoch: 3   Global Step: 19160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:26,915-Speed 5711.57 samples/sec   Loss 8.3020   LearningRate 0.0691   Epoch: 3   Global Step: 19170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:28,718-Speed 5682.30 samples/sec   Loss 8.5337   LearningRate 0.0691   Epoch: 3   Global Step: 19180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:30,518-Speed 5690.19 samples/sec   Loss 8.1706   LearningRate 0.0691   Epoch: 3   Global Step: 19190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:32,299-Speed 5751.66 samples/sec   Loss 8.1813   LearningRate 0.0691   Epoch: 3   Global Step: 19200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:34,133-Speed 5586.95 samples/sec   Loss 8.4280   LearningRate 0.0691   Epoch: 3   Global Step: 19210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:35,931-Speed 5697.30 samples/sec   Loss 8.4220   LearningRate 0.0691   Epoch: 3   Global Step: 19220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:39:37,752-Speed 5622.53 samples/sec   Loss 8.4001   LearningRate 0.0690   Epoch: 3   Global Step: 19230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:39,566-Speed 5648.97 samples/sec   Loss 8.3895   LearningRate 0.0690   Epoch: 3   Global Step: 19240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:41,356-Speed 5721.79 samples/sec   Loss 8.3197   LearningRate 0.0690   Epoch: 3   Global Step: 19250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:43,170-Speed 5648.97 samples/sec   Loss 8.3171   LearningRate 0.0690   Epoch: 3   Global Step: 19260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:44,956-Speed 5733.88 samples/sec   Loss 8.2030   LearningRate 0.0690   Epoch: 3   Global Step: 19270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:46,760-Speed 5679.95 samples/sec   Loss 8.1889   LearningRate 0.0690   Epoch: 3   Global Step: 19280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:48,604-Speed 5555.52 samples/sec   Loss 8.4377   LearningRate 0.0690   Epoch: 3   Global Step: 19290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:50,464-Speed 5506.39 samples/sec   Loss 8.2884   LearningRate 0.0689   Epoch: 3   Global Step: 19300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:52,277-Speed 5650.53 samples/sec   Loss 8.2151   LearningRate 0.0689   Epoch: 3   Global Step: 19310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:54,083-Speed 5674.86 samples/sec   Loss 8.2398   LearningRate 0.0689   Epoch: 3   Global Step: 19320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:55,886-Speed 5678.93 samples/sec   Loss 8.2367   LearningRate 0.0689   Epoch: 3   Global Step: 19330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:57,751-Speed 5493.25 samples/sec   Loss 8.2599   LearningRate 0.0689   Epoch: 3   Global Step: 19340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:39:59,611-Speed 5510.10 samples/sec   Loss 8.4112   LearningRate 0.0689   Epoch: 3   Global Step: 19350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:01,415-Speed 5678.52 samples/sec   Loss 8.4052   LearningRate 0.0688   Epoch: 3   Global Step: 19360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:03,248-Speed 5589.85 samples/sec   Loss 8.3125   LearningRate 0.0688   Epoch: 3   Global Step: 19370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:05,102-Speed 5528.31 samples/sec   Loss 8.2686   LearningRate 0.0688   Epoch: 3   Global Step: 19380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:06,954-Speed 5529.13 samples/sec   Loss 8.4337   LearningRate 0.0688   Epoch: 3   Global Step: 19390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:08,776-Speed 5624.75 samples/sec   Loss 8.3287   LearningRate 0.0688   Epoch: 3   Global Step: 19400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:10,580-Speed 5676.87 samples/sec   Loss 8.2711   LearningRate 0.0688   Epoch: 3   Global Step: 19410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:12,376-Speed 5705.94 samples/sec   Loss 8.3854   LearningRate 0.0688   Epoch: 3   Global Step: 19420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:14,173-Speed 5700.25 samples/sec   Loss 8.4877   LearningRate 0.0687   Epoch: 3   Global Step: 19430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:15,970-Speed 5699.71 samples/sec   Loss 8.2535   LearningRate 0.0687   Epoch: 3   Global Step: 19440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:40:17,773-Speed 5689.54 samples/sec   Loss 8.3547   LearningRate 0.0687   Epoch: 3   Global Step: 19450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:40:19,575-Speed 5684.34 samples/sec   Loss 8.1746   LearningRate 0.0687   Epoch: 3   Global Step: 19460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:40:21,400-Speed 5614.48 samples/sec   Loss 8.2451   LearningRate 0.0687   Epoch: 3   Global Step: 19470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:40:23,196-Speed 5702.41 samples/sec   Loss 8.2338   LearningRate 0.0687   Epoch: 3   Global Step: 19480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:40:25,007-Speed 5660.23 samples/sec   Loss 8.1274   LearningRate 0.0687   Epoch: 3   Global Step: 19490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:40:26,812-Speed 5674.22 samples/sec   Loss 8.3413   LearningRate 0.0686   Epoch: 3   Global Step: 19500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:40:28,613-Speed 5687.52 samples/sec   Loss 8.4609   LearningRate 0.0686   Epoch: 3   Global Step: 19510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:40:30,405-Speed 5717.19 samples/sec   Loss 8.2033   LearningRate 0.0686   Epoch: 3   Global Step: 19520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:40:32,236-Speed 5595.20 samples/sec   Loss 8.3195   LearningRate 0.0686   Epoch: 3   Global Step: 19530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:40:34,040-Speed 5680.37 samples/sec   Loss 8.1194   LearningRate 0.0686   Epoch: 3   Global Step: 19540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:35,846-Speed 5671.50 samples/sec   Loss 8.2669   LearningRate 0.0686   Epoch: 3   Global Step: 19550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:37,679-Speed 5589.92 samples/sec   Loss 8.2865   LearningRate 0.0686   Epoch: 3   Global Step: 19560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:39,494-Speed 5645.45 samples/sec   Loss 8.2417   LearningRate 0.0685   Epoch: 3   Global Step: 19570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:41,297-Speed 5682.78 samples/sec   Loss 8.2778   LearningRate 0.0685   Epoch: 3   Global Step: 19580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:43,121-Speed 5618.00 samples/sec   Loss 8.2448   LearningRate 0.0685   Epoch: 3   Global Step: 19590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:44,929-Speed 5665.08 samples/sec   Loss 8.3075   LearningRate 0.0685   Epoch: 3   Global Step: 19600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:46,740-Speed 5657.03 samples/sec   Loss 8.4955   LearningRate 0.0685   Epoch: 3   Global Step: 19610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:48,539-Speed 5695.49 samples/sec   Loss 8.2020   LearningRate 0.0685   Epoch: 3   Global Step: 19620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:50,356-Speed 5637.21 samples/sec   Loss 8.2637   LearningRate 0.0685   Epoch: 3   Global Step: 19630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:52,158-Speed 5686.78 samples/sec   Loss 8.4194   LearningRate 0.0684   Epoch: 3   Global Step: 19640   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:40:53,950-Speed 5718.40 samples/sec   Loss 8.3417   LearningRate 0.0684   Epoch: 3   Global Step: 19650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:55,797-Speed 5545.19 samples/sec   Loss 8.4313   LearningRate 0.0684   Epoch: 3   Global Step: 19660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:57,616-Speed 5635.18 samples/sec   Loss 8.1371   LearningRate 0.0684   Epoch: 3   Global Step: 19670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:40:59,445-Speed 5602.59 samples/sec   Loss 8.3830   LearningRate 0.0684   Epoch: 3   Global Step: 19680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:01,283-Speed 5573.92 samples/sec   Loss 8.2642   LearningRate 0.0684   Epoch: 3   Global Step: 19690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:03,082-Speed 5693.86 samples/sec   Loss 8.1585   LearningRate 0.0684   Epoch: 3   Global Step: 19700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:04,918-Speed 5580.75 samples/sec   Loss 8.3060   LearningRate 0.0683   Epoch: 3   Global Step: 19710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:06,711-Speed 5711.71 samples/sec   Loss 8.1798   LearningRate 0.0683   Epoch: 3   Global Step: 19720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:08,528-Speed 5639.30 samples/sec   Loss 8.3373   LearningRate 0.0683   Epoch: 3   Global Step: 19730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:10,336-Speed 5667.43 samples/sec   Loss 8.2076   LearningRate 0.0683   Epoch: 3   Global Step: 19740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:12,143-Speed 5670.11 samples/sec   Loss 8.2276   LearningRate 0.0683   Epoch: 3   Global Step: 19750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:13,982-Speed 5569.20 samples/sec   Loss 8.2306   LearningRate 0.0683   Epoch: 3   Global Step: 19760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:15,806-Speed 5617.68 samples/sec   Loss 8.1713   LearningRate 0.0683   Epoch: 3   Global Step: 19770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:17,616-Speed 5660.76 samples/sec   Loss 8.4062   LearningRate 0.0682   Epoch: 3   Global Step: 19780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:19,461-Speed 5550.84 samples/sec   Loss 8.2956   LearningRate 0.0682   Epoch: 3   Global Step: 19790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:21,266-Speed 5674.66 samples/sec   Loss 8.3398   LearningRate 0.0682   Epoch: 3   Global Step: 19800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:23,090-Speed 5617.83 samples/sec   Loss 8.4015   LearningRate 0.0682   Epoch: 3   Global Step: 19810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:24,880-Speed 5724.90 samples/sec   Loss 8.2963   LearningRate 0.0682   Epoch: 3   Global Step: 19820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:41:26,701-Speed 5625.81 samples/sec   Loss 8.1786   LearningRate 0.0682   Epoch: 3   Global Step: 19830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:41:28,516-Speed 5644.79 samples/sec   Loss 8.4416   LearningRate 0.0682   Epoch: 3   Global Step: 19840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:41:30,324-Speed 5667.03 samples/sec   Loss 8.1953   LearningRate 0.0681   Epoch: 3   Global Step: 19850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:41:32,144-Speed 5628.59 samples/sec   Loss 8.2384   LearningRate 0.0681   Epoch: 3   Global Step: 19860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:41:34,017-Speed 5468.95 samples/sec   Loss 8.2379   LearningRate 0.0681   Epoch: 3   Global Step: 19870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:41:35,812-Speed 5710.42 samples/sec   Loss 8.2654   LearningRate 0.0681   Epoch: 3   Global Step: 19880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:41:37,638-Speed 5610.06 samples/sec   Loss 8.3618   LearningRate 0.0681   Epoch: 3   Global Step: 19890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:41:39,443-Speed 5674.46 samples/sec   Loss 8.2822   LearningRate 0.0681   Epoch: 3   Global Step: 19900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:41:41,251-Speed 5666.34 samples/sec   Loss 8.0920   LearningRate 0.0680   Epoch: 3   Global Step: 19910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:41:43,096-Speed 5552.99 samples/sec   Loss 8.2786   LearningRate 0.0680   Epoch: 3   Global Step: 19920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:44,930-Speed 5586.41 samples/sec   Loss 8.1576   LearningRate 0.0680   Epoch: 3   Global Step: 19930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:46,773-Speed 5558.93 samples/sec   Loss 8.0952   LearningRate 0.0680   Epoch: 3   Global Step: 19940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:48,611-Speed 5574.95 samples/sec   Loss 8.2131   LearningRate 0.0680   Epoch: 3   Global Step: 19950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:50,449-Speed 5571.40 samples/sec   Loss 8.1865   LearningRate 0.0680   Epoch: 3   Global Step: 19960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:52,264-Speed 5644.91 samples/sec   Loss 8.2419   LearningRate 0.0680   Epoch: 3   Global Step: 19970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:54,070-Speed 5673.64 samples/sec   Loss 8.2091   LearningRate 0.0679   Epoch: 3   Global Step: 19980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:55,861-Speed 5720.25 samples/sec   Loss 8.2650   LearningRate 0.0679   Epoch: 3   Global Step: 19990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:41:57,655-Speed 5712.02 samples/sec   Loss 8.3071   LearningRate 0.0679   Epoch: 3   Global Step: 20000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:42:24,579-[lfw][20000]XNorm: 23.215628
Training: 2022-04-27 02:42:24,580-[lfw][20000]Accuracy-Flip: 0.99500+-0.00387
Training: 2022-04-27 02:42:24,581-[lfw][20000]Accuracy-Highest: 0.99617
Training: 2022-04-27 02:42:55,768-[cfp_fp][20000]XNorm: 20.358291
Training: 2022-04-27 02:42:55,769-[cfp_fp][20000]Accuracy-Flip: 0.91500+-0.01674
Training: 2022-04-27 02:42:55,770-[cfp_fp][20000]Accuracy-Highest: 0.92343
Training: 2022-04-27 02:43:22,730-[agedb_30][20000]XNorm: 23.245018
Training: 2022-04-27 02:43:22,731-[agedb_30][20000]Accuracy-Flip: 0.96867+-0.01082
Training: 2022-04-27 02:43:22,732-[agedb_30][20000]Accuracy-Highest: 0.96867
Training: 2022-04-27 02:43:24,573-Speed 117.81 samples/sec   Loss 8.2992   LearningRate 0.0679   Epoch: 3   Global Step: 20010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:43:26,368-Speed 5707.58 samples/sec   Loss 8.1662   LearningRate 0.0679   Epoch: 3   Global Step: 20020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:43:28,174-Speed 5671.21 samples/sec   Loss 8.1514   LearningRate 0.0679   Epoch: 3   Global Step: 20030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:43:29,997-Speed 5623.00 samples/sec   Loss 8.1965   LearningRate 0.0679   Epoch: 3   Global Step: 20040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:43:31,836-Speed 5574.94 samples/sec   Loss 8.3157   LearningRate 0.0678   Epoch: 3   Global Step: 20050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:43:33,640-Speed 5680.30 samples/sec   Loss 8.2701   LearningRate 0.0678   Epoch: 3   Global Step: 20060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:43:35,448-Speed 5666.41 samples/sec   Loss 8.2319   LearningRate 0.0678   Epoch: 3   Global Step: 20070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:43:37,247-Speed 5693.21 samples/sec   Loss 8.0376   LearningRate 0.0678   Epoch: 3   Global Step: 20080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:43:39,029-Speed 5749.61 samples/sec   Loss 8.1342   LearningRate 0.0678   Epoch: 3   Global Step: 20090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:43:40,861-Speed 5592.42 samples/sec   Loss 8.1907   LearningRate 0.0678   Epoch: 3   Global Step: 20100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:43:42,677-Speed 5641.85 samples/sec   Loss 8.1161   LearningRate 0.0678   Epoch: 3   Global Step: 20110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:43:44,489-Speed 5655.27 samples/sec   Loss 8.2633   LearningRate 0.0677   Epoch: 3   Global Step: 20120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:43:46,303-Speed 5648.35 samples/sec   Loss 8.3329   LearningRate 0.0677   Epoch: 3   Global Step: 20130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:43:48,111-Speed 5664.92 samples/sec   Loss 8.1445   LearningRate 0.0677   Epoch: 3   Global Step: 20140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:43:49,910-Speed 5696.59 samples/sec   Loss 8.1232   LearningRate 0.0677   Epoch: 3   Global Step: 20150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:43:51,760-Speed 5536.35 samples/sec   Loss 8.2375   LearningRate 0.0677   Epoch: 3   Global Step: 20160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:43:53,579-Speed 5634.19 samples/sec   Loss 8.2744   LearningRate 0.0677   Epoch: 3   Global Step: 20170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:43:55,416-Speed 5577.58 samples/sec   Loss 8.2870   LearningRate 0.0677   Epoch: 3   Global Step: 20180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:43:57,220-Speed 5679.11 samples/sec   Loss 8.3100   LearningRate 0.0676   Epoch: 3   Global Step: 20190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:43:59,039-Speed 5632.62 samples/sec   Loss 8.3571   LearningRate 0.0676   Epoch: 3   Global Step: 20200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:00,837-Speed 5697.11 samples/sec   Loss 8.1697   LearningRate 0.0676   Epoch: 3   Global Step: 20210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:02,658-Speed 5625.67 samples/sec   Loss 8.2149   LearningRate 0.0676   Epoch: 3   Global Step: 20220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:04,493-Speed 5585.09 samples/sec   Loss 8.2279   LearningRate 0.0676   Epoch: 3   Global Step: 20230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:06,309-Speed 5638.85 samples/sec   Loss 8.3107   LearningRate 0.0676   Epoch: 3   Global Step: 20240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:08,135-Speed 5613.51 samples/sec   Loss 8.2085   LearningRate 0.0676   Epoch: 3   Global Step: 20250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:09,973-Speed 5571.95 samples/sec   Loss 8.1169   LearningRate 0.0675   Epoch: 3   Global Step: 20260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:11,784-Speed 5657.95 samples/sec   Loss 7.9985   LearningRate 0.0675   Epoch: 3   Global Step: 20270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:13,590-Speed 5675.23 samples/sec   Loss 8.2000   LearningRate 0.0675   Epoch: 3   Global Step: 20280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:15,400-Speed 5661.17 samples/sec   Loss 8.1246   LearningRate 0.0675   Epoch: 3   Global Step: 20290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:17,206-Speed 5671.17 samples/sec   Loss 8.1141   LearningRate 0.0675   Epoch: 3   Global Step: 20300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:19,016-Speed 5658.76 samples/sec   Loss 8.1518   LearningRate 0.0675   Epoch: 3   Global Step: 20310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:20,865-Speed 5542.07 samples/sec   Loss 8.0869   LearningRate 0.0675   Epoch: 3   Global Step: 20320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:22,687-Speed 5624.56 samples/sec   Loss 8.0263   LearningRate 0.0674   Epoch: 3   Global Step: 20330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:24,505-Speed 5636.30 samples/sec   Loss 8.1434   LearningRate 0.0674   Epoch: 3   Global Step: 20340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:26,315-Speed 5657.89 samples/sec   Loss 8.2738   LearningRate 0.0674   Epoch: 3   Global Step: 20350   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:44:28,102-Speed 5735.94 samples/sec   Loss 8.1815   LearningRate 0.0674   Epoch: 3   Global Step: 20360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:29,910-Speed 5665.04 samples/sec   Loss 8.0499   LearningRate 0.0674   Epoch: 3   Global Step: 20370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:31,717-Speed 5669.25 samples/sec   Loss 8.0866   LearningRate 0.0674   Epoch: 3   Global Step: 20380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:33,510-Speed 5714.04 samples/sec   Loss 8.1325   LearningRate 0.0674   Epoch: 3   Global Step: 20390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:35,303-Speed 5713.12 samples/sec   Loss 8.0841   LearningRate 0.0673   Epoch: 3   Global Step: 20400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:37,102-Speed 5697.52 samples/sec   Loss 8.2527   LearningRate 0.0673   Epoch: 3   Global Step: 20410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:38,899-Speed 5698.29 samples/sec   Loss 7.9853   LearningRate 0.0673   Epoch: 3   Global Step: 20420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:40,709-Speed 5660.80 samples/sec   Loss 8.3250   LearningRate 0.0673   Epoch: 3   Global Step: 20430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:42,505-Speed 5704.50 samples/sec   Loss 8.2201   LearningRate 0.0673   Epoch: 3   Global Step: 20440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:44,303-Speed 5698.55 samples/sec   Loss 8.3665   LearningRate 0.0673   Epoch: 3   Global Step: 20450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:46,087-Speed 5755.45 samples/sec   Loss 8.1032   LearningRate 0.0673   Epoch: 3   Global Step: 20460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:47,906-Speed 5633.54 samples/sec   Loss 8.1148   LearningRate 0.0672   Epoch: 3   Global Step: 20470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:49,719-Speed 5651.27 samples/sec   Loss 8.1558   LearningRate 0.0672   Epoch: 3   Global Step: 20480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:44:51,550-Speed 5594.90 samples/sec   Loss 8.1487   LearningRate 0.0672   Epoch: 3   Global Step: 20490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:44:53,365-Speed 5646.72 samples/sec   Loss 7.9561   LearningRate 0.0672   Epoch: 3   Global Step: 20500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:44:55,167-Speed 5684.52 samples/sec   Loss 8.2604   LearningRate 0.0672   Epoch: 3   Global Step: 20510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:44:56,997-Speed 5600.18 samples/sec   Loss 8.1160   LearningRate 0.0672   Epoch: 3   Global Step: 20520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:44:58,807-Speed 5658.09 samples/sec   Loss 8.1598   LearningRate 0.0672   Epoch: 3   Global Step: 20530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:00,618-Speed 5659.00 samples/sec   Loss 8.1716   LearningRate 0.0671   Epoch: 3   Global Step: 20540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:02,440-Speed 5622.25 samples/sec   Loss 8.1622   LearningRate 0.0671   Epoch: 3   Global Step: 20550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:04,245-Speed 5676.33 samples/sec   Loss 8.2538   LearningRate 0.0671   Epoch: 3   Global Step: 20560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:06,045-Speed 5689.73 samples/sec   Loss 8.0664   LearningRate 0.0671   Epoch: 3   Global Step: 20570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:07,861-Speed 5641.17 samples/sec   Loss 8.2742   LearningRate 0.0671   Epoch: 3   Global Step: 20580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:09,694-Speed 5590.30 samples/sec   Loss 8.0579   LearningRate 0.0671   Epoch: 3   Global Step: 20590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:11,507-Speed 5651.84 samples/sec   Loss 8.2000   LearningRate 0.0671   Epoch: 3   Global Step: 20600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:13,293-Speed 5740.20 samples/sec   Loss 8.2802   LearningRate 0.0670   Epoch: 3   Global Step: 20610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:15,088-Speed 5707.89 samples/sec   Loss 8.2470   LearningRate 0.0670   Epoch: 3   Global Step: 20620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:16,882-Speed 5716.56 samples/sec   Loss 8.2000   LearningRate 0.0670   Epoch: 3   Global Step: 20630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:18,700-Speed 5634.30 samples/sec   Loss 8.3014   LearningRate 0.0670   Epoch: 3   Global Step: 20640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:20,515-Speed 5643.77 samples/sec   Loss 8.3213   LearningRate 0.0670   Epoch: 3   Global Step: 20650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:22,328-Speed 5653.58 samples/sec   Loss 8.3755   LearningRate 0.0670   Epoch: 3   Global Step: 20660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:24,129-Speed 5687.80 samples/sec   Loss 8.2022   LearningRate 0.0670   Epoch: 3   Global Step: 20670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:25,928-Speed 5694.21 samples/sec   Loss 8.3416   LearningRate 0.0669   Epoch: 3   Global Step: 20680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:27,752-Speed 5617.76 samples/sec   Loss 8.2834   LearningRate 0.0669   Epoch: 3   Global Step: 20690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:29,557-Speed 5677.05 samples/sec   Loss 8.1810   LearningRate 0.0669   Epoch: 3   Global Step: 20700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:31,374-Speed 5637.69 samples/sec   Loss 8.1870   LearningRate 0.0669   Epoch: 3   Global Step: 20710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:33,183-Speed 5663.14 samples/sec   Loss 8.1230   LearningRate 0.0669   Epoch: 3   Global Step: 20720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:45:34,995-Speed 5653.93 samples/sec   Loss 8.0577   LearningRate 0.0669   Epoch: 3   Global Step: 20730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:36,799-Speed 5676.86 samples/sec   Loss 8.2390   LearningRate 0.0669   Epoch: 3   Global Step: 20740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:38,609-Speed 5659.12 samples/sec   Loss 8.2185   LearningRate 0.0668   Epoch: 3   Global Step: 20750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:40,411-Speed 5684.99 samples/sec   Loss 8.2117   LearningRate 0.0668   Epoch: 3   Global Step: 20760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:42,236-Speed 5613.08 samples/sec   Loss 8.0580   LearningRate 0.0668   Epoch: 3   Global Step: 20770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:44,050-Speed 5649.17 samples/sec   Loss 8.1904   LearningRate 0.0668   Epoch: 3   Global Step: 20780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:45,865-Speed 5644.00 samples/sec   Loss 8.1065   LearningRate 0.0668   Epoch: 3   Global Step: 20790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:47,670-Speed 5675.63 samples/sec   Loss 8.0697   LearningRate 0.0668   Epoch: 3   Global Step: 20800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:49,480-Speed 5661.36 samples/sec   Loss 8.2632   LearningRate 0.0667   Epoch: 3   Global Step: 20810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:51,293-Speed 5647.73 samples/sec   Loss 8.1320   LearningRate 0.0667   Epoch: 3   Global Step: 20820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:53,087-Speed 5716.25 samples/sec   Loss 8.2437   LearningRate 0.0667   Epoch: 3   Global Step: 20830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:54,925-Speed 5573.57 samples/sec   Loss 8.2515   LearningRate 0.0667   Epoch: 3   Global Step: 20840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:56,729-Speed 5678.06 samples/sec   Loss 8.1505   LearningRate 0.0667   Epoch: 3   Global Step: 20850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:45:58,525-Speed 5703.59 samples/sec   Loss 8.1271   LearningRate 0.0667   Epoch: 3   Global Step: 20860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:00,337-Speed 5656.75 samples/sec   Loss 8.1901   LearningRate 0.0667   Epoch: 3   Global Step: 20870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:02,158-Speed 5627.40 samples/sec   Loss 8.0976   LearningRate 0.0666   Epoch: 3   Global Step: 20880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:03,986-Speed 5605.78 samples/sec   Loss 8.2696   LearningRate 0.0666   Epoch: 3   Global Step: 20890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:05,798-Speed 5653.28 samples/sec   Loss 8.1088   LearningRate 0.0666   Epoch: 3   Global Step: 20900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:07,620-Speed 5622.69 samples/sec   Loss 8.0000   LearningRate 0.0666   Epoch: 3   Global Step: 20910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:09,436-Speed 5643.32 samples/sec   Loss 8.2603   LearningRate 0.0666   Epoch: 3   Global Step: 20920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:11,304-Speed 5484.14 samples/sec   Loss 8.2375   LearningRate 0.0666   Epoch: 3   Global Step: 20930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:13,112-Speed 5668.22 samples/sec   Loss 8.0563   LearningRate 0.0666   Epoch: 3   Global Step: 20940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:14,933-Speed 5625.39 samples/sec   Loss 8.1153   LearningRate 0.0665   Epoch: 3   Global Step: 20950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:16,740-Speed 5670.86 samples/sec   Loss 8.0762   LearningRate 0.0665   Epoch: 3   Global Step: 20960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:18,551-Speed 5655.30 samples/sec   Loss 8.0670   LearningRate 0.0665   Epoch: 3   Global Step: 20970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:20,352-Speed 5689.06 samples/sec   Loss 8.0007   LearningRate 0.0665   Epoch: 3   Global Step: 20980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:22,153-Speed 5689.11 samples/sec   Loss 8.1004   LearningRate 0.0665   Epoch: 3   Global Step: 20990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:23,999-Speed 5550.35 samples/sec   Loss 8.2068   LearningRate 0.0665   Epoch: 3   Global Step: 21000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:25,792-Speed 5712.94 samples/sec   Loss 8.0037   LearningRate 0.0665   Epoch: 3   Global Step: 21010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:27,615-Speed 5619.62 samples/sec   Loss 7.9615   LearningRate 0.0664   Epoch: 3   Global Step: 21020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:29,427-Speed 5657.19 samples/sec   Loss 7.9987   LearningRate 0.0664   Epoch: 3   Global Step: 21030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:31,226-Speed 5694.69 samples/sec   Loss 8.1091   LearningRate 0.0664   Epoch: 3   Global Step: 21040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:33,042-Speed 5640.67 samples/sec   Loss 8.0157   LearningRate 0.0664   Epoch: 3   Global Step: 21050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:34,858-Speed 5640.43 samples/sec   Loss 8.0764   LearningRate 0.0664   Epoch: 3   Global Step: 21060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:36,666-Speed 5667.02 samples/sec   Loss 8.1555   LearningRate 0.0664   Epoch: 3   Global Step: 21070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:38,477-Speed 5657.76 samples/sec   Loss 8.0777   LearningRate 0.0664   Epoch: 3   Global Step: 21080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:40,300-Speed 5620.33 samples/sec   Loss 8.0800   LearningRate 0.0663   Epoch: 3   Global Step: 21090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:42,113-Speed 5649.63 samples/sec   Loss 8.1122   LearningRate 0.0663   Epoch: 3   Global Step: 21100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:46:43,895-Speed 5748.85 samples/sec   Loss 8.2092   LearningRate 0.0663   Epoch: 3   Global Step: 21110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:45,736-Speed 5566.15 samples/sec   Loss 8.1133   LearningRate 0.0663   Epoch: 3   Global Step: 21120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:47,542-Speed 5671.46 samples/sec   Loss 8.3125   LearningRate 0.0663   Epoch: 3   Global Step: 21130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:49,341-Speed 5696.43 samples/sec   Loss 8.2305   LearningRate 0.0663   Epoch: 3   Global Step: 21140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:51,154-Speed 5649.29 samples/sec   Loss 8.0413   LearningRate 0.0663   Epoch: 3   Global Step: 21150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:52,960-Speed 5672.43 samples/sec   Loss 8.2204   LearningRate 0.0662   Epoch: 3   Global Step: 21160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:54,781-Speed 5626.70 samples/sec   Loss 8.0946   LearningRate 0.0662   Epoch: 3   Global Step: 21170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:56,592-Speed 5658.14 samples/sec   Loss 8.1181   LearningRate 0.0662   Epoch: 3   Global Step: 21180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:46:58,398-Speed 5672.18 samples/sec   Loss 7.9500   LearningRate 0.0662   Epoch: 3   Global Step: 21190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:00,215-Speed 5641.03 samples/sec   Loss 8.0347   LearningRate 0.0662   Epoch: 3   Global Step: 21200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:02,026-Speed 5658.09 samples/sec   Loss 8.1101   LearningRate 0.0662   Epoch: 3   Global Step: 21210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:47:03,859-Speed 5585.96 samples/sec   Loss 8.0024   LearningRate 0.0662   Epoch: 3   Global Step: 21220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:47:05,665-Speed 5674.23 samples/sec   Loss 8.0278   LearningRate 0.0661   Epoch: 3   Global Step: 21230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:47:07,459-Speed 5711.71 samples/sec   Loss 8.1477   LearningRate 0.0661   Epoch: 3   Global Step: 21240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:47:09,255-Speed 5702.71 samples/sec   Loss 8.0722   LearningRate 0.0661   Epoch: 3   Global Step: 21250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:47:11,069-Speed 5646.66 samples/sec   Loss 8.1638   LearningRate 0.0661   Epoch: 3   Global Step: 21260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:47:12,865-Speed 5707.78 samples/sec   Loss 8.1328   LearningRate 0.0661   Epoch: 3   Global Step: 21270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:47:14,672-Speed 5670.13 samples/sec   Loss 7.9412   LearningRate 0.0661   Epoch: 3   Global Step: 21280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:47:16,497-Speed 5612.60 samples/sec   Loss 8.1682   LearningRate 0.0661   Epoch: 3   Global Step: 21290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:47:18,307-Speed 5658.50 samples/sec   Loss 8.3039   LearningRate 0.0660   Epoch: 3   Global Step: 21300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:47:20,105-Speed 5698.78 samples/sec   Loss 7.9448   LearningRate 0.0660   Epoch: 3   Global Step: 21310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:47:21,923-Speed 5635.25 samples/sec   Loss 8.1144   LearningRate 0.0660   Epoch: 3   Global Step: 21320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:47:23,740-Speed 5637.72 samples/sec   Loss 8.1045   LearningRate 0.0660   Epoch: 3   Global Step: 21330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:47:25,524-Speed 5743.99 samples/sec   Loss 7.9938   LearningRate 0.0660   Epoch: 3   Global Step: 21340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:27,323-Speed 5696.87 samples/sec   Loss 8.1015   LearningRate 0.0660   Epoch: 3   Global Step: 21350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:29,154-Speed 5595.70 samples/sec   Loss 8.0468   LearningRate 0.0660   Epoch: 3   Global Step: 21360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:30,956-Speed 5685.43 samples/sec   Loss 7.8915   LearningRate 0.0659   Epoch: 3   Global Step: 21370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:32,745-Speed 5724.79 samples/sec   Loss 8.0263   LearningRate 0.0659   Epoch: 3   Global Step: 21380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:34,560-Speed 5643.15 samples/sec   Loss 8.2251   LearningRate 0.0659   Epoch: 3   Global Step: 21390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:36,403-Speed 5558.56 samples/sec   Loss 8.2240   LearningRate 0.0659   Epoch: 3   Global Step: 21400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:38,228-Speed 5613.72 samples/sec   Loss 8.0505   LearningRate 0.0659   Epoch: 3   Global Step: 21410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:40,039-Speed 5655.67 samples/sec   Loss 8.1125   LearningRate 0.0659   Epoch: 3   Global Step: 21420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:41,866-Speed 5607.08 samples/sec   Loss 8.0706   LearningRate 0.0659   Epoch: 3   Global Step: 21430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:43,659-Speed 5714.74 samples/sec   Loss 8.1256   LearningRate 0.0658   Epoch: 3   Global Step: 21440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:45,465-Speed 5671.77 samples/sec   Loss 8.1125   LearningRate 0.0658   Epoch: 3   Global Step: 21450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:47,287-Speed 5624.07 samples/sec   Loss 8.1584   LearningRate 0.0658   Epoch: 3   Global Step: 21460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:49,134-Speed 5546.00 samples/sec   Loss 8.1597   LearningRate 0.0658   Epoch: 3   Global Step: 21470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:50,964-Speed 5596.98 samples/sec   Loss 8.1169   LearningRate 0.0658   Epoch: 3   Global Step: 21480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:52,810-Speed 5552.18 samples/sec   Loss 8.1378   LearningRate 0.0658   Epoch: 3   Global Step: 21490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:54,621-Speed 5655.68 samples/sec   Loss 8.1384   LearningRate 0.0658   Epoch: 3   Global Step: 21500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:56,450-Speed 5599.55 samples/sec   Loss 8.0643   LearningRate 0.0657   Epoch: 3   Global Step: 21510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:47:58,291-Speed 5566.14 samples/sec   Loss 8.0340   LearningRate 0.0657   Epoch: 3   Global Step: 21520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:48:00,108-Speed 5636.08 samples/sec   Loss 8.2064   LearningRate 0.0657   Epoch: 3   Global Step: 21530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:48:01,914-Speed 5674.50 samples/sec   Loss 8.0848   LearningRate 0.0657   Epoch: 3   Global Step: 21540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:03,719-Speed 5673.19 samples/sec   Loss 8.1031   LearningRate 0.0657   Epoch: 3   Global Step: 21550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:05,535-Speed 5643.40 samples/sec   Loss 8.1332   LearningRate 0.0657   Epoch: 3   Global Step: 21560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:07,343-Speed 5666.19 samples/sec   Loss 7.9857   LearningRate 0.0657   Epoch: 3   Global Step: 21570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:09,158-Speed 5642.08 samples/sec   Loss 8.0518   LearningRate 0.0656   Epoch: 3   Global Step: 21580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:10,970-Speed 5652.31 samples/sec   Loss 8.1051   LearningRate 0.0656   Epoch: 3   Global Step: 21590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:12,782-Speed 5653.86 samples/sec   Loss 8.0938   LearningRate 0.0656   Epoch: 3   Global Step: 21600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:14,593-Speed 5656.26 samples/sec   Loss 7.9399   LearningRate 0.0656   Epoch: 3   Global Step: 21610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:16,425-Speed 5592.83 samples/sec   Loss 8.0820   LearningRate 0.0656   Epoch: 3   Global Step: 21620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:18,260-Speed 5580.49 samples/sec   Loss 8.0086   LearningRate 0.0656   Epoch: 3   Global Step: 21630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:20,082-Speed 5623.15 samples/sec   Loss 7.9960   LearningRate 0.0656   Epoch: 3   Global Step: 21640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:21,893-Speed 5658.08 samples/sec   Loss 8.0889   LearningRate 0.0655   Epoch: 3   Global Step: 21650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:23,739-Speed 5549.54 samples/sec   Loss 7.9211   LearningRate 0.0655   Epoch: 3   Global Step: 21660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:25,555-Speed 5643.29 samples/sec   Loss 7.8642   LearningRate 0.0655   Epoch: 3   Global Step: 21670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:27,416-Speed 5504.78 samples/sec   Loss 7.9165   LearningRate 0.0655   Epoch: 3   Global Step: 21680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:29,239-Speed 5617.43 samples/sec   Loss 8.1743   LearningRate 0.0655   Epoch: 3   Global Step: 21690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:31,103-Speed 5498.64 samples/sec   Loss 7.9192   LearningRate 0.0655   Epoch: 3   Global Step: 21700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:32,965-Speed 5501.31 samples/sec   Loss 8.1476   LearningRate 0.0655   Epoch: 3   Global Step: 21710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:34,787-Speed 5624.37 samples/sec   Loss 7.9673   LearningRate 0.0654   Epoch: 3   Global Step: 21720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:36,602-Speed 5643.33 samples/sec   Loss 8.0389   LearningRate 0.0654   Epoch: 3   Global Step: 21730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:38,402-Speed 5694.71 samples/sec   Loss 7.9996   LearningRate 0.0654   Epoch: 3   Global Step: 21740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:40,204-Speed 5688.01 samples/sec   Loss 7.9005   LearningRate 0.0654   Epoch: 3   Global Step: 21750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:42,037-Speed 5588.16 samples/sec   Loss 8.0956   LearningRate 0.0654   Epoch: 3   Global Step: 21760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:43,876-Speed 5571.27 samples/sec   Loss 8.2515   LearningRate 0.0654   Epoch: 3   Global Step: 21770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:46,034-Speed 4747.55 samples/sec   Loss 8.0400   LearningRate 0.0654   Epoch: 3   Global Step: 21780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:47,861-Speed 5608.54 samples/sec   Loss 8.1060   LearningRate 0.0653   Epoch: 3   Global Step: 21790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:49,710-Speed 5540.06 samples/sec   Loss 8.0499   LearningRate 0.0653   Epoch: 3   Global Step: 21800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:51,566-Speed 5521.97 samples/sec   Loss 8.1543   LearningRate 0.0653   Epoch: 3   Global Step: 21810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:53,382-Speed 5642.44 samples/sec   Loss 7.9753   LearningRate 0.0653   Epoch: 3   Global Step: 21820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:55,216-Speed 5584.58 samples/sec   Loss 8.0087   LearningRate 0.0653   Epoch: 3   Global Step: 21830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:48:57,022-Speed 5672.86 samples/sec   Loss 7.9673   LearningRate 0.0653   Epoch: 3   Global Step: 21840   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:48:58,853-Speed 5594.66 samples/sec   Loss 8.0376   LearningRate 0.0653   Epoch: 3   Global Step: 21850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:00,695-Speed 5562.44 samples/sec   Loss 7.9752   LearningRate 0.0652   Epoch: 3   Global Step: 21860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:02,527-Speed 5592.31 samples/sec   Loss 7.9977   LearningRate 0.0652   Epoch: 3   Global Step: 21870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:04,326-Speed 5695.42 samples/sec   Loss 8.0035   LearningRate 0.0652   Epoch: 3   Global Step: 21880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:06,162-Speed 5580.46 samples/sec   Loss 7.9215   LearningRate 0.0652   Epoch: 3   Global Step: 21890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:07,984-Speed 5623.04 samples/sec   Loss 7.9236   LearningRate 0.0652   Epoch: 3   Global Step: 21900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:09,823-Speed 5569.40 samples/sec   Loss 7.9155   LearningRate 0.0652   Epoch: 3   Global Step: 21910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:11,644-Speed 5624.92 samples/sec   Loss 8.0729   LearningRate 0.0652   Epoch: 3   Global Step: 21920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:13,524-Speed 5452.15 samples/sec   Loss 7.9754   LearningRate 0.0652   Epoch: 3   Global Step: 21930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:15,425-Speed 5387.60 samples/sec   Loss 8.0360   LearningRate 0.0651   Epoch: 3   Global Step: 21940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:17,273-Speed 5545.09 samples/sec   Loss 7.9624   LearningRate 0.0651   Epoch: 3   Global Step: 21950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:19,145-Speed 5473.66 samples/sec   Loss 7.8507   LearningRate 0.0651   Epoch: 3   Global Step: 21960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:20,970-Speed 5611.59 samples/sec   Loss 7.9425   LearningRate 0.0651   Epoch: 3   Global Step: 21970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:22,839-Speed 5482.92 samples/sec   Loss 8.0383   LearningRate 0.0651   Epoch: 3   Global Step: 21980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:24,725-Speed 5429.59 samples/sec   Loss 8.1059   LearningRate 0.0651   Epoch: 3   Global Step: 21990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:26,559-Speed 5587.08 samples/sec   Loss 8.1370   LearningRate 0.0651   Epoch: 3   Global Step: 22000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:49:53,492-[lfw][22000]XNorm: 21.071302
Training: 2022-04-27 02:49:53,492-[lfw][22000]Accuracy-Flip: 0.99583+-0.00318
Training: 2022-04-27 02:49:53,493-[lfw][22000]Accuracy-Highest: 0.99617
Training: 2022-04-27 02:50:24,580-[cfp_fp][22000]XNorm: 18.253711
Training: 2022-04-27 02:50:24,581-[cfp_fp][22000]Accuracy-Flip: 0.93171+-0.01137
Training: 2022-04-27 02:50:24,582-[cfp_fp][22000]Accuracy-Highest: 0.93171
Training: 2022-04-27 02:50:51,403-[agedb_30][22000]XNorm: 20.980046
Training: 2022-04-27 02:50:51,404-[agedb_30][22000]Accuracy-Flip: 0.96750+-0.00672
Training: 2022-04-27 02:50:51,404-[agedb_30][22000]Accuracy-Highest: 0.96867
Training: 2022-04-27 02:50:53,209-Speed 118.18 samples/sec   Loss 7.9351   LearningRate 0.0650   Epoch: 3   Global Step: 22010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:50:55,012-Speed 5684.63 samples/sec   Loss 8.0260   LearningRate 0.0650   Epoch: 3   Global Step: 22020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:50:56,850-Speed 5573.02 samples/sec   Loss 7.9518   LearningRate 0.0650   Epoch: 3   Global Step: 22030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:50:58,665-Speed 5643.93 samples/sec   Loss 8.0046   LearningRate 0.0650   Epoch: 3   Global Step: 22040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:00,464-Speed 5700.02 samples/sec   Loss 7.8917   LearningRate 0.0650   Epoch: 3   Global Step: 22050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:02,326-Speed 5504.12 samples/sec   Loss 8.1336   LearningRate 0.0650   Epoch: 3   Global Step: 22060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:04,141-Speed 5641.70 samples/sec   Loss 7.9443   LearningRate 0.0650   Epoch: 3   Global Step: 22070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:05,945-Speed 5678.50 samples/sec   Loss 7.8540   LearningRate 0.0649   Epoch: 3   Global Step: 22080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:07,752-Speed 5670.41 samples/sec   Loss 8.0192   LearningRate 0.0649   Epoch: 3   Global Step: 22090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:09,555-Speed 5679.77 samples/sec   Loss 7.8545   LearningRate 0.0649   Epoch: 3   Global Step: 22100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:11,341-Speed 5737.14 samples/sec   Loss 8.0439   LearningRate 0.0649   Epoch: 3   Global Step: 22110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:13,130-Speed 5725.65 samples/sec   Loss 7.9050   LearningRate 0.0649   Epoch: 3   Global Step: 22120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:14,930-Speed 5695.03 samples/sec   Loss 8.0292   LearningRate 0.0649   Epoch: 3   Global Step: 22130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:16,727-Speed 5698.33 samples/sec   Loss 8.0381   LearningRate 0.0649   Epoch: 3   Global Step: 22140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:18,507-Speed 5755.83 samples/sec   Loss 7.9307   LearningRate 0.0648   Epoch: 3   Global Step: 22150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:20,306-Speed 5695.98 samples/sec   Loss 7.8725   LearningRate 0.0648   Epoch: 3   Global Step: 22160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:22,094-Speed 5727.52 samples/sec   Loss 7.8733   LearningRate 0.0648   Epoch: 3   Global Step: 22170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:23,888-Speed 5712.51 samples/sec   Loss 8.1184   LearningRate 0.0648   Epoch: 3   Global Step: 22180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:25,685-Speed 5699.83 samples/sec   Loss 8.0337   LearningRate 0.0648   Epoch: 3   Global Step: 22190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:27,488-Speed 5682.69 samples/sec   Loss 7.9390   LearningRate 0.0648   Epoch: 3   Global Step: 22200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:29,281-Speed 5711.66 samples/sec   Loss 7.9424   LearningRate 0.0648   Epoch: 3   Global Step: 22210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:31,109-Speed 5607.52 samples/sec   Loss 7.9794   LearningRate 0.0647   Epoch: 3   Global Step: 22220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:32,916-Speed 5669.76 samples/sec   Loss 7.8850   LearningRate 0.0647   Epoch: 3   Global Step: 22230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:34,731-Speed 5643.53 samples/sec   Loss 7.8545   LearningRate 0.0647   Epoch: 3   Global Step: 22240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:36,572-Speed 5565.08 samples/sec   Loss 7.9874   LearningRate 0.0647   Epoch: 3   Global Step: 22250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:38,406-Speed 5583.75 samples/sec   Loss 8.0187   LearningRate 0.0647   Epoch: 3   Global Step: 22260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:40,215-Speed 5664.34 samples/sec   Loss 7.9523   LearningRate 0.0647   Epoch: 3   Global Step: 22270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:42,024-Speed 5660.71 samples/sec   Loss 7.8800   LearningRate 0.0647   Epoch: 3   Global Step: 22280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:51:43,818-Speed 5714.26 samples/sec   Loss 7.9657   LearningRate 0.0646   Epoch: 3   Global Step: 22290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:45,627-Speed 5662.01 samples/sec   Loss 8.0390   LearningRate 0.0646   Epoch: 3   Global Step: 22300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:47,436-Speed 5661.13 samples/sec   Loss 7.8227   LearningRate 0.0646   Epoch: 3   Global Step: 22310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:49,254-Speed 5637.93 samples/sec   Loss 7.8656   LearningRate 0.0646   Epoch: 3   Global Step: 22320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:51,070-Speed 5639.22 samples/sec   Loss 7.9969   LearningRate 0.0646   Epoch: 3   Global Step: 22330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:52,889-Speed 5632.06 samples/sec   Loss 8.0989   LearningRate 0.0646   Epoch: 3   Global Step: 22340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:54,706-Speed 5639.98 samples/sec   Loss 8.1021   LearningRate 0.0646   Epoch: 3   Global Step: 22350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:56,512-Speed 5670.83 samples/sec   Loss 8.0888   LearningRate 0.0645   Epoch: 3   Global Step: 22360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:51:58,371-Speed 5511.77 samples/sec   Loss 8.0878   LearningRate 0.0645   Epoch: 3   Global Step: 22370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:00,185-Speed 5645.65 samples/sec   Loss 7.8903   LearningRate 0.0645   Epoch: 3   Global Step: 22380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:01,999-Speed 5647.26 samples/sec   Loss 7.9637   LearningRate 0.0645   Epoch: 3   Global Step: 22390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:03,804-Speed 5675.12 samples/sec   Loss 7.9979   LearningRate 0.0645   Epoch: 3   Global Step: 22400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:05,597-Speed 5714.89 samples/sec   Loss 7.9487   LearningRate 0.0645   Epoch: 3   Global Step: 22410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:07,418-Speed 5625.10 samples/sec   Loss 8.0887   LearningRate 0.0645   Epoch: 3   Global Step: 22420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:09,207-Speed 5725.09 samples/sec   Loss 7.9272   LearningRate 0.0644   Epoch: 3   Global Step: 22430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:11,026-Speed 5631.69 samples/sec   Loss 7.8922   LearningRate 0.0644   Epoch: 3   Global Step: 22440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:12,855-Speed 5603.53 samples/sec   Loss 8.0858   LearningRate 0.0644   Epoch: 3   Global Step: 22450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:14,676-Speed 5623.64 samples/sec   Loss 8.0021   LearningRate 0.0644   Epoch: 3   Global Step: 22460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:16,485-Speed 5663.15 samples/sec   Loss 7.9994   LearningRate 0.0644   Epoch: 3   Global Step: 22470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:18,284-Speed 5693.25 samples/sec   Loss 7.9966   LearningRate 0.0644   Epoch: 3   Global Step: 22480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:20,089-Speed 5676.60 samples/sec   Loss 7.9319   LearningRate 0.0644   Epoch: 3   Global Step: 22490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:21,910-Speed 5625.10 samples/sec   Loss 7.8838   LearningRate 0.0643   Epoch: 3   Global Step: 22500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:23,709-Speed 5699.81 samples/sec   Loss 8.0393   LearningRate 0.0643   Epoch: 3   Global Step: 22510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:25,522-Speed 5650.08 samples/sec   Loss 7.9606   LearningRate 0.0643   Epoch: 3   Global Step: 22520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:27,346-Speed 5613.81 samples/sec   Loss 7.8463   LearningRate 0.0643   Epoch: 3   Global Step: 22530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:29,168-Speed 5624.48 samples/sec   Loss 7.9099   LearningRate 0.0643   Epoch: 3   Global Step: 22540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:30,975-Speed 5671.43 samples/sec   Loss 7.9384   LearningRate 0.0643   Epoch: 3   Global Step: 22550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:32,788-Speed 5649.03 samples/sec   Loss 7.7676   LearningRate 0.0643   Epoch: 3   Global Step: 22560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:34,593-Speed 5677.72 samples/sec   Loss 7.8974   LearningRate 0.0642   Epoch: 3   Global Step: 22570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:52:36,420-Speed 5605.91 samples/sec   Loss 7.9799   LearningRate 0.0642   Epoch: 3   Global Step: 22580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:38,226-Speed 5672.74 samples/sec   Loss 7.8224   LearningRate 0.0642   Epoch: 3   Global Step: 22590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:40,028-Speed 5684.40 samples/sec   Loss 7.9801   LearningRate 0.0642   Epoch: 3   Global Step: 22600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:41,839-Speed 5656.22 samples/sec   Loss 7.9200   LearningRate 0.0642   Epoch: 3   Global Step: 22610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:43,646-Speed 5671.28 samples/sec   Loss 8.0383   LearningRate 0.0642   Epoch: 3   Global Step: 22620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:45,463-Speed 5636.50 samples/sec   Loss 8.0295   LearningRate 0.0642   Epoch: 3   Global Step: 22630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:47,262-Speed 5694.33 samples/sec   Loss 7.8503   LearningRate 0.0641   Epoch: 3   Global Step: 22640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:49,085-Speed 5619.15 samples/sec   Loss 7.8247   LearningRate 0.0641   Epoch: 3   Global Step: 22650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:50,889-Speed 5679.20 samples/sec   Loss 7.9700   LearningRate 0.0641   Epoch: 3   Global Step: 22660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:52,703-Speed 5646.26 samples/sec   Loss 7.9294   LearningRate 0.0641   Epoch: 3   Global Step: 22670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:54,550-Speed 5547.74 samples/sec   Loss 8.0789   LearningRate 0.0641   Epoch: 3   Global Step: 22680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:56,365-Speed 5645.32 samples/sec   Loss 8.0149   LearningRate 0.0641   Epoch: 3   Global Step: 22690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:58,180-Speed 5646.12 samples/sec   Loss 7.9287   LearningRate 0.0641   Epoch: 3   Global Step: 22700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:52:59,995-Speed 5644.42 samples/sec   Loss 7.9199   LearningRate 0.0640   Epoch: 3   Global Step: 22710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:01,813-Speed 5633.63 samples/sec   Loss 7.9284   LearningRate 0.0640   Epoch: 3   Global Step: 22720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:03,653-Speed 5570.35 samples/sec   Loss 8.0621   LearningRate 0.0640   Epoch: 3   Global Step: 22730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:05,514-Speed 5503.26 samples/sec   Loss 7.9205   LearningRate 0.0640   Epoch: 3   Global Step: 22740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:19,352-Speed 740.05 samples/sec   Loss 7.5249   LearningRate 0.0640   Epoch: 4   Global Step: 22750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:21,381-Speed 5051.12 samples/sec   Loss 7.3209   LearningRate 0.0640   Epoch: 4   Global Step: 22760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:23,200-Speed 5631.33 samples/sec   Loss 7.3136   LearningRate 0.0640   Epoch: 4   Global Step: 22770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:25,032-Speed 5595.83 samples/sec   Loss 7.2668   LearningRate 0.0639   Epoch: 4   Global Step: 22780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:53:26,878-Speed 5550.06 samples/sec   Loss 7.1854   LearningRate 0.0639   Epoch: 4   Global Step: 22790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:53:28,709-Speed 5592.27 samples/sec   Loss 7.1962   LearningRate 0.0639   Epoch: 4   Global Step: 22800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:53:30,550-Speed 5568.78 samples/sec   Loss 7.2275   LearningRate 0.0639   Epoch: 4   Global Step: 22810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:53:32,387-Speed 5577.11 samples/sec   Loss 7.2536   LearningRate 0.0639   Epoch: 4   Global Step: 22820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:53:34,246-Speed 5510.70 samples/sec   Loss 7.1927   LearningRate 0.0639   Epoch: 4   Global Step: 22830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:53:36,061-Speed 5645.20 samples/sec   Loss 7.3272   LearningRate 0.0639   Epoch: 4   Global Step: 22840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:53:37,887-Speed 5612.83 samples/sec   Loss 7.4441   LearningRate 0.0639   Epoch: 4   Global Step: 22850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:39,698-Speed 5658.24 samples/sec   Loss 7.3787   LearningRate 0.0638   Epoch: 4   Global Step: 22860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:41,528-Speed 5597.56 samples/sec   Loss 7.4082   LearningRate 0.0638   Epoch: 4   Global Step: 22870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:43,358-Speed 5597.80 samples/sec   Loss 7.2974   LearningRate 0.0638   Epoch: 4   Global Step: 22880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:45,197-Speed 5572.23 samples/sec   Loss 7.5060   LearningRate 0.0638   Epoch: 4   Global Step: 22890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:47,062-Speed 5493.23 samples/sec   Loss 7.6514   LearningRate 0.0638   Epoch: 4   Global Step: 22900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:48,940-Speed 5457.35 samples/sec   Loss 7.3299   LearningRate 0.0638   Epoch: 4   Global Step: 22910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:50,780-Speed 5568.54 samples/sec   Loss 7.4678   LearningRate 0.0638   Epoch: 4   Global Step: 22920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:52,623-Speed 5557.85 samples/sec   Loss 7.3510   LearningRate 0.0637   Epoch: 4   Global Step: 22930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:54,521-Speed 5398.01 samples/sec   Loss 7.4806   LearningRate 0.0637   Epoch: 4   Global Step: 22940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:53:56,354-Speed 5590.86 samples/sec   Loss 7.5025   LearningRate 0.0637   Epoch: 4   Global Step: 22950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:53:58,189-Speed 5581.60 samples/sec   Loss 7.2343   LearningRate 0.0637   Epoch: 4   Global Step: 22960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:00,009-Speed 5629.65 samples/sec   Loss 7.5409   LearningRate 0.0637   Epoch: 4   Global Step: 22970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:01,870-Speed 5505.15 samples/sec   Loss 7.5068   LearningRate 0.0637   Epoch: 4   Global Step: 22980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:03,717-Speed 5546.64 samples/sec   Loss 7.5921   LearningRate 0.0637   Epoch: 4   Global Step: 22990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:05,539-Speed 5620.61 samples/sec   Loss 7.4046   LearningRate 0.0636   Epoch: 4   Global Step: 23000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:07,388-Speed 5541.63 samples/sec   Loss 7.4683   LearningRate 0.0636   Epoch: 4   Global Step: 23010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:09,264-Speed 5461.71 samples/sec   Loss 7.4562   LearningRate 0.0636   Epoch: 4   Global Step: 23020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:11,077-Speed 5649.88 samples/sec   Loss 7.4856   LearningRate 0.0636   Epoch: 4   Global Step: 23030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:12,889-Speed 5653.27 samples/sec   Loss 7.3914   LearningRate 0.0636   Epoch: 4   Global Step: 23040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:14,724-Speed 5581.49 samples/sec   Loss 7.5212   LearningRate 0.0636   Epoch: 4   Global Step: 23050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:16,551-Speed 5610.68 samples/sec   Loss 7.4898   LearningRate 0.0636   Epoch: 4   Global Step: 23060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:18,391-Speed 5566.80 samples/sec   Loss 7.6000   LearningRate 0.0635   Epoch: 4   Global Step: 23070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:20,226-Speed 5582.78 samples/sec   Loss 7.5457   LearningRate 0.0635   Epoch: 4   Global Step: 23080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:22,061-Speed 5584.12 samples/sec   Loss 7.6638   LearningRate 0.0635   Epoch: 4   Global Step: 23090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:23,870-Speed 5662.87 samples/sec   Loss 7.4534   LearningRate 0.0635   Epoch: 4   Global Step: 23100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:25,672-Speed 5685.69 samples/sec   Loss 7.5348   LearningRate 0.0635   Epoch: 4   Global Step: 23110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:27,497-Speed 5614.78 samples/sec   Loss 7.3965   LearningRate 0.0635   Epoch: 4   Global Step: 23120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:29,323-Speed 5609.10 samples/sec   Loss 7.5741   LearningRate 0.0635   Epoch: 4   Global Step: 23130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:31,159-Speed 5581.35 samples/sec   Loss 7.5774   LearningRate 0.0634   Epoch: 4   Global Step: 23140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:33,000-Speed 5564.12 samples/sec   Loss 7.5949   LearningRate 0.0634   Epoch: 4   Global Step: 23150   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:54:34,801-Speed 5688.04 samples/sec   Loss 7.7766   LearningRate 0.0634   Epoch: 4   Global Step: 23160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:36,622-Speed 5625.83 samples/sec   Loss 7.7392   LearningRate 0.0634   Epoch: 4   Global Step: 23170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:38,453-Speed 5594.66 samples/sec   Loss 7.5544   LearningRate 0.0634   Epoch: 4   Global Step: 23180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:40,310-Speed 5517.79 samples/sec   Loss 7.6138   LearningRate 0.0634   Epoch: 4   Global Step: 23190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:42,148-Speed 5572.34 samples/sec   Loss 7.5442   LearningRate 0.0634   Epoch: 4   Global Step: 23200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:43,953-Speed 5678.53 samples/sec   Loss 7.6814   LearningRate 0.0633   Epoch: 4   Global Step: 23210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:45,781-Speed 5603.12 samples/sec   Loss 7.6475   LearningRate 0.0633   Epoch: 4   Global Step: 23220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:47,611-Speed 5596.93 samples/sec   Loss 7.7440   LearningRate 0.0633   Epoch: 4   Global Step: 23230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:49,425-Speed 5648.03 samples/sec   Loss 7.6875   LearningRate 0.0633   Epoch: 4   Global Step: 23240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:51,255-Speed 5600.12 samples/sec   Loss 7.6415   LearningRate 0.0633   Epoch: 4   Global Step: 23250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:53,078-Speed 5619.52 samples/sec   Loss 7.5112   LearningRate 0.0633   Epoch: 4   Global Step: 23260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:54,930-Speed 5530.64 samples/sec   Loss 7.8367   LearningRate 0.0633   Epoch: 4   Global Step: 23270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:56,777-Speed 5548.04 samples/sec   Loss 7.7091   LearningRate 0.0632   Epoch: 4   Global Step: 23280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:54:58,636-Speed 5509.47 samples/sec   Loss 7.5885   LearningRate 0.0632   Epoch: 4   Global Step: 23290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:00,433-Speed 5700.52 samples/sec   Loss 7.7671   LearningRate 0.0632   Epoch: 4   Global Step: 23300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:02,258-Speed 5614.05 samples/sec   Loss 7.4829   LearningRate 0.0632   Epoch: 4   Global Step: 23310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:04,717-Speed 4166.22 samples/sec   Loss 7.6115   LearningRate 0.0632   Epoch: 4   Global Step: 23320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:07,765-Speed 3360.65 samples/sec   Loss 7.5067   LearningRate 0.0632   Epoch: 4   Global Step: 23330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:09,579-Speed 5646.72 samples/sec   Loss 7.7567   LearningRate 0.0632   Epoch: 4   Global Step: 23340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:11,404-Speed 5614.96 samples/sec   Loss 7.5218   LearningRate 0.0632   Epoch: 4   Global Step: 23350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:13,230-Speed 5609.96 samples/sec   Loss 7.6623   LearningRate 0.0631   Epoch: 4   Global Step: 23360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:15,076-Speed 5548.46 samples/sec   Loss 7.7503   LearningRate 0.0631   Epoch: 4   Global Step: 23370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:16,898-Speed 5624.10 samples/sec   Loss 7.7925   LearningRate 0.0631   Epoch: 4   Global Step: 23380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:18,735-Speed 5574.20 samples/sec   Loss 7.6312   LearningRate 0.0631   Epoch: 4   Global Step: 23390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:20,548-Speed 5651.56 samples/sec   Loss 7.6544   LearningRate 0.0631   Epoch: 4   Global Step: 23400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:22,372-Speed 5616.34 samples/sec   Loss 7.7428   LearningRate 0.0631   Epoch: 4   Global Step: 23410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:24,232-Speed 5507.60 samples/sec   Loss 7.7061   LearningRate 0.0631   Epoch: 4   Global Step: 23420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:26,058-Speed 5608.92 samples/sec   Loss 7.8538   LearningRate 0.0630   Epoch: 4   Global Step: 23430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:27,877-Speed 5633.20 samples/sec   Loss 7.6620   LearningRate 0.0630   Epoch: 4   Global Step: 23440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:29,682-Speed 5673.46 samples/sec   Loss 7.7861   LearningRate 0.0630   Epoch: 4   Global Step: 23450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:31,490-Speed 5666.69 samples/sec   Loss 7.6819   LearningRate 0.0630   Epoch: 4   Global Step: 23460   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 02:55:33,283-Speed 5712.80 samples/sec   Loss 7.7271   LearningRate 0.0630   Epoch: 4   Global Step: 23470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:35,097-Speed 5645.63 samples/sec   Loss 7.6772   LearningRate 0.0630   Epoch: 4   Global Step: 23480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:36,885-Speed 5731.24 samples/sec   Loss 7.5202   LearningRate 0.0630   Epoch: 4   Global Step: 23490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:55:38,693-Speed 5664.58 samples/sec   Loss 7.6712   LearningRate 0.0629   Epoch: 4   Global Step: 23500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:55:40,511-Speed 5635.42 samples/sec   Loss 7.7445   LearningRate 0.0629   Epoch: 4   Global Step: 23510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:55:42,318-Speed 5667.46 samples/sec   Loss 7.7636   LearningRate 0.0629   Epoch: 4   Global Step: 23520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:55:44,162-Speed 5555.15 samples/sec   Loss 7.7690   LearningRate 0.0629   Epoch: 4   Global Step: 23530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:55:45,958-Speed 5704.47 samples/sec   Loss 7.6437   LearningRate 0.0629   Epoch: 4   Global Step: 23540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:55:47,765-Speed 5668.27 samples/sec   Loss 7.6694   LearningRate 0.0629   Epoch: 4   Global Step: 23550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:55:49,572-Speed 5671.34 samples/sec   Loss 7.6387   LearningRate 0.0629   Epoch: 4   Global Step: 23560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:55:51,368-Speed 5702.01 samples/sec   Loss 7.7664   LearningRate 0.0628   Epoch: 4   Global Step: 23570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:55:53,183-Speed 5646.28 samples/sec   Loss 7.7092   LearningRate 0.0628   Epoch: 4   Global Step: 23580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:55:55,049-Speed 5489.57 samples/sec   Loss 7.7161   LearningRate 0.0628   Epoch: 4   Global Step: 23590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:56,909-Speed 5505.68 samples/sec   Loss 7.7351   LearningRate 0.0628   Epoch: 4   Global Step: 23600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:55:58,716-Speed 5667.59 samples/sec   Loss 7.6702   LearningRate 0.0628   Epoch: 4   Global Step: 23610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:56:00,538-Speed 5624.19 samples/sec   Loss 7.4544   LearningRate 0.0628   Epoch: 4   Global Step: 23620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:56:02,347-Speed 5661.63 samples/sec   Loss 7.6721   LearningRate 0.0628   Epoch: 4   Global Step: 23630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:56:04,158-Speed 5657.47 samples/sec   Loss 7.8310   LearningRate 0.0627   Epoch: 4   Global Step: 23640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:56:05,947-Speed 5726.16 samples/sec   Loss 7.5993   LearningRate 0.0627   Epoch: 4   Global Step: 23650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:56:07,766-Speed 5630.73 samples/sec   Loss 7.7535   LearningRate 0.0627   Epoch: 4   Global Step: 23660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:09,667-Speed 5389.02 samples/sec   Loss 7.7219   LearningRate 0.0627   Epoch: 4   Global Step: 23670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:11,560-Speed 5409.25 samples/sec   Loss 7.7019   LearningRate 0.0627   Epoch: 4   Global Step: 23680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:13,417-Speed 5516.91 samples/sec   Loss 7.7814   LearningRate 0.0627   Epoch: 4   Global Step: 23690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:15,263-Speed 5548.22 samples/sec   Loss 7.8136   LearningRate 0.0627   Epoch: 4   Global Step: 23700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:17,082-Speed 5632.60 samples/sec   Loss 7.5812   LearningRate 0.0626   Epoch: 4   Global Step: 23710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:18,887-Speed 5674.84 samples/sec   Loss 7.7387   LearningRate 0.0626   Epoch: 4   Global Step: 23720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:20,683-Speed 5703.81 samples/sec   Loss 7.7149   LearningRate 0.0626   Epoch: 4   Global Step: 23730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:56:22,478-Speed 5706.77 samples/sec   Loss 7.7502   LearningRate 0.0626   Epoch: 4   Global Step: 23740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:56:24,276-Speed 5697.38 samples/sec   Loss 7.6423   LearningRate 0.0626   Epoch: 4   Global Step: 23750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:56:26,103-Speed 5607.94 samples/sec   Loss 7.7069   LearningRate 0.0626   Epoch: 4   Global Step: 23760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:56:27,934-Speed 5595.76 samples/sec   Loss 7.7510   LearningRate 0.0626   Epoch: 4   Global Step: 23770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:56:29,741-Speed 5666.45 samples/sec   Loss 7.6559   LearningRate 0.0626   Epoch: 4   Global Step: 23780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:56:31,548-Speed 5671.23 samples/sec   Loss 7.6402   LearningRate 0.0625   Epoch: 4   Global Step: 23790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:56:33,359-Speed 5657.45 samples/sec   Loss 7.7106   LearningRate 0.0625   Epoch: 4   Global Step: 23800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:56:35,161-Speed 5684.83 samples/sec   Loss 7.6920   LearningRate 0.0625   Epoch: 4   Global Step: 23810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:56:36,959-Speed 5694.21 samples/sec   Loss 7.6883   LearningRate 0.0625   Epoch: 4   Global Step: 23820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:56:38,751-Speed 5717.06 samples/sec   Loss 7.8850   LearningRate 0.0625   Epoch: 4   Global Step: 23830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:40,564-Speed 5649.59 samples/sec   Loss 7.6343   LearningRate 0.0625   Epoch: 4   Global Step: 23840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:42,378-Speed 5648.34 samples/sec   Loss 7.7641   LearningRate 0.0625   Epoch: 4   Global Step: 23850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:44,167-Speed 5727.13 samples/sec   Loss 7.8742   LearningRate 0.0624   Epoch: 4   Global Step: 23860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:45,991-Speed 5614.02 samples/sec   Loss 7.7918   LearningRate 0.0624   Epoch: 4   Global Step: 23870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:47,826-Speed 5582.97 samples/sec   Loss 7.9118   LearningRate 0.0624   Epoch: 4   Global Step: 23880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:49,621-Speed 5707.38 samples/sec   Loss 7.6187   LearningRate 0.0624   Epoch: 4   Global Step: 23890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:51,456-Speed 5582.93 samples/sec   Loss 7.7609   LearningRate 0.0624   Epoch: 4   Global Step: 23900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:53,268-Speed 5652.32 samples/sec   Loss 7.7500   LearningRate 0.0624   Epoch: 4   Global Step: 23910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:55,095-Speed 5607.63 samples/sec   Loss 7.6922   LearningRate 0.0624   Epoch: 4   Global Step: 23920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:56:56,912-Speed 5637.26 samples/sec   Loss 7.6300   LearningRate 0.0623   Epoch: 4   Global Step: 23930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:56:58,736-Speed 5616.33 samples/sec   Loss 7.8516   LearningRate 0.0623   Epoch: 4   Global Step: 23940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:57:00,565-Speed 5601.92 samples/sec   Loss 7.7495   LearningRate 0.0623   Epoch: 4   Global Step: 23950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:57:02,397-Speed 5590.63 samples/sec   Loss 7.7268   LearningRate 0.0623   Epoch: 4   Global Step: 23960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:57:04,238-Speed 5565.56 samples/sec   Loss 7.9841   LearningRate 0.0623   Epoch: 4   Global Step: 23970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:57:06,043-Speed 5673.65 samples/sec   Loss 7.7810   LearningRate 0.0623   Epoch: 4   Global Step: 23980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:57:07,848-Speed 5675.01 samples/sec   Loss 7.7160   LearningRate 0.0623   Epoch: 4   Global Step: 23990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:57:09,652-Speed 5678.85 samples/sec   Loss 7.6570   LearningRate 0.0622   Epoch: 4   Global Step: 24000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:57:36,688-[lfw][24000]XNorm: 21.676750
Training: 2022-04-27 02:57:36,689-[lfw][24000]Accuracy-Flip: 0.99717+-0.00248
Training: 2022-04-27 02:57:36,689-[lfw][24000]Accuracy-Highest: 0.99717
Training: 2022-04-27 02:58:07,709-[cfp_fp][24000]XNorm: 18.930162
Training: 2022-04-27 02:58:07,710-[cfp_fp][24000]Accuracy-Flip: 0.91771+-0.01542
Training: 2022-04-27 02:58:07,710-[cfp_fp][24000]Accuracy-Highest: 0.93171
Training: 2022-04-27 02:58:34,644-[agedb_30][24000]XNorm: 21.409065
Training: 2022-04-27 02:58:34,645-[agedb_30][24000]Accuracy-Flip: 0.96500+-0.00719
Training: 2022-04-27 02:58:34,645-[agedb_30][24000]Accuracy-Highest: 0.96867
Training: 2022-04-27 02:58:36,481-Speed 117.93 samples/sec   Loss 7.8664   LearningRate 0.0622   Epoch: 4   Global Step: 24010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:58:38,313-Speed 5593.09 samples/sec   Loss 7.7716   LearningRate 0.0622   Epoch: 4   Global Step: 24020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:58:40,119-Speed 5673.42 samples/sec   Loss 7.6766   LearningRate 0.0622   Epoch: 4   Global Step: 24030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:58:41,940-Speed 5630.24 samples/sec   Loss 7.7115   LearningRate 0.0622   Epoch: 4   Global Step: 24040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:58:43,756-Speed 5642.62 samples/sec   Loss 7.6369   LearningRate 0.0622   Epoch: 4   Global Step: 24050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:58:45,578-Speed 5623.43 samples/sec   Loss 7.7752   LearningRate 0.0622   Epoch: 4   Global Step: 24060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:58:47,487-Speed 5365.70 samples/sec   Loss 7.6285   LearningRate 0.0621   Epoch: 4   Global Step: 24070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:58:49,318-Speed 5597.00 samples/sec   Loss 7.5898   LearningRate 0.0621   Epoch: 4   Global Step: 24080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:58:51,135-Speed 5639.23 samples/sec   Loss 7.6724   LearningRate 0.0621   Epoch: 4   Global Step: 24090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:58:52,959-Speed 5617.73 samples/sec   Loss 7.6074   LearningRate 0.0621   Epoch: 4   Global Step: 24100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:58:54,776-Speed 5639.23 samples/sec   Loss 7.7066   LearningRate 0.0621   Epoch: 4   Global Step: 24110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:58:56,604-Speed 5605.59 samples/sec   Loss 7.7920   LearningRate 0.0621   Epoch: 4   Global Step: 24120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:58:58,423-Speed 5633.91 samples/sec   Loss 7.7175   LearningRate 0.0621   Epoch: 4   Global Step: 24130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:59:00,242-Speed 5631.20 samples/sec   Loss 7.7119   LearningRate 0.0621   Epoch: 4   Global Step: 24140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:59:02,046-Speed 5679.40 samples/sec   Loss 7.6719   LearningRate 0.0620   Epoch: 4   Global Step: 24150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:59:03,896-Speed 5538.45 samples/sec   Loss 7.7670   LearningRate 0.0620   Epoch: 4   Global Step: 24160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:59:05,739-Speed 5560.12 samples/sec   Loss 7.7064   LearningRate 0.0620   Epoch: 4   Global Step: 24170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:59:07,537-Speed 5694.48 samples/sec   Loss 7.7959   LearningRate 0.0620   Epoch: 4   Global Step: 24180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 02:59:09,336-Speed 5693.94 samples/sec   Loss 7.8070   LearningRate 0.0620   Epoch: 4   Global Step: 24190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:11,177-Speed 5566.71 samples/sec   Loss 7.6657   LearningRate 0.0620   Epoch: 4   Global Step: 24200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:13,009-Speed 5592.03 samples/sec   Loss 7.7591   LearningRate 0.0620   Epoch: 4   Global Step: 24210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:14,813-Speed 5676.84 samples/sec   Loss 7.7381   LearningRate 0.0619   Epoch: 4   Global Step: 24220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:16,612-Speed 5695.29 samples/sec   Loss 7.6300   LearningRate 0.0619   Epoch: 4   Global Step: 24230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:18,430-Speed 5637.85 samples/sec   Loss 7.6954   LearningRate 0.0619   Epoch: 4   Global Step: 24240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:20,246-Speed 5638.80 samples/sec   Loss 7.7343   LearningRate 0.0619   Epoch: 4   Global Step: 24250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:22,057-Speed 5656.39 samples/sec   Loss 7.7049   LearningRate 0.0619   Epoch: 4   Global Step: 24260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:23,896-Speed 5572.97 samples/sec   Loss 7.8325   LearningRate 0.0619   Epoch: 4   Global Step: 24270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:25,742-Speed 5549.66 samples/sec   Loss 7.8294   LearningRate 0.0619   Epoch: 4   Global Step: 24280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:27,588-Speed 5548.61 samples/sec   Loss 7.8781   LearningRate 0.0618   Epoch: 4   Global Step: 24290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:59:29,406-Speed 5634.49 samples/sec   Loss 7.8842   LearningRate 0.0618   Epoch: 4   Global Step: 24300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:59:31,218-Speed 5655.09 samples/sec   Loss 7.9734   LearningRate 0.0618   Epoch: 4   Global Step: 24310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:59:33,045-Speed 5608.39 samples/sec   Loss 7.5573   LearningRate 0.0618   Epoch: 4   Global Step: 24320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:34,859-Speed 5647.98 samples/sec   Loss 7.7421   LearningRate 0.0618   Epoch: 4   Global Step: 24330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:36,680-Speed 5628.23 samples/sec   Loss 7.8184   LearningRate 0.0618   Epoch: 4   Global Step: 24340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:38,489-Speed 5660.66 samples/sec   Loss 7.7621   LearningRate 0.0618   Epoch: 4   Global Step: 24350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:40,322-Speed 5589.65 samples/sec   Loss 7.5460   LearningRate 0.0617   Epoch: 4   Global Step: 24360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:42,160-Speed 5573.68 samples/sec   Loss 7.7171   LearningRate 0.0617   Epoch: 4   Global Step: 24370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:43,984-Speed 5618.94 samples/sec   Loss 7.8530   LearningRate 0.0617   Epoch: 4   Global Step: 24380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:45,856-Speed 5470.29 samples/sec   Loss 7.7175   LearningRate 0.0617   Epoch: 4   Global Step: 24390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:47,709-Speed 5532.60 samples/sec   Loss 7.6457   LearningRate 0.0617   Epoch: 4   Global Step: 24400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:49,542-Speed 5589.94 samples/sec   Loss 7.6020   LearningRate 0.0617   Epoch: 4   Global Step: 24410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 02:59:51,387-Speed 5552.20 samples/sec   Loss 7.7485   LearningRate 0.0617   Epoch: 4   Global Step: 24420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:59:53,240-Speed 5528.23 samples/sec   Loss 7.7299   LearningRate 0.0616   Epoch: 4   Global Step: 24430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:59:55,080-Speed 5566.34 samples/sec   Loss 7.6280   LearningRate 0.0616   Epoch: 4   Global Step: 24440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:59:56,916-Speed 5581.57 samples/sec   Loss 7.8419   LearningRate 0.0616   Epoch: 4   Global Step: 24450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 02:59:58,759-Speed 5558.23 samples/sec   Loss 7.9439   LearningRate 0.0616   Epoch: 4   Global Step: 24460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:00,649-Speed 5419.11 samples/sec   Loss 7.7532   LearningRate 0.0616   Epoch: 4   Global Step: 24470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:02,447-Speed 5701.70 samples/sec   Loss 7.6581   LearningRate 0.0616   Epoch: 4   Global Step: 24480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:04,244-Speed 5699.50 samples/sec   Loss 7.7090   LearningRate 0.0616   Epoch: 4   Global Step: 24490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:06,088-Speed 5557.21 samples/sec   Loss 7.7860   LearningRate 0.0616   Epoch: 4   Global Step: 24500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:07,910-Speed 5623.17 samples/sec   Loss 7.5960   LearningRate 0.0615   Epoch: 4   Global Step: 24510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:09,736-Speed 5612.61 samples/sec   Loss 7.7831   LearningRate 0.0615   Epoch: 4   Global Step: 24520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:11,571-Speed 5584.17 samples/sec   Loss 7.7343   LearningRate 0.0615   Epoch: 4   Global Step: 24530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:13,416-Speed 5551.26 samples/sec   Loss 7.5478   LearningRate 0.0615   Epoch: 4   Global Step: 24540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:15,212-Speed 5706.62 samples/sec   Loss 7.8149   LearningRate 0.0615   Epoch: 4   Global Step: 24550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:17,035-Speed 5618.78 samples/sec   Loss 7.6812   LearningRate 0.0615   Epoch: 4   Global Step: 24560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:18,864-Speed 5604.25 samples/sec   Loss 7.7415   LearningRate 0.0615   Epoch: 4   Global Step: 24570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:20,686-Speed 5624.75 samples/sec   Loss 7.7679   LearningRate 0.0614   Epoch: 4   Global Step: 24580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:22,495-Speed 5663.30 samples/sec   Loss 7.7594   LearningRate 0.0614   Epoch: 4   Global Step: 24590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:24,305-Speed 5659.65 samples/sec   Loss 7.6640   LearningRate 0.0614   Epoch: 4   Global Step: 24600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:26,162-Speed 5519.77 samples/sec   Loss 7.6087   LearningRate 0.0614   Epoch: 4   Global Step: 24610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:27,992-Speed 5596.01 samples/sec   Loss 7.7552   LearningRate 0.0614   Epoch: 4   Global Step: 24620   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-27 03:00:29,817-Speed 5614.88 samples/sec   Loss 7.6255   LearningRate 0.0614   Epoch: 4   Global Step: 24630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:31,623-Speed 5675.67 samples/sec   Loss 7.7525   LearningRate 0.0614   Epoch: 4   Global Step: 24640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:33,476-Speed 5528.01 samples/sec   Loss 7.6820   LearningRate 0.0613   Epoch: 4   Global Step: 24650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:35,313-Speed 5577.88 samples/sec   Loss 7.7106   LearningRate 0.0613   Epoch: 4   Global Step: 24660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:37,142-Speed 5601.94 samples/sec   Loss 7.7971   LearningRate 0.0613   Epoch: 4   Global Step: 24670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:38,947-Speed 5676.61 samples/sec   Loss 7.7541   LearningRate 0.0613   Epoch: 4   Global Step: 24680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:40,764-Speed 5637.05 samples/sec   Loss 7.6826   LearningRate 0.0613   Epoch: 4   Global Step: 24690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:42,586-Speed 5623.09 samples/sec   Loss 7.8081   LearningRate 0.0613   Epoch: 4   Global Step: 24700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:44,396-Speed 5663.97 samples/sec   Loss 7.6036   LearningRate 0.0613   Epoch: 4   Global Step: 24710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:46,242-Speed 5548.42 samples/sec   Loss 7.5258   LearningRate 0.0613   Epoch: 4   Global Step: 24720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:48,046-Speed 5678.73 samples/sec   Loss 7.7383   LearningRate 0.0612   Epoch: 4   Global Step: 24730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:49,861-Speed 5645.42 samples/sec   Loss 7.7751   LearningRate 0.0612   Epoch: 4   Global Step: 24740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:51,682-Speed 5628.04 samples/sec   Loss 7.7593   LearningRate 0.0612   Epoch: 4   Global Step: 24750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:53,498-Speed 5642.08 samples/sec   Loss 7.7196   LearningRate 0.0612   Epoch: 4   Global Step: 24760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:55,346-Speed 5543.39 samples/sec   Loss 7.5974   LearningRate 0.0612   Epoch: 4   Global Step: 24770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:57,183-Speed 5577.11 samples/sec   Loss 7.6746   LearningRate 0.0612   Epoch: 4   Global Step: 24780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:00:59,046-Speed 5499.76 samples/sec   Loss 7.7521   LearningRate 0.0612   Epoch: 4   Global Step: 24790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:01:00,894-Speed 5545.10 samples/sec   Loss 7.6514   LearningRate 0.0611   Epoch: 4   Global Step: 24800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:01:02,737-Speed 5558.50 samples/sec   Loss 7.8039   LearningRate 0.0611   Epoch: 4   Global Step: 24810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:01:04,558-Speed 5626.12 samples/sec   Loss 7.6934   LearningRate 0.0611   Epoch: 4   Global Step: 24820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:01:06,370-Speed 5655.22 samples/sec   Loss 7.7004   LearningRate 0.0611   Epoch: 4   Global Step: 24830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:01:08,170-Speed 5691.91 samples/sec   Loss 7.4782   LearningRate 0.0611   Epoch: 4   Global Step: 24840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:01:09,987-Speed 5639.13 samples/sec   Loss 7.6161   LearningRate 0.0611   Epoch: 4   Global Step: 24850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:01:11,819-Speed 5592.11 samples/sec   Loss 7.6921   LearningRate 0.0611   Epoch: 4   Global Step: 24860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:01:13,635-Speed 5640.89 samples/sec   Loss 7.6741   LearningRate 0.0610   Epoch: 4   Global Step: 24870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:01:15,454-Speed 5630.78 samples/sec   Loss 7.7715   LearningRate 0.0610   Epoch: 4   Global Step: 24880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:01:17,281-Speed 5609.65 samples/sec   Loss 7.6374   LearningRate 0.0610   Epoch: 4   Global Step: 24890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:19,090-Speed 5661.51 samples/sec   Loss 7.5960   LearningRate 0.0610   Epoch: 4   Global Step: 24900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:20,906-Speed 5640.93 samples/sec   Loss 7.6567   LearningRate 0.0610   Epoch: 4   Global Step: 24910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:22,780-Speed 5466.24 samples/sec   Loss 7.7478   LearningRate 0.0610   Epoch: 4   Global Step: 24920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:24,589-Speed 5664.04 samples/sec   Loss 7.6507   LearningRate 0.0610   Epoch: 4   Global Step: 24930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:26,404-Speed 5646.17 samples/sec   Loss 7.5411   LearningRate 0.0609   Epoch: 4   Global Step: 24940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:28,226-Speed 5622.23 samples/sec   Loss 7.4116   LearningRate 0.0609   Epoch: 4   Global Step: 24950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:30,056-Speed 5599.71 samples/sec   Loss 7.5987   LearningRate 0.0609   Epoch: 4   Global Step: 24960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:31,880-Speed 5616.09 samples/sec   Loss 7.6434   LearningRate 0.0609   Epoch: 4   Global Step: 24970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:33,728-Speed 5544.58 samples/sec   Loss 7.6448   LearningRate 0.0609   Epoch: 4   Global Step: 24980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:35,554-Speed 5610.32 samples/sec   Loss 7.7745   LearningRate 0.0609   Epoch: 4   Global Step: 24990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:01:37,382-Speed 5606.19 samples/sec   Loss 7.5859   LearningRate 0.0609   Epoch: 4   Global Step: 25000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:01:39,177-Speed 5709.91 samples/sec   Loss 7.6309   LearningRate 0.0609   Epoch: 4   Global Step: 25010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:41,003-Speed 5609.81 samples/sec   Loss 7.7126   LearningRate 0.0608   Epoch: 4   Global Step: 25020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:42,834-Speed 5595.10 samples/sec   Loss 7.6660   LearningRate 0.0608   Epoch: 4   Global Step: 25030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:44,677-Speed 5560.29 samples/sec   Loss 7.6698   LearningRate 0.0608   Epoch: 4   Global Step: 25040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:46,523-Speed 5549.09 samples/sec   Loss 7.6033   LearningRate 0.0608   Epoch: 4   Global Step: 25050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:48,357-Speed 5588.12 samples/sec   Loss 7.5868   LearningRate 0.0608   Epoch: 4   Global Step: 25060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:50,204-Speed 5545.95 samples/sec   Loss 7.4834   LearningRate 0.0608   Epoch: 4   Global Step: 25070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:52,043-Speed 5574.34 samples/sec   Loss 7.6490   LearningRate 0.0608   Epoch: 4   Global Step: 25080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:53,888-Speed 5555.89 samples/sec   Loss 7.5660   LearningRate 0.0607   Epoch: 4   Global Step: 25090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:55,701-Speed 5650.17 samples/sec   Loss 7.6039   LearningRate 0.0607   Epoch: 4   Global Step: 25100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 03:01:57,540-Speed 5571.84 samples/sec   Loss 7.5998   LearningRate 0.0607   Epoch: 4   Global Step: 25110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:01:59,374-Speed 5586.89 samples/sec   Loss 7.5911   LearningRate 0.0607   Epoch: 4   Global Step: 25120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:02:01,197-Speed 5619.99 samples/sec   Loss 7.6219   LearningRate 0.0607   Epoch: 4   Global Step: 25130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:02:03,016-Speed 5630.66 samples/sec   Loss 7.4803   LearningRate 0.0607   Epoch: 4   Global Step: 25140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:02:04,857-Speed 5565.95 samples/sec   Loss 7.6448   LearningRate 0.0607   Epoch: 4   Global Step: 25150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:02:06,685-Speed 5606.77 samples/sec   Loss 7.7681   LearningRate 0.0606   Epoch: 4   Global Step: 25160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:02:08,610-Speed 5319.60 samples/sec   Loss 7.6076   LearningRate 0.0606   Epoch: 4   Global Step: 25170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:02:10,465-Speed 5524.27 samples/sec   Loss 7.7116   LearningRate 0.0606   Epoch: 4   Global Step: 25180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:02:12,267-Speed 5687.14 samples/sec   Loss 7.6688   LearningRate 0.0606   Epoch: 4   Global Step: 25190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:02:14,098-Speed 5595.15 samples/sec   Loss 7.5260   LearningRate 0.0606   Epoch: 4   Global Step: 25200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 03:02:15,924-Speed 5608.22 samples/sec   Loss 7.6094   LearningRate 0.0606   Epoch: 4   Global Step: 25210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:02:17,754-Speed 5603.51 samples/sec   Loss 7.6586   LearningRate 0.0606   Epoch: 4   Global Step: 25220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:02:19,606-Speed 5531.12 samples/sec   Loss 7.7449   LearningRate 0.0606   Epoch: 4   Global Step: 25230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:02:21,451-Speed 5555.44 samples/sec   Loss 7.6191   LearningRate 0.0605   Epoch: 4   Global Step: 25240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:02:23,282-Speed 5595.13 samples/sec   Loss 7.5734   LearningRate 0.0605   Epoch: 4   Global Step: 25250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:02:25,113-Speed 5598.11 samples/sec   Loss 7.6181   LearningRate 0.0605   Epoch: 4   Global Step: 25260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:02:26,923-Speed 5657.40 samples/sec   Loss 7.7646   LearningRate 0.0605   Epoch: 4   Global Step: 25270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:02:28,737-Speed 5648.61 samples/sec   Loss 7.7077   LearningRate 0.0605   Epoch: 4   Global Step: 25280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:02:30,557-Speed 5630.47 samples/sec   Loss 7.6269   LearningRate 0.0605   Epoch: 4   Global Step: 25290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:02:32,368-Speed 5655.49 samples/sec   Loss 7.6341   LearningRate 0.0605   Epoch: 4   Global Step: 25300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:02:34,191-Speed 5620.47 samples/sec   Loss 7.6377   LearningRate 0.0604   Epoch: 4   Global Step: 25310   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-27 03:02:35,990-Speed 5696.35 samples/sec   Loss 7.5690   LearningRate 0.0604   Epoch: 4   Global Step: 25320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:02:37,796-Speed 5674.14 samples/sec   Loss 7.6376   LearningRate 0.0604   Epoch: 4   Global Step: 25330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:02:39,625-Speed 5600.03 samples/sec   Loss 7.4727   LearningRate 0.0604   Epoch: 4   Global Step: 25340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:02:41,465-Speed 5570.33 samples/sec   Loss 7.6730   LearningRate 0.0604   Epoch: 4   Global Step: 25350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:02:43,288-Speed 5619.44 samples/sec   Loss 7.5661   LearningRate 0.0604   Epoch: 4   Global Step: 25360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:02:45,109-Speed 5626.03 samples/sec   Loss 7.6029   LearningRate 0.0604   Epoch: 4   Global Step: 25370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:02:46,939-Speed 5599.38 samples/sec   Loss 7.5467   LearningRate 0.0603   Epoch: 4   Global Step: 25380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:02:48,771-Speed 5591.49 samples/sec   Loss 7.6860   LearningRate 0.0603   Epoch: 4   Global Step: 25390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:02:50,594-Speed 5620.98 samples/sec   Loss 7.5348   LearningRate 0.0603   Epoch: 4   Global Step: 25400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:02:52,410-Speed 5642.49 samples/sec   Loss 7.6429   LearningRate 0.0603   Epoch: 4   Global Step: 25410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:02:54,227-Speed 5638.44 samples/sec   Loss 7.5688   LearningRate 0.0603   Epoch: 4   Global Step: 25420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:02:56,055-Speed 5604.41 samples/sec   Loss 7.7292   LearningRate 0.0603   Epoch: 4   Global Step: 25430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:02:57,870-Speed 5644.84 samples/sec   Loss 7.6623   LearningRate 0.0603   Epoch: 4   Global Step: 25440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:02:59,688-Speed 5633.62 samples/sec   Loss 7.5293   LearningRate 0.0602   Epoch: 4   Global Step: 25450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:03:01,503-Speed 5646.05 samples/sec   Loss 7.8081   LearningRate 0.0602   Epoch: 4   Global Step: 25460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:03:03,320-Speed 5638.45 samples/sec   Loss 7.5905   LearningRate 0.0602   Epoch: 4   Global Step: 25470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:03:05,147-Speed 5606.09 samples/sec   Loss 7.5892   LearningRate 0.0602   Epoch: 4   Global Step: 25480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:03:06,947-Speed 5692.70 samples/sec   Loss 7.5376   LearningRate 0.0602   Epoch: 4   Global Step: 25490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:08,755-Speed 5665.94 samples/sec   Loss 7.5003   LearningRate 0.0602   Epoch: 4   Global Step: 25500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:10,569-Speed 5648.52 samples/sec   Loss 7.6345   LearningRate 0.0602   Epoch: 4   Global Step: 25510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:12,384-Speed 5645.42 samples/sec   Loss 7.3594   LearningRate 0.0602   Epoch: 4   Global Step: 25520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:14,208-Speed 5618.15 samples/sec   Loss 7.6498   LearningRate 0.0601   Epoch: 4   Global Step: 25530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:16,034-Speed 5609.14 samples/sec   Loss 7.6497   LearningRate 0.0601   Epoch: 4   Global Step: 25540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:17,839-Speed 5678.17 samples/sec   Loss 7.3792   LearningRate 0.0601   Epoch: 4   Global Step: 25550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:19,645-Speed 5669.91 samples/sec   Loss 7.5467   LearningRate 0.0601   Epoch: 4   Global Step: 25560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:21,468-Speed 5621.00 samples/sec   Loss 7.5262   LearningRate 0.0601   Epoch: 4   Global Step: 25570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:23,269-Speed 5687.77 samples/sec   Loss 7.4038   LearningRate 0.0601   Epoch: 4   Global Step: 25580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:25,086-Speed 5639.40 samples/sec   Loss 7.7853   LearningRate 0.0601   Epoch: 4   Global Step: 25590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:03:26,938-Speed 5531.99 samples/sec   Loss 7.4604   LearningRate 0.0600   Epoch: 4   Global Step: 25600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:03:28,761-Speed 5621.01 samples/sec   Loss 7.5828   LearningRate 0.0600   Epoch: 4   Global Step: 25610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:03:30,603-Speed 5562.65 samples/sec   Loss 7.5581   LearningRate 0.0600   Epoch: 4   Global Step: 25620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:03:32,428-Speed 5615.29 samples/sec   Loss 7.6412   LearningRate 0.0600   Epoch: 4   Global Step: 25630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:34,254-Speed 5610.40 samples/sec   Loss 7.6135   LearningRate 0.0600   Epoch: 4   Global Step: 25640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:36,080-Speed 5610.70 samples/sec   Loss 7.4934   LearningRate 0.0600   Epoch: 4   Global Step: 25650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:37,942-Speed 5506.28 samples/sec   Loss 7.6994   LearningRate 0.0600   Epoch: 4   Global Step: 25660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:39,782-Speed 5568.69 samples/sec   Loss 7.5816   LearningRate 0.0599   Epoch: 4   Global Step: 25670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:41,620-Speed 5575.39 samples/sec   Loss 7.6217   LearningRate 0.0599   Epoch: 4   Global Step: 25680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:43,434-Speed 5648.09 samples/sec   Loss 7.5723   LearningRate 0.0599   Epoch: 4   Global Step: 25690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:45,245-Speed 5656.75 samples/sec   Loss 7.6127   LearningRate 0.0599   Epoch: 4   Global Step: 25700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:47,064-Speed 5632.01 samples/sec   Loss 7.5242   LearningRate 0.0599   Epoch: 4   Global Step: 25710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:48,894-Speed 5599.36 samples/sec   Loss 7.4321   LearningRate 0.0599   Epoch: 4   Global Step: 25720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:03:50,717-Speed 5619.13 samples/sec   Loss 7.4753   LearningRate 0.0599   Epoch: 4   Global Step: 25730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:03:52,544-Speed 5607.91 samples/sec   Loss 7.6162   LearningRate 0.0599   Epoch: 4   Global Step: 25740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:03:54,389-Speed 5553.93 samples/sec   Loss 7.7093   LearningRate 0.0598   Epoch: 4   Global Step: 25750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:03:56,221-Speed 5593.67 samples/sec   Loss 7.6064   LearningRate 0.0598   Epoch: 4   Global Step: 25760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:03:58,044-Speed 5620.57 samples/sec   Loss 7.5427   LearningRate 0.0598   Epoch: 4   Global Step: 25770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:03:59,919-Speed 5462.93 samples/sec   Loss 7.6420   LearningRate 0.0598   Epoch: 4   Global Step: 25780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:04:01,754-Speed 5584.49 samples/sec   Loss 7.3746   LearningRate 0.0598   Epoch: 4   Global Step: 25790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:04:03,564-Speed 5659.21 samples/sec   Loss 7.5874   LearningRate 0.0598   Epoch: 4   Global Step: 25800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:04:05,404-Speed 5571.67 samples/sec   Loss 7.6924   LearningRate 0.0598   Epoch: 4   Global Step: 25810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:04:07,231-Speed 5606.92 samples/sec   Loss 7.5805   LearningRate 0.0597   Epoch: 4   Global Step: 25820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:04:09,049-Speed 5637.42 samples/sec   Loss 7.5825   LearningRate 0.0597   Epoch: 4   Global Step: 25830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:04:10,925-Speed 5459.88 samples/sec   Loss 7.6509   LearningRate 0.0597   Epoch: 4   Global Step: 25840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:04:12,741-Speed 5642.78 samples/sec   Loss 7.4272   LearningRate 0.0597   Epoch: 4   Global Step: 25850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:04:14,583-Speed 5562.45 samples/sec   Loss 7.6485   LearningRate 0.0597   Epoch: 4   Global Step: 25860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:04:16,394-Speed 5658.33 samples/sec   Loss 7.5414   LearningRate 0.0597   Epoch: 4   Global Step: 25870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:04:18,251-Speed 5517.64 samples/sec   Loss 7.8024   LearningRate 0.0597   Epoch: 4   Global Step: 25880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:04:20,089-Speed 5571.01 samples/sec   Loss 7.6642   LearningRate 0.0597   Epoch: 4   Global Step: 25890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:04:21,913-Speed 5618.44 samples/sec   Loss 7.4870   LearningRate 0.0596   Epoch: 4   Global Step: 25900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:04:23,727-Speed 5648.99 samples/sec   Loss 7.5225   LearningRate 0.0596   Epoch: 4   Global Step: 25910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:04:25,540-Speed 5651.47 samples/sec   Loss 7.5214   LearningRate 0.0596   Epoch: 4   Global Step: 25920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:04:27,373-Speed 5589.67 samples/sec   Loss 7.6541   LearningRate 0.0596   Epoch: 4   Global Step: 25930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:04:29,193-Speed 5627.41 samples/sec   Loss 7.7836   LearningRate 0.0596   Epoch: 4   Global Step: 25940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:04:31,041-Speed 5543.36 samples/sec   Loss 7.4596   LearningRate 0.0596   Epoch: 4   Global Step: 25950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:04:32,859-Speed 5638.38 samples/sec   Loss 7.6369   LearningRate 0.0596   Epoch: 4   Global Step: 25960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:04:34,684-Speed 5612.70 samples/sec   Loss 7.6861   LearningRate 0.0595   Epoch: 4   Global Step: 25970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:04:36,505-Speed 5626.40 samples/sec   Loss 7.5118   LearningRate 0.0595   Epoch: 4   Global Step: 25980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:04:38,322-Speed 5638.16 samples/sec   Loss 7.6172   LearningRate 0.0595   Epoch: 4   Global Step: 25990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:04:40,159-Speed 5577.14 samples/sec   Loss 7.6716   LearningRate 0.0595   Epoch: 4   Global Step: 26000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:05:06,958-[lfw][26000]XNorm: 22.816512
Training: 2022-04-27 03:05:06,959-[lfw][26000]Accuracy-Flip: 0.99567+-0.00309
Training: 2022-04-27 03:05:06,959-[lfw][26000]Accuracy-Highest: 0.99717
Training: 2022-04-27 03:05:37,734-[cfp_fp][26000]XNorm: 19.423009
Training: 2022-04-27 03:05:37,735-[cfp_fp][26000]Accuracy-Flip: 0.93243+-0.00895
Training: 2022-04-27 03:05:37,736-[cfp_fp][26000]Accuracy-Highest: 0.93243
Training: 2022-04-27 03:06:04,345-[agedb_30][26000]XNorm: 22.459086
Training: 2022-04-27 03:06:04,346-[agedb_30][26000]Accuracy-Flip: 0.96433+-0.01153
Training: 2022-04-27 03:06:04,347-[agedb_30][26000]Accuracy-Highest: 0.96867
Training: 2022-04-27 03:06:06,193-Speed 119.03 samples/sec   Loss 7.6100   LearningRate 0.0595   Epoch: 4   Global Step: 26010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:06:08,006-Speed 5649.13 samples/sec   Loss 7.5094   LearningRate 0.0595   Epoch: 4   Global Step: 26020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:06:09,807-Speed 5688.71 samples/sec   Loss 7.6237   LearningRate 0.0595   Epoch: 4   Global Step: 26030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:06:11,644-Speed 5576.58 samples/sec   Loss 7.6584   LearningRate 0.0594   Epoch: 4   Global Step: 26040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:06:13,476-Speed 5593.92 samples/sec   Loss 7.6225   LearningRate 0.0594   Epoch: 4   Global Step: 26050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:15,308-Speed 5590.04 samples/sec   Loss 7.3726   LearningRate 0.0594   Epoch: 4   Global Step: 26060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:17,168-Speed 5511.49 samples/sec   Loss 7.2887   LearningRate 0.0594   Epoch: 4   Global Step: 26070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:18,982-Speed 5647.87 samples/sec   Loss 7.5112   LearningRate 0.0594   Epoch: 4   Global Step: 26080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:20,806-Speed 5616.70 samples/sec   Loss 7.5437   LearningRate 0.0594   Epoch: 4   Global Step: 26090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:22,643-Speed 5575.65 samples/sec   Loss 7.4948   LearningRate 0.0594   Epoch: 4   Global Step: 26100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:24,486-Speed 5559.53 samples/sec   Loss 7.6638   LearningRate 0.0594   Epoch: 4   Global Step: 26110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:26,309-Speed 5620.66 samples/sec   Loss 7.5961   LearningRate 0.0593   Epoch: 4   Global Step: 26120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:28,161-Speed 5532.44 samples/sec   Loss 7.6997   LearningRate 0.0593   Epoch: 4   Global Step: 26130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:30,025-Speed 5498.51 samples/sec   Loss 7.5325   LearningRate 0.0593   Epoch: 4   Global Step: 26140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:31,887-Speed 5501.73 samples/sec   Loss 7.5366   LearningRate 0.0593   Epoch: 4   Global Step: 26150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:33,690-Speed 5683.59 samples/sec   Loss 7.6005   LearningRate 0.0593   Epoch: 4   Global Step: 26160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:35,522-Speed 5593.41 samples/sec   Loss 7.4902   LearningRate 0.0593   Epoch: 4   Global Step: 26170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:37,347-Speed 5611.76 samples/sec   Loss 7.5336   LearningRate 0.0593   Epoch: 4   Global Step: 26180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:39,206-Speed 5512.75 samples/sec   Loss 7.5784   LearningRate 0.0592   Epoch: 4   Global Step: 26190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:41,028-Speed 5623.63 samples/sec   Loss 7.4574   LearningRate 0.0592   Epoch: 4   Global Step: 26200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:42,851-Speed 5622.46 samples/sec   Loss 7.5249   LearningRate 0.0592   Epoch: 4   Global Step: 26210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:44,674-Speed 5618.74 samples/sec   Loss 7.6516   LearningRate 0.0592   Epoch: 4   Global Step: 26220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:46,493-Speed 5633.10 samples/sec   Loss 7.4979   LearningRate 0.0592   Epoch: 4   Global Step: 26230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:48,307-Speed 5649.46 samples/sec   Loss 7.4090   LearningRate 0.0592   Epoch: 4   Global Step: 26240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:50,138-Speed 5595.08 samples/sec   Loss 7.5088   LearningRate 0.0592   Epoch: 4   Global Step: 26250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:06:51,963-Speed 5611.22 samples/sec   Loss 7.6106   LearningRate 0.0591   Epoch: 4   Global Step: 26260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:53,809-Speed 5551.19 samples/sec   Loss 7.4561   LearningRate 0.0591   Epoch: 4   Global Step: 26270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:55,671-Speed 5501.46 samples/sec   Loss 7.3993   LearningRate 0.0591   Epoch: 4   Global Step: 26280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:57,553-Speed 5444.38 samples/sec   Loss 7.4558   LearningRate 0.0591   Epoch: 4   Global Step: 26290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:06:59,394-Speed 5565.18 samples/sec   Loss 7.4651   LearningRate 0.0591   Epoch: 4   Global Step: 26300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:01,225-Speed 5593.91 samples/sec   Loss 7.3137   LearningRate 0.0591   Epoch: 4   Global Step: 26310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:03,044-Speed 5632.23 samples/sec   Loss 7.3899   LearningRate 0.0591   Epoch: 4   Global Step: 26320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:04,884-Speed 5569.46 samples/sec   Loss 7.5606   LearningRate 0.0591   Epoch: 4   Global Step: 26330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:06,694-Speed 5658.52 samples/sec   Loss 7.7378   LearningRate 0.0590   Epoch: 4   Global Step: 26340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:08,544-Speed 5540.92 samples/sec   Loss 7.2797   LearningRate 0.0590   Epoch: 4   Global Step: 26350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:10,355-Speed 5658.29 samples/sec   Loss 7.3221   LearningRate 0.0590   Epoch: 4   Global Step: 26360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:07:12,234-Speed 5452.06 samples/sec   Loss 7.6032   LearningRate 0.0590   Epoch: 4   Global Step: 26370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:07:14,144-Speed 5362.29 samples/sec   Loss 7.4280   LearningRate 0.0590   Epoch: 4   Global Step: 26380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:07:16,062-Speed 5341.48 samples/sec   Loss 7.4761   LearningRate 0.0590   Epoch: 4   Global Step: 26390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:07:17,874-Speed 5655.95 samples/sec   Loss 7.4894   LearningRate 0.0590   Epoch: 4   Global Step: 26400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:07:19,698-Speed 5617.46 samples/sec   Loss 7.3503   LearningRate 0.0589   Epoch: 4   Global Step: 26410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:07:21,495-Speed 5702.85 samples/sec   Loss 7.6038   LearningRate 0.0589   Epoch: 4   Global Step: 26420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:07:23,304-Speed 5662.26 samples/sec   Loss 7.6072   LearningRate 0.0589   Epoch: 4   Global Step: 26430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:07:25,124-Speed 5628.47 samples/sec   Loss 7.5043   LearningRate 0.0589   Epoch: 4   Global Step: 26440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:07:26,954-Speed 5600.62 samples/sec   Loss 7.5197   LearningRate 0.0589   Epoch: 4   Global Step: 26450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:07:28,758-Speed 5676.78 samples/sec   Loss 7.4256   LearningRate 0.0589   Epoch: 4   Global Step: 26460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:07:30,575-Speed 5639.19 samples/sec   Loss 7.5595   LearningRate 0.0589   Epoch: 4   Global Step: 26470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:32,458-Speed 5440.50 samples/sec   Loss 7.4647   LearningRate 0.0589   Epoch: 4   Global Step: 26480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:34,291-Speed 5589.33 samples/sec   Loss 7.4791   LearningRate 0.0588   Epoch: 4   Global Step: 26490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:36,111-Speed 5628.92 samples/sec   Loss 7.2853   LearningRate 0.0588   Epoch: 4   Global Step: 26500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:37,915-Speed 5679.19 samples/sec   Loss 7.4500   LearningRate 0.0588   Epoch: 4   Global Step: 26510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:39,720-Speed 5677.80 samples/sec   Loss 7.4931   LearningRate 0.0588   Epoch: 4   Global Step: 26520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:41,557-Speed 5576.87 samples/sec   Loss 7.3707   LearningRate 0.0588   Epoch: 4   Global Step: 26530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:43,383-Speed 5611.12 samples/sec   Loss 7.3179   LearningRate 0.0588   Epoch: 4   Global Step: 26540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:45,179-Speed 5704.85 samples/sec   Loss 7.3906   LearningRate 0.0588   Epoch: 4   Global Step: 26550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:47,012-Speed 5585.67 samples/sec   Loss 7.3777   LearningRate 0.0587   Epoch: 4   Global Step: 26560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:48,846-Speed 5587.26 samples/sec   Loss 7.6497   LearningRate 0.0587   Epoch: 4   Global Step: 26570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:07:50,685-Speed 5570.50 samples/sec   Loss 7.6592   LearningRate 0.0587   Epoch: 4   Global Step: 26580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:07:52,521-Speed 5581.59 samples/sec   Loss 7.5139   LearningRate 0.0587   Epoch: 4   Global Step: 26590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:07:54,346-Speed 5615.96 samples/sec   Loss 7.4854   LearningRate 0.0587   Epoch: 4   Global Step: 26600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:56,208-Speed 5503.77 samples/sec   Loss 7.5613   LearningRate 0.0587   Epoch: 4   Global Step: 26610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:58,066-Speed 5514.47 samples/sec   Loss 7.5383   LearningRate 0.0587   Epoch: 4   Global Step: 26620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:07:59,904-Speed 5574.14 samples/sec   Loss 7.5293   LearningRate 0.0586   Epoch: 4   Global Step: 26630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:01,744-Speed 5566.40 samples/sec   Loss 7.4910   LearningRate 0.0586   Epoch: 4   Global Step: 26640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:03,561-Speed 5638.40 samples/sec   Loss 7.5642   LearningRate 0.0586   Epoch: 4   Global Step: 26650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:05,381-Speed 5630.77 samples/sec   Loss 7.4485   LearningRate 0.0586   Epoch: 4   Global Step: 26660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:07,225-Speed 5556.70 samples/sec   Loss 7.4892   LearningRate 0.0586   Epoch: 4   Global Step: 26670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:09,066-Speed 5564.61 samples/sec   Loss 7.6369   LearningRate 0.0586   Epoch: 4   Global Step: 26680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:10,889-Speed 5621.76 samples/sec   Loss 7.5659   LearningRate 0.0586   Epoch: 4   Global Step: 26690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:12,706-Speed 5638.21 samples/sec   Loss 7.4668   LearningRate 0.0586   Epoch: 4   Global Step: 26700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:08:14,559-Speed 5530.54 samples/sec   Loss 7.4243   LearningRate 0.0585   Epoch: 4   Global Step: 26710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:08:16,381-Speed 5621.68 samples/sec   Loss 7.5662   LearningRate 0.0585   Epoch: 4   Global Step: 26720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:08:18,196-Speed 5645.07 samples/sec   Loss 7.4799   LearningRate 0.0585   Epoch: 4   Global Step: 26730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:08:20,023-Speed 5606.79 samples/sec   Loss 7.4997   LearningRate 0.0585   Epoch: 4   Global Step: 26740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:08:21,832-Speed 5664.91 samples/sec   Loss 7.4734   LearningRate 0.0585   Epoch: 4   Global Step: 26750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:08:23,642-Speed 5661.14 samples/sec   Loss 7.5154   LearningRate 0.0585   Epoch: 4   Global Step: 26760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:08:25,457-Speed 5646.96 samples/sec   Loss 7.3706   LearningRate 0.0585   Epoch: 4   Global Step: 26770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:08:27,305-Speed 5544.14 samples/sec   Loss 7.3896   LearningRate 0.0584   Epoch: 4   Global Step: 26780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:08:29,126-Speed 5626.64 samples/sec   Loss 7.4787   LearningRate 0.0584   Epoch: 4   Global Step: 26790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:30,958-Speed 5593.01 samples/sec   Loss 7.4730   LearningRate 0.0584   Epoch: 4   Global Step: 26800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:32,797-Speed 5570.05 samples/sec   Loss 7.5850   LearningRate 0.0584   Epoch: 4   Global Step: 26810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:34,636-Speed 5572.99 samples/sec   Loss 7.6456   LearningRate 0.0584   Epoch: 4   Global Step: 26820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:36,461-Speed 5611.50 samples/sec   Loss 7.4809   LearningRate 0.0584   Epoch: 4   Global Step: 26830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:38,286-Speed 5618.04 samples/sec   Loss 7.4380   LearningRate 0.0584   Epoch: 4   Global Step: 26840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:40,136-Speed 5537.24 samples/sec   Loss 7.5574   LearningRate 0.0584   Epoch: 4   Global Step: 26850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:41,952-Speed 5641.40 samples/sec   Loss 7.3891   LearningRate 0.0583   Epoch: 4   Global Step: 26860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:43,789-Speed 5578.09 samples/sec   Loss 7.5329   LearningRate 0.0583   Epoch: 4   Global Step: 26870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:45,615-Speed 5609.57 samples/sec   Loss 7.5733   LearningRate 0.0583   Epoch: 4   Global Step: 26880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:47,448-Speed 5592.76 samples/sec   Loss 7.3661   LearningRate 0.0583   Epoch: 4   Global Step: 26890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:08:49,269-Speed 5626.97 samples/sec   Loss 7.6115   LearningRate 0.0583   Epoch: 4   Global Step: 26900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:08:51,104-Speed 5584.25 samples/sec   Loss 7.5283   LearningRate 0.0583   Epoch: 4   Global Step: 26910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:52,961-Speed 5516.98 samples/sec   Loss 7.4988   LearningRate 0.0583   Epoch: 4   Global Step: 26920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:54,812-Speed 5533.20 samples/sec   Loss 7.5508   LearningRate 0.0582   Epoch: 4   Global Step: 26930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:56,632-Speed 5630.54 samples/sec   Loss 7.4693   LearningRate 0.0582   Epoch: 4   Global Step: 26940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:08:58,454-Speed 5623.62 samples/sec   Loss 7.3584   LearningRate 0.0582   Epoch: 4   Global Step: 26950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:00,303-Speed 5540.54 samples/sec   Loss 7.6089   LearningRate 0.0582   Epoch: 4   Global Step: 26960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:02,150-Speed 5547.27 samples/sec   Loss 7.6868   LearningRate 0.0582   Epoch: 4   Global Step: 26970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:03,983-Speed 5587.24 samples/sec   Loss 7.4050   LearningRate 0.0582   Epoch: 4   Global Step: 26980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:05,859-Speed 5462.56 samples/sec   Loss 7.4566   LearningRate 0.0582   Epoch: 4   Global Step: 26990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:07,672-Speed 5652.60 samples/sec   Loss 7.5537   LearningRate 0.0582   Epoch: 4   Global Step: 27000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:09,490-Speed 5635.65 samples/sec   Loss 7.4922   LearningRate 0.0581   Epoch: 4   Global Step: 27010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:11,313-Speed 5620.58 samples/sec   Loss 7.3012   LearningRate 0.0581   Epoch: 4   Global Step: 27020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:13,126-Speed 5650.16 samples/sec   Loss 7.4469   LearningRate 0.0581   Epoch: 4   Global Step: 27030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:14,971-Speed 5554.77 samples/sec   Loss 7.5241   LearningRate 0.0581   Epoch: 4   Global Step: 27040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:16,805-Speed 5585.69 samples/sec   Loss 7.4218   LearningRate 0.0581   Epoch: 4   Global Step: 27050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:18,652-Speed 5544.90 samples/sec   Loss 7.4175   LearningRate 0.0581   Epoch: 4   Global Step: 27060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:20,503-Speed 5537.37 samples/sec   Loss 7.4119   LearningRate 0.0581   Epoch: 4   Global Step: 27070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:22,331-Speed 5602.44 samples/sec   Loss 7.4391   LearningRate 0.0580   Epoch: 4   Global Step: 27080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:24,139-Speed 5665.68 samples/sec   Loss 7.3268   LearningRate 0.0580   Epoch: 4   Global Step: 27090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:25,963-Speed 5619.99 samples/sec   Loss 7.3658   LearningRate 0.0580   Epoch: 4   Global Step: 27100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:27,777-Speed 5648.61 samples/sec   Loss 7.3733   LearningRate 0.0580   Epoch: 4   Global Step: 27110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:29,609-Speed 5592.00 samples/sec   Loss 7.4578   LearningRate 0.0580   Epoch: 4   Global Step: 27120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:31,454-Speed 5552.10 samples/sec   Loss 7.4586   LearningRate 0.0580   Epoch: 4   Global Step: 27130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:33,286-Speed 5592.21 samples/sec   Loss 7.3163   LearningRate 0.0580   Epoch: 4   Global Step: 27140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:35,121-Speed 5584.02 samples/sec   Loss 7.4987   LearningRate 0.0580   Epoch: 4   Global Step: 27150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:36,977-Speed 5520.62 samples/sec   Loss 7.4345   LearningRate 0.0579   Epoch: 4   Global Step: 27160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:38,790-Speed 5651.94 samples/sec   Loss 7.2839   LearningRate 0.0579   Epoch: 4   Global Step: 27170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:40,652-Speed 5540.08 samples/sec   Loss 7.5560   LearningRate 0.0579   Epoch: 4   Global Step: 27180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:42,479-Speed 5606.47 samples/sec   Loss 7.3716   LearningRate 0.0579   Epoch: 4   Global Step: 27190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:44,291-Speed 5654.50 samples/sec   Loss 7.4368   LearningRate 0.0579   Epoch: 4   Global Step: 27200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:46,116-Speed 5614.49 samples/sec   Loss 7.3859   LearningRate 0.0579   Epoch: 4   Global Step: 27210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:47,923-Speed 5670.84 samples/sec   Loss 7.4070   LearningRate 0.0579   Epoch: 4   Global Step: 27220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:09:49,741-Speed 5634.82 samples/sec   Loss 7.5735   LearningRate 0.0578   Epoch: 4   Global Step: 27230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:51,554-Speed 5650.42 samples/sec   Loss 7.4800   LearningRate 0.0578   Epoch: 4   Global Step: 27240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:53,373-Speed 5632.48 samples/sec   Loss 7.2415   LearningRate 0.0578   Epoch: 4   Global Step: 27250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:55,175-Speed 5687.90 samples/sec   Loss 7.2886   LearningRate 0.0578   Epoch: 4   Global Step: 27260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:57,039-Speed 5494.68 samples/sec   Loss 7.4089   LearningRate 0.0578   Epoch: 4   Global Step: 27270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:09:58,850-Speed 5656.60 samples/sec   Loss 7.2998   LearningRate 0.0578   Epoch: 4   Global Step: 27280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:00,691-Speed 5565.99 samples/sec   Loss 7.5345   LearningRate 0.0578   Epoch: 4   Global Step: 27290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:02,525-Speed 5586.08 samples/sec   Loss 7.5694   LearningRate 0.0578   Epoch: 4   Global Step: 27300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:04,352-Speed 5609.03 samples/sec   Loss 7.5508   LearningRate 0.0577   Epoch: 4   Global Step: 27310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:06,173-Speed 5628.50 samples/sec   Loss 7.3567   LearningRate 0.0577   Epoch: 4   Global Step: 27320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:08,006-Speed 5588.05 samples/sec   Loss 7.4764   LearningRate 0.0577   Epoch: 4   Global Step: 27330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:09,837-Speed 5596.35 samples/sec   Loss 7.2985   LearningRate 0.0577   Epoch: 4   Global Step: 27340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:11,648-Speed 5655.58 samples/sec   Loss 7.4298   LearningRate 0.0577   Epoch: 4   Global Step: 27350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:13,507-Speed 5512.07 samples/sec   Loss 7.3093   LearningRate 0.0577   Epoch: 4   Global Step: 27360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:15,314-Speed 5669.69 samples/sec   Loss 7.2227   LearningRate 0.0577   Epoch: 4   Global Step: 27370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:17,141-Speed 5607.13 samples/sec   Loss 7.3490   LearningRate 0.0576   Epoch: 4   Global Step: 27380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:18,957-Speed 5641.88 samples/sec   Loss 7.5673   LearningRate 0.0576   Epoch: 4   Global Step: 27390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:20,771-Speed 5647.18 samples/sec   Loss 7.3757   LearningRate 0.0576   Epoch: 4   Global Step: 27400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:22,615-Speed 5558.67 samples/sec   Loss 7.4542   LearningRate 0.0576   Epoch: 4   Global Step: 27410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:24,443-Speed 5602.34 samples/sec   Loss 7.4320   LearningRate 0.0576   Epoch: 4   Global Step: 27420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:10:26,268-Speed 5615.65 samples/sec   Loss 7.4241   LearningRate 0.0576   Epoch: 4   Global Step: 27430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:10:28,093-Speed 5611.53 samples/sec   Loss 7.4217   LearningRate 0.0576   Epoch: 4   Global Step: 27440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:10:29,913-Speed 5631.76 samples/sec   Loss 7.3259   LearningRate 0.0576   Epoch: 4   Global Step: 27450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:10:31,726-Speed 5650.27 samples/sec   Loss 7.4468   LearningRate 0.0575   Epoch: 4   Global Step: 27460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:10:33,557-Speed 5594.56 samples/sec   Loss 7.3928   LearningRate 0.0575   Epoch: 4   Global Step: 27470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:10:35,380-Speed 5620.70 samples/sec   Loss 7.4591   LearningRate 0.0575   Epoch: 4   Global Step: 27480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:10:37,197-Speed 5638.48 samples/sec   Loss 7.3472   LearningRate 0.0575   Epoch: 4   Global Step: 27490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:10:39,019-Speed 5623.62 samples/sec   Loss 7.3275   LearningRate 0.0575   Epoch: 4   Global Step: 27500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:10:40,831-Speed 5651.90 samples/sec   Loss 7.4120   LearningRate 0.0575   Epoch: 4   Global Step: 27510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:10:42,660-Speed 5601.70 samples/sec   Loss 7.5535   LearningRate 0.0575   Epoch: 4   Global Step: 27520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:44,497-Speed 5577.07 samples/sec   Loss 7.1821   LearningRate 0.0574   Epoch: 4   Global Step: 27530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:46,331-Speed 5589.08 samples/sec   Loss 7.4136   LearningRate 0.0574   Epoch: 4   Global Step: 27540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:48,151-Speed 5629.63 samples/sec   Loss 7.4343   LearningRate 0.0574   Epoch: 4   Global Step: 27550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:50,043-Speed 5414.41 samples/sec   Loss 7.4361   LearningRate 0.0574   Epoch: 4   Global Step: 27560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:51,862-Speed 5632.91 samples/sec   Loss 7.4183   LearningRate 0.0574   Epoch: 4   Global Step: 27570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:10:53,685-Speed 5618.19 samples/sec   Loss 7.4731   LearningRate 0.0574   Epoch: 4   Global Step: 27580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:10:55,529-Speed 5558.65 samples/sec   Loss 7.3005   LearningRate 0.0574   Epoch: 4   Global Step: 27590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:10:57,358-Speed 5599.43 samples/sec   Loss 7.4585   LearningRate 0.0574   Epoch: 4   Global Step: 27600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:10:59,207-Speed 5539.77 samples/sec   Loss 7.3687   LearningRate 0.0573   Epoch: 4   Global Step: 27610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:11:01,034-Speed 5610.20 samples/sec   Loss 7.4099   LearningRate 0.0573   Epoch: 4   Global Step: 27620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:11:02,881-Speed 5545.13 samples/sec   Loss 7.4186   LearningRate 0.0573   Epoch: 4   Global Step: 27630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:11:04,717-Speed 5581.66 samples/sec   Loss 7.4852   LearningRate 0.0573   Epoch: 4   Global Step: 27640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:11:06,573-Speed 5519.98 samples/sec   Loss 7.3013   LearningRate 0.0573   Epoch: 4   Global Step: 27650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:11:08,403-Speed 5598.35 samples/sec   Loss 7.3970   LearningRate 0.0573   Epoch: 4   Global Step: 27660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:11:10,235-Speed 5592.78 samples/sec   Loss 7.3299   LearningRate 0.0573   Epoch: 4   Global Step: 27670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:11:12,080-Speed 5554.39 samples/sec   Loss 7.4914   LearningRate 0.0572   Epoch: 4   Global Step: 27680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:13,904-Speed 5614.55 samples/sec   Loss 7.5174   LearningRate 0.0572   Epoch: 4   Global Step: 27690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:15,754-Speed 5540.92 samples/sec   Loss 7.3122   LearningRate 0.0572   Epoch: 4   Global Step: 27700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:17,591-Speed 5576.10 samples/sec   Loss 7.3944   LearningRate 0.0572   Epoch: 4   Global Step: 27710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:19,433-Speed 5563.61 samples/sec   Loss 7.3309   LearningRate 0.0572   Epoch: 4   Global Step: 27720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:21,250-Speed 5637.76 samples/sec   Loss 7.2382   LearningRate 0.0572   Epoch: 4   Global Step: 27730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:23,063-Speed 5650.50 samples/sec   Loss 7.4465   LearningRate 0.0572   Epoch: 4   Global Step: 27740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:24,900-Speed 5577.86 samples/sec   Loss 7.3250   LearningRate 0.0572   Epoch: 4   Global Step: 27750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:26,714-Speed 5648.33 samples/sec   Loss 7.3515   LearningRate 0.0571   Epoch: 4   Global Step: 27760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:28,531-Speed 5636.63 samples/sec   Loss 7.2981   LearningRate 0.0571   Epoch: 4   Global Step: 27770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:30,351-Speed 5630.28 samples/sec   Loss 7.2432   LearningRate 0.0571   Epoch: 4   Global Step: 27780   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-27 03:11:32,149-Speed 5694.74 samples/sec   Loss 7.3794   LearningRate 0.0571   Epoch: 4   Global Step: 27790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:33,975-Speed 5614.34 samples/sec   Loss 7.4096   LearningRate 0.0571   Epoch: 4   Global Step: 27800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:35,820-Speed 5552.97 samples/sec   Loss 7.2657   LearningRate 0.0571   Epoch: 4   Global Step: 27810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:37,639-Speed 5631.82 samples/sec   Loss 7.2490   LearningRate 0.0571   Epoch: 4   Global Step: 27820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:39,494-Speed 5522.66 samples/sec   Loss 7.3635   LearningRate 0.0570   Epoch: 4   Global Step: 27830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:41,319-Speed 5613.63 samples/sec   Loss 7.3598   LearningRate 0.0570   Epoch: 4   Global Step: 27840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:43,143-Speed 5616.24 samples/sec   Loss 7.3858   LearningRate 0.0570   Epoch: 4   Global Step: 27850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:44,964-Speed 5628.10 samples/sec   Loss 7.3567   LearningRate 0.0570   Epoch: 4   Global Step: 27860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:46,774-Speed 5660.77 samples/sec   Loss 7.4582   LearningRate 0.0570   Epoch: 4   Global Step: 27870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:48,592-Speed 5632.53 samples/sec   Loss 7.2770   LearningRate 0.0570   Epoch: 4   Global Step: 27880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:50,410-Speed 5636.05 samples/sec   Loss 7.3501   LearningRate 0.0570   Epoch: 4   Global Step: 27890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:52,251-Speed 5565.17 samples/sec   Loss 7.3963   LearningRate 0.0570   Epoch: 4   Global Step: 27900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:54,080-Speed 5602.61 samples/sec   Loss 7.3521   LearningRate 0.0569   Epoch: 4   Global Step: 27910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:55,892-Speed 5655.77 samples/sec   Loss 7.1377   LearningRate 0.0569   Epoch: 4   Global Step: 27920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:57,726-Speed 5585.36 samples/sec   Loss 7.2981   LearningRate 0.0569   Epoch: 4   Global Step: 27930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:11:59,536-Speed 5659.40 samples/sec   Loss 7.3879   LearningRate 0.0569   Epoch: 4   Global Step: 27940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:12:01,358-Speed 5623.70 samples/sec   Loss 7.4082   LearningRate 0.0569   Epoch: 4   Global Step: 27950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:12:03,222-Speed 5497.30 samples/sec   Loss 7.3907   LearningRate 0.0569   Epoch: 4   Global Step: 27960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:12:05,051-Speed 5603.62 samples/sec   Loss 7.3384   LearningRate 0.0569   Epoch: 4   Global Step: 27970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:12:06,874-Speed 5620.04 samples/sec   Loss 7.3258   LearningRate 0.0568   Epoch: 4   Global Step: 27980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:12:08,708-Speed 5587.44 samples/sec   Loss 7.5364   LearningRate 0.0568   Epoch: 4   Global Step: 27990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:12:10,552-Speed 5553.29 samples/sec   Loss 7.4023   LearningRate 0.0568   Epoch: 4   Global Step: 28000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:12:37,586-[lfw][28000]XNorm: 21.047332
Training: 2022-04-27 03:12:37,586-[lfw][28000]Accuracy-Flip: 0.99667+-0.00279
Training: 2022-04-27 03:12:37,587-[lfw][28000]Accuracy-Highest: 0.99717
Training: 2022-04-27 03:13:08,558-[cfp_fp][28000]XNorm: 18.570384
Training: 2022-04-27 03:13:08,558-[cfp_fp][28000]Accuracy-Flip: 0.92343+-0.01185
Training: 2022-04-27 03:13:08,559-[cfp_fp][28000]Accuracy-Highest: 0.93243
Training: 2022-04-27 03:13:35,348-[agedb_30][28000]XNorm: 20.969596
Training: 2022-04-27 03:13:35,349-[agedb_30][28000]Accuracy-Flip: 0.96883+-0.00778
Training: 2022-04-27 03:13:35,350-[agedb_30][28000]Accuracy-Highest: 0.96883
Training: 2022-04-27 03:13:37,194-Speed 118.19 samples/sec   Loss 7.4707   LearningRate 0.0568   Epoch: 4   Global Step: 28010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:13:39,037-Speed 5558.65 samples/sec   Loss 7.2823   LearningRate 0.0568   Epoch: 4   Global Step: 28020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:13:40,867-Speed 5599.80 samples/sec   Loss 7.3509   LearningRate 0.0568   Epoch: 4   Global Step: 28030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:13:42,676-Speed 5663.51 samples/sec   Loss 7.2962   LearningRate 0.0568   Epoch: 4   Global Step: 28040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:13:44,482-Speed 5673.83 samples/sec   Loss 7.3249   LearningRate 0.0568   Epoch: 4   Global Step: 28050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:13:46,341-Speed 5511.76 samples/sec   Loss 7.2162   LearningRate 0.0567   Epoch: 4   Global Step: 28060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:13:48,162-Speed 5626.64 samples/sec   Loss 7.3043   LearningRate 0.0567   Epoch: 4   Global Step: 28070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:13:49,978-Speed 5642.31 samples/sec   Loss 7.3400   LearningRate 0.0567   Epoch: 4   Global Step: 28080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:13:51,799-Speed 5626.48 samples/sec   Loss 7.3729   LearningRate 0.0567   Epoch: 4   Global Step: 28090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:13:53,621-Speed 5625.09 samples/sec   Loss 7.4800   LearningRate 0.0567   Epoch: 4   Global Step: 28100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:13:55,435-Speed 5646.93 samples/sec   Loss 7.3901   LearningRate 0.0567   Epoch: 4   Global Step: 28110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:13:57,260-Speed 5616.62 samples/sec   Loss 7.4090   LearningRate 0.0567   Epoch: 4   Global Step: 28120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:13:59,095-Speed 5582.61 samples/sec   Loss 7.3481   LearningRate 0.0566   Epoch: 4   Global Step: 28130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:14:00,920-Speed 5614.71 samples/sec   Loss 7.4330   LearningRate 0.0566   Epoch: 4   Global Step: 28140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:14:02,754-Speed 5584.59 samples/sec   Loss 7.2727   LearningRate 0.0566   Epoch: 4   Global Step: 28150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:04,634-Speed 5450.33 samples/sec   Loss 7.1587   LearningRate 0.0566   Epoch: 4   Global Step: 28160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:06,454-Speed 5630.01 samples/sec   Loss 7.1924   LearningRate 0.0566   Epoch: 4   Global Step: 28170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:08,290-Speed 5580.00 samples/sec   Loss 7.4075   LearningRate 0.0566   Epoch: 4   Global Step: 28180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:10,163-Speed 5471.02 samples/sec   Loss 7.2027   LearningRate 0.0566   Epoch: 4   Global Step: 28190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:12,072-Speed 5366.62 samples/sec   Loss 7.3481   LearningRate 0.0566   Epoch: 4   Global Step: 28200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:13,918-Speed 5550.19 samples/sec   Loss 7.4129   LearningRate 0.0565   Epoch: 4   Global Step: 28210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:15,768-Speed 5537.48 samples/sec   Loss 7.2081   LearningRate 0.0565   Epoch: 4   Global Step: 28220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:17,618-Speed 5540.59 samples/sec   Loss 7.4543   LearningRate 0.0565   Epoch: 4   Global Step: 28230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:19,442-Speed 5614.91 samples/sec   Loss 7.4613   LearningRate 0.0565   Epoch: 4   Global Step: 28240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:21,281-Speed 5575.20 samples/sec   Loss 7.3356   LearningRate 0.0565   Epoch: 4   Global Step: 28250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:23,115-Speed 5589.90 samples/sec   Loss 7.2787   LearningRate 0.0565   Epoch: 4   Global Step: 28260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:24,944-Speed 5599.32 samples/sec   Loss 7.1830   LearningRate 0.0565   Epoch: 4   Global Step: 28270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:14:26,775-Speed 5595.31 samples/sec   Loss 7.2396   LearningRate 0.0564   Epoch: 4   Global Step: 28280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:14:28,686-Speed 5360.86 samples/sec   Loss 7.2367   LearningRate 0.0564   Epoch: 4   Global Step: 28290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:14:30,544-Speed 5514.56 samples/sec   Loss 7.2248   LearningRate 0.0564   Epoch: 4   Global Step: 28300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:14:32,363-Speed 5631.07 samples/sec   Loss 7.2430   LearningRate 0.0564   Epoch: 4   Global Step: 28310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:14:34,212-Speed 5542.14 samples/sec   Loss 7.3521   LearningRate 0.0564   Epoch: 4   Global Step: 28320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:14:36,027-Speed 5647.14 samples/sec   Loss 7.4476   LearningRate 0.0564   Epoch: 4   Global Step: 28330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:14:37,844-Speed 5637.03 samples/sec   Loss 7.2961   LearningRate 0.0564   Epoch: 4   Global Step: 28340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:14:39,679-Speed 5582.46 samples/sec   Loss 7.2690   LearningRate 0.0564   Epoch: 4   Global Step: 28350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:14:41,518-Speed 5572.53 samples/sec   Loss 7.3434   LearningRate 0.0563   Epoch: 4   Global Step: 28360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:14:43,337-Speed 5634.58 samples/sec   Loss 7.4459   LearningRate 0.0563   Epoch: 4   Global Step: 28370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:45,173-Speed 5578.01 samples/sec   Loss 7.3349   LearningRate 0.0563   Epoch: 4   Global Step: 28380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:47,032-Speed 5511.43 samples/sec   Loss 7.3305   LearningRate 0.0563   Epoch: 4   Global Step: 28390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:48,859-Speed 5609.00 samples/sec   Loss 7.2245   LearningRate 0.0563   Epoch: 4   Global Step: 28400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:50,711-Speed 5531.25 samples/sec   Loss 7.1992   LearningRate 0.0563   Epoch: 4   Global Step: 28410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:52,671-Speed 5228.36 samples/sec   Loss 7.3783   LearningRate 0.0563   Epoch: 4   Global Step: 28420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:14:54,454-Speed 5746.10 samples/sec   Loss 7.3105   LearningRate 0.0563   Epoch: 4   Global Step: 28430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:15:10,061-Speed 656.17 samples/sec   Loss 6.7975   LearningRate 0.0562   Epoch: 5   Global Step: 28440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:15:11,928-Speed 5489.71 samples/sec   Loss 6.6816   LearningRate 0.0562   Epoch: 5   Global Step: 28450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:15:13,753-Speed 5613.05 samples/sec   Loss 6.8391   LearningRate 0.0562   Epoch: 5   Global Step: 28460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:15:15,606-Speed 5531.03 samples/sec   Loss 6.7640   LearningRate 0.0562   Epoch: 5   Global Step: 28470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:15:17,500-Speed 5408.88 samples/sec   Loss 6.6999   LearningRate 0.0562   Epoch: 5   Global Step: 28480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:15:19,331-Speed 5596.34 samples/sec   Loss 6.6379   LearningRate 0.0562   Epoch: 5   Global Step: 28490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:15:21,179-Speed 5545.36 samples/sec   Loss 6.6999   LearningRate 0.0562   Epoch: 5   Global Step: 28500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:15:23,013-Speed 5585.31 samples/sec   Loss 6.7053   LearningRate 0.0561   Epoch: 5   Global Step: 28510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:15:24,827-Speed 5649.38 samples/sec   Loss 6.7827   LearningRate 0.0561   Epoch: 5   Global Step: 28520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:15:26,665-Speed 5573.10 samples/sec   Loss 6.8327   LearningRate 0.0561   Epoch: 5   Global Step: 28530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:15:28,484-Speed 5632.36 samples/sec   Loss 6.8206   LearningRate 0.0561   Epoch: 5   Global Step: 28540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:15:30,290-Speed 5673.45 samples/sec   Loss 6.7471   LearningRate 0.0561   Epoch: 5   Global Step: 28550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:15:32,106-Speed 5640.36 samples/sec   Loss 6.7916   LearningRate 0.0561   Epoch: 5   Global Step: 28560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:15:33,945-Speed 5570.70 samples/sec   Loss 6.7803   LearningRate 0.0561   Epoch: 5   Global Step: 28570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:15:35,749-Speed 5677.81 samples/sec   Loss 6.7195   LearningRate 0.0561   Epoch: 5   Global Step: 28580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:15:37,567-Speed 5634.66 samples/sec   Loss 6.7842   LearningRate 0.0560   Epoch: 5   Global Step: 28590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:15:39,394-Speed 5605.46 samples/sec   Loss 6.7595   LearningRate 0.0560   Epoch: 5   Global Step: 28600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:15:41,213-Speed 5630.63 samples/sec   Loss 6.8438   LearningRate 0.0560   Epoch: 5   Global Step: 28610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:15:43,033-Speed 5628.14 samples/sec   Loss 6.6716   LearningRate 0.0560   Epoch: 5   Global Step: 28620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:15:44,849-Speed 5642.13 samples/sec   Loss 6.9387   LearningRate 0.0560   Epoch: 5   Global Step: 28630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:15:46,657-Speed 5666.97 samples/sec   Loss 6.8810   LearningRate 0.0560   Epoch: 5   Global Step: 28640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:15:48,470-Speed 5648.71 samples/sec   Loss 6.8100   LearningRate 0.0560   Epoch: 5   Global Step: 28650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:15:50,287-Speed 5639.94 samples/sec   Loss 6.9717   LearningRate 0.0559   Epoch: 5   Global Step: 28660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:15:52,096-Speed 5659.51 samples/sec   Loss 6.8396   LearningRate 0.0559   Epoch: 5   Global Step: 28670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:15:53,912-Speed 5644.02 samples/sec   Loss 6.8145   LearningRate 0.0559   Epoch: 5   Global Step: 28680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:15:55,715-Speed 5680.77 samples/sec   Loss 6.8373   LearningRate 0.0559   Epoch: 5   Global Step: 28690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:15:57,518-Speed 5679.28 samples/sec   Loss 7.0163   LearningRate 0.0559   Epoch: 5   Global Step: 28700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:15:59,338-Speed 5629.38 samples/sec   Loss 6.9819   LearningRate 0.0559   Epoch: 5   Global Step: 28710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:01,160-Speed 5622.65 samples/sec   Loss 6.9801   LearningRate 0.0559   Epoch: 5   Global Step: 28720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:02,975-Speed 5643.10 samples/sec   Loss 6.9049   LearningRate 0.0559   Epoch: 5   Global Step: 28730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:04,809-Speed 5585.22 samples/sec   Loss 6.8306   LearningRate 0.0558   Epoch: 5   Global Step: 28740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:06,626-Speed 5639.04 samples/sec   Loss 6.9674   LearningRate 0.0558   Epoch: 5   Global Step: 28750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:08,458-Speed 5591.46 samples/sec   Loss 6.8519   LearningRate 0.0558   Epoch: 5   Global Step: 28760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:10,292-Speed 5584.85 samples/sec   Loss 6.9212   LearningRate 0.0558   Epoch: 5   Global Step: 28770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:16:12,119-Speed 5606.44 samples/sec   Loss 6.8318   LearningRate 0.0558   Epoch: 5   Global Step: 28780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:16:13,943-Speed 5615.43 samples/sec   Loss 6.9979   LearningRate 0.0558   Epoch: 5   Global Step: 28790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:16:15,784-Speed 5563.45 samples/sec   Loss 6.9054   LearningRate 0.0558   Epoch: 5   Global Step: 28800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:16:17,616-Speed 5591.78 samples/sec   Loss 7.0134   LearningRate 0.0557   Epoch: 5   Global Step: 28810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:16:19,453-Speed 5576.09 samples/sec   Loss 7.0146   LearningRate 0.0557   Epoch: 5   Global Step: 28820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:16:21,252-Speed 5693.90 samples/sec   Loss 7.1187   LearningRate 0.0557   Epoch: 5   Global Step: 28830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:23,070-Speed 5635.30 samples/sec   Loss 6.9082   LearningRate 0.0557   Epoch: 5   Global Step: 28840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:24,882-Speed 5653.39 samples/sec   Loss 7.1119   LearningRate 0.0557   Epoch: 5   Global Step: 28850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:26,705-Speed 5620.12 samples/sec   Loss 6.9756   LearningRate 0.0557   Epoch: 5   Global Step: 28860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:28,521-Speed 5640.33 samples/sec   Loss 6.9693   LearningRate 0.0557   Epoch: 5   Global Step: 28870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:30,332-Speed 5656.11 samples/sec   Loss 6.9116   LearningRate 0.0557   Epoch: 5   Global Step: 28880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:32,150-Speed 5633.67 samples/sec   Loss 6.9215   LearningRate 0.0556   Epoch: 5   Global Step: 28890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:34,065-Speed 5347.83 samples/sec   Loss 6.9220   LearningRate 0.0556   Epoch: 5   Global Step: 28900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:35,917-Speed 5532.33 samples/sec   Loss 6.9788   LearningRate 0.0556   Epoch: 5   Global Step: 28910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:37,797-Speed 5450.41 samples/sec   Loss 7.0148   LearningRate 0.0556   Epoch: 5   Global Step: 28920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:39,693-Speed 5400.16 samples/sec   Loss 7.0386   LearningRate 0.0556   Epoch: 5   Global Step: 28930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:16:41,527-Speed 5584.89 samples/sec   Loss 7.2240   LearningRate 0.0556   Epoch: 5   Global Step: 28940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:16:43,380-Speed 5530.97 samples/sec   Loss 7.1789   LearningRate 0.0556   Epoch: 5   Global Step: 28950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:16:45,231-Speed 5532.64 samples/sec   Loss 6.9906   LearningRate 0.0556   Epoch: 5   Global Step: 28960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:16:47,139-Speed 5368.89 samples/sec   Loss 6.9829   LearningRate 0.0555   Epoch: 5   Global Step: 28970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:16:48,964-Speed 5615.13 samples/sec   Loss 7.0424   LearningRate 0.0555   Epoch: 5   Global Step: 28980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:16:50,778-Speed 5645.92 samples/sec   Loss 6.9556   LearningRate 0.0555   Epoch: 5   Global Step: 28990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:16:52,580-Speed 5684.95 samples/sec   Loss 7.0257   LearningRate 0.0555   Epoch: 5   Global Step: 29000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:54,399-Speed 5629.55 samples/sec   Loss 7.0209   LearningRate 0.0555   Epoch: 5   Global Step: 29010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:56,215-Speed 5641.93 samples/sec   Loss 6.9682   LearningRate 0.0555   Epoch: 5   Global Step: 29020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:58,026-Speed 5654.81 samples/sec   Loss 7.1572   LearningRate 0.0555   Epoch: 5   Global Step: 29030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:16:59,843-Speed 5639.55 samples/sec   Loss 7.0298   LearningRate 0.0554   Epoch: 5   Global Step: 29040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:01,705-Speed 5500.03 samples/sec   Loss 7.1544   LearningRate 0.0554   Epoch: 5   Global Step: 29050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:03,534-Speed 5599.62 samples/sec   Loss 6.9042   LearningRate 0.0554   Epoch: 5   Global Step: 29060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:05,341-Speed 5672.24 samples/sec   Loss 7.1493   LearningRate 0.0554   Epoch: 5   Global Step: 29070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:07,177-Speed 5576.80 samples/sec   Loss 6.9621   LearningRate 0.0554   Epoch: 5   Global Step: 29080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:09,020-Speed 5557.87 samples/sec   Loss 6.8725   LearningRate 0.0554   Epoch: 5   Global Step: 29090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:10,841-Speed 5626.62 samples/sec   Loss 6.9045   LearningRate 0.0554   Epoch: 5   Global Step: 29100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:17:12,653-Speed 5652.64 samples/sec   Loss 7.0957   LearningRate 0.0554   Epoch: 5   Global Step: 29110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:17:14,476-Speed 5617.35 samples/sec   Loss 7.0391   LearningRate 0.0553   Epoch: 5   Global Step: 29120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:17:16,290-Speed 5647.48 samples/sec   Loss 7.1195   LearningRate 0.0553   Epoch: 5   Global Step: 29130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:17:18,121-Speed 5594.11 samples/sec   Loss 7.0124   LearningRate 0.0553   Epoch: 5   Global Step: 29140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:17:19,940-Speed 5633.37 samples/sec   Loss 7.1565   LearningRate 0.0553   Epoch: 5   Global Step: 29150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:17:21,763-Speed 5618.45 samples/sec   Loss 7.0038   LearningRate 0.0553   Epoch: 5   Global Step: 29160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:17:23,574-Speed 5655.54 samples/sec   Loss 7.0391   LearningRate 0.0553   Epoch: 5   Global Step: 29170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:17:25,386-Speed 5655.14 samples/sec   Loss 7.0377   LearningRate 0.0553   Epoch: 5   Global Step: 29180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:27,193-Speed 5668.72 samples/sec   Loss 7.1436   LearningRate 0.0553   Epoch: 5   Global Step: 29190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:29,035-Speed 5560.31 samples/sec   Loss 7.0498   LearningRate 0.0552   Epoch: 5   Global Step: 29200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:30,868-Speed 5587.61 samples/sec   Loss 7.1311   LearningRate 0.0552   Epoch: 5   Global Step: 29210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:32,688-Speed 5628.26 samples/sec   Loss 7.0901   LearningRate 0.0552   Epoch: 5   Global Step: 29220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:34,522-Speed 5585.40 samples/sec   Loss 6.9247   LearningRate 0.0552   Epoch: 5   Global Step: 29230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:36,344-Speed 5621.60 samples/sec   Loss 7.0359   LearningRate 0.0552   Epoch: 5   Global Step: 29240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:38,150-Speed 5672.94 samples/sec   Loss 6.9563   LearningRate 0.0552   Epoch: 5   Global Step: 29250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:39,982-Speed 5591.09 samples/sec   Loss 6.9645   LearningRate 0.0552   Epoch: 5   Global Step: 29260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:41,803-Speed 5623.30 samples/sec   Loss 7.0125   LearningRate 0.0551   Epoch: 5   Global Step: 29270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:43,620-Speed 5638.97 samples/sec   Loss 7.1929   LearningRate 0.0551   Epoch: 5   Global Step: 29280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:17:45,424-Speed 5679.20 samples/sec   Loss 6.9678   LearningRate 0.0551   Epoch: 5   Global Step: 29290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:47,246-Speed 5620.99 samples/sec   Loss 7.0165   LearningRate 0.0551   Epoch: 5   Global Step: 29300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:49,059-Speed 5651.90 samples/sec   Loss 6.9220   LearningRate 0.0551   Epoch: 5   Global Step: 29310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:50,890-Speed 5595.21 samples/sec   Loss 7.1501   LearningRate 0.0551   Epoch: 5   Global Step: 29320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:52,717-Speed 5607.28 samples/sec   Loss 6.9887   LearningRate 0.0551   Epoch: 5   Global Step: 29330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:54,536-Speed 5629.94 samples/sec   Loss 7.0738   LearningRate 0.0551   Epoch: 5   Global Step: 29340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:56,349-Speed 5650.75 samples/sec   Loss 7.0926   LearningRate 0.0550   Epoch: 5   Global Step: 29350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:58,174-Speed 5612.45 samples/sec   Loss 7.1180   LearningRate 0.0550   Epoch: 5   Global Step: 29360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:17:59,992-Speed 5632.79 samples/sec   Loss 6.9720   LearningRate 0.0550   Epoch: 5   Global Step: 29370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:18:01,905-Speed 5354.98 samples/sec   Loss 7.0996   LearningRate 0.0550   Epoch: 5   Global Step: 29380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:18:03,723-Speed 5632.83 samples/sec   Loss 7.2517   LearningRate 0.0550   Epoch: 5   Global Step: 29390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:05,533-Speed 5661.57 samples/sec   Loss 7.1808   LearningRate 0.0550   Epoch: 5   Global Step: 29400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:07,347-Speed 5645.16 samples/sec   Loss 6.9010   LearningRate 0.0550   Epoch: 5   Global Step: 29410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:09,162-Speed 5644.37 samples/sec   Loss 6.9558   LearningRate 0.0550   Epoch: 5   Global Step: 29420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:10,983-Speed 5626.16 samples/sec   Loss 7.0450   LearningRate 0.0549   Epoch: 5   Global Step: 29430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:12,829-Speed 5548.73 samples/sec   Loss 6.9551   LearningRate 0.0549   Epoch: 5   Global Step: 29440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:14,667-Speed 5573.95 samples/sec   Loss 7.1283   LearningRate 0.0549   Epoch: 5   Global Step: 29450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:16,512-Speed 5551.13 samples/sec   Loss 6.9609   LearningRate 0.0549   Epoch: 5   Global Step: 29460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:18:18,360-Speed 5543.95 samples/sec   Loss 7.0696   LearningRate 0.0549   Epoch: 5   Global Step: 29470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:18:20,177-Speed 5638.24 samples/sec   Loss 7.1398   LearningRate 0.0549   Epoch: 5   Global Step: 29480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:18:22,004-Speed 5606.83 samples/sec   Loss 7.0291   LearningRate 0.0549   Epoch: 5   Global Step: 29490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:18:23,830-Speed 5608.38 samples/sec   Loss 7.1428   LearningRate 0.0548   Epoch: 5   Global Step: 29500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:18:25,656-Speed 5611.11 samples/sec   Loss 7.0763   LearningRate 0.0548   Epoch: 5   Global Step: 29510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:18:27,521-Speed 5492.03 samples/sec   Loss 7.1051   LearningRate 0.0548   Epoch: 5   Global Step: 29520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:18:29,349-Speed 5604.33 samples/sec   Loss 7.0213   LearningRate 0.0548   Epoch: 5   Global Step: 29530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:18:31,171-Speed 5620.59 samples/sec   Loss 7.1877   LearningRate 0.0548   Epoch: 5   Global Step: 29540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:18:32,990-Speed 5631.66 samples/sec   Loss 7.0383   LearningRate 0.0548   Epoch: 5   Global Step: 29550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:18:34,825-Speed 5583.09 samples/sec   Loss 7.1368   LearningRate 0.0548   Epoch: 5   Global Step: 29560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:36,641-Speed 5639.60 samples/sec   Loss 7.1953   LearningRate 0.0548   Epoch: 5   Global Step: 29570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:38,448-Speed 5670.57 samples/sec   Loss 7.1764   LearningRate 0.0547   Epoch: 5   Global Step: 29580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:40,270-Speed 5620.53 samples/sec   Loss 6.9903   LearningRate 0.0547   Epoch: 5   Global Step: 29590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:42,102-Speed 5592.48 samples/sec   Loss 7.1514   LearningRate 0.0547   Epoch: 5   Global Step: 29600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:43,921-Speed 5631.43 samples/sec   Loss 7.2875   LearningRate 0.0547   Epoch: 5   Global Step: 29610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:45,754-Speed 5586.12 samples/sec   Loss 7.0309   LearningRate 0.0547   Epoch: 5   Global Step: 29620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:47,571-Speed 5639.45 samples/sec   Loss 6.9308   LearningRate 0.0547   Epoch: 5   Global Step: 29630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:49,398-Speed 5606.45 samples/sec   Loss 7.1501   LearningRate 0.0547   Epoch: 5   Global Step: 29640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:51,259-Speed 5503.57 samples/sec   Loss 7.1559   LearningRate 0.0547   Epoch: 5   Global Step: 29650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:53,127-Speed 5484.51 samples/sec   Loss 7.1925   LearningRate 0.0546   Epoch: 5   Global Step: 29660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:54,996-Speed 5481.34 samples/sec   Loss 7.1468   LearningRate 0.0546   Epoch: 5   Global Step: 29670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:56,820-Speed 5615.99 samples/sec   Loss 7.1204   LearningRate 0.0546   Epoch: 5   Global Step: 29680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:18:58,640-Speed 5627.25 samples/sec   Loss 7.1655   LearningRate 0.0546   Epoch: 5   Global Step: 29690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:00,459-Speed 5631.25 samples/sec   Loss 7.0176   LearningRate 0.0546   Epoch: 5   Global Step: 29700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:02,288-Speed 5600.59 samples/sec   Loss 7.1473   LearningRate 0.0546   Epoch: 5   Global Step: 29710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:04,119-Speed 5594.08 samples/sec   Loss 6.9899   LearningRate 0.0546   Epoch: 5   Global Step: 29720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:05,951-Speed 5592.64 samples/sec   Loss 7.1505   LearningRate 0.0545   Epoch: 5   Global Step: 29730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:07,785-Speed 5584.19 samples/sec   Loss 7.1803   LearningRate 0.0545   Epoch: 5   Global Step: 29740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:09,622-Speed 5576.28 samples/sec   Loss 7.0667   LearningRate 0.0545   Epoch: 5   Global Step: 29750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:11,460-Speed 5573.70 samples/sec   Loss 7.0211   LearningRate 0.0545   Epoch: 5   Global Step: 29760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:13,279-Speed 5630.22 samples/sec   Loss 7.0714   LearningRate 0.0545   Epoch: 5   Global Step: 29770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:15,106-Speed 5607.43 samples/sec   Loss 7.2720   LearningRate 0.0545   Epoch: 5   Global Step: 29780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:16,921-Speed 5646.34 samples/sec   Loss 7.1631   LearningRate 0.0545   Epoch: 5   Global Step: 29790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:19:18,738-Speed 5636.96 samples/sec   Loss 7.1058   LearningRate 0.0545   Epoch: 5   Global Step: 29800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:19:20,544-Speed 5670.91 samples/sec   Loss 7.0178   LearningRate 0.0544   Epoch: 5   Global Step: 29810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:19:22,356-Speed 5653.03 samples/sec   Loss 7.1004   LearningRate 0.0544   Epoch: 5   Global Step: 29820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:19:24,164-Speed 5664.70 samples/sec   Loss 7.0914   LearningRate 0.0544   Epoch: 5   Global Step: 29830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:19:25,990-Speed 5609.33 samples/sec   Loss 7.1251   LearningRate 0.0544   Epoch: 5   Global Step: 29840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:19:27,799-Speed 5663.58 samples/sec   Loss 7.0588   LearningRate 0.0544   Epoch: 5   Global Step: 29850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:19:29,624-Speed 5612.06 samples/sec   Loss 7.0897   LearningRate 0.0544   Epoch: 5   Global Step: 29860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:19:31,438-Speed 5647.21 samples/sec   Loss 7.0025   LearningRate 0.0544   Epoch: 5   Global Step: 29870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:19:33,258-Speed 5628.38 samples/sec   Loss 7.1049   LearningRate 0.0544   Epoch: 5   Global Step: 29880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:19:35,074-Speed 5641.28 samples/sec   Loss 7.1515   LearningRate 0.0543   Epoch: 5   Global Step: 29890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:19:36,889-Speed 5643.51 samples/sec   Loss 6.9463   LearningRate 0.0543   Epoch: 5   Global Step: 29900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:19:38,703-Speed 5646.28 samples/sec   Loss 7.2233   LearningRate 0.0543   Epoch: 5   Global Step: 29910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:40,509-Speed 5673.10 samples/sec   Loss 7.1366   LearningRate 0.0543   Epoch: 5   Global Step: 29920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:42,325-Speed 5640.14 samples/sec   Loss 7.0537   LearningRate 0.0543   Epoch: 5   Global Step: 29930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:44,136-Speed 5657.94 samples/sec   Loss 7.0935   LearningRate 0.0543   Epoch: 5   Global Step: 29940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:46,023-Speed 5428.24 samples/sec   Loss 6.9823   LearningRate 0.0543   Epoch: 5   Global Step: 29950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:47,880-Speed 5515.40 samples/sec   Loss 7.2024   LearningRate 0.0542   Epoch: 5   Global Step: 29960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:49,705-Speed 5611.86 samples/sec   Loss 7.0772   LearningRate 0.0542   Epoch: 5   Global Step: 29970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:51,556-Speed 5534.72 samples/sec   Loss 7.0162   LearningRate 0.0542   Epoch: 5   Global Step: 29980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:53,373-Speed 5638.91 samples/sec   Loss 7.0283   LearningRate 0.0542   Epoch: 5   Global Step: 29990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:19:55,210-Speed 5575.83 samples/sec   Loss 7.2309   LearningRate 0.0542   Epoch: 5   Global Step: 30000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:20:21,450-[lfw][30000]XNorm: 22.914327
Training: 2022-04-27 03:20:21,451-[lfw][30000]Accuracy-Flip: 0.99700+-0.00323
Training: 2022-04-27 03:20:21,452-[lfw][30000]Accuracy-Highest: 0.99717
Training: 2022-04-27 03:20:52,786-[cfp_fp][30000]XNorm: 20.399378
Training: 2022-04-27 03:20:52,787-[cfp_fp][30000]Accuracy-Flip: 0.92700+-0.01357
Training: 2022-04-27 03:20:52,788-[cfp_fp][30000]Accuracy-Highest: 0.93243
Training: 2022-04-27 03:21:19,699-[agedb_30][30000]XNorm: 22.715611
Training: 2022-04-27 03:21:19,700-[agedb_30][30000]Accuracy-Flip: 0.96767+-0.00967
Training: 2022-04-27 03:21:19,700-[agedb_30][30000]Accuracy-Highest: 0.96883
Training: 2022-04-27 03:21:21,522-Speed 118.64 samples/sec   Loss 7.0276   LearningRate 0.0542   Epoch: 5   Global Step: 30010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:21:23,364-Speed 5561.17 samples/sec   Loss 7.1284   LearningRate 0.0542   Epoch: 5   Global Step: 30020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:21:25,180-Speed 5641.80 samples/sec   Loss 7.0219   LearningRate 0.0542   Epoch: 5   Global Step: 30030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:21:27,016-Speed 5580.89 samples/sec   Loss 7.0929   LearningRate 0.0541   Epoch: 5   Global Step: 30040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:21:28,852-Speed 5583.41 samples/sec   Loss 7.0706   LearningRate 0.0541   Epoch: 5   Global Step: 30050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:21:30,684-Speed 5590.44 samples/sec   Loss 7.1659   LearningRate 0.0541   Epoch: 5   Global Step: 30060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:21:32,509-Speed 5614.68 samples/sec   Loss 7.2337   LearningRate 0.0541   Epoch: 5   Global Step: 30070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:21:34,346-Speed 5577.91 samples/sec   Loss 7.0110   LearningRate 0.0541   Epoch: 5   Global Step: 30080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:21:36,167-Speed 5626.47 samples/sec   Loss 7.2490   LearningRate 0.0541   Epoch: 5   Global Step: 30090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:21:37,999-Speed 5594.92 samples/sec   Loss 7.0532   LearningRate 0.0541   Epoch: 5   Global Step: 30100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:21:39,845-Speed 5549.74 samples/sec   Loss 7.1994   LearningRate 0.0541   Epoch: 5   Global Step: 30110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:21:41,669-Speed 5616.93 samples/sec   Loss 7.1213   LearningRate 0.0540   Epoch: 5   Global Step: 30120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:21:43,511-Speed 5561.33 samples/sec   Loss 7.0243   LearningRate 0.0540   Epoch: 5   Global Step: 30130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:21:45,360-Speed 5541.86 samples/sec   Loss 7.0903   LearningRate 0.0540   Epoch: 5   Global Step: 30140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:21:47,181-Speed 5628.83 samples/sec   Loss 6.9572   LearningRate 0.0540   Epoch: 5   Global Step: 30150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:21:49,038-Speed 5516.92 samples/sec   Loss 7.1079   LearningRate 0.0540   Epoch: 5   Global Step: 30160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:21:50,850-Speed 5654.93 samples/sec   Loss 7.0744   LearningRate 0.0540   Epoch: 5   Global Step: 30170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:21:52,682-Speed 5591.88 samples/sec   Loss 7.2225   LearningRate 0.0540   Epoch: 5   Global Step: 30180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:21:54,539-Speed 5519.56 samples/sec   Loss 7.1624   LearningRate 0.0540   Epoch: 5   Global Step: 30190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:21:56,392-Speed 5526.79 samples/sec   Loss 7.1332   LearningRate 0.0539   Epoch: 5   Global Step: 30200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:21:58,255-Speed 5500.73 samples/sec   Loss 6.9912   LearningRate 0.0539   Epoch: 5   Global Step: 30210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:00,099-Speed 5557.92 samples/sec   Loss 7.2180   LearningRate 0.0539   Epoch: 5   Global Step: 30220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:22:01,913-Speed 5646.62 samples/sec   Loss 7.0828   LearningRate 0.0539   Epoch: 5   Global Step: 30230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:22:03,717-Speed 5680.05 samples/sec   Loss 6.9989   LearningRate 0.0539   Epoch: 5   Global Step: 30240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:05,550-Speed 5590.14 samples/sec   Loss 7.1211   LearningRate 0.0539   Epoch: 5   Global Step: 30250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:07,383-Speed 5588.35 samples/sec   Loss 7.1842   LearningRate 0.0539   Epoch: 5   Global Step: 30260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:09,216-Speed 5589.22 samples/sec   Loss 7.1908   LearningRate 0.0538   Epoch: 5   Global Step: 30270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:11,053-Speed 5576.82 samples/sec   Loss 7.0320   LearningRate 0.0538   Epoch: 5   Global Step: 30280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:12,921-Speed 5484.58 samples/sec   Loss 7.0686   LearningRate 0.0538   Epoch: 5   Global Step: 30290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:14,762-Speed 5566.67 samples/sec   Loss 6.8756   LearningRate 0.0538   Epoch: 5   Global Step: 30300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:16,595-Speed 5590.20 samples/sec   Loss 7.0820   LearningRate 0.0538   Epoch: 5   Global Step: 30310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:18,443-Speed 5547.81 samples/sec   Loss 7.1948   LearningRate 0.0538   Epoch: 5   Global Step: 30320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:20,254-Speed 5659.63 samples/sec   Loss 7.0479   LearningRate 0.0538   Epoch: 5   Global Step: 30330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:22,095-Speed 5564.37 samples/sec   Loss 7.3191   LearningRate 0.0538   Epoch: 5   Global Step: 30340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:22:23,936-Speed 5566.06 samples/sec   Loss 7.1249   LearningRate 0.0537   Epoch: 5   Global Step: 30350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:22:25,762-Speed 5607.72 samples/sec   Loss 7.1107   LearningRate 0.0537   Epoch: 5   Global Step: 30360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:22:27,583-Speed 5627.76 samples/sec   Loss 7.0894   LearningRate 0.0537   Epoch: 5   Global Step: 30370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:22:29,410-Speed 5608.15 samples/sec   Loss 7.1623   LearningRate 0.0537   Epoch: 5   Global Step: 30380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:22:31,219-Speed 5662.99 samples/sec   Loss 7.0161   LearningRate 0.0537   Epoch: 5   Global Step: 30390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:22:33,033-Speed 5647.40 samples/sec   Loss 6.9894   LearningRate 0.0537   Epoch: 5   Global Step: 30400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:22:34,878-Speed 5554.07 samples/sec   Loss 7.1691   LearningRate 0.0537   Epoch: 5   Global Step: 30410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:36,693-Speed 5642.98 samples/sec   Loss 7.0671   LearningRate 0.0537   Epoch: 5   Global Step: 30420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:38,554-Speed 5504.53 samples/sec   Loss 7.2064   LearningRate 0.0536   Epoch: 5   Global Step: 30430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:40,400-Speed 5551.67 samples/sec   Loss 7.0762   LearningRate 0.0536   Epoch: 5   Global Step: 30440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:42,250-Speed 5536.76 samples/sec   Loss 6.8777   LearningRate 0.0536   Epoch: 5   Global Step: 30450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:44,103-Speed 5531.69 samples/sec   Loss 7.0047   LearningRate 0.0536   Epoch: 5   Global Step: 30460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:45,927-Speed 5615.64 samples/sec   Loss 7.1888   LearningRate 0.0536   Epoch: 5   Global Step: 30470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:47,764-Speed 5578.06 samples/sec   Loss 7.0417   LearningRate 0.0536   Epoch: 5   Global Step: 30480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:49,607-Speed 5560.55 samples/sec   Loss 6.9701   LearningRate 0.0536   Epoch: 5   Global Step: 30490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:51,450-Speed 5557.35 samples/sec   Loss 7.0498   LearningRate 0.0536   Epoch: 5   Global Step: 30500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:53,322-Speed 5472.22 samples/sec   Loss 7.1043   LearningRate 0.0535   Epoch: 5   Global Step: 30510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:22:55,162-Speed 5570.26 samples/sec   Loss 7.1436   LearningRate 0.0535   Epoch: 5   Global Step: 30520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:56,986-Speed 5618.81 samples/sec   Loss 7.1158   LearningRate 0.0535   Epoch: 5   Global Step: 30530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:22:58,827-Speed 5562.25 samples/sec   Loss 7.2537   LearningRate 0.0535   Epoch: 5   Global Step: 30540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:00,666-Speed 5574.45 samples/sec   Loss 7.0807   LearningRate 0.0535   Epoch: 5   Global Step: 30550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:02,483-Speed 5637.00 samples/sec   Loss 7.0340   LearningRate 0.0535   Epoch: 5   Global Step: 30560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:04,339-Speed 5519.26 samples/sec   Loss 6.9575   LearningRate 0.0535   Epoch: 5   Global Step: 30570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:06,167-Speed 5606.38 samples/sec   Loss 7.2490   LearningRate 0.0534   Epoch: 5   Global Step: 30580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:07,990-Speed 5618.93 samples/sec   Loss 7.0751   LearningRate 0.0534   Epoch: 5   Global Step: 30590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:09,831-Speed 5565.96 samples/sec   Loss 7.1568   LearningRate 0.0534   Epoch: 5   Global Step: 30600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:11,657-Speed 5613.87 samples/sec   Loss 7.1516   LearningRate 0.0534   Epoch: 5   Global Step: 30610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:13,522-Speed 5492.53 samples/sec   Loss 7.1467   LearningRate 0.0534   Epoch: 5   Global Step: 30620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:23:15,353-Speed 5594.30 samples/sec   Loss 6.9989   LearningRate 0.0534   Epoch: 5   Global Step: 30630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:23:17,175-Speed 5624.65 samples/sec   Loss 7.0781   LearningRate 0.0534   Epoch: 5   Global Step: 30640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:23:18,999-Speed 5616.16 samples/sec   Loss 6.9980   LearningRate 0.0534   Epoch: 5   Global Step: 30650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:23:20,828-Speed 5601.40 samples/sec   Loss 6.9276   LearningRate 0.0533   Epoch: 5   Global Step: 30660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:23:22,683-Speed 5521.92 samples/sec   Loss 7.2009   LearningRate 0.0533   Epoch: 5   Global Step: 30670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:23:24,523-Speed 5570.99 samples/sec   Loss 7.0262   LearningRate 0.0533   Epoch: 5   Global Step: 30680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:26,363-Speed 5566.64 samples/sec   Loss 7.0279   LearningRate 0.0533   Epoch: 5   Global Step: 30690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:28,200-Speed 5577.68 samples/sec   Loss 7.0175   LearningRate 0.0533   Epoch: 5   Global Step: 30700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:30,064-Speed 5495.43 samples/sec   Loss 7.0543   LearningRate 0.0533   Epoch: 5   Global Step: 30710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:31,908-Speed 5557.65 samples/sec   Loss 6.9451   LearningRate 0.0533   Epoch: 5   Global Step: 30720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:33,752-Speed 5555.49 samples/sec   Loss 6.9481   LearningRate 0.0533   Epoch: 5   Global Step: 30730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:35,577-Speed 5613.16 samples/sec   Loss 6.8803   LearningRate 0.0532   Epoch: 5   Global Step: 30740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:37,420-Speed 5557.39 samples/sec   Loss 7.0301   LearningRate 0.0532   Epoch: 5   Global Step: 30750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:39,251-Speed 5597.12 samples/sec   Loss 7.0892   LearningRate 0.0532   Epoch: 5   Global Step: 30760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:41,085-Speed 5586.82 samples/sec   Loss 7.2584   LearningRate 0.0532   Epoch: 5   Global Step: 30770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:42,915-Speed 5597.17 samples/sec   Loss 7.0604   LearningRate 0.0532   Epoch: 5   Global Step: 30780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:23:44,782-Speed 5485.29 samples/sec   Loss 6.9500   LearningRate 0.0532   Epoch: 5   Global Step: 30790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:23:46,619-Speed 5576.19 samples/sec   Loss 7.0442   LearningRate 0.0532   Epoch: 5   Global Step: 30800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:23:48,475-Speed 5522.54 samples/sec   Loss 6.9467   LearningRate 0.0532   Epoch: 5   Global Step: 30810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:23:50,305-Speed 5596.13 samples/sec   Loss 7.1666   LearningRate 0.0531   Epoch: 5   Global Step: 30820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:23:52,154-Speed 5540.20 samples/sec   Loss 7.0510   LearningRate 0.0531   Epoch: 5   Global Step: 30830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:23:53,956-Speed 5683.33 samples/sec   Loss 6.9659   LearningRate 0.0531   Epoch: 5   Global Step: 30840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:55,787-Speed 5594.93 samples/sec   Loss 7.0849   LearningRate 0.0531   Epoch: 5   Global Step: 30850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:57,606-Speed 5632.80 samples/sec   Loss 7.1561   LearningRate 0.0531   Epoch: 5   Global Step: 30860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:23:59,432-Speed 5611.92 samples/sec   Loss 6.9543   LearningRate 0.0531   Epoch: 5   Global Step: 30870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:01,244-Speed 5653.72 samples/sec   Loss 7.0599   LearningRate 0.0531   Epoch: 5   Global Step: 30880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:03,089-Speed 5551.23 samples/sec   Loss 6.9704   LearningRate 0.0531   Epoch: 5   Global Step: 30890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:04,936-Speed 5548.16 samples/sec   Loss 6.8821   LearningRate 0.0530   Epoch: 5   Global Step: 30900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:06,773-Speed 5576.10 samples/sec   Loss 7.1022   LearningRate 0.0530   Epoch: 5   Global Step: 30910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:08,612-Speed 5571.03 samples/sec   Loss 7.0280   LearningRate 0.0530   Epoch: 5   Global Step: 30920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:10,435-Speed 5619.27 samples/sec   Loss 6.9022   LearningRate 0.0530   Epoch: 5   Global Step: 30930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:12,261-Speed 5609.60 samples/sec   Loss 6.9427   LearningRate 0.0530   Epoch: 5   Global Step: 30940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:24:14,086-Speed 5617.58 samples/sec   Loss 7.0299   LearningRate 0.0530   Epoch: 5   Global Step: 30950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:15,928-Speed 5576.46 samples/sec   Loss 7.1415   LearningRate 0.0530   Epoch: 5   Global Step: 30960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:17,784-Speed 5522.34 samples/sec   Loss 7.1041   LearningRate 0.0529   Epoch: 5   Global Step: 30970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:19,621-Speed 5577.15 samples/sec   Loss 6.9608   LearningRate 0.0529   Epoch: 5   Global Step: 30980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:21,454-Speed 5588.17 samples/sec   Loss 7.0351   LearningRate 0.0529   Epoch: 5   Global Step: 30990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:23,297-Speed 5558.38 samples/sec   Loss 6.9661   LearningRate 0.0529   Epoch: 5   Global Step: 31000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:25,138-Speed 5564.12 samples/sec   Loss 6.8005   LearningRate 0.0529   Epoch: 5   Global Step: 31010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:26,993-Speed 5524.25 samples/sec   Loss 6.9095   LearningRate 0.0529   Epoch: 5   Global Step: 31020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:28,851-Speed 5513.85 samples/sec   Loss 7.1393   LearningRate 0.0529   Epoch: 5   Global Step: 31030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:30,691-Speed 5567.67 samples/sec   Loss 7.0804   LearningRate 0.0529   Epoch: 5   Global Step: 31040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:32,527-Speed 5579.47 samples/sec   Loss 7.0347   LearningRate 0.0528   Epoch: 5   Global Step: 31050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:24:34,383-Speed 5547.67 samples/sec   Loss 6.8331   LearningRate 0.0528   Epoch: 5   Global Step: 31060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:24:36,236-Speed 5527.72 samples/sec   Loss 7.1609   LearningRate 0.0528   Epoch: 5   Global Step: 31070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:24:38,103-Speed 5488.66 samples/sec   Loss 7.0225   LearningRate 0.0528   Epoch: 5   Global Step: 31080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:24:39,932-Speed 5602.51 samples/sec   Loss 6.8231   LearningRate 0.0528   Epoch: 5   Global Step: 31090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:41,783-Speed 5537.82 samples/sec   Loss 6.9888   LearningRate 0.0528   Epoch: 5   Global Step: 31100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:43,604-Speed 5625.43 samples/sec   Loss 6.9686   LearningRate 0.0528   Epoch: 5   Global Step: 31110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:45,438-Speed 5586.22 samples/sec   Loss 7.0475   LearningRate 0.0528   Epoch: 5   Global Step: 31120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:47,269-Speed 5595.23 samples/sec   Loss 6.9981   LearningRate 0.0527   Epoch: 5   Global Step: 31130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:49,145-Speed 5462.05 samples/sec   Loss 6.9163   LearningRate 0.0527   Epoch: 5   Global Step: 31140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:51,071-Speed 5319.64 samples/sec   Loss 6.9897   LearningRate 0.0527   Epoch: 5   Global Step: 31150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:52,994-Speed 5329.45 samples/sec   Loss 7.1452   LearningRate 0.0527   Epoch: 5   Global Step: 31160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:54,918-Speed 5324.10 samples/sec   Loss 7.0468   LearningRate 0.0527   Epoch: 5   Global Step: 31170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:56,842-Speed 5323.05 samples/sec   Loss 7.0923   LearningRate 0.0527   Epoch: 5   Global Step: 31180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:24:58,768-Speed 5320.46 samples/sec   Loss 6.9082   LearningRate 0.0527   Epoch: 5   Global Step: 31190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:25:00,695-Speed 5316.28 samples/sec   Loss 7.0005   LearningRate 0.0527   Epoch: 5   Global Step: 31200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:25:02,623-Speed 5315.37 samples/sec   Loss 7.1752   LearningRate 0.0526   Epoch: 5   Global Step: 31210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:25:04,484-Speed 5507.81 samples/sec   Loss 7.0588   LearningRate 0.0526   Epoch: 5   Global Step: 31220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:25:06,311-Speed 5605.70 samples/sec   Loss 7.1026   LearningRate 0.0526   Epoch: 5   Global Step: 31230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:08,148-Speed 5576.88 samples/sec   Loss 7.0782   LearningRate 0.0526   Epoch: 5   Global Step: 31240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:09,982-Speed 5590.33 samples/sec   Loss 7.0432   LearningRate 0.0526   Epoch: 5   Global Step: 31250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:11,814-Speed 5591.74 samples/sec   Loss 7.0757   LearningRate 0.0526   Epoch: 5   Global Step: 31260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:13,681-Speed 5488.33 samples/sec   Loss 6.9831   LearningRate 0.0526   Epoch: 5   Global Step: 31270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:15,514-Speed 5589.10 samples/sec   Loss 7.2018   LearningRate 0.0526   Epoch: 5   Global Step: 31280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:17,388-Speed 5466.41 samples/sec   Loss 7.1737   LearningRate 0.0525   Epoch: 5   Global Step: 31290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:19,230-Speed 5563.97 samples/sec   Loss 7.0344   LearningRate 0.0525   Epoch: 5   Global Step: 31300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:21,062-Speed 5591.22 samples/sec   Loss 7.0551   LearningRate 0.0525   Epoch: 5   Global Step: 31310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:22,891-Speed 5602.06 samples/sec   Loss 6.8964   LearningRate 0.0525   Epoch: 5   Global Step: 31320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:24,687-Speed 5701.17 samples/sec   Loss 6.8683   LearningRate 0.0525   Epoch: 5   Global Step: 31330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:25:26,525-Speed 5576.61 samples/sec   Loss 6.8989   LearningRate 0.0525   Epoch: 5   Global Step: 31340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:25:28,387-Speed 5501.23 samples/sec   Loss 6.9303   LearningRate 0.0525   Epoch: 5   Global Step: 31350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:25:30,213-Speed 5608.87 samples/sec   Loss 6.8236   LearningRate 0.0525   Epoch: 5   Global Step: 31360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:25:32,032-Speed 5631.42 samples/sec   Loss 6.9066   LearningRate 0.0524   Epoch: 5   Global Step: 31370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:25:33,845-Speed 5652.75 samples/sec   Loss 7.0591   LearningRate 0.0524   Epoch: 5   Global Step: 31380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:25:35,655-Speed 5659.02 samples/sec   Loss 7.1044   LearningRate 0.0524   Epoch: 5   Global Step: 31390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:25:37,500-Speed 5554.56 samples/sec   Loss 6.8619   LearningRate 0.0524   Epoch: 5   Global Step: 31400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:25:39,353-Speed 5525.58 samples/sec   Loss 6.8893   LearningRate 0.0524   Epoch: 5   Global Step: 31410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:25:41,187-Speed 5588.87 samples/sec   Loss 6.8872   LearningRate 0.0524   Epoch: 5   Global Step: 31420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:25:43,014-Speed 5609.12 samples/sec   Loss 7.0421   LearningRate 0.0524   Epoch: 5   Global Step: 31430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:44,878-Speed 5497.51 samples/sec   Loss 7.0641   LearningRate 0.0523   Epoch: 5   Global Step: 31440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:46,726-Speed 5545.23 samples/sec   Loss 6.9870   LearningRate 0.0523   Epoch: 5   Global Step: 31450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:48,554-Speed 5604.85 samples/sec   Loss 6.9612   LearningRate 0.0523   Epoch: 5   Global Step: 31460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:50,384-Speed 5599.50 samples/sec   Loss 6.8290   LearningRate 0.0523   Epoch: 5   Global Step: 31470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:52,206-Speed 5621.31 samples/sec   Loss 7.0368   LearningRate 0.0523   Epoch: 5   Global Step: 31480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:54,052-Speed 5551.30 samples/sec   Loss 6.9769   LearningRate 0.0523   Epoch: 5   Global Step: 31490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:55,917-Speed 5492.98 samples/sec   Loss 7.0518   LearningRate 0.0523   Epoch: 5   Global Step: 31500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:57,768-Speed 5533.89 samples/sec   Loss 6.9059   LearningRate 0.0523   Epoch: 5   Global Step: 31510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:25:59,629-Speed 5507.89 samples/sec   Loss 6.9522   LearningRate 0.0522   Epoch: 5   Global Step: 31520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:26:01,490-Speed 5504.72 samples/sec   Loss 6.8381   LearningRate 0.0522   Epoch: 5   Global Step: 31530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:03,339-Speed 5541.89 samples/sec   Loss 7.1866   LearningRate 0.0522   Epoch: 5   Global Step: 31540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:05,188-Speed 5539.77 samples/sec   Loss 6.9576   LearningRate 0.0522   Epoch: 5   Global Step: 31550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:07,021-Speed 5589.62 samples/sec   Loss 6.9283   LearningRate 0.0522   Epoch: 5   Global Step: 31560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:08,854-Speed 5588.28 samples/sec   Loss 6.9266   LearningRate 0.0522   Epoch: 5   Global Step: 31570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:10,669-Speed 5644.21 samples/sec   Loss 6.9519   LearningRate 0.0522   Epoch: 5   Global Step: 31580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:12,482-Speed 5653.92 samples/sec   Loss 6.9328   LearningRate 0.0522   Epoch: 5   Global Step: 31590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:26:14,310-Speed 5604.69 samples/sec   Loss 6.9865   LearningRate 0.0521   Epoch: 5   Global Step: 31600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:26:16,162-Speed 5531.02 samples/sec   Loss 6.9455   LearningRate 0.0521   Epoch: 5   Global Step: 31610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:26:17,995-Speed 5590.74 samples/sec   Loss 6.9899   LearningRate 0.0521   Epoch: 5   Global Step: 31620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:26:19,825-Speed 5598.44 samples/sec   Loss 7.1106   LearningRate 0.0521   Epoch: 5   Global Step: 31630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:26:21,653-Speed 5602.44 samples/sec   Loss 6.8541   LearningRate 0.0521   Epoch: 5   Global Step: 31640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:26:23,478-Speed 5616.12 samples/sec   Loss 6.9356   LearningRate 0.0521   Epoch: 5   Global Step: 31650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:26:25,314-Speed 5580.57 samples/sec   Loss 6.9712   LearningRate 0.0521   Epoch: 5   Global Step: 31660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:26:27,170-Speed 5518.07 samples/sec   Loss 6.9075   LearningRate 0.0521   Epoch: 5   Global Step: 31670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:26:28,989-Speed 5633.24 samples/sec   Loss 7.0926   LearningRate 0.0520   Epoch: 5   Global Step: 31680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:26:30,827-Speed 5574.35 samples/sec   Loss 6.9031   LearningRate 0.0520   Epoch: 5   Global Step: 31690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:32,654-Speed 5608.33 samples/sec   Loss 6.8883   LearningRate 0.0520   Epoch: 5   Global Step: 31700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:34,470-Speed 5642.66 samples/sec   Loss 7.0947   LearningRate 0.0520   Epoch: 5   Global Step: 31710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:36,294-Speed 5617.42 samples/sec   Loss 7.0364   LearningRate 0.0520   Epoch: 5   Global Step: 31720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:38,129-Speed 5582.23 samples/sec   Loss 7.0069   LearningRate 0.0520   Epoch: 5   Global Step: 31730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:39,946-Speed 5638.04 samples/sec   Loss 6.8681   LearningRate 0.0520   Epoch: 5   Global Step: 31740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:41,791-Speed 5553.57 samples/sec   Loss 7.0663   LearningRate 0.0520   Epoch: 5   Global Step: 31750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:43,620-Speed 5601.18 samples/sec   Loss 6.9268   LearningRate 0.0519   Epoch: 5   Global Step: 31760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:45,443-Speed 5619.98 samples/sec   Loss 7.0019   LearningRate 0.0519   Epoch: 5   Global Step: 31770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:47,276-Speed 5589.06 samples/sec   Loss 6.9366   LearningRate 0.0519   Epoch: 5   Global Step: 31780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:49,096-Speed 5627.40 samples/sec   Loss 6.9106   LearningRate 0.0519   Epoch: 5   Global Step: 31790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:50,944-Speed 5542.47 samples/sec   Loss 6.9494   LearningRate 0.0519   Epoch: 5   Global Step: 31800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:52,774-Speed 5597.40 samples/sec   Loss 7.0066   LearningRate 0.0519   Epoch: 5   Global Step: 31810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:54,628-Speed 5527.69 samples/sec   Loss 6.9736   LearningRate 0.0519   Epoch: 5   Global Step: 31820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:56,465-Speed 5575.53 samples/sec   Loss 7.1330   LearningRate 0.0519   Epoch: 5   Global Step: 31830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:26:58,321-Speed 5522.61 samples/sec   Loss 6.9117   LearningRate 0.0518   Epoch: 5   Global Step: 31840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:27:00,138-Speed 5637.21 samples/sec   Loss 6.9682   LearningRate 0.0518   Epoch: 5   Global Step: 31850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:27:01,967-Speed 5602.41 samples/sec   Loss 6.7647   LearningRate 0.0518   Epoch: 5   Global Step: 31860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:27:03,791-Speed 5616.47 samples/sec   Loss 6.9388   LearningRate 0.0518   Epoch: 5   Global Step: 31870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:27:05,620-Speed 5600.78 samples/sec   Loss 6.9294   LearningRate 0.0518   Epoch: 5   Global Step: 31880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:27:07,459-Speed 5570.41 samples/sec   Loss 6.9610   LearningRate 0.0518   Epoch: 5   Global Step: 31890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:27:09,304-Speed 5554.13 samples/sec   Loss 7.0137   LearningRate 0.0518   Epoch: 5   Global Step: 31900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:27:11,177-Speed 5470.36 samples/sec   Loss 6.9715   LearningRate 0.0518   Epoch: 5   Global Step: 31910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:27:13,066-Speed 5420.97 samples/sec   Loss 7.0002   LearningRate 0.0517   Epoch: 5   Global Step: 31920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:27:14,948-Speed 5445.88 samples/sec   Loss 7.1102   LearningRate 0.0517   Epoch: 5   Global Step: 31930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:27:16,780-Speed 5591.50 samples/sec   Loss 6.9682   LearningRate 0.0517   Epoch: 5   Global Step: 31940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:27:18,625-Speed 5553.32 samples/sec   Loss 7.0530   LearningRate 0.0517   Epoch: 5   Global Step: 31950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:27:20,463-Speed 5572.39 samples/sec   Loss 6.9556   LearningRate 0.0517   Epoch: 5   Global Step: 31960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:27:22,279-Speed 5642.52 samples/sec   Loss 6.9600   LearningRate 0.0517   Epoch: 5   Global Step: 31970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:27:24,115-Speed 5579.35 samples/sec   Loss 6.9893   LearningRate 0.0517   Epoch: 5   Global Step: 31980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:27:25,937-Speed 5624.87 samples/sec   Loss 7.0177   LearningRate 0.0517   Epoch: 5   Global Step: 31990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:27:27,813-Speed 5462.06 samples/sec   Loss 7.0371   LearningRate 0.0516   Epoch: 5   Global Step: 32000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:27:57,109-[lfw][32000]XNorm: 22.176498
Training: 2022-04-27 03:27:57,110-[lfw][32000]Accuracy-Flip: 0.99650+-0.00345
Training: 2022-04-27 03:27:57,110-[lfw][32000]Accuracy-Highest: 0.99717
Training: 2022-04-27 03:28:28,314-[cfp_fp][32000]XNorm: 19.174454
Training: 2022-04-27 03:28:28,315-[cfp_fp][32000]Accuracy-Flip: 0.94771+-0.01044
Training: 2022-04-27 03:28:28,316-[cfp_fp][32000]Accuracy-Highest: 0.94771
Training: 2022-04-27 03:28:56,109-[agedb_30][32000]XNorm: 21.835893
Training: 2022-04-27 03:28:56,109-[agedb_30][32000]Accuracy-Flip: 0.96683+-0.00953
Training: 2022-04-27 03:28:56,110-[agedb_30][32000]Accuracy-Highest: 0.96883
Training: 2022-04-27 03:28:57,962-Speed 113.59 samples/sec   Loss 6.9282   LearningRate 0.0516   Epoch: 5   Global Step: 32010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:28:59,806-Speed 5559.12 samples/sec   Loss 6.8021   LearningRate 0.0516   Epoch: 5   Global Step: 32020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:01,629-Speed 5620.44 samples/sec   Loss 6.9622   LearningRate 0.0516   Epoch: 5   Global Step: 32030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:03,465-Speed 5581.08 samples/sec   Loss 6.9314   LearningRate 0.0516   Epoch: 5   Global Step: 32040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:05,275-Speed 5662.25 samples/sec   Loss 6.8034   LearningRate 0.0516   Epoch: 5   Global Step: 32050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:07,105-Speed 5597.17 samples/sec   Loss 6.9730   LearningRate 0.0516   Epoch: 5   Global Step: 32060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:08,940-Speed 5585.01 samples/sec   Loss 7.0174   LearningRate 0.0516   Epoch: 5   Global Step: 32070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:29:10,781-Speed 5565.18 samples/sec   Loss 6.9010   LearningRate 0.0515   Epoch: 5   Global Step: 32080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:29:12,594-Speed 5653.40 samples/sec   Loss 7.0084   LearningRate 0.0515   Epoch: 5   Global Step: 32090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:29:14,434-Speed 5566.68 samples/sec   Loss 7.0044   LearningRate 0.0515   Epoch: 5   Global Step: 32100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:29:16,275-Speed 5566.49 samples/sec   Loss 6.7955   LearningRate 0.0515   Epoch: 5   Global Step: 32110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:29:18,108-Speed 5588.97 samples/sec   Loss 6.7964   LearningRate 0.0515   Epoch: 5   Global Step: 32120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:19,936-Speed 5606.90 samples/sec   Loss 6.9159   LearningRate 0.0515   Epoch: 5   Global Step: 32130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:21,815-Speed 5450.33 samples/sec   Loss 7.0787   LearningRate 0.0515   Epoch: 5   Global Step: 32140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:23,656-Speed 5567.54 samples/sec   Loss 6.8884   LearningRate 0.0515   Epoch: 5   Global Step: 32150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:25,492-Speed 5577.89 samples/sec   Loss 7.0705   LearningRate 0.0514   Epoch: 5   Global Step: 32160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:27,390-Speed 5400.43 samples/sec   Loss 7.1282   LearningRate 0.0514   Epoch: 5   Global Step: 32170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:29,220-Speed 5599.78 samples/sec   Loss 6.9864   LearningRate 0.0514   Epoch: 5   Global Step: 32180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:31,065-Speed 5552.90 samples/sec   Loss 7.0177   LearningRate 0.0514   Epoch: 5   Global Step: 32190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:32,896-Speed 5593.88 samples/sec   Loss 6.9977   LearningRate 0.0514   Epoch: 5   Global Step: 32200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:34,725-Speed 5605.40 samples/sec   Loss 7.1232   LearningRate 0.0514   Epoch: 5   Global Step: 32210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:36,585-Speed 5506.85 samples/sec   Loss 6.9478   LearningRate 0.0514   Epoch: 5   Global Step: 32220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:29:38,409-Speed 5618.50 samples/sec   Loss 7.0855   LearningRate 0.0513   Epoch: 5   Global Step: 32230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:29:40,238-Speed 5601.23 samples/sec   Loss 6.8555   LearningRate 0.0513   Epoch: 5   Global Step: 32240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:29:42,067-Speed 5601.36 samples/sec   Loss 6.9620   LearningRate 0.0513   Epoch: 5   Global Step: 32250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:29:43,880-Speed 5652.47 samples/sec   Loss 6.9331   LearningRate 0.0513   Epoch: 5   Global Step: 32260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:29:45,728-Speed 5544.32 samples/sec   Loss 6.9586   LearningRate 0.0513   Epoch: 5   Global Step: 32270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:29:47,578-Speed 5539.55 samples/sec   Loss 6.9261   LearningRate 0.0513   Epoch: 5   Global Step: 32280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:29:49,394-Speed 5640.83 samples/sec   Loss 6.9663   LearningRate 0.0513   Epoch: 5   Global Step: 32290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:29:51,228-Speed 5584.22 samples/sec   Loss 6.9369   LearningRate 0.0513   Epoch: 5   Global Step: 32300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:53,062-Speed 5587.22 samples/sec   Loss 6.8958   LearningRate 0.0512   Epoch: 5   Global Step: 32310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:54,890-Speed 5604.36 samples/sec   Loss 7.0311   LearningRate 0.0512   Epoch: 5   Global Step: 32320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:56,716-Speed 5612.91 samples/sec   Loss 6.9680   LearningRate 0.0512   Epoch: 5   Global Step: 32330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:29:58,567-Speed 5535.56 samples/sec   Loss 6.9349   LearningRate 0.0512   Epoch: 5   Global Step: 32340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:00,413-Speed 5548.79 samples/sec   Loss 6.9415   LearningRate 0.0512   Epoch: 5   Global Step: 32350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:02,304-Speed 5418.92 samples/sec   Loss 6.8846   LearningRate 0.0512   Epoch: 5   Global Step: 32360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:04,184-Speed 5450.04 samples/sec   Loss 7.0452   LearningRate 0.0512   Epoch: 5   Global Step: 32370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:06,022-Speed 5574.32 samples/sec   Loss 6.7893   LearningRate 0.0512   Epoch: 5   Global Step: 32380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:07,850-Speed 5605.12 samples/sec   Loss 6.9299   LearningRate 0.0511   Epoch: 5   Global Step: 32390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:09,721-Speed 5475.83 samples/sec   Loss 6.9581   LearningRate 0.0511   Epoch: 5   Global Step: 32400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:30:11,568-Speed 5545.47 samples/sec   Loss 7.0490   LearningRate 0.0511   Epoch: 5   Global Step: 32410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:30:13,422-Speed 5527.76 samples/sec   Loss 7.0552   LearningRate 0.0511   Epoch: 5   Global Step: 32420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:15,243-Speed 5624.28 samples/sec   Loss 6.9958   LearningRate 0.0511   Epoch: 5   Global Step: 32430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:17,075-Speed 5593.50 samples/sec   Loss 6.7806   LearningRate 0.0511   Epoch: 5   Global Step: 32440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:18,920-Speed 5556.23 samples/sec   Loss 6.8027   LearningRate 0.0511   Epoch: 5   Global Step: 32450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:20,734-Speed 5646.85 samples/sec   Loss 6.9429   LearningRate 0.0511   Epoch: 5   Global Step: 32460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:22,558-Speed 5617.44 samples/sec   Loss 6.8233   LearningRate 0.0510   Epoch: 5   Global Step: 32470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:24,386-Speed 5605.32 samples/sec   Loss 6.9214   LearningRate 0.0510   Epoch: 5   Global Step: 32480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:26,216-Speed 5597.04 samples/sec   Loss 6.8502   LearningRate 0.0510   Epoch: 5   Global Step: 32490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:28,038-Speed 5626.60 samples/sec   Loss 6.8860   LearningRate 0.0510   Epoch: 5   Global Step: 32500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:29,868-Speed 5598.05 samples/sec   Loss 7.0676   LearningRate 0.0510   Epoch: 5   Global Step: 32510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:31,708-Speed 5566.98 samples/sec   Loss 7.0472   LearningRate 0.0510   Epoch: 5   Global Step: 32520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:30:33,549-Speed 5565.73 samples/sec   Loss 6.8661   LearningRate 0.0510   Epoch: 5   Global Step: 32530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:30:35,367-Speed 5635.48 samples/sec   Loss 6.8273   LearningRate 0.0510   Epoch: 5   Global Step: 32540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:30:37,199-Speed 5594.69 samples/sec   Loss 6.7623   LearningRate 0.0509   Epoch: 5   Global Step: 32550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:30:39,019-Speed 5630.50 samples/sec   Loss 6.9342   LearningRate 0.0509   Epoch: 5   Global Step: 32560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:40,859-Speed 5566.21 samples/sec   Loss 6.8998   LearningRate 0.0509   Epoch: 5   Global Step: 32570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:42,705-Speed 5552.86 samples/sec   Loss 6.9057   LearningRate 0.0509   Epoch: 5   Global Step: 32580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:44,556-Speed 5533.48 samples/sec   Loss 6.9541   LearningRate 0.0509   Epoch: 5   Global Step: 32590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:46,420-Speed 5498.00 samples/sec   Loss 6.7134   LearningRate 0.0509   Epoch: 5   Global Step: 32600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:48,251-Speed 5595.59 samples/sec   Loss 6.9505   LearningRate 0.0509   Epoch: 5   Global Step: 32610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:50,060-Speed 5662.86 samples/sec   Loss 7.0860   LearningRate 0.0509   Epoch: 5   Global Step: 32620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:51,904-Speed 5557.96 samples/sec   Loss 6.9327   LearningRate 0.0508   Epoch: 5   Global Step: 32630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:53,734-Speed 5597.05 samples/sec   Loss 6.8654   LearningRate 0.0508   Epoch: 5   Global Step: 32640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:55,550-Speed 5641.99 samples/sec   Loss 6.9587   LearningRate 0.0508   Epoch: 5   Global Step: 32650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:57,372-Speed 5624.30 samples/sec   Loss 6.8148   LearningRate 0.0508   Epoch: 5   Global Step: 32660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:30:59,185-Speed 5650.87 samples/sec   Loss 6.9513   LearningRate 0.0508   Epoch: 5   Global Step: 32670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:01,004-Speed 5632.82 samples/sec   Loss 6.9781   LearningRate 0.0508   Epoch: 5   Global Step: 32680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:02,833-Speed 5601.69 samples/sec   Loss 6.8264   LearningRate 0.0508   Epoch: 5   Global Step: 32690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:04,658-Speed 5614.67 samples/sec   Loss 7.0216   LearningRate 0.0508   Epoch: 5   Global Step: 32700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:06,494-Speed 5578.97 samples/sec   Loss 7.0430   LearningRate 0.0507   Epoch: 5   Global Step: 32710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:08,320-Speed 5613.27 samples/sec   Loss 6.7838   LearningRate 0.0507   Epoch: 5   Global Step: 32720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:10,173-Speed 5526.28 samples/sec   Loss 6.9564   LearningRate 0.0507   Epoch: 5   Global Step: 32730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:12,002-Speed 5602.83 samples/sec   Loss 6.8672   LearningRate 0.0507   Epoch: 5   Global Step: 32740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:13,843-Speed 5564.67 samples/sec   Loss 6.9400   LearningRate 0.0507   Epoch: 5   Global Step: 32750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:15,684-Speed 5567.22 samples/sec   Loss 6.8286   LearningRate 0.0507   Epoch: 5   Global Step: 32760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:31:17,501-Speed 5638.82 samples/sec   Loss 6.8689   LearningRate 0.0507   Epoch: 5   Global Step: 32770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:31:19,334-Speed 5590.30 samples/sec   Loss 6.9951   LearningRate 0.0507   Epoch: 5   Global Step: 32780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:31:21,165-Speed 5597.33 samples/sec   Loss 6.8005   LearningRate 0.0506   Epoch: 5   Global Step: 32790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:31:22,999-Speed 5584.98 samples/sec   Loss 6.8706   LearningRate 0.0506   Epoch: 5   Global Step: 32800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:31:24,848-Speed 5542.46 samples/sec   Loss 6.8445   LearningRate 0.0506   Epoch: 5   Global Step: 32810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:31:26,699-Speed 5536.38 samples/sec   Loss 6.8758   LearningRate 0.0506   Epoch: 5   Global Step: 32820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:28,530-Speed 5595.74 samples/sec   Loss 6.8243   LearningRate 0.0506   Epoch: 5   Global Step: 32830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:30,360-Speed 5597.01 samples/sec   Loss 6.8574   LearningRate 0.0506   Epoch: 5   Global Step: 32840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:32,177-Speed 5642.69 samples/sec   Loss 6.8706   LearningRate 0.0506   Epoch: 5   Global Step: 32850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:33,999-Speed 5622.23 samples/sec   Loss 6.9326   LearningRate 0.0506   Epoch: 5   Global Step: 32860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:35,832-Speed 5590.61 samples/sec   Loss 6.8412   LearningRate 0.0505   Epoch: 5   Global Step: 32870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:37,667-Speed 5582.13 samples/sec   Loss 6.7665   LearningRate 0.0505   Epoch: 5   Global Step: 32880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:39,502-Speed 5584.82 samples/sec   Loss 6.9803   LearningRate 0.0505   Epoch: 5   Global Step: 32890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:41,334-Speed 5591.57 samples/sec   Loss 6.9593   LearningRate 0.0505   Epoch: 5   Global Step: 32900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:43,187-Speed 5529.76 samples/sec   Loss 6.8236   LearningRate 0.0505   Epoch: 5   Global Step: 32910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:45,011-Speed 5614.96 samples/sec   Loss 6.9051   LearningRate 0.0505   Epoch: 5   Global Step: 32920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:31:46,838-Speed 5609.46 samples/sec   Loss 6.8946   LearningRate 0.0505   Epoch: 5   Global Step: 32930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:31:48,664-Speed 5612.34 samples/sec   Loss 6.8989   LearningRate 0.0505   Epoch: 5   Global Step: 32940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:31:50,497-Speed 5590.06 samples/sec   Loss 6.7450   LearningRate 0.0504   Epoch: 5   Global Step: 32950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:31:52,328-Speed 5594.02 samples/sec   Loss 7.0546   LearningRate 0.0504   Epoch: 5   Global Step: 32960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:54,162-Speed 5587.47 samples/sec   Loss 6.8494   LearningRate 0.0504   Epoch: 5   Global Step: 32970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:55,986-Speed 5614.52 samples/sec   Loss 6.9051   LearningRate 0.0504   Epoch: 5   Global Step: 32980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:57,825-Speed 5574.80 samples/sec   Loss 7.0230   LearningRate 0.0504   Epoch: 5   Global Step: 32990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:31:59,688-Speed 5499.47 samples/sec   Loss 6.7334   LearningRate 0.0504   Epoch: 5   Global Step: 33000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:01,520-Speed 5590.83 samples/sec   Loss 6.7985   LearningRate 0.0504   Epoch: 5   Global Step: 33010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:03,343-Speed 5621.83 samples/sec   Loss 6.8676   LearningRate 0.0504   Epoch: 5   Global Step: 33020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:05,166-Speed 5619.51 samples/sec   Loss 6.8652   LearningRate 0.0503   Epoch: 5   Global Step: 33030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:06,980-Speed 5650.75 samples/sec   Loss 6.8374   LearningRate 0.0503   Epoch: 5   Global Step: 33040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:08,821-Speed 5561.25 samples/sec   Loss 6.8425   LearningRate 0.0503   Epoch: 5   Global Step: 33050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:10,654-Speed 5592.17 samples/sec   Loss 6.8955   LearningRate 0.0503   Epoch: 5   Global Step: 33060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:32:12,490-Speed 5577.06 samples/sec   Loss 6.8684   LearningRate 0.0503   Epoch: 5   Global Step: 33070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:32:14,341-Speed 5534.68 samples/sec   Loss 6.8826   LearningRate 0.0503   Epoch: 5   Global Step: 33080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:32:16,172-Speed 5597.39 samples/sec   Loss 7.0700   LearningRate 0.0503   Epoch: 5   Global Step: 33090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:32:18,005-Speed 5589.07 samples/sec   Loss 6.7896   LearningRate 0.0503   Epoch: 5   Global Step: 33100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:32:19,839-Speed 5586.80 samples/sec   Loss 7.0187   LearningRate 0.0502   Epoch: 5   Global Step: 33110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:32:21,669-Speed 5598.37 samples/sec   Loss 6.6778   LearningRate 0.0502   Epoch: 5   Global Step: 33120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:32:23,524-Speed 5522.86 samples/sec   Loss 6.8910   LearningRate 0.0502   Epoch: 5   Global Step: 33130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:25,375-Speed 5536.14 samples/sec   Loss 6.7918   LearningRate 0.0502   Epoch: 5   Global Step: 33140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:27,202-Speed 5607.05 samples/sec   Loss 6.8280   LearningRate 0.0502   Epoch: 5   Global Step: 33150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:29,070-Speed 5486.43 samples/sec   Loss 6.8364   LearningRate 0.0502   Epoch: 5   Global Step: 33160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:30,930-Speed 5510.49 samples/sec   Loss 6.8739   LearningRate 0.0502   Epoch: 5   Global Step: 33170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:32,763-Speed 5589.13 samples/sec   Loss 6.7009   LearningRate 0.0502   Epoch: 5   Global Step: 33180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:34,600-Speed 5577.63 samples/sec   Loss 6.8691   LearningRate 0.0501   Epoch: 5   Global Step: 33190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:36,433-Speed 5589.96 samples/sec   Loss 6.9075   LearningRate 0.0501   Epoch: 5   Global Step: 33200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:38,327-Speed 5408.61 samples/sec   Loss 6.8142   LearningRate 0.0501   Epoch: 5   Global Step: 33210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:40,160-Speed 5589.56 samples/sec   Loss 6.9315   LearningRate 0.0501   Epoch: 5   Global Step: 33220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:41,982-Speed 5624.48 samples/sec   Loss 6.8491   LearningRate 0.0501   Epoch: 5   Global Step: 33230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:32:43,810-Speed 5604.07 samples/sec   Loss 6.7781   LearningRate 0.0501   Epoch: 5   Global Step: 33240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:45,679-Speed 5479.48 samples/sec   Loss 6.7029   LearningRate 0.0501   Epoch: 5   Global Step: 33250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:47,564-Speed 5437.40 samples/sec   Loss 6.9643   LearningRate 0.0501   Epoch: 5   Global Step: 33260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:49,406-Speed 5561.26 samples/sec   Loss 6.7780   LearningRate 0.0500   Epoch: 5   Global Step: 33270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:51,243-Speed 5578.22 samples/sec   Loss 6.9005   LearningRate 0.0500   Epoch: 5   Global Step: 33280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:53,103-Speed 5507.41 samples/sec   Loss 6.6649   LearningRate 0.0500   Epoch: 5   Global Step: 33290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:54,944-Speed 5563.33 samples/sec   Loss 6.8121   LearningRate 0.0500   Epoch: 5   Global Step: 33300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:56,775-Speed 5594.23 samples/sec   Loss 6.7685   LearningRate 0.0500   Epoch: 5   Global Step: 33310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:32:58,599-Speed 5617.42 samples/sec   Loss 7.0180   LearningRate 0.0500   Epoch: 5   Global Step: 33320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:33:00,456-Speed 5517.10 samples/sec   Loss 6.7561   LearningRate 0.0500   Epoch: 5   Global Step: 33330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:33:02,280-Speed 5619.08 samples/sec   Loss 6.8326   LearningRate 0.0500   Epoch: 5   Global Step: 33340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:04,113-Speed 5586.90 samples/sec   Loss 6.8750   LearningRate 0.0499   Epoch: 5   Global Step: 33350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:05,980-Speed 5490.42 samples/sec   Loss 6.7364   LearningRate 0.0499   Epoch: 5   Global Step: 33360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:07,803-Speed 5619.60 samples/sec   Loss 6.8270   LearningRate 0.0499   Epoch: 5   Global Step: 33370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:09,638-Speed 5584.66 samples/sec   Loss 6.9194   LearningRate 0.0499   Epoch: 5   Global Step: 33380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:11,454-Speed 5641.78 samples/sec   Loss 6.7643   LearningRate 0.0499   Epoch: 5   Global Step: 33390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:13,269-Speed 5645.91 samples/sec   Loss 6.8597   LearningRate 0.0499   Epoch: 5   Global Step: 33400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:15,090-Speed 5626.13 samples/sec   Loss 6.8121   LearningRate 0.0499   Epoch: 5   Global Step: 33410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:16,928-Speed 5573.37 samples/sec   Loss 6.8481   LearningRate 0.0499   Epoch: 5   Global Step: 33420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:18,747-Speed 5632.57 samples/sec   Loss 6.8592   LearningRate 0.0498   Epoch: 5   Global Step: 33430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:20,552-Speed 5676.33 samples/sec   Loss 6.8001   LearningRate 0.0498   Epoch: 5   Global Step: 33440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:22,373-Speed 5625.80 samples/sec   Loss 6.9341   LearningRate 0.0498   Epoch: 5   Global Step: 33450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:24,200-Speed 5606.64 samples/sec   Loss 6.8500   LearningRate 0.0498   Epoch: 5   Global Step: 33460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:33:26,024-Speed 5618.49 samples/sec   Loss 6.7992   LearningRate 0.0498   Epoch: 5   Global Step: 33470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:33:27,854-Speed 5597.52 samples/sec   Loss 6.8568   LearningRate 0.0498   Epoch: 5   Global Step: 33480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:33:29,673-Speed 5633.37 samples/sec   Loss 6.8172   LearningRate 0.0498   Epoch: 5   Global Step: 33490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:33:31,493-Speed 5630.17 samples/sec   Loss 6.9347   LearningRate 0.0498   Epoch: 5   Global Step: 33500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:33:33,310-Speed 5638.96 samples/sec   Loss 6.9162   LearningRate 0.0497   Epoch: 5   Global Step: 33510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:33:35,140-Speed 5597.25 samples/sec   Loss 6.8513   LearningRate 0.0497   Epoch: 5   Global Step: 33520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:33:36,971-Speed 5596.84 samples/sec   Loss 6.7930   LearningRate 0.0497   Epoch: 5   Global Step: 33530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:33:38,782-Speed 5658.33 samples/sec   Loss 7.0006   LearningRate 0.0497   Epoch: 5   Global Step: 33540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:33:40,609-Speed 5605.14 samples/sec   Loss 6.8002   LearningRate 0.0497   Epoch: 5   Global Step: 33550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:33:42,468-Speed 5514.43 samples/sec   Loss 6.9148   LearningRate 0.0497   Epoch: 5   Global Step: 33560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:44,301-Speed 5591.52 samples/sec   Loss 6.6762   LearningRate 0.0497   Epoch: 5   Global Step: 33570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:46,131-Speed 5595.03 samples/sec   Loss 6.9138   LearningRate 0.0497   Epoch: 5   Global Step: 33580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:47,956-Speed 5615.20 samples/sec   Loss 6.8668   LearningRate 0.0496   Epoch: 5   Global Step: 33590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:49,796-Speed 5568.36 samples/sec   Loss 6.7486   LearningRate 0.0496   Epoch: 5   Global Step: 33600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:51,627-Speed 5596.57 samples/sec   Loss 6.7421   LearningRate 0.0496   Epoch: 5   Global Step: 33610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:53,461-Speed 5586.77 samples/sec   Loss 6.7165   LearningRate 0.0496   Epoch: 5   Global Step: 33620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:55,297-Speed 5580.69 samples/sec   Loss 6.7471   LearningRate 0.0496   Epoch: 5   Global Step: 33630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:57,120-Speed 5617.81 samples/sec   Loss 6.9063   LearningRate 0.0496   Epoch: 5   Global Step: 33640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:33:58,949-Speed 5603.24 samples/sec   Loss 6.9459   LearningRate 0.0496   Epoch: 5   Global Step: 33650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:34:00,782-Speed 5587.62 samples/sec   Loss 6.8556   LearningRate 0.0496   Epoch: 5   Global Step: 33660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:34:02,621-Speed 5572.55 samples/sec   Loss 6.7695   LearningRate 0.0496   Epoch: 5   Global Step: 33670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:34:04,446-Speed 5612.54 samples/sec   Loss 6.8822   LearningRate 0.0495   Epoch: 5   Global Step: 33680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:34:06,271-Speed 5613.30 samples/sec   Loss 6.7921   LearningRate 0.0495   Epoch: 5   Global Step: 33690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:34:08,082-Speed 5659.06 samples/sec   Loss 6.8228   LearningRate 0.0495   Epoch: 5   Global Step: 33700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:09,911-Speed 5600.46 samples/sec   Loss 6.7327   LearningRate 0.0495   Epoch: 5   Global Step: 33710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:11,761-Speed 5539.81 samples/sec   Loss 6.7076   LearningRate 0.0495   Epoch: 5   Global Step: 33720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:13,599-Speed 5573.74 samples/sec   Loss 6.8318   LearningRate 0.0495   Epoch: 5   Global Step: 33730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:15,442-Speed 5558.22 samples/sec   Loss 6.8340   LearningRate 0.0495   Epoch: 5   Global Step: 33740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:17,282-Speed 5569.95 samples/sec   Loss 6.6421   LearningRate 0.0495   Epoch: 5   Global Step: 33750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:19,115-Speed 5588.25 samples/sec   Loss 6.6788   LearningRate 0.0494   Epoch: 5   Global Step: 33760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:20,949-Speed 5586.59 samples/sec   Loss 6.6766   LearningRate 0.0494   Epoch: 5   Global Step: 33770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:22,795-Speed 5552.57 samples/sec   Loss 6.7142   LearningRate 0.0494   Epoch: 5   Global Step: 33780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:24,608-Speed 5648.57 samples/sec   Loss 6.8183   LearningRate 0.0494   Epoch: 5   Global Step: 33790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:34:26,445-Speed 5579.80 samples/sec   Loss 6.9685   LearningRate 0.0494   Epoch: 5   Global Step: 33800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:34:28,299-Speed 5526.64 samples/sec   Loss 6.8192   LearningRate 0.0494   Epoch: 5   Global Step: 33810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:34:30,138-Speed 5570.16 samples/sec   Loss 6.7602   LearningRate 0.0494   Epoch: 5   Global Step: 33820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:34:31,990-Speed 5531.71 samples/sec   Loss 6.8460   LearningRate 0.0494   Epoch: 5   Global Step: 33830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:34:33,830-Speed 5566.34 samples/sec   Loss 6.7463   LearningRate 0.0493   Epoch: 5   Global Step: 33840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:34:35,692-Speed 5503.69 samples/sec   Loss 6.7605   LearningRate 0.0493   Epoch: 5   Global Step: 33850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:34:37,534-Speed 5561.86 samples/sec   Loss 6.6743   LearningRate 0.0493   Epoch: 5   Global Step: 33860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:34:39,356-Speed 5623.74 samples/sec   Loss 6.7215   LearningRate 0.0493   Epoch: 5   Global Step: 33870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:34:41,171-Speed 5643.40 samples/sec   Loss 6.8630   LearningRate 0.0493   Epoch: 5   Global Step: 33880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:34:42,989-Speed 5634.73 samples/sec   Loss 6.9127   LearningRate 0.0493   Epoch: 5   Global Step: 33890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:44,833-Speed 5555.80 samples/sec   Loss 6.8289   LearningRate 0.0493   Epoch: 5   Global Step: 33900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:46,739-Speed 5376.63 samples/sec   Loss 6.8299   LearningRate 0.0493   Epoch: 5   Global Step: 33910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:48,587-Speed 5546.58 samples/sec   Loss 6.9689   LearningRate 0.0492   Epoch: 5   Global Step: 33920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:50,431-Speed 5555.20 samples/sec   Loss 6.8437   LearningRate 0.0492   Epoch: 5   Global Step: 33930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:52,339-Speed 5372.30 samples/sec   Loss 6.8548   LearningRate 0.0492   Epoch: 5   Global Step: 33940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:54,201-Speed 5501.68 samples/sec   Loss 6.8533   LearningRate 0.0492   Epoch: 5   Global Step: 33950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:56,046-Speed 5551.61 samples/sec   Loss 6.7817   LearningRate 0.0492   Epoch: 5   Global Step: 33960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:57,883-Speed 5579.44 samples/sec   Loss 6.7234   LearningRate 0.0492   Epoch: 5   Global Step: 33970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:34:59,731-Speed 5544.62 samples/sec   Loss 6.8584   LearningRate 0.0492   Epoch: 5   Global Step: 33980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:35:01,569-Speed 5571.91 samples/sec   Loss 6.9021   LearningRate 0.0492   Epoch: 5   Global Step: 33990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:35:03,384-Speed 5645.57 samples/sec   Loss 6.8291   LearningRate 0.0491   Epoch: 5   Global Step: 34000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:35:29,941-[lfw][34000]XNorm: 22.483673
Training: 2022-04-27 03:35:29,942-[lfw][34000]Accuracy-Flip: 0.99650+-0.00293
Training: 2022-04-27 03:35:29,942-[lfw][34000]Accuracy-Highest: 0.99717
Training: 2022-04-27 03:36:02,641-[cfp_fp][34000]XNorm: 19.802441
Training: 2022-04-27 03:36:02,642-[cfp_fp][34000]Accuracy-Flip: 0.93657+-0.01173
Training: 2022-04-27 03:36:02,643-[cfp_fp][34000]Accuracy-Highest: 0.94771
Training: 2022-04-27 03:36:29,690-[agedb_30][34000]XNorm: 22.284478
Training: 2022-04-27 03:36:29,691-[agedb_30][34000]Accuracy-Flip: 0.97283+-0.00699
Training: 2022-04-27 03:36:29,692-[agedb_30][34000]Accuracy-Highest: 0.97283
Training: 2022-04-27 03:36:31,526-Speed 116.18 samples/sec   Loss 6.7492   LearningRate 0.0491   Epoch: 5   Global Step: 34010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:36:33,341-Speed 5644.88 samples/sec   Loss 6.8411   LearningRate 0.0491   Epoch: 5   Global Step: 34020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:36:35,159-Speed 5634.47 samples/sec   Loss 6.7198   LearningRate 0.0491   Epoch: 5   Global Step: 34030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:36:36,977-Speed 5633.20 samples/sec   Loss 6.7161   LearningRate 0.0491   Epoch: 5   Global Step: 34040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:36:38,803-Speed 5609.72 samples/sec   Loss 6.8204   LearningRate 0.0491   Epoch: 5   Global Step: 34050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:36:40,606-Speed 5681.74 samples/sec   Loss 6.7373   LearningRate 0.0491   Epoch: 5   Global Step: 34060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:36:42,410-Speed 5679.01 samples/sec   Loss 6.8463   LearningRate 0.0491   Epoch: 5   Global Step: 34070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:36:44,216-Speed 5670.16 samples/sec   Loss 6.8484   LearningRate 0.0490   Epoch: 5   Global Step: 34080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:36:46,030-Speed 5648.82 samples/sec   Loss 6.8152   LearningRate 0.0490   Epoch: 5   Global Step: 34090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:36:47,852-Speed 5621.85 samples/sec   Loss 6.7350   LearningRate 0.0490   Epoch: 5   Global Step: 34100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:36:49,793-Speed 5277.77 samples/sec   Loss 6.8782   LearningRate 0.0490   Epoch: 5   Global Step: 34110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:01,347-Speed 886.38 samples/sec   Loss 6.4894   LearningRate 0.0490   Epoch: 6   Global Step: 34120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:03,194-Speed 5546.73 samples/sec   Loss 6.1559   LearningRate 0.0490   Epoch: 6   Global Step: 34130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:05,029-Speed 5583.40 samples/sec   Loss 6.0390   LearningRate 0.0490   Epoch: 6   Global Step: 34140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:06,853-Speed 5615.44 samples/sec   Loss 6.0863   LearningRate 0.0490   Epoch: 6   Global Step: 34150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:08,690-Speed 5576.75 samples/sec   Loss 6.0015   LearningRate 0.0489   Epoch: 6   Global Step: 34160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:10,599-Speed 5365.81 samples/sec   Loss 6.1044   LearningRate 0.0489   Epoch: 6   Global Step: 34170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:12,481-Speed 5442.68 samples/sec   Loss 6.0553   LearningRate 0.0489   Epoch: 6   Global Step: 34180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:14,315-Speed 5587.73 samples/sec   Loss 5.9625   LearningRate 0.0489   Epoch: 6   Global Step: 34190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:37:16,156-Speed 5564.67 samples/sec   Loss 6.2596   LearningRate 0.0489   Epoch: 6   Global Step: 34200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:37:18,016-Speed 5506.34 samples/sec   Loss 6.1802   LearningRate 0.0489   Epoch: 6   Global Step: 34210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:37:19,840-Speed 5614.63 samples/sec   Loss 6.2307   LearningRate 0.0489   Epoch: 6   Global Step: 34220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:37:21,676-Speed 5581.21 samples/sec   Loss 6.2906   LearningRate 0.0489   Epoch: 6   Global Step: 34230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:37:23,481-Speed 5674.67 samples/sec   Loss 6.1947   LearningRate 0.0488   Epoch: 6   Global Step: 34240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:25,305-Speed 5617.51 samples/sec   Loss 6.1797   LearningRate 0.0488   Epoch: 6   Global Step: 34250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:27,130-Speed 5610.87 samples/sec   Loss 6.3523   LearningRate 0.0488   Epoch: 6   Global Step: 34260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:28,968-Speed 5572.36 samples/sec   Loss 6.1893   LearningRate 0.0488   Epoch: 6   Global Step: 34270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:30,796-Speed 5605.74 samples/sec   Loss 6.2743   LearningRate 0.0488   Epoch: 6   Global Step: 34280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:32,632-Speed 5577.64 samples/sec   Loss 6.2138   LearningRate 0.0488   Epoch: 6   Global Step: 34290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:34,462-Speed 5598.30 samples/sec   Loss 6.2440   LearningRate 0.0488   Epoch: 6   Global Step: 34300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:36,286-Speed 5616.68 samples/sec   Loss 6.3037   LearningRate 0.0488   Epoch: 6   Global Step: 34310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:38,100-Speed 5646.88 samples/sec   Loss 6.3098   LearningRate 0.0487   Epoch: 6   Global Step: 34320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:39,925-Speed 5614.03 samples/sec   Loss 6.4361   LearningRate 0.0487   Epoch: 6   Global Step: 34330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:41,737-Speed 5653.00 samples/sec   Loss 6.1162   LearningRate 0.0487   Epoch: 6   Global Step: 34340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:37:43,588-Speed 5531.89 samples/sec   Loss 6.3093   LearningRate 0.0487   Epoch: 6   Global Step: 34350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:37:45,419-Speed 5596.18 samples/sec   Loss 6.3212   LearningRate 0.0487   Epoch: 6   Global Step: 34360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:37:47,246-Speed 5605.03 samples/sec   Loss 6.4354   LearningRate 0.0487   Epoch: 6   Global Step: 34370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:37:49,069-Speed 5620.02 samples/sec   Loss 6.4452   LearningRate 0.0487   Epoch: 6   Global Step: 34380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:37:50,878-Speed 5661.23 samples/sec   Loss 6.3245   LearningRate 0.0487   Epoch: 6   Global Step: 34390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:52,694-Speed 5642.04 samples/sec   Loss 6.2928   LearningRate 0.0487   Epoch: 6   Global Step: 34400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:54,544-Speed 5536.32 samples/sec   Loss 6.2530   LearningRate 0.0486   Epoch: 6   Global Step: 34410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:56,375-Speed 5594.54 samples/sec   Loss 6.1274   LearningRate 0.0486   Epoch: 6   Global Step: 34420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:37:58,216-Speed 5565.04 samples/sec   Loss 6.2420   LearningRate 0.0486   Epoch: 6   Global Step: 34430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:00,059-Speed 5557.75 samples/sec   Loss 6.3257   LearningRate 0.0486   Epoch: 6   Global Step: 34440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:01,877-Speed 5633.32 samples/sec   Loss 6.4135   LearningRate 0.0486   Epoch: 6   Global Step: 34450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:03,717-Speed 5569.95 samples/sec   Loss 6.4822   LearningRate 0.0486   Epoch: 6   Global Step: 34460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:05,549-Speed 5593.24 samples/sec   Loss 6.3507   LearningRate 0.0486   Epoch: 6   Global Step: 34470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:07,388-Speed 5567.71 samples/sec   Loss 6.4280   LearningRate 0.0486   Epoch: 6   Global Step: 34480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:09,256-Speed 5483.68 samples/sec   Loss 6.4476   LearningRate 0.0485   Epoch: 6   Global Step: 34490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:38:11,073-Speed 5636.97 samples/sec   Loss 6.1843   LearningRate 0.0485   Epoch: 6   Global Step: 34500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:12,910-Speed 5577.03 samples/sec   Loss 6.5307   LearningRate 0.0485   Epoch: 6   Global Step: 34510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:14,736-Speed 5611.05 samples/sec   Loss 6.3640   LearningRate 0.0485   Epoch: 6   Global Step: 34520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:16,547-Speed 5655.73 samples/sec   Loss 6.3510   LearningRate 0.0485   Epoch: 6   Global Step: 34530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:18,359-Speed 5653.30 samples/sec   Loss 6.3315   LearningRate 0.0485   Epoch: 6   Global Step: 34540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:20,169-Speed 5657.60 samples/sec   Loss 6.4602   LearningRate 0.0485   Epoch: 6   Global Step: 34550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:21,996-Speed 5607.42 samples/sec   Loss 6.4617   LearningRate 0.0485   Epoch: 6   Global Step: 34560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:23,806-Speed 5658.67 samples/sec   Loss 6.4788   LearningRate 0.0484   Epoch: 6   Global Step: 34570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:25,634-Speed 5605.89 samples/sec   Loss 6.4374   LearningRate 0.0484   Epoch: 6   Global Step: 34580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:27,462-Speed 5603.84 samples/sec   Loss 6.4418   LearningRate 0.0484   Epoch: 6   Global Step: 34590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:29,296-Speed 5583.74 samples/sec   Loss 6.3658   LearningRate 0.0484   Epoch: 6   Global Step: 34600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:38:31,125-Speed 5601.00 samples/sec   Loss 6.4876   LearningRate 0.0484   Epoch: 6   Global Step: 34610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:38:32,946-Speed 5624.72 samples/sec   Loss 6.4655   LearningRate 0.0484   Epoch: 6   Global Step: 34620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:38:34,789-Speed 5558.82 samples/sec   Loss 6.4368   LearningRate 0.0484   Epoch: 6   Global Step: 34630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:38:36,652-Speed 5496.43 samples/sec   Loss 6.4029   LearningRate 0.0484   Epoch: 6   Global Step: 34640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:38:38,497-Speed 5552.43 samples/sec   Loss 6.4240   LearningRate 0.0483   Epoch: 6   Global Step: 34650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:38:40,321-Speed 5615.87 samples/sec   Loss 6.3845   LearningRate 0.0483   Epoch: 6   Global Step: 34660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:38:42,140-Speed 5634.41 samples/sec   Loss 6.3687   LearningRate 0.0483   Epoch: 6   Global Step: 34670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:38:43,975-Speed 5580.64 samples/sec   Loss 6.3826   LearningRate 0.0483   Epoch: 6   Global Step: 34680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:38:45,787-Speed 5652.96 samples/sec   Loss 6.4136   LearningRate 0.0483   Epoch: 6   Global Step: 34690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:38:47,595-Speed 5664.30 samples/sec   Loss 6.5750   LearningRate 0.0483   Epoch: 6   Global Step: 34700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:49,402-Speed 5669.10 samples/sec   Loss 6.5251   LearningRate 0.0483   Epoch: 6   Global Step: 34710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:51,216-Speed 5647.49 samples/sec   Loss 6.4009   LearningRate 0.0483   Epoch: 6   Global Step: 34720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:53,048-Speed 5590.37 samples/sec   Loss 6.5055   LearningRate 0.0482   Epoch: 6   Global Step: 34730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:54,856-Speed 5667.76 samples/sec   Loss 6.4670   LearningRate 0.0482   Epoch: 6   Global Step: 34740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:56,670-Speed 5646.32 samples/sec   Loss 6.5284   LearningRate 0.0482   Epoch: 6   Global Step: 34750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:38:58,499-Speed 5601.89 samples/sec   Loss 6.6088   LearningRate 0.0482   Epoch: 6   Global Step: 34760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:00,324-Speed 5610.65 samples/sec   Loss 6.4711   LearningRate 0.0482   Epoch: 6   Global Step: 34770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:02,149-Speed 5613.22 samples/sec   Loss 6.4958   LearningRate 0.0482   Epoch: 6   Global Step: 34780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:03,989-Speed 5567.41 samples/sec   Loss 6.5891   LearningRate 0.0482   Epoch: 6   Global Step: 34790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:05,830-Speed 5563.05 samples/sec   Loss 6.5160   LearningRate 0.0482   Epoch: 6   Global Step: 34800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:39:07,661-Speed 5594.87 samples/sec   Loss 6.3696   LearningRate 0.0481   Epoch: 6   Global Step: 34810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:39:09,502-Speed 5564.52 samples/sec   Loss 6.3975   LearningRate 0.0481   Epoch: 6   Global Step: 34820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:39:11,319-Speed 5636.99 samples/sec   Loss 6.4317   LearningRate 0.0481   Epoch: 6   Global Step: 34830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:13,123-Speed 5678.81 samples/sec   Loss 6.5024   LearningRate 0.0481   Epoch: 6   Global Step: 34840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:14,973-Speed 5537.73 samples/sec   Loss 6.5991   LearningRate 0.0481   Epoch: 6   Global Step: 34850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:16,828-Speed 5521.27 samples/sec   Loss 6.4814   LearningRate 0.0481   Epoch: 6   Global Step: 34860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:18,661-Speed 5588.59 samples/sec   Loss 6.4915   LearningRate 0.0481   Epoch: 6   Global Step: 34870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:20,485-Speed 5614.97 samples/sec   Loss 6.4811   LearningRate 0.0481   Epoch: 6   Global Step: 34880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:22,306-Speed 5626.25 samples/sec   Loss 6.4980   LearningRate 0.0481   Epoch: 6   Global Step: 34890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:24,148-Speed 5559.82 samples/sec   Loss 6.5308   LearningRate 0.0480   Epoch: 6   Global Step: 34900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:25,982-Speed 5587.11 samples/sec   Loss 6.3774   LearningRate 0.0480   Epoch: 6   Global Step: 34910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:27,870-Speed 5424.33 samples/sec   Loss 6.4585   LearningRate 0.0480   Epoch: 6   Global Step: 34920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:29,688-Speed 5636.30 samples/sec   Loss 6.5420   LearningRate 0.0480   Epoch: 6   Global Step: 34930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:39:31,517-Speed 5598.62 samples/sec   Loss 6.5410   LearningRate 0.0480   Epoch: 6   Global Step: 34940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:39:33,347-Speed 5597.78 samples/sec   Loss 6.6002   LearningRate 0.0480   Epoch: 6   Global Step: 34950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:39:35,176-Speed 5599.79 samples/sec   Loss 6.5385   LearningRate 0.0480   Epoch: 6   Global Step: 34960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:39:37,011-Speed 5581.88 samples/sec   Loss 6.3504   LearningRate 0.0480   Epoch: 6   Global Step: 34970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:39:38,873-Speed 5502.76 samples/sec   Loss 6.5527   LearningRate 0.0479   Epoch: 6   Global Step: 34980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:39:40,701-Speed 5602.22 samples/sec   Loss 6.5065   LearningRate 0.0479   Epoch: 6   Global Step: 34990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:42,523-Speed 5623.38 samples/sec   Loss 6.6983   LearningRate 0.0479   Epoch: 6   Global Step: 35000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:44,371-Speed 5542.85 samples/sec   Loss 6.4088   LearningRate 0.0479   Epoch: 6   Global Step: 35010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:46,199-Speed 5603.77 samples/sec   Loss 6.5921   LearningRate 0.0479   Epoch: 6   Global Step: 35020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:48,064-Speed 5493.08 samples/sec   Loss 6.5274   LearningRate 0.0479   Epoch: 6   Global Step: 35030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:49,913-Speed 5539.45 samples/sec   Loss 6.4777   LearningRate 0.0479   Epoch: 6   Global Step: 35040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:51,753-Speed 5566.28 samples/sec   Loss 6.4413   LearningRate 0.0479   Epoch: 6   Global Step: 35050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:53,567-Speed 5649.57 samples/sec   Loss 6.4045   LearningRate 0.0478   Epoch: 6   Global Step: 35060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:55,398-Speed 5594.42 samples/sec   Loss 6.5234   LearningRate 0.0478   Epoch: 6   Global Step: 35070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:57,226-Speed 5602.80 samples/sec   Loss 6.5330   LearningRate 0.0478   Epoch: 6   Global Step: 35080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:39:59,070-Speed 5555.23 samples/sec   Loss 6.5793   LearningRate 0.0478   Epoch: 6   Global Step: 35090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:40:00,947-Speed 5456.56 samples/sec   Loss 6.5085   LearningRate 0.0478   Epoch: 6   Global Step: 35100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:40:02,788-Speed 5563.35 samples/sec   Loss 6.5589   LearningRate 0.0478   Epoch: 6   Global Step: 35110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:40:04,609-Speed 5626.83 samples/sec   Loss 6.6815   LearningRate 0.0478   Epoch: 6   Global Step: 35120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:40:06,422-Speed 5648.41 samples/sec   Loss 6.5992   LearningRate 0.0478   Epoch: 6   Global Step: 35130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:40:08,233-Speed 5657.84 samples/sec   Loss 6.7560   LearningRate 0.0477   Epoch: 6   Global Step: 35140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:10,060-Speed 5605.52 samples/sec   Loss 6.5507   LearningRate 0.0477   Epoch: 6   Global Step: 35150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:11,889-Speed 5602.17 samples/sec   Loss 6.5181   LearningRate 0.0477   Epoch: 6   Global Step: 35160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:13,695-Speed 5672.79 samples/sec   Loss 6.5549   LearningRate 0.0477   Epoch: 6   Global Step: 35170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:15,515-Speed 5625.43 samples/sec   Loss 6.4595   LearningRate 0.0477   Epoch: 6   Global Step: 35180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:17,342-Speed 5607.80 samples/sec   Loss 6.5683   LearningRate 0.0477   Epoch: 6   Global Step: 35190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:19,174-Speed 5589.76 samples/sec   Loss 6.6923   LearningRate 0.0477   Epoch: 6   Global Step: 35200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:21,005-Speed 5595.96 samples/sec   Loss 6.5646   LearningRate 0.0477   Epoch: 6   Global Step: 35210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:22,836-Speed 5593.02 samples/sec   Loss 6.5331   LearningRate 0.0477   Epoch: 6   Global Step: 35220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:24,646-Speed 5660.42 samples/sec   Loss 6.4931   LearningRate 0.0476   Epoch: 6   Global Step: 35230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:26,462-Speed 5647.52 samples/sec   Loss 6.5954   LearningRate 0.0476   Epoch: 6   Global Step: 35240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:40:28,349-Speed 5429.48 samples/sec   Loss 6.4812   LearningRate 0.0476   Epoch: 6   Global Step: 35250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:40:30,167-Speed 5635.20 samples/sec   Loss 6.6733   LearningRate 0.0476   Epoch: 6   Global Step: 35260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:40:32,008-Speed 5564.14 samples/sec   Loss 6.6848   LearningRate 0.0476   Epoch: 6   Global Step: 35270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:40:33,835-Speed 5605.07 samples/sec   Loss 6.5990   LearningRate 0.0476   Epoch: 6   Global Step: 35280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:40:35,675-Speed 5569.48 samples/sec   Loss 6.5688   LearningRate 0.0476   Epoch: 6   Global Step: 35290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:40:37,524-Speed 5537.71 samples/sec   Loss 6.5689   LearningRate 0.0476   Epoch: 6   Global Step: 35300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:40:39,343-Speed 5633.65 samples/sec   Loss 6.6518   LearningRate 0.0475   Epoch: 6   Global Step: 35310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:41,168-Speed 5611.93 samples/sec   Loss 6.6230   LearningRate 0.0475   Epoch: 6   Global Step: 35320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:42,988-Speed 5628.03 samples/sec   Loss 6.6662   LearningRate 0.0475   Epoch: 6   Global Step: 35330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:44,794-Speed 5672.55 samples/sec   Loss 6.5212   LearningRate 0.0475   Epoch: 6   Global Step: 35340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:46,624-Speed 5596.78 samples/sec   Loss 6.5641   LearningRate 0.0475   Epoch: 6   Global Step: 35350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:48,496-Speed 5472.98 samples/sec   Loss 6.6997   LearningRate 0.0475   Epoch: 6   Global Step: 35360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:50,342-Speed 5549.65 samples/sec   Loss 6.5065   LearningRate 0.0475   Epoch: 6   Global Step: 35370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:52,151-Speed 5662.97 samples/sec   Loss 6.4523   LearningRate 0.0475   Epoch: 6   Global Step: 35380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:53,957-Speed 5669.88 samples/sec   Loss 6.5980   LearningRate 0.0474   Epoch: 6   Global Step: 35390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:55,768-Speed 5656.94 samples/sec   Loss 6.5176   LearningRate 0.0474   Epoch: 6   Global Step: 35400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:40:57,608-Speed 5565.78 samples/sec   Loss 6.4054   LearningRate 0.0474   Epoch: 6   Global Step: 35410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:40:59,427-Speed 5632.10 samples/sec   Loss 6.5557   LearningRate 0.0474   Epoch: 6   Global Step: 35420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:41:01,262-Speed 5581.17 samples/sec   Loss 6.6215   LearningRate 0.0474   Epoch: 6   Global Step: 35430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:41:03,098-Speed 5581.03 samples/sec   Loss 6.4090   LearningRate 0.0474   Epoch: 6   Global Step: 35440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:41:04,936-Speed 5573.11 samples/sec   Loss 6.5505   LearningRate 0.0474   Epoch: 6   Global Step: 35450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:41:06,746-Speed 5659.42 samples/sec   Loss 6.5016   LearningRate 0.0474   Epoch: 6   Global Step: 35460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:08,616-Speed 5476.97 samples/sec   Loss 6.4950   LearningRate 0.0473   Epoch: 6   Global Step: 35470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:10,500-Speed 5438.58 samples/sec   Loss 6.7036   LearningRate 0.0473   Epoch: 6   Global Step: 35480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:12,351-Speed 5534.19 samples/sec   Loss 6.7606   LearningRate 0.0473   Epoch: 6   Global Step: 35490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:14,178-Speed 5607.59 samples/sec   Loss 6.5501   LearningRate 0.0473   Epoch: 6   Global Step: 35500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:15,995-Speed 5638.48 samples/sec   Loss 6.5877   LearningRate 0.0473   Epoch: 6   Global Step: 35510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:17,817-Speed 5620.99 samples/sec   Loss 6.4771   LearningRate 0.0473   Epoch: 6   Global Step: 35520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:19,664-Speed 5544.71 samples/sec   Loss 6.5216   LearningRate 0.0473   Epoch: 6   Global Step: 35530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:21,493-Speed 5600.99 samples/sec   Loss 6.4211   LearningRate 0.0473   Epoch: 6   Global Step: 35540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:23,320-Speed 5606.07 samples/sec   Loss 6.4389   LearningRate 0.0473   Epoch: 6   Global Step: 35550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:25,151-Speed 5594.45 samples/sec   Loss 6.5139   LearningRate 0.0472   Epoch: 6   Global Step: 35560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:41:26,973-Speed 5622.84 samples/sec   Loss 6.5182   LearningRate 0.0472   Epoch: 6   Global Step: 35570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:41:28,796-Speed 5619.67 samples/sec   Loss 6.5145   LearningRate 0.0472   Epoch: 6   Global Step: 35580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:41:30,616-Speed 5626.85 samples/sec   Loss 6.6959   LearningRate 0.0472   Epoch: 6   Global Step: 35590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:41:32,434-Speed 5635.64 samples/sec   Loss 6.5013   LearningRate 0.0472   Epoch: 6   Global Step: 35600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:34,261-Speed 5607.32 samples/sec   Loss 6.4675   LearningRate 0.0472   Epoch: 6   Global Step: 35610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:36,090-Speed 5601.60 samples/sec   Loss 6.7066   LearningRate 0.0472   Epoch: 6   Global Step: 35620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:37,944-Speed 5522.76 samples/sec   Loss 6.5435   LearningRate 0.0472   Epoch: 6   Global Step: 35630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:39,761-Speed 5637.83 samples/sec   Loss 6.6644   LearningRate 0.0471   Epoch: 6   Global Step: 35640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:41,590-Speed 5601.51 samples/sec   Loss 6.6375   LearningRate 0.0471   Epoch: 6   Global Step: 35650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:43,435-Speed 5551.85 samples/sec   Loss 6.5817   LearningRate 0.0471   Epoch: 6   Global Step: 35660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:45,261-Speed 5610.81 samples/sec   Loss 6.4348   LearningRate 0.0471   Epoch: 6   Global Step: 35670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:47,103-Speed 5561.04 samples/sec   Loss 6.6479   LearningRate 0.0471   Epoch: 6   Global Step: 35680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:48,925-Speed 5620.95 samples/sec   Loss 6.6010   LearningRate 0.0471   Epoch: 6   Global Step: 35690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:50,765-Speed 5566.53 samples/sec   Loss 6.5837   LearningRate 0.0471   Epoch: 6   Global Step: 35700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:41:52,630-Speed 5493.22 samples/sec   Loss 6.4914   LearningRate 0.0471   Epoch: 6   Global Step: 35710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:54,467-Speed 5573.93 samples/sec   Loss 6.5258   LearningRate 0.0470   Epoch: 6   Global Step: 35720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:56,285-Speed 5638.33 samples/sec   Loss 6.6621   LearningRate 0.0470   Epoch: 6   Global Step: 35730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:41:58,096-Speed 5655.32 samples/sec   Loss 6.6568   LearningRate 0.0470   Epoch: 6   Global Step: 35740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:41:59,963-Speed 5485.41 samples/sec   Loss 6.5325   LearningRate 0.0470   Epoch: 6   Global Step: 35750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:42:01,817-Speed 5525.45 samples/sec   Loss 6.5172   LearningRate 0.0470   Epoch: 6   Global Step: 35760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:42:03,659-Speed 5560.98 samples/sec   Loss 6.6355   LearningRate 0.0470   Epoch: 6   Global Step: 35770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:42:05,504-Speed 5551.01 samples/sec   Loss 6.4889   LearningRate 0.0470   Epoch: 6   Global Step: 35780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:42:07,335-Speed 5595.64 samples/sec   Loss 6.5656   LearningRate 0.0470   Epoch: 6   Global Step: 35790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:42:09,145-Speed 5659.16 samples/sec   Loss 6.5973   LearningRate 0.0469   Epoch: 6   Global Step: 35800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:42:10,990-Speed 5553.05 samples/sec   Loss 6.4658   LearningRate 0.0469   Epoch: 6   Global Step: 35810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:42:12,817-Speed 5607.48 samples/sec   Loss 6.6014   LearningRate 0.0469   Epoch: 6   Global Step: 35820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:42:14,662-Speed 5548.88 samples/sec   Loss 6.6006   LearningRate 0.0469   Epoch: 6   Global Step: 35830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:42:16,500-Speed 5575.03 samples/sec   Loss 6.5402   LearningRate 0.0469   Epoch: 6   Global Step: 35840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:18,318-Speed 5633.92 samples/sec   Loss 6.6960   LearningRate 0.0469   Epoch: 6   Global Step: 35850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:20,127-Speed 5661.76 samples/sec   Loss 6.6731   LearningRate 0.0469   Epoch: 6   Global Step: 35860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:21,970-Speed 5559.30 samples/sec   Loss 6.6629   LearningRate 0.0469   Epoch: 6   Global Step: 35870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:23,800-Speed 5596.64 samples/sec   Loss 6.5688   LearningRate 0.0469   Epoch: 6   Global Step: 35880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:25,634-Speed 5587.49 samples/sec   Loss 6.5692   LearningRate 0.0468   Epoch: 6   Global Step: 35890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:27,463-Speed 5598.89 samples/sec   Loss 6.4965   LearningRate 0.0468   Epoch: 6   Global Step: 35900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:29,282-Speed 5631.44 samples/sec   Loss 6.6280   LearningRate 0.0468   Epoch: 6   Global Step: 35910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:31,115-Speed 5589.41 samples/sec   Loss 6.4840   LearningRate 0.0468   Epoch: 6   Global Step: 35920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:32,940-Speed 5610.42 samples/sec   Loss 6.5628   LearningRate 0.0468   Epoch: 6   Global Step: 35930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:34,768-Speed 5603.30 samples/sec   Loss 6.4613   LearningRate 0.0468   Epoch: 6   Global Step: 35940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:42:36,594-Speed 5611.13 samples/sec   Loss 6.4791   LearningRate 0.0468   Epoch: 6   Global Step: 35950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:38,410-Speed 5639.52 samples/sec   Loss 6.6042   LearningRate 0.0468   Epoch: 6   Global Step: 35960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:40,231-Speed 5626.74 samples/sec   Loss 6.4516   LearningRate 0.0467   Epoch: 6   Global Step: 35970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:42,055-Speed 5615.17 samples/sec   Loss 6.6322   LearningRate 0.0467   Epoch: 6   Global Step: 35980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:43,872-Speed 5637.98 samples/sec   Loss 6.5452   LearningRate 0.0467   Epoch: 6   Global Step: 35990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:42:45,687-Speed 5644.24 samples/sec   Loss 6.5135   LearningRate 0.0467   Epoch: 6   Global Step: 36000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:43:12,079-[lfw][36000]XNorm: 23.653991
Training: 2022-04-27 03:43:12,080-[lfw][36000]Accuracy-Flip: 0.99750+-0.00281
Training: 2022-04-27 03:43:12,080-[lfw][36000]Accuracy-Highest: 0.99750
Training: 2022-04-27 03:43:43,233-[cfp_fp][36000]XNorm: 20.816380
Training: 2022-04-27 03:43:43,233-[cfp_fp][36000]Accuracy-Flip: 0.94343+-0.01184
Training: 2022-04-27 03:43:43,234-[cfp_fp][36000]Accuracy-Highest: 0.94771
Training: 2022-04-27 03:44:10,487-[agedb_30][36000]XNorm: 23.144299
Training: 2022-04-27 03:44:10,487-[agedb_30][36000]Accuracy-Flip: 0.97300+-0.00819
Training: 2022-04-27 03:44:10,488-[agedb_30][36000]Accuracy-Highest: 0.97300
Training: 2022-04-27 03:44:12,341-Speed 118.17 samples/sec   Loss 6.5886   LearningRate 0.0467   Epoch: 6   Global Step: 36010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:14,152-Speed 5656.17 samples/sec   Loss 6.5730   LearningRate 0.0467   Epoch: 6   Global Step: 36020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:15,965-Speed 5649.34 samples/sec   Loss 6.6172   LearningRate 0.0467   Epoch: 6   Global Step: 36030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:17,784-Speed 5630.34 samples/sec   Loss 6.5202   LearningRate 0.0467   Epoch: 6   Global Step: 36040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:19,607-Speed 5619.95 samples/sec   Loss 6.5089   LearningRate 0.0466   Epoch: 6   Global Step: 36050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:44:21,460-Speed 5528.49 samples/sec   Loss 6.5691   LearningRate 0.0466   Epoch: 6   Global Step: 36060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:44:23,330-Speed 5476.63 samples/sec   Loss 6.6441   LearningRate 0.0466   Epoch: 6   Global Step: 36070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:25,165-Speed 5581.69 samples/sec   Loss 6.6350   LearningRate 0.0466   Epoch: 6   Global Step: 36080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:26,992-Speed 5607.82 samples/sec   Loss 6.5897   LearningRate 0.0466   Epoch: 6   Global Step: 36090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:28,865-Speed 5469.07 samples/sec   Loss 6.5851   LearningRate 0.0466   Epoch: 6   Global Step: 36100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:30,680-Speed 5643.46 samples/sec   Loss 6.6289   LearningRate 0.0466   Epoch: 6   Global Step: 36110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:32,510-Speed 5595.79 samples/sec   Loss 6.5168   LearningRate 0.0466   Epoch: 6   Global Step: 36120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:34,369-Speed 5510.78 samples/sec   Loss 6.3880   LearningRate 0.0466   Epoch: 6   Global Step: 36130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:36,193-Speed 5617.88 samples/sec   Loss 6.5977   LearningRate 0.0465   Epoch: 6   Global Step: 36140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:38,019-Speed 5609.98 samples/sec   Loss 6.5288   LearningRate 0.0465   Epoch: 6   Global Step: 36150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:39,836-Speed 5635.86 samples/sec   Loss 6.4717   LearningRate 0.0465   Epoch: 6   Global Step: 36160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:41,657-Speed 5625.46 samples/sec   Loss 6.5489   LearningRate 0.0465   Epoch: 6   Global Step: 36170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:44:43,499-Speed 5562.87 samples/sec   Loss 6.5035   LearningRate 0.0465   Epoch: 6   Global Step: 36180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:44:45,331-Speed 5591.45 samples/sec   Loss 6.6465   LearningRate 0.0465   Epoch: 6   Global Step: 36190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:44:47,166-Speed 5581.92 samples/sec   Loss 6.4984   LearningRate 0.0465   Epoch: 6   Global Step: 36200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:44:48,994-Speed 5603.09 samples/sec   Loss 6.6365   LearningRate 0.0465   Epoch: 6   Global Step: 36210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:50,828-Speed 5584.89 samples/sec   Loss 6.6207   LearningRate 0.0464   Epoch: 6   Global Step: 36220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:52,653-Speed 5611.37 samples/sec   Loss 6.5407   LearningRate 0.0464   Epoch: 6   Global Step: 36230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:54,498-Speed 5552.72 samples/sec   Loss 6.5059   LearningRate 0.0464   Epoch: 6   Global Step: 36240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:56,342-Speed 5555.13 samples/sec   Loss 6.6058   LearningRate 0.0464   Epoch: 6   Global Step: 36250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:44:58,169-Speed 5606.77 samples/sec   Loss 6.6051   LearningRate 0.0464   Epoch: 6   Global Step: 36260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:00,003-Speed 5583.81 samples/sec   Loss 6.5113   LearningRate 0.0464   Epoch: 6   Global Step: 36270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:01,865-Speed 5501.40 samples/sec   Loss 6.6343   LearningRate 0.0464   Epoch: 6   Global Step: 36280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:03,678-Speed 5650.16 samples/sec   Loss 6.6235   LearningRate 0.0464   Epoch: 6   Global Step: 36290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:05,506-Speed 5606.86 samples/sec   Loss 6.5576   LearningRate 0.0463   Epoch: 6   Global Step: 36300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:07,335-Speed 5599.19 samples/sec   Loss 6.5380   LearningRate 0.0463   Epoch: 6   Global Step: 36310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:09,178-Speed 5557.95 samples/sec   Loss 6.4941   LearningRate 0.0463   Epoch: 6   Global Step: 36320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:11,002-Speed 5616.05 samples/sec   Loss 6.3698   LearningRate 0.0463   Epoch: 6   Global Step: 36330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:12,829-Speed 5605.72 samples/sec   Loss 6.5910   LearningRate 0.0463   Epoch: 6   Global Step: 36340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:14,677-Speed 5544.26 samples/sec   Loss 6.4117   LearningRate 0.0463   Epoch: 6   Global Step: 36350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:16,492-Speed 5644.25 samples/sec   Loss 6.5291   LearningRate 0.0463   Epoch: 6   Global Step: 36360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:18,313-Speed 5625.42 samples/sec   Loss 6.4151   LearningRate 0.0463   Epoch: 6   Global Step: 36370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:20,134-Speed 5622.24 samples/sec   Loss 6.6263   LearningRate 0.0463   Epoch: 6   Global Step: 36380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:22,004-Speed 5480.14 samples/sec   Loss 6.6366   LearningRate 0.0462   Epoch: 6   Global Step: 36390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:23,851-Speed 5544.44 samples/sec   Loss 6.5622   LearningRate 0.0462   Epoch: 6   Global Step: 36400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:25,684-Speed 5588.34 samples/sec   Loss 6.4805   LearningRate 0.0462   Epoch: 6   Global Step: 36410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:27,507-Speed 5621.20 samples/sec   Loss 6.6353   LearningRate 0.0462   Epoch: 6   Global Step: 36420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:29,339-Speed 5591.08 samples/sec   Loss 6.4909   LearningRate 0.0462   Epoch: 6   Global Step: 36430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:31,183-Speed 5554.16 samples/sec   Loss 6.4801   LearningRate 0.0462   Epoch: 6   Global Step: 36440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:33,012-Speed 5600.54 samples/sec   Loss 6.5970   LearningRate 0.0462   Epoch: 6   Global Step: 36450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:34,851-Speed 5571.41 samples/sec   Loss 6.5838   LearningRate 0.0462   Epoch: 6   Global Step: 36460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:36,670-Speed 5631.23 samples/sec   Loss 6.4596   LearningRate 0.0461   Epoch: 6   Global Step: 36470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:38,511-Speed 5563.37 samples/sec   Loss 6.6109   LearningRate 0.0461   Epoch: 6   Global Step: 36480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:40,337-Speed 5608.95 samples/sec   Loss 6.4629   LearningRate 0.0461   Epoch: 6   Global Step: 36490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:42,180-Speed 5557.23 samples/sec   Loss 6.4909   LearningRate 0.0461   Epoch: 6   Global Step: 36500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:44,012-Speed 5591.68 samples/sec   Loss 6.3863   LearningRate 0.0461   Epoch: 6   Global Step: 36510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:45,835-Speed 5618.75 samples/sec   Loss 6.5199   LearningRate 0.0461   Epoch: 6   Global Step: 36520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:47,666-Speed 5595.47 samples/sec   Loss 6.3617   LearningRate 0.0461   Epoch: 6   Global Step: 36530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:49,508-Speed 5561.58 samples/sec   Loss 6.4440   LearningRate 0.0461   Epoch: 6   Global Step: 36540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:51,330-Speed 5620.59 samples/sec   Loss 6.5995   LearningRate 0.0460   Epoch: 6   Global Step: 36550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:53,158-Speed 5605.99 samples/sec   Loss 6.6441   LearningRate 0.0460   Epoch: 6   Global Step: 36560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:54,971-Speed 5647.30 samples/sec   Loss 6.5197   LearningRate 0.0460   Epoch: 6   Global Step: 36570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:45:56,809-Speed 5574.09 samples/sec   Loss 6.5477   LearningRate 0.0460   Epoch: 6   Global Step: 36580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:45:58,640-Speed 5594.69 samples/sec   Loss 6.5079   LearningRate 0.0460   Epoch: 6   Global Step: 36590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:00,455-Speed 5644.24 samples/sec   Loss 6.3136   LearningRate 0.0460   Epoch: 6   Global Step: 36600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:02,288-Speed 5587.89 samples/sec   Loss 6.4631   LearningRate 0.0460   Epoch: 6   Global Step: 36610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:04,116-Speed 5604.81 samples/sec   Loss 6.4440   LearningRate 0.0460   Epoch: 6   Global Step: 36620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:05,959-Speed 5555.97 samples/sec   Loss 6.7014   LearningRate 0.0460   Epoch: 6   Global Step: 36630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:07,807-Speed 5543.24 samples/sec   Loss 6.6273   LearningRate 0.0459   Epoch: 6   Global Step: 36640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:09,618-Speed 5656.65 samples/sec   Loss 6.5480   LearningRate 0.0459   Epoch: 6   Global Step: 36650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:11,452-Speed 5584.91 samples/sec   Loss 6.5233   LearningRate 0.0459   Epoch: 6   Global Step: 36660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:13,277-Speed 5615.70 samples/sec   Loss 6.6093   LearningRate 0.0459   Epoch: 6   Global Step: 36670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:15,103-Speed 5607.17 samples/sec   Loss 6.5521   LearningRate 0.0459   Epoch: 6   Global Step: 36680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:46:16,948-Speed 5552.28 samples/sec   Loss 6.4564   LearningRate 0.0459   Epoch: 6   Global Step: 36690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:46:18,758-Speed 5659.63 samples/sec   Loss 6.3977   LearningRate 0.0459   Epoch: 6   Global Step: 36700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:46:20,585-Speed 5606.61 samples/sec   Loss 6.6120   LearningRate 0.0459   Epoch: 6   Global Step: 36710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:46:22,406-Speed 5625.91 samples/sec   Loss 6.5594   LearningRate 0.0458   Epoch: 6   Global Step: 36720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:46:24,214-Speed 5665.86 samples/sec   Loss 6.6830   LearningRate 0.0458   Epoch: 6   Global Step: 36730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:26,058-Speed 5554.86 samples/sec   Loss 6.4413   LearningRate 0.0458   Epoch: 6   Global Step: 36740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:27,879-Speed 5625.16 samples/sec   Loss 6.4393   LearningRate 0.0458   Epoch: 6   Global Step: 36750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:29,706-Speed 5606.85 samples/sec   Loss 6.4788   LearningRate 0.0458   Epoch: 6   Global Step: 36760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:31,533-Speed 5606.44 samples/sec   Loss 6.5856   LearningRate 0.0458   Epoch: 6   Global Step: 36770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:33,354-Speed 5622.70 samples/sec   Loss 6.6041   LearningRate 0.0458   Epoch: 6   Global Step: 36780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:35,211-Speed 5517.96 samples/sec   Loss 6.4935   LearningRate 0.0458   Epoch: 6   Global Step: 36790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:37,037-Speed 5610.44 samples/sec   Loss 6.4914   LearningRate 0.0458   Epoch: 6   Global Step: 36800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:38,861-Speed 5616.95 samples/sec   Loss 6.5157   LearningRate 0.0457   Epoch: 6   Global Step: 36810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:40,695-Speed 5582.43 samples/sec   Loss 6.7175   LearningRate 0.0457   Epoch: 6   Global Step: 36820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:42,505-Speed 5659.51 samples/sec   Loss 6.4202   LearningRate 0.0457   Epoch: 6   Global Step: 36830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:46:44,325-Speed 5630.72 samples/sec   Loss 6.4089   LearningRate 0.0457   Epoch: 6   Global Step: 36840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:46:46,151-Speed 5609.29 samples/sec   Loss 6.5959   LearningRate 0.0457   Epoch: 6   Global Step: 36850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:46:47,983-Speed 5590.40 samples/sec   Loss 6.4787   LearningRate 0.0457   Epoch: 6   Global Step: 36860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:46:49,829-Speed 5548.56 samples/sec   Loss 6.5325   LearningRate 0.0457   Epoch: 6   Global Step: 36870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:46:51,658-Speed 5600.96 samples/sec   Loss 6.6423   LearningRate 0.0457   Epoch: 6   Global Step: 36880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:46:53,466-Speed 5663.39 samples/sec   Loss 6.4827   LearningRate 0.0456   Epoch: 6   Global Step: 36890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:46:55,313-Speed 5548.16 samples/sec   Loss 6.5111   LearningRate 0.0456   Epoch: 6   Global Step: 36900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:46:57,173-Speed 5525.77 samples/sec   Loss 6.5920   LearningRate 0.0456   Epoch: 6   Global Step: 36910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:46:59,012-Speed 5572.12 samples/sec   Loss 6.4969   LearningRate 0.0456   Epoch: 6   Global Step: 36920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:00,851-Speed 5567.05 samples/sec   Loss 6.5566   LearningRate 0.0456   Epoch: 6   Global Step: 36930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:02,677-Speed 5612.42 samples/sec   Loss 6.6293   LearningRate 0.0456   Epoch: 6   Global Step: 36940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:04,493-Speed 5638.72 samples/sec   Loss 6.6390   LearningRate 0.0456   Epoch: 6   Global Step: 36950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:06,314-Speed 5629.66 samples/sec   Loss 6.5844   LearningRate 0.0456   Epoch: 6   Global Step: 36960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:08,140-Speed 5607.34 samples/sec   Loss 6.4365   LearningRate 0.0455   Epoch: 6   Global Step: 36970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:09,980-Speed 5567.39 samples/sec   Loss 6.5220   LearningRate 0.0455   Epoch: 6   Global Step: 36980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:11,801-Speed 5625.19 samples/sec   Loss 6.4592   LearningRate 0.0455   Epoch: 6   Global Step: 36990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:13,615-Speed 5647.20 samples/sec   Loss 6.4974   LearningRate 0.0455   Epoch: 6   Global Step: 37000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:15,438-Speed 5618.48 samples/sec   Loss 6.4390   LearningRate 0.0455   Epoch: 6   Global Step: 37010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:47:17,264-Speed 5612.45 samples/sec   Loss 6.5119   LearningRate 0.0455   Epoch: 6   Global Step: 37020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:47:19,096-Speed 5589.09 samples/sec   Loss 6.4186   LearningRate 0.0455   Epoch: 6   Global Step: 37030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:47:20,910-Speed 5647.63 samples/sec   Loss 6.4075   LearningRate 0.0455   Epoch: 6   Global Step: 37040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:47:22,730-Speed 5629.13 samples/sec   Loss 6.6003   LearningRate 0.0455   Epoch: 6   Global Step: 37050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:47:24,561-Speed 5592.06 samples/sec   Loss 6.4504   LearningRate 0.0454   Epoch: 6   Global Step: 37060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:47:26,398-Speed 5579.01 samples/sec   Loss 6.3352   LearningRate 0.0454   Epoch: 6   Global Step: 37070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:47:28,246-Speed 5542.05 samples/sec   Loss 6.5175   LearningRate 0.0454   Epoch: 6   Global Step: 37080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:30,072-Speed 5609.95 samples/sec   Loss 6.4335   LearningRate 0.0454   Epoch: 6   Global Step: 37090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:31,918-Speed 5547.98 samples/sec   Loss 6.5258   LearningRate 0.0454   Epoch: 6   Global Step: 37100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:33,838-Speed 5336.08 samples/sec   Loss 6.4288   LearningRate 0.0454   Epoch: 6   Global Step: 37110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:35,691-Speed 5527.70 samples/sec   Loss 6.5406   LearningRate 0.0454   Epoch: 6   Global Step: 37120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:37,524-Speed 5589.01 samples/sec   Loss 6.5807   LearningRate 0.0454   Epoch: 6   Global Step: 37130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:39,340-Speed 5641.56 samples/sec   Loss 6.5153   LearningRate 0.0453   Epoch: 6   Global Step: 37140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:41,182-Speed 5561.03 samples/sec   Loss 6.3377   LearningRate 0.0453   Epoch: 6   Global Step: 37150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:43,034-Speed 5530.82 samples/sec   Loss 6.2921   LearningRate 0.0453   Epoch: 6   Global Step: 37160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:44,864-Speed 5596.84 samples/sec   Loss 6.4807   LearningRate 0.0453   Epoch: 6   Global Step: 37170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:47:46,683-Speed 5629.53 samples/sec   Loss 6.4759   LearningRate 0.0453   Epoch: 6   Global Step: 37180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:47:48,522-Speed 5569.79 samples/sec   Loss 6.5042   LearningRate 0.0453   Epoch: 6   Global Step: 37190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:47:50,367-Speed 5553.67 samples/sec   Loss 6.5465   LearningRate 0.0453   Epoch: 6   Global Step: 37200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:47:52,209-Speed 5558.98 samples/sec   Loss 6.5724   LearningRate 0.0453   Epoch: 6   Global Step: 37210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:47:54,035-Speed 5609.88 samples/sec   Loss 6.3878   LearningRate 0.0453   Epoch: 6   Global Step: 37220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:47:55,884-Speed 5541.22 samples/sec   Loss 6.4686   LearningRate 0.0452   Epoch: 6   Global Step: 37230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:47:57,716-Speed 5590.35 samples/sec   Loss 6.5000   LearningRate 0.0452   Epoch: 6   Global Step: 37240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:47:59,639-Speed 5328.63 samples/sec   Loss 6.2997   LearningRate 0.0452   Epoch: 6   Global Step: 37250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:48:01,545-Speed 5376.47 samples/sec   Loss 6.5390   LearningRate 0.0452   Epoch: 6   Global Step: 37260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:03,369-Speed 5613.34 samples/sec   Loss 6.5601   LearningRate 0.0452   Epoch: 6   Global Step: 37270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:05,198-Speed 5602.47 samples/sec   Loss 6.3128   LearningRate 0.0452   Epoch: 6   Global Step: 37280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:07,035-Speed 5574.35 samples/sec   Loss 6.4750   LearningRate 0.0452   Epoch: 6   Global Step: 37290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:08,866-Speed 5596.65 samples/sec   Loss 6.4118   LearningRate 0.0452   Epoch: 6   Global Step: 37300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:10,703-Speed 5575.97 samples/sec   Loss 6.4714   LearningRate 0.0451   Epoch: 6   Global Step: 37310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:12,540-Speed 5575.71 samples/sec   Loss 6.4080   LearningRate 0.0451   Epoch: 6   Global Step: 37320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:14,351-Speed 5656.94 samples/sec   Loss 6.4928   LearningRate 0.0451   Epoch: 6   Global Step: 37330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:16,174-Speed 5616.82 samples/sec   Loss 6.3153   LearningRate 0.0451   Epoch: 6   Global Step: 37340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:18,020-Speed 5550.55 samples/sec   Loss 6.4845   LearningRate 0.0451   Epoch: 6   Global Step: 37350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:19,844-Speed 5614.33 samples/sec   Loss 6.5187   LearningRate 0.0451   Epoch: 6   Global Step: 37360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:48:21,701-Speed 5516.99 samples/sec   Loss 6.5255   LearningRate 0.0451   Epoch: 6   Global Step: 37370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:48:23,522-Speed 5625.12 samples/sec   Loss 6.5482   LearningRate 0.0451   Epoch: 6   Global Step: 37380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:48:25,386-Speed 5495.77 samples/sec   Loss 6.5704   LearningRate 0.0451   Epoch: 6   Global Step: 37390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:48:27,217-Speed 5593.04 samples/sec   Loss 6.5406   LearningRate 0.0450   Epoch: 6   Global Step: 37400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:48:29,035-Speed 5634.89 samples/sec   Loss 6.4863   LearningRate 0.0450   Epoch: 6   Global Step: 37410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:48:30,860-Speed 5614.17 samples/sec   Loss 6.2746   LearningRate 0.0450   Epoch: 6   Global Step: 37420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:48:32,731-Speed 5473.63 samples/sec   Loss 6.4325   LearningRate 0.0450   Epoch: 6   Global Step: 37430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:48:34,571-Speed 5568.29 samples/sec   Loss 6.3930   LearningRate 0.0450   Epoch: 6   Global Step: 37440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:48:36,427-Speed 5519.18 samples/sec   Loss 6.5507   LearningRate 0.0450   Epoch: 6   Global Step: 37450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:48:38,264-Speed 5576.72 samples/sec   Loss 6.4823   LearningRate 0.0450   Epoch: 6   Global Step: 37460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:48:40,089-Speed 5612.72 samples/sec   Loss 6.5775   LearningRate 0.0450   Epoch: 6   Global Step: 37470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:48:41,912-Speed 5618.14 samples/sec   Loss 6.4971   LearningRate 0.0449   Epoch: 6   Global Step: 37480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:48:43,740-Speed 5603.15 samples/sec   Loss 6.5206   LearningRate 0.0449   Epoch: 6   Global Step: 37490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:48:45,571-Speed 5595.40 samples/sec   Loss 6.2341   LearningRate 0.0449   Epoch: 6   Global Step: 37500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 03:48:47,387-Speed 5641.12 samples/sec   Loss 6.4971   LearningRate 0.0449   Epoch: 6   Global Step: 37510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:49,213-Speed 5609.01 samples/sec   Loss 6.3353   LearningRate 0.0449   Epoch: 6   Global Step: 37520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:51,041-Speed 5602.89 samples/sec   Loss 6.3780   LearningRate 0.0449   Epoch: 6   Global Step: 37530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:52,864-Speed 5619.84 samples/sec   Loss 6.5861   LearningRate 0.0449   Epoch: 6   Global Step: 37540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:54,702-Speed 5571.72 samples/sec   Loss 6.4234   LearningRate 0.0449   Epoch: 6   Global Step: 37550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:56,539-Speed 5576.19 samples/sec   Loss 6.6042   LearningRate 0.0449   Epoch: 6   Global Step: 37560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:48:58,375-Speed 5579.47 samples/sec   Loss 6.3781   LearningRate 0.0448   Epoch: 6   Global Step: 37570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:49:00,201-Speed 5609.72 samples/sec   Loss 6.4112   LearningRate 0.0448   Epoch: 6   Global Step: 37580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:49:02,029-Speed 5603.54 samples/sec   Loss 6.5914   LearningRate 0.0448   Epoch: 6   Global Step: 37590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:49:03,857-Speed 5604.50 samples/sec   Loss 6.5062   LearningRate 0.0448   Epoch: 6   Global Step: 37600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:49:05,686-Speed 5600.55 samples/sec   Loss 6.5457   LearningRate 0.0448   Epoch: 6   Global Step: 37610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:49:07,502-Speed 5641.39 samples/sec   Loss 6.4679   LearningRate 0.0448   Epoch: 6   Global Step: 37620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:49:09,329-Speed 5609.62 samples/sec   Loss 6.5767   LearningRate 0.0448   Epoch: 6   Global Step: 37630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:49:11,164-Speed 5581.04 samples/sec   Loss 6.4175   LearningRate 0.0448   Epoch: 6   Global Step: 37640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:49:13,039-Speed 5463.12 samples/sec   Loss 6.3926   LearningRate 0.0447   Epoch: 6   Global Step: 37650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:49:14,902-Speed 5499.64 samples/sec   Loss 6.3547   LearningRate 0.0447   Epoch: 6   Global Step: 37660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:49:16,728-Speed 5607.25 samples/sec   Loss 6.5187   LearningRate 0.0447   Epoch: 6   Global Step: 37670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:49:18,566-Speed 5574.12 samples/sec   Loss 6.3564   LearningRate 0.0447   Epoch: 6   Global Step: 37680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:49:20,399-Speed 5588.63 samples/sec   Loss 6.4315   LearningRate 0.0447   Epoch: 6   Global Step: 37690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:49:22,238-Speed 5567.89 samples/sec   Loss 6.5221   LearningRate 0.0447   Epoch: 6   Global Step: 37700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:49:24,059-Speed 5626.47 samples/sec   Loss 6.3958   LearningRate 0.0447   Epoch: 6   Global Step: 37710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:25,894-Speed 5581.24 samples/sec   Loss 6.5762   LearningRate 0.0447   Epoch: 6   Global Step: 37720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:27,731-Speed 5577.93 samples/sec   Loss 6.5149   LearningRate 0.0447   Epoch: 6   Global Step: 37730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:29,560-Speed 5600.46 samples/sec   Loss 6.3882   LearningRate 0.0446   Epoch: 6   Global Step: 37740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:31,373-Speed 5651.03 samples/sec   Loss 6.5474   LearningRate 0.0446   Epoch: 6   Global Step: 37750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:33,206-Speed 5587.79 samples/sec   Loss 6.5223   LearningRate 0.0446   Epoch: 6   Global Step: 37760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:35,031-Speed 5610.86 samples/sec   Loss 6.6320   LearningRate 0.0446   Epoch: 6   Global Step: 37770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:36,858-Speed 5606.84 samples/sec   Loss 6.4539   LearningRate 0.0446   Epoch: 6   Global Step: 37780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:38,687-Speed 5601.71 samples/sec   Loss 6.3054   LearningRate 0.0446   Epoch: 6   Global Step: 37790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:40,531-Speed 5554.02 samples/sec   Loss 6.4559   LearningRate 0.0446   Epoch: 6   Global Step: 37800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:42,351-Speed 5628.93 samples/sec   Loss 6.3776   LearningRate 0.0446   Epoch: 6   Global Step: 37810   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-27 03:49:44,158-Speed 5670.52 samples/sec   Loss 6.5029   LearningRate 0.0445   Epoch: 6   Global Step: 37820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:45,980-Speed 5620.62 samples/sec   Loss 6.5721   LearningRate 0.0445   Epoch: 6   Global Step: 37830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:47,797-Speed 5638.68 samples/sec   Loss 6.4617   LearningRate 0.0445   Epoch: 6   Global Step: 37840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:49,645-Speed 5542.24 samples/sec   Loss 6.5366   LearningRate 0.0445   Epoch: 6   Global Step: 37850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:51,483-Speed 5572.84 samples/sec   Loss 6.5085   LearningRate 0.0445   Epoch: 6   Global Step: 37860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:53,312-Speed 5600.68 samples/sec   Loss 6.3129   LearningRate 0.0445   Epoch: 6   Global Step: 37870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:55,151-Speed 5568.78 samples/sec   Loss 6.4012   LearningRate 0.0445   Epoch: 6   Global Step: 37880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:56,972-Speed 5627.33 samples/sec   Loss 6.3893   LearningRate 0.0445   Epoch: 6   Global Step: 37890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:49:58,793-Speed 5624.84 samples/sec   Loss 6.5036   LearningRate 0.0445   Epoch: 6   Global Step: 37900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:50:00,609-Speed 5640.45 samples/sec   Loss 6.5370   LearningRate 0.0444   Epoch: 6   Global Step: 37910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:50:02,435-Speed 5609.45 samples/sec   Loss 6.3124   LearningRate 0.0444   Epoch: 6   Global Step: 37920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:50:04,262-Speed 5606.32 samples/sec   Loss 6.4628   LearningRate 0.0444   Epoch: 6   Global Step: 37930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:50:06,091-Speed 5601.67 samples/sec   Loss 6.4595   LearningRate 0.0444   Epoch: 6   Global Step: 37940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:50:07,918-Speed 5606.76 samples/sec   Loss 6.3599   LearningRate 0.0444   Epoch: 6   Global Step: 37950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:50:09,746-Speed 5603.32 samples/sec   Loss 6.3656   LearningRate 0.0444   Epoch: 6   Global Step: 37960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:50:11,583-Speed 5573.91 samples/sec   Loss 6.3933   LearningRate 0.0444   Epoch: 6   Global Step: 37970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:50:13,413-Speed 5599.83 samples/sec   Loss 6.3271   LearningRate 0.0444   Epoch: 6   Global Step: 37980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:50:15,245-Speed 5590.90 samples/sec   Loss 6.5019   LearningRate 0.0443   Epoch: 6   Global Step: 37990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:50:17,088-Speed 5559.10 samples/sec   Loss 6.4761   LearningRate 0.0443   Epoch: 6   Global Step: 38000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:50:43,634-[lfw][38000]XNorm: 21.222165
Training: 2022-04-27 03:50:43,635-[lfw][38000]Accuracy-Flip: 0.99683+-0.00302
Training: 2022-04-27 03:50:43,635-[lfw][38000]Accuracy-Highest: 0.99750
Training: 2022-04-27 03:51:14,127-[cfp_fp][38000]XNorm: 18.071753
Training: 2022-04-27 03:51:14,128-[cfp_fp][38000]Accuracy-Flip: 0.94043+-0.01036
Training: 2022-04-27 03:51:14,128-[cfp_fp][38000]Accuracy-Highest: 0.94771
Training: 2022-04-27 03:51:40,431-[agedb_30][38000]XNorm: 20.705648
Training: 2022-04-27 03:51:40,431-[agedb_30][38000]Accuracy-Flip: 0.97150+-0.00947
Training: 2022-04-27 03:51:40,432-[agedb_30][38000]Accuracy-Highest: 0.97300
Training: 2022-04-27 03:51:42,284-Speed 120.19 samples/sec   Loss 6.4551   LearningRate 0.0443   Epoch: 6   Global Step: 38010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:51:44,102-Speed 5636.19 samples/sec   Loss 6.4517   LearningRate 0.0443   Epoch: 6   Global Step: 38020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:51:45,912-Speed 5659.25 samples/sec   Loss 6.4428   LearningRate 0.0443   Epoch: 6   Global Step: 38030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:51:47,716-Speed 5678.98 samples/sec   Loss 6.4753   LearningRate 0.0443   Epoch: 6   Global Step: 38040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:51:49,542-Speed 5610.23 samples/sec   Loss 6.3447   LearningRate 0.0443   Epoch: 6   Global Step: 38050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:51:51,351-Speed 5659.27 samples/sec   Loss 6.3783   LearningRate 0.0443   Epoch: 6   Global Step: 38060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:51:53,159-Speed 5666.70 samples/sec   Loss 6.4396   LearningRate 0.0443   Epoch: 6   Global Step: 38070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:51:54,972-Speed 5649.81 samples/sec   Loss 6.5568   LearningRate 0.0442   Epoch: 6   Global Step: 38080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:51:56,791-Speed 5632.86 samples/sec   Loss 6.4915   LearningRate 0.0442   Epoch: 6   Global Step: 38090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:51:58,610-Speed 5630.67 samples/sec   Loss 6.3708   LearningRate 0.0442   Epoch: 6   Global Step: 38100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:52:00,425-Speed 5644.08 samples/sec   Loss 6.3992   LearningRate 0.0442   Epoch: 6   Global Step: 38110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:52:02,233-Speed 5666.17 samples/sec   Loss 6.3999   LearningRate 0.0442   Epoch: 6   Global Step: 38120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:52:04,074-Speed 5561.29 samples/sec   Loss 6.4238   LearningRate 0.0442   Epoch: 6   Global Step: 38130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:05,895-Speed 5626.39 samples/sec   Loss 6.4116   LearningRate 0.0442   Epoch: 6   Global Step: 38140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:07,710-Speed 5642.85 samples/sec   Loss 6.4352   LearningRate 0.0442   Epoch: 6   Global Step: 38150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:09,539-Speed 5600.87 samples/sec   Loss 6.4688   LearningRate 0.0441   Epoch: 6   Global Step: 38160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:11,361-Speed 5623.50 samples/sec   Loss 6.4016   LearningRate 0.0441   Epoch: 6   Global Step: 38170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:13,178-Speed 5637.85 samples/sec   Loss 6.4413   LearningRate 0.0441   Epoch: 6   Global Step: 38180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:14,994-Speed 5639.97 samples/sec   Loss 6.2987   LearningRate 0.0441   Epoch: 6   Global Step: 38190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:16,809-Speed 5642.65 samples/sec   Loss 6.4706   LearningRate 0.0441   Epoch: 6   Global Step: 38200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:18,656-Speed 5547.03 samples/sec   Loss 6.5385   LearningRate 0.0441   Epoch: 6   Global Step: 38210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:20,496-Speed 5567.81 samples/sec   Loss 6.5234   LearningRate 0.0441   Epoch: 6   Global Step: 38220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:22,327-Speed 5591.88 samples/sec   Loss 6.3267   LearningRate 0.0441   Epoch: 6   Global Step: 38230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:24,161-Speed 5587.32 samples/sec   Loss 6.2691   LearningRate 0.0441   Epoch: 6   Global Step: 38240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:25,973-Speed 5654.02 samples/sec   Loss 6.5153   LearningRate 0.0440   Epoch: 6   Global Step: 38250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:27,791-Speed 5632.50 samples/sec   Loss 6.3544   LearningRate 0.0440   Epoch: 6   Global Step: 38260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:29,623-Speed 5592.36 samples/sec   Loss 6.4612   LearningRate 0.0440   Epoch: 6   Global Step: 38270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:31,453-Speed 5596.91 samples/sec   Loss 6.3152   LearningRate 0.0440   Epoch: 6   Global Step: 38280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:33,279-Speed 5609.50 samples/sec   Loss 6.5122   LearningRate 0.0440   Epoch: 6   Global Step: 38290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:35,099-Speed 5628.75 samples/sec   Loss 6.4120   LearningRate 0.0440   Epoch: 6   Global Step: 38300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:36,932-Speed 5588.40 samples/sec   Loss 6.2830   LearningRate 0.0440   Epoch: 6   Global Step: 38310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:38,756-Speed 5615.26 samples/sec   Loss 6.1685   LearningRate 0.0440   Epoch: 6   Global Step: 38320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:40,557-Speed 5687.53 samples/sec   Loss 6.3116   LearningRate 0.0439   Epoch: 6   Global Step: 38330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:42,393-Speed 5580.88 samples/sec   Loss 6.3226   LearningRate 0.0439   Epoch: 6   Global Step: 38340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:52:44,234-Speed 5563.30 samples/sec   Loss 6.3997   LearningRate 0.0439   Epoch: 6   Global Step: 38350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:52:46,071-Speed 5575.68 samples/sec   Loss 6.3735   LearningRate 0.0439   Epoch: 6   Global Step: 38360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:52:47,894-Speed 5619.68 samples/sec   Loss 6.3853   LearningRate 0.0439   Epoch: 6   Global Step: 38370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:52:49,709-Speed 5644.54 samples/sec   Loss 6.4420   LearningRate 0.0439   Epoch: 6   Global Step: 38380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:52:51,525-Speed 5637.99 samples/sec   Loss 6.2830   LearningRate 0.0439   Epoch: 6   Global Step: 38390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:52:53,353-Speed 5604.81 samples/sec   Loss 6.3153   LearningRate 0.0439   Epoch: 6   Global Step: 38400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:52:55,168-Speed 5643.75 samples/sec   Loss 6.3492   LearningRate 0.0439   Epoch: 6   Global Step: 38410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:52:56,995-Speed 5609.11 samples/sec   Loss 6.3948   LearningRate 0.0438   Epoch: 6   Global Step: 38420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:52:58,832-Speed 5574.69 samples/sec   Loss 6.3303   LearningRate 0.0438   Epoch: 6   Global Step: 38430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:00,638-Speed 5671.43 samples/sec   Loss 6.4623   LearningRate 0.0438   Epoch: 6   Global Step: 38440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:02,463-Speed 5614.48 samples/sec   Loss 6.4525   LearningRate 0.0438   Epoch: 6   Global Step: 38450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:53:04,283-Speed 5627.28 samples/sec   Loss 6.3420   LearningRate 0.0438   Epoch: 6   Global Step: 38460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:53:06,126-Speed 5558.22 samples/sec   Loss 6.2942   LearningRate 0.0438   Epoch: 6   Global Step: 38470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:53:07,953-Speed 5605.10 samples/sec   Loss 6.3784   LearningRate 0.0438   Epoch: 6   Global Step: 38480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:53:09,762-Speed 5662.71 samples/sec   Loss 6.4106   LearningRate 0.0438   Epoch: 6   Global Step: 38490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:53:11,591-Speed 5602.38 samples/sec   Loss 6.3250   LearningRate 0.0438   Epoch: 6   Global Step: 38500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:53:13,408-Speed 5635.08 samples/sec   Loss 6.5364   LearningRate 0.0437   Epoch: 6   Global Step: 38510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:53:15,222-Speed 5647.49 samples/sec   Loss 6.3634   LearningRate 0.0437   Epoch: 6   Global Step: 38520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:17,045-Speed 5619.98 samples/sec   Loss 6.4310   LearningRate 0.0437   Epoch: 6   Global Step: 38530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:18,863-Speed 5634.87 samples/sec   Loss 6.3721   LearningRate 0.0437   Epoch: 6   Global Step: 38540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:20,695-Speed 5593.40 samples/sec   Loss 6.2946   LearningRate 0.0437   Epoch: 6   Global Step: 38550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:22,519-Speed 5613.32 samples/sec   Loss 6.3011   LearningRate 0.0437   Epoch: 6   Global Step: 38560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:24,356-Speed 5580.53 samples/sec   Loss 6.2393   LearningRate 0.0437   Epoch: 6   Global Step: 38570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:26,198-Speed 5558.34 samples/sec   Loss 6.5762   LearningRate 0.0437   Epoch: 6   Global Step: 38580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:28,029-Speed 5596.92 samples/sec   Loss 6.2853   LearningRate 0.0436   Epoch: 6   Global Step: 38590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:29,856-Speed 5604.24 samples/sec   Loss 6.3660   LearningRate 0.0436   Epoch: 6   Global Step: 38600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:31,666-Speed 5659.37 samples/sec   Loss 6.4881   LearningRate 0.0436   Epoch: 6   Global Step: 38610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:33,487-Speed 5625.16 samples/sec   Loss 6.4220   LearningRate 0.0436   Epoch: 6   Global Step: 38620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:53:35,309-Speed 5623.06 samples/sec   Loss 6.3202   LearningRate 0.0436   Epoch: 6   Global Step: 38630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:53:37,128-Speed 5629.89 samples/sec   Loss 6.3718   LearningRate 0.0436   Epoch: 6   Global Step: 38640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:53:38,926-Speed 5698.96 samples/sec   Loss 6.5327   LearningRate 0.0436   Epoch: 6   Global Step: 38650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:40,744-Speed 5634.46 samples/sec   Loss 6.2710   LearningRate 0.0436   Epoch: 6   Global Step: 38660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:42,571-Speed 5605.88 samples/sec   Loss 6.3478   LearningRate 0.0436   Epoch: 6   Global Step: 38670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:44,391-Speed 5628.09 samples/sec   Loss 6.2888   LearningRate 0.0435   Epoch: 6   Global Step: 38680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:46,209-Speed 5637.63 samples/sec   Loss 6.4561   LearningRate 0.0435   Epoch: 6   Global Step: 38690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:48,032-Speed 5617.58 samples/sec   Loss 6.2375   LearningRate 0.0435   Epoch: 6   Global Step: 38700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:49,848-Speed 5641.64 samples/sec   Loss 6.3544   LearningRate 0.0435   Epoch: 6   Global Step: 38710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:51,674-Speed 5609.46 samples/sec   Loss 6.3310   LearningRate 0.0435   Epoch: 6   Global Step: 38720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:53,491-Speed 5637.40 samples/sec   Loss 6.3952   LearningRate 0.0435   Epoch: 6   Global Step: 38730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:55,334-Speed 5559.25 samples/sec   Loss 6.4519   LearningRate 0.0435   Epoch: 6   Global Step: 38740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:57,155-Speed 5625.56 samples/sec   Loss 6.3068   LearningRate 0.0435   Epoch: 6   Global Step: 38750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:53:58,983-Speed 5601.98 samples/sec   Loss 6.2259   LearningRate 0.0434   Epoch: 6   Global Step: 38760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:00,795-Speed 5655.22 samples/sec   Loss 6.4730   LearningRate 0.0434   Epoch: 6   Global Step: 38770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:02,612-Speed 5634.56 samples/sec   Loss 6.5117   LearningRate 0.0434   Epoch: 6   Global Step: 38780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:04,433-Speed 5628.29 samples/sec   Loss 6.4401   LearningRate 0.0434   Epoch: 6   Global Step: 38790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:06,246-Speed 5648.38 samples/sec   Loss 6.4722   LearningRate 0.0434   Epoch: 6   Global Step: 38800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:08,082-Speed 5580.95 samples/sec   Loss 6.3986   LearningRate 0.0434   Epoch: 6   Global Step: 38810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:09,903-Speed 5622.54 samples/sec   Loss 6.3373   LearningRate 0.0434   Epoch: 6   Global Step: 38820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:11,721-Speed 5634.52 samples/sec   Loss 6.3460   LearningRate 0.0434   Epoch: 6   Global Step: 38830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:13,556-Speed 5582.27 samples/sec   Loss 6.2923   LearningRate 0.0434   Epoch: 6   Global Step: 38840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:15,383-Speed 5606.76 samples/sec   Loss 6.2249   LearningRate 0.0433   Epoch: 6   Global Step: 38850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:54:17,200-Speed 5637.65 samples/sec   Loss 6.3889   LearningRate 0.0433   Epoch: 6   Global Step: 38860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:19,052-Speed 5530.74 samples/sec   Loss 6.2983   LearningRate 0.0433   Epoch: 6   Global Step: 38870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:20,882-Speed 5598.62 samples/sec   Loss 6.3237   LearningRate 0.0433   Epoch: 6   Global Step: 38880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:22,726-Speed 5554.05 samples/sec   Loss 6.4499   LearningRate 0.0433   Epoch: 6   Global Step: 38890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:24,562-Speed 5579.70 samples/sec   Loss 6.4244   LearningRate 0.0433   Epoch: 6   Global Step: 38900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:26,413-Speed 5535.07 samples/sec   Loss 6.2212   LearningRate 0.0433   Epoch: 6   Global Step: 38910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:28,245-Speed 5590.59 samples/sec   Loss 6.3596   LearningRate 0.0433   Epoch: 6   Global Step: 38920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:30,072-Speed 5607.92 samples/sec   Loss 6.3625   LearningRate 0.0433   Epoch: 6   Global Step: 38930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:31,896-Speed 5614.90 samples/sec   Loss 6.1933   LearningRate 0.0432   Epoch: 6   Global Step: 38940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:33,725-Speed 5600.49 samples/sec   Loss 6.2491   LearningRate 0.0432   Epoch: 6   Global Step: 38950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:35,543-Speed 5636.20 samples/sec   Loss 6.4729   LearningRate 0.0432   Epoch: 6   Global Step: 38960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:54:37,371-Speed 5602.41 samples/sec   Loss 6.4041   LearningRate 0.0432   Epoch: 6   Global Step: 38970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:54:39,205-Speed 5584.54 samples/sec   Loss 6.2740   LearningRate 0.0432   Epoch: 6   Global Step: 38980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:54:41,039-Speed 5585.48 samples/sec   Loss 6.4090   LearningRate 0.0432   Epoch: 6   Global Step: 38990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:42,865-Speed 5611.26 samples/sec   Loss 6.2934   LearningRate 0.0432   Epoch: 6   Global Step: 39000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:44,705-Speed 5565.87 samples/sec   Loss 6.3074   LearningRate 0.0432   Epoch: 6   Global Step: 39010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:46,541-Speed 5579.35 samples/sec   Loss 6.2355   LearningRate 0.0431   Epoch: 6   Global Step: 39020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:48,361-Speed 5629.49 samples/sec   Loss 6.3448   LearningRate 0.0431   Epoch: 6   Global Step: 39030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:50,193-Speed 5590.94 samples/sec   Loss 6.3079   LearningRate 0.0431   Epoch: 6   Global Step: 39040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:52,008-Speed 5645.47 samples/sec   Loss 6.3386   LearningRate 0.0431   Epoch: 6   Global Step: 39050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:53,827-Speed 5629.50 samples/sec   Loss 6.2882   LearningRate 0.0431   Epoch: 6   Global Step: 39060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:55,657-Speed 5599.12 samples/sec   Loss 6.2671   LearningRate 0.0431   Epoch: 6   Global Step: 39070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:57,475-Speed 5634.24 samples/sec   Loss 6.3614   LearningRate 0.0431   Epoch: 6   Global Step: 39080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:54:59,288-Speed 5647.84 samples/sec   Loss 6.3873   LearningRate 0.0431   Epoch: 6   Global Step: 39090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:55:01,106-Speed 5634.26 samples/sec   Loss 6.3947   LearningRate 0.0431   Epoch: 6   Global Step: 39100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:02,921-Speed 5644.98 samples/sec   Loss 6.2470   LearningRate 0.0430   Epoch: 6   Global Step: 39110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:04,743-Speed 5622.12 samples/sec   Loss 6.5578   LearningRate 0.0430   Epoch: 6   Global Step: 39120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:06,572-Speed 5600.84 samples/sec   Loss 6.3257   LearningRate 0.0430   Epoch: 6   Global Step: 39130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:08,403-Speed 5594.48 samples/sec   Loss 6.2434   LearningRate 0.0430   Epoch: 6   Global Step: 39140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:10,251-Speed 5542.19 samples/sec   Loss 6.2537   LearningRate 0.0430   Epoch: 6   Global Step: 39150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:12,070-Speed 5632.46 samples/sec   Loss 6.4199   LearningRate 0.0430   Epoch: 6   Global Step: 39160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:13,897-Speed 5605.93 samples/sec   Loss 6.2903   LearningRate 0.0430   Epoch: 6   Global Step: 39170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:15,720-Speed 5620.55 samples/sec   Loss 6.3226   LearningRate 0.0430   Epoch: 6   Global Step: 39180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:17,543-Speed 5619.12 samples/sec   Loss 6.3284   LearningRate 0.0430   Epoch: 6   Global Step: 39190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:19,375-Speed 5590.64 samples/sec   Loss 6.4716   LearningRate 0.0429   Epoch: 6   Global Step: 39200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:55:21,210-Speed 5580.43 samples/sec   Loss 6.3376   LearningRate 0.0429   Epoch: 6   Global Step: 39210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:55:23,040-Speed 5601.78 samples/sec   Loss 6.3181   LearningRate 0.0429   Epoch: 6   Global Step: 39220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:55:24,862-Speed 5620.65 samples/sec   Loss 6.4797   LearningRate 0.0429   Epoch: 6   Global Step: 39230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:55:26,695-Speed 5586.92 samples/sec   Loss 6.3337   LearningRate 0.0429   Epoch: 6   Global Step: 39240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:55:28,533-Speed 5575.43 samples/sec   Loss 6.3301   LearningRate 0.0429   Epoch: 6   Global Step: 39250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:55:30,357-Speed 5613.17 samples/sec   Loss 6.3516   LearningRate 0.0429   Epoch: 6   Global Step: 39260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:55:32,188-Speed 5596.26 samples/sec   Loss 6.2448   LearningRate 0.0429   Epoch: 6   Global Step: 39270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:34,005-Speed 5637.37 samples/sec   Loss 6.3257   LearningRate 0.0428   Epoch: 6   Global Step: 39280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:35,861-Speed 5518.69 samples/sec   Loss 6.2782   LearningRate 0.0428   Epoch: 6   Global Step: 39290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:37,698-Speed 5578.30 samples/sec   Loss 6.2597   LearningRate 0.0428   Epoch: 6   Global Step: 39300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:39,511-Speed 5648.48 samples/sec   Loss 6.3147   LearningRate 0.0428   Epoch: 6   Global Step: 39310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:41,329-Speed 5634.69 samples/sec   Loss 6.3121   LearningRate 0.0428   Epoch: 6   Global Step: 39320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:43,150-Speed 5623.28 samples/sec   Loss 6.3828   LearningRate 0.0428   Epoch: 6   Global Step: 39330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:44,974-Speed 5616.17 samples/sec   Loss 6.1764   LearningRate 0.0428   Epoch: 6   Global Step: 39340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:46,804-Speed 5600.23 samples/sec   Loss 6.3604   LearningRate 0.0428   Epoch: 6   Global Step: 39350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:48,646-Speed 5559.30 samples/sec   Loss 6.4487   LearningRate 0.0428   Epoch: 6   Global Step: 39360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:55:50,480-Speed 5586.90 samples/sec   Loss 6.2570   LearningRate 0.0427   Epoch: 6   Global Step: 39370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:55:52,320-Speed 5565.68 samples/sec   Loss 6.2742   LearningRate 0.0427   Epoch: 6   Global Step: 39380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:55:54,147-Speed 5608.04 samples/sec   Loss 6.3976   LearningRate 0.0427   Epoch: 6   Global Step: 39390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:55:55,998-Speed 5534.17 samples/sec   Loss 6.2771   LearningRate 0.0427   Epoch: 6   Global Step: 39400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:55:57,826-Speed 5602.11 samples/sec   Loss 6.3059   LearningRate 0.0427   Epoch: 6   Global Step: 39410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:55:59,639-Speed 5650.74 samples/sec   Loss 6.3455   LearningRate 0.0427   Epoch: 6   Global Step: 39420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:56:01,454-Speed 5645.89 samples/sec   Loss 6.3529   LearningRate 0.0427   Epoch: 6   Global Step: 39430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:03,267-Speed 5647.61 samples/sec   Loss 6.1618   LearningRate 0.0427   Epoch: 6   Global Step: 39440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:05,112-Speed 5553.99 samples/sec   Loss 6.2038   LearningRate 0.0427   Epoch: 6   Global Step: 39450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:06,975-Speed 5495.50 samples/sec   Loss 6.3129   LearningRate 0.0426   Epoch: 6   Global Step: 39460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:08,792-Speed 5638.76 samples/sec   Loss 6.1712   LearningRate 0.0426   Epoch: 6   Global Step: 39470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:10,621-Speed 5599.80 samples/sec   Loss 6.3842   LearningRate 0.0426   Epoch: 6   Global Step: 39480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:12,446-Speed 5615.41 samples/sec   Loss 6.3737   LearningRate 0.0426   Epoch: 6   Global Step: 39490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:14,258-Speed 5651.16 samples/sec   Loss 6.2840   LearningRate 0.0426   Epoch: 6   Global Step: 39500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:16,088-Speed 5599.57 samples/sec   Loss 6.3270   LearningRate 0.0426   Epoch: 6   Global Step: 39510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:17,925-Speed 5575.88 samples/sec   Loss 6.3274   LearningRate 0.0426   Epoch: 6   Global Step: 39520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:19,744-Speed 5630.93 samples/sec   Loss 6.3720   LearningRate 0.0426   Epoch: 6   Global Step: 39530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:56:21,561-Speed 5638.60 samples/sec   Loss 6.2936   LearningRate 0.0426   Epoch: 6   Global Step: 39540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:56:23,382-Speed 5625.24 samples/sec   Loss 6.2988   LearningRate 0.0425   Epoch: 6   Global Step: 39550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:56:25,194-Speed 5652.79 samples/sec   Loss 6.3578   LearningRate 0.0425   Epoch: 6   Global Step: 39560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:56:27,013-Speed 5630.05 samples/sec   Loss 6.2536   LearningRate 0.0425   Epoch: 6   Global Step: 39570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:56:28,843-Speed 5597.48 samples/sec   Loss 6.3029   LearningRate 0.0425   Epoch: 6   Global Step: 39580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:56:30,670-Speed 5606.25 samples/sec   Loss 6.2353   LearningRate 0.0425   Epoch: 6   Global Step: 39590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:56:32,498-Speed 5603.01 samples/sec   Loss 6.3529   LearningRate 0.0425   Epoch: 6   Global Step: 39600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:34,340-Speed 5560.77 samples/sec   Loss 6.3187   LearningRate 0.0425   Epoch: 6   Global Step: 39610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:36,164-Speed 5618.00 samples/sec   Loss 6.2281   LearningRate 0.0425   Epoch: 6   Global Step: 39620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:38,008-Speed 5555.89 samples/sec   Loss 6.2307   LearningRate 0.0424   Epoch: 6   Global Step: 39630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:39,845-Speed 5576.34 samples/sec   Loss 6.2491   LearningRate 0.0424   Epoch: 6   Global Step: 39640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:41,656-Speed 5657.77 samples/sec   Loss 6.2771   LearningRate 0.0424   Epoch: 6   Global Step: 39650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:43,485-Speed 5600.52 samples/sec   Loss 6.3360   LearningRate 0.0424   Epoch: 6   Global Step: 39660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:45,314-Speed 5599.69 samples/sec   Loss 6.2785   LearningRate 0.0424   Epoch: 6   Global Step: 39670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:47,147-Speed 5586.89 samples/sec   Loss 6.2380   LearningRate 0.0424   Epoch: 6   Global Step: 39680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:49,002-Speed 5521.75 samples/sec   Loss 6.3907   LearningRate 0.0424   Epoch: 6   Global Step: 39690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:56:50,826-Speed 5617.36 samples/sec   Loss 6.2675   LearningRate 0.0424   Epoch: 6   Global Step: 39700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:56:52,629-Speed 5681.55 samples/sec   Loss 6.3084   LearningRate 0.0424   Epoch: 6   Global Step: 39710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:56:54,454-Speed 5614.34 samples/sec   Loss 6.1734   LearningRate 0.0423   Epoch: 6   Global Step: 39720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:56:56,264-Speed 5656.93 samples/sec   Loss 6.3382   LearningRate 0.0423   Epoch: 6   Global Step: 39730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:56:58,074-Speed 5662.61 samples/sec   Loss 6.2875   LearningRate 0.0423   Epoch: 6   Global Step: 39740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:56:59,890-Speed 5641.45 samples/sec   Loss 6.3057   LearningRate 0.0423   Epoch: 6   Global Step: 39750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:01,699-Speed 5662.22 samples/sec   Loss 6.3860   LearningRate 0.0423   Epoch: 6   Global Step: 39760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:03,512-Speed 5647.42 samples/sec   Loss 6.1675   LearningRate 0.0423   Epoch: 6   Global Step: 39770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:05,336-Speed 5616.39 samples/sec   Loss 6.1490   LearningRate 0.0423   Epoch: 6   Global Step: 39780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:07,151-Speed 5645.33 samples/sec   Loss 6.2811   LearningRate 0.0423   Epoch: 6   Global Step: 39790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:09,009-Speed 5511.73 samples/sec   Loss 6.2505   LearningRate 0.0423   Epoch: 6   Global Step: 39800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:20,926-Speed 859.35 samples/sec   Loss 5.7272   LearningRate 0.0422   Epoch: 7   Global Step: 39810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:22,802-Speed 5460.94 samples/sec   Loss 5.6003   LearningRate 0.0422   Epoch: 7   Global Step: 39820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:57:24,670-Speed 5484.44 samples/sec   Loss 5.6331   LearningRate 0.0422   Epoch: 7   Global Step: 39830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:57:26,509-Speed 5568.27 samples/sec   Loss 5.5240   LearningRate 0.0422   Epoch: 7   Global Step: 39840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:57:28,337-Speed 5603.92 samples/sec   Loss 5.6577   LearningRate 0.0422   Epoch: 7   Global Step: 39850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:57:30,499-Speed 4737.62 samples/sec   Loss 5.8037   LearningRate 0.0422   Epoch: 7   Global Step: 39860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:57:32,350-Speed 5535.77 samples/sec   Loss 5.7585   LearningRate 0.0422   Epoch: 7   Global Step: 39870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:57:34,177-Speed 5606.58 samples/sec   Loss 5.6469   LearningRate 0.0422   Epoch: 7   Global Step: 39880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:57:36,014-Speed 5576.89 samples/sec   Loss 5.8597   LearningRate 0.0421   Epoch: 7   Global Step: 39890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:57:37,843-Speed 5601.31 samples/sec   Loss 5.6976   LearningRate 0.0421   Epoch: 7   Global Step: 39900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:57:39,691-Speed 5540.80 samples/sec   Loss 5.6961   LearningRate 0.0421   Epoch: 7   Global Step: 39910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:57:41,542-Speed 5536.02 samples/sec   Loss 5.6634   LearningRate 0.0421   Epoch: 7   Global Step: 39920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:43,359-Speed 5639.01 samples/sec   Loss 5.6733   LearningRate 0.0421   Epoch: 7   Global Step: 39930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:45,225-Speed 5489.03 samples/sec   Loss 5.7822   LearningRate 0.0421   Epoch: 7   Global Step: 39940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:47,070-Speed 5550.82 samples/sec   Loss 5.5437   LearningRate 0.0421   Epoch: 7   Global Step: 39950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:48,888-Speed 5636.61 samples/sec   Loss 5.6270   LearningRate 0.0421   Epoch: 7   Global Step: 39960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:50,706-Speed 5633.86 samples/sec   Loss 5.6557   LearningRate 0.0421   Epoch: 7   Global Step: 39970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:52,547-Speed 5563.94 samples/sec   Loss 5.7182   LearningRate 0.0420   Epoch: 7   Global Step: 39980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:54,373-Speed 5611.50 samples/sec   Loss 5.9953   LearningRate 0.0420   Epoch: 7   Global Step: 39990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:57:56,197-Speed 5615.86 samples/sec   Loss 5.7426   LearningRate 0.0420   Epoch: 7   Global Step: 40000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:58:22,205-[lfw][40000]XNorm: 22.794917
Training: 2022-04-27 03:58:22,205-[lfw][40000]Accuracy-Flip: 0.99583+-0.00300
Training: 2022-04-27 03:58:22,206-[lfw][40000]Accuracy-Highest: 0.99750
Training: 2022-04-27 03:58:52,351-[cfp_fp][40000]XNorm: 19.739909
Training: 2022-04-27 03:58:52,351-[cfp_fp][40000]Accuracy-Flip: 0.93886+-0.01240
Training: 2022-04-27 03:58:52,352-[cfp_fp][40000]Accuracy-Highest: 0.94771
Training: 2022-04-27 03:59:18,385-[agedb_30][40000]XNorm: 22.522597
Training: 2022-04-27 03:59:18,385-[agedb_30][40000]Accuracy-Flip: 0.97117+-0.00522
Training: 2022-04-27 03:59:18,385-[agedb_30][40000]Accuracy-Highest: 0.97300
Training: 2022-04-27 03:59:20,220-Speed 121.87 samples/sec   Loss 5.7374   LearningRate 0.0420   Epoch: 7   Global Step: 40010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:59:22,017-Speed 5702.18 samples/sec   Loss 5.7600   LearningRate 0.0420   Epoch: 7   Global Step: 40020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:59:23,835-Speed 5633.24 samples/sec   Loss 5.8910   LearningRate 0.0420   Epoch: 7   Global Step: 40030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:59:25,662-Speed 5608.80 samples/sec   Loss 5.9187   LearningRate 0.0420   Epoch: 7   Global Step: 40040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:59:27,480-Speed 5633.67 samples/sec   Loss 5.8355   LearningRate 0.0420   Epoch: 7   Global Step: 40050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:59:29,314-Speed 5585.54 samples/sec   Loss 5.7933   LearningRate 0.0420   Epoch: 7   Global Step: 40060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:59:31,131-Speed 5638.13 samples/sec   Loss 5.5650   LearningRate 0.0419   Epoch: 7   Global Step: 40070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:59:32,957-Speed 5609.49 samples/sec   Loss 5.7391   LearningRate 0.0419   Epoch: 7   Global Step: 40080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:59:34,796-Speed 5571.22 samples/sec   Loss 6.0047   LearningRate 0.0419   Epoch: 7   Global Step: 40090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:59:36,628-Speed 5588.76 samples/sec   Loss 5.7968   LearningRate 0.0419   Epoch: 7   Global Step: 40100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:59:38,455-Speed 5606.92 samples/sec   Loss 5.6955   LearningRate 0.0419   Epoch: 7   Global Step: 40110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:59:40,273-Speed 5634.97 samples/sec   Loss 5.8867   LearningRate 0.0419   Epoch: 7   Global Step: 40120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:59:42,082-Speed 5662.89 samples/sec   Loss 5.8195   LearningRate 0.0419   Epoch: 7   Global Step: 40130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:59:43,906-Speed 5615.37 samples/sec   Loss 5.9328   LearningRate 0.0419   Epoch: 7   Global Step: 40140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:59:45,727-Speed 5626.75 samples/sec   Loss 5.8665   LearningRate 0.0419   Epoch: 7   Global Step: 40150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:59:47,582-Speed 5521.78 samples/sec   Loss 5.7694   LearningRate 0.0418   Epoch: 7   Global Step: 40160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 03:59:49,418-Speed 5578.66 samples/sec   Loss 5.9927   LearningRate 0.0418   Epoch: 7   Global Step: 40170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:59:51,250-Speed 5591.61 samples/sec   Loss 5.9488   LearningRate 0.0418   Epoch: 7   Global Step: 40180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:59:53,106-Speed 5517.87 samples/sec   Loss 5.8938   LearningRate 0.0418   Epoch: 7   Global Step: 40190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:59:54,923-Speed 5639.36 samples/sec   Loss 5.7992   LearningRate 0.0418   Epoch: 7   Global Step: 40200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:59:56,752-Speed 5600.55 samples/sec   Loss 5.8906   LearningRate 0.0418   Epoch: 7   Global Step: 40210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 03:59:58,585-Speed 5588.40 samples/sec   Loss 5.8966   LearningRate 0.0418   Epoch: 7   Global Step: 40220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:00:00,395-Speed 5657.00 samples/sec   Loss 5.8452   LearningRate 0.0418   Epoch: 7   Global Step: 40230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:00:02,204-Speed 5663.47 samples/sec   Loss 5.8466   LearningRate 0.0418   Epoch: 7   Global Step: 40240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:00:04,011-Speed 5671.60 samples/sec   Loss 5.9040   LearningRate 0.0417   Epoch: 7   Global Step: 40250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:05,844-Speed 5585.98 samples/sec   Loss 5.7977   LearningRate 0.0417   Epoch: 7   Global Step: 40260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:07,651-Speed 5668.66 samples/sec   Loss 5.9248   LearningRate 0.0417   Epoch: 7   Global Step: 40270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:09,486-Speed 5582.86 samples/sec   Loss 5.9767   LearningRate 0.0417   Epoch: 7   Global Step: 40280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:11,301-Speed 5642.87 samples/sec   Loss 5.8902   LearningRate 0.0417   Epoch: 7   Global Step: 40290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:13,129-Speed 5603.47 samples/sec   Loss 5.8846   LearningRate 0.0417   Epoch: 7   Global Step: 40300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:14,945-Speed 5642.53 samples/sec   Loss 5.8743   LearningRate 0.0417   Epoch: 7   Global Step: 40310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:16,755-Speed 5658.87 samples/sec   Loss 5.9951   LearningRate 0.0417   Epoch: 7   Global Step: 40320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:18,573-Speed 5634.48 samples/sec   Loss 5.8198   LearningRate 0.0416   Epoch: 7   Global Step: 40330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:20,388-Speed 5642.12 samples/sec   Loss 6.0333   LearningRate 0.0416   Epoch: 7   Global Step: 40340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:22,215-Speed 5612.54 samples/sec   Loss 5.8758   LearningRate 0.0416   Epoch: 7   Global Step: 40350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:00:24,053-Speed 5572.52 samples/sec   Loss 5.8821   LearningRate 0.0416   Epoch: 7   Global Step: 40360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:00:25,926-Speed 5467.39 samples/sec   Loss 6.0120   LearningRate 0.0416   Epoch: 7   Global Step: 40370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:00:27,792-Speed 5491.93 samples/sec   Loss 6.1105   LearningRate 0.0416   Epoch: 7   Global Step: 40380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:00:29,626-Speed 5582.59 samples/sec   Loss 5.8884   LearningRate 0.0416   Epoch: 7   Global Step: 40390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:00:31,440-Speed 5647.28 samples/sec   Loss 5.9589   LearningRate 0.0416   Epoch: 7   Global Step: 40400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:33,286-Speed 5549.97 samples/sec   Loss 5.9397   LearningRate 0.0416   Epoch: 7   Global Step: 40410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:35,110-Speed 5614.96 samples/sec   Loss 6.0534   LearningRate 0.0415   Epoch: 7   Global Step: 40420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:36,932-Speed 5622.78 samples/sec   Loss 5.7971   LearningRate 0.0415   Epoch: 7   Global Step: 40430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:38,752-Speed 5627.41 samples/sec   Loss 6.0672   LearningRate 0.0415   Epoch: 7   Global Step: 40440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:40,587-Speed 5584.32 samples/sec   Loss 5.9733   LearningRate 0.0415   Epoch: 7   Global Step: 40450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:42,417-Speed 5597.36 samples/sec   Loss 5.9466   LearningRate 0.0415   Epoch: 7   Global Step: 40460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:44,241-Speed 5613.22 samples/sec   Loss 6.0344   LearningRate 0.0415   Epoch: 7   Global Step: 40470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:46,105-Speed 5496.81 samples/sec   Loss 5.9919   LearningRate 0.0415   Epoch: 7   Global Step: 40480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:47,937-Speed 5593.45 samples/sec   Loss 6.0547   LearningRate 0.0415   Epoch: 7   Global Step: 40490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:00:49,795-Speed 5512.74 samples/sec   Loss 5.8505   LearningRate 0.0415   Epoch: 7   Global Step: 40500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:00:51,685-Speed 5420.04 samples/sec   Loss 5.8578   LearningRate 0.0414   Epoch: 7   Global Step: 40510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:00:53,505-Speed 5626.81 samples/sec   Loss 5.8981   LearningRate 0.0414   Epoch: 7   Global Step: 40520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:00:55,347-Speed 5560.75 samples/sec   Loss 5.9429   LearningRate 0.0414   Epoch: 7   Global Step: 40530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:00:57,197-Speed 5538.02 samples/sec   Loss 5.9234   LearningRate 0.0414   Epoch: 7   Global Step: 40540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:00:59,035-Speed 5572.16 samples/sec   Loss 5.9829   LearningRate 0.0414   Epoch: 7   Global Step: 40550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:01:00,872-Speed 5575.40 samples/sec   Loss 5.9985   LearningRate 0.0414   Epoch: 7   Global Step: 40560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:01:02,714-Speed 5560.49 samples/sec   Loss 6.1632   LearningRate 0.0414   Epoch: 7   Global Step: 40570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:01:04,532-Speed 5634.00 samples/sec   Loss 5.8989   LearningRate 0.0414   Epoch: 7   Global Step: 40580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:01:06,368-Speed 5581.80 samples/sec   Loss 6.0403   LearningRate 0.0414   Epoch: 7   Global Step: 40590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:01:08,186-Speed 5634.40 samples/sec   Loss 5.8996   LearningRate 0.0413   Epoch: 7   Global Step: 40600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:01:10,043-Speed 5516.58 samples/sec   Loss 6.0297   LearningRate 0.0413   Epoch: 7   Global Step: 40610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:01:11,853-Speed 5657.51 samples/sec   Loss 6.0319   LearningRate 0.0413   Epoch: 7   Global Step: 40620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:13,685-Speed 5593.59 samples/sec   Loss 6.1131   LearningRate 0.0413   Epoch: 7   Global Step: 40630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:15,545-Speed 5506.75 samples/sec   Loss 6.1633   LearningRate 0.0413   Epoch: 7   Global Step: 40640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:17,383-Speed 5573.69 samples/sec   Loss 5.9573   LearningRate 0.0413   Epoch: 7   Global Step: 40650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:19,222-Speed 5569.39 samples/sec   Loss 6.0951   LearningRate 0.0413   Epoch: 7   Global Step: 40660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:21,037-Speed 5644.03 samples/sec   Loss 6.0730   LearningRate 0.0413   Epoch: 7   Global Step: 40670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:22,862-Speed 5611.15 samples/sec   Loss 6.1305   LearningRate 0.0413   Epoch: 7   Global Step: 40680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:24,681-Speed 5632.12 samples/sec   Loss 5.9536   LearningRate 0.0412   Epoch: 7   Global Step: 40690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:26,535-Speed 5525.54 samples/sec   Loss 5.9679   LearningRate 0.0412   Epoch: 7   Global Step: 40700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:28,372-Speed 5574.44 samples/sec   Loss 5.9689   LearningRate 0.0412   Epoch: 7   Global Step: 40710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:30,200-Speed 5603.71 samples/sec   Loss 6.1464   LearningRate 0.0412   Epoch: 7   Global Step: 40720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:01:32,051-Speed 5535.00 samples/sec   Loss 6.1266   LearningRate 0.0412   Epoch: 7   Global Step: 40730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:01:33,882-Speed 5594.92 samples/sec   Loss 6.0408   LearningRate 0.0412   Epoch: 7   Global Step: 40740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:01:35,700-Speed 5633.58 samples/sec   Loss 6.0001   LearningRate 0.0412   Epoch: 7   Global Step: 40750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:01:37,537-Speed 5578.21 samples/sec   Loss 6.0488   LearningRate 0.0412   Epoch: 7   Global Step: 40760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:01:39,371-Speed 5584.14 samples/sec   Loss 5.9507   LearningRate 0.0412   Epoch: 7   Global Step: 40770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:01:41,175-Speed 5677.44 samples/sec   Loss 6.1184   LearningRate 0.0411   Epoch: 7   Global Step: 40780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:43,001-Speed 5610.88 samples/sec   Loss 6.0774   LearningRate 0.0411   Epoch: 7   Global Step: 40790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:44,845-Speed 5559.38 samples/sec   Loss 6.0225   LearningRate 0.0411   Epoch: 7   Global Step: 40800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:46,674-Speed 5600.83 samples/sec   Loss 5.8966   LearningRate 0.0411   Epoch: 7   Global Step: 40810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:48,514-Speed 5566.83 samples/sec   Loss 6.1604   LearningRate 0.0411   Epoch: 7   Global Step: 40820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:50,369-Speed 5520.78 samples/sec   Loss 6.0833   LearningRate 0.0411   Epoch: 7   Global Step: 40830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:52,234-Speed 5492.65 samples/sec   Loss 6.0428   LearningRate 0.0411   Epoch: 7   Global Step: 40840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:54,145-Speed 5361.70 samples/sec   Loss 6.0688   LearningRate 0.0411   Epoch: 7   Global Step: 40850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:55,979-Speed 5586.03 samples/sec   Loss 6.1262   LearningRate 0.0410   Epoch: 7   Global Step: 40860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:57,797-Speed 5634.90 samples/sec   Loss 6.0797   LearningRate 0.0410   Epoch: 7   Global Step: 40870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:01:59,643-Speed 5547.77 samples/sec   Loss 6.1024   LearningRate 0.0410   Epoch: 7   Global Step: 40880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:01,462-Speed 5631.58 samples/sec   Loss 6.0159   LearningRate 0.0410   Epoch: 7   Global Step: 40890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:03,271-Speed 5661.43 samples/sec   Loss 6.1905   LearningRate 0.0410   Epoch: 7   Global Step: 40900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:05,121-Speed 5537.19 samples/sec   Loss 6.1464   LearningRate 0.0410   Epoch: 7   Global Step: 40910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:06,943-Speed 5623.15 samples/sec   Loss 6.0393   LearningRate 0.0410   Epoch: 7   Global Step: 40920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:08,775-Speed 5591.30 samples/sec   Loss 5.9744   LearningRate 0.0410   Epoch: 7   Global Step: 40930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:10,619-Speed 5553.81 samples/sec   Loss 6.0879   LearningRate 0.0410   Epoch: 7   Global Step: 40940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:12,432-Speed 5650.00 samples/sec   Loss 5.9193   LearningRate 0.0409   Epoch: 7   Global Step: 40950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:14,300-Speed 5483.75 samples/sec   Loss 5.9980   LearningRate 0.0409   Epoch: 7   Global Step: 40960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:16,138-Speed 5574.33 samples/sec   Loss 6.0177   LearningRate 0.0409   Epoch: 7   Global Step: 40970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:17,981-Speed 5557.01 samples/sec   Loss 6.0148   LearningRate 0.0409   Epoch: 7   Global Step: 40980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:19,796-Speed 5645.47 samples/sec   Loss 6.0670   LearningRate 0.0409   Epoch: 7   Global Step: 40990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:21,667-Speed 5474.84 samples/sec   Loss 6.0782   LearningRate 0.0409   Epoch: 7   Global Step: 41000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:23,486-Speed 5630.60 samples/sec   Loss 6.0039   LearningRate 0.0409   Epoch: 7   Global Step: 41010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:25,312-Speed 5609.35 samples/sec   Loss 6.0044   LearningRate 0.0409   Epoch: 7   Global Step: 41020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:27,138-Speed 5609.58 samples/sec   Loss 6.1958   LearningRate 0.0409   Epoch: 7   Global Step: 41030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:28,965-Speed 5606.94 samples/sec   Loss 6.0262   LearningRate 0.0408   Epoch: 7   Global Step: 41040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:30,795-Speed 5598.03 samples/sec   Loss 6.0882   LearningRate 0.0408   Epoch: 7   Global Step: 41050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:32,613-Speed 5632.77 samples/sec   Loss 6.0052   LearningRate 0.0408   Epoch: 7   Global Step: 41060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:34,461-Speed 5543.35 samples/sec   Loss 6.0269   LearningRate 0.0408   Epoch: 7   Global Step: 41070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:36,342-Speed 5446.17 samples/sec   Loss 6.2014   LearningRate 0.0408   Epoch: 7   Global Step: 41080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:38,204-Speed 5504.19 samples/sec   Loss 6.0242   LearningRate 0.0408   Epoch: 7   Global Step: 41090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:40,025-Speed 5624.30 samples/sec   Loss 6.0676   LearningRate 0.0408   Epoch: 7   Global Step: 41100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:41,854-Speed 5599.80 samples/sec   Loss 5.9011   LearningRate 0.0408   Epoch: 7   Global Step: 41110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:43,676-Speed 5620.84 samples/sec   Loss 6.0890   LearningRate 0.0408   Epoch: 7   Global Step: 41120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:45,508-Speed 5591.20 samples/sec   Loss 5.9788   LearningRate 0.0407   Epoch: 7   Global Step: 41130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:47,338-Speed 5598.30 samples/sec   Loss 6.0184   LearningRate 0.0407   Epoch: 7   Global Step: 41140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:02:49,158-Speed 5628.27 samples/sec   Loss 5.9941   LearningRate 0.0407   Epoch: 7   Global Step: 41150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:50,987-Speed 5601.91 samples/sec   Loss 6.1005   LearningRate 0.0407   Epoch: 7   Global Step: 41160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:52,797-Speed 5657.26 samples/sec   Loss 6.0907   LearningRate 0.0407   Epoch: 7   Global Step: 41170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:54,621-Speed 5615.00 samples/sec   Loss 5.9572   LearningRate 0.0407   Epoch: 7   Global Step: 41180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:56,435-Speed 5648.29 samples/sec   Loss 6.0079   LearningRate 0.0407   Epoch: 7   Global Step: 41190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:02:58,255-Speed 5628.97 samples/sec   Loss 5.9664   LearningRate 0.0407   Epoch: 7   Global Step: 41200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:00,069-Speed 5647.15 samples/sec   Loss 5.9427   LearningRate 0.0407   Epoch: 7   Global Step: 41210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:01,905-Speed 5578.99 samples/sec   Loss 6.1523   LearningRate 0.0406   Epoch: 7   Global Step: 41220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:03,765-Speed 5507.60 samples/sec   Loss 5.9890   LearningRate 0.0406   Epoch: 7   Global Step: 41230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:05,676-Speed 5361.47 samples/sec   Loss 6.1253   LearningRate 0.0406   Epoch: 7   Global Step: 41240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:07,496-Speed 5628.80 samples/sec   Loss 6.0597   LearningRate 0.0406   Epoch: 7   Global Step: 41250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:09,309-Speed 5648.40 samples/sec   Loss 6.0662   LearningRate 0.0406   Epoch: 7   Global Step: 41260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:11,200-Speed 5417.38 samples/sec   Loss 6.2670   LearningRate 0.0406   Epoch: 7   Global Step: 41270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:13,020-Speed 5627.78 samples/sec   Loss 6.0883   LearningRate 0.0406   Epoch: 7   Global Step: 41280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:14,855-Speed 5582.98 samples/sec   Loss 6.0995   LearningRate 0.0406   Epoch: 7   Global Step: 41290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:16,689-Speed 5584.39 samples/sec   Loss 6.0607   LearningRate 0.0406   Epoch: 7   Global Step: 41300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:18,522-Speed 5588.95 samples/sec   Loss 5.9383   LearningRate 0.0405   Epoch: 7   Global Step: 41310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:20,335-Speed 5649.93 samples/sec   Loss 6.1578   LearningRate 0.0405   Epoch: 7   Global Step: 41320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:22,146-Speed 5654.95 samples/sec   Loss 6.0725   LearningRate 0.0405   Epoch: 7   Global Step: 41330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:23,959-Speed 5651.45 samples/sec   Loss 5.9779   LearningRate 0.0405   Epoch: 7   Global Step: 41340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:25,787-Speed 5602.89 samples/sec   Loss 6.0706   LearningRate 0.0405   Epoch: 7   Global Step: 41350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:27,619-Speed 5595.48 samples/sec   Loss 6.0015   LearningRate 0.0405   Epoch: 7   Global Step: 41360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:29,443-Speed 5616.14 samples/sec   Loss 6.0984   LearningRate 0.0405   Epoch: 7   Global Step: 41370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:31,258-Speed 5641.90 samples/sec   Loss 5.9710   LearningRate 0.0405   Epoch: 7   Global Step: 41380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:33,087-Speed 5600.59 samples/sec   Loss 6.0272   LearningRate 0.0405   Epoch: 7   Global Step: 41390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:03:34,922-Speed 5581.45 samples/sec   Loss 5.9386   LearningRate 0.0404   Epoch: 7   Global Step: 41400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:36,736-Speed 5648.72 samples/sec   Loss 6.0775   LearningRate 0.0404   Epoch: 7   Global Step: 41410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:38,572-Speed 5577.82 samples/sec   Loss 6.0121   LearningRate 0.0404   Epoch: 7   Global Step: 41420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:40,398-Speed 5610.55 samples/sec   Loss 5.8985   LearningRate 0.0404   Epoch: 7   Global Step: 41430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:42,258-Speed 5508.28 samples/sec   Loss 6.0548   LearningRate 0.0404   Epoch: 7   Global Step: 41440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:44,087-Speed 5598.96 samples/sec   Loss 6.0095   LearningRate 0.0404   Epoch: 7   Global Step: 41450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:45,928-Speed 5563.82 samples/sec   Loss 6.1505   LearningRate 0.0404   Epoch: 7   Global Step: 41460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:47,777-Speed 5541.88 samples/sec   Loss 5.9212   LearningRate 0.0404   Epoch: 7   Global Step: 41470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:49,602-Speed 5611.71 samples/sec   Loss 6.0548   LearningRate 0.0404   Epoch: 7   Global Step: 41480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:51,445-Speed 5558.39 samples/sec   Loss 6.0958   LearningRate 0.0403   Epoch: 7   Global Step: 41490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:53,251-Speed 5672.26 samples/sec   Loss 5.9979   LearningRate 0.0403   Epoch: 7   Global Step: 41500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:55,064-Speed 5647.59 samples/sec   Loss 6.0301   LearningRate 0.0403   Epoch: 7   Global Step: 41510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:56,878-Speed 5648.72 samples/sec   Loss 6.0143   LearningRate 0.0403   Epoch: 7   Global Step: 41520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:03:58,690-Speed 5652.06 samples/sec   Loss 6.0581   LearningRate 0.0403   Epoch: 7   Global Step: 41530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:04:00,533-Speed 5557.00 samples/sec   Loss 6.1056   LearningRate 0.0403   Epoch: 7   Global Step: 41540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:04:02,363-Speed 5599.21 samples/sec   Loss 6.0896   LearningRate 0.0403   Epoch: 7   Global Step: 41550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:04:04,201-Speed 5573.71 samples/sec   Loss 6.0349   LearningRate 0.0403   Epoch: 7   Global Step: 41560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:04:06,046-Speed 5550.53 samples/sec   Loss 5.9799   LearningRate 0.0403   Epoch: 7   Global Step: 41570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:04:07,866-Speed 5629.83 samples/sec   Loss 6.0423   LearningRate 0.0402   Epoch: 7   Global Step: 41580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:04:09,690-Speed 5616.55 samples/sec   Loss 6.1574   LearningRate 0.0402   Epoch: 7   Global Step: 41590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:04:11,512-Speed 5620.39 samples/sec   Loss 6.2088   LearningRate 0.0402   Epoch: 7   Global Step: 41600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:04:13,383-Speed 5475.60 samples/sec   Loss 6.0565   LearningRate 0.0402   Epoch: 7   Global Step: 41610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:04:15,203-Speed 5628.85 samples/sec   Loss 6.0566   LearningRate 0.0402   Epoch: 7   Global Step: 41620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:04:17,018-Speed 5643.20 samples/sec   Loss 5.9441   LearningRate 0.0402   Epoch: 7   Global Step: 41630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:04:18,849-Speed 5593.37 samples/sec   Loss 5.8315   LearningRate 0.0402   Epoch: 7   Global Step: 41640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:04:20,697-Speed 5543.12 samples/sec   Loss 5.9978   LearningRate 0.0402   Epoch: 7   Global Step: 41650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:04:22,531-Speed 5585.73 samples/sec   Loss 6.0218   LearningRate 0.0402   Epoch: 7   Global Step: 41660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 04:04:24,349-Speed 5633.38 samples/sec   Loss 6.0673   LearningRate 0.0401   Epoch: 7   Global Step: 41670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:04:26,154-Speed 5673.94 samples/sec   Loss 6.0035   LearningRate 0.0401   Epoch: 7   Global Step: 41680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:27,966-Speed 5655.37 samples/sec   Loss 6.0645   LearningRate 0.0401   Epoch: 7   Global Step: 41690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:29,791-Speed 5612.61 samples/sec   Loss 6.1100   LearningRate 0.0401   Epoch: 7   Global Step: 41700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:31,617-Speed 5608.17 samples/sec   Loss 5.9683   LearningRate 0.0401   Epoch: 7   Global Step: 41710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:33,437-Speed 5630.68 samples/sec   Loss 5.9942   LearningRate 0.0401   Epoch: 7   Global Step: 41720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:35,268-Speed 5592.87 samples/sec   Loss 6.1931   LearningRate 0.0401   Epoch: 7   Global Step: 41730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:37,111-Speed 5558.44 samples/sec   Loss 6.0725   LearningRate 0.0401   Epoch: 7   Global Step: 41740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:38,946-Speed 5581.21 samples/sec   Loss 6.0976   LearningRate 0.0401   Epoch: 7   Global Step: 41750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:40,796-Speed 5537.15 samples/sec   Loss 5.9628   LearningRate 0.0400   Epoch: 7   Global Step: 41760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:42,623-Speed 5608.04 samples/sec   Loss 5.9955   LearningRate 0.0400   Epoch: 7   Global Step: 41770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:44,466-Speed 5555.73 samples/sec   Loss 5.9431   LearningRate 0.0400   Epoch: 7   Global Step: 41780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:04:46,344-Speed 5455.05 samples/sec   Loss 6.1706   LearningRate 0.0400   Epoch: 7   Global Step: 41790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:04:48,166-Speed 5622.02 samples/sec   Loss 6.0119   LearningRate 0.0400   Epoch: 7   Global Step: 41800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:04:49,994-Speed 5606.00 samples/sec   Loss 6.1806   LearningRate 0.0400   Epoch: 7   Global Step: 41810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:51,845-Speed 5533.93 samples/sec   Loss 6.1614   LearningRate 0.0400   Epoch: 7   Global Step: 41820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:53,716-Speed 5473.55 samples/sec   Loss 6.0179   LearningRate 0.0400   Epoch: 7   Global Step: 41830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:55,536-Speed 5626.70 samples/sec   Loss 6.1042   LearningRate 0.0400   Epoch: 7   Global Step: 41840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:57,365-Speed 5601.83 samples/sec   Loss 6.1650   LearningRate 0.0399   Epoch: 7   Global Step: 41850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:04:59,188-Speed 5618.06 samples/sec   Loss 6.0372   LearningRate 0.0399   Epoch: 7   Global Step: 41860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:05:01,020-Speed 5593.05 samples/sec   Loss 6.0150   LearningRate 0.0399   Epoch: 7   Global Step: 41870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:05:02,851-Speed 5593.12 samples/sec   Loss 6.1965   LearningRate 0.0399   Epoch: 7   Global Step: 41880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:05:04,677-Speed 5610.53 samples/sec   Loss 6.1249   LearningRate 0.0399   Epoch: 7   Global Step: 41890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:05:06,517-Speed 5565.77 samples/sec   Loss 6.0487   LearningRate 0.0399   Epoch: 7   Global Step: 41900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:05:08,328-Speed 5656.36 samples/sec   Loss 6.0315   LearningRate 0.0399   Epoch: 7   Global Step: 41910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:05:10,165-Speed 5576.41 samples/sec   Loss 6.1938   LearningRate 0.0399   Epoch: 7   Global Step: 41920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:05:11,987-Speed 5624.50 samples/sec   Loss 6.0596   LearningRate 0.0399   Epoch: 7   Global Step: 41930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:05:13,865-Speed 5453.22 samples/sec   Loss 6.1408   LearningRate 0.0398   Epoch: 7   Global Step: 41940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:05:15,719-Speed 5524.96 samples/sec   Loss 5.9951   LearningRate 0.0398   Epoch: 7   Global Step: 41950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:05:17,539-Speed 5628.98 samples/sec   Loss 6.0947   LearningRate 0.0398   Epoch: 7   Global Step: 41960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:05:19,380-Speed 5564.15 samples/sec   Loss 6.0018   LearningRate 0.0398   Epoch: 7   Global Step: 41970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:05:21,280-Speed 5392.74 samples/sec   Loss 5.9395   LearningRate 0.0398   Epoch: 7   Global Step: 41980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:05:23,122-Speed 5558.32 samples/sec   Loss 6.1379   LearningRate 0.0398   Epoch: 7   Global Step: 41990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:05:24,951-Speed 5601.83 samples/sec   Loss 6.1014   LearningRate 0.0398   Epoch: 7   Global Step: 42000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:05:51,123-[lfw][42000]XNorm: 20.757936
Training: 2022-04-27 04:05:51,123-[lfw][42000]Accuracy-Flip: 0.99650+-0.00361
Training: 2022-04-27 04:05:51,124-[lfw][42000]Accuracy-Highest: 0.99750
Training: 2022-04-27 04:06:21,569-[cfp_fp][42000]XNorm: 17.877480
Training: 2022-04-27 04:06:21,570-[cfp_fp][42000]Accuracy-Flip: 0.94857+-0.00969
Training: 2022-04-27 04:06:21,570-[cfp_fp][42000]Accuracy-Highest: 0.94857
Training: 2022-04-27 04:06:47,839-[agedb_30][42000]XNorm: 20.327300
Training: 2022-04-27 04:06:47,840-[agedb_30][42000]Accuracy-Flip: 0.97250+-0.00817
Training: 2022-04-27 04:06:47,840-[agedb_30][42000]Accuracy-Highest: 0.97300
Training: 2022-04-27 04:06:49,689-Speed 120.84 samples/sec   Loss 6.0675   LearningRate 0.0398   Epoch: 7   Global Step: 42010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 04:06:51,528-Speed 5567.47 samples/sec   Loss 5.9514   LearningRate 0.0398   Epoch: 7   Global Step: 42020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 04:06:53,369-Speed 5566.32 samples/sec   Loss 6.0587   LearningRate 0.0397   Epoch: 7   Global Step: 42030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 04:06:55,175-Speed 5670.44 samples/sec   Loss 5.9420   LearningRate 0.0397   Epoch: 7   Global Step: 42040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:06:56,998-Speed 5618.24 samples/sec   Loss 6.0608   LearningRate 0.0397   Epoch: 7   Global Step: 42050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:06:58,819-Speed 5625.62 samples/sec   Loss 6.1608   LearningRate 0.0397   Epoch: 7   Global Step: 42060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:07:00,636-Speed 5637.05 samples/sec   Loss 6.0628   LearningRate 0.0397   Epoch: 7   Global Step: 42070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:07:02,451-Speed 5644.49 samples/sec   Loss 6.1166   LearningRate 0.0397   Epoch: 7   Global Step: 42080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:07:04,272-Speed 5625.01 samples/sec   Loss 5.9460   LearningRate 0.0397   Epoch: 7   Global Step: 42090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:07:06,096-Speed 5616.52 samples/sec   Loss 6.0145   LearningRate 0.0397   Epoch: 7   Global Step: 42100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:07:07,920-Speed 5614.86 samples/sec   Loss 5.8991   LearningRate 0.0397   Epoch: 7   Global Step: 42110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:07:09,728-Speed 5666.09 samples/sec   Loss 6.0533   LearningRate 0.0396   Epoch: 7   Global Step: 42120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:07:11,563-Speed 5580.09 samples/sec   Loss 6.1318   LearningRate 0.0396   Epoch: 7   Global Step: 42130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 04:07:13,372-Speed 5664.18 samples/sec   Loss 6.0870   LearningRate 0.0396   Epoch: 7   Global Step: 42140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:07:15,180-Speed 5665.33 samples/sec   Loss 6.1027   LearningRate 0.0396   Epoch: 7   Global Step: 42150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:07:17,000-Speed 5627.80 samples/sec   Loss 5.9908   LearningRate 0.0396   Epoch: 7   Global Step: 42160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:07:18,810-Speed 5659.70 samples/sec   Loss 6.1487   LearningRate 0.0396   Epoch: 7   Global Step: 42170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:07:20,636-Speed 5612.25 samples/sec   Loss 6.0081   LearningRate 0.0396   Epoch: 7   Global Step: 42180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:07:22,459-Speed 5618.11 samples/sec   Loss 5.9818   LearningRate 0.0396   Epoch: 7   Global Step: 42190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:07:24,277-Speed 5634.17 samples/sec   Loss 6.0464   LearningRate 0.0396   Epoch: 7   Global Step: 42200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:07:26,081-Speed 5678.38 samples/sec   Loss 5.9498   LearningRate 0.0395   Epoch: 7   Global Step: 42210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:07:27,915-Speed 5582.67 samples/sec   Loss 5.9604   LearningRate 0.0395   Epoch: 7   Global Step: 42220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:07:29,750-Speed 5584.14 samples/sec   Loss 6.0431   LearningRate 0.0395   Epoch: 7   Global Step: 42230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:07:31,585-Speed 5581.66 samples/sec   Loss 6.0639   LearningRate 0.0395   Epoch: 7   Global Step: 42240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:07:33,417-Speed 5590.17 samples/sec   Loss 6.0044   LearningRate 0.0395   Epoch: 7   Global Step: 42250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:07:35,240-Speed 5618.93 samples/sec   Loss 6.0395   LearningRate 0.0395   Epoch: 7   Global Step: 42260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:07:37,085-Speed 5552.58 samples/sec   Loss 5.9918   LearningRate 0.0395   Epoch: 7   Global Step: 42270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:07:38,920-Speed 5581.41 samples/sec   Loss 5.8398   LearningRate 0.0395   Epoch: 7   Global Step: 42280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:07:40,732-Speed 5656.48 samples/sec   Loss 6.1499   LearningRate 0.0395   Epoch: 7   Global Step: 42290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:07:42,572-Speed 5567.98 samples/sec   Loss 6.0022   LearningRate 0.0394   Epoch: 7   Global Step: 42300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:07:44,386-Speed 5645.14 samples/sec   Loss 6.0112   LearningRate 0.0394   Epoch: 7   Global Step: 42310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:07:46,214-Speed 5602.57 samples/sec   Loss 6.0159   LearningRate 0.0394   Epoch: 7   Global Step: 42320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:07:48,070-Speed 5519.85 samples/sec   Loss 6.0557   LearningRate 0.0394   Epoch: 7   Global Step: 42330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:07:49,934-Speed 5496.90 samples/sec   Loss 6.2047   LearningRate 0.0394   Epoch: 7   Global Step: 42340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:07:51,779-Speed 5550.82 samples/sec   Loss 6.2143   LearningRate 0.0394   Epoch: 7   Global Step: 42350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:07:53,616-Speed 5577.00 samples/sec   Loss 6.1040   LearningRate 0.0394   Epoch: 7   Global Step: 42360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:07:55,447-Speed 5592.69 samples/sec   Loss 5.9667   LearningRate 0.0394   Epoch: 7   Global Step: 42370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:07:57,278-Speed 5595.64 samples/sec   Loss 6.1133   LearningRate 0.0394   Epoch: 7   Global Step: 42380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:07:59,105-Speed 5605.59 samples/sec   Loss 6.1064   LearningRate 0.0393   Epoch: 7   Global Step: 42390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:00,935-Speed 5596.27 samples/sec   Loss 6.0544   LearningRate 0.0393   Epoch: 7   Global Step: 42400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:02,774-Speed 5570.64 samples/sec   Loss 6.2302   LearningRate 0.0393   Epoch: 7   Global Step: 42410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:04,588-Speed 5648.57 samples/sec   Loss 5.9652   LearningRate 0.0393   Epoch: 7   Global Step: 42420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:06,421-Speed 5587.31 samples/sec   Loss 5.9395   LearningRate 0.0393   Epoch: 7   Global Step: 42430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:08,248-Speed 5606.80 samples/sec   Loss 6.0449   LearningRate 0.0393   Epoch: 7   Global Step: 42440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:10,091-Speed 5557.23 samples/sec   Loss 5.8803   LearningRate 0.0393   Epoch: 7   Global Step: 42450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:11,915-Speed 5615.89 samples/sec   Loss 5.9537   LearningRate 0.0393   Epoch: 7   Global Step: 42460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:13,737-Speed 5623.04 samples/sec   Loss 5.9439   LearningRate 0.0393   Epoch: 7   Global Step: 42470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:15,567-Speed 5598.31 samples/sec   Loss 5.8978   LearningRate 0.0392   Epoch: 7   Global Step: 42480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:08:17,411-Speed 5555.40 samples/sec   Loss 6.1573   LearningRate 0.0392   Epoch: 7   Global Step: 42490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:08:19,216-Speed 5675.04 samples/sec   Loss 6.0924   LearningRate 0.0392   Epoch: 7   Global Step: 42500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:21,023-Speed 5667.93 samples/sec   Loss 6.0098   LearningRate 0.0392   Epoch: 7   Global Step: 42510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:22,839-Speed 5638.96 samples/sec   Loss 5.8245   LearningRate 0.0392   Epoch: 7   Global Step: 42520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:24,656-Speed 5638.83 samples/sec   Loss 5.9067   LearningRate 0.0392   Epoch: 7   Global Step: 42530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:26,480-Speed 5617.88 samples/sec   Loss 6.0154   LearningRate 0.0392   Epoch: 7   Global Step: 42540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:28,302-Speed 5621.75 samples/sec   Loss 6.0263   LearningRate 0.0392   Epoch: 7   Global Step: 42550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:30,131-Speed 5598.81 samples/sec   Loss 5.9906   LearningRate 0.0392   Epoch: 7   Global Step: 42560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:31,950-Speed 5632.27 samples/sec   Loss 5.8645   LearningRate 0.0391   Epoch: 7   Global Step: 42570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:33,777-Speed 5607.46 samples/sec   Loss 6.0228   LearningRate 0.0391   Epoch: 7   Global Step: 42580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:35,590-Speed 5647.75 samples/sec   Loss 6.0242   LearningRate 0.0391   Epoch: 7   Global Step: 42590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:37,422-Speed 5591.23 samples/sec   Loss 5.9631   LearningRate 0.0391   Epoch: 7   Global Step: 42600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:08:39,237-Speed 5644.68 samples/sec   Loss 5.9187   LearningRate 0.0391   Epoch: 7   Global Step: 42610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:41,049-Speed 5652.05 samples/sec   Loss 6.0118   LearningRate 0.0391   Epoch: 7   Global Step: 42620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:42,873-Speed 5615.74 samples/sec   Loss 6.0070   LearningRate 0.0391   Epoch: 7   Global Step: 42630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:44,697-Speed 5616.60 samples/sec   Loss 5.9963   LearningRate 0.0391   Epoch: 7   Global Step: 42640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:46,504-Speed 5669.47 samples/sec   Loss 6.0936   LearningRate 0.0391   Epoch: 7   Global Step: 42650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:48,317-Speed 5650.15 samples/sec   Loss 5.9843   LearningRate 0.0390   Epoch: 7   Global Step: 42660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:50,130-Speed 5650.82 samples/sec   Loss 5.9952   LearningRate 0.0390   Epoch: 7   Global Step: 42670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:51,953-Speed 5617.87 samples/sec   Loss 6.1856   LearningRate 0.0390   Epoch: 7   Global Step: 42680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:53,785-Speed 5591.13 samples/sec   Loss 6.0501   LearningRate 0.0390   Epoch: 7   Global Step: 42690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:55,626-Speed 5565.01 samples/sec   Loss 6.0981   LearningRate 0.0390   Epoch: 7   Global Step: 42700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:08:57,439-Speed 5649.76 samples/sec   Loss 5.9070   LearningRate 0.0390   Epoch: 7   Global Step: 42710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:08:59,331-Speed 5414.03 samples/sec   Loss 6.0570   LearningRate 0.0390   Epoch: 7   Global Step: 42720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:09:01,186-Speed 5520.80 samples/sec   Loss 5.9809   LearningRate 0.0390   Epoch: 7   Global Step: 42730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:09:03,035-Speed 5540.67 samples/sec   Loss 5.9338   LearningRate 0.0390   Epoch: 7   Global Step: 42740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:09:04,857-Speed 5623.72 samples/sec   Loss 6.0755   LearningRate 0.0389   Epoch: 7   Global Step: 42750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:09:06,698-Speed 5562.03 samples/sec   Loss 6.0340   LearningRate 0.0389   Epoch: 7   Global Step: 42760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:09:08,543-Speed 5553.76 samples/sec   Loss 5.8374   LearningRate 0.0389   Epoch: 7   Global Step: 42770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:09:10,378-Speed 5582.64 samples/sec   Loss 5.9960   LearningRate 0.0389   Epoch: 7   Global Step: 42780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:09:12,203-Speed 5611.57 samples/sec   Loss 5.9989   LearningRate 0.0389   Epoch: 7   Global Step: 42790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:09:14,019-Speed 5640.48 samples/sec   Loss 6.0499   LearningRate 0.0389   Epoch: 7   Global Step: 42800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:09:15,844-Speed 5612.55 samples/sec   Loss 6.0980   LearningRate 0.0389   Epoch: 7   Global Step: 42810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:09:17,680-Speed 5580.09 samples/sec   Loss 5.9747   LearningRate 0.0389   Epoch: 7   Global Step: 42820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:09:19,545-Speed 5493.94 samples/sec   Loss 5.9173   LearningRate 0.0389   Epoch: 7   Global Step: 42830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:09:21,375-Speed 5597.42 samples/sec   Loss 6.0163   LearningRate 0.0388   Epoch: 7   Global Step: 42840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:09:23,194-Speed 5630.64 samples/sec   Loss 5.8985   LearningRate 0.0388   Epoch: 7   Global Step: 42850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:09:25,034-Speed 5565.50 samples/sec   Loss 6.0883   LearningRate 0.0388   Epoch: 7   Global Step: 42860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:26,880-Speed 5548.92 samples/sec   Loss 6.0315   LearningRate 0.0388   Epoch: 7   Global Step: 42870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:28,718-Speed 5573.24 samples/sec   Loss 5.9384   LearningRate 0.0388   Epoch: 7   Global Step: 42880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:30,565-Speed 5546.70 samples/sec   Loss 6.1196   LearningRate 0.0388   Epoch: 7   Global Step: 42890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:32,415-Speed 5536.94 samples/sec   Loss 5.9550   LearningRate 0.0388   Epoch: 7   Global Step: 42900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:34,259-Speed 5556.06 samples/sec   Loss 6.0921   LearningRate 0.0388   Epoch: 7   Global Step: 42910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:36,066-Speed 5667.74 samples/sec   Loss 5.9477   LearningRate 0.0388   Epoch: 7   Global Step: 42920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:37,877-Speed 5657.57 samples/sec   Loss 6.0442   LearningRate 0.0387   Epoch: 7   Global Step: 42930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:39,706-Speed 5598.71 samples/sec   Loss 5.9275   LearningRate 0.0387   Epoch: 7   Global Step: 42940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:41,549-Speed 5559.55 samples/sec   Loss 5.9754   LearningRate 0.0387   Epoch: 7   Global Step: 42950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:43,361-Speed 5651.93 samples/sec   Loss 6.0245   LearningRate 0.0387   Epoch: 7   Global Step: 42960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:45,192-Speed 5594.03 samples/sec   Loss 5.9841   LearningRate 0.0387   Epoch: 7   Global Step: 42970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:47,021-Speed 5600.53 samples/sec   Loss 5.9196   LearningRate 0.0387   Epoch: 7   Global Step: 42980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:48,863-Speed 5561.43 samples/sec   Loss 5.9393   LearningRate 0.0387   Epoch: 7   Global Step: 42990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:50,696-Speed 5588.87 samples/sec   Loss 6.1383   LearningRate 0.0387   Epoch: 7   Global Step: 43000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:52,530-Speed 5586.59 samples/sec   Loss 5.9815   LearningRate 0.0387   Epoch: 7   Global Step: 43010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:54,347-Speed 5637.18 samples/sec   Loss 5.8893   LearningRate 0.0387   Epoch: 7   Global Step: 43020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:56,181-Speed 5584.84 samples/sec   Loss 6.0098   LearningRate 0.0386   Epoch: 7   Global Step: 43030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:58,005-Speed 5614.58 samples/sec   Loss 6.0007   LearningRate 0.0386   Epoch: 7   Global Step: 43040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:09:59,829-Speed 5617.16 samples/sec   Loss 6.0065   LearningRate 0.0386   Epoch: 7   Global Step: 43050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:01,660-Speed 5592.77 samples/sec   Loss 6.0559   LearningRate 0.0386   Epoch: 7   Global Step: 43060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:03,476-Speed 5643.04 samples/sec   Loss 5.9441   LearningRate 0.0386   Epoch: 7   Global Step: 43070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:05,289-Speed 5649.17 samples/sec   Loss 6.0719   LearningRate 0.0386   Epoch: 7   Global Step: 43080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:07,108-Speed 5631.62 samples/sec   Loss 5.9173   LearningRate 0.0386   Epoch: 7   Global Step: 43090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:08,940-Speed 5589.72 samples/sec   Loss 5.9261   LearningRate 0.0386   Epoch: 7   Global Step: 43100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:10,778-Speed 5573.82 samples/sec   Loss 5.9390   LearningRate 0.0386   Epoch: 7   Global Step: 43110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:12,596-Speed 5633.38 samples/sec   Loss 6.0387   LearningRate 0.0385   Epoch: 7   Global Step: 43120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:14,426-Speed 5600.49 samples/sec   Loss 5.8735   LearningRate 0.0385   Epoch: 7   Global Step: 43130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:16,253-Speed 5605.74 samples/sec   Loss 6.1096   LearningRate 0.0385   Epoch: 7   Global Step: 43140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:18,068-Speed 5641.95 samples/sec   Loss 5.8777   LearningRate 0.0385   Epoch: 7   Global Step: 43150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:19,879-Speed 5659.15 samples/sec   Loss 6.0399   LearningRate 0.0385   Epoch: 7   Global Step: 43160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:10:21,701-Speed 5619.72 samples/sec   Loss 5.9900   LearningRate 0.0385   Epoch: 7   Global Step: 43170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:10:23,521-Speed 5628.18 samples/sec   Loss 6.0268   LearningRate 0.0385   Epoch: 7   Global Step: 43180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:10:25,346-Speed 5612.41 samples/sec   Loss 6.0991   LearningRate 0.0385   Epoch: 7   Global Step: 43190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:10:27,174-Speed 5605.13 samples/sec   Loss 6.0068   LearningRate 0.0385   Epoch: 7   Global Step: 43200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:10:28,991-Speed 5638.14 samples/sec   Loss 5.9433   LearningRate 0.0384   Epoch: 7   Global Step: 43210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:10:30,806-Speed 5641.35 samples/sec   Loss 5.9485   LearningRate 0.0384   Epoch: 7   Global Step: 43220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:10:32,614-Speed 5665.91 samples/sec   Loss 6.2161   LearningRate 0.0384   Epoch: 7   Global Step: 43230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:10:34,426-Speed 5653.26 samples/sec   Loss 5.9463   LearningRate 0.0384   Epoch: 7   Global Step: 43240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:10:36,253-Speed 5607.37 samples/sec   Loss 5.9641   LearningRate 0.0384   Epoch: 7   Global Step: 43250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:10:38,063-Speed 5659.51 samples/sec   Loss 5.9675   LearningRate 0.0384   Epoch: 7   Global Step: 43260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:10:39,870-Speed 5668.45 samples/sec   Loss 6.0665   LearningRate 0.0384   Epoch: 7   Global Step: 43270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:41,685-Speed 5643.35 samples/sec   Loss 6.0149   LearningRate 0.0384   Epoch: 7   Global Step: 43280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:43,510-Speed 5615.67 samples/sec   Loss 6.1294   LearningRate 0.0384   Epoch: 7   Global Step: 43290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:45,336-Speed 5607.78 samples/sec   Loss 5.9115   LearningRate 0.0383   Epoch: 7   Global Step: 43300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:47,155-Speed 5631.80 samples/sec   Loss 6.0486   LearningRate 0.0383   Epoch: 7   Global Step: 43310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:49,010-Speed 5521.91 samples/sec   Loss 5.9592   LearningRate 0.0383   Epoch: 7   Global Step: 43320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:50,860-Speed 5535.93 samples/sec   Loss 5.8977   LearningRate 0.0383   Epoch: 7   Global Step: 43330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:52,681-Speed 5627.14 samples/sec   Loss 5.9895   LearningRate 0.0383   Epoch: 7   Global Step: 43340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:54,510-Speed 5600.33 samples/sec   Loss 5.9322   LearningRate 0.0383   Epoch: 7   Global Step: 43350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:56,334-Speed 5615.88 samples/sec   Loss 5.8827   LearningRate 0.0383   Epoch: 7   Global Step: 43360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:10:58,146-Speed 5653.52 samples/sec   Loss 5.9485   LearningRate 0.0383   Epoch: 7   Global Step: 43370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:10:59,979-Speed 5586.99 samples/sec   Loss 5.8801   LearningRate 0.0383   Epoch: 7   Global Step: 43380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:01,803-Speed 5617.95 samples/sec   Loss 5.9381   LearningRate 0.0382   Epoch: 7   Global Step: 43390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:03,628-Speed 5611.05 samples/sec   Loss 5.9559   LearningRate 0.0382   Epoch: 7   Global Step: 43400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:05,466-Speed 5572.64 samples/sec   Loss 6.0234   LearningRate 0.0382   Epoch: 7   Global Step: 43410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:07,287-Speed 5626.85 samples/sec   Loss 6.0068   LearningRate 0.0382   Epoch: 7   Global Step: 43420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:09,104-Speed 5638.16 samples/sec   Loss 5.9304   LearningRate 0.0382   Epoch: 7   Global Step: 43430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:10,921-Speed 5635.26 samples/sec   Loss 5.8428   LearningRate 0.0382   Epoch: 7   Global Step: 43440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:12,740-Speed 5633.65 samples/sec   Loss 5.9504   LearningRate 0.0382   Epoch: 7   Global Step: 43450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:14,568-Speed 5601.22 samples/sec   Loss 5.8579   LearningRate 0.0382   Epoch: 7   Global Step: 43460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:16,372-Speed 5679.61 samples/sec   Loss 5.8515   LearningRate 0.0382   Epoch: 7   Global Step: 43470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:18,216-Speed 5553.42 samples/sec   Loss 5.8175   LearningRate 0.0382   Epoch: 7   Global Step: 43480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:20,048-Speed 5591.44 samples/sec   Loss 5.9127   LearningRate 0.0381   Epoch: 7   Global Step: 43490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:21,864-Speed 5641.78 samples/sec   Loss 5.7967   LearningRate 0.0381   Epoch: 7   Global Step: 43500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:23,676-Speed 5653.29 samples/sec   Loss 6.0729   LearningRate 0.0381   Epoch: 7   Global Step: 43510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:25,503-Speed 5609.19 samples/sec   Loss 5.8631   LearningRate 0.0381   Epoch: 7   Global Step: 43520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:27,332-Speed 5597.58 samples/sec   Loss 5.8363   LearningRate 0.0381   Epoch: 7   Global Step: 43530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:29,161-Speed 5602.53 samples/sec   Loss 5.9618   LearningRate 0.0381   Epoch: 7   Global Step: 43540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:30,988-Speed 5606.19 samples/sec   Loss 6.0397   LearningRate 0.0381   Epoch: 7   Global Step: 43550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:32,832-Speed 5554.94 samples/sec   Loss 5.9462   LearningRate 0.0381   Epoch: 7   Global Step: 43560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:34,636-Speed 5678.45 samples/sec   Loss 6.0247   LearningRate 0.0381   Epoch: 7   Global Step: 43570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:36,456-Speed 5628.17 samples/sec   Loss 5.9904   LearningRate 0.0380   Epoch: 7   Global Step: 43580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:11:38,275-Speed 5629.75 samples/sec   Loss 5.7617   LearningRate 0.0380   Epoch: 7   Global Step: 43590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:11:40,101-Speed 5610.44 samples/sec   Loss 6.0011   LearningRate 0.0380   Epoch: 7   Global Step: 43600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:11:41,952-Speed 5533.08 samples/sec   Loss 5.9644   LearningRate 0.0380   Epoch: 7   Global Step: 43610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:11:43,768-Speed 5640.47 samples/sec   Loss 6.0100   LearningRate 0.0380   Epoch: 7   Global Step: 43620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:11:45,602-Speed 5586.55 samples/sec   Loss 5.8958   LearningRate 0.0380   Epoch: 7   Global Step: 43630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:11:47,426-Speed 5617.85 samples/sec   Loss 5.8838   LearningRate 0.0380   Epoch: 7   Global Step: 43640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:11:49,256-Speed 5595.39 samples/sec   Loss 5.9635   LearningRate 0.0380   Epoch: 7   Global Step: 43650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:11:51,073-Speed 5638.68 samples/sec   Loss 5.9072   LearningRate 0.0380   Epoch: 7   Global Step: 43660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:11:52,888-Speed 5645.35 samples/sec   Loss 5.8646   LearningRate 0.0379   Epoch: 7   Global Step: 43670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:11:54,704-Speed 5637.71 samples/sec   Loss 5.8987   LearningRate 0.0379   Epoch: 7   Global Step: 43680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:56,521-Speed 5638.17 samples/sec   Loss 5.8513   LearningRate 0.0379   Epoch: 7   Global Step: 43690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:11:58,360-Speed 5569.50 samples/sec   Loss 6.0803   LearningRate 0.0379   Epoch: 7   Global Step: 43700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:12:00,174-Speed 5648.63 samples/sec   Loss 5.9143   LearningRate 0.0379   Epoch: 7   Global Step: 43710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:01,996-Speed 5620.30 samples/sec   Loss 6.0811   LearningRate 0.0379   Epoch: 7   Global Step: 43720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:03,825-Speed 5601.87 samples/sec   Loss 5.9130   LearningRate 0.0379   Epoch: 7   Global Step: 43730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:05,652-Speed 5605.72 samples/sec   Loss 5.8580   LearningRate 0.0379   Epoch: 7   Global Step: 43740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:07,485-Speed 5588.57 samples/sec   Loss 5.8987   LearningRate 0.0379   Epoch: 7   Global Step: 43750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:09,328-Speed 5560.15 samples/sec   Loss 5.8188   LearningRate 0.0378   Epoch: 7   Global Step: 43760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:11,138-Speed 5657.94 samples/sec   Loss 5.9822   LearningRate 0.0378   Epoch: 7   Global Step: 43770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:12,951-Speed 5650.27 samples/sec   Loss 5.9283   LearningRate 0.0378   Epoch: 7   Global Step: 43780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:12:14,787-Speed 5579.80 samples/sec   Loss 5.9545   LearningRate 0.0378   Epoch: 7   Global Step: 43790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:12:16,618-Speed 5592.64 samples/sec   Loss 6.0129   LearningRate 0.0378   Epoch: 7   Global Step: 43800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:12:18,440-Speed 5624.41 samples/sec   Loss 5.9706   LearningRate 0.0378   Epoch: 7   Global Step: 43810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:12:20,296-Speed 5519.55 samples/sec   Loss 5.8763   LearningRate 0.0378   Epoch: 7   Global Step: 43820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:12:22,130-Speed 5584.73 samples/sec   Loss 5.8831   LearningRate 0.0378   Epoch: 7   Global Step: 43830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:12:23,952-Speed 5619.44 samples/sec   Loss 5.8125   LearningRate 0.0378   Epoch: 7   Global Step: 43840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:12:25,762-Speed 5662.02 samples/sec   Loss 5.8204   LearningRate 0.0377   Epoch: 7   Global Step: 43850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:12:27,575-Speed 5647.38 samples/sec   Loss 5.8972   LearningRate 0.0377   Epoch: 7   Global Step: 43860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:12:29,436-Speed 5504.92 samples/sec   Loss 5.9547   LearningRate 0.0377   Epoch: 7   Global Step: 43870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:12:31,362-Speed 5319.42 samples/sec   Loss 5.9954   LearningRate 0.0377   Epoch: 7   Global Step: 43880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:33,206-Speed 5553.90 samples/sec   Loss 5.9561   LearningRate 0.0377   Epoch: 7   Global Step: 43890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:35,040-Speed 5585.24 samples/sec   Loss 5.8555   LearningRate 0.0377   Epoch: 7   Global Step: 43900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:36,854-Speed 5647.92 samples/sec   Loss 5.8772   LearningRate 0.0377   Epoch: 7   Global Step: 43910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:38,670-Speed 5641.68 samples/sec   Loss 5.9686   LearningRate 0.0377   Epoch: 7   Global Step: 43920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:40,478-Speed 5663.46 samples/sec   Loss 5.8263   LearningRate 0.0377   Epoch: 7   Global Step: 43930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:42,322-Speed 5555.86 samples/sec   Loss 6.0588   LearningRate 0.0377   Epoch: 7   Global Step: 43940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:44,182-Speed 5508.12 samples/sec   Loss 5.9113   LearningRate 0.0376   Epoch: 7   Global Step: 43950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:46,006-Speed 5614.35 samples/sec   Loss 5.8960   LearningRate 0.0376   Epoch: 7   Global Step: 43960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:47,835-Speed 5600.87 samples/sec   Loss 6.0128   LearningRate 0.0376   Epoch: 7   Global Step: 43970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:12:49,671-Speed 5579.17 samples/sec   Loss 5.8608   LearningRate 0.0376   Epoch: 7   Global Step: 43980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:12:51,490-Speed 5632.56 samples/sec   Loss 5.9399   LearningRate 0.0376   Epoch: 7   Global Step: 43990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:12:53,337-Speed 5546.99 samples/sec   Loss 5.9473   LearningRate 0.0376   Epoch: 7   Global Step: 44000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:13:27,770-[lfw][44000]XNorm: 20.828653
Training: 2022-04-27 04:13:27,771-[lfw][44000]Accuracy-Flip: 0.99667+-0.00269
Training: 2022-04-27 04:13:27,771-[lfw][44000]Accuracy-Highest: 0.99750
Training: 2022-04-27 04:14:07,567-[cfp_fp][44000]XNorm: 17.993965
Training: 2022-04-27 04:14:07,568-[cfp_fp][44000]Accuracy-Flip: 0.94400+-0.01179
Training: 2022-04-27 04:14:07,568-[cfp_fp][44000]Accuracy-Highest: 0.94857
Training: 2022-04-27 04:14:41,925-[agedb_30][44000]XNorm: 20.486939
Training: 2022-04-27 04:14:41,926-[agedb_30][44000]Accuracy-Flip: 0.97383+-0.00879
Training: 2022-04-27 04:14:41,926-[agedb_30][44000]Accuracy-Highest: 0.97383
Training: 2022-04-27 04:14:43,819-Speed 92.68 samples/sec   Loss 6.0498   LearningRate 0.0376   Epoch: 7   Global Step: 44010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:14:45,657-Speed 5573.30 samples/sec   Loss 5.8511   LearningRate 0.0376   Epoch: 7   Global Step: 44020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:14:47,482-Speed 5613.26 samples/sec   Loss 5.9472   LearningRate 0.0376   Epoch: 7   Global Step: 44030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:14:49,348-Speed 5489.78 samples/sec   Loss 5.8875   LearningRate 0.0375   Epoch: 7   Global Step: 44040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:14:51,188-Speed 5568.14 samples/sec   Loss 6.0417   LearningRate 0.0375   Epoch: 7   Global Step: 44050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:14:53,015-Speed 5605.44 samples/sec   Loss 5.9716   LearningRate 0.0375   Epoch: 7   Global Step: 44060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:14:54,865-Speed 5538.65 samples/sec   Loss 5.8722   LearningRate 0.0375   Epoch: 7   Global Step: 44070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:14:56,689-Speed 5615.33 samples/sec   Loss 5.7702   LearningRate 0.0375   Epoch: 7   Global Step: 44080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:14:58,505-Speed 5640.76 samples/sec   Loss 5.9450   LearningRate 0.0375   Epoch: 7   Global Step: 44090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:00,318-Speed 5648.43 samples/sec   Loss 5.8539   LearningRate 0.0375   Epoch: 7   Global Step: 44100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:02,139-Speed 5625.02 samples/sec   Loss 5.9175   LearningRate 0.0375   Epoch: 7   Global Step: 44110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:03,977-Speed 5574.47 samples/sec   Loss 5.8906   LearningRate 0.0375   Epoch: 7   Global Step: 44120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:05,815-Speed 5573.70 samples/sec   Loss 6.0543   LearningRate 0.0374   Epoch: 7   Global Step: 44130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:07,631-Speed 5641.13 samples/sec   Loss 5.9434   LearningRate 0.0374   Epoch: 7   Global Step: 44140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:09,457-Speed 5609.02 samples/sec   Loss 5.9453   LearningRate 0.0374   Epoch: 7   Global Step: 44150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:11,271-Speed 5647.16 samples/sec   Loss 5.8686   LearningRate 0.0374   Epoch: 7   Global Step: 44160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:13,089-Speed 5633.46 samples/sec   Loss 5.9440   LearningRate 0.0374   Epoch: 7   Global Step: 44170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:14,923-Speed 5586.07 samples/sec   Loss 6.0327   LearningRate 0.0374   Epoch: 7   Global Step: 44180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:15:16,749-Speed 5608.30 samples/sec   Loss 5.8424   LearningRate 0.0374   Epoch: 7   Global Step: 44190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:15:18,567-Speed 5635.63 samples/sec   Loss 5.8469   LearningRate 0.0374   Epoch: 7   Global Step: 44200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:15:20,375-Speed 5665.18 samples/sec   Loss 5.8324   LearningRate 0.0374   Epoch: 7   Global Step: 44210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:15:22,209-Speed 5586.57 samples/sec   Loss 5.8095   LearningRate 0.0374   Epoch: 7   Global Step: 44220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:15:24,029-Speed 5628.66 samples/sec   Loss 5.9116   LearningRate 0.0373   Epoch: 7   Global Step: 44230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:15:25,857-Speed 5603.45 samples/sec   Loss 5.8134   LearningRate 0.0373   Epoch: 7   Global Step: 44240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:15:27,667-Speed 5657.85 samples/sec   Loss 5.8459   LearningRate 0.0373   Epoch: 7   Global Step: 44250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:29,496-Speed 5601.97 samples/sec   Loss 5.7699   LearningRate 0.0373   Epoch: 7   Global Step: 44260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:31,306-Speed 5657.64 samples/sec   Loss 5.8484   LearningRate 0.0373   Epoch: 7   Global Step: 44270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:33,140-Speed 5584.29 samples/sec   Loss 5.9906   LearningRate 0.0373   Epoch: 7   Global Step: 44280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:34,968-Speed 5604.44 samples/sec   Loss 5.9824   LearningRate 0.0373   Epoch: 7   Global Step: 44290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:36,806-Speed 5575.03 samples/sec   Loss 5.8146   LearningRate 0.0373   Epoch: 7   Global Step: 44300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:38,624-Speed 5633.65 samples/sec   Loss 6.0616   LearningRate 0.0373   Epoch: 7   Global Step: 44310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:40,456-Speed 5591.88 samples/sec   Loss 5.9623   LearningRate 0.0372   Epoch: 7   Global Step: 44320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:42,269-Speed 5647.91 samples/sec   Loss 5.8252   LearningRate 0.0372   Epoch: 7   Global Step: 44330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:44,081-Speed 5654.37 samples/sec   Loss 5.8873   LearningRate 0.0372   Epoch: 7   Global Step: 44340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:15:45,909-Speed 5602.58 samples/sec   Loss 6.0367   LearningRate 0.0372   Epoch: 7   Global Step: 44350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:15:47,723-Speed 5648.52 samples/sec   Loss 5.8790   LearningRate 0.0372   Epoch: 7   Global Step: 44360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:15:49,548-Speed 5613.71 samples/sec   Loss 5.7878   LearningRate 0.0372   Epoch: 7   Global Step: 44370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:15:51,375-Speed 5604.86 samples/sec   Loss 5.9572   LearningRate 0.0372   Epoch: 7   Global Step: 44380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:15:53,203-Speed 5604.66 samples/sec   Loss 5.8586   LearningRate 0.0372   Epoch: 7   Global Step: 44390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:15:55,058-Speed 5521.58 samples/sec   Loss 5.8204   LearningRate 0.0372   Epoch: 7   Global Step: 44400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:15:56,881-Speed 5619.56 samples/sec   Loss 5.9429   LearningRate 0.0371   Epoch: 7   Global Step: 44410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:15:58,719-Speed 5574.24 samples/sec   Loss 5.8696   LearningRate 0.0371   Epoch: 7   Global Step: 44420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:16:00,541-Speed 5622.04 samples/sec   Loss 5.8223   LearningRate 0.0371   Epoch: 7   Global Step: 44430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:02,353-Speed 5652.31 samples/sec   Loss 5.8561   LearningRate 0.0371   Epoch: 7   Global Step: 44440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:04,166-Speed 5648.57 samples/sec   Loss 5.9555   LearningRate 0.0371   Epoch: 7   Global Step: 44450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:05,978-Speed 5655.04 samples/sec   Loss 5.8462   LearningRate 0.0371   Epoch: 7   Global Step: 44460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:07,792-Speed 5645.97 samples/sec   Loss 5.8160   LearningRate 0.0371   Epoch: 7   Global Step: 44470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:09,620-Speed 5604.90 samples/sec   Loss 5.8702   LearningRate 0.0371   Epoch: 7   Global Step: 44480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:11,432-Speed 5651.55 samples/sec   Loss 5.8843   LearningRate 0.0371   Epoch: 7   Global Step: 44490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:13,246-Speed 5646.04 samples/sec   Loss 5.8167   LearningRate 0.0371   Epoch: 7   Global Step: 44500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:15,086-Speed 5567.29 samples/sec   Loss 5.9198   LearningRate 0.0370   Epoch: 7   Global Step: 44510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:16,921-Speed 5583.82 samples/sec   Loss 5.9462   LearningRate 0.0370   Epoch: 7   Global Step: 44520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:18,752-Speed 5592.94 samples/sec   Loss 6.0317   LearningRate 0.0370   Epoch: 7   Global Step: 44530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:16:20,567-Speed 5645.04 samples/sec   Loss 5.8202   LearningRate 0.0370   Epoch: 7   Global Step: 44540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:16:22,374-Speed 5668.29 samples/sec   Loss 5.9178   LearningRate 0.0370   Epoch: 7   Global Step: 44550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:24,195-Speed 5624.81 samples/sec   Loss 5.9148   LearningRate 0.0370   Epoch: 7   Global Step: 44560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:26,018-Speed 5620.41 samples/sec   Loss 5.8431   LearningRate 0.0370   Epoch: 7   Global Step: 44570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:27,855-Speed 5574.82 samples/sec   Loss 5.9646   LearningRate 0.0370   Epoch: 7   Global Step: 44580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:29,680-Speed 5612.76 samples/sec   Loss 6.0715   LearningRate 0.0370   Epoch: 7   Global Step: 44590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:31,498-Speed 5633.27 samples/sec   Loss 5.8171   LearningRate 0.0369   Epoch: 7   Global Step: 44600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:33,321-Speed 5618.67 samples/sec   Loss 5.8335   LearningRate 0.0369   Epoch: 7   Global Step: 44610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:35,137-Speed 5640.84 samples/sec   Loss 5.8036   LearningRate 0.0369   Epoch: 7   Global Step: 44620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:16:36,986-Speed 5541.28 samples/sec   Loss 5.9038   LearningRate 0.0369   Epoch: 7   Global Step: 44630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:16:38,808-Speed 5624.08 samples/sec   Loss 5.8257   LearningRate 0.0369   Epoch: 7   Global Step: 44640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:16:40,633-Speed 5611.47 samples/sec   Loss 5.7497   LearningRate 0.0369   Epoch: 7   Global Step: 44650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:16:42,463-Speed 5598.09 samples/sec   Loss 5.8447   LearningRate 0.0369   Epoch: 7   Global Step: 44660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:16:44,283-Speed 5626.86 samples/sec   Loss 6.0079   LearningRate 0.0369   Epoch: 7   Global Step: 44670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:16:46,101-Speed 5634.60 samples/sec   Loss 5.7764   LearningRate 0.0369   Epoch: 7   Global Step: 44680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:16:47,920-Speed 5632.75 samples/sec   Loss 5.9613   LearningRate 0.0368   Epoch: 7   Global Step: 44690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:16:49,755-Speed 5581.26 samples/sec   Loss 5.8697   LearningRate 0.0368   Epoch: 7   Global Step: 44700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:16:51,575-Speed 5630.94 samples/sec   Loss 5.9244   LearningRate 0.0368   Epoch: 7   Global Step: 44710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:16:53,389-Speed 5643.70 samples/sec   Loss 5.8395   LearningRate 0.0368   Epoch: 7   Global Step: 44720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:55,222-Speed 5590.20 samples/sec   Loss 5.8395   LearningRate 0.0368   Epoch: 7   Global Step: 44730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:57,050-Speed 5603.44 samples/sec   Loss 5.8271   LearningRate 0.0368   Epoch: 7   Global Step: 44740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:16:58,875-Speed 5612.72 samples/sec   Loss 5.9083   LearningRate 0.0368   Epoch: 7   Global Step: 44750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:00,713-Speed 5573.62 samples/sec   Loss 5.8525   LearningRate 0.0368   Epoch: 7   Global Step: 44760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:02,536-Speed 5617.53 samples/sec   Loss 5.7933   LearningRate 0.0368   Epoch: 7   Global Step: 44770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:04,380-Speed 5555.47 samples/sec   Loss 5.9556   LearningRate 0.0368   Epoch: 7   Global Step: 44780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:06,219-Speed 5569.69 samples/sec   Loss 5.9300   LearningRate 0.0367   Epoch: 7   Global Step: 44790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:08,033-Speed 5648.44 samples/sec   Loss 5.7883   LearningRate 0.0367   Epoch: 7   Global Step: 44800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:09,847-Speed 5646.38 samples/sec   Loss 5.7561   LearningRate 0.0367   Epoch: 7   Global Step: 44810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:11,667-Speed 5627.95 samples/sec   Loss 5.8909   LearningRate 0.0367   Epoch: 7   Global Step: 44820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:17:13,483-Speed 5640.04 samples/sec   Loss 5.8363   LearningRate 0.0367   Epoch: 7   Global Step: 44830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:17:15,304-Speed 5625.45 samples/sec   Loss 5.8293   LearningRate 0.0367   Epoch: 7   Global Step: 44840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:17:17,142-Speed 5572.92 samples/sec   Loss 5.8387   LearningRate 0.0367   Epoch: 7   Global Step: 44850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:17:18,963-Speed 5626.94 samples/sec   Loss 5.8674   LearningRate 0.0367   Epoch: 7   Global Step: 44860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:17:20,794-Speed 5591.44 samples/sec   Loss 5.8703   LearningRate 0.0367   Epoch: 7   Global Step: 44870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:17:22,642-Speed 5543.55 samples/sec   Loss 5.7820   LearningRate 0.0366   Epoch: 7   Global Step: 44880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:17:24,489-Speed 5546.65 samples/sec   Loss 5.8538   LearningRate 0.0366   Epoch: 7   Global Step: 44890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:26,310-Speed 5627.06 samples/sec   Loss 5.8444   LearningRate 0.0366   Epoch: 7   Global Step: 44900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:28,131-Speed 5624.45 samples/sec   Loss 5.8681   LearningRate 0.0366   Epoch: 7   Global Step: 44910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:29,948-Speed 5636.92 samples/sec   Loss 5.8161   LearningRate 0.0366   Epoch: 7   Global Step: 44920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:31,775-Speed 5607.35 samples/sec   Loss 5.7870   LearningRate 0.0366   Epoch: 7   Global Step: 44930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:33,596-Speed 5623.48 samples/sec   Loss 5.8849   LearningRate 0.0366   Epoch: 7   Global Step: 44940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:35,414-Speed 5635.74 samples/sec   Loss 5.7771   LearningRate 0.0366   Epoch: 7   Global Step: 44950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:37,220-Speed 5670.92 samples/sec   Loss 5.8594   LearningRate 0.0366   Epoch: 7   Global Step: 44960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:39,036-Speed 5640.87 samples/sec   Loss 5.7076   LearningRate 0.0365   Epoch: 7   Global Step: 44970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:40,862-Speed 5609.96 samples/sec   Loss 5.8332   LearningRate 0.0365   Epoch: 7   Global Step: 44980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:42,680-Speed 5634.05 samples/sec   Loss 5.9111   LearningRate 0.0365   Epoch: 7   Global Step: 44990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:17:44,496-Speed 5642.18 samples/sec   Loss 5.8999   LearningRate 0.0365   Epoch: 7   Global Step: 45000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:17:46,343-Speed 5543.92 samples/sec   Loss 5.7884   LearningRate 0.0365   Epoch: 7   Global Step: 45010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:17:48,171-Speed 5606.61 samples/sec   Loss 5.7140   LearningRate 0.0365   Epoch: 7   Global Step: 45020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:17:49,988-Speed 5637.48 samples/sec   Loss 5.7658   LearningRate 0.0365   Epoch: 7   Global Step: 45030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:51,826-Speed 5572.22 samples/sec   Loss 5.7549   LearningRate 0.0365   Epoch: 7   Global Step: 45040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:53,639-Speed 5649.46 samples/sec   Loss 5.8755   LearningRate 0.0365   Epoch: 7   Global Step: 45050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:55,468-Speed 5599.46 samples/sec   Loss 5.8407   LearningRate 0.0365   Epoch: 7   Global Step: 45060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:57,313-Speed 5551.92 samples/sec   Loss 5.8255   LearningRate 0.0364   Epoch: 7   Global Step: 45070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:17:59,156-Speed 5557.62 samples/sec   Loss 6.0091   LearningRate 0.0364   Epoch: 7   Global Step: 45080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:18:00,996-Speed 5567.95 samples/sec   Loss 5.7971   LearningRate 0.0364   Epoch: 7   Global Step: 45090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:18:02,825-Speed 5601.76 samples/sec   Loss 5.7229   LearningRate 0.0364   Epoch: 7   Global Step: 45100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:18:04,625-Speed 5689.19 samples/sec   Loss 5.7572   LearningRate 0.0364   Epoch: 7   Global Step: 45110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:18:06,437-Speed 5653.51 samples/sec   Loss 5.9293   LearningRate 0.0364   Epoch: 7   Global Step: 45120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:18:08,262-Speed 5614.57 samples/sec   Loss 5.9142   LearningRate 0.0364   Epoch: 7   Global Step: 45130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:18:10,086-Speed 5615.56 samples/sec   Loss 5.7685   LearningRate 0.0364   Epoch: 7   Global Step: 45140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:18:11,935-Speed 5539.74 samples/sec   Loss 5.7392   LearningRate 0.0364   Epoch: 7   Global Step: 45150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:18:13,746-Speed 5655.45 samples/sec   Loss 5.8514   LearningRate 0.0363   Epoch: 7   Global Step: 45160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:18:15,572-Speed 5610.52 samples/sec   Loss 5.7802   LearningRate 0.0363   Epoch: 7   Global Step: 45170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:18:17,408-Speed 5583.41 samples/sec   Loss 5.7723   LearningRate 0.0363   Epoch: 7   Global Step: 45180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:18:19,232-Speed 5614.64 samples/sec   Loss 5.8708   LearningRate 0.0363   Epoch: 7   Global Step: 45190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:18:21,056-Speed 5616.38 samples/sec   Loss 5.9161   LearningRate 0.0363   Epoch: 7   Global Step: 45200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:18:22,889-Speed 5588.85 samples/sec   Loss 5.8212   LearningRate 0.0363   Epoch: 7   Global Step: 45210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:18:24,708-Speed 5631.22 samples/sec   Loss 5.9087   LearningRate 0.0363   Epoch: 7   Global Step: 45220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:18:26,526-Speed 5634.60 samples/sec   Loss 5.8696   LearningRate 0.0363   Epoch: 7   Global Step: 45230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:18:28,356-Speed 5597.64 samples/sec   Loss 5.7791   LearningRate 0.0363   Epoch: 7   Global Step: 45240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:18:30,168-Speed 5653.31 samples/sec   Loss 5.8795   LearningRate 0.0363   Epoch: 7   Global Step: 45250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:18:32,008-Speed 5566.60 samples/sec   Loss 5.6959   LearningRate 0.0362   Epoch: 7   Global Step: 45260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:18:33,869-Speed 5503.08 samples/sec   Loss 5.7634   LearningRate 0.0362   Epoch: 7   Global Step: 45270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:18:35,695-Speed 5611.51 samples/sec   Loss 5.9410   LearningRate 0.0362   Epoch: 7   Global Step: 45280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:18:37,516-Speed 5624.39 samples/sec   Loss 5.6471   LearningRate 0.0362   Epoch: 7   Global Step: 45290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:18:39,344-Speed 5602.65 samples/sec   Loss 5.8964   LearningRate 0.0362   Epoch: 7   Global Step: 45300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:18:41,187-Speed 5558.08 samples/sec   Loss 5.6845   LearningRate 0.0362   Epoch: 7   Global Step: 45310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:18:43,028-Speed 5563.71 samples/sec   Loss 5.8889   LearningRate 0.0362   Epoch: 7   Global Step: 45320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:18:44,853-Speed 5613.87 samples/sec   Loss 5.7844   LearningRate 0.0362   Epoch: 7   Global Step: 45330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:18:46,701-Speed 5541.33 samples/sec   Loss 6.0426   LearningRate 0.0362   Epoch: 7   Global Step: 45340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:18:48,566-Speed 5495.67 samples/sec   Loss 5.7877   LearningRate 0.0361   Epoch: 7   Global Step: 45350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:18:50,405-Speed 5567.90 samples/sec   Loss 5.7580   LearningRate 0.0361   Epoch: 7   Global Step: 45360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:18:52,225-Speed 5630.67 samples/sec   Loss 5.8975   LearningRate 0.0361   Epoch: 7   Global Step: 45370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:18:54,052-Speed 5605.04 samples/sec   Loss 5.8102   LearningRate 0.0361   Epoch: 7   Global Step: 45380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:18:55,886-Speed 5586.19 samples/sec   Loss 5.6348   LearningRate 0.0361   Epoch: 7   Global Step: 45390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:18:57,726-Speed 5566.00 samples/sec   Loss 5.7556   LearningRate 0.0361   Epoch: 7   Global Step: 45400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:18:59,553-Speed 5608.81 samples/sec   Loss 5.8169   LearningRate 0.0361   Epoch: 7   Global Step: 45410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:19:01,416-Speed 5497.79 samples/sec   Loss 5.7335   LearningRate 0.0361   Epoch: 7   Global Step: 45420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:19:03,244-Speed 5603.00 samples/sec   Loss 5.6286   LearningRate 0.0361   Epoch: 7   Global Step: 45430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:19:05,064-Speed 5627.11 samples/sec   Loss 5.7058   LearningRate 0.0361   Epoch: 7   Global Step: 45440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:19:06,887-Speed 5619.71 samples/sec   Loss 5.8483   LearningRate 0.0360   Epoch: 7   Global Step: 45450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:19:08,746-Speed 5510.64 samples/sec   Loss 5.8235   LearningRate 0.0360   Epoch: 7   Global Step: 45460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:19:10,575-Speed 5599.52 samples/sec   Loss 5.8653   LearningRate 0.0360   Epoch: 7   Global Step: 45470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:19:12,484-Speed 5367.75 samples/sec   Loss 5.8872   LearningRate 0.0360   Epoch: 7   Global Step: 45480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:19:24,996-Speed 818.46 samples/sec   Loss 5.7818   LearningRate 0.0360   Epoch: 8   Global Step: 45490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:19:26,946-Speed 5255.16 samples/sec   Loss 5.1901   LearningRate 0.0360   Epoch: 8   Global Step: 45500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:19:28,803-Speed 5516.27 samples/sec   Loss 5.1171   LearningRate 0.0360   Epoch: 8   Global Step: 45510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:19:30,658-Speed 5522.15 samples/sec   Loss 5.0973   LearningRate 0.0360   Epoch: 8   Global Step: 45520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:19:32,489-Speed 5596.58 samples/sec   Loss 5.2529   LearningRate 0.0360   Epoch: 8   Global Step: 45530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:19:34,366-Speed 5459.06 samples/sec   Loss 5.1046   LearningRate 0.0359   Epoch: 8   Global Step: 45540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:19:36,290-Speed 5324.49 samples/sec   Loss 5.1819   LearningRate 0.0359   Epoch: 8   Global Step: 45550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:19:38,132-Speed 5562.61 samples/sec   Loss 5.3442   LearningRate 0.0359   Epoch: 8   Global Step: 45560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:19:39,961-Speed 5600.83 samples/sec   Loss 5.2023   LearningRate 0.0359   Epoch: 8   Global Step: 45570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:19:41,815-Speed 5523.98 samples/sec   Loss 5.2866   LearningRate 0.0359   Epoch: 8   Global Step: 45580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:19:43,658-Speed 5558.28 samples/sec   Loss 5.2538   LearningRate 0.0359   Epoch: 8   Global Step: 45590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:19:45,479-Speed 5624.96 samples/sec   Loss 5.2855   LearningRate 0.0359   Epoch: 8   Global Step: 45600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:19:47,321-Speed 5563.65 samples/sec   Loss 5.2895   LearningRate 0.0359   Epoch: 8   Global Step: 45610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:19:49,146-Speed 5612.90 samples/sec   Loss 5.2222   LearningRate 0.0359   Epoch: 8   Global Step: 45620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:19:50,968-Speed 5623.46 samples/sec   Loss 5.1628   LearningRate 0.0359   Epoch: 8   Global Step: 45630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:19:52,809-Speed 5564.31 samples/sec   Loss 5.2438   LearningRate 0.0358   Epoch: 8   Global Step: 45640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:19:54,659-Speed 5537.97 samples/sec   Loss 5.3188   LearningRate 0.0358   Epoch: 8   Global Step: 45650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:19:56,506-Speed 5545.67 samples/sec   Loss 5.2753   LearningRate 0.0358   Epoch: 8   Global Step: 45660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:19:58,329-Speed 5617.98 samples/sec   Loss 5.2862   LearningRate 0.0358   Epoch: 8   Global Step: 45670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:00,176-Speed 5546.45 samples/sec   Loss 5.2722   LearningRate 0.0358   Epoch: 8   Global Step: 45680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:01,999-Speed 5618.23 samples/sec   Loss 5.3413   LearningRate 0.0358   Epoch: 8   Global Step: 45690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:03,866-Speed 5488.10 samples/sec   Loss 5.3714   LearningRate 0.0358   Epoch: 8   Global Step: 45700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:05,696-Speed 5595.85 samples/sec   Loss 5.3873   LearningRate 0.0358   Epoch: 8   Global Step: 45710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:07,534-Speed 5574.45 samples/sec   Loss 5.3796   LearningRate 0.0358   Epoch: 8   Global Step: 45720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:09,367-Speed 5588.24 samples/sec   Loss 5.2448   LearningRate 0.0357   Epoch: 8   Global Step: 45730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:11,180-Speed 5649.88 samples/sec   Loss 5.3197   LearningRate 0.0357   Epoch: 8   Global Step: 45740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:20:13,053-Speed 5471.74 samples/sec   Loss 5.2662   LearningRate 0.0357   Epoch: 8   Global Step: 45750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:20:14,873-Speed 5627.42 samples/sec   Loss 5.1973   LearningRate 0.0357   Epoch: 8   Global Step: 45760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:20:16,709-Speed 5579.78 samples/sec   Loss 5.3239   LearningRate 0.0357   Epoch: 8   Global Step: 45770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:20:18,533-Speed 5614.55 samples/sec   Loss 5.3310   LearningRate 0.0357   Epoch: 8   Global Step: 45780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:20,366-Speed 5591.62 samples/sec   Loss 5.3550   LearningRate 0.0357   Epoch: 8   Global Step: 45790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:22,201-Speed 5582.40 samples/sec   Loss 5.3689   LearningRate 0.0357   Epoch: 8   Global Step: 45800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:24,016-Speed 5643.91 samples/sec   Loss 5.3169   LearningRate 0.0357   Epoch: 8   Global Step: 45810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:20:25,828-Speed 5652.09 samples/sec   Loss 5.2163   LearningRate 0.0357   Epoch: 8   Global Step: 45820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:20:27,689-Speed 5503.24 samples/sec   Loss 5.5058   LearningRate 0.0356   Epoch: 8   Global Step: 45830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:20:29,529-Speed 5566.92 samples/sec   Loss 5.3558   LearningRate 0.0356   Epoch: 8   Global Step: 45840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:20:31,378-Speed 5542.61 samples/sec   Loss 5.5584   LearningRate 0.0356   Epoch: 8   Global Step: 45850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:20:33,197-Speed 5630.86 samples/sec   Loss 5.3856   LearningRate 0.0356   Epoch: 8   Global Step: 45860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:20:35,012-Speed 5644.32 samples/sec   Loss 5.3128   LearningRate 0.0356   Epoch: 8   Global Step: 45870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:20:36,832-Speed 5628.71 samples/sec   Loss 5.4592   LearningRate 0.0356   Epoch: 8   Global Step: 45880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:20:38,654-Speed 5623.04 samples/sec   Loss 5.3889   LearningRate 0.0356   Epoch: 8   Global Step: 45890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:20:40,485-Speed 5594.54 samples/sec   Loss 5.3954   LearningRate 0.0356   Epoch: 8   Global Step: 45900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:20:42,302-Speed 5635.87 samples/sec   Loss 5.3857   LearningRate 0.0356   Epoch: 8   Global Step: 45910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:44,126-Speed 5616.92 samples/sec   Loss 5.4796   LearningRate 0.0355   Epoch: 8   Global Step: 45920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:45,967-Speed 5563.59 samples/sec   Loss 5.4124   LearningRate 0.0355   Epoch: 8   Global Step: 45930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:47,780-Speed 5649.71 samples/sec   Loss 5.4463   LearningRate 0.0355   Epoch: 8   Global Step: 45940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:49,612-Speed 5592.52 samples/sec   Loss 5.2982   LearningRate 0.0355   Epoch: 8   Global Step: 45950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:51,458-Speed 5549.15 samples/sec   Loss 5.5471   LearningRate 0.0355   Epoch: 8   Global Step: 45960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:53,300-Speed 5559.23 samples/sec   Loss 5.4729   LearningRate 0.0355   Epoch: 8   Global Step: 45970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:55,151-Speed 5534.62 samples/sec   Loss 5.3325   LearningRate 0.0355   Epoch: 8   Global Step: 45980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:56,979-Speed 5603.51 samples/sec   Loss 5.5273   LearningRate 0.0355   Epoch: 8   Global Step: 45990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:20:58,815-Speed 5581.22 samples/sec   Loss 5.3929   LearningRate 0.0355   Epoch: 8   Global Step: 46000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:21:25,456-[lfw][46000]XNorm: 22.362603
Training: 2022-04-27 04:21:25,457-[lfw][46000]Accuracy-Flip: 0.99650+-0.00241
Training: 2022-04-27 04:21:25,457-[lfw][46000]Accuracy-Highest: 0.99750
Training: 2022-04-27 04:21:56,308-[cfp_fp][46000]XNorm: 19.956942
Training: 2022-04-27 04:21:56,309-[cfp_fp][46000]Accuracy-Flip: 0.94186+-0.01364
Training: 2022-04-27 04:21:56,309-[cfp_fp][46000]Accuracy-Highest: 0.94857
Training: 2022-04-27 04:22:22,921-[agedb_30][46000]XNorm: 21.985087
Training: 2022-04-27 04:22:22,922-[agedb_30][46000]Accuracy-Flip: 0.97267+-0.00932
Training: 2022-04-27 04:22:22,922-[agedb_30][46000]Accuracy-Highest: 0.97383
Training: 2022-04-27 04:22:24,757-Speed 119.15 samples/sec   Loss 5.4497   LearningRate 0.0355   Epoch: 8   Global Step: 46010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:22:26,595-Speed 5573.83 samples/sec   Loss 5.5328   LearningRate 0.0354   Epoch: 8   Global Step: 46020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:22:28,435-Speed 5566.61 samples/sec   Loss 5.2637   LearningRate 0.0354   Epoch: 8   Global Step: 46030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:22:30,244-Speed 5662.89 samples/sec   Loss 5.3575   LearningRate 0.0354   Epoch: 8   Global Step: 46040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:32,064-Speed 5627.84 samples/sec   Loss 5.4480   LearningRate 0.0354   Epoch: 8   Global Step: 46050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:33,880-Speed 5641.76 samples/sec   Loss 5.3987   LearningRate 0.0354   Epoch: 8   Global Step: 46060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:35,715-Speed 5583.10 samples/sec   Loss 5.4468   LearningRate 0.0354   Epoch: 8   Global Step: 46070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:37,541-Speed 5609.61 samples/sec   Loss 5.4649   LearningRate 0.0354   Epoch: 8   Global Step: 46080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:39,365-Speed 5614.62 samples/sec   Loss 5.3606   LearningRate 0.0354   Epoch: 8   Global Step: 46090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:41,215-Speed 5537.10 samples/sec   Loss 5.4202   LearningRate 0.0354   Epoch: 8   Global Step: 46100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:43,043-Speed 5604.25 samples/sec   Loss 5.5141   LearningRate 0.0353   Epoch: 8   Global Step: 46110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:44,873-Speed 5596.74 samples/sec   Loss 5.5154   LearningRate 0.0353   Epoch: 8   Global Step: 46120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:46,731-Speed 5511.25 samples/sec   Loss 5.2685   LearningRate 0.0353   Epoch: 8   Global Step: 46130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:48,600-Speed 5481.65 samples/sec   Loss 5.5192   LearningRate 0.0353   Epoch: 8   Global Step: 46140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:22:50,453-Speed 5528.57 samples/sec   Loss 5.4448   LearningRate 0.0353   Epoch: 8   Global Step: 46150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:52,295-Speed 5561.36 samples/sec   Loss 5.5566   LearningRate 0.0353   Epoch: 8   Global Step: 46160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:54,126-Speed 5595.38 samples/sec   Loss 5.4729   LearningRate 0.0353   Epoch: 8   Global Step: 46170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:55,943-Speed 5637.93 samples/sec   Loss 5.3291   LearningRate 0.0353   Epoch: 8   Global Step: 46180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:57,762-Speed 5629.91 samples/sec   Loss 5.3285   LearningRate 0.0353   Epoch: 8   Global Step: 46190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:22:59,599-Speed 5577.22 samples/sec   Loss 5.4188   LearningRate 0.0353   Epoch: 8   Global Step: 46200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:01,421-Speed 5621.00 samples/sec   Loss 5.4965   LearningRate 0.0352   Epoch: 8   Global Step: 46210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:03,252-Speed 5595.40 samples/sec   Loss 5.5713   LearningRate 0.0352   Epoch: 8   Global Step: 46220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:05,088-Speed 5578.33 samples/sec   Loss 5.4748   LearningRate 0.0352   Epoch: 8   Global Step: 46230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:06,921-Speed 5586.95 samples/sec   Loss 5.5199   LearningRate 0.0352   Epoch: 8   Global Step: 46240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:08,767-Speed 5551.74 samples/sec   Loss 5.4987   LearningRate 0.0352   Epoch: 8   Global Step: 46250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:23:10,599-Speed 5590.16 samples/sec   Loss 5.5927   LearningRate 0.0352   Epoch: 8   Global Step: 46260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:23:12,415-Speed 5639.18 samples/sec   Loss 5.4023   LearningRate 0.0352   Epoch: 8   Global Step: 46270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:14,241-Speed 5613.04 samples/sec   Loss 5.5274   LearningRate 0.0352   Epoch: 8   Global Step: 46280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:16,071-Speed 5596.35 samples/sec   Loss 5.4650   LearningRate 0.0352   Epoch: 8   Global Step: 46290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:17,907-Speed 5578.50 samples/sec   Loss 5.5532   LearningRate 0.0351   Epoch: 8   Global Step: 46300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:19,757-Speed 5539.34 samples/sec   Loss 5.6655   LearningRate 0.0351   Epoch: 8   Global Step: 46310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:21,614-Speed 5515.66 samples/sec   Loss 5.5127   LearningRate 0.0351   Epoch: 8   Global Step: 46320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:23,432-Speed 5633.34 samples/sec   Loss 5.4288   LearningRate 0.0351   Epoch: 8   Global Step: 46330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:25,246-Speed 5647.98 samples/sec   Loss 5.4737   LearningRate 0.0351   Epoch: 8   Global Step: 46340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:27,076-Speed 5597.99 samples/sec   Loss 5.4957   LearningRate 0.0351   Epoch: 8   Global Step: 46350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:28,894-Speed 5632.96 samples/sec   Loss 5.5193   LearningRate 0.0351   Epoch: 8   Global Step: 46360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:30,728-Speed 5583.93 samples/sec   Loss 5.3735   LearningRate 0.0351   Epoch: 8   Global Step: 46370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:23:32,585-Speed 5516.65 samples/sec   Loss 5.5721   LearningRate 0.0351   Epoch: 8   Global Step: 46380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:23:34,416-Speed 5594.55 samples/sec   Loss 5.4481   LearningRate 0.0351   Epoch: 8   Global Step: 46390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:23:36,228-Speed 5653.20 samples/sec   Loss 5.4428   LearningRate 0.0350   Epoch: 8   Global Step: 46400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:23:38,064-Speed 5581.13 samples/sec   Loss 5.5374   LearningRate 0.0350   Epoch: 8   Global Step: 46410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:23:39,887-Speed 5618.12 samples/sec   Loss 5.5650   LearningRate 0.0350   Epoch: 8   Global Step: 46420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:23:41,716-Speed 5601.22 samples/sec   Loss 5.5791   LearningRate 0.0350   Epoch: 8   Global Step: 46430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:23:43,565-Speed 5538.63 samples/sec   Loss 5.5533   LearningRate 0.0350   Epoch: 8   Global Step: 46440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:23:45,393-Speed 5605.04 samples/sec   Loss 5.5148   LearningRate 0.0350   Epoch: 8   Global Step: 46450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:23:47,223-Speed 5595.85 samples/sec   Loss 5.3773   LearningRate 0.0350   Epoch: 8   Global Step: 46460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:23:49,048-Speed 5613.25 samples/sec   Loss 5.4353   LearningRate 0.0350   Epoch: 8   Global Step: 46470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:23:50,876-Speed 5605.47 samples/sec   Loss 5.5569   LearningRate 0.0350   Epoch: 8   Global Step: 46480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:52,702-Speed 5609.55 samples/sec   Loss 5.5346   LearningRate 0.0350   Epoch: 8   Global Step: 46490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:54,552-Speed 5538.25 samples/sec   Loss 5.4866   LearningRate 0.0349   Epoch: 8   Global Step: 46500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:56,383-Speed 5594.76 samples/sec   Loss 5.4684   LearningRate 0.0349   Epoch: 8   Global Step: 46510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:23:58,232-Speed 5538.06 samples/sec   Loss 5.5406   LearningRate 0.0349   Epoch: 8   Global Step: 46520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:00,058-Speed 5611.55 samples/sec   Loss 5.5033   LearningRate 0.0349   Epoch: 8   Global Step: 46530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:01,905-Speed 5546.64 samples/sec   Loss 5.5486   LearningRate 0.0349   Epoch: 8   Global Step: 46540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:03,749-Speed 5554.34 samples/sec   Loss 5.5817   LearningRate 0.0349   Epoch: 8   Global Step: 46550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:05,607-Speed 5514.55 samples/sec   Loss 5.5279   LearningRate 0.0349   Epoch: 8   Global Step: 46560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:07,458-Speed 5533.41 samples/sec   Loss 5.6591   LearningRate 0.0349   Epoch: 8   Global Step: 46570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:09,279-Speed 5626.34 samples/sec   Loss 5.5026   LearningRate 0.0349   Epoch: 8   Global Step: 46580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:24:11,094-Speed 5642.76 samples/sec   Loss 5.4339   LearningRate 0.0348   Epoch: 8   Global Step: 46590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:24:12,947-Speed 5529.07 samples/sec   Loss 5.6427   LearningRate 0.0348   Epoch: 8   Global Step: 46600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:24:14,757-Speed 5658.57 samples/sec   Loss 5.6636   LearningRate 0.0348   Epoch: 8   Global Step: 46610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:24:16,573-Speed 5640.37 samples/sec   Loss 5.4558   LearningRate 0.0348   Epoch: 8   Global Step: 46620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:24:18,385-Speed 5654.61 samples/sec   Loss 5.6032   LearningRate 0.0348   Epoch: 8   Global Step: 46630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:24:20,197-Speed 5651.23 samples/sec   Loss 5.4744   LearningRate 0.0348   Epoch: 8   Global Step: 46640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:24:21,994-Speed 5701.56 samples/sec   Loss 5.4794   LearningRate 0.0348   Epoch: 8   Global Step: 46650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:23,814-Speed 5628.27 samples/sec   Loss 5.5031   LearningRate 0.0348   Epoch: 8   Global Step: 46660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:25,632-Speed 5632.75 samples/sec   Loss 5.4668   LearningRate 0.0348   Epoch: 8   Global Step: 46670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:27,437-Speed 5678.13 samples/sec   Loss 5.5139   LearningRate 0.0348   Epoch: 8   Global Step: 46680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:29,252-Speed 5644.44 samples/sec   Loss 5.4631   LearningRate 0.0347   Epoch: 8   Global Step: 46690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:31,077-Speed 5612.82 samples/sec   Loss 5.5782   LearningRate 0.0347   Epoch: 8   Global Step: 46700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:32,893-Speed 5638.98 samples/sec   Loss 5.3672   LearningRate 0.0347   Epoch: 8   Global Step: 46710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:34,743-Speed 5538.14 samples/sec   Loss 5.5344   LearningRate 0.0347   Epoch: 8   Global Step: 46720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:36,564-Speed 5624.99 samples/sec   Loss 5.6605   LearningRate 0.0347   Epoch: 8   Global Step: 46730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:38,386-Speed 5620.76 samples/sec   Loss 5.4896   LearningRate 0.0347   Epoch: 8   Global Step: 46740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:40,191-Speed 5676.13 samples/sec   Loss 5.5710   LearningRate 0.0347   Epoch: 8   Global Step: 46750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:24:42,005-Speed 5645.48 samples/sec   Loss 5.5543   LearningRate 0.0347   Epoch: 8   Global Step: 46760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:43,809-Speed 5677.76 samples/sec   Loss 5.5461   LearningRate 0.0347   Epoch: 8   Global Step: 46770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:45,645-Speed 5579.22 samples/sec   Loss 5.6122   LearningRate 0.0346   Epoch: 8   Global Step: 46780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:47,474-Speed 5599.65 samples/sec   Loss 5.6293   LearningRate 0.0346   Epoch: 8   Global Step: 46790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:49,292-Speed 5636.93 samples/sec   Loss 5.6160   LearningRate 0.0346   Epoch: 8   Global Step: 46800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:51,111-Speed 5632.03 samples/sec   Loss 5.5099   LearningRate 0.0346   Epoch: 8   Global Step: 46810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:52,959-Speed 5542.09 samples/sec   Loss 5.6360   LearningRate 0.0346   Epoch: 8   Global Step: 46820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:54,789-Speed 5598.12 samples/sec   Loss 5.4971   LearningRate 0.0346   Epoch: 8   Global Step: 46830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:56,613-Speed 5618.38 samples/sec   Loss 5.6332   LearningRate 0.0346   Epoch: 8   Global Step: 46840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:24:58,435-Speed 5620.04 samples/sec   Loss 5.5575   LearningRate 0.0346   Epoch: 8   Global Step: 46850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:00,264-Speed 5601.59 samples/sec   Loss 5.7102   LearningRate 0.0346   Epoch: 8   Global Step: 46860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:25:02,083-Speed 5629.51 samples/sec   Loss 5.5365   LearningRate 0.0346   Epoch: 8   Global Step: 46870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:25:03,905-Speed 5621.88 samples/sec   Loss 5.6098   LearningRate 0.0345   Epoch: 8   Global Step: 46880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:25:05,708-Speed 5682.62 samples/sec   Loss 5.4809   LearningRate 0.0345   Epoch: 8   Global Step: 46890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:07,520-Speed 5652.81 samples/sec   Loss 5.5024   LearningRate 0.0345   Epoch: 8   Global Step: 46900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:09,334-Speed 5649.45 samples/sec   Loss 5.4677   LearningRate 0.0345   Epoch: 8   Global Step: 46910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:11,176-Speed 5561.06 samples/sec   Loss 5.4897   LearningRate 0.0345   Epoch: 8   Global Step: 46920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:12,994-Speed 5632.52 samples/sec   Loss 5.5390   LearningRate 0.0345   Epoch: 8   Global Step: 46930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:14,825-Speed 5596.42 samples/sec   Loss 5.5306   LearningRate 0.0345   Epoch: 8   Global Step: 46940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:16,651-Speed 5607.08 samples/sec   Loss 5.5265   LearningRate 0.0345   Epoch: 8   Global Step: 46950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:18,474-Speed 5621.08 samples/sec   Loss 5.5423   LearningRate 0.0345   Epoch: 8   Global Step: 46960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:20,295-Speed 5625.18 samples/sec   Loss 5.5364   LearningRate 0.0345   Epoch: 8   Global Step: 46970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:22,118-Speed 5618.84 samples/sec   Loss 5.5473   LearningRate 0.0344   Epoch: 8   Global Step: 46980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:23,950-Speed 5591.49 samples/sec   Loss 5.6623   LearningRate 0.0344   Epoch: 8   Global Step: 46990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:25:25,769-Speed 5632.19 samples/sec   Loss 5.5842   LearningRate 0.0344   Epoch: 8   Global Step: 47000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:25:27,601-Speed 5590.87 samples/sec   Loss 5.5804   LearningRate 0.0344   Epoch: 8   Global Step: 47010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:29,406-Speed 5674.45 samples/sec   Loss 5.5216   LearningRate 0.0344   Epoch: 8   Global Step: 47020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:31,251-Speed 5553.74 samples/sec   Loss 5.6770   LearningRate 0.0344   Epoch: 8   Global Step: 47030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:33,059-Speed 5664.59 samples/sec   Loss 5.6135   LearningRate 0.0344   Epoch: 8   Global Step: 47040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:34,872-Speed 5649.94 samples/sec   Loss 5.5512   LearningRate 0.0344   Epoch: 8   Global Step: 47050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:36,709-Speed 5574.81 samples/sec   Loss 5.5243   LearningRate 0.0344   Epoch: 8   Global Step: 47060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:38,541-Speed 5592.27 samples/sec   Loss 5.4944   LearningRate 0.0343   Epoch: 8   Global Step: 47070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:40,371-Speed 5596.85 samples/sec   Loss 5.7574   LearningRate 0.0343   Epoch: 8   Global Step: 47080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:42,259-Speed 5426.11 samples/sec   Loss 5.6017   LearningRate 0.0343   Epoch: 8   Global Step: 47090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:44,106-Speed 5566.40 samples/sec   Loss 5.4704   LearningRate 0.0343   Epoch: 8   Global Step: 47100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:25:45,923-Speed 5638.11 samples/sec   Loss 5.7691   LearningRate 0.0343   Epoch: 8   Global Step: 47110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:25:47,744-Speed 5623.43 samples/sec   Loss 5.5833   LearningRate 0.0343   Epoch: 8   Global Step: 47120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:25:49,575-Speed 5595.78 samples/sec   Loss 5.5922   LearningRate 0.0343   Epoch: 8   Global Step: 47130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:25:51,433-Speed 5514.01 samples/sec   Loss 5.5848   LearningRate 0.0343   Epoch: 8   Global Step: 47140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:25:53,268-Speed 5583.89 samples/sec   Loss 5.4889   LearningRate 0.0343   Epoch: 8   Global Step: 47150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:25:55,095-Speed 5606.36 samples/sec   Loss 5.5227   LearningRate 0.0343   Epoch: 8   Global Step: 47160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:25:56,905-Speed 5658.22 samples/sec   Loss 5.5484   LearningRate 0.0342   Epoch: 8   Global Step: 47170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:25:58,727-Speed 5622.84 samples/sec   Loss 5.5598   LearningRate 0.0342   Epoch: 8   Global Step: 47180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:26:00,556-Speed 5599.94 samples/sec   Loss 5.5663   LearningRate 0.0342   Epoch: 8   Global Step: 47190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:26:02,397-Speed 5565.75 samples/sec   Loss 5.6503   LearningRate 0.0342   Epoch: 8   Global Step: 47200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:26:04,205-Speed 5664.93 samples/sec   Loss 5.6400   LearningRate 0.0342   Epoch: 8   Global Step: 47210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:26:06,018-Speed 5649.73 samples/sec   Loss 5.5251   LearningRate 0.0342   Epoch: 8   Global Step: 47220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:26:07,826-Speed 5663.43 samples/sec   Loss 5.4954   LearningRate 0.0342   Epoch: 8   Global Step: 47230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:09,644-Speed 5636.12 samples/sec   Loss 5.5533   LearningRate 0.0342   Epoch: 8   Global Step: 47240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:11,476-Speed 5592.01 samples/sec   Loss 5.6368   LearningRate 0.0342   Epoch: 8   Global Step: 47250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:13,288-Speed 5654.66 samples/sec   Loss 5.5753   LearningRate 0.0342   Epoch: 8   Global Step: 47260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:15,113-Speed 5611.54 samples/sec   Loss 5.4545   LearningRate 0.0341   Epoch: 8   Global Step: 47270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:16,941-Speed 5603.70 samples/sec   Loss 5.5411   LearningRate 0.0341   Epoch: 8   Global Step: 47280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:18,780-Speed 5571.55 samples/sec   Loss 5.6285   LearningRate 0.0341   Epoch: 8   Global Step: 47290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:20,641-Speed 5503.92 samples/sec   Loss 5.6588   LearningRate 0.0341   Epoch: 8   Global Step: 47300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:22,489-Speed 5541.14 samples/sec   Loss 5.5952   LearningRate 0.0341   Epoch: 8   Global Step: 47310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:24,353-Speed 5496.25 samples/sec   Loss 5.7028   LearningRate 0.0341   Epoch: 8   Global Step: 47320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:26,181-Speed 5603.72 samples/sec   Loss 5.4926   LearningRate 0.0341   Epoch: 8   Global Step: 47330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:26:27,994-Speed 5650.64 samples/sec   Loss 5.5316   LearningRate 0.0341   Epoch: 8   Global Step: 47340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:26:29,830-Speed 5578.09 samples/sec   Loss 5.5862   LearningRate 0.0341   Epoch: 8   Global Step: 47350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:26:31,668-Speed 5572.76 samples/sec   Loss 5.6170   LearningRate 0.0341   Epoch: 8   Global Step: 47360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:33,497-Speed 5600.50 samples/sec   Loss 5.5770   LearningRate 0.0340   Epoch: 8   Global Step: 47370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:35,327-Speed 5598.73 samples/sec   Loss 5.5721   LearningRate 0.0340   Epoch: 8   Global Step: 47380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:37,147-Speed 5627.74 samples/sec   Loss 5.5947   LearningRate 0.0340   Epoch: 8   Global Step: 47390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:38,991-Speed 5554.96 samples/sec   Loss 5.6553   LearningRate 0.0340   Epoch: 8   Global Step: 47400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:40,814-Speed 5619.60 samples/sec   Loss 5.5635   LearningRate 0.0340   Epoch: 8   Global Step: 47410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:42,637-Speed 5619.68 samples/sec   Loss 5.4849   LearningRate 0.0340   Epoch: 8   Global Step: 47420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:44,471-Speed 5583.79 samples/sec   Loss 5.5552   LearningRate 0.0340   Epoch: 8   Global Step: 47430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:46,294-Speed 5620.15 samples/sec   Loss 5.5668   LearningRate 0.0340   Epoch: 8   Global Step: 47440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:48,106-Speed 5654.55 samples/sec   Loss 5.4981   LearningRate 0.0340   Epoch: 8   Global Step: 47450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:49,941-Speed 5579.50 samples/sec   Loss 5.5472   LearningRate 0.0339   Epoch: 8   Global Step: 47460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:51,787-Speed 5551.22 samples/sec   Loss 5.4967   LearningRate 0.0339   Epoch: 8   Global Step: 47470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:53,636-Speed 5540.21 samples/sec   Loss 5.6605   LearningRate 0.0339   Epoch: 8   Global Step: 47480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:55,478-Speed 5561.44 samples/sec   Loss 5.6548   LearningRate 0.0339   Epoch: 8   Global Step: 47490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:57,314-Speed 5576.48 samples/sec   Loss 5.5316   LearningRate 0.0339   Epoch: 8   Global Step: 47500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:26:59,158-Speed 5556.20 samples/sec   Loss 5.4829   LearningRate 0.0339   Epoch: 8   Global Step: 47510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:00,986-Speed 5604.41 samples/sec   Loss 5.5599   LearningRate 0.0339   Epoch: 8   Global Step: 47520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:02,810-Speed 5615.74 samples/sec   Loss 5.5989   LearningRate 0.0339   Epoch: 8   Global Step: 47530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:04,622-Speed 5653.30 samples/sec   Loss 5.5033   LearningRate 0.0339   Epoch: 8   Global Step: 47540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:06,447-Speed 5612.00 samples/sec   Loss 5.5505   LearningRate 0.0339   Epoch: 8   Global Step: 47550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:08,273-Speed 5610.13 samples/sec   Loss 5.7667   LearningRate 0.0338   Epoch: 8   Global Step: 47560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:27:10,097-Speed 5615.72 samples/sec   Loss 5.5313   LearningRate 0.0338   Epoch: 8   Global Step: 47570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:27:11,929-Speed 5591.76 samples/sec   Loss 5.4650   LearningRate 0.0338   Epoch: 8   Global Step: 47580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:27:13,772-Speed 5558.49 samples/sec   Loss 5.5742   LearningRate 0.0338   Epoch: 8   Global Step: 47590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:15,580-Speed 5663.06 samples/sec   Loss 5.4913   LearningRate 0.0338   Epoch: 8   Global Step: 47600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:17,426-Speed 5549.12 samples/sec   Loss 5.6075   LearningRate 0.0338   Epoch: 8   Global Step: 47610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:19,255-Speed 5602.37 samples/sec   Loss 5.6310   LearningRate 0.0338   Epoch: 8   Global Step: 47620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:21,125-Speed 5478.62 samples/sec   Loss 5.7949   LearningRate 0.0338   Epoch: 8   Global Step: 47630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:22,944-Speed 5632.68 samples/sec   Loss 5.6208   LearningRate 0.0338   Epoch: 8   Global Step: 47640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:24,768-Speed 5614.68 samples/sec   Loss 5.6351   LearningRate 0.0338   Epoch: 8   Global Step: 47650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:26,616-Speed 5544.99 samples/sec   Loss 5.5425   LearningRate 0.0337   Epoch: 8   Global Step: 47660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:28,449-Speed 5586.23 samples/sec   Loss 5.5288   LearningRate 0.0337   Epoch: 8   Global Step: 47670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:30,306-Speed 5517.74 samples/sec   Loss 5.5501   LearningRate 0.0337   Epoch: 8   Global Step: 47680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:32,131-Speed 5612.08 samples/sec   Loss 5.5263   LearningRate 0.0337   Epoch: 8   Global Step: 47690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:27:33,941-Speed 5658.35 samples/sec   Loss 5.5850   LearningRate 0.0337   Epoch: 8   Global Step: 47700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:27:35,783-Speed 5560.06 samples/sec   Loss 5.7066   LearningRate 0.0337   Epoch: 8   Global Step: 47710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:37,615-Speed 5591.33 samples/sec   Loss 5.4760   LearningRate 0.0337   Epoch: 8   Global Step: 47720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:39,453-Speed 5575.66 samples/sec   Loss 5.5234   LearningRate 0.0337   Epoch: 8   Global Step: 47730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:41,293-Speed 5568.87 samples/sec   Loss 5.5493   LearningRate 0.0337   Epoch: 8   Global Step: 47740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:43,111-Speed 5634.88 samples/sec   Loss 5.6490   LearningRate 0.0337   Epoch: 8   Global Step: 47750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:44,929-Speed 5633.12 samples/sec   Loss 5.5137   LearningRate 0.0336   Epoch: 8   Global Step: 47760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:46,754-Speed 5612.96 samples/sec   Loss 5.5967   LearningRate 0.0336   Epoch: 8   Global Step: 47770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:48,571-Speed 5637.45 samples/sec   Loss 5.6513   LearningRate 0.0336   Epoch: 8   Global Step: 47780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:50,385-Speed 5647.16 samples/sec   Loss 5.5800   LearningRate 0.0336   Epoch: 8   Global Step: 47790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:52,205-Speed 5627.76 samples/sec   Loss 5.5514   LearningRate 0.0336   Epoch: 8   Global Step: 47800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:54,022-Speed 5638.18 samples/sec   Loss 5.5994   LearningRate 0.0336   Epoch: 8   Global Step: 47810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:55,834-Speed 5652.93 samples/sec   Loss 5.5642   LearningRate 0.0336   Epoch: 8   Global Step: 47820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:57,646-Speed 5655.39 samples/sec   Loss 5.5248   LearningRate 0.0336   Epoch: 8   Global Step: 47830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:27:59,473-Speed 5605.66 samples/sec   Loss 5.5797   LearningRate 0.0336   Epoch: 8   Global Step: 47840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:01,293-Speed 5627.44 samples/sec   Loss 5.5993   LearningRate 0.0336   Epoch: 8   Global Step: 47850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:03,140-Speed 5546.35 samples/sec   Loss 5.5517   LearningRate 0.0335   Epoch: 8   Global Step: 47860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:04,975-Speed 5584.47 samples/sec   Loss 5.5540   LearningRate 0.0335   Epoch: 8   Global Step: 47870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:06,801-Speed 5608.41 samples/sec   Loss 5.5311   LearningRate 0.0335   Epoch: 8   Global Step: 47880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:08,641-Speed 5566.03 samples/sec   Loss 5.5340   LearningRate 0.0335   Epoch: 8   Global Step: 47890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:10,476-Speed 5583.32 samples/sec   Loss 5.4987   LearningRate 0.0335   Epoch: 8   Global Step: 47900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:12,331-Speed 5522.46 samples/sec   Loss 5.5935   LearningRate 0.0335   Epoch: 8   Global Step: 47910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:28:14,169-Speed 5571.40 samples/sec   Loss 5.6319   LearningRate 0.0335   Epoch: 8   Global Step: 47920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:16,017-Speed 5544.49 samples/sec   Loss 5.4488   LearningRate 0.0335   Epoch: 8   Global Step: 47930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:17,865-Speed 5546.00 samples/sec   Loss 5.6977   LearningRate 0.0335   Epoch: 8   Global Step: 47940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:19,695-Speed 5596.91 samples/sec   Loss 5.5002   LearningRate 0.0334   Epoch: 8   Global Step: 47950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:21,627-Speed 5302.48 samples/sec   Loss 5.5280   LearningRate 0.0334   Epoch: 8   Global Step: 47960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:23,479-Speed 5533.68 samples/sec   Loss 5.5771   LearningRate 0.0334   Epoch: 8   Global Step: 47970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:25,300-Speed 5623.78 samples/sec   Loss 5.5862   LearningRate 0.0334   Epoch: 8   Global Step: 47980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:27,119-Speed 5631.50 samples/sec   Loss 5.6587   LearningRate 0.0334   Epoch: 8   Global Step: 47990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:28,942-Speed 5620.56 samples/sec   Loss 5.5933   LearningRate 0.0334   Epoch: 8   Global Step: 48000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:28:55,258-[lfw][48000]XNorm: 22.802285
Training: 2022-04-27 04:28:55,258-[lfw][48000]Accuracy-Flip: 0.99717+-0.00325
Training: 2022-04-27 04:28:55,259-[lfw][48000]Accuracy-Highest: 0.99750
Training: 2022-04-27 04:29:25,772-[cfp_fp][48000]XNorm: 20.126648
Training: 2022-04-27 04:29:25,773-[cfp_fp][48000]Accuracy-Flip: 0.95257+-0.00958
Training: 2022-04-27 04:29:25,773-[cfp_fp][48000]Accuracy-Highest: 0.95257
Training: 2022-04-27 04:29:52,075-[agedb_30][48000]XNorm: 22.615926
Training: 2022-04-27 04:29:52,076-[agedb_30][48000]Accuracy-Flip: 0.97067+-0.00720
Training: 2022-04-27 04:29:52,076-[agedb_30][48000]Accuracy-Highest: 0.97383
Training: 2022-04-27 04:29:53,941-Speed 120.47 samples/sec   Loss 5.5185   LearningRate 0.0334   Epoch: 8   Global Step: 48010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:29:55,853-Speed 5355.80 samples/sec   Loss 5.6398   LearningRate 0.0334   Epoch: 8   Global Step: 48020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:29:57,777-Speed 5323.74 samples/sec   Loss 5.4525   LearningRate 0.0334   Epoch: 8   Global Step: 48030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:29:59,601-Speed 5615.63 samples/sec   Loss 5.3583   LearningRate 0.0334   Epoch: 8   Global Step: 48040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:30:01,426-Speed 5614.69 samples/sec   Loss 5.4128   LearningRate 0.0333   Epoch: 8   Global Step: 48050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:03,247-Speed 5623.77 samples/sec   Loss 5.5606   LearningRate 0.0333   Epoch: 8   Global Step: 48060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:05,078-Speed 5594.61 samples/sec   Loss 5.6358   LearningRate 0.0333   Epoch: 8   Global Step: 48070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:06,902-Speed 5616.73 samples/sec   Loss 5.5123   LearningRate 0.0333   Epoch: 8   Global Step: 48080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:08,713-Speed 5655.54 samples/sec   Loss 5.4213   LearningRate 0.0333   Epoch: 8   Global Step: 48090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:10,530-Speed 5639.54 samples/sec   Loss 5.5933   LearningRate 0.0333   Epoch: 8   Global Step: 48100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:12,343-Speed 5650.01 samples/sec   Loss 5.5383   LearningRate 0.0333   Epoch: 8   Global Step: 48110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:14,148-Speed 5676.49 samples/sec   Loss 5.6284   LearningRate 0.0333   Epoch: 8   Global Step: 48120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:30:15,949-Speed 5685.56 samples/sec   Loss 5.5778   LearningRate 0.0333   Epoch: 8   Global Step: 48130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:30:17,767-Speed 5633.76 samples/sec   Loss 5.5853   LearningRate 0.0333   Epoch: 8   Global Step: 48140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:30:19,577-Speed 5661.58 samples/sec   Loss 5.4499   LearningRate 0.0332   Epoch: 8   Global Step: 48150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:30:21,391-Speed 5645.91 samples/sec   Loss 5.4932   LearningRate 0.0332   Epoch: 8   Global Step: 48160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:30:23,211-Speed 5627.48 samples/sec   Loss 5.4859   LearningRate 0.0332   Epoch: 8   Global Step: 48170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:30:25,031-Speed 5627.56 samples/sec   Loss 5.6976   LearningRate 0.0332   Epoch: 8   Global Step: 48180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:30:26,844-Speed 5651.48 samples/sec   Loss 5.7465   LearningRate 0.0332   Epoch: 8   Global Step: 48190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:30:28,681-Speed 5575.50 samples/sec   Loss 5.5344   LearningRate 0.0332   Epoch: 8   Global Step: 48200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:30:30,496-Speed 5643.44 samples/sec   Loss 5.6271   LearningRate 0.0332   Epoch: 8   Global Step: 48210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:30:32,347-Speed 5535.53 samples/sec   Loss 5.6145   LearningRate 0.0332   Epoch: 8   Global Step: 48220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:34,188-Speed 5563.24 samples/sec   Loss 5.5456   LearningRate 0.0332   Epoch: 8   Global Step: 48230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:36,002-Speed 5648.54 samples/sec   Loss 5.5643   LearningRate 0.0332   Epoch: 8   Global Step: 48240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:37,820-Speed 5633.58 samples/sec   Loss 5.6186   LearningRate 0.0331   Epoch: 8   Global Step: 48250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:39,633-Speed 5651.87 samples/sec   Loss 5.5683   LearningRate 0.0331   Epoch: 8   Global Step: 48260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:41,441-Speed 5663.00 samples/sec   Loss 5.5023   LearningRate 0.0331   Epoch: 8   Global Step: 48270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:43,261-Speed 5628.93 samples/sec   Loss 5.5032   LearningRate 0.0331   Epoch: 8   Global Step: 48280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:45,129-Speed 5483.27 samples/sec   Loss 5.5486   LearningRate 0.0331   Epoch: 8   Global Step: 48290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:46,950-Speed 5625.11 samples/sec   Loss 5.5686   LearningRate 0.0331   Epoch: 8   Global Step: 48300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:48,816-Speed 5491.31 samples/sec   Loss 5.5268   LearningRate 0.0331   Epoch: 8   Global Step: 48310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:30:50,650-Speed 5583.06 samples/sec   Loss 5.5389   LearningRate 0.0331   Epoch: 8   Global Step: 48320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:30:52,500-Speed 5538.10 samples/sec   Loss 5.5804   LearningRate 0.0331   Epoch: 8   Global Step: 48330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:30:54,379-Speed 5453.85 samples/sec   Loss 5.4820   LearningRate 0.0331   Epoch: 8   Global Step: 48340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:30:56,225-Speed 5550.10 samples/sec   Loss 5.5050   LearningRate 0.0330   Epoch: 8   Global Step: 48350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:30:58,061-Speed 5576.53 samples/sec   Loss 5.4404   LearningRate 0.0330   Epoch: 8   Global Step: 48360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:30:59,891-Speed 5598.57 samples/sec   Loss 5.6067   LearningRate 0.0330   Epoch: 8   Global Step: 48370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:31:01,720-Speed 5602.32 samples/sec   Loss 5.5369   LearningRate 0.0330   Epoch: 8   Global Step: 48380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:31:03,550-Speed 5596.28 samples/sec   Loss 5.6281   LearningRate 0.0330   Epoch: 8   Global Step: 48390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:05,366-Speed 5641.72 samples/sec   Loss 5.5531   LearningRate 0.0330   Epoch: 8   Global Step: 48400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:07,187-Speed 5624.98 samples/sec   Loss 5.5399   LearningRate 0.0330   Epoch: 8   Global Step: 48410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:09,016-Speed 5600.38 samples/sec   Loss 5.4290   LearningRate 0.0330   Epoch: 8   Global Step: 48420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:10,843-Speed 5606.48 samples/sec   Loss 5.5746   LearningRate 0.0330   Epoch: 8   Global Step: 48430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:12,693-Speed 5537.74 samples/sec   Loss 5.5806   LearningRate 0.0330   Epoch: 8   Global Step: 48440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:14,515-Speed 5622.33 samples/sec   Loss 5.5729   LearningRate 0.0329   Epoch: 8   Global Step: 48450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:16,375-Speed 5506.64 samples/sec   Loss 5.4785   LearningRate 0.0329   Epoch: 8   Global Step: 48460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:18,193-Speed 5634.25 samples/sec   Loss 5.5898   LearningRate 0.0329   Epoch: 8   Global Step: 48470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:20,020-Speed 5610.04 samples/sec   Loss 5.4781   LearningRate 0.0329   Epoch: 8   Global Step: 48480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:21,856-Speed 5579.36 samples/sec   Loss 5.4634   LearningRate 0.0329   Epoch: 8   Global Step: 48490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:23,675-Speed 5630.90 samples/sec   Loss 5.5420   LearningRate 0.0329   Epoch: 8   Global Step: 48500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:25,528-Speed 5526.17 samples/sec   Loss 5.5532   LearningRate 0.0329   Epoch: 8   Global Step: 48510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:27,366-Speed 5575.63 samples/sec   Loss 5.5038   LearningRate 0.0329   Epoch: 8   Global Step: 48520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:29,213-Speed 5544.34 samples/sec   Loss 5.5797   LearningRate 0.0329   Epoch: 8   Global Step: 48530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:31,038-Speed 5613.55 samples/sec   Loss 5.5588   LearningRate 0.0329   Epoch: 8   Global Step: 48540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:32,879-Speed 5563.25 samples/sec   Loss 5.6545   LearningRate 0.0328   Epoch: 8   Global Step: 48550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:34,713-Speed 5584.72 samples/sec   Loss 5.6147   LearningRate 0.0328   Epoch: 8   Global Step: 48560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:36,524-Speed 5655.94 samples/sec   Loss 5.4802   LearningRate 0.0328   Epoch: 8   Global Step: 48570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:38,358-Speed 5586.71 samples/sec   Loss 5.4265   LearningRate 0.0328   Epoch: 8   Global Step: 48580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:40,197-Speed 5569.75 samples/sec   Loss 5.5030   LearningRate 0.0328   Epoch: 8   Global Step: 48590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:31:42,044-Speed 5545.78 samples/sec   Loss 5.5236   LearningRate 0.0328   Epoch: 8   Global Step: 48600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:31:43,855-Speed 5657.76 samples/sec   Loss 5.4313   LearningRate 0.0328   Epoch: 8   Global Step: 48610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:45,749-Speed 5406.89 samples/sec   Loss 5.5740   LearningRate 0.0328   Epoch: 8   Global Step: 48620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:47,571-Speed 5622.97 samples/sec   Loss 5.5711   LearningRate 0.0328   Epoch: 8   Global Step: 48630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:49,442-Speed 5475.97 samples/sec   Loss 5.5376   LearningRate 0.0328   Epoch: 8   Global Step: 48640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:51,302-Speed 5506.18 samples/sec   Loss 5.4848   LearningRate 0.0327   Epoch: 8   Global Step: 48650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:53,116-Speed 5648.28 samples/sec   Loss 5.5934   LearningRate 0.0327   Epoch: 8   Global Step: 48660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:54,951-Speed 5582.05 samples/sec   Loss 5.4317   LearningRate 0.0327   Epoch: 8   Global Step: 48670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:56,772-Speed 5624.41 samples/sec   Loss 5.5406   LearningRate 0.0327   Epoch: 8   Global Step: 48680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:31:58,588-Speed 5640.53 samples/sec   Loss 5.7195   LearningRate 0.0327   Epoch: 8   Global Step: 48690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:00,400-Speed 5654.11 samples/sec   Loss 5.3294   LearningRate 0.0327   Epoch: 8   Global Step: 48700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:02,255-Speed 5522.18 samples/sec   Loss 5.4688   LearningRate 0.0327   Epoch: 8   Global Step: 48710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:32:04,138-Speed 5439.84 samples/sec   Loss 5.5345   LearningRate 0.0327   Epoch: 8   Global Step: 48720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:32:05,978-Speed 5567.50 samples/sec   Loss 5.5032   LearningRate 0.0327   Epoch: 8   Global Step: 48730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:32:07,799-Speed 5625.31 samples/sec   Loss 5.5261   LearningRate 0.0327   Epoch: 8   Global Step: 48740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:32:09,610-Speed 5658.10 samples/sec   Loss 5.4409   LearningRate 0.0326   Epoch: 8   Global Step: 48750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:32:11,412-Speed 5681.43 samples/sec   Loss 5.4688   LearningRate 0.0326   Epoch: 8   Global Step: 48760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:13,242-Speed 5597.96 samples/sec   Loss 5.5307   LearningRate 0.0326   Epoch: 8   Global Step: 48770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:15,055-Speed 5650.02 samples/sec   Loss 5.4860   LearningRate 0.0326   Epoch: 8   Global Step: 48780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:16,880-Speed 5612.88 samples/sec   Loss 5.6236   LearningRate 0.0326   Epoch: 8   Global Step: 48790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:18,715-Speed 5583.69 samples/sec   Loss 5.5217   LearningRate 0.0326   Epoch: 8   Global Step: 48800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:20,542-Speed 5606.33 samples/sec   Loss 5.4913   LearningRate 0.0326   Epoch: 8   Global Step: 48810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:22,375-Speed 5591.69 samples/sec   Loss 5.5447   LearningRate 0.0326   Epoch: 8   Global Step: 48820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:24,222-Speed 5545.35 samples/sec   Loss 5.5916   LearningRate 0.0326   Epoch: 8   Global Step: 48830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:26,063-Speed 5565.94 samples/sec   Loss 5.5145   LearningRate 0.0325   Epoch: 8   Global Step: 48840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:27,923-Speed 5504.81 samples/sec   Loss 5.5711   LearningRate 0.0325   Epoch: 8   Global Step: 48850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:29,842-Speed 5337.53 samples/sec   Loss 5.5116   LearningRate 0.0325   Epoch: 8   Global Step: 48860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:32:31,688-Speed 5550.80 samples/sec   Loss 5.5234   LearningRate 0.0325   Epoch: 8   Global Step: 48870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:32:33,524-Speed 5577.59 samples/sec   Loss 5.4855   LearningRate 0.0325   Epoch: 8   Global Step: 48880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:32:35,351-Speed 5607.58 samples/sec   Loss 5.5468   LearningRate 0.0325   Epoch: 8   Global Step: 48890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:32:37,207-Speed 5518.75 samples/sec   Loss 5.4571   LearningRate 0.0325   Epoch: 8   Global Step: 48900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:32:39,048-Speed 5563.34 samples/sec   Loss 5.6111   LearningRate 0.0325   Epoch: 8   Global Step: 48910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:32:40,872-Speed 5616.31 samples/sec   Loss 5.4768   LearningRate 0.0325   Epoch: 8   Global Step: 48920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:32:42,677-Speed 5677.68 samples/sec   Loss 5.6031   LearningRate 0.0325   Epoch: 8   Global Step: 48930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:44,490-Speed 5649.01 samples/sec   Loss 5.4721   LearningRate 0.0324   Epoch: 8   Global Step: 48940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:46,323-Speed 5586.82 samples/sec   Loss 5.5686   LearningRate 0.0324   Epoch: 8   Global Step: 48950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:48,147-Speed 5616.32 samples/sec   Loss 5.6390   LearningRate 0.0324   Epoch: 8   Global Step: 48960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:49,997-Speed 5536.66 samples/sec   Loss 5.5276   LearningRate 0.0324   Epoch: 8   Global Step: 48970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:51,822-Speed 5613.20 samples/sec   Loss 5.5709   LearningRate 0.0324   Epoch: 8   Global Step: 48980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:53,647-Speed 5614.41 samples/sec   Loss 5.3897   LearningRate 0.0324   Epoch: 8   Global Step: 48990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:55,461-Speed 5646.10 samples/sec   Loss 5.3903   LearningRate 0.0324   Epoch: 8   Global Step: 49000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:57,279-Speed 5635.92 samples/sec   Loss 5.7091   LearningRate 0.0324   Epoch: 8   Global Step: 49010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:32:59,114-Speed 5580.53 samples/sec   Loss 5.5154   LearningRate 0.0324   Epoch: 8   Global Step: 49020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:00,929-Speed 5643.20 samples/sec   Loss 5.3813   LearningRate 0.0324   Epoch: 8   Global Step: 49030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:02,748-Speed 5633.67 samples/sec   Loss 5.3708   LearningRate 0.0323   Epoch: 8   Global Step: 49040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:04,574-Speed 5608.40 samples/sec   Loss 5.4378   LearningRate 0.0323   Epoch: 8   Global Step: 49050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:06,405-Speed 5595.30 samples/sec   Loss 5.4136   LearningRate 0.0323   Epoch: 8   Global Step: 49060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:08,213-Speed 5665.49 samples/sec   Loss 5.5034   LearningRate 0.0323   Epoch: 8   Global Step: 49070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:10,011-Speed 5703.97 samples/sec   Loss 5.5183   LearningRate 0.0323   Epoch: 8   Global Step: 49080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:11,844-Speed 5587.56 samples/sec   Loss 5.5719   LearningRate 0.0323   Epoch: 8   Global Step: 49090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:13,662-Speed 5636.26 samples/sec   Loss 5.4693   LearningRate 0.0323   Epoch: 8   Global Step: 49100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:15,484-Speed 5621.71 samples/sec   Loss 5.3706   LearningRate 0.0323   Epoch: 8   Global Step: 49110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:17,293-Speed 5660.92 samples/sec   Loss 5.4186   LearningRate 0.0323   Epoch: 8   Global Step: 49120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:19,110-Speed 5636.97 samples/sec   Loss 5.5655   LearningRate 0.0323   Epoch: 8   Global Step: 49130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:20,927-Speed 5637.57 samples/sec   Loss 5.4675   LearningRate 0.0322   Epoch: 8   Global Step: 49140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:22,743-Speed 5641.29 samples/sec   Loss 5.5210   LearningRate 0.0322   Epoch: 8   Global Step: 49150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:24,567-Speed 5616.60 samples/sec   Loss 5.4894   LearningRate 0.0322   Epoch: 8   Global Step: 49160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:26,399-Speed 5591.71 samples/sec   Loss 5.4160   LearningRate 0.0322   Epoch: 8   Global Step: 49170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:28,301-Speed 5385.83 samples/sec   Loss 5.5253   LearningRate 0.0322   Epoch: 8   Global Step: 49180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:30,120-Speed 5632.34 samples/sec   Loss 5.5905   LearningRate 0.0322   Epoch: 8   Global Step: 49190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:31,955-Speed 5583.53 samples/sec   Loss 5.5007   LearningRate 0.0322   Epoch: 8   Global Step: 49200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:33,773-Speed 5634.13 samples/sec   Loss 5.5864   LearningRate 0.0322   Epoch: 8   Global Step: 49210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:35,613-Speed 5566.56 samples/sec   Loss 5.5439   LearningRate 0.0322   Epoch: 8   Global Step: 49220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:37,457-Speed 5553.67 samples/sec   Loss 5.3382   LearningRate 0.0322   Epoch: 8   Global Step: 49230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:39,272-Speed 5643.42 samples/sec   Loss 5.5216   LearningRate 0.0321   Epoch: 8   Global Step: 49240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:41,089-Speed 5638.86 samples/sec   Loss 5.5315   LearningRate 0.0321   Epoch: 8   Global Step: 49250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:42,914-Speed 5612.99 samples/sec   Loss 5.5688   LearningRate 0.0321   Epoch: 8   Global Step: 49260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:44,739-Speed 5612.72 samples/sec   Loss 5.5087   LearningRate 0.0321   Epoch: 8   Global Step: 49270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:33:46,553-Speed 5645.62 samples/sec   Loss 5.5548   LearningRate 0.0321   Epoch: 8   Global Step: 49280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:48,385-Speed 5593.28 samples/sec   Loss 5.4346   LearningRate 0.0321   Epoch: 8   Global Step: 49290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:50,216-Speed 5593.10 samples/sec   Loss 5.4762   LearningRate 0.0321   Epoch: 8   Global Step: 49300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:52,060-Speed 5554.69 samples/sec   Loss 5.4166   LearningRate 0.0321   Epoch: 8   Global Step: 49310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:53,887-Speed 5608.86 samples/sec   Loss 5.3962   LearningRate 0.0321   Epoch: 8   Global Step: 49320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:55,719-Speed 5589.34 samples/sec   Loss 5.5963   LearningRate 0.0321   Epoch: 8   Global Step: 49330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:57,540-Speed 5628.06 samples/sec   Loss 5.5485   LearningRate 0.0321   Epoch: 8   Global Step: 49340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:33:59,354-Speed 5648.44 samples/sec   Loss 5.5033   LearningRate 0.0320   Epoch: 8   Global Step: 49350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:01,173-Speed 5630.87 samples/sec   Loss 5.4440   LearningRate 0.0320   Epoch: 8   Global Step: 49360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:02,996-Speed 5616.84 samples/sec   Loss 5.3957   LearningRate 0.0320   Epoch: 8   Global Step: 49370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:04,835-Speed 5569.81 samples/sec   Loss 5.4197   LearningRate 0.0320   Epoch: 8   Global Step: 49380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:34:06,651-Speed 5642.02 samples/sec   Loss 5.4283   LearningRate 0.0320   Epoch: 8   Global Step: 49390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:34:08,483-Speed 5592.88 samples/sec   Loss 5.4369   LearningRate 0.0320   Epoch: 8   Global Step: 49400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:34:10,315-Speed 5592.15 samples/sec   Loss 5.4304   LearningRate 0.0320   Epoch: 8   Global Step: 49410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:34:12,145-Speed 5596.94 samples/sec   Loss 5.4020   LearningRate 0.0320   Epoch: 8   Global Step: 49420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:34:13,982-Speed 5577.21 samples/sec   Loss 5.3845   LearningRate 0.0320   Epoch: 8   Global Step: 49430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:34:15,811-Speed 5598.04 samples/sec   Loss 5.3801   LearningRate 0.0320   Epoch: 8   Global Step: 49440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:34:17,668-Speed 5517.77 samples/sec   Loss 5.5353   LearningRate 0.0319   Epoch: 8   Global Step: 49450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:34:19,506-Speed 5572.93 samples/sec   Loss 5.4808   LearningRate 0.0319   Epoch: 8   Global Step: 49460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:34:21,328-Speed 5621.29 samples/sec   Loss 5.4104   LearningRate 0.0319   Epoch: 8   Global Step: 49470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:34:23,137-Speed 5662.33 samples/sec   Loss 5.4299   LearningRate 0.0319   Epoch: 8   Global Step: 49480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:34:24,943-Speed 5670.43 samples/sec   Loss 5.4176   LearningRate 0.0319   Epoch: 8   Global Step: 49490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:26,771-Speed 5604.96 samples/sec   Loss 5.5145   LearningRate 0.0319   Epoch: 8   Global Step: 49500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:28,599-Speed 5602.84 samples/sec   Loss 5.4537   LearningRate 0.0319   Epoch: 8   Global Step: 49510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:30,426-Speed 5608.31 samples/sec   Loss 5.3437   LearningRate 0.0319   Epoch: 8   Global Step: 49520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:32,278-Speed 5531.29 samples/sec   Loss 5.4381   LearningRate 0.0319   Epoch: 8   Global Step: 49530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:34,105-Speed 5606.60 samples/sec   Loss 5.4901   LearningRate 0.0319   Epoch: 8   Global Step: 49540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:35,979-Speed 5464.75 samples/sec   Loss 5.6222   LearningRate 0.0318   Epoch: 8   Global Step: 49550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:37,820-Speed 5564.77 samples/sec   Loss 5.5388   LearningRate 0.0318   Epoch: 8   Global Step: 49560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:39,650-Speed 5599.25 samples/sec   Loss 5.3307   LearningRate 0.0318   Epoch: 8   Global Step: 49570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:41,482-Speed 5590.24 samples/sec   Loss 5.4677   LearningRate 0.0318   Epoch: 8   Global Step: 49580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:43,330-Speed 5544.54 samples/sec   Loss 5.5612   LearningRate 0.0318   Epoch: 8   Global Step: 49590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:34:45,153-Speed 5619.51 samples/sec   Loss 5.5055   LearningRate 0.0318   Epoch: 8   Global Step: 49600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:34:46,982-Speed 5599.80 samples/sec   Loss 5.4930   LearningRate 0.0318   Epoch: 8   Global Step: 49610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:48,815-Speed 5612.85 samples/sec   Loss 5.3628   LearningRate 0.0318   Epoch: 8   Global Step: 49620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:50,661-Speed 5548.16 samples/sec   Loss 5.4220   LearningRate 0.0318   Epoch: 8   Global Step: 49630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:52,510-Speed 5541.45 samples/sec   Loss 5.4133   LearningRate 0.0318   Epoch: 8   Global Step: 49640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:54,358-Speed 5541.53 samples/sec   Loss 5.5002   LearningRate 0.0317   Epoch: 8   Global Step: 49650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:56,193-Speed 5583.75 samples/sec   Loss 5.5488   LearningRate 0.0317   Epoch: 8   Global Step: 49660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:58,026-Speed 5586.55 samples/sec   Loss 5.4351   LearningRate 0.0317   Epoch: 8   Global Step: 49670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:34:59,853-Speed 5606.46 samples/sec   Loss 5.3910   LearningRate 0.0317   Epoch: 8   Global Step: 49680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:01,720-Speed 5487.83 samples/sec   Loss 5.3769   LearningRate 0.0317   Epoch: 8   Global Step: 49690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:03,562-Speed 5562.85 samples/sec   Loss 5.4538   LearningRate 0.0317   Epoch: 8   Global Step: 49700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:05,417-Speed 5521.65 samples/sec   Loss 5.4421   LearningRate 0.0317   Epoch: 8   Global Step: 49710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:35:07,228-Speed 5655.18 samples/sec   Loss 5.4350   LearningRate 0.0317   Epoch: 8   Global Step: 49720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:35:09,068-Speed 5567.29 samples/sec   Loss 5.4920   LearningRate 0.0317   Epoch: 8   Global Step: 49730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:35:10,902-Speed 5587.20 samples/sec   Loss 5.4707   LearningRate 0.0317   Epoch: 8   Global Step: 49740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:35:12,713-Speed 5653.63 samples/sec   Loss 5.4167   LearningRate 0.0316   Epoch: 8   Global Step: 49750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:14,550-Speed 5577.45 samples/sec   Loss 5.3494   LearningRate 0.0316   Epoch: 8   Global Step: 49760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:16,391-Speed 5563.59 samples/sec   Loss 5.3736   LearningRate 0.0316   Epoch: 8   Global Step: 49770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:18,221-Speed 5597.15 samples/sec   Loss 5.5987   LearningRate 0.0316   Epoch: 8   Global Step: 49780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:20,056-Speed 5583.16 samples/sec   Loss 5.3538   LearningRate 0.0316   Epoch: 8   Global Step: 49790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:21,915-Speed 5510.19 samples/sec   Loss 5.4978   LearningRate 0.0316   Epoch: 8   Global Step: 49800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:23,755-Speed 5568.84 samples/sec   Loss 5.3892   LearningRate 0.0316   Epoch: 8   Global Step: 49810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:25,585-Speed 5596.28 samples/sec   Loss 5.4754   LearningRate 0.0316   Epoch: 8   Global Step: 49820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:27,419-Speed 5587.26 samples/sec   Loss 5.3966   LearningRate 0.0316   Epoch: 8   Global Step: 49830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:29,257-Speed 5572.13 samples/sec   Loss 5.4599   LearningRate 0.0316   Epoch: 8   Global Step: 49840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:31,086-Speed 5601.18 samples/sec   Loss 5.3857   LearningRate 0.0315   Epoch: 8   Global Step: 49850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:35:32,895-Speed 5660.46 samples/sec   Loss 5.4170   LearningRate 0.0315   Epoch: 8   Global Step: 49860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:34,720-Speed 5612.57 samples/sec   Loss 5.3616   LearningRate 0.0315   Epoch: 8   Global Step: 49870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:36,559-Speed 5569.92 samples/sec   Loss 5.3792   LearningRate 0.0315   Epoch: 8   Global Step: 49880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:38,374-Speed 5644.48 samples/sec   Loss 5.4574   LearningRate 0.0315   Epoch: 8   Global Step: 49890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:40,215-Speed 5563.96 samples/sec   Loss 5.4246   LearningRate 0.0315   Epoch: 8   Global Step: 49900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:42,037-Speed 5621.13 samples/sec   Loss 5.5511   LearningRate 0.0315   Epoch: 8   Global Step: 49910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:43,868-Speed 5596.77 samples/sec   Loss 5.2946   LearningRate 0.0315   Epoch: 8   Global Step: 49920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:45,715-Speed 5545.96 samples/sec   Loss 5.5551   LearningRate 0.0315   Epoch: 8   Global Step: 49930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:47,575-Speed 5506.90 samples/sec   Loss 5.4364   LearningRate 0.0315   Epoch: 8   Global Step: 49940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:49,423-Speed 5542.80 samples/sec   Loss 5.5040   LearningRate 0.0314   Epoch: 8   Global Step: 49950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:35:51,247-Speed 5615.56 samples/sec   Loss 5.3832   LearningRate 0.0314   Epoch: 8   Global Step: 49960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:35:53,061-Speed 5646.38 samples/sec   Loss 5.3827   LearningRate 0.0314   Epoch: 8   Global Step: 49970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:35:54,886-Speed 5613.59 samples/sec   Loss 5.3545   LearningRate 0.0314   Epoch: 8   Global Step: 49980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:35:56,708-Speed 5623.45 samples/sec   Loss 5.3142   LearningRate 0.0314   Epoch: 8   Global Step: 49990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:35:58,510-Speed 5684.74 samples/sec   Loss 5.4579   LearningRate 0.0314   Epoch: 8   Global Step: 50000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:36:24,607-[lfw][50000]XNorm: 22.979849
Training: 2022-04-27 04:36:24,607-[lfw][50000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-04-27 04:36:24,608-[lfw][50000]Accuracy-Highest: 0.99800
Training: 2022-04-27 04:36:54,814-[cfp_fp][50000]XNorm: 19.953626
Training: 2022-04-27 04:36:54,815-[cfp_fp][50000]Accuracy-Flip: 0.94829+-0.01188
Training: 2022-04-27 04:36:54,815-[cfp_fp][50000]Accuracy-Highest: 0.95257
Training: 2022-04-27 04:37:20,844-[agedb_30][50000]XNorm: 22.859662
Training: 2022-04-27 04:37:20,845-[agedb_30][50000]Accuracy-Flip: 0.97217+-0.00820
Training: 2022-04-27 04:37:20,845-[agedb_30][50000]Accuracy-Highest: 0.97383
Training: 2022-04-27 04:37:22,698-Speed 121.63 samples/sec   Loss 5.3863   LearningRate 0.0314   Epoch: 8   Global Step: 50010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:24,511-Speed 5651.59 samples/sec   Loss 5.4194   LearningRate 0.0314   Epoch: 8   Global Step: 50020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:26,326-Speed 5643.63 samples/sec   Loss 5.3486   LearningRate 0.0314   Epoch: 8   Global Step: 50030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:28,129-Speed 5679.75 samples/sec   Loss 5.4279   LearningRate 0.0314   Epoch: 8   Global Step: 50040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:29,941-Speed 5654.87 samples/sec   Loss 5.5469   LearningRate 0.0313   Epoch: 8   Global Step: 50050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:31,787-Speed 5549.76 samples/sec   Loss 5.5559   LearningRate 0.0313   Epoch: 8   Global Step: 50060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:33,612-Speed 5612.05 samples/sec   Loss 5.3708   LearningRate 0.0313   Epoch: 8   Global Step: 50070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:35,441-Speed 5598.84 samples/sec   Loss 5.3564   LearningRate 0.0313   Epoch: 8   Global Step: 50080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:37,269-Speed 5605.65 samples/sec   Loss 5.3995   LearningRate 0.0313   Epoch: 8   Global Step: 50090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:39,072-Speed 5678.76 samples/sec   Loss 5.2726   LearningRate 0.0313   Epoch: 8   Global Step: 50100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:40,905-Speed 5587.54 samples/sec   Loss 5.4320   LearningRate 0.0313   Epoch: 8   Global Step: 50110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:42,802-Speed 5402.37 samples/sec   Loss 5.4034   LearningRate 0.0313   Epoch: 8   Global Step: 50120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:44,690-Speed 5425.94 samples/sec   Loss 5.4454   LearningRate 0.0313   Epoch: 8   Global Step: 50130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:46,500-Speed 5660.68 samples/sec   Loss 5.5323   LearningRate 0.0313   Epoch: 8   Global Step: 50140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:48,332-Speed 5590.81 samples/sec   Loss 5.4445   LearningRate 0.0312   Epoch: 8   Global Step: 50150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:50,197-Speed 5491.17 samples/sec   Loss 5.3892   LearningRate 0.0312   Epoch: 8   Global Step: 50160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:52,021-Speed 5617.96 samples/sec   Loss 5.4884   LearningRate 0.0312   Epoch: 8   Global Step: 50170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:53,860-Speed 5569.11 samples/sec   Loss 5.3968   LearningRate 0.0312   Epoch: 8   Global Step: 50180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:55,707-Speed 5546.44 samples/sec   Loss 5.4953   LearningRate 0.0312   Epoch: 8   Global Step: 50190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:37:57,541-Speed 5585.83 samples/sec   Loss 5.4336   LearningRate 0.0312   Epoch: 8   Global Step: 50200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:37:59,362-Speed 5625.18 samples/sec   Loss 5.5659   LearningRate 0.0312   Epoch: 8   Global Step: 50210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:38:01,179-Speed 5636.46 samples/sec   Loss 5.4281   LearningRate 0.0312   Epoch: 8   Global Step: 50220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:03,021-Speed 5559.79 samples/sec   Loss 5.4386   LearningRate 0.0312   Epoch: 8   Global Step: 50230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:04,846-Speed 5613.81 samples/sec   Loss 5.4537   LearningRate 0.0312   Epoch: 8   Global Step: 50240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:06,653-Speed 5669.00 samples/sec   Loss 5.3575   LearningRate 0.0312   Epoch: 8   Global Step: 50250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:08,461-Speed 5666.26 samples/sec   Loss 5.4696   LearningRate 0.0311   Epoch: 8   Global Step: 50260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:10,280-Speed 5632.56 samples/sec   Loss 5.3360   LearningRate 0.0311   Epoch: 8   Global Step: 50270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:12,108-Speed 5604.00 samples/sec   Loss 5.4463   LearningRate 0.0311   Epoch: 8   Global Step: 50280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:13,920-Speed 5651.16 samples/sec   Loss 5.4592   LearningRate 0.0311   Epoch: 8   Global Step: 50290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:15,770-Speed 5535.99 samples/sec   Loss 5.3139   LearningRate 0.0311   Epoch: 8   Global Step: 50300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:17,594-Speed 5617.98 samples/sec   Loss 5.5668   LearningRate 0.0311   Epoch: 8   Global Step: 50310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:19,411-Speed 5635.73 samples/sec   Loss 5.3348   LearningRate 0.0311   Epoch: 8   Global Step: 50320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:21,227-Speed 5640.99 samples/sec   Loss 5.2993   LearningRate 0.0311   Epoch: 8   Global Step: 50330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:23,056-Speed 5600.64 samples/sec   Loss 5.3484   LearningRate 0.0311   Epoch: 8   Global Step: 50340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:24,889-Speed 5589.11 samples/sec   Loss 5.3566   LearningRate 0.0311   Epoch: 8   Global Step: 50350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:26,727-Speed 5574.37 samples/sec   Loss 5.4318   LearningRate 0.0310   Epoch: 8   Global Step: 50360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:28,574-Speed 5543.62 samples/sec   Loss 5.4311   LearningRate 0.0310   Epoch: 8   Global Step: 50370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:30,422-Speed 5543.87 samples/sec   Loss 5.3672   LearningRate 0.0310   Epoch: 8   Global Step: 50380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:32,269-Speed 5547.89 samples/sec   Loss 5.4005   LearningRate 0.0310   Epoch: 8   Global Step: 50390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:34,112-Speed 5557.88 samples/sec   Loss 5.3204   LearningRate 0.0310   Epoch: 8   Global Step: 50400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:35,927-Speed 5642.49 samples/sec   Loss 5.3393   LearningRate 0.0310   Epoch: 8   Global Step: 50410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:37,749-Speed 5622.21 samples/sec   Loss 5.4126   LearningRate 0.0310   Epoch: 8   Global Step: 50420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:38:39,579-Speed 5597.95 samples/sec   Loss 5.3833   LearningRate 0.0310   Epoch: 8   Global Step: 50430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:38:41,405-Speed 5609.34 samples/sec   Loss 5.3308   LearningRate 0.0310   Epoch: 8   Global Step: 50440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:43,263-Speed 5511.49 samples/sec   Loss 5.4923   LearningRate 0.0310   Epoch: 8   Global Step: 50450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:45,110-Speed 5548.25 samples/sec   Loss 5.3890   LearningRate 0.0309   Epoch: 8   Global Step: 50460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:46,942-Speed 5590.55 samples/sec   Loss 5.3680   LearningRate 0.0309   Epoch: 8   Global Step: 50470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:48,777-Speed 5583.44 samples/sec   Loss 5.4017   LearningRate 0.0309   Epoch: 8   Global Step: 50480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:50,628-Speed 5533.07 samples/sec   Loss 5.4052   LearningRate 0.0309   Epoch: 8   Global Step: 50490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:52,454-Speed 5610.62 samples/sec   Loss 5.3680   LearningRate 0.0309   Epoch: 8   Global Step: 50500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:54,283-Speed 5601.50 samples/sec   Loss 5.3596   LearningRate 0.0309   Epoch: 8   Global Step: 50510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:56,096-Speed 5650.18 samples/sec   Loss 5.4369   LearningRate 0.0309   Epoch: 8   Global Step: 50520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:57,906-Speed 5659.18 samples/sec   Loss 5.4286   LearningRate 0.0309   Epoch: 8   Global Step: 50530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:38:59,729-Speed 5619.92 samples/sec   Loss 5.5593   LearningRate 0.0309   Epoch: 8   Global Step: 50540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:39:01,553-Speed 5614.19 samples/sec   Loss 5.4793   LearningRate 0.0309   Epoch: 8   Global Step: 50550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:03,391-Speed 5576.26 samples/sec   Loss 5.3625   LearningRate 0.0308   Epoch: 8   Global Step: 50560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:05,211-Speed 5626.86 samples/sec   Loss 5.2587   LearningRate 0.0308   Epoch: 8   Global Step: 50570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:07,053-Speed 5562.04 samples/sec   Loss 5.4226   LearningRate 0.0308   Epoch: 8   Global Step: 50580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:08,872-Speed 5629.23 samples/sec   Loss 5.2951   LearningRate 0.0308   Epoch: 8   Global Step: 50590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:10,685-Speed 5651.30 samples/sec   Loss 5.3812   LearningRate 0.0308   Epoch: 8   Global Step: 50600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:12,504-Speed 5630.48 samples/sec   Loss 5.3307   LearningRate 0.0308   Epoch: 8   Global Step: 50610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:14,315-Speed 5655.41 samples/sec   Loss 5.5338   LearningRate 0.0308   Epoch: 8   Global Step: 50620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:16,142-Speed 5608.41 samples/sec   Loss 5.2697   LearningRate 0.0308   Epoch: 8   Global Step: 50630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:17,972-Speed 5596.98 samples/sec   Loss 5.4370   LearningRate 0.0308   Epoch: 8   Global Step: 50640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:19,787-Speed 5644.66 samples/sec   Loss 5.4028   LearningRate 0.0308   Epoch: 8   Global Step: 50650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:39:21,596-Speed 5662.06 samples/sec   Loss 5.4029   LearningRate 0.0307   Epoch: 8   Global Step: 50660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:23,414-Speed 5634.04 samples/sec   Loss 5.2746   LearningRate 0.0307   Epoch: 8   Global Step: 50670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:25,246-Speed 5592.25 samples/sec   Loss 5.4549   LearningRate 0.0307   Epoch: 8   Global Step: 50680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:27,069-Speed 5619.42 samples/sec   Loss 5.3344   LearningRate 0.0307   Epoch: 8   Global Step: 50690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:28,912-Speed 5556.52 samples/sec   Loss 5.4566   LearningRate 0.0307   Epoch: 8   Global Step: 50700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:30,733-Speed 5625.76 samples/sec   Loss 5.4359   LearningRate 0.0307   Epoch: 8   Global Step: 50710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:32,558-Speed 5613.97 samples/sec   Loss 5.4387   LearningRate 0.0307   Epoch: 8   Global Step: 50720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:34,377-Speed 5632.01 samples/sec   Loss 5.3326   LearningRate 0.0307   Epoch: 8   Global Step: 50730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:36,194-Speed 5635.62 samples/sec   Loss 5.4226   LearningRate 0.0307   Epoch: 8   Global Step: 50740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:38,017-Speed 5619.82 samples/sec   Loss 5.2701   LearningRate 0.0307   Epoch: 8   Global Step: 50750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:39,832-Speed 5643.30 samples/sec   Loss 5.3001   LearningRate 0.0307   Epoch: 8   Global Step: 50760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:39:41,661-Speed 5600.80 samples/sec   Loss 5.4062   LearningRate 0.0306   Epoch: 8   Global Step: 50770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:39:43,492-Speed 5594.06 samples/sec   Loss 5.4216   LearningRate 0.0306   Epoch: 8   Global Step: 50780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:39:45,330-Speed 5572.83 samples/sec   Loss 5.4201   LearningRate 0.0306   Epoch: 8   Global Step: 50790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:39:47,164-Speed 5585.34 samples/sec   Loss 5.3020   LearningRate 0.0306   Epoch: 8   Global Step: 50800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:39:49,010-Speed 5549.78 samples/sec   Loss 5.3246   LearningRate 0.0306   Epoch: 8   Global Step: 50810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:39:50,844-Speed 5586.48 samples/sec   Loss 5.4324   LearningRate 0.0306   Epoch: 8   Global Step: 50820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:39:52,680-Speed 5578.73 samples/sec   Loss 5.3660   LearningRate 0.0306   Epoch: 8   Global Step: 50830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:39:54,483-Speed 5680.84 samples/sec   Loss 5.4202   LearningRate 0.0306   Epoch: 8   Global Step: 50840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:56,305-Speed 5622.62 samples/sec   Loss 5.3751   LearningRate 0.0306   Epoch: 8   Global Step: 50850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:58,164-Speed 5509.71 samples/sec   Loss 5.2943   LearningRate 0.0306   Epoch: 8   Global Step: 50860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:39:59,979-Speed 5644.51 samples/sec   Loss 5.2993   LearningRate 0.0305   Epoch: 8   Global Step: 50870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:01,866-Speed 5426.37 samples/sec   Loss 5.3432   LearningRate 0.0305   Epoch: 8   Global Step: 50880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:03,693-Speed 5607.90 samples/sec   Loss 5.3808   LearningRate 0.0305   Epoch: 8   Global Step: 50890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:05,534-Speed 5564.16 samples/sec   Loss 5.3548   LearningRate 0.0305   Epoch: 8   Global Step: 50900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:07,389-Speed 5522.29 samples/sec   Loss 5.5173   LearningRate 0.0305   Epoch: 8   Global Step: 50910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:09,238-Speed 5539.30 samples/sec   Loss 5.4439   LearningRate 0.0305   Epoch: 8   Global Step: 50920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:11,059-Speed 5625.48 samples/sec   Loss 5.2632   LearningRate 0.0305   Epoch: 8   Global Step: 50930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:12,896-Speed 5576.43 samples/sec   Loss 5.2454   LearningRate 0.0305   Epoch: 8   Global Step: 50940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:40:14,708-Speed 5651.66 samples/sec   Loss 5.3613   LearningRate 0.0305   Epoch: 8   Global Step: 50950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:16,532-Speed 5620.68 samples/sec   Loss 5.4388   LearningRate 0.0305   Epoch: 8   Global Step: 50960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:18,354-Speed 5621.92 samples/sec   Loss 5.3620   LearningRate 0.0304   Epoch: 8   Global Step: 50970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:20,188-Speed 5585.22 samples/sec   Loss 5.4182   LearningRate 0.0304   Epoch: 8   Global Step: 50980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:22,019-Speed 5591.85 samples/sec   Loss 5.2460   LearningRate 0.0304   Epoch: 8   Global Step: 50990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:23,852-Speed 5589.13 samples/sec   Loss 5.3245   LearningRate 0.0304   Epoch: 8   Global Step: 51000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:25,685-Speed 5590.73 samples/sec   Loss 5.1971   LearningRate 0.0304   Epoch: 8   Global Step: 51010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:27,494-Speed 5661.57 samples/sec   Loss 5.3261   LearningRate 0.0304   Epoch: 8   Global Step: 51020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:29,316-Speed 5622.10 samples/sec   Loss 5.2841   LearningRate 0.0304   Epoch: 8   Global Step: 51030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:31,148-Speed 5591.81 samples/sec   Loss 5.1495   LearningRate 0.0304   Epoch: 8   Global Step: 51040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:32,974-Speed 5609.84 samples/sec   Loss 5.2480   LearningRate 0.0304   Epoch: 8   Global Step: 51050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:40:34,807-Speed 5588.87 samples/sec   Loss 5.4196   LearningRate 0.0304   Epoch: 8   Global Step: 51060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:40:36,644-Speed 5574.95 samples/sec   Loss 5.4364   LearningRate 0.0304   Epoch: 8   Global Step: 51070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:40:38,470-Speed 5609.11 samples/sec   Loss 5.3406   LearningRate 0.0303   Epoch: 8   Global Step: 51080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:40:40,291-Speed 5627.21 samples/sec   Loss 5.3413   LearningRate 0.0303   Epoch: 8   Global Step: 51090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:40:42,112-Speed 5622.71 samples/sec   Loss 5.3687   LearningRate 0.0303   Epoch: 8   Global Step: 51100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:43,929-Speed 5639.12 samples/sec   Loss 5.2841   LearningRate 0.0303   Epoch: 8   Global Step: 51110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:45,763-Speed 5584.63 samples/sec   Loss 5.3601   LearningRate 0.0303   Epoch: 8   Global Step: 51120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:47,576-Speed 5651.82 samples/sec   Loss 5.2811   LearningRate 0.0303   Epoch: 8   Global Step: 51130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:49,412-Speed 5580.47 samples/sec   Loss 5.3649   LearningRate 0.0303   Epoch: 8   Global Step: 51140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:51,270-Speed 5511.59 samples/sec   Loss 5.2257   LearningRate 0.0303   Epoch: 8   Global Step: 51150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:53,117-Speed 5546.83 samples/sec   Loss 5.3066   LearningRate 0.0303   Epoch: 8   Global Step: 51160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:40:54,999-Speed 5442.72 samples/sec   Loss 5.4113   LearningRate 0.0303   Epoch: 8   Global Step: 51170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:41:08,858-Speed 738.93 samples/sec   Loss 4.9760   LearningRate 0.0302   Epoch: 9   Global Step: 51180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:41:10,695-Speed 5575.84 samples/sec   Loss 4.6893   LearningRate 0.0302   Epoch: 9   Global Step: 51190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:41:12,566-Speed 5476.39 samples/sec   Loss 4.6158   LearningRate 0.0302   Epoch: 9   Global Step: 51200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:41:14,413-Speed 5546.99 samples/sec   Loss 4.6266   LearningRate 0.0302   Epoch: 9   Global Step: 51210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:41:16,240-Speed 5606.17 samples/sec   Loss 4.7044   LearningRate 0.0302   Epoch: 9   Global Step: 51220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:41:18,059-Speed 5632.35 samples/sec   Loss 4.6392   LearningRate 0.0302   Epoch: 9   Global Step: 51230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:41:19,884-Speed 5612.98 samples/sec   Loss 4.7309   LearningRate 0.0302   Epoch: 9   Global Step: 51240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:41:21,709-Speed 5612.83 samples/sec   Loss 4.6451   LearningRate 0.0302   Epoch: 9   Global Step: 51250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:41:23,525-Speed 5641.45 samples/sec   Loss 4.7851   LearningRate 0.0302   Epoch: 9   Global Step: 51260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:41:25,413-Speed 5423.67 samples/sec   Loss 4.8202   LearningRate 0.0302   Epoch: 9   Global Step: 51270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:41:27,265-Speed 5534.53 samples/sec   Loss 4.7141   LearningRate 0.0301   Epoch: 9   Global Step: 51280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:41:29,072-Speed 5667.52 samples/sec   Loss 4.7347   LearningRate 0.0301   Epoch: 9   Global Step: 51290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:41:30,905-Speed 5590.17 samples/sec   Loss 4.5634   LearningRate 0.0301   Epoch: 9   Global Step: 51300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:41:32,723-Speed 5632.01 samples/sec   Loss 4.8200   LearningRate 0.0301   Epoch: 9   Global Step: 51310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:41:34,661-Speed 5287.16 samples/sec   Loss 4.7941   LearningRate 0.0301   Epoch: 9   Global Step: 51320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:41:36,490-Speed 5599.99 samples/sec   Loss 4.8365   LearningRate 0.0301   Epoch: 9   Global Step: 51330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:41:38,305-Speed 5645.55 samples/sec   Loss 4.7886   LearningRate 0.0301   Epoch: 9   Global Step: 51340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:41:40,147-Speed 5558.94 samples/sec   Loss 4.8056   LearningRate 0.0301   Epoch: 9   Global Step: 51350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:41:41,981-Speed 5586.25 samples/sec   Loss 4.6969   LearningRate 0.0301   Epoch: 9   Global Step: 51360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:41:43,835-Speed 5527.45 samples/sec   Loss 4.9149   LearningRate 0.0301   Epoch: 9   Global Step: 51370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:41:45,726-Speed 5416.17 samples/sec   Loss 4.8875   LearningRate 0.0301   Epoch: 9   Global Step: 51380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:41:47,592-Speed 5489.55 samples/sec   Loss 4.8636   LearningRate 0.0300   Epoch: 9   Global Step: 51390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:41:49,431-Speed 5570.78 samples/sec   Loss 4.8001   LearningRate 0.0300   Epoch: 9   Global Step: 51400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:41:51,286-Speed 5520.76 samples/sec   Loss 4.8095   LearningRate 0.0300   Epoch: 9   Global Step: 51410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:41:53,136-Speed 5535.96 samples/sec   Loss 4.8137   LearningRate 0.0300   Epoch: 9   Global Step: 51420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:41:54,967-Speed 5594.63 samples/sec   Loss 4.7680   LearningRate 0.0300   Epoch: 9   Global Step: 51430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:41:56,843-Speed 5461.90 samples/sec   Loss 4.9782   LearningRate 0.0300   Epoch: 9   Global Step: 51440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:41:58,677-Speed 5584.17 samples/sec   Loss 5.0541   LearningRate 0.0300   Epoch: 9   Global Step: 51450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:42:00,508-Speed 5596.32 samples/sec   Loss 4.8309   LearningRate 0.0300   Epoch: 9   Global Step: 51460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:02,327-Speed 5629.98 samples/sec   Loss 4.9375   LearningRate 0.0300   Epoch: 9   Global Step: 51470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:04,147-Speed 5630.65 samples/sec   Loss 4.8000   LearningRate 0.0300   Epoch: 9   Global Step: 51480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:05,958-Speed 5654.62 samples/sec   Loss 4.8743   LearningRate 0.0299   Epoch: 9   Global Step: 51490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:07,773-Speed 5643.96 samples/sec   Loss 4.7847   LearningRate 0.0299   Epoch: 9   Global Step: 51500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:09,608-Speed 5580.90 samples/sec   Loss 4.8383   LearningRate 0.0299   Epoch: 9   Global Step: 51510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:11,417-Speed 5663.49 samples/sec   Loss 4.9224   LearningRate 0.0299   Epoch: 9   Global Step: 51520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:13,239-Speed 5623.78 samples/sec   Loss 4.8888   LearningRate 0.0299   Epoch: 9   Global Step: 51530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:15,082-Speed 5554.94 samples/sec   Loss 4.9075   LearningRate 0.0299   Epoch: 9   Global Step: 51540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:16,904-Speed 5625.76 samples/sec   Loss 4.9380   LearningRate 0.0299   Epoch: 9   Global Step: 51550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:18,722-Speed 5634.06 samples/sec   Loss 4.9158   LearningRate 0.0299   Epoch: 9   Global Step: 51560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:42:20,542-Speed 5626.08 samples/sec   Loss 4.8384   LearningRate 0.0299   Epoch: 9   Global Step: 51570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:42:22,381-Speed 5571.59 samples/sec   Loss 5.0186   LearningRate 0.0299   Epoch: 9   Global Step: 51580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:42:24,193-Speed 5653.73 samples/sec   Loss 4.7985   LearningRate 0.0298   Epoch: 9   Global Step: 51590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:42:26,006-Speed 5648.82 samples/sec   Loss 4.9126   LearningRate 0.0298   Epoch: 9   Global Step: 51600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:42:27,853-Speed 5545.89 samples/sec   Loss 4.8782   LearningRate 0.0298   Epoch: 9   Global Step: 51610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:42:29,682-Speed 5599.93 samples/sec   Loss 5.0145   LearningRate 0.0298   Epoch: 9   Global Step: 51620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:31,611-Speed 5311.17 samples/sec   Loss 4.9973   LearningRate 0.0298   Epoch: 9   Global Step: 51630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:33,543-Speed 5302.85 samples/sec   Loss 4.9620   LearningRate 0.0298   Epoch: 9   Global Step: 51640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:35,396-Speed 5526.00 samples/sec   Loss 5.0548   LearningRate 0.0298   Epoch: 9   Global Step: 51650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:37,236-Speed 5567.00 samples/sec   Loss 5.0616   LearningRate 0.0298   Epoch: 9   Global Step: 51660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:39,059-Speed 5619.55 samples/sec   Loss 4.8961   LearningRate 0.0298   Epoch: 9   Global Step: 51670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:40,893-Speed 5585.79 samples/sec   Loss 4.9679   LearningRate 0.0298   Epoch: 9   Global Step: 51680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:42,705-Speed 5651.68 samples/sec   Loss 4.9739   LearningRate 0.0298   Epoch: 9   Global Step: 51690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:44,540-Speed 5582.40 samples/sec   Loss 5.0947   LearningRate 0.0297   Epoch: 9   Global Step: 51700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:46,354-Speed 5647.73 samples/sec   Loss 5.0775   LearningRate 0.0297   Epoch: 9   Global Step: 51710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:42:48,181-Speed 5608.60 samples/sec   Loss 5.0324   LearningRate 0.0297   Epoch: 9   Global Step: 51720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:42:49,992-Speed 5655.59 samples/sec   Loss 4.8678   LearningRate 0.0297   Epoch: 9   Global Step: 51730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:42:51,828-Speed 5580.02 samples/sec   Loss 4.9340   LearningRate 0.0297   Epoch: 9   Global Step: 51740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:42:53,651-Speed 5618.43 samples/sec   Loss 4.9854   LearningRate 0.0297   Epoch: 9   Global Step: 51750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:42:55,482-Speed 5593.64 samples/sec   Loss 4.9198   LearningRate 0.0297   Epoch: 9   Global Step: 51760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:42:57,328-Speed 5550.00 samples/sec   Loss 5.0475   LearningRate 0.0297   Epoch: 9   Global Step: 51770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:42:59,160-Speed 5590.00 samples/sec   Loss 4.9629   LearningRate 0.0297   Epoch: 9   Global Step: 51780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:43:00,998-Speed 5572.01 samples/sec   Loss 5.0415   LearningRate 0.0297   Epoch: 9   Global Step: 51790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:43:02,837-Speed 5571.42 samples/sec   Loss 4.8764   LearningRate 0.0296   Epoch: 9   Global Step: 51800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:43:04,660-Speed 5619.43 samples/sec   Loss 4.9035   LearningRate 0.0296   Epoch: 9   Global Step: 51810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:43:06,477-Speed 5637.47 samples/sec   Loss 5.0495   LearningRate 0.0296   Epoch: 9   Global Step: 51820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:43:08,295-Speed 5635.84 samples/sec   Loss 5.0537   LearningRate 0.0296   Epoch: 9   Global Step: 51830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:43:10,099-Speed 5677.66 samples/sec   Loss 4.9587   LearningRate 0.0296   Epoch: 9   Global Step: 51840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:43:11,917-Speed 5635.06 samples/sec   Loss 5.0261   LearningRate 0.0296   Epoch: 9   Global Step: 51850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:43:13,737-Speed 5625.27 samples/sec   Loss 4.9744   LearningRate 0.0296   Epoch: 9   Global Step: 51860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:43:15,569-Speed 5591.94 samples/sec   Loss 4.9787   LearningRate 0.0296   Epoch: 9   Global Step: 51870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:43:17,388-Speed 5632.19 samples/sec   Loss 4.9050   LearningRate 0.0296   Epoch: 9   Global Step: 51880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:43:19,211-Speed 5619.35 samples/sec   Loss 4.8759   LearningRate 0.0296   Epoch: 9   Global Step: 51890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:43:21,047-Speed 5580.07 samples/sec   Loss 4.9484   LearningRate 0.0296   Epoch: 9   Global Step: 51900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:43:22,886-Speed 5567.72 samples/sec   Loss 5.0528   LearningRate 0.0295   Epoch: 9   Global Step: 51910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:43:24,707-Speed 5626.67 samples/sec   Loss 5.0756   LearningRate 0.0295   Epoch: 9   Global Step: 51920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:43:26,533-Speed 5608.87 samples/sec   Loss 5.1066   LearningRate 0.0295   Epoch: 9   Global Step: 51930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:43:28,357-Speed 5616.71 samples/sec   Loss 5.0250   LearningRate 0.0295   Epoch: 9   Global Step: 51940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:43:30,195-Speed 5573.91 samples/sec   Loss 5.0733   LearningRate 0.0295   Epoch: 9   Global Step: 51950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:43:32,045-Speed 5536.33 samples/sec   Loss 5.0063   LearningRate 0.0295   Epoch: 9   Global Step: 51960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:43:33,887-Speed 5563.05 samples/sec   Loss 5.0681   LearningRate 0.0295   Epoch: 9   Global Step: 51970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:43:35,719-Speed 5591.40 samples/sec   Loss 5.1030   LearningRate 0.0295   Epoch: 9   Global Step: 51980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:43:37,562-Speed 5558.34 samples/sec   Loss 4.9820   LearningRate 0.0295   Epoch: 9   Global Step: 51990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:43:39,389-Speed 5604.11 samples/sec   Loss 5.0993   LearningRate 0.0295   Epoch: 9   Global Step: 52000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:44:05,731-[lfw][52000]XNorm: 21.763303
Training: 2022-04-27 04:44:05,731-[lfw][52000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-27 04:44:05,732-[lfw][52000]Accuracy-Highest: 0.99800
Training: 2022-04-27 04:44:36,233-[cfp_fp][52000]XNorm: 19.181753
Training: 2022-04-27 04:44:36,233-[cfp_fp][52000]Accuracy-Flip: 0.94700+-0.01251
Training: 2022-04-27 04:44:36,234-[cfp_fp][52000]Accuracy-Highest: 0.95257
Training: 2022-04-27 04:45:02,538-[agedb_30][52000]XNorm: 21.631263
Training: 2022-04-27 04:45:02,538-[agedb_30][52000]Accuracy-Flip: 0.97483+-0.00962
Training: 2022-04-27 04:45:02,539-[agedb_30][52000]Accuracy-Highest: 0.97483
Training: 2022-04-27 04:45:04,366-Speed 120.50 samples/sec   Loss 5.0092   LearningRate 0.0294   Epoch: 9   Global Step: 52010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:45:06,164-Speed 5697.45 samples/sec   Loss 5.1457   LearningRate 0.0294   Epoch: 9   Global Step: 52020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:45:07,957-Speed 5713.16 samples/sec   Loss 5.0687   LearningRate 0.0294   Epoch: 9   Global Step: 52030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:09,767-Speed 5658.56 samples/sec   Loss 4.9335   LearningRate 0.0294   Epoch: 9   Global Step: 52040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:11,576-Speed 5664.72 samples/sec   Loss 5.0092   LearningRate 0.0294   Epoch: 9   Global Step: 52050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:13,389-Speed 5649.54 samples/sec   Loss 5.0749   LearningRate 0.0294   Epoch: 9   Global Step: 52060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:15,221-Speed 5591.00 samples/sec   Loss 4.9177   LearningRate 0.0294   Epoch: 9   Global Step: 52070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:17,029-Speed 5665.87 samples/sec   Loss 5.0457   LearningRate 0.0294   Epoch: 9   Global Step: 52080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:18,830-Speed 5685.04 samples/sec   Loss 5.1145   LearningRate 0.0294   Epoch: 9   Global Step: 52090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:20,648-Speed 5636.67 samples/sec   Loss 5.1562   LearningRate 0.0294   Epoch: 9   Global Step: 52100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:22,458-Speed 5657.32 samples/sec   Loss 4.9536   LearningRate 0.0294   Epoch: 9   Global Step: 52110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:24,276-Speed 5635.24 samples/sec   Loss 5.1652   LearningRate 0.0293   Epoch: 9   Global Step: 52120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:26,105-Speed 5599.45 samples/sec   Loss 5.0964   LearningRate 0.0293   Epoch: 9   Global Step: 52130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:45:27,921-Speed 5642.08 samples/sec   Loss 4.9298   LearningRate 0.0293   Epoch: 9   Global Step: 52140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:45:29,753-Speed 5592.30 samples/sec   Loss 4.9848   LearningRate 0.0293   Epoch: 9   Global Step: 52150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:45:31,564-Speed 5656.32 samples/sec   Loss 4.9214   LearningRate 0.0293   Epoch: 9   Global Step: 52160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:45:33,386-Speed 5622.12 samples/sec   Loss 5.1729   LearningRate 0.0293   Epoch: 9   Global Step: 52170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:45:35,213-Speed 5608.75 samples/sec   Loss 5.0291   LearningRate 0.0293   Epoch: 9   Global Step: 52180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:45:37,013-Speed 5688.30 samples/sec   Loss 5.0891   LearningRate 0.0293   Epoch: 9   Global Step: 52190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:38,825-Speed 5654.95 samples/sec   Loss 5.0576   LearningRate 0.0293   Epoch: 9   Global Step: 52200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:40,641-Speed 5640.47 samples/sec   Loss 4.9970   LearningRate 0.0293   Epoch: 9   Global Step: 52210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:42,456-Speed 5641.89 samples/sec   Loss 4.9697   LearningRate 0.0292   Epoch: 9   Global Step: 52220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:44,272-Speed 5639.76 samples/sec   Loss 4.9659   LearningRate 0.0292   Epoch: 9   Global Step: 52230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:46,084-Speed 5654.46 samples/sec   Loss 5.0322   LearningRate 0.0292   Epoch: 9   Global Step: 52240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:47,926-Speed 5561.36 samples/sec   Loss 5.1146   LearningRate 0.0292   Epoch: 9   Global Step: 52250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:49,740-Speed 5646.71 samples/sec   Loss 5.0940   LearningRate 0.0292   Epoch: 9   Global Step: 52260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:51,560-Speed 5628.51 samples/sec   Loss 5.0823   LearningRate 0.0292   Epoch: 9   Global Step: 52270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:53,403-Speed 5558.45 samples/sec   Loss 5.1748   LearningRate 0.0292   Epoch: 9   Global Step: 52280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:55,242-Speed 5568.94 samples/sec   Loss 5.0227   LearningRate 0.0292   Epoch: 9   Global Step: 52290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:45:57,072-Speed 5598.81 samples/sec   Loss 5.0463   LearningRate 0.0292   Epoch: 9   Global Step: 52300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:45:58,883-Speed 5654.73 samples/sec   Loss 5.1395   LearningRate 0.0292   Epoch: 9   Global Step: 52310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:00,699-Speed 5641.49 samples/sec   Loss 5.1602   LearningRate 0.0292   Epoch: 9   Global Step: 52320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:02,528-Speed 5601.07 samples/sec   Loss 5.0513   LearningRate 0.0291   Epoch: 9   Global Step: 52330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:04,345-Speed 5636.40 samples/sec   Loss 5.1694   LearningRate 0.0291   Epoch: 9   Global Step: 52340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:06,162-Speed 5637.72 samples/sec   Loss 5.2203   LearningRate 0.0291   Epoch: 9   Global Step: 52350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:07,982-Speed 5629.85 samples/sec   Loss 5.2048   LearningRate 0.0291   Epoch: 9   Global Step: 52360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:09,824-Speed 5560.22 samples/sec   Loss 5.0690   LearningRate 0.0291   Epoch: 9   Global Step: 52370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:11,653-Speed 5599.36 samples/sec   Loss 5.0238   LearningRate 0.0291   Epoch: 9   Global Step: 52380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:13,484-Speed 5593.44 samples/sec   Loss 5.1686   LearningRate 0.0291   Epoch: 9   Global Step: 52390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:15,329-Speed 5554.71 samples/sec   Loss 5.1190   LearningRate 0.0291   Epoch: 9   Global Step: 52400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:46:17,132-Speed 5681.33 samples/sec   Loss 5.1555   LearningRate 0.0291   Epoch: 9   Global Step: 52410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:18,942-Speed 5658.35 samples/sec   Loss 5.0489   LearningRate 0.0291   Epoch: 9   Global Step: 52420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:20,761-Speed 5632.07 samples/sec   Loss 4.9785   LearningRate 0.0290   Epoch: 9   Global Step: 52430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:22,569-Speed 5664.10 samples/sec   Loss 5.1449   LearningRate 0.0290   Epoch: 9   Global Step: 52440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:24,434-Speed 5493.14 samples/sec   Loss 5.1713   LearningRate 0.0290   Epoch: 9   Global Step: 52450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:26,307-Speed 5468.68 samples/sec   Loss 5.0289   LearningRate 0.0290   Epoch: 9   Global Step: 52460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:28,143-Speed 5579.53 samples/sec   Loss 5.1309   LearningRate 0.0290   Epoch: 9   Global Step: 52470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:29,962-Speed 5631.66 samples/sec   Loss 5.1909   LearningRate 0.0290   Epoch: 9   Global Step: 52480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:31,789-Speed 5608.82 samples/sec   Loss 5.0666   LearningRate 0.0290   Epoch: 9   Global Step: 52490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:33,602-Speed 5646.89 samples/sec   Loss 5.1546   LearningRate 0.0290   Epoch: 9   Global Step: 52500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:35,438-Speed 5581.90 samples/sec   Loss 5.0725   LearningRate 0.0290   Epoch: 9   Global Step: 52510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:46:37,250-Speed 5653.19 samples/sec   Loss 5.0284   LearningRate 0.0290   Epoch: 9   Global Step: 52520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:46:39,058-Speed 5665.56 samples/sec   Loss 4.9280   LearningRate 0.0290   Epoch: 9   Global Step: 52530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:46:40,873-Speed 5644.35 samples/sec   Loss 5.0442   LearningRate 0.0289   Epoch: 9   Global Step: 52540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:46:42,690-Speed 5637.82 samples/sec   Loss 5.0298   LearningRate 0.0289   Epoch: 9   Global Step: 52550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:46:44,511-Speed 5622.25 samples/sec   Loss 5.1384   LearningRate 0.0289   Epoch: 9   Global Step: 52560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:46:46,321-Speed 5661.65 samples/sec   Loss 5.0679   LearningRate 0.0289   Epoch: 9   Global Step: 52570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:46:48,126-Speed 5675.37 samples/sec   Loss 5.2163   LearningRate 0.0289   Epoch: 9   Global Step: 52580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:49,935-Speed 5660.09 samples/sec   Loss 5.0882   LearningRate 0.0289   Epoch: 9   Global Step: 52590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:51,744-Speed 5664.78 samples/sec   Loss 5.0645   LearningRate 0.0289   Epoch: 9   Global Step: 52600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:53,552-Speed 5664.25 samples/sec   Loss 5.1926   LearningRate 0.0289   Epoch: 9   Global Step: 52610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:55,369-Speed 5637.05 samples/sec   Loss 5.0847   LearningRate 0.0289   Epoch: 9   Global Step: 52620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:57,201-Speed 5592.37 samples/sec   Loss 5.1314   LearningRate 0.0289   Epoch: 9   Global Step: 52630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:46:59,014-Speed 5649.87 samples/sec   Loss 5.1190   LearningRate 0.0288   Epoch: 9   Global Step: 52640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:00,827-Speed 5651.22 samples/sec   Loss 5.0697   LearningRate 0.0288   Epoch: 9   Global Step: 52650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:02,686-Speed 5510.94 samples/sec   Loss 4.9619   LearningRate 0.0288   Epoch: 9   Global Step: 52660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:04,549-Speed 5497.75 samples/sec   Loss 5.2121   LearningRate 0.0288   Epoch: 9   Global Step: 52670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:06,370-Speed 5624.76 samples/sec   Loss 5.0582   LearningRate 0.0288   Epoch: 9   Global Step: 52680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:08,209-Speed 5569.78 samples/sec   Loss 5.1677   LearningRate 0.0288   Epoch: 9   Global Step: 52690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:10,044-Speed 5581.63 samples/sec   Loss 4.9742   LearningRate 0.0288   Epoch: 9   Global Step: 52700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:11,873-Speed 5602.57 samples/sec   Loss 5.1733   LearningRate 0.0288   Epoch: 9   Global Step: 52710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:13,699-Speed 5608.89 samples/sec   Loss 5.1678   LearningRate 0.0288   Epoch: 9   Global Step: 52720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:15,512-Speed 5648.41 samples/sec   Loss 5.0970   LearningRate 0.0288   Epoch: 9   Global Step: 52730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:17,334-Speed 5621.92 samples/sec   Loss 5.0726   LearningRate 0.0288   Epoch: 9   Global Step: 52740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:19,160-Speed 5611.67 samples/sec   Loss 5.1425   LearningRate 0.0287   Epoch: 9   Global Step: 52750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:20,989-Speed 5600.86 samples/sec   Loss 5.0919   LearningRate 0.0287   Epoch: 9   Global Step: 52760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:22,823-Speed 5586.24 samples/sec   Loss 4.9452   LearningRate 0.0287   Epoch: 9   Global Step: 52770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:24,642-Speed 5630.37 samples/sec   Loss 5.1354   LearningRate 0.0287   Epoch: 9   Global Step: 52780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:26,463-Speed 5625.41 samples/sec   Loss 5.1621   LearningRate 0.0287   Epoch: 9   Global Step: 52790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:28,277-Speed 5647.01 samples/sec   Loss 5.0892   LearningRate 0.0287   Epoch: 9   Global Step: 52800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:30,114-Speed 5575.93 samples/sec   Loss 5.1812   LearningRate 0.0287   Epoch: 9   Global Step: 52810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:31,959-Speed 5552.58 samples/sec   Loss 4.9474   LearningRate 0.0287   Epoch: 9   Global Step: 52820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:33,804-Speed 5550.28 samples/sec   Loss 5.1006   LearningRate 0.0287   Epoch: 9   Global Step: 52830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:35,619-Speed 5644.23 samples/sec   Loss 5.0856   LearningRate 0.0287   Epoch: 9   Global Step: 52840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:37,445-Speed 5611.18 samples/sec   Loss 4.9992   LearningRate 0.0287   Epoch: 9   Global Step: 52850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:39,273-Speed 5604.15 samples/sec   Loss 5.0943   LearningRate 0.0286   Epoch: 9   Global Step: 52860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:41,103-Speed 5596.81 samples/sec   Loss 5.1577   LearningRate 0.0286   Epoch: 9   Global Step: 52870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:42,936-Speed 5588.14 samples/sec   Loss 5.0549   LearningRate 0.0286   Epoch: 9   Global Step: 52880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:44,765-Speed 5598.97 samples/sec   Loss 4.9228   LearningRate 0.0286   Epoch: 9   Global Step: 52890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:46,597-Speed 5593.59 samples/sec   Loss 5.0503   LearningRate 0.0286   Epoch: 9   Global Step: 52900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:48,444-Speed 5545.99 samples/sec   Loss 5.2343   LearningRate 0.0286   Epoch: 9   Global Step: 52910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:50,282-Speed 5573.31 samples/sec   Loss 5.1566   LearningRate 0.0286   Epoch: 9   Global Step: 52920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:52,100-Speed 5632.85 samples/sec   Loss 5.0876   LearningRate 0.0286   Epoch: 9   Global Step: 52930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:47:53,906-Speed 5672.14 samples/sec   Loss 5.1279   LearningRate 0.0286   Epoch: 9   Global Step: 52940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:55,729-Speed 5618.18 samples/sec   Loss 5.0760   LearningRate 0.0286   Epoch: 9   Global Step: 52950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:57,542-Speed 5651.38 samples/sec   Loss 5.0620   LearningRate 0.0285   Epoch: 9   Global Step: 52960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:47:59,359-Speed 5636.79 samples/sec   Loss 5.1179   LearningRate 0.0285   Epoch: 9   Global Step: 52970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:01,170-Speed 5657.61 samples/sec   Loss 5.0396   LearningRate 0.0285   Epoch: 9   Global Step: 52980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:02,991-Speed 5624.64 samples/sec   Loss 5.0634   LearningRate 0.0285   Epoch: 9   Global Step: 52990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:04,809-Speed 5636.13 samples/sec   Loss 5.0506   LearningRate 0.0285   Epoch: 9   Global Step: 53000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:06,622-Speed 5651.18 samples/sec   Loss 5.1965   LearningRate 0.0285   Epoch: 9   Global Step: 53010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:08,442-Speed 5628.28 samples/sec   Loss 5.0840   LearningRate 0.0285   Epoch: 9   Global Step: 53020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:10,276-Speed 5585.37 samples/sec   Loss 5.0680   LearningRate 0.0285   Epoch: 9   Global Step: 53030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:12,091-Speed 5642.45 samples/sec   Loss 5.1202   LearningRate 0.0285   Epoch: 9   Global Step: 53040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:48:13,916-Speed 5611.25 samples/sec   Loss 5.1056   LearningRate 0.0285   Epoch: 9   Global Step: 53050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:48:15,743-Speed 5608.20 samples/sec   Loss 5.0727   LearningRate 0.0285   Epoch: 9   Global Step: 53060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:48:17,558-Speed 5645.01 samples/sec   Loss 5.0816   LearningRate 0.0284   Epoch: 9   Global Step: 53070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:48:19,397-Speed 5569.19 samples/sec   Loss 5.0549   LearningRate 0.0284   Epoch: 9   Global Step: 53080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:48:21,243-Speed 5549.38 samples/sec   Loss 5.1946   LearningRate 0.0284   Epoch: 9   Global Step: 53090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:48:23,067-Speed 5617.03 samples/sec   Loss 5.3265   LearningRate 0.0284   Epoch: 9   Global Step: 53100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:48:24,885-Speed 5634.30 samples/sec   Loss 5.1376   LearningRate 0.0284   Epoch: 9   Global Step: 53110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:26,706-Speed 5625.31 samples/sec   Loss 5.0052   LearningRate 0.0284   Epoch: 9   Global Step: 53120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:28,552-Speed 5547.68 samples/sec   Loss 5.0820   LearningRate 0.0284   Epoch: 9   Global Step: 53130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:30,365-Speed 5651.53 samples/sec   Loss 5.0076   LearningRate 0.0284   Epoch: 9   Global Step: 53140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:32,176-Speed 5655.63 samples/sec   Loss 5.0562   LearningRate 0.0284   Epoch: 9   Global Step: 53150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:34,020-Speed 5554.85 samples/sec   Loss 5.1258   LearningRate 0.0284   Epoch: 9   Global Step: 53160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:35,829-Speed 5662.88 samples/sec   Loss 5.0983   LearningRate 0.0284   Epoch: 9   Global Step: 53170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:37,656-Speed 5605.19 samples/sec   Loss 4.9856   LearningRate 0.0283   Epoch: 9   Global Step: 53180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:39,465-Speed 5662.80 samples/sec   Loss 5.1128   LearningRate 0.0283   Epoch: 9   Global Step: 53190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:41,284-Speed 5629.43 samples/sec   Loss 5.1023   LearningRate 0.0283   Epoch: 9   Global Step: 53200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:43,130-Speed 5550.34 samples/sec   Loss 5.1156   LearningRate 0.0283   Epoch: 9   Global Step: 53210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:48:44,940-Speed 5660.25 samples/sec   Loss 5.1761   LearningRate 0.0283   Epoch: 9   Global Step: 53220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:46,785-Speed 5552.04 samples/sec   Loss 4.9678   LearningRate 0.0283   Epoch: 9   Global Step: 53230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:48,600-Speed 5644.21 samples/sec   Loss 5.1552   LearningRate 0.0283   Epoch: 9   Global Step: 53240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:50,414-Speed 5645.06 samples/sec   Loss 5.1134   LearningRate 0.0283   Epoch: 9   Global Step: 53250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:52,231-Speed 5637.65 samples/sec   Loss 5.1252   LearningRate 0.0283   Epoch: 9   Global Step: 53260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:54,059-Speed 5604.25 samples/sec   Loss 5.1546   LearningRate 0.0283   Epoch: 9   Global Step: 53270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:55,881-Speed 5621.82 samples/sec   Loss 5.0264   LearningRate 0.0282   Epoch: 9   Global Step: 53280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:57,692-Speed 5659.04 samples/sec   Loss 5.1515   LearningRate 0.0282   Epoch: 9   Global Step: 53290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:48:59,499-Speed 5667.50 samples/sec   Loss 5.0772   LearningRate 0.0282   Epoch: 9   Global Step: 53300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:01,318-Speed 5630.11 samples/sec   Loss 4.9978   LearningRate 0.0282   Epoch: 9   Global Step: 53310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:03,142-Speed 5616.97 samples/sec   Loss 5.1310   LearningRate 0.0282   Epoch: 9   Global Step: 53320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:49:04,960-Speed 5635.03 samples/sec   Loss 5.1127   LearningRate 0.0282   Epoch: 9   Global Step: 53330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:49:06,759-Speed 5693.34 samples/sec   Loss 5.1524   LearningRate 0.0282   Epoch: 9   Global Step: 53340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:08,575-Speed 5640.67 samples/sec   Loss 5.0494   LearningRate 0.0282   Epoch: 9   Global Step: 53350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:10,404-Speed 5601.43 samples/sec   Loss 5.1567   LearningRate 0.0282   Epoch: 9   Global Step: 53360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:12,214-Speed 5659.66 samples/sec   Loss 5.1121   LearningRate 0.0282   Epoch: 9   Global Step: 53370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:14,039-Speed 5611.31 samples/sec   Loss 5.0154   LearningRate 0.0282   Epoch: 9   Global Step: 53380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:15,864-Speed 5614.15 samples/sec   Loss 5.0337   LearningRate 0.0281   Epoch: 9   Global Step: 53390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:17,708-Speed 5554.63 samples/sec   Loss 4.9937   LearningRate 0.0281   Epoch: 9   Global Step: 53400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:19,538-Speed 5597.40 samples/sec   Loss 5.0742   LearningRate 0.0281   Epoch: 9   Global Step: 53410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:21,361-Speed 5619.74 samples/sec   Loss 5.2212   LearningRate 0.0281   Epoch: 9   Global Step: 53420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:23,219-Speed 5511.96 samples/sec   Loss 5.0414   LearningRate 0.0281   Epoch: 9   Global Step: 53430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:25,039-Speed 5627.15 samples/sec   Loss 5.0654   LearningRate 0.0281   Epoch: 9   Global Step: 53440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:49:26,852-Speed 5652.42 samples/sec   Loss 5.0994   LearningRate 0.0281   Epoch: 9   Global Step: 53450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:49:28,661-Speed 5662.44 samples/sec   Loss 5.0042   LearningRate 0.0281   Epoch: 9   Global Step: 53460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:49:30,471-Speed 5660.74 samples/sec   Loss 5.0681   LearningRate 0.0281   Epoch: 9   Global Step: 53470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:49:32,296-Speed 5613.17 samples/sec   Loss 5.1117   LearningRate 0.0281   Epoch: 9   Global Step: 53480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:49:34,110-Speed 5647.73 samples/sec   Loss 5.0396   LearningRate 0.0281   Epoch: 9   Global Step: 53490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:49:35,935-Speed 5614.63 samples/sec   Loss 5.1263   LearningRate 0.0280   Epoch: 9   Global Step: 53500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:49:37,746-Speed 5655.16 samples/sec   Loss 5.0772   LearningRate 0.0280   Epoch: 9   Global Step: 53510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:49:39,586-Speed 5565.00 samples/sec   Loss 5.1184   LearningRate 0.0280   Epoch: 9   Global Step: 53520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:49:41,389-Speed 5682.62 samples/sec   Loss 5.0603   LearningRate 0.0280   Epoch: 9   Global Step: 53530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:43,222-Speed 5589.29 samples/sec   Loss 5.0951   LearningRate 0.0280   Epoch: 9   Global Step: 53540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:45,061-Speed 5569.36 samples/sec   Loss 5.0957   LearningRate 0.0280   Epoch: 9   Global Step: 53550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:46,891-Speed 5595.77 samples/sec   Loss 5.1186   LearningRate 0.0280   Epoch: 9   Global Step: 53560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:48,729-Speed 5572.87 samples/sec   Loss 5.0536   LearningRate 0.0280   Epoch: 9   Global Step: 53570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:50,536-Speed 5669.20 samples/sec   Loss 5.0542   LearningRate 0.0280   Epoch: 9   Global Step: 53580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:52,360-Speed 5617.72 samples/sec   Loss 5.0333   LearningRate 0.0280   Epoch: 9   Global Step: 53590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:54,219-Speed 5510.60 samples/sec   Loss 5.1224   LearningRate 0.0279   Epoch: 9   Global Step: 53600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:56,044-Speed 5611.71 samples/sec   Loss 5.1205   LearningRate 0.0279   Epoch: 9   Global Step: 53610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:57,886-Speed 5560.52 samples/sec   Loss 5.0012   LearningRate 0.0279   Epoch: 9   Global Step: 53620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:49:59,703-Speed 5639.49 samples/sec   Loss 4.9405   LearningRate 0.0279   Epoch: 9   Global Step: 53630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:01,513-Speed 5657.64 samples/sec   Loss 5.1168   LearningRate 0.0279   Epoch: 9   Global Step: 53640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:03,352-Speed 5570.76 samples/sec   Loss 5.0722   LearningRate 0.0279   Epoch: 9   Global Step: 53650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:05,168-Speed 5642.43 samples/sec   Loss 5.1643   LearningRate 0.0279   Epoch: 9   Global Step: 53660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:06,995-Speed 5604.51 samples/sec   Loss 5.0410   LearningRate 0.0279   Epoch: 9   Global Step: 53670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:08,844-Speed 5541.23 samples/sec   Loss 5.0322   LearningRate 0.0279   Epoch: 9   Global Step: 53680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:10,680-Speed 5576.97 samples/sec   Loss 5.0763   LearningRate 0.0279   Epoch: 9   Global Step: 53690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:12,511-Speed 5595.94 samples/sec   Loss 5.1124   LearningRate 0.0279   Epoch: 9   Global Step: 53700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:14,336-Speed 5612.43 samples/sec   Loss 5.0191   LearningRate 0.0278   Epoch: 9   Global Step: 53710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:16,166-Speed 5599.19 samples/sec   Loss 5.1973   LearningRate 0.0278   Epoch: 9   Global Step: 53720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:18,013-Speed 5547.18 samples/sec   Loss 5.1037   LearningRate 0.0278   Epoch: 9   Global Step: 53730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:19,855-Speed 5559.20 samples/sec   Loss 5.1083   LearningRate 0.0278   Epoch: 9   Global Step: 53740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:21,751-Speed 5401.35 samples/sec   Loss 5.0046   LearningRate 0.0278   Epoch: 9   Global Step: 53750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:23,672-Speed 5333.34 samples/sec   Loss 5.1170   LearningRate 0.0278   Epoch: 9   Global Step: 53760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:25,480-Speed 5665.84 samples/sec   Loss 5.0977   LearningRate 0.0278   Epoch: 9   Global Step: 53770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:27,304-Speed 5615.52 samples/sec   Loss 5.1301   LearningRate 0.0278   Epoch: 9   Global Step: 53780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:29,128-Speed 5617.40 samples/sec   Loss 5.0937   LearningRate 0.0278   Epoch: 9   Global Step: 53790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:50:30,945-Speed 5636.87 samples/sec   Loss 5.1777   LearningRate 0.0278   Epoch: 9   Global Step: 53800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:32,771-Speed 5609.58 samples/sec   Loss 5.1288   LearningRate 0.0278   Epoch: 9   Global Step: 53810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:34,605-Speed 5585.97 samples/sec   Loss 5.1661   LearningRate 0.0277   Epoch: 9   Global Step: 53820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:36,451-Speed 5549.46 samples/sec   Loss 5.0236   LearningRate 0.0277   Epoch: 9   Global Step: 53830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:38,284-Speed 5587.66 samples/sec   Loss 5.0381   LearningRate 0.0277   Epoch: 9   Global Step: 53840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:40,102-Speed 5634.62 samples/sec   Loss 5.1354   LearningRate 0.0277   Epoch: 9   Global Step: 53850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:41,924-Speed 5623.67 samples/sec   Loss 5.1715   LearningRate 0.0277   Epoch: 9   Global Step: 53860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:43,756-Speed 5590.87 samples/sec   Loss 5.1743   LearningRate 0.0277   Epoch: 9   Global Step: 53870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:45,598-Speed 5561.00 samples/sec   Loss 4.9768   LearningRate 0.0277   Epoch: 9   Global Step: 53880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:47,427-Speed 5601.60 samples/sec   Loss 5.0507   LearningRate 0.0277   Epoch: 9   Global Step: 53890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:49,228-Speed 5688.31 samples/sec   Loss 5.0959   LearningRate 0.0277   Epoch: 9   Global Step: 53900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:51,059-Speed 5593.79 samples/sec   Loss 5.0438   LearningRate 0.0277   Epoch: 9   Global Step: 53910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:52,905-Speed 5547.84 samples/sec   Loss 5.1850   LearningRate 0.0277   Epoch: 9   Global Step: 53920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:54,729-Speed 5617.02 samples/sec   Loss 5.1074   LearningRate 0.0276   Epoch: 9   Global Step: 53930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:56,561-Speed 5592.26 samples/sec   Loss 4.9920   LearningRate 0.0276   Epoch: 9   Global Step: 53940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:50:58,375-Speed 5646.57 samples/sec   Loss 5.1138   LearningRate 0.0276   Epoch: 9   Global Step: 53950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:51:00,202-Speed 5605.60 samples/sec   Loss 5.0019   LearningRate 0.0276   Epoch: 9   Global Step: 53960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:51:02,033-Speed 5596.14 samples/sec   Loss 5.0312   LearningRate 0.0276   Epoch: 9   Global Step: 53970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:51:03,863-Speed 5596.29 samples/sec   Loss 5.0636   LearningRate 0.0276   Epoch: 9   Global Step: 53980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:51:05,691-Speed 5606.84 samples/sec   Loss 5.1317   LearningRate 0.0276   Epoch: 9   Global Step: 53990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:51:07,577-Speed 5429.17 samples/sec   Loss 4.8906   LearningRate 0.0276   Epoch: 9   Global Step: 54000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:51:33,938-[lfw][54000]XNorm: 21.300929
Training: 2022-04-27 04:51:33,938-[lfw][54000]Accuracy-Flip: 0.99767+-0.00300
Training: 2022-04-27 04:51:33,938-[lfw][54000]Accuracy-Highest: 0.99800
Training: 2022-04-27 04:52:04,485-[cfp_fp][54000]XNorm: 18.890860
Training: 2022-04-27 04:52:04,486-[cfp_fp][54000]Accuracy-Flip: 0.95286+-0.01322
Training: 2022-04-27 04:52:04,486-[cfp_fp][54000]Accuracy-Highest: 0.95286
Training: 2022-04-27 04:52:30,854-[agedb_30][54000]XNorm: 21.154284
Training: 2022-04-27 04:52:30,854-[agedb_30][54000]Accuracy-Flip: 0.97133+-0.00875
Training: 2022-04-27 04:52:30,855-[agedb_30][54000]Accuracy-Highest: 0.97483
Training: 2022-04-27 04:52:32,701-Speed 120.30 samples/sec   Loss 4.9527   LearningRate 0.0276   Epoch: 9   Global Step: 54010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:52:34,529-Speed 5602.61 samples/sec   Loss 5.0382   LearningRate 0.0276   Epoch: 9   Global Step: 54020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:52:36,333-Speed 5678.49 samples/sec   Loss 5.0386   LearningRate 0.0276   Epoch: 9   Global Step: 54030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:52:38,160-Speed 5606.46 samples/sec   Loss 5.0248   LearningRate 0.0275   Epoch: 9   Global Step: 54040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:52:39,971-Speed 5658.49 samples/sec   Loss 5.0890   LearningRate 0.0275   Epoch: 9   Global Step: 54050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:52:41,799-Speed 5603.64 samples/sec   Loss 5.0553   LearningRate 0.0275   Epoch: 9   Global Step: 54060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:52:43,641-Speed 5559.62 samples/sec   Loss 5.1322   LearningRate 0.0275   Epoch: 9   Global Step: 54070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:52:45,486-Speed 5552.64 samples/sec   Loss 5.0246   LearningRate 0.0275   Epoch: 9   Global Step: 54080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:52:47,316-Speed 5595.56 samples/sec   Loss 4.8979   LearningRate 0.0275   Epoch: 9   Global Step: 54090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:52:49,169-Speed 5527.87 samples/sec   Loss 5.1522   LearningRate 0.0275   Epoch: 9   Global Step: 54100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:52:51,007-Speed 5574.38 samples/sec   Loss 5.0369   LearningRate 0.0275   Epoch: 9   Global Step: 54110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:52:52,827-Speed 5628.12 samples/sec   Loss 5.0522   LearningRate 0.0275   Epoch: 9   Global Step: 54120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:52:54,654-Speed 5605.02 samples/sec   Loss 5.0482   LearningRate 0.0275   Epoch: 9   Global Step: 54130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:52:56,472-Speed 5637.08 samples/sec   Loss 5.0951   LearningRate 0.0274   Epoch: 9   Global Step: 54140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:52:58,291-Speed 5630.73 samples/sec   Loss 4.9323   LearningRate 0.0274   Epoch: 9   Global Step: 54150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:00,103-Speed 5652.26 samples/sec   Loss 5.2015   LearningRate 0.0274   Epoch: 9   Global Step: 54160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:53:01,905-Speed 5685.35 samples/sec   Loss 4.9382   LearningRate 0.0274   Epoch: 9   Global Step: 54170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:03,712-Speed 5667.87 samples/sec   Loss 5.0940   LearningRate 0.0274   Epoch: 9   Global Step: 54180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:05,524-Speed 5655.25 samples/sec   Loss 5.0644   LearningRate 0.0274   Epoch: 9   Global Step: 54190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:07,325-Speed 5687.54 samples/sec   Loss 5.0892   LearningRate 0.0274   Epoch: 9   Global Step: 54200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:09,133-Speed 5663.83 samples/sec   Loss 5.0492   LearningRate 0.0274   Epoch: 9   Global Step: 54210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:10,939-Speed 5673.59 samples/sec   Loss 5.0650   LearningRate 0.0274   Epoch: 9   Global Step: 54220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:12,751-Speed 5653.35 samples/sec   Loss 5.0574   LearningRate 0.0274   Epoch: 9   Global Step: 54230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:14,562-Speed 5656.20 samples/sec   Loss 5.1791   LearningRate 0.0274   Epoch: 9   Global Step: 54240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:16,362-Speed 5690.63 samples/sec   Loss 5.0958   LearningRate 0.0273   Epoch: 9   Global Step: 54250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:18,182-Speed 5626.80 samples/sec   Loss 5.0618   LearningRate 0.0273   Epoch: 9   Global Step: 54260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:20,014-Speed 5591.80 samples/sec   Loss 5.0705   LearningRate 0.0273   Epoch: 9   Global Step: 54270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:53:21,822-Speed 5667.15 samples/sec   Loss 5.0563   LearningRate 0.0273   Epoch: 9   Global Step: 54280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:53:23,635-Speed 5650.63 samples/sec   Loss 4.9740   LearningRate 0.0273   Epoch: 9   Global Step: 54290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:53:25,458-Speed 5618.79 samples/sec   Loss 5.1676   LearningRate 0.0273   Epoch: 9   Global Step: 54300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:53:27,288-Speed 5597.75 samples/sec   Loss 5.0754   LearningRate 0.0273   Epoch: 9   Global Step: 54310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:29,123-Speed 5580.06 samples/sec   Loss 5.0372   LearningRate 0.0273   Epoch: 9   Global Step: 54320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:30,957-Speed 5587.50 samples/sec   Loss 5.1768   LearningRate 0.0273   Epoch: 9   Global Step: 54330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:32,777-Speed 5626.30 samples/sec   Loss 5.1027   LearningRate 0.0273   Epoch: 9   Global Step: 54340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:34,593-Speed 5640.54 samples/sec   Loss 5.0601   LearningRate 0.0273   Epoch: 9   Global Step: 54350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:36,431-Speed 5572.48 samples/sec   Loss 5.0451   LearningRate 0.0272   Epoch: 9   Global Step: 54360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:38,241-Speed 5662.03 samples/sec   Loss 5.0452   LearningRate 0.0272   Epoch: 9   Global Step: 54370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:40,058-Speed 5635.53 samples/sec   Loss 4.9733   LearningRate 0.0272   Epoch: 9   Global Step: 54380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:41,870-Speed 5653.92 samples/sec   Loss 5.0502   LearningRate 0.0272   Epoch: 9   Global Step: 54390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:43,693-Speed 5619.69 samples/sec   Loss 5.0088   LearningRate 0.0272   Epoch: 9   Global Step: 54400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:45,527-Speed 5584.61 samples/sec   Loss 5.1270   LearningRate 0.0272   Epoch: 9   Global Step: 54410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:53:47,334-Speed 5669.54 samples/sec   Loss 4.9376   LearningRate 0.0272   Epoch: 9   Global Step: 54420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:53:49,172-Speed 5572.63 samples/sec   Loss 5.0278   LearningRate 0.0272   Epoch: 9   Global Step: 54430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:53:50,997-Speed 5612.79 samples/sec   Loss 4.9774   LearningRate 0.0272   Epoch: 9   Global Step: 54440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:53:52,809-Speed 5652.37 samples/sec   Loss 5.0843   LearningRate 0.0272   Epoch: 9   Global Step: 54450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:53:54,629-Speed 5629.33 samples/sec   Loss 5.1424   LearningRate 0.0272   Epoch: 9   Global Step: 54460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:53:56,428-Speed 5695.69 samples/sec   Loss 5.1198   LearningRate 0.0271   Epoch: 9   Global Step: 54470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:53:58,239-Speed 5654.20 samples/sec   Loss 5.0919   LearningRate 0.0271   Epoch: 9   Global Step: 54480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:00,062-Speed 5617.98 samples/sec   Loss 5.0556   LearningRate 0.0271   Epoch: 9   Global Step: 54490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:01,897-Speed 5582.12 samples/sec   Loss 5.0672   LearningRate 0.0271   Epoch: 9   Global Step: 54500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:03,715-Speed 5634.28 samples/sec   Loss 5.1473   LearningRate 0.0271   Epoch: 9   Global Step: 54510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:05,646-Speed 5306.07 samples/sec   Loss 4.9426   LearningRate 0.0271   Epoch: 9   Global Step: 54520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:07,500-Speed 5524.79 samples/sec   Loss 5.0964   LearningRate 0.0271   Epoch: 9   Global Step: 54530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:09,332-Speed 5593.86 samples/sec   Loss 5.0875   LearningRate 0.0271   Epoch: 9   Global Step: 54540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:11,162-Speed 5596.01 samples/sec   Loss 5.0263   LearningRate 0.0271   Epoch: 9   Global Step: 54550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:13,002-Speed 5566.77 samples/sec   Loss 5.0426   LearningRate 0.0271   Epoch: 9   Global Step: 54560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:14,815-Speed 5651.70 samples/sec   Loss 5.0045   LearningRate 0.0271   Epoch: 9   Global Step: 54570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:54:16,655-Speed 5565.90 samples/sec   Loss 4.9549   LearningRate 0.0270   Epoch: 9   Global Step: 54580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:54:18,476-Speed 5625.88 samples/sec   Loss 5.0515   LearningRate 0.0270   Epoch: 9   Global Step: 54590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:54:20,287-Speed 5654.04 samples/sec   Loss 5.1411   LearningRate 0.0270   Epoch: 9   Global Step: 54600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:22,091-Speed 5678.00 samples/sec   Loss 5.0481   LearningRate 0.0270   Epoch: 9   Global Step: 54610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:23,899-Speed 5667.83 samples/sec   Loss 5.0438   LearningRate 0.0270   Epoch: 9   Global Step: 54620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:25,724-Speed 5612.26 samples/sec   Loss 4.9047   LearningRate 0.0270   Epoch: 9   Global Step: 54630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:27,551-Speed 5607.89 samples/sec   Loss 4.9989   LearningRate 0.0270   Epoch: 9   Global Step: 54640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:29,369-Speed 5635.64 samples/sec   Loss 5.0369   LearningRate 0.0270   Epoch: 9   Global Step: 54650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:31,193-Speed 5613.19 samples/sec   Loss 5.0552   LearningRate 0.0270   Epoch: 9   Global Step: 54660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:33,027-Speed 5587.03 samples/sec   Loss 5.0101   LearningRate 0.0270   Epoch: 9   Global Step: 54670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:34,836-Speed 5660.71 samples/sec   Loss 5.0766   LearningRate 0.0270   Epoch: 9   Global Step: 54680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:36,677-Speed 5564.42 samples/sec   Loss 5.1496   LearningRate 0.0269   Epoch: 9   Global Step: 54690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:38,503-Speed 5610.23 samples/sec   Loss 5.0304   LearningRate 0.0269   Epoch: 9   Global Step: 54700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:54:40,311-Speed 5664.70 samples/sec   Loss 5.0095   LearningRate 0.0269   Epoch: 9   Global Step: 54710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:42,132-Speed 5626.66 samples/sec   Loss 4.9141   LearningRate 0.0269   Epoch: 9   Global Step: 54720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:43,949-Speed 5636.93 samples/sec   Loss 5.0466   LearningRate 0.0269   Epoch: 9   Global Step: 54730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:45,767-Speed 5636.38 samples/sec   Loss 5.0348   LearningRate 0.0269   Epoch: 9   Global Step: 54740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:47,590-Speed 5619.10 samples/sec   Loss 5.1884   LearningRate 0.0269   Epoch: 9   Global Step: 54750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:49,410-Speed 5628.00 samples/sec   Loss 5.0236   LearningRate 0.0269   Epoch: 9   Global Step: 54760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:51,264-Speed 5526.49 samples/sec   Loss 5.1172   LearningRate 0.0269   Epoch: 9   Global Step: 54770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:53,100-Speed 5579.92 samples/sec   Loss 4.9597   LearningRate 0.0269   Epoch: 9   Global Step: 54780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:54,910-Speed 5658.48 samples/sec   Loss 4.9616   LearningRate 0.0269   Epoch: 9   Global Step: 54790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:56,724-Speed 5646.20 samples/sec   Loss 5.0681   LearningRate 0.0268   Epoch: 9   Global Step: 54800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:54:58,566-Speed 5562.74 samples/sec   Loss 4.9412   LearningRate 0.0268   Epoch: 9   Global Step: 54810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:00,403-Speed 5575.23 samples/sec   Loss 5.0188   LearningRate 0.0268   Epoch: 9   Global Step: 54820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:02,222-Speed 5629.00 samples/sec   Loss 5.0215   LearningRate 0.0268   Epoch: 9   Global Step: 54830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:04,049-Speed 5609.16 samples/sec   Loss 4.9864   LearningRate 0.0268   Epoch: 9   Global Step: 54840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:05,866-Speed 5634.65 samples/sec   Loss 4.8349   LearningRate 0.0268   Epoch: 9   Global Step: 54850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:07,702-Speed 5580.96 samples/sec   Loss 4.9735   LearningRate 0.0268   Epoch: 9   Global Step: 54860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:09,535-Speed 5587.54 samples/sec   Loss 5.0478   LearningRate 0.0268   Epoch: 9   Global Step: 54870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:11,354-Speed 5630.85 samples/sec   Loss 5.1395   LearningRate 0.0268   Epoch: 9   Global Step: 54880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:13,183-Speed 5601.76 samples/sec   Loss 4.9253   LearningRate 0.0268   Epoch: 9   Global Step: 54890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:15,021-Speed 5575.24 samples/sec   Loss 5.1409   LearningRate 0.0268   Epoch: 9   Global Step: 54900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:16,834-Speed 5650.66 samples/sec   Loss 5.0561   LearningRate 0.0267   Epoch: 9   Global Step: 54910   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-27 04:55:18,646-Speed 5650.67 samples/sec   Loss 5.0119   LearningRate 0.0267   Epoch: 9   Global Step: 54920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:20,479-Speed 5589.93 samples/sec   Loss 4.8994   LearningRate 0.0267   Epoch: 9   Global Step: 54930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:22,305-Speed 5608.82 samples/sec   Loss 5.0181   LearningRate 0.0267   Epoch: 9   Global Step: 54940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:24,136-Speed 5596.27 samples/sec   Loss 5.0572   LearningRate 0.0267   Epoch: 9   Global Step: 54950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:25,959-Speed 5618.06 samples/sec   Loss 4.9830   LearningRate 0.0267   Epoch: 9   Global Step: 54960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:27,787-Speed 5603.06 samples/sec   Loss 4.9593   LearningRate 0.0267   Epoch: 9   Global Step: 54970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:29,612-Speed 5612.78 samples/sec   Loss 4.9391   LearningRate 0.0267   Epoch: 9   Global Step: 54980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:31,421-Speed 5665.00 samples/sec   Loss 5.0008   LearningRate 0.0267   Epoch: 9   Global Step: 54990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:33,226-Speed 5673.39 samples/sec   Loss 5.0085   LearningRate 0.0267   Epoch: 9   Global Step: 55000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:35,029-Speed 5683.48 samples/sec   Loss 5.0207   LearningRate 0.0267   Epoch: 9   Global Step: 55010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:36,831-Speed 5683.44 samples/sec   Loss 4.9566   LearningRate 0.0266   Epoch: 9   Global Step: 55020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:38,647-Speed 5640.01 samples/sec   Loss 4.9525   LearningRate 0.0266   Epoch: 9   Global Step: 55030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:40,477-Speed 5598.57 samples/sec   Loss 5.0464   LearningRate 0.0266   Epoch: 9   Global Step: 55040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:42,290-Speed 5650.29 samples/sec   Loss 4.9619   LearningRate 0.0266   Epoch: 9   Global Step: 55050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:44,103-Speed 5652.00 samples/sec   Loss 4.9858   LearningRate 0.0266   Epoch: 9   Global Step: 55060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:45,955-Speed 5528.39 samples/sec   Loss 5.0500   LearningRate 0.0266   Epoch: 9   Global Step: 55070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:47,780-Speed 5613.56 samples/sec   Loss 5.1072   LearningRate 0.0266   Epoch: 9   Global Step: 55080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:55:49,586-Speed 5671.94 samples/sec   Loss 4.9338   LearningRate 0.0266   Epoch: 9   Global Step: 55090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:51,414-Speed 5603.04 samples/sec   Loss 4.9937   LearningRate 0.0266   Epoch: 9   Global Step: 55100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:53,240-Speed 5609.65 samples/sec   Loss 4.9712   LearningRate 0.0266   Epoch: 9   Global Step: 55110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:55,050-Speed 5658.93 samples/sec   Loss 5.0974   LearningRate 0.0266   Epoch: 9   Global Step: 55120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:56,876-Speed 5609.77 samples/sec   Loss 5.0819   LearningRate 0.0265   Epoch: 9   Global Step: 55130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:55:58,737-Speed 5507.93 samples/sec   Loss 4.9985   LearningRate 0.0265   Epoch: 9   Global Step: 55140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:00,566-Speed 5602.06 samples/sec   Loss 5.0435   LearningRate 0.0265   Epoch: 9   Global Step: 55150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:02,465-Speed 5393.93 samples/sec   Loss 5.1102   LearningRate 0.0265   Epoch: 9   Global Step: 55160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:04,380-Speed 5348.92 samples/sec   Loss 5.1723   LearningRate 0.0265   Epoch: 9   Global Step: 55170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:06,264-Speed 5437.76 samples/sec   Loss 4.9853   LearningRate 0.0265   Epoch: 9   Global Step: 55180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:08,072-Speed 5664.27 samples/sec   Loss 5.0595   LearningRate 0.0265   Epoch: 9   Global Step: 55190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:56:09,888-Speed 5641.54 samples/sec   Loss 4.8943   LearningRate 0.0265   Epoch: 9   Global Step: 55200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:56:11,698-Speed 5658.19 samples/sec   Loss 5.0028   LearningRate 0.0265   Epoch: 9   Global Step: 55210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:13,522-Speed 5615.11 samples/sec   Loss 4.8829   LearningRate 0.0265   Epoch: 9   Global Step: 55220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:15,375-Speed 5530.33 samples/sec   Loss 4.9534   LearningRate 0.0265   Epoch: 9   Global Step: 55230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:17,220-Speed 5549.40 samples/sec   Loss 4.9206   LearningRate 0.0264   Epoch: 9   Global Step: 55240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:19,059-Speed 5570.14 samples/sec   Loss 4.9512   LearningRate 0.0264   Epoch: 9   Global Step: 55250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:20,872-Speed 5652.01 samples/sec   Loss 4.8927   LearningRate 0.0264   Epoch: 9   Global Step: 55260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:22,679-Speed 5669.92 samples/sec   Loss 5.0752   LearningRate 0.0264   Epoch: 9   Global Step: 55270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:24,505-Speed 5610.23 samples/sec   Loss 4.8920   LearningRate 0.0264   Epoch: 9   Global Step: 55280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:26,332-Speed 5605.81 samples/sec   Loss 4.8606   LearningRate 0.0264   Epoch: 9   Global Step: 55290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:28,143-Speed 5654.85 samples/sec   Loss 4.9966   LearningRate 0.0264   Epoch: 9   Global Step: 55300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:29,951-Speed 5667.32 samples/sec   Loss 5.0556   LearningRate 0.0264   Epoch: 9   Global Step: 55310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:56:31,775-Speed 5616.03 samples/sec   Loss 4.9389   LearningRate 0.0264   Epoch: 9   Global Step: 55320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:33,622-Speed 5544.76 samples/sec   Loss 5.0154   LearningRate 0.0264   Epoch: 9   Global Step: 55330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:35,447-Speed 5612.08 samples/sec   Loss 5.0592   LearningRate 0.0264   Epoch: 9   Global Step: 55340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:37,278-Speed 5595.68 samples/sec   Loss 5.2379   LearningRate 0.0263   Epoch: 9   Global Step: 55350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:39,130-Speed 5531.88 samples/sec   Loss 5.0255   LearningRate 0.0263   Epoch: 9   Global Step: 55360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:40,960-Speed 5596.67 samples/sec   Loss 5.0489   LearningRate 0.0263   Epoch: 9   Global Step: 55370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:42,773-Speed 5653.14 samples/sec   Loss 4.9828   LearningRate 0.0263   Epoch: 9   Global Step: 55380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:44,597-Speed 5614.30 samples/sec   Loss 5.1233   LearningRate 0.0263   Epoch: 9   Global Step: 55390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:46,420-Speed 5620.37 samples/sec   Loss 4.9839   LearningRate 0.0263   Epoch: 9   Global Step: 55400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:48,260-Speed 5566.91 samples/sec   Loss 4.9355   LearningRate 0.0263   Epoch: 9   Global Step: 55410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:50,085-Speed 5610.38 samples/sec   Loss 4.9389   LearningRate 0.0263   Epoch: 9   Global Step: 55420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:56:51,932-Speed 5547.31 samples/sec   Loss 5.0198   LearningRate 0.0263   Epoch: 9   Global Step: 55430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:56:53,759-Speed 5604.64 samples/sec   Loss 5.1502   LearningRate 0.0263   Epoch: 9   Global Step: 55440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:55,607-Speed 5544.22 samples/sec   Loss 4.9018   LearningRate 0.0263   Epoch: 9   Global Step: 55450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:57,419-Speed 5654.30 samples/sec   Loss 4.9712   LearningRate 0.0262   Epoch: 9   Global Step: 55460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:56:59,226-Speed 5666.37 samples/sec   Loss 4.9244   LearningRate 0.0262   Epoch: 9   Global Step: 55470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:01,032-Speed 5672.53 samples/sec   Loss 4.9011   LearningRate 0.0262   Epoch: 9   Global Step: 55480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:02,865-Speed 5590.77 samples/sec   Loss 5.0552   LearningRate 0.0262   Epoch: 9   Global Step: 55490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:04,691-Speed 5607.63 samples/sec   Loss 4.9727   LearningRate 0.0262   Epoch: 9   Global Step: 55500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:06,509-Speed 5636.00 samples/sec   Loss 5.0865   LearningRate 0.0262   Epoch: 9   Global Step: 55510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:08,319-Speed 5657.60 samples/sec   Loss 4.9814   LearningRate 0.0262   Epoch: 9   Global Step: 55520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:10,144-Speed 5614.34 samples/sec   Loss 4.9992   LearningRate 0.0262   Epoch: 9   Global Step: 55530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:11,972-Speed 5605.41 samples/sec   Loss 4.9903   LearningRate 0.0262   Epoch: 9   Global Step: 55540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:57:13,785-Speed 5647.40 samples/sec   Loss 4.9549   LearningRate 0.0262   Epoch: 9   Global Step: 55550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:57:15,601-Speed 5641.02 samples/sec   Loss 4.9930   LearningRate 0.0262   Epoch: 9   Global Step: 55560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:17,435-Speed 5585.14 samples/sec   Loss 4.9580   LearningRate 0.0261   Epoch: 9   Global Step: 55570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:19,261-Speed 5610.91 samples/sec   Loss 4.9839   LearningRate 0.0261   Epoch: 9   Global Step: 55580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:21,069-Speed 5665.29 samples/sec   Loss 5.0221   LearningRate 0.0261   Epoch: 9   Global Step: 55590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:22,895-Speed 5608.86 samples/sec   Loss 4.8866   LearningRate 0.0261   Epoch: 9   Global Step: 55600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:24,718-Speed 5619.31 samples/sec   Loss 4.9860   LearningRate 0.0261   Epoch: 9   Global Step: 55610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:26,563-Speed 5552.78 samples/sec   Loss 5.1098   LearningRate 0.0261   Epoch: 9   Global Step: 55620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:28,408-Speed 5551.06 samples/sec   Loss 4.9561   LearningRate 0.0261   Epoch: 9   Global Step: 55630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:30,219-Speed 5658.54 samples/sec   Loss 4.8981   LearningRate 0.0261   Epoch: 9   Global Step: 55640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:32,088-Speed 5478.56 samples/sec   Loss 4.9413   LearningRate 0.0261   Epoch: 9   Global Step: 55650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:33,982-Speed 5410.36 samples/sec   Loss 4.9476   LearningRate 0.0261   Epoch: 9   Global Step: 55660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:35,815-Speed 5588.16 samples/sec   Loss 4.9661   LearningRate 0.0261   Epoch: 9   Global Step: 55670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:37,642-Speed 5604.33 samples/sec   Loss 4.8402   LearningRate 0.0260   Epoch: 9   Global Step: 55680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:39,464-Speed 5624.63 samples/sec   Loss 5.0276   LearningRate 0.0260   Epoch: 9   Global Step: 55690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:41,310-Speed 5548.92 samples/sec   Loss 4.8867   LearningRate 0.0260   Epoch: 9   Global Step: 55700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:43,113-Speed 5680.07 samples/sec   Loss 4.9578   LearningRate 0.0260   Epoch: 9   Global Step: 55710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:44,932-Speed 5631.59 samples/sec   Loss 4.9431   LearningRate 0.0260   Epoch: 9   Global Step: 55720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:46,750-Speed 5636.72 samples/sec   Loss 4.9242   LearningRate 0.0260   Epoch: 9   Global Step: 55730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:48,553-Speed 5679.68 samples/sec   Loss 4.7594   LearningRate 0.0260   Epoch: 9   Global Step: 55740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:50,378-Speed 5613.02 samples/sec   Loss 4.9599   LearningRate 0.0260   Epoch: 9   Global Step: 55750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:57:52,196-Speed 5636.45 samples/sec   Loss 4.9067   LearningRate 0.0260   Epoch: 9   Global Step: 55760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:57:54,020-Speed 5613.24 samples/sec   Loss 4.9585   LearningRate 0.0260   Epoch: 9   Global Step: 55770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:57:55,859-Speed 5569.94 samples/sec   Loss 4.9667   LearningRate 0.0260   Epoch: 9   Global Step: 55780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:57:57,675-Speed 5642.01 samples/sec   Loss 5.0270   LearningRate 0.0259   Epoch: 9   Global Step: 55790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:57:59,492-Speed 5636.41 samples/sec   Loss 4.9569   LearningRate 0.0259   Epoch: 9   Global Step: 55800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:58:01,310-Speed 5638.04 samples/sec   Loss 4.9924   LearningRate 0.0259   Epoch: 9   Global Step: 55810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:58:03,132-Speed 5621.55 samples/sec   Loss 4.9539   LearningRate 0.0259   Epoch: 9   Global Step: 55820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:58:04,947-Speed 5641.93 samples/sec   Loss 5.0733   LearningRate 0.0259   Epoch: 9   Global Step: 55830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:58:06,755-Speed 5666.34 samples/sec   Loss 5.0154   LearningRate 0.0259   Epoch: 9   Global Step: 55840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:58:08,560-Speed 5673.78 samples/sec   Loss 5.0269   LearningRate 0.0259   Epoch: 9   Global Step: 55850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 04:58:10,378-Speed 5635.59 samples/sec   Loss 5.0680   LearningRate 0.0259   Epoch: 9   Global Step: 55860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:58:12,204-Speed 5610.45 samples/sec   Loss 4.9132   LearningRate 0.0259   Epoch: 9   Global Step: 55870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:58:14,011-Speed 5671.58 samples/sec   Loss 4.9303   LearningRate 0.0259   Epoch: 9   Global Step: 55880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:58:15,821-Speed 5657.21 samples/sec   Loss 5.0600   LearningRate 0.0259   Epoch: 9   Global Step: 55890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:58:17,630-Speed 5662.91 samples/sec   Loss 5.1196   LearningRate 0.0259   Epoch: 9   Global Step: 55900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:58:19,438-Speed 5665.68 samples/sec   Loss 4.9221   LearningRate 0.0258   Epoch: 9   Global Step: 55910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:58:21,244-Speed 5674.47 samples/sec   Loss 4.8672   LearningRate 0.0258   Epoch: 9   Global Step: 55920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:58:23,049-Speed 5674.27 samples/sec   Loss 5.0149   LearningRate 0.0258   Epoch: 9   Global Step: 55930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:58:24,866-Speed 5636.67 samples/sec   Loss 4.9178   LearningRate 0.0258   Epoch: 9   Global Step: 55940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:58:26,674-Speed 5665.52 samples/sec   Loss 5.0580   LearningRate 0.0258   Epoch: 9   Global Step: 55950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 04:58:28,487-Speed 5651.24 samples/sec   Loss 4.9424   LearningRate 0.0258   Epoch: 9   Global Step: 55960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:58:30,313-Speed 5609.10 samples/sec   Loss 4.8960   LearningRate 0.0258   Epoch: 9   Global Step: 55970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:58:32,120-Speed 5669.05 samples/sec   Loss 4.9394   LearningRate 0.0258   Epoch: 9   Global Step: 55980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:58:33,945-Speed 5613.20 samples/sec   Loss 4.8618   LearningRate 0.0258   Epoch: 9   Global Step: 55990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:58:35,765-Speed 5626.78 samples/sec   Loss 4.8149   LearningRate 0.0258   Epoch: 9   Global Step: 56000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 04:59:01,952-[lfw][56000]XNorm: 21.273996
Training: 2022-04-27 04:59:01,952-[lfw][56000]Accuracy-Flip: 0.99767+-0.00300
Training: 2022-04-27 04:59:01,953-[lfw][56000]Accuracy-Highest: 0.99800
Training: 2022-04-27 04:59:32,301-[cfp_fp][56000]XNorm: 18.822446
Training: 2022-04-27 04:59:32,302-[cfp_fp][56000]Accuracy-Flip: 0.95343+-0.00976
Training: 2022-04-27 04:59:32,302-[cfp_fp][56000]Accuracy-Highest: 0.95343
Training: 2022-04-27 04:59:58,503-[agedb_30][56000]XNorm: 21.008991
Training: 2022-04-27 04:59:58,503-[agedb_30][56000]Accuracy-Flip: 0.97550+-0.00863
Training: 2022-04-27 04:59:58,504-[agedb_30][56000]Accuracy-Highest: 0.97550
Training: 2022-04-27 05:00:00,368-Speed 121.04 samples/sec   Loss 4.8828   LearningRate 0.0258   Epoch: 9   Global Step: 56010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:00:02,167-Speed 5692.17 samples/sec   Loss 5.0275   LearningRate 0.0257   Epoch: 9   Global Step: 56020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:04,004-Speed 5577.47 samples/sec   Loss 5.0176   LearningRate 0.0257   Epoch: 9   Global Step: 56030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:05,839-Speed 5581.26 samples/sec   Loss 5.0194   LearningRate 0.0257   Epoch: 9   Global Step: 56040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:07,671-Speed 5590.61 samples/sec   Loss 4.9002   LearningRate 0.0257   Epoch: 9   Global Step: 56050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:09,498-Speed 5607.43 samples/sec   Loss 4.8781   LearningRate 0.0257   Epoch: 9   Global Step: 56060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:11,323-Speed 5612.42 samples/sec   Loss 4.9169   LearningRate 0.0257   Epoch: 9   Global Step: 56070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:13,144-Speed 5626.02 samples/sec   Loss 5.0792   LearningRate 0.0257   Epoch: 9   Global Step: 56080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:14,965-Speed 5625.68 samples/sec   Loss 4.9939   LearningRate 0.0257   Epoch: 9   Global Step: 56090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:16,794-Speed 5599.91 samples/sec   Loss 5.0508   LearningRate 0.0257   Epoch: 9   Global Step: 56100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:18,619-Speed 5610.99 samples/sec   Loss 4.9827   LearningRate 0.0257   Epoch: 9   Global Step: 56110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:20,506-Speed 5430.76 samples/sec   Loss 4.8680   LearningRate 0.0257   Epoch: 9   Global Step: 56120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:00:22,366-Speed 5506.65 samples/sec   Loss 4.9050   LearningRate 0.0256   Epoch: 9   Global Step: 56130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:00:24,297-Speed 5305.06 samples/sec   Loss 4.9420   LearningRate 0.0256   Epoch: 9   Global Step: 56140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:00:26,215-Speed 5340.44 samples/sec   Loss 4.9624   LearningRate 0.0256   Epoch: 9   Global Step: 56150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:00:28,135-Speed 5333.29 samples/sec   Loss 4.9165   LearningRate 0.0256   Epoch: 9   Global Step: 56160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:00:29,956-Speed 5625.38 samples/sec   Loss 4.8365   LearningRate 0.0256   Epoch: 9   Global Step: 56170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:00:31,777-Speed 5626.62 samples/sec   Loss 5.0046   LearningRate 0.0256   Epoch: 9   Global Step: 56180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:00:33,593-Speed 5641.66 samples/sec   Loss 4.8703   LearningRate 0.0256   Epoch: 9   Global Step: 56190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:35,435-Speed 5561.04 samples/sec   Loss 5.0148   LearningRate 0.0256   Epoch: 9   Global Step: 56200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:37,259-Speed 5615.33 samples/sec   Loss 4.9365   LearningRate 0.0256   Epoch: 9   Global Step: 56210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:39,074-Speed 5643.31 samples/sec   Loss 4.8826   LearningRate 0.0256   Epoch: 9   Global Step: 56220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:40,885-Speed 5656.06 samples/sec   Loss 4.9482   LearningRate 0.0256   Epoch: 9   Global Step: 56230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:42,699-Speed 5648.45 samples/sec   Loss 5.0571   LearningRate 0.0255   Epoch: 9   Global Step: 56240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:44,515-Speed 5639.89 samples/sec   Loss 4.8822   LearningRate 0.0255   Epoch: 9   Global Step: 56250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:46,337-Speed 5623.91 samples/sec   Loss 5.0605   LearningRate 0.0255   Epoch: 9   Global Step: 56260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:48,181-Speed 5554.83 samples/sec   Loss 4.9012   LearningRate 0.0255   Epoch: 9   Global Step: 56270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:50,099-Speed 5341.57 samples/sec   Loss 5.0598   LearningRate 0.0255   Epoch: 9   Global Step: 56280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:51,963-Speed 5493.44 samples/sec   Loss 4.9175   LearningRate 0.0255   Epoch: 9   Global Step: 56290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:00:53,796-Speed 5588.39 samples/sec   Loss 4.8813   LearningRate 0.0255   Epoch: 9   Global Step: 56300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:00:55,597-Speed 5686.63 samples/sec   Loss 4.9345   LearningRate 0.0255   Epoch: 9   Global Step: 56310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:57,413-Speed 5642.89 samples/sec   Loss 4.8966   LearningRate 0.0255   Epoch: 9   Global Step: 56320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:00:59,238-Speed 5612.41 samples/sec   Loss 4.9170   LearningRate 0.0255   Epoch: 9   Global Step: 56330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:01,062-Speed 5617.59 samples/sec   Loss 5.0847   LearningRate 0.0255   Epoch: 9   Global Step: 56340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:02,899-Speed 5576.91 samples/sec   Loss 5.0005   LearningRate 0.0255   Epoch: 9   Global Step: 56350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:04,720-Speed 5623.07 samples/sec   Loss 4.8526   LearningRate 0.0254   Epoch: 9   Global Step: 56360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:06,550-Speed 5597.26 samples/sec   Loss 4.8735   LearningRate 0.0254   Epoch: 9   Global Step: 56370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:08,371-Speed 5628.47 samples/sec   Loss 4.8645   LearningRate 0.0254   Epoch: 9   Global Step: 56380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:10,184-Speed 5648.00 samples/sec   Loss 5.0451   LearningRate 0.0254   Epoch: 9   Global Step: 56390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:11,997-Speed 5651.39 samples/sec   Loss 4.8522   LearningRate 0.0254   Epoch: 9   Global Step: 56400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:13,836-Speed 5570.59 samples/sec   Loss 4.8755   LearningRate 0.0254   Epoch: 9   Global Step: 56410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:01:15,656-Speed 5626.84 samples/sec   Loss 4.9617   LearningRate 0.0254   Epoch: 9   Global Step: 56420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:01:17,480-Speed 5616.42 samples/sec   Loss 4.8875   LearningRate 0.0254   Epoch: 9   Global Step: 56430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:19,315-Speed 5583.70 samples/sec   Loss 4.9031   LearningRate 0.0254   Epoch: 9   Global Step: 56440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:21,135-Speed 5627.26 samples/sec   Loss 4.9886   LearningRate 0.0254   Epoch: 9   Global Step: 56450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:22,981-Speed 5550.50 samples/sec   Loss 4.8244   LearningRate 0.0254   Epoch: 9   Global Step: 56460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:24,833-Speed 5532.21 samples/sec   Loss 4.9562   LearningRate 0.0253   Epoch: 9   Global Step: 56470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:26,646-Speed 5649.35 samples/sec   Loss 5.0169   LearningRate 0.0253   Epoch: 9   Global Step: 56480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:28,469-Speed 5618.43 samples/sec   Loss 4.8231   LearningRate 0.0253   Epoch: 9   Global Step: 56490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:30,286-Speed 5637.36 samples/sec   Loss 4.9658   LearningRate 0.0253   Epoch: 9   Global Step: 56500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:32,112-Speed 5608.89 samples/sec   Loss 4.7503   LearningRate 0.0253   Epoch: 9   Global Step: 56510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:33,916-Speed 5679.42 samples/sec   Loss 4.8455   LearningRate 0.0253   Epoch: 9   Global Step: 56520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:35,723-Speed 5666.59 samples/sec   Loss 4.8389   LearningRate 0.0253   Epoch: 9   Global Step: 56530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:01:37,535-Speed 5654.88 samples/sec   Loss 5.1127   LearningRate 0.0253   Epoch: 9   Global Step: 56540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:39,344-Speed 5662.62 samples/sec   Loss 4.8945   LearningRate 0.0253   Epoch: 9   Global Step: 56550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:41,162-Speed 5634.70 samples/sec   Loss 4.9236   LearningRate 0.0253   Epoch: 9   Global Step: 56560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:42,975-Speed 5650.23 samples/sec   Loss 4.8898   LearningRate 0.0253   Epoch: 9   Global Step: 56570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:44,804-Speed 5600.10 samples/sec   Loss 5.0410   LearningRate 0.0252   Epoch: 9   Global Step: 56580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:46,618-Speed 5648.03 samples/sec   Loss 4.9111   LearningRate 0.0252   Epoch: 9   Global Step: 56590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:48,439-Speed 5623.75 samples/sec   Loss 5.0208   LearningRate 0.0252   Epoch: 9   Global Step: 56600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:50,253-Speed 5648.54 samples/sec   Loss 5.0086   LearningRate 0.0252   Epoch: 9   Global Step: 56610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:01:52,067-Speed 5643.85 samples/sec   Loss 4.8853   LearningRate 0.0252   Epoch: 9   Global Step: 56620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:01:53,883-Speed 5644.21 samples/sec   Loss 4.8917   LearningRate 0.0252   Epoch: 9   Global Step: 56630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:01:55,696-Speed 5648.66 samples/sec   Loss 4.9016   LearningRate 0.0252   Epoch: 9   Global Step: 56640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:01:57,515-Speed 5630.65 samples/sec   Loss 4.9225   LearningRate 0.0252   Epoch: 9   Global Step: 56650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:01:59,328-Speed 5649.39 samples/sec   Loss 4.9694   LearningRate 0.0252   Epoch: 9   Global Step: 56660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:02:01,142-Speed 5648.86 samples/sec   Loss 4.8715   LearningRate 0.0252   Epoch: 9   Global Step: 56670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:02:02,963-Speed 5625.76 samples/sec   Loss 4.8261   LearningRate 0.0252   Epoch: 9   Global Step: 56680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:02:04,774-Speed 5655.66 samples/sec   Loss 4.8664   LearningRate 0.0251   Epoch: 9   Global Step: 56690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:02:06,640-Speed 5490.58 samples/sec   Loss 4.7604   LearningRate 0.0251   Epoch: 9   Global Step: 56700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:02:08,496-Speed 5519.27 samples/sec   Loss 4.8850   LearningRate 0.0251   Epoch: 9   Global Step: 56710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:02:10,303-Speed 5667.96 samples/sec   Loss 4.9692   LearningRate 0.0251   Epoch: 9   Global Step: 56720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:12,123-Speed 5628.38 samples/sec   Loss 4.9550   LearningRate 0.0251   Epoch: 9   Global Step: 56730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:13,931-Speed 5665.43 samples/sec   Loss 4.7731   LearningRate 0.0251   Epoch: 9   Global Step: 56740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:15,747-Speed 5639.51 samples/sec   Loss 4.8675   LearningRate 0.0251   Epoch: 9   Global Step: 56750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:17,601-Speed 5526.47 samples/sec   Loss 5.0239   LearningRate 0.0251   Epoch: 9   Global Step: 56760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:19,441-Speed 5565.67 samples/sec   Loss 4.9827   LearningRate 0.0251   Epoch: 9   Global Step: 56770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:21,266-Speed 5614.59 samples/sec   Loss 5.0762   LearningRate 0.0251   Epoch: 9   Global Step: 56780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:23,099-Speed 5588.24 samples/sec   Loss 4.7508   LearningRate 0.0251   Epoch: 9   Global Step: 56790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:24,922-Speed 5620.89 samples/sec   Loss 4.9572   LearningRate 0.0251   Epoch: 9   Global Step: 56800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:26,747-Speed 5612.46 samples/sec   Loss 4.8768   LearningRate 0.0250   Epoch: 9   Global Step: 56810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:28,586-Speed 5570.12 samples/sec   Loss 4.9997   LearningRate 0.0250   Epoch: 9   Global Step: 56820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:02:30,414-Speed 5601.50 samples/sec   Loss 4.8415   LearningRate 0.0250   Epoch: 9   Global Step: 56830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:02:32,230-Speed 5642.96 samples/sec   Loss 4.8339   LearningRate 0.0250   Epoch: 9   Global Step: 56840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:02:34,103-Speed 5467.51 samples/sec   Loss 4.7529   LearningRate 0.0250   Epoch: 9   Global Step: 56850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:35,896-Speed 5713.03 samples/sec   Loss 4.8096   LearningRate 0.0250   Epoch: 9   Global Step: 56860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:50,686-Speed 692.41 samples/sec   Loss 4.3587   LearningRate 0.0250   Epoch: 10   Global Step: 56870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:52,513-Speed 5608.05 samples/sec   Loss 4.2520   LearningRate 0.0250   Epoch: 10   Global Step: 56880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:54,349-Speed 5581.20 samples/sec   Loss 4.2726   LearningRate 0.0250   Epoch: 10   Global Step: 56890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:56,319-Speed 5198.33 samples/sec   Loss 4.3551   LearningRate 0.0250   Epoch: 10   Global Step: 56900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:58,143-Speed 5616.54 samples/sec   Loss 4.3397   LearningRate 0.0250   Epoch: 10   Global Step: 56910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:02:59,961-Speed 5635.06 samples/sec   Loss 4.2820   LearningRate 0.0249   Epoch: 10   Global Step: 56920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:01,808-Speed 5546.44 samples/sec   Loss 4.2223   LearningRate 0.0249   Epoch: 10   Global Step: 56930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:03,652-Speed 5554.11 samples/sec   Loss 4.2938   LearningRate 0.0249   Epoch: 10   Global Step: 56940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:05,494-Speed 5560.14 samples/sec   Loss 4.3930   LearningRate 0.0249   Epoch: 10   Global Step: 56950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:07,323-Speed 5600.27 samples/sec   Loss 4.1599   LearningRate 0.0249   Epoch: 10   Global Step: 56960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:09,182-Speed 5511.07 samples/sec   Loss 4.1753   LearningRate 0.0249   Epoch: 10   Global Step: 56970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:11,025-Speed 5555.30 samples/sec   Loss 4.2332   LearningRate 0.0249   Epoch: 10   Global Step: 56980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:12,874-Speed 5542.63 samples/sec   Loss 4.2924   LearningRate 0.0249   Epoch: 10   Global Step: 56990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:14,722-Speed 5542.22 samples/sec   Loss 4.2469   LearningRate 0.0249   Epoch: 10   Global Step: 57000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:16,557-Speed 5582.66 samples/sec   Loss 4.1955   LearningRate 0.0249   Epoch: 10   Global Step: 57010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:03:18,391-Speed 5585.28 samples/sec   Loss 4.4133   LearningRate 0.0249   Epoch: 10   Global Step: 57020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:03:20,221-Speed 5598.46 samples/sec   Loss 4.3402   LearningRate 0.0249   Epoch: 10   Global Step: 57030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:03:22,050-Speed 5599.78 samples/sec   Loss 4.3105   LearningRate 0.0248   Epoch: 10   Global Step: 57040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:03:23,880-Speed 5598.16 samples/sec   Loss 4.2179   LearningRate 0.0248   Epoch: 10   Global Step: 57050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:03:25,711-Speed 5594.42 samples/sec   Loss 4.3905   LearningRate 0.0248   Epoch: 10   Global Step: 57060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:03:27,547-Speed 5578.14 samples/sec   Loss 4.2840   LearningRate 0.0248   Epoch: 10   Global Step: 57070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:03:29,364-Speed 5636.40 samples/sec   Loss 4.3691   LearningRate 0.0248   Epoch: 10   Global Step: 57080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:03:31,238-Speed 5467.35 samples/sec   Loss 4.4490   LearningRate 0.0248   Epoch: 10   Global Step: 57090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:03:33,070-Speed 5590.40 samples/sec   Loss 4.4429   LearningRate 0.0248   Epoch: 10   Global Step: 57100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:03:34,889-Speed 5632.11 samples/sec   Loss 4.4298   LearningRate 0.0248   Epoch: 10   Global Step: 57110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:36,709-Speed 5629.01 samples/sec   Loss 4.4688   LearningRate 0.0248   Epoch: 10   Global Step: 57120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:38,557-Speed 5541.43 samples/sec   Loss 4.2764   LearningRate 0.0248   Epoch: 10   Global Step: 57130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:40,394-Speed 5576.94 samples/sec   Loss 4.4180   LearningRate 0.0248   Epoch: 10   Global Step: 57140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:42,242-Speed 5543.23 samples/sec   Loss 4.4892   LearningRate 0.0247   Epoch: 10   Global Step: 57150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:44,078-Speed 5580.01 samples/sec   Loss 4.4035   LearningRate 0.0247   Epoch: 10   Global Step: 57160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:45,893-Speed 5642.78 samples/sec   Loss 4.3229   LearningRate 0.0247   Epoch: 10   Global Step: 57170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:47,710-Speed 5637.30 samples/sec   Loss 4.4855   LearningRate 0.0247   Epoch: 10   Global Step: 57180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:49,554-Speed 5555.57 samples/sec   Loss 4.2845   LearningRate 0.0247   Epoch: 10   Global Step: 57190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:51,378-Speed 5614.49 samples/sec   Loss 4.3127   LearningRate 0.0247   Epoch: 10   Global Step: 57200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:03:53,203-Speed 5613.85 samples/sec   Loss 4.4230   LearningRate 0.0247   Epoch: 10   Global Step: 57210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:03:55,047-Speed 5556.74 samples/sec   Loss 4.5290   LearningRate 0.0247   Epoch: 10   Global Step: 57220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:03:56,868-Speed 5623.88 samples/sec   Loss 4.4267   LearningRate 0.0247   Epoch: 10   Global Step: 57230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:03:58,697-Speed 5600.34 samples/sec   Loss 4.4565   LearningRate 0.0247   Epoch: 10   Global Step: 57240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:04:00,530-Speed 5589.12 samples/sec   Loss 4.5058   LearningRate 0.0247   Epoch: 10   Global Step: 57250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:04:02,366-Speed 5579.73 samples/sec   Loss 4.4772   LearningRate 0.0246   Epoch: 10   Global Step: 57260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:04,196-Speed 5597.24 samples/sec   Loss 4.4476   LearningRate 0.0246   Epoch: 10   Global Step: 57270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:06,017-Speed 5623.74 samples/sec   Loss 4.4231   LearningRate 0.0246   Epoch: 10   Global Step: 57280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:07,829-Speed 5655.45 samples/sec   Loss 4.4708   LearningRate 0.0246   Epoch: 10   Global Step: 57290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:04:09,662-Speed 5587.90 samples/sec   Loss 4.4389   LearningRate 0.0246   Epoch: 10   Global Step: 57300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:04:11,485-Speed 5617.37 samples/sec   Loss 4.3724   LearningRate 0.0246   Epoch: 10   Global Step: 57310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:04:13,306-Speed 5624.80 samples/sec   Loss 4.4860   LearningRate 0.0246   Epoch: 10   Global Step: 57320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:04:15,142-Speed 5581.17 samples/sec   Loss 4.3985   LearningRate 0.0246   Epoch: 10   Global Step: 57330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:04:16,971-Speed 5598.67 samples/sec   Loss 4.4752   LearningRate 0.0246   Epoch: 10   Global Step: 57340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:04:18,792-Speed 5625.89 samples/sec   Loss 4.5273   LearningRate 0.0246   Epoch: 10   Global Step: 57350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:04:20,612-Speed 5626.85 samples/sec   Loss 4.3586   LearningRate 0.0246   Epoch: 10   Global Step: 57360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:04:22,433-Speed 5627.53 samples/sec   Loss 4.5711   LearningRate 0.0246   Epoch: 10   Global Step: 57370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:04:24,285-Speed 5531.29 samples/sec   Loss 4.5175   LearningRate 0.0245   Epoch: 10   Global Step: 57380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 05:04:26,110-Speed 5613.62 samples/sec   Loss 4.3886   LearningRate 0.0245   Epoch: 10   Global Step: 57390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:27,933-Speed 5619.32 samples/sec   Loss 4.5478   LearningRate 0.0245   Epoch: 10   Global Step: 57400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:29,767-Speed 5583.28 samples/sec   Loss 4.3713   LearningRate 0.0245   Epoch: 10   Global Step: 57410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:31,593-Speed 5609.45 samples/sec   Loss 4.5375   LearningRate 0.0245   Epoch: 10   Global Step: 57420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:33,416-Speed 5619.02 samples/sec   Loss 4.6612   LearningRate 0.0245   Epoch: 10   Global Step: 57430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:35,253-Speed 5576.52 samples/sec   Loss 4.4781   LearningRate 0.0245   Epoch: 10   Global Step: 57440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:37,092-Speed 5571.66 samples/sec   Loss 4.5369   LearningRate 0.0245   Epoch: 10   Global Step: 57450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:38,945-Speed 5525.85 samples/sec   Loss 4.6088   LearningRate 0.0245   Epoch: 10   Global Step: 57460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:40,838-Speed 5412.21 samples/sec   Loss 4.5112   LearningRate 0.0245   Epoch: 10   Global Step: 57470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:42,679-Speed 5563.34 samples/sec   Loss 4.6104   LearningRate 0.0245   Epoch: 10   Global Step: 57480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:44,513-Speed 5585.86 samples/sec   Loss 4.5018   LearningRate 0.0244   Epoch: 10   Global Step: 57490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:04:46,344-Speed 5594.67 samples/sec   Loss 4.5267   LearningRate 0.0244   Epoch: 10   Global Step: 57500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:04:48,175-Speed 5594.67 samples/sec   Loss 4.5496   LearningRate 0.0244   Epoch: 10   Global Step: 57510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:50,025-Speed 5536.48 samples/sec   Loss 4.5991   LearningRate 0.0244   Epoch: 10   Global Step: 57520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:51,894-Speed 5480.73 samples/sec   Loss 4.4899   LearningRate 0.0244   Epoch: 10   Global Step: 57530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:53,750-Speed 5517.64 samples/sec   Loss 4.4969   LearningRate 0.0244   Epoch: 10   Global Step: 57540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:55,591-Speed 5565.96 samples/sec   Loss 4.4913   LearningRate 0.0244   Epoch: 10   Global Step: 57550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:57,424-Speed 5589.26 samples/sec   Loss 4.5461   LearningRate 0.0244   Epoch: 10   Global Step: 57560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:04:59,268-Speed 5551.93 samples/sec   Loss 4.5490   LearningRate 0.0244   Epoch: 10   Global Step: 57570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:01,101-Speed 5590.16 samples/sec   Loss 4.4957   LearningRate 0.0244   Epoch: 10   Global Step: 57580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:02,933-Speed 5590.28 samples/sec   Loss 4.5012   LearningRate 0.0244   Epoch: 10   Global Step: 57590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:05,011-Speed 4929.67 samples/sec   Loss 4.4454   LearningRate 0.0244   Epoch: 10   Global Step: 57600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:06,855-Speed 5553.40 samples/sec   Loss 4.5657   LearningRate 0.0243   Epoch: 10   Global Step: 57610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:05:08,682-Speed 5610.03 samples/sec   Loss 4.4353   LearningRate 0.0243   Epoch: 10   Global Step: 57620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:05:10,515-Speed 5586.70 samples/sec   Loss 4.5838   LearningRate 0.0243   Epoch: 10   Global Step: 57630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:05:12,352-Speed 5575.23 samples/sec   Loss 4.5595   LearningRate 0.0243   Epoch: 10   Global Step: 57640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:05:14,185-Speed 5588.37 samples/sec   Loss 4.4156   LearningRate 0.0243   Epoch: 10   Global Step: 57650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:05:16,023-Speed 5573.61 samples/sec   Loss 4.5190   LearningRate 0.0243   Epoch: 10   Global Step: 57660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:05:17,842-Speed 5633.17 samples/sec   Loss 4.5961   LearningRate 0.0243   Epoch: 10   Global Step: 57670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:05:19,662-Speed 5628.16 samples/sec   Loss 4.6427   LearningRate 0.0243   Epoch: 10   Global Step: 57680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:21,498-Speed 5579.52 samples/sec   Loss 4.5994   LearningRate 0.0243   Epoch: 10   Global Step: 57690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:23,344-Speed 5547.16 samples/sec   Loss 4.4926   LearningRate 0.0243   Epoch: 10   Global Step: 57700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:25,258-Speed 5353.55 samples/sec   Loss 4.7063   LearningRate 0.0243   Epoch: 10   Global Step: 57710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:27,096-Speed 5571.63 samples/sec   Loss 4.6565   LearningRate 0.0242   Epoch: 10   Global Step: 57720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:28,916-Speed 5626.96 samples/sec   Loss 4.5549   LearningRate 0.0242   Epoch: 10   Global Step: 57730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:30,742-Speed 5610.25 samples/sec   Loss 4.5504   LearningRate 0.0242   Epoch: 10   Global Step: 57740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:32,576-Speed 5585.48 samples/sec   Loss 4.5214   LearningRate 0.0242   Epoch: 10   Global Step: 57750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:34,416-Speed 5567.27 samples/sec   Loss 4.6197   LearningRate 0.0242   Epoch: 10   Global Step: 57760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:36,267-Speed 5533.80 samples/sec   Loss 4.5866   LearningRate 0.0242   Epoch: 10   Global Step: 57770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:38,087-Speed 5627.50 samples/sec   Loss 4.6669   LearningRate 0.0242   Epoch: 10   Global Step: 57780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:05:39,907-Speed 5628.16 samples/sec   Loss 4.6195   LearningRate 0.0242   Epoch: 10   Global Step: 57790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:05:41,738-Speed 5596.03 samples/sec   Loss 4.5654   LearningRate 0.0242   Epoch: 10   Global Step: 57800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:05:43,563-Speed 5611.88 samples/sec   Loss 4.6089   LearningRate 0.0242   Epoch: 10   Global Step: 57810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:45,409-Speed 5549.32 samples/sec   Loss 4.6092   LearningRate 0.0242   Epoch: 10   Global Step: 57820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:47,245-Speed 5580.48 samples/sec   Loss 4.7157   LearningRate 0.0242   Epoch: 10   Global Step: 57830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:49,077-Speed 5589.34 samples/sec   Loss 4.6165   LearningRate 0.0241   Epoch: 10   Global Step: 57840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:50,916-Speed 5568.87 samples/sec   Loss 4.5863   LearningRate 0.0241   Epoch: 10   Global Step: 57850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:52,757-Speed 5566.25 samples/sec   Loss 4.5239   LearningRate 0.0241   Epoch: 10   Global Step: 57860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:54,596-Speed 5571.62 samples/sec   Loss 4.5795   LearningRate 0.0241   Epoch: 10   Global Step: 57870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:05:56,421-Speed 5612.60 samples/sec   Loss 4.5702   LearningRate 0.0241   Epoch: 10   Global Step: 57880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:05:58,278-Speed 5515.73 samples/sec   Loss 4.5398   LearningRate 0.0241   Epoch: 10   Global Step: 57890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:06:00,112-Speed 5584.35 samples/sec   Loss 4.6201   LearningRate 0.0241   Epoch: 10   Global Step: 57900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:06:01,935-Speed 5619.35 samples/sec   Loss 4.4907   LearningRate 0.0241   Epoch: 10   Global Step: 57910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:06:03,770-Speed 5580.77 samples/sec   Loss 4.6918   LearningRate 0.0241   Epoch: 10   Global Step: 57920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:06:05,629-Speed 5511.78 samples/sec   Loss 4.6248   LearningRate 0.0241   Epoch: 10   Global Step: 57930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:06:07,483-Speed 5525.59 samples/sec   Loss 4.5872   LearningRate 0.0241   Epoch: 10   Global Step: 57940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:06:09,322-Speed 5569.36 samples/sec   Loss 4.4442   LearningRate 0.0241   Epoch: 10   Global Step: 57950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:06:11,142-Speed 5626.99 samples/sec   Loss 4.6024   LearningRate 0.0240   Epoch: 10   Global Step: 57960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:06:12,981-Speed 5570.49 samples/sec   Loss 4.5021   LearningRate 0.0240   Epoch: 10   Global Step: 57970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:06:14,821-Speed 5567.14 samples/sec   Loss 4.5745   LearningRate 0.0240   Epoch: 10   Global Step: 57980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:06:16,644-Speed 5619.46 samples/sec   Loss 4.5436   LearningRate 0.0240   Epoch: 10   Global Step: 57990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:06:18,478-Speed 5586.39 samples/sec   Loss 4.5630   LearningRate 0.0240   Epoch: 10   Global Step: 58000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:06:44,638-[lfw][58000]XNorm: 22.345384
Training: 2022-04-27 05:06:44,639-[lfw][58000]Accuracy-Flip: 0.99733+-0.00260
Training: 2022-04-27 05:06:44,639-[lfw][58000]Accuracy-Highest: 0.99800
Training: 2022-04-27 05:07:15,040-[cfp_fp][58000]XNorm: 19.903842
Training: 2022-04-27 05:07:15,040-[cfp_fp][58000]Accuracy-Flip: 0.95557+-0.01053
Training: 2022-04-27 05:07:15,041-[cfp_fp][58000]Accuracy-Highest: 0.95557
Training: 2022-04-27 05:07:41,202-[agedb_30][58000]XNorm: 22.140580
Training: 2022-04-27 05:07:41,202-[agedb_30][58000]Accuracy-Flip: 0.97483+-0.00976
Training: 2022-04-27 05:07:41,203-[agedb_30][58000]Accuracy-Highest: 0.97550
Training: 2022-04-27 05:07:43,041-Speed 121.09 samples/sec   Loss 4.5631   LearningRate 0.0240   Epoch: 10   Global Step: 58010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:07:44,870-Speed 5601.38 samples/sec   Loss 4.5854   LearningRate 0.0240   Epoch: 10   Global Step: 58020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:07:46,679-Speed 5662.85 samples/sec   Loss 4.6195   LearningRate 0.0240   Epoch: 10   Global Step: 58030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:07:48,498-Speed 5630.62 samples/sec   Loss 4.5550   LearningRate 0.0240   Epoch: 10   Global Step: 58040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:07:50,346-Speed 5542.40 samples/sec   Loss 4.5675   LearningRate 0.0240   Epoch: 10   Global Step: 58050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:07:52,186-Speed 5566.89 samples/sec   Loss 4.4772   LearningRate 0.0240   Epoch: 10   Global Step: 58060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:07:54,025-Speed 5572.19 samples/sec   Loss 4.4751   LearningRate 0.0239   Epoch: 10   Global Step: 58070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:07:55,906-Speed 5444.54 samples/sec   Loss 4.6576   LearningRate 0.0239   Epoch: 10   Global Step: 58080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:07:57,736-Speed 5598.80 samples/sec   Loss 4.5922   LearningRate 0.0239   Epoch: 10   Global Step: 58090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:07:59,563-Speed 5605.99 samples/sec   Loss 4.4795   LearningRate 0.0239   Epoch: 10   Global Step: 58100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:08:01,410-Speed 5544.95 samples/sec   Loss 4.6372   LearningRate 0.0239   Epoch: 10   Global Step: 58110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:08:03,239-Speed 5599.93 samples/sec   Loss 4.4964   LearningRate 0.0239   Epoch: 10   Global Step: 58120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:08:05,077-Speed 5574.52 samples/sec   Loss 4.4934   LearningRate 0.0239   Epoch: 10   Global Step: 58130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 05:08:06,900-Speed 5616.08 samples/sec   Loss 4.6738   LearningRate 0.0239   Epoch: 10   Global Step: 58140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:08:08,748-Speed 5545.89 samples/sec   Loss 4.5912   LearningRate 0.0239   Epoch: 10   Global Step: 58150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:08:10,572-Speed 5613.87 samples/sec   Loss 4.4512   LearningRate 0.0239   Epoch: 10   Global Step: 58160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:08:12,394-Speed 5622.87 samples/sec   Loss 4.4865   LearningRate 0.0239   Epoch: 10   Global Step: 58170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 05:08:14,230-Speed 5578.75 samples/sec   Loss 4.7460   LearningRate 0.0239   Epoch: 10   Global Step: 58180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:16,061-Speed 5594.13 samples/sec   Loss 4.6408   LearningRate 0.0238   Epoch: 10   Global Step: 58190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:17,886-Speed 5615.01 samples/sec   Loss 4.5518   LearningRate 0.0238   Epoch: 10   Global Step: 58200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:19,724-Speed 5571.76 samples/sec   Loss 4.7193   LearningRate 0.0238   Epoch: 10   Global Step: 58210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:21,550-Speed 5611.98 samples/sec   Loss 4.6480   LearningRate 0.0238   Epoch: 10   Global Step: 58220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:23,388-Speed 5572.48 samples/sec   Loss 4.5909   LearningRate 0.0238   Epoch: 10   Global Step: 58230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:25,208-Speed 5628.24 samples/sec   Loss 4.5139   LearningRate 0.0238   Epoch: 10   Global Step: 58240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:08:27,032-Speed 5615.63 samples/sec   Loss 4.6763   LearningRate 0.0238   Epoch: 10   Global Step: 58250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:08:28,876-Speed 5555.56 samples/sec   Loss 4.6929   LearningRate 0.0238   Epoch: 10   Global Step: 58260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:08:30,700-Speed 5613.01 samples/sec   Loss 4.7078   LearningRate 0.0238   Epoch: 10   Global Step: 58270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:32,537-Speed 5578.57 samples/sec   Loss 4.5525   LearningRate 0.0238   Epoch: 10   Global Step: 58280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:34,383-Speed 5547.20 samples/sec   Loss 4.6367   LearningRate 0.0238   Epoch: 10   Global Step: 58290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:36,221-Speed 5574.27 samples/sec   Loss 4.6005   LearningRate 0.0237   Epoch: 10   Global Step: 58300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:38,039-Speed 5636.55 samples/sec   Loss 4.6269   LearningRate 0.0237   Epoch: 10   Global Step: 58310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:39,869-Speed 5596.56 samples/sec   Loss 4.6397   LearningRate 0.0237   Epoch: 10   Global Step: 58320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:41,693-Speed 5615.44 samples/sec   Loss 4.6761   LearningRate 0.0237   Epoch: 10   Global Step: 58330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:43,549-Speed 5519.20 samples/sec   Loss 4.6566   LearningRate 0.0237   Epoch: 10   Global Step: 58340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:45,374-Speed 5610.81 samples/sec   Loss 4.7146   LearningRate 0.0237   Epoch: 10   Global Step: 58350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:47,192-Speed 5634.14 samples/sec   Loss 4.6862   LearningRate 0.0237   Epoch: 10   Global Step: 58360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:08:49,011-Speed 5631.68 samples/sec   Loss 4.6885   LearningRate 0.0237   Epoch: 10   Global Step: 58370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:08:50,851-Speed 5566.47 samples/sec   Loss 4.7859   LearningRate 0.0237   Epoch: 10   Global Step: 58380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:08:52,672-Speed 5627.49 samples/sec   Loss 4.5682   LearningRate 0.0237   Epoch: 10   Global Step: 58390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:08:54,511-Speed 5568.19 samples/sec   Loss 4.6796   LearningRate 0.0237   Epoch: 10   Global Step: 58400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:08:56,390-Speed 5452.67 samples/sec   Loss 4.6508   LearningRate 0.0237   Epoch: 10   Global Step: 58410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:08:58,221-Speed 5593.40 samples/sec   Loss 4.6033   LearningRate 0.0236   Epoch: 10   Global Step: 58420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:00,052-Speed 5596.05 samples/sec   Loss 4.6528   LearningRate 0.0236   Epoch: 10   Global Step: 58430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:01,910-Speed 5514.04 samples/sec   Loss 4.5086   LearningRate 0.0236   Epoch: 10   Global Step: 58440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:03,754-Speed 5554.94 samples/sec   Loss 4.6059   LearningRate 0.0236   Epoch: 10   Global Step: 58450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:05,575-Speed 5624.52 samples/sec   Loss 4.6487   LearningRate 0.0236   Epoch: 10   Global Step: 58460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:07,395-Speed 5627.34 samples/sec   Loss 4.4799   LearningRate 0.0236   Epoch: 10   Global Step: 58470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:09,240-Speed 5552.06 samples/sec   Loss 4.6724   LearningRate 0.0236   Epoch: 10   Global Step: 58480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:11,064-Speed 5615.31 samples/sec   Loss 4.5565   LearningRate 0.0236   Epoch: 10   Global Step: 58490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:12,891-Speed 5607.34 samples/sec   Loss 4.6177   LearningRate 0.0236   Epoch: 10   Global Step: 58500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:14,717-Speed 5610.40 samples/sec   Loss 4.7200   LearningRate 0.0236   Epoch: 10   Global Step: 58510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:16,537-Speed 5628.71 samples/sec   Loss 4.5827   LearningRate 0.0236   Epoch: 10   Global Step: 58520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:09:18,359-Speed 5621.27 samples/sec   Loss 4.5379   LearningRate 0.0236   Epoch: 10   Global Step: 58530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:20,190-Speed 5595.53 samples/sec   Loss 4.5901   LearningRate 0.0235   Epoch: 10   Global Step: 58540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:22,018-Speed 5602.57 samples/sec   Loss 4.5594   LearningRate 0.0235   Epoch: 10   Global Step: 58550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:23,838-Speed 5629.15 samples/sec   Loss 4.6704   LearningRate 0.0235   Epoch: 10   Global Step: 58560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:25,665-Speed 5605.93 samples/sec   Loss 4.5561   LearningRate 0.0235   Epoch: 10   Global Step: 58570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:27,485-Speed 5629.39 samples/sec   Loss 4.5657   LearningRate 0.0235   Epoch: 10   Global Step: 58580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:29,308-Speed 5616.45 samples/sec   Loss 4.7030   LearningRate 0.0235   Epoch: 10   Global Step: 58590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:31,155-Speed 5546.20 samples/sec   Loss 4.6289   LearningRate 0.0235   Epoch: 10   Global Step: 58600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:32,993-Speed 5572.81 samples/sec   Loss 4.7262   LearningRate 0.0235   Epoch: 10   Global Step: 58610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:34,821-Speed 5603.85 samples/sec   Loss 4.5491   LearningRate 0.0235   Epoch: 10   Global Step: 58620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:36,651-Speed 5598.37 samples/sec   Loss 4.4630   LearningRate 0.0235   Epoch: 10   Global Step: 58630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:09:38,480-Speed 5599.12 samples/sec   Loss 4.5833   LearningRate 0.0235   Epoch: 10   Global Step: 58640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:09:40,294-Speed 5646.50 samples/sec   Loss 4.5966   LearningRate 0.0235   Epoch: 10   Global Step: 58650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:42,138-Speed 5555.43 samples/sec   Loss 4.6284   LearningRate 0.0234   Epoch: 10   Global Step: 58660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:43,985-Speed 5547.88 samples/sec   Loss 4.6196   LearningRate 0.0234   Epoch: 10   Global Step: 58670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:45,871-Speed 5431.12 samples/sec   Loss 4.6058   LearningRate 0.0234   Epoch: 10   Global Step: 58680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:47,707-Speed 5578.65 samples/sec   Loss 4.5931   LearningRate 0.0234   Epoch: 10   Global Step: 58690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:49,542-Speed 5582.36 samples/sec   Loss 4.7120   LearningRate 0.0234   Epoch: 10   Global Step: 58700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:51,375-Speed 5588.74 samples/sec   Loss 4.6605   LearningRate 0.0234   Epoch: 10   Global Step: 58710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:53,201-Speed 5608.34 samples/sec   Loss 4.7342   LearningRate 0.0234   Epoch: 10   Global Step: 58720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:55,041-Speed 5566.98 samples/sec   Loss 4.5958   LearningRate 0.0234   Epoch: 10   Global Step: 58730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:56,876-Speed 5582.31 samples/sec   Loss 4.5535   LearningRate 0.0234   Epoch: 10   Global Step: 58740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:09:58,762-Speed 5432.78 samples/sec   Loss 4.6704   LearningRate 0.0234   Epoch: 10   Global Step: 58750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:10:00,610-Speed 5543.72 samples/sec   Loss 4.5975   LearningRate 0.0234   Epoch: 10   Global Step: 58760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:10:02,447-Speed 5573.47 samples/sec   Loss 4.6579   LearningRate 0.0233   Epoch: 10   Global Step: 58770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:10:04,269-Speed 5622.24 samples/sec   Loss 4.6675   LearningRate 0.0233   Epoch: 10   Global Step: 58780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:06,111-Speed 5564.00 samples/sec   Loss 4.5356   LearningRate 0.0233   Epoch: 10   Global Step: 58790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:07,933-Speed 5621.88 samples/sec   Loss 4.6765   LearningRate 0.0233   Epoch: 10   Global Step: 58800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:09,800-Speed 5485.07 samples/sec   Loss 4.5958   LearningRate 0.0233   Epoch: 10   Global Step: 58810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:11,648-Speed 5543.56 samples/sec   Loss 4.6198   LearningRate 0.0233   Epoch: 10   Global Step: 58820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:13,489-Speed 5565.09 samples/sec   Loss 4.6402   LearningRate 0.0233   Epoch: 10   Global Step: 58830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:15,380-Speed 5415.96 samples/sec   Loss 4.7327   LearningRate 0.0233   Epoch: 10   Global Step: 58840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:17,291-Speed 5358.85 samples/sec   Loss 4.6610   LearningRate 0.0233   Epoch: 10   Global Step: 58850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:19,119-Speed 5603.14 samples/sec   Loss 4.6895   LearningRate 0.0233   Epoch: 10   Global Step: 58860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:20,947-Speed 5605.05 samples/sec   Loss 4.5239   LearningRate 0.0233   Epoch: 10   Global Step: 58870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:22,797-Speed 5536.48 samples/sec   Loss 4.5698   LearningRate 0.0233   Epoch: 10   Global Step: 58880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:10:24,654-Speed 5516.80 samples/sec   Loss 4.7398   LearningRate 0.0232   Epoch: 10   Global Step: 58890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:10:26,478-Speed 5616.71 samples/sec   Loss 4.7044   LearningRate 0.0232   Epoch: 10   Global Step: 58900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:10:28,322-Speed 5555.25 samples/sec   Loss 4.7469   LearningRate 0.0232   Epoch: 10   Global Step: 58910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:10:30,167-Speed 5550.54 samples/sec   Loss 4.4811   LearningRate 0.0232   Epoch: 10   Global Step: 58920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:31,992-Speed 5615.59 samples/sec   Loss 4.7015   LearningRate 0.0232   Epoch: 10   Global Step: 58930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:33,818-Speed 5609.47 samples/sec   Loss 4.6701   LearningRate 0.0232   Epoch: 10   Global Step: 58940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:35,634-Speed 5638.00 samples/sec   Loss 4.5162   LearningRate 0.0232   Epoch: 10   Global Step: 58950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:37,459-Speed 5612.78 samples/sec   Loss 4.5816   LearningRate 0.0232   Epoch: 10   Global Step: 58960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:39,297-Speed 5574.92 samples/sec   Loss 4.5104   LearningRate 0.0232   Epoch: 10   Global Step: 58970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:41,123-Speed 5608.23 samples/sec   Loss 4.5846   LearningRate 0.0232   Epoch: 10   Global Step: 58980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:42,961-Speed 5574.59 samples/sec   Loss 4.5867   LearningRate 0.0232   Epoch: 10   Global Step: 58990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:44,789-Speed 5602.78 samples/sec   Loss 4.6731   LearningRate 0.0232   Epoch: 10   Global Step: 59000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:46,665-Speed 5458.47 samples/sec   Loss 4.5099   LearningRate 0.0231   Epoch: 10   Global Step: 59010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:10:48,504-Speed 5572.02 samples/sec   Loss 4.6727   LearningRate 0.0231   Epoch: 10   Global Step: 59020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:10:50,319-Speed 5643.29 samples/sec   Loss 4.5513   LearningRate 0.0231   Epoch: 10   Global Step: 59030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:10:52,142-Speed 5618.75 samples/sec   Loss 4.5649   LearningRate 0.0231   Epoch: 10   Global Step: 59040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:10:53,964-Speed 5624.39 samples/sec   Loss 4.5224   LearningRate 0.0231   Epoch: 10   Global Step: 59050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:10:55,784-Speed 5627.31 samples/sec   Loss 4.6222   LearningRate 0.0231   Epoch: 10   Global Step: 59060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:10:57,605-Speed 5623.54 samples/sec   Loss 4.6270   LearningRate 0.0231   Epoch: 10   Global Step: 59070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:10:59,418-Speed 5651.09 samples/sec   Loss 4.7082   LearningRate 0.0231   Epoch: 10   Global Step: 59080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:01,239-Speed 5624.32 samples/sec   Loss 4.6469   LearningRate 0.0231   Epoch: 10   Global Step: 59090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:03,073-Speed 5585.42 samples/sec   Loss 4.5953   LearningRate 0.0231   Epoch: 10   Global Step: 59100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:04,892-Speed 5630.74 samples/sec   Loss 4.7005   LearningRate 0.0231   Epoch: 10   Global Step: 59110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:06,705-Speed 5650.24 samples/sec   Loss 4.7470   LearningRate 0.0231   Epoch: 10   Global Step: 59120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:11:08,526-Speed 5625.42 samples/sec   Loss 4.5357   LearningRate 0.0230   Epoch: 10   Global Step: 59130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:11:10,360-Speed 5584.59 samples/sec   Loss 4.5553   LearningRate 0.0230   Epoch: 10   Global Step: 59140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:11:12,190-Speed 5599.02 samples/sec   Loss 4.6489   LearningRate 0.0230   Epoch: 10   Global Step: 59150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:11:14,024-Speed 5587.29 samples/sec   Loss 4.8099   LearningRate 0.0230   Epoch: 10   Global Step: 59160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:11:15,845-Speed 5622.73 samples/sec   Loss 4.6744   LearningRate 0.0230   Epoch: 10   Global Step: 59170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:11:17,689-Speed 5555.20 samples/sec   Loss 4.4819   LearningRate 0.0230   Epoch: 10   Global Step: 59180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:11:19,532-Speed 5558.53 samples/sec   Loss 4.6441   LearningRate 0.0230   Epoch: 10   Global Step: 59190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:11:21,382-Speed 5536.90 samples/sec   Loss 4.6223   LearningRate 0.0230   Epoch: 10   Global Step: 59200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:11:23,200-Speed 5635.47 samples/sec   Loss 4.6573   LearningRate 0.0230   Epoch: 10   Global Step: 59210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:11:25,026-Speed 5608.84 samples/sec   Loss 4.6093   LearningRate 0.0230   Epoch: 10   Global Step: 59220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:26,893-Speed 5485.57 samples/sec   Loss 4.6524   LearningRate 0.0230   Epoch: 10   Global Step: 59230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:28,716-Speed 5618.95 samples/sec   Loss 4.6541   LearningRate 0.0230   Epoch: 10   Global Step: 59240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:30,554-Speed 5573.95 samples/sec   Loss 4.6524   LearningRate 0.0229   Epoch: 10   Global Step: 59250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:32,396-Speed 5560.99 samples/sec   Loss 4.6395   LearningRate 0.0229   Epoch: 10   Global Step: 59260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:34,217-Speed 5623.61 samples/sec   Loss 4.5787   LearningRate 0.0229   Epoch: 10   Global Step: 59270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:36,040-Speed 5622.07 samples/sec   Loss 4.7819   LearningRate 0.0229   Epoch: 10   Global Step: 59280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:37,869-Speed 5601.29 samples/sec   Loss 4.5577   LearningRate 0.0229   Epoch: 10   Global Step: 59290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:39,694-Speed 5609.99 samples/sec   Loss 4.6437   LearningRate 0.0229   Epoch: 10   Global Step: 59300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:41,523-Speed 5600.43 samples/sec   Loss 4.8036   LearningRate 0.0229   Epoch: 10   Global Step: 59310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:43,361-Speed 5574.38 samples/sec   Loss 4.5397   LearningRate 0.0229   Epoch: 10   Global Step: 59320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:11:45,181-Speed 5630.03 samples/sec   Loss 4.6497   LearningRate 0.0229   Epoch: 10   Global Step: 59330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:11:47,001-Speed 5627.23 samples/sec   Loss 4.5642   LearningRate 0.0229   Epoch: 10   Global Step: 59340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:48,847-Speed 5546.80 samples/sec   Loss 4.5763   LearningRate 0.0229   Epoch: 10   Global Step: 59350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:50,686-Speed 5570.77 samples/sec   Loss 4.7002   LearningRate 0.0228   Epoch: 10   Global Step: 59360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:52,539-Speed 5527.52 samples/sec   Loss 4.5981   LearningRate 0.0228   Epoch: 10   Global Step: 59370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:54,384-Speed 5552.08 samples/sec   Loss 4.6699   LearningRate 0.0228   Epoch: 10   Global Step: 59380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:56,234-Speed 5538.00 samples/sec   Loss 4.5796   LearningRate 0.0228   Epoch: 10   Global Step: 59390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:58,087-Speed 5528.28 samples/sec   Loss 4.5305   LearningRate 0.0228   Epoch: 10   Global Step: 59400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:11:59,911-Speed 5617.28 samples/sec   Loss 4.6111   LearningRate 0.0228   Epoch: 10   Global Step: 59410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:01,760-Speed 5541.18 samples/sec   Loss 4.6847   LearningRate 0.0228   Epoch: 10   Global Step: 59420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:03,629-Speed 5480.28 samples/sec   Loss 4.6013   LearningRate 0.0228   Epoch: 10   Global Step: 59430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:05,459-Speed 5595.91 samples/sec   Loss 4.5371   LearningRate 0.0228   Epoch: 10   Global Step: 59440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:12:07,291-Speed 5589.98 samples/sec   Loss 4.5444   LearningRate 0.0228   Epoch: 10   Global Step: 59450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:09,126-Speed 5583.32 samples/sec   Loss 4.6315   LearningRate 0.0228   Epoch: 10   Global Step: 59460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:10,980-Speed 5525.53 samples/sec   Loss 4.6301   LearningRate 0.0228   Epoch: 10   Global Step: 59470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:12,808-Speed 5603.49 samples/sec   Loss 4.6450   LearningRate 0.0227   Epoch: 10   Global Step: 59480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:14,633-Speed 5613.26 samples/sec   Loss 4.6787   LearningRate 0.0227   Epoch: 10   Global Step: 59490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:16,453-Speed 5627.83 samples/sec   Loss 4.5409   LearningRate 0.0227   Epoch: 10   Global Step: 59500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:18,280-Speed 5606.74 samples/sec   Loss 4.5697   LearningRate 0.0227   Epoch: 10   Global Step: 59510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:20,133-Speed 5527.34 samples/sec   Loss 4.6231   LearningRate 0.0227   Epoch: 10   Global Step: 59520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:21,959-Speed 5610.54 samples/sec   Loss 4.5640   LearningRate 0.0227   Epoch: 10   Global Step: 59530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:23,793-Speed 5586.93 samples/sec   Loss 4.6306   LearningRate 0.0227   Epoch: 10   Global Step: 59540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:25,643-Speed 5534.40 samples/sec   Loss 4.5398   LearningRate 0.0227   Epoch: 10   Global Step: 59550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:12:27,474-Speed 5596.79 samples/sec   Loss 4.6675   LearningRate 0.0227   Epoch: 10   Global Step: 59560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:12:29,318-Speed 5552.02 samples/sec   Loss 4.6506   LearningRate 0.0227   Epoch: 10   Global Step: 59570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:12:31,147-Speed 5601.82 samples/sec   Loss 4.5497   LearningRate 0.0227   Epoch: 10   Global Step: 59580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:12:32,965-Speed 5633.48 samples/sec   Loss 4.5898   LearningRate 0.0227   Epoch: 10   Global Step: 59590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:34,795-Speed 5599.03 samples/sec   Loss 4.5409   LearningRate 0.0226   Epoch: 10   Global Step: 59600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:36,623-Speed 5603.37 samples/sec   Loss 4.5441   LearningRate 0.0226   Epoch: 10   Global Step: 59610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:38,488-Speed 5492.97 samples/sec   Loss 4.5168   LearningRate 0.0226   Epoch: 10   Global Step: 59620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:40,316-Speed 5603.16 samples/sec   Loss 4.5964   LearningRate 0.0226   Epoch: 10   Global Step: 59630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:42,144-Speed 5603.26 samples/sec   Loss 4.5764   LearningRate 0.0226   Epoch: 10   Global Step: 59640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:43,978-Speed 5587.98 samples/sec   Loss 4.6174   LearningRate 0.0226   Epoch: 10   Global Step: 59650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:45,807-Speed 5598.14 samples/sec   Loss 4.5654   LearningRate 0.0226   Epoch: 10   Global Step: 59660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:47,624-Speed 5638.02 samples/sec   Loss 4.5465   LearningRate 0.0226   Epoch: 10   Global Step: 59670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:49,441-Speed 5636.34 samples/sec   Loss 4.6528   LearningRate 0.0226   Epoch: 10   Global Step: 59680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:12:51,265-Speed 5617.48 samples/sec   Loss 4.5593   LearningRate 0.0226   Epoch: 10   Global Step: 59690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:12:53,188-Speed 5326.01 samples/sec   Loss 4.5650   LearningRate 0.0226   Epoch: 10   Global Step: 59700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:12:55,050-Speed 5500.89 samples/sec   Loss 4.6584   LearningRate 0.0226   Epoch: 10   Global Step: 59710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:12:56,887-Speed 5576.33 samples/sec   Loss 4.5555   LearningRate 0.0225   Epoch: 10   Global Step: 59720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:12:58,733-Speed 5550.93 samples/sec   Loss 4.5574   LearningRate 0.0225   Epoch: 10   Global Step: 59730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:13:00,591-Speed 5513.41 samples/sec   Loss 4.5916   LearningRate 0.0225   Epoch: 10   Global Step: 59740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:13:02,420-Speed 5599.68 samples/sec   Loss 4.6519   LearningRate 0.0225   Epoch: 10   Global Step: 59750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:13:04,275-Speed 5521.55 samples/sec   Loss 4.6028   LearningRate 0.0225   Epoch: 10   Global Step: 59760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:13:06,111-Speed 5581.47 samples/sec   Loss 4.5817   LearningRate 0.0225   Epoch: 10   Global Step: 59770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:13:07,942-Speed 5593.42 samples/sec   Loss 4.5963   LearningRate 0.0225   Epoch: 10   Global Step: 59780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:13:09,769-Speed 5606.90 samples/sec   Loss 4.6366   LearningRate 0.0225   Epoch: 10   Global Step: 59790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:11,600-Speed 5593.34 samples/sec   Loss 4.6759   LearningRate 0.0225   Epoch: 10   Global Step: 59800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:13,447-Speed 5546.99 samples/sec   Loss 4.6675   LearningRate 0.0225   Epoch: 10   Global Step: 59810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:15,278-Speed 5593.62 samples/sec   Loss 4.5606   LearningRate 0.0225   Epoch: 10   Global Step: 59820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:17,105-Speed 5606.63 samples/sec   Loss 4.6082   LearningRate 0.0225   Epoch: 10   Global Step: 59830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:18,944-Speed 5569.58 samples/sec   Loss 4.5148   LearningRate 0.0224   Epoch: 10   Global Step: 59840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:20,784-Speed 5565.78 samples/sec   Loss 4.4946   LearningRate 0.0224   Epoch: 10   Global Step: 59850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:22,608-Speed 5616.91 samples/sec   Loss 4.6306   LearningRate 0.0224   Epoch: 10   Global Step: 59860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:24,438-Speed 5597.81 samples/sec   Loss 4.5872   LearningRate 0.0224   Epoch: 10   Global Step: 59870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:26,272-Speed 5585.12 samples/sec   Loss 4.4770   LearningRate 0.0224   Epoch: 10   Global Step: 59880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:28,113-Speed 5565.97 samples/sec   Loss 4.5423   LearningRate 0.0224   Epoch: 10   Global Step: 59890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:29,945-Speed 5589.79 samples/sec   Loss 4.5691   LearningRate 0.0224   Epoch: 10   Global Step: 59900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:31,786-Speed 5563.20 samples/sec   Loss 4.6101   LearningRate 0.0224   Epoch: 10   Global Step: 59910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:33,615-Speed 5602.44 samples/sec   Loss 4.7559   LearningRate 0.0224   Epoch: 10   Global Step: 59920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:35,437-Speed 5621.06 samples/sec   Loss 4.5123   LearningRate 0.0224   Epoch: 10   Global Step: 59930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:37,278-Speed 5563.10 samples/sec   Loss 4.6039   LearningRate 0.0224   Epoch: 10   Global Step: 59940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:39,116-Speed 5572.14 samples/sec   Loss 4.5863   LearningRate 0.0224   Epoch: 10   Global Step: 59950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:40,954-Speed 5574.77 samples/sec   Loss 4.6892   LearningRate 0.0223   Epoch: 10   Global Step: 59960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:42,781-Speed 5607.45 samples/sec   Loss 4.5716   LearningRate 0.0223   Epoch: 10   Global Step: 59970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:44,616-Speed 5580.34 samples/sec   Loss 4.4574   LearningRate 0.0223   Epoch: 10   Global Step: 59980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:13:46,450-Speed 5586.83 samples/sec   Loss 4.6288   LearningRate 0.0223   Epoch: 10   Global Step: 59990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:13:48,298-Speed 5542.05 samples/sec   Loss 4.6075   LearningRate 0.0223   Epoch: 10   Global Step: 60000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:14:14,524-[lfw][60000]XNorm: 22.201556
Training: 2022-04-27 05:14:14,525-[lfw][60000]Accuracy-Flip: 0.99683+-0.00337
Training: 2022-04-27 05:14:14,525-[lfw][60000]Accuracy-Highest: 0.99800
Training: 2022-04-27 05:14:44,793-[cfp_fp][60000]XNorm: 19.690361
Training: 2022-04-27 05:14:44,794-[cfp_fp][60000]Accuracy-Flip: 0.95771+-0.00926
Training: 2022-04-27 05:14:44,794-[cfp_fp][60000]Accuracy-Highest: 0.95771
Training: 2022-04-27 05:15:10,942-[agedb_30][60000]XNorm: 21.847595
Training: 2022-04-27 05:15:10,943-[agedb_30][60000]Accuracy-Flip: 0.97533+-0.00891
Training: 2022-04-27 05:15:10,943-[agedb_30][60000]Accuracy-Highest: 0.97550
Training: 2022-04-27 05:15:12,774-Speed 121.22 samples/sec   Loss 4.5299   LearningRate 0.0223   Epoch: 10   Global Step: 60010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:15:14,588-Speed 5647.89 samples/sec   Loss 4.5984   LearningRate 0.0223   Epoch: 10   Global Step: 60020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:16,419-Speed 5594.05 samples/sec   Loss 4.6005   LearningRate 0.0223   Epoch: 10   Global Step: 60030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:18,241-Speed 5619.73 samples/sec   Loss 4.3832   LearningRate 0.0223   Epoch: 10   Global Step: 60040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:20,068-Speed 5608.55 samples/sec   Loss 4.4697   LearningRate 0.0223   Epoch: 10   Global Step: 60050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:21,930-Speed 5499.20 samples/sec   Loss 4.6012   LearningRate 0.0223   Epoch: 10   Global Step: 60060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:23,804-Speed 5466.53 samples/sec   Loss 4.5342   LearningRate 0.0223   Epoch: 10   Global Step: 60070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:25,663-Speed 5511.64 samples/sec   Loss 4.5494   LearningRate 0.0222   Epoch: 10   Global Step: 60080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:27,508-Speed 5551.41 samples/sec   Loss 4.5144   LearningRate 0.0222   Epoch: 10   Global Step: 60090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:29,342-Speed 5585.85 samples/sec   Loss 4.7174   LearningRate 0.0222   Epoch: 10   Global Step: 60100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:31,167-Speed 5613.60 samples/sec   Loss 4.5153   LearningRate 0.0222   Epoch: 10   Global Step: 60110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:33,001-Speed 5585.69 samples/sec   Loss 4.5513   LearningRate 0.0222   Epoch: 10   Global Step: 60120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:34,844-Speed 5557.58 samples/sec   Loss 4.5722   LearningRate 0.0222   Epoch: 10   Global Step: 60130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:36,684-Speed 5565.56 samples/sec   Loss 4.5878   LearningRate 0.0222   Epoch: 10   Global Step: 60140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:38,553-Speed 5481.90 samples/sec   Loss 4.4690   LearningRate 0.0222   Epoch: 10   Global Step: 60150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:40,390-Speed 5575.72 samples/sec   Loss 4.7503   LearningRate 0.0222   Epoch: 10   Global Step: 60160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:42,234-Speed 5555.51 samples/sec   Loss 4.6014   LearningRate 0.0222   Epoch: 10   Global Step: 60170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:44,065-Speed 5595.25 samples/sec   Loss 4.6692   LearningRate 0.0222   Epoch: 10   Global Step: 60180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:45,934-Speed 5480.40 samples/sec   Loss 4.6928   LearningRate 0.0222   Epoch: 10   Global Step: 60190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:47,820-Speed 5430.99 samples/sec   Loss 4.6306   LearningRate 0.0221   Epoch: 10   Global Step: 60200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:49,649-Speed 5599.57 samples/sec   Loss 4.7116   LearningRate 0.0221   Epoch: 10   Global Step: 60210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:15:51,483-Speed 5586.35 samples/sec   Loss 4.6761   LearningRate 0.0221   Epoch: 10   Global Step: 60220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:15:53,315-Speed 5590.23 samples/sec   Loss 4.5627   LearningRate 0.0221   Epoch: 10   Global Step: 60230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:15:55,137-Speed 5622.27 samples/sec   Loss 4.5742   LearningRate 0.0221   Epoch: 10   Global Step: 60240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:15:56,952-Speed 5643.24 samples/sec   Loss 4.6161   LearningRate 0.0221   Epoch: 10   Global Step: 60250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:15:58,787-Speed 5583.78 samples/sec   Loss 4.5302   LearningRate 0.0221   Epoch: 10   Global Step: 60260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:16:00,609-Speed 5621.46 samples/sec   Loss 4.6081   LearningRate 0.0221   Epoch: 10   Global Step: 60270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:16:02,437-Speed 5604.71 samples/sec   Loss 4.5245   LearningRate 0.0221   Epoch: 10   Global Step: 60280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:16:04,265-Speed 5601.62 samples/sec   Loss 4.6777   LearningRate 0.0221   Epoch: 10   Global Step: 60290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:06,093-Speed 5605.28 samples/sec   Loss 4.7464   LearningRate 0.0221   Epoch: 10   Global Step: 60300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:07,939-Speed 5546.82 samples/sec   Loss 4.4824   LearningRate 0.0221   Epoch: 10   Global Step: 60310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:09,759-Speed 5630.38 samples/sec   Loss 4.5370   LearningRate 0.0221   Epoch: 10   Global Step: 60320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:11,596-Speed 5574.70 samples/sec   Loss 4.5903   LearningRate 0.0220   Epoch: 10   Global Step: 60330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:13,417-Speed 5625.81 samples/sec   Loss 4.5111   LearningRate 0.0220   Epoch: 10   Global Step: 60340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:15,276-Speed 5511.93 samples/sec   Loss 4.5896   LearningRate 0.0220   Epoch: 10   Global Step: 60350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:17,118-Speed 5558.85 samples/sec   Loss 4.7387   LearningRate 0.0220   Epoch: 10   Global Step: 60360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:18,953-Speed 5582.51 samples/sec   Loss 4.5176   LearningRate 0.0220   Epoch: 10   Global Step: 60370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:20,788-Speed 5582.26 samples/sec   Loss 4.5487   LearningRate 0.0220   Epoch: 10   Global Step: 60380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:22,635-Speed 5544.42 samples/sec   Loss 4.5906   LearningRate 0.0220   Epoch: 10   Global Step: 60390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:24,476-Speed 5567.60 samples/sec   Loss 4.4980   LearningRate 0.0220   Epoch: 10   Global Step: 60400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:26,303-Speed 5603.98 samples/sec   Loss 4.4375   LearningRate 0.0220   Epoch: 10   Global Step: 60410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:28,122-Speed 5631.92 samples/sec   Loss 4.5843   LearningRate 0.0220   Epoch: 10   Global Step: 60420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:29,946-Speed 5616.89 samples/sec   Loss 4.5214   LearningRate 0.0220   Epoch: 10   Global Step: 60430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:31,776-Speed 5596.55 samples/sec   Loss 4.4254   LearningRate 0.0220   Epoch: 10   Global Step: 60440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:33,614-Speed 5572.48 samples/sec   Loss 4.6859   LearningRate 0.0219   Epoch: 10   Global Step: 60450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:35,441-Speed 5609.02 samples/sec   Loss 4.5283   LearningRate 0.0219   Epoch: 10   Global Step: 60460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:37,279-Speed 5572.39 samples/sec   Loss 4.7627   LearningRate 0.0219   Epoch: 10   Global Step: 60470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:39,119-Speed 5567.13 samples/sec   Loss 4.4856   LearningRate 0.0219   Epoch: 10   Global Step: 60480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:40,955-Speed 5578.48 samples/sec   Loss 4.6692   LearningRate 0.0219   Epoch: 10   Global Step: 60490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:16:42,801-Speed 5550.18 samples/sec   Loss 4.5302   LearningRate 0.0219   Epoch: 10   Global Step: 60500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:16:44,627-Speed 5608.76 samples/sec   Loss 4.5050   LearningRate 0.0219   Epoch: 10   Global Step: 60510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:16:46,458-Speed 5595.89 samples/sec   Loss 4.3980   LearningRate 0.0219   Epoch: 10   Global Step: 60520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:16:48,289-Speed 5594.47 samples/sec   Loss 4.5596   LearningRate 0.0219   Epoch: 10   Global Step: 60530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:16:50,115-Speed 5607.79 samples/sec   Loss 4.5836   LearningRate 0.0219   Epoch: 10   Global Step: 60540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:16:51,960-Speed 5553.57 samples/sec   Loss 4.6485   LearningRate 0.0219   Epoch: 10   Global Step: 60550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:53,794-Speed 5585.32 samples/sec   Loss 4.5156   LearningRate 0.0219   Epoch: 10   Global Step: 60560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:55,627-Speed 5588.25 samples/sec   Loss 4.4909   LearningRate 0.0218   Epoch: 10   Global Step: 60570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:57,466-Speed 5570.70 samples/sec   Loss 4.5674   LearningRate 0.0218   Epoch: 10   Global Step: 60580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:16:59,318-Speed 5531.91 samples/sec   Loss 4.5690   LearningRate 0.0218   Epoch: 10   Global Step: 60590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:01,146-Speed 5602.19 samples/sec   Loss 4.5895   LearningRate 0.0218   Epoch: 10   Global Step: 60600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:03,002-Speed 5518.77 samples/sec   Loss 4.5482   LearningRate 0.0218   Epoch: 10   Global Step: 60610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:04,834-Speed 5592.60 samples/sec   Loss 4.4785   LearningRate 0.0218   Epoch: 10   Global Step: 60620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:06,658-Speed 5615.09 samples/sec   Loss 4.5675   LearningRate 0.0218   Epoch: 10   Global Step: 60630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:08,485-Speed 5605.87 samples/sec   Loss 4.6049   LearningRate 0.0218   Epoch: 10   Global Step: 60640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:10,331-Speed 5550.65 samples/sec   Loss 4.6467   LearningRate 0.0218   Epoch: 10   Global Step: 60650   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:17:12,186-Speed 5520.02 samples/sec   Loss 4.5828   LearningRate 0.0218   Epoch: 10   Global Step: 60660   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:17:14,019-Speed 5587.41 samples/sec   Loss 4.5382   LearningRate 0.0218   Epoch: 10   Global Step: 60670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:17:15,840-Speed 5626.63 samples/sec   Loss 4.6209   LearningRate 0.0218   Epoch: 10   Global Step: 60680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:17:17,666-Speed 5610.40 samples/sec   Loss 4.5428   LearningRate 0.0217   Epoch: 10   Global Step: 60690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:17:19,498-Speed 5590.91 samples/sec   Loss 4.4437   LearningRate 0.0217   Epoch: 10   Global Step: 60700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:17:21,389-Speed 5417.45 samples/sec   Loss 4.6144   LearningRate 0.0217   Epoch: 10   Global Step: 60710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:17:23,211-Speed 5623.00 samples/sec   Loss 4.5562   LearningRate 0.0217   Epoch: 10   Global Step: 60720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:25,041-Speed 5595.63 samples/sec   Loss 4.4992   LearningRate 0.0217   Epoch: 10   Global Step: 60730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:26,858-Speed 5638.02 samples/sec   Loss 4.4323   LearningRate 0.0217   Epoch: 10   Global Step: 60740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:28,701-Speed 5559.01 samples/sec   Loss 4.6138   LearningRate 0.0217   Epoch: 10   Global Step: 60750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:30,538-Speed 5576.35 samples/sec   Loss 4.5672   LearningRate 0.0217   Epoch: 10   Global Step: 60760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:32,364-Speed 5608.90 samples/sec   Loss 4.5376   LearningRate 0.0217   Epoch: 10   Global Step: 60770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:34,210-Speed 5549.23 samples/sec   Loss 4.6730   LearningRate 0.0217   Epoch: 10   Global Step: 60780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:36,030-Speed 5626.80 samples/sec   Loss 4.5159   LearningRate 0.0217   Epoch: 10   Global Step: 60790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:37,863-Speed 5588.12 samples/sec   Loss 4.6577   LearningRate 0.0217   Epoch: 10   Global Step: 60800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:39,699-Speed 5582.34 samples/sec   Loss 4.4460   LearningRate 0.0216   Epoch: 10   Global Step: 60810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:41,541-Speed 5559.25 samples/sec   Loss 4.5243   LearningRate 0.0216   Epoch: 10   Global Step: 60820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:17:43,378-Speed 5577.57 samples/sec   Loss 4.5109   LearningRate 0.0216   Epoch: 10   Global Step: 60830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:45,223-Speed 5552.32 samples/sec   Loss 4.5655   LearningRate 0.0216   Epoch: 10   Global Step: 60840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:47,070-Speed 5544.65 samples/sec   Loss 4.5478   LearningRate 0.0216   Epoch: 10   Global Step: 60850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:48,902-Speed 5592.71 samples/sec   Loss 4.4470   LearningRate 0.0216   Epoch: 10   Global Step: 60860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:50,730-Speed 5602.11 samples/sec   Loss 4.5107   LearningRate 0.0216   Epoch: 10   Global Step: 60870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:52,546-Speed 5641.89 samples/sec   Loss 4.5497   LearningRate 0.0216   Epoch: 10   Global Step: 60880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:54,367-Speed 5624.11 samples/sec   Loss 4.5308   LearningRate 0.0216   Epoch: 10   Global Step: 60890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:56,188-Speed 5625.60 samples/sec   Loss 4.6371   LearningRate 0.0216   Epoch: 10   Global Step: 60900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:58,020-Speed 5590.83 samples/sec   Loss 4.6113   LearningRate 0.0216   Epoch: 10   Global Step: 60910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:17:59,866-Speed 5550.49 samples/sec   Loss 4.4224   LearningRate 0.0216   Epoch: 10   Global Step: 60920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:01,693-Speed 5606.52 samples/sec   Loss 4.5135   LearningRate 0.0215   Epoch: 10   Global Step: 60930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:18:03,524-Speed 5594.43 samples/sec   Loss 4.4969   LearningRate 0.0215   Epoch: 10   Global Step: 60940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:18:05,336-Speed 5652.74 samples/sec   Loss 4.6317   LearningRate 0.0215   Epoch: 10   Global Step: 60950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:07,172-Speed 5581.22 samples/sec   Loss 4.5190   LearningRate 0.0215   Epoch: 10   Global Step: 60960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:09,017-Speed 5549.22 samples/sec   Loss 4.6395   LearningRate 0.0215   Epoch: 10   Global Step: 60970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:10,858-Speed 5564.63 samples/sec   Loss 4.4404   LearningRate 0.0215   Epoch: 10   Global Step: 60980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:12,694-Speed 5578.52 samples/sec   Loss 4.6320   LearningRate 0.0215   Epoch: 10   Global Step: 60990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:14,526-Speed 5591.18 samples/sec   Loss 4.6312   LearningRate 0.0215   Epoch: 10   Global Step: 61000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:16,363-Speed 5575.96 samples/sec   Loss 4.5920   LearningRate 0.0215   Epoch: 10   Global Step: 61010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:18,218-Speed 5522.22 samples/sec   Loss 4.4994   LearningRate 0.0215   Epoch: 10   Global Step: 61020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:20,088-Speed 5478.45 samples/sec   Loss 4.5266   LearningRate 0.0215   Epoch: 10   Global Step: 61030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:21,924-Speed 5580.18 samples/sec   Loss 4.6379   LearningRate 0.0215   Epoch: 10   Global Step: 61040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:23,763-Speed 5569.45 samples/sec   Loss 4.7722   LearningRate 0.0215   Epoch: 10   Global Step: 61050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:25,588-Speed 5613.52 samples/sec   Loss 4.4233   LearningRate 0.0214   Epoch: 10   Global Step: 61060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:27,414-Speed 5609.69 samples/sec   Loss 4.4399   LearningRate 0.0214   Epoch: 10   Global Step: 61070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:29,242-Speed 5604.68 samples/sec   Loss 4.6775   LearningRate 0.0214   Epoch: 10   Global Step: 61080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:31,073-Speed 5593.20 samples/sec   Loss 4.6106   LearningRate 0.0214   Epoch: 10   Global Step: 61090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:32,939-Speed 5489.25 samples/sec   Loss 4.4871   LearningRate 0.0214   Epoch: 10   Global Step: 61100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:34,761-Speed 5622.94 samples/sec   Loss 4.4860   LearningRate 0.0214   Epoch: 10   Global Step: 61110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:36,612-Speed 5534.70 samples/sec   Loss 4.5680   LearningRate 0.0214   Epoch: 10   Global Step: 61120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:38,534-Speed 5329.99 samples/sec   Loss 4.5989   LearningRate 0.0214   Epoch: 10   Global Step: 61130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:40,398-Speed 5493.79 samples/sec   Loss 4.5646   LearningRate 0.0214   Epoch: 10   Global Step: 61140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:42,227-Speed 5602.51 samples/sec   Loss 4.5164   LearningRate 0.0214   Epoch: 10   Global Step: 61150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:18:44,045-Speed 5633.83 samples/sec   Loss 4.6007   LearningRate 0.0214   Epoch: 10   Global Step: 61160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:45,917-Speed 5471.63 samples/sec   Loss 4.4740   LearningRate 0.0214   Epoch: 10   Global Step: 61170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:47,738-Speed 5624.31 samples/sec   Loss 4.4951   LearningRate 0.0213   Epoch: 10   Global Step: 61180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:49,573-Speed 5584.75 samples/sec   Loss 4.6061   LearningRate 0.0213   Epoch: 10   Global Step: 61190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:51,410-Speed 5575.70 samples/sec   Loss 4.3930   LearningRate 0.0213   Epoch: 10   Global Step: 61200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:53,280-Speed 5476.41 samples/sec   Loss 4.5089   LearningRate 0.0213   Epoch: 10   Global Step: 61210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:55,121-Speed 5563.14 samples/sec   Loss 4.4162   LearningRate 0.0213   Epoch: 10   Global Step: 61220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:56,944-Speed 5619.00 samples/sec   Loss 4.5749   LearningRate 0.0213   Epoch: 10   Global Step: 61230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:18:58,786-Speed 5562.61 samples/sec   Loss 4.6292   LearningRate 0.0213   Epoch: 10   Global Step: 61240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:00,619-Speed 5587.87 samples/sec   Loss 4.4369   LearningRate 0.0213   Epoch: 10   Global Step: 61250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:02,442-Speed 5619.29 samples/sec   Loss 4.5500   LearningRate 0.0213   Epoch: 10   Global Step: 61260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:04,272-Speed 5596.67 samples/sec   Loss 4.6176   LearningRate 0.0213   Epoch: 10   Global Step: 61270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:06,101-Speed 5601.39 samples/sec   Loss 4.4485   LearningRate 0.0213   Epoch: 10   Global Step: 61280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:07,945-Speed 5554.92 samples/sec   Loss 4.6615   LearningRate 0.0213   Epoch: 10   Global Step: 61290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:09,789-Speed 5555.30 samples/sec   Loss 4.6612   LearningRate 0.0212   Epoch: 10   Global Step: 61300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:11,631-Speed 5559.62 samples/sec   Loss 4.3878   LearningRate 0.0212   Epoch: 10   Global Step: 61310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:13,457-Speed 5610.99 samples/sec   Loss 4.4647   LearningRate 0.0212   Epoch: 10   Global Step: 61320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:15,285-Speed 5603.89 samples/sec   Loss 4.5537   LearningRate 0.0212   Epoch: 10   Global Step: 61330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:17,125-Speed 5567.07 samples/sec   Loss 4.4552   LearningRate 0.0212   Epoch: 10   Global Step: 61340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:18,946-Speed 5625.76 samples/sec   Loss 4.4991   LearningRate 0.0212   Epoch: 10   Global Step: 61350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:20,779-Speed 5587.66 samples/sec   Loss 4.5220   LearningRate 0.0212   Epoch: 10   Global Step: 61360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:19:22,594-Speed 5644.38 samples/sec   Loss 4.4572   LearningRate 0.0212   Epoch: 10   Global Step: 61370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:24,430-Speed 5579.03 samples/sec   Loss 4.5925   LearningRate 0.0212   Epoch: 10   Global Step: 61380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:26,281-Speed 5532.47 samples/sec   Loss 4.5438   LearningRate 0.0212   Epoch: 10   Global Step: 61390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:28,127-Speed 5550.27 samples/sec   Loss 4.4212   LearningRate 0.0212   Epoch: 10   Global Step: 61400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:29,955-Speed 5603.70 samples/sec   Loss 4.5046   LearningRate 0.0212   Epoch: 10   Global Step: 61410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:31,782-Speed 5604.38 samples/sec   Loss 4.3194   LearningRate 0.0212   Epoch: 10   Global Step: 61420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:33,621-Speed 5573.14 samples/sec   Loss 4.5112   LearningRate 0.0211   Epoch: 10   Global Step: 61430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:35,448-Speed 5604.37 samples/sec   Loss 4.5409   LearningRate 0.0211   Epoch: 10   Global Step: 61440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:37,299-Speed 5536.15 samples/sec   Loss 4.5471   LearningRate 0.0211   Epoch: 10   Global Step: 61450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:39,126-Speed 5604.73 samples/sec   Loss 4.6534   LearningRate 0.0211   Epoch: 10   Global Step: 61460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:40,958-Speed 5593.59 samples/sec   Loss 4.5761   LearningRate 0.0211   Epoch: 10   Global Step: 61470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:42,799-Speed 5562.35 samples/sec   Loss 4.4923   LearningRate 0.0211   Epoch: 10   Global Step: 61480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:44,638-Speed 5569.49 samples/sec   Loss 4.5546   LearningRate 0.0211   Epoch: 10   Global Step: 61490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:46,484-Speed 5550.87 samples/sec   Loss 4.4841   LearningRate 0.0211   Epoch: 10   Global Step: 61500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:48,324-Speed 5564.28 samples/sec   Loss 4.5714   LearningRate 0.0211   Epoch: 10   Global Step: 61510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:50,152-Speed 5604.50 samples/sec   Loss 4.3936   LearningRate 0.0211   Epoch: 10   Global Step: 61520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:52,011-Speed 5512.12 samples/sec   Loss 4.4746   LearningRate 0.0211   Epoch: 10   Global Step: 61530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:53,850-Speed 5568.58 samples/sec   Loss 4.5204   LearningRate 0.0211   Epoch: 10   Global Step: 61540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:55,694-Speed 5557.01 samples/sec   Loss 4.5247   LearningRate 0.0210   Epoch: 10   Global Step: 61550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:57,534-Speed 5566.12 samples/sec   Loss 4.4211   LearningRate 0.0210   Epoch: 10   Global Step: 61560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:19:59,390-Speed 5518.29 samples/sec   Loss 4.5284   LearningRate 0.0210   Epoch: 10   Global Step: 61570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:20:01,246-Speed 5519.32 samples/sec   Loss 4.5767   LearningRate 0.0210   Epoch: 10   Global Step: 61580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:20:03,079-Speed 5589.22 samples/sec   Loss 4.4953   LearningRate 0.0210   Epoch: 10   Global Step: 61590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:04,904-Speed 5610.86 samples/sec   Loss 4.5498   LearningRate 0.0210   Epoch: 10   Global Step: 61600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:06,726-Speed 5624.82 samples/sec   Loss 4.4856   LearningRate 0.0210   Epoch: 10   Global Step: 61610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:08,553-Speed 5604.22 samples/sec   Loss 4.4942   LearningRate 0.0210   Epoch: 10   Global Step: 61620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:10,395-Speed 5562.19 samples/sec   Loss 4.3982   LearningRate 0.0210   Epoch: 10   Global Step: 61630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:12,240-Speed 5552.90 samples/sec   Loss 4.5606   LearningRate 0.0210   Epoch: 10   Global Step: 61640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:14,087-Speed 5543.83 samples/sec   Loss 4.5364   LearningRate 0.0210   Epoch: 10   Global Step: 61650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:15,911-Speed 5616.28 samples/sec   Loss 4.3451   LearningRate 0.0210   Epoch: 10   Global Step: 61660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:17,781-Speed 5479.55 samples/sec   Loss 4.4143   LearningRate 0.0209   Epoch: 10   Global Step: 61670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:19,606-Speed 5612.89 samples/sec   Loss 4.5194   LearningRate 0.0209   Epoch: 10   Global Step: 61680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:21,449-Speed 5558.40 samples/sec   Loss 4.4823   LearningRate 0.0209   Epoch: 10   Global Step: 61690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:20:23,279-Speed 5598.08 samples/sec   Loss 4.4792   LearningRate 0.0209   Epoch: 10   Global Step: 61700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:20:25,105-Speed 5609.56 samples/sec   Loss 4.5184   LearningRate 0.0209   Epoch: 10   Global Step: 61710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:26,942-Speed 5575.41 samples/sec   Loss 4.4518   LearningRate 0.0209   Epoch: 10   Global Step: 61720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:28,768-Speed 5608.23 samples/sec   Loss 4.4312   LearningRate 0.0209   Epoch: 10   Global Step: 61730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:30,594-Speed 5610.60 samples/sec   Loss 4.4439   LearningRate 0.0209   Epoch: 10   Global Step: 61740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:32,432-Speed 5571.82 samples/sec   Loss 4.5203   LearningRate 0.0209   Epoch: 10   Global Step: 61750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:34,298-Speed 5490.28 samples/sec   Loss 4.5699   LearningRate 0.0209   Epoch: 10   Global Step: 61760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:36,138-Speed 5566.10 samples/sec   Loss 4.4669   LearningRate 0.0209   Epoch: 10   Global Step: 61770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:37,970-Speed 5593.84 samples/sec   Loss 4.5561   LearningRate 0.0209   Epoch: 10   Global Step: 61780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:39,802-Speed 5588.63 samples/sec   Loss 4.3281   LearningRate 0.0209   Epoch: 10   Global Step: 61790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:41,650-Speed 5546.34 samples/sec   Loss 4.4782   LearningRate 0.0208   Epoch: 10   Global Step: 61800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:43,474-Speed 5613.36 samples/sec   Loss 4.5588   LearningRate 0.0208   Epoch: 10   Global Step: 61810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:20:45,294-Speed 5629.65 samples/sec   Loss 4.4248   LearningRate 0.0208   Epoch: 10   Global Step: 61820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:20:47,116-Speed 5623.13 samples/sec   Loss 4.3902   LearningRate 0.0208   Epoch: 10   Global Step: 61830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:20:48,959-Speed 5557.53 samples/sec   Loss 4.5064   LearningRate 0.0208   Epoch: 10   Global Step: 61840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:20:50,793-Speed 5584.19 samples/sec   Loss 4.5249   LearningRate 0.0208   Epoch: 10   Global Step: 61850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:20:52,625-Speed 5589.84 samples/sec   Loss 4.5357   LearningRate 0.0208   Epoch: 10   Global Step: 61860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:20:54,454-Speed 5602.40 samples/sec   Loss 4.5010   LearningRate 0.0208   Epoch: 10   Global Step: 61870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:20:56,299-Speed 5551.97 samples/sec   Loss 4.4034   LearningRate 0.0208   Epoch: 10   Global Step: 61880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:20:58,143-Speed 5553.38 samples/sec   Loss 4.6025   LearningRate 0.0208   Epoch: 10   Global Step: 61890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:20:59,986-Speed 5558.05 samples/sec   Loss 4.3961   LearningRate 0.0208   Epoch: 10   Global Step: 61900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:21:01,813-Speed 5607.66 samples/sec   Loss 4.4602   LearningRate 0.0208   Epoch: 10   Global Step: 61910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:21:03,644-Speed 5596.57 samples/sec   Loss 4.4605   LearningRate 0.0207   Epoch: 10   Global Step: 61920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:21:05,466-Speed 5620.15 samples/sec   Loss 4.5576   LearningRate 0.0207   Epoch: 10   Global Step: 61930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:21:07,294-Speed 5605.06 samples/sec   Loss 4.4110   LearningRate 0.0207   Epoch: 10   Global Step: 61940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:21:09,119-Speed 5611.12 samples/sec   Loss 4.5623   LearningRate 0.0207   Epoch: 10   Global Step: 61950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:21:10,945-Speed 5612.51 samples/sec   Loss 4.5873   LearningRate 0.0207   Epoch: 10   Global Step: 61960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:21:12,782-Speed 5575.28 samples/sec   Loss 4.4703   LearningRate 0.0207   Epoch: 10   Global Step: 61970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:21:14,613-Speed 5593.01 samples/sec   Loss 4.4496   LearningRate 0.0207   Epoch: 10   Global Step: 61980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:21:16,435-Speed 5623.56 samples/sec   Loss 4.5342   LearningRate 0.0207   Epoch: 10   Global Step: 61990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:21:18,290-Speed 5519.87 samples/sec   Loss 4.3617   LearningRate 0.0207   Epoch: 10   Global Step: 62000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:21:44,520-[lfw][62000]XNorm: 22.538425
Training: 2022-04-27 05:21:44,521-[lfw][62000]Accuracy-Flip: 0.99767+-0.00271
Training: 2022-04-27 05:21:44,521-[lfw][62000]Accuracy-Highest: 0.99800
Training: 2022-04-27 05:22:14,900-[cfp_fp][62000]XNorm: 19.937683
Training: 2022-04-27 05:22:14,901-[cfp_fp][62000]Accuracy-Flip: 0.95514+-0.00853
Training: 2022-04-27 05:22:14,901-[cfp_fp][62000]Accuracy-Highest: 0.95771
Training: 2022-04-27 05:22:41,080-[agedb_30][62000]XNorm: 22.216626
Training: 2022-04-27 05:22:41,080-[agedb_30][62000]Accuracy-Flip: 0.97667+-0.00813
Training: 2022-04-27 05:22:41,081-[agedb_30][62000]Accuracy-Highest: 0.97667
Training: 2022-04-27 05:22:42,936-Speed 120.98 samples/sec   Loss 4.4760   LearningRate 0.0207   Epoch: 10   Global Step: 62010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:22:44,767-Speed 5592.94 samples/sec   Loss 4.5009   LearningRate 0.0207   Epoch: 10   Global Step: 62020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:22:46,607-Speed 5567.19 samples/sec   Loss 4.4485   LearningRate 0.0207   Epoch: 10   Global Step: 62030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:22:48,431-Speed 5615.66 samples/sec   Loss 4.4586   LearningRate 0.0207   Epoch: 10   Global Step: 62040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:22:50,264-Speed 5590.84 samples/sec   Loss 4.4742   LearningRate 0.0206   Epoch: 10   Global Step: 62050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:22:52,107-Speed 5557.49 samples/sec   Loss 4.5137   LearningRate 0.0206   Epoch: 10   Global Step: 62060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:22:53,928-Speed 5624.68 samples/sec   Loss 4.5501   LearningRate 0.0206   Epoch: 10   Global Step: 62070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:22:55,758-Speed 5595.64 samples/sec   Loss 4.3908   LearningRate 0.0206   Epoch: 10   Global Step: 62080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:22:57,595-Speed 5578.48 samples/sec   Loss 4.4322   LearningRate 0.0206   Epoch: 10   Global Step: 62090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:22:59,431-Speed 5576.93 samples/sec   Loss 4.5530   LearningRate 0.0206   Epoch: 10   Global Step: 62100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:01,293-Speed 5500.69 samples/sec   Loss 4.4420   LearningRate 0.0206   Epoch: 10   Global Step: 62110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:03,111-Speed 5636.67 samples/sec   Loss 4.4723   LearningRate 0.0206   Epoch: 10   Global Step: 62120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:04,947-Speed 5579.96 samples/sec   Loss 4.5015   LearningRate 0.0206   Epoch: 10   Global Step: 62130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:06,765-Speed 5634.47 samples/sec   Loss 4.4759   LearningRate 0.0206   Epoch: 10   Global Step: 62140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:23:08,584-Speed 5630.87 samples/sec   Loss 4.5513   LearningRate 0.0206   Epoch: 10   Global Step: 62150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:10,423-Speed 5570.14 samples/sec   Loss 4.4076   LearningRate 0.0206   Epoch: 10   Global Step: 62160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:12,261-Speed 5573.84 samples/sec   Loss 4.5388   LearningRate 0.0205   Epoch: 10   Global Step: 62170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:14,141-Speed 5448.77 samples/sec   Loss 4.3649   LearningRate 0.0205   Epoch: 10   Global Step: 62180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:15,975-Speed 5583.80 samples/sec   Loss 4.4062   LearningRate 0.0205   Epoch: 10   Global Step: 62190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:17,845-Speed 5477.57 samples/sec   Loss 4.4977   LearningRate 0.0205   Epoch: 10   Global Step: 62200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:19,678-Speed 5588.71 samples/sec   Loss 4.4546   LearningRate 0.0205   Epoch: 10   Global Step: 62210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:21,505-Speed 5607.98 samples/sec   Loss 4.5295   LearningRate 0.0205   Epoch: 10   Global Step: 62220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:23,360-Speed 5521.46 samples/sec   Loss 4.4217   LearningRate 0.0205   Epoch: 10   Global Step: 62230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:25,233-Speed 5468.25 samples/sec   Loss 4.5305   LearningRate 0.0205   Epoch: 10   Global Step: 62240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:27,061-Speed 5604.57 samples/sec   Loss 4.4426   LearningRate 0.0205   Epoch: 10   Global Step: 62250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:23:28,906-Speed 5553.48 samples/sec   Loss 4.4536   LearningRate 0.0205   Epoch: 10   Global Step: 62260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:30,778-Speed 5471.62 samples/sec   Loss 4.4651   LearningRate 0.0205   Epoch: 10   Global Step: 62270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:32,609-Speed 5592.73 samples/sec   Loss 4.3908   LearningRate 0.0205   Epoch: 10   Global Step: 62280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:34,438-Speed 5601.06 samples/sec   Loss 4.4036   LearningRate 0.0205   Epoch: 10   Global Step: 62290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:36,271-Speed 5588.44 samples/sec   Loss 4.5897   LearningRate 0.0204   Epoch: 10   Global Step: 62300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:38,096-Speed 5613.00 samples/sec   Loss 4.5094   LearningRate 0.0204   Epoch: 10   Global Step: 62310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:39,942-Speed 5548.60 samples/sec   Loss 4.3928   LearningRate 0.0204   Epoch: 10   Global Step: 62320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:41,776-Speed 5586.28 samples/sec   Loss 4.4201   LearningRate 0.0204   Epoch: 10   Global Step: 62330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:43,603-Speed 5604.92 samples/sec   Loss 4.6243   LearningRate 0.0204   Epoch: 10   Global Step: 62340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:45,436-Speed 5588.96 samples/sec   Loss 4.4337   LearningRate 0.0204   Epoch: 10   Global Step: 62350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:47,284-Speed 5541.54 samples/sec   Loss 4.4221   LearningRate 0.0204   Epoch: 10   Global Step: 62360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:23:49,116-Speed 5592.93 samples/sec   Loss 4.4901   LearningRate 0.0204   Epoch: 10   Global Step: 62370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:50,942-Speed 5609.22 samples/sec   Loss 4.4182   LearningRate 0.0204   Epoch: 10   Global Step: 62380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:52,768-Speed 5610.07 samples/sec   Loss 4.3786   LearningRate 0.0204   Epoch: 10   Global Step: 62390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:54,605-Speed 5578.02 samples/sec   Loss 4.4529   LearningRate 0.0204   Epoch: 10   Global Step: 62400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:56,440-Speed 5580.16 samples/sec   Loss 4.4241   LearningRate 0.0204   Epoch: 10   Global Step: 62410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:23:58,261-Speed 5625.51 samples/sec   Loss 4.6358   LearningRate 0.0203   Epoch: 10   Global Step: 62420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:00,097-Speed 5578.59 samples/sec   Loss 4.4164   LearningRate 0.0203   Epoch: 10   Global Step: 62430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:01,908-Speed 5658.22 samples/sec   Loss 4.4454   LearningRate 0.0203   Epoch: 10   Global Step: 62440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:24:03,751-Speed 5558.62 samples/sec   Loss 4.3246   LearningRate 0.0203   Epoch: 10   Global Step: 62450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:24:05,573-Speed 5619.41 samples/sec   Loss 4.4838   LearningRate 0.0203   Epoch: 10   Global Step: 62460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:24:07,399-Speed 5612.10 samples/sec   Loss 4.3285   LearningRate 0.0203   Epoch: 10   Global Step: 62470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:24:09,233-Speed 5584.15 samples/sec   Loss 4.5545   LearningRate 0.0203   Epoch: 10   Global Step: 62480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:24:11,054-Speed 5625.84 samples/sec   Loss 4.4178   LearningRate 0.0203   Epoch: 10   Global Step: 62490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:24:12,883-Speed 5600.37 samples/sec   Loss 4.5136   LearningRate 0.0203   Epoch: 10   Global Step: 62500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:24:14,699-Speed 5643.30 samples/sec   Loss 4.4277   LearningRate 0.0203   Epoch: 10   Global Step: 62510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:24:16,537-Speed 5572.10 samples/sec   Loss 4.4276   LearningRate 0.0203   Epoch: 10   Global Step: 62520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:24:18,372-Speed 5580.76 samples/sec   Loss 4.4983   LearningRate 0.0203   Epoch: 10   Global Step: 62530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:24:20,267-Speed 5406.64 samples/sec   Loss 4.2919   LearningRate 0.0203   Epoch: 10   Global Step: 62540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:33,426-Speed 778.23 samples/sec   Loss 4.2073   LearningRate 0.0202   Epoch: 11   Global Step: 62550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:35,281-Speed 5521.24 samples/sec   Loss 3.8368   LearningRate 0.0202   Epoch: 11   Global Step: 62560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:37,111-Speed 5597.93 samples/sec   Loss 3.7461   LearningRate 0.0202   Epoch: 11   Global Step: 62570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:38,948-Speed 5598.92 samples/sec   Loss 3.7399   LearningRate 0.0202   Epoch: 11   Global Step: 62580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:40,781-Speed 5587.03 samples/sec   Loss 3.7553   LearningRate 0.0202   Epoch: 11   Global Step: 62590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:42,627-Speed 5550.78 samples/sec   Loss 3.7314   LearningRate 0.0202   Epoch: 11   Global Step: 62600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:44,456-Speed 5599.34 samples/sec   Loss 3.8382   LearningRate 0.0202   Epoch: 11   Global Step: 62610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:46,317-Speed 5503.67 samples/sec   Loss 3.8189   LearningRate 0.0202   Epoch: 11   Global Step: 62620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:48,166-Speed 5542.05 samples/sec   Loss 3.8280   LearningRate 0.0202   Epoch: 11   Global Step: 62630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:50,001-Speed 5583.14 samples/sec   Loss 3.8739   LearningRate 0.0202   Epoch: 11   Global Step: 62640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:24:51,829-Speed 5601.32 samples/sec   Loss 3.9360   LearningRate 0.0202   Epoch: 11   Global Step: 62650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:53,655-Speed 5609.82 samples/sec   Loss 3.8099   LearningRate 0.0202   Epoch: 11   Global Step: 62660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:55,518-Speed 5499.03 samples/sec   Loss 3.8299   LearningRate 0.0202   Epoch: 11   Global Step: 62670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:24:57,335-Speed 5637.84 samples/sec   Loss 3.8328   LearningRate 0.0201   Epoch: 11   Global Step: 62680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:24:59,159-Speed 5614.46 samples/sec   Loss 3.9533   LearningRate 0.0201   Epoch: 11   Global Step: 62690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:25:00,981-Speed 5622.52 samples/sec   Loss 3.9334   LearningRate 0.0201   Epoch: 11   Global Step: 62700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:25:02,822-Speed 5564.72 samples/sec   Loss 3.8829   LearningRate 0.0201   Epoch: 11   Global Step: 62710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:25:04,645-Speed 5619.97 samples/sec   Loss 3.8800   LearningRate 0.0201   Epoch: 11   Global Step: 62720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:25:06,474-Speed 5598.05 samples/sec   Loss 3.7700   LearningRate 0.0201   Epoch: 11   Global Step: 62730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:25:08,331-Speed 5516.04 samples/sec   Loss 3.8563   LearningRate 0.0201   Epoch: 11   Global Step: 62740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:25:10,164-Speed 5587.77 samples/sec   Loss 3.8431   LearningRate 0.0201   Epoch: 11   Global Step: 62750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:25:12,017-Speed 5529.74 samples/sec   Loss 3.9162   LearningRate 0.0201   Epoch: 11   Global Step: 62760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:25:13,854-Speed 5577.96 samples/sec   Loss 3.9427   LearningRate 0.0201   Epoch: 11   Global Step: 62770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:25:15,694-Speed 5567.58 samples/sec   Loss 3.9264   LearningRate 0.0201   Epoch: 11   Global Step: 62780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:17,512-Speed 5633.20 samples/sec   Loss 3.8574   LearningRate 0.0201   Epoch: 11   Global Step: 62790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:19,340-Speed 5603.78 samples/sec   Loss 3.8921   LearningRate 0.0200   Epoch: 11   Global Step: 62800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:21,180-Speed 5565.03 samples/sec   Loss 3.9116   LearningRate 0.0200   Epoch: 11   Global Step: 62810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:22,994-Speed 5648.31 samples/sec   Loss 3.8053   LearningRate 0.0200   Epoch: 11   Global Step: 62820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:24,831-Speed 5576.77 samples/sec   Loss 3.8929   LearningRate 0.0200   Epoch: 11   Global Step: 62830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:26,662-Speed 5593.69 samples/sec   Loss 4.0412   LearningRate 0.0200   Epoch: 11   Global Step: 62840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:28,505-Speed 5557.66 samples/sec   Loss 3.9880   LearningRate 0.0200   Epoch: 11   Global Step: 62850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:30,336-Speed 5595.17 samples/sec   Loss 4.0296   LearningRate 0.0200   Epoch: 11   Global Step: 62860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:32,200-Speed 5494.52 samples/sec   Loss 3.9067   LearningRate 0.0200   Epoch: 11   Global Step: 62870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:34,058-Speed 5514.82 samples/sec   Loss 3.9104   LearningRate 0.0200   Epoch: 11   Global Step: 62880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:25:35,903-Speed 5551.06 samples/sec   Loss 3.8989   LearningRate 0.0200   Epoch: 11   Global Step: 62890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:25:37,841-Speed 5286.44 samples/sec   Loss 3.9068   LearningRate 0.0200   Epoch: 11   Global Step: 62900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:25:39,764-Speed 5326.85 samples/sec   Loss 3.8726   LearningRate 0.0200   Epoch: 11   Global Step: 62910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:25:41,699-Speed 5293.88 samples/sec   Loss 3.9421   LearningRate 0.0200   Epoch: 11   Global Step: 62920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:25:43,639-Speed 5279.19 samples/sec   Loss 3.9665   LearningRate 0.0199   Epoch: 11   Global Step: 62930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:25:45,607-Speed 5205.65 samples/sec   Loss 3.9894   LearningRate 0.0199   Epoch: 11   Global Step: 62940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:25:47,614-Speed 5103.54 samples/sec   Loss 3.7872   LearningRate 0.0199   Epoch: 11   Global Step: 62950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:49,449-Speed 5581.65 samples/sec   Loss 4.0219   LearningRate 0.0199   Epoch: 11   Global Step: 62960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:51,295-Speed 5549.13 samples/sec   Loss 3.9227   LearningRate 0.0199   Epoch: 11   Global Step: 62970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:53,146-Speed 5534.84 samples/sec   Loss 4.0288   LearningRate 0.0199   Epoch: 11   Global Step: 62980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:54,976-Speed 5597.76 samples/sec   Loss 3.9250   LearningRate 0.0199   Epoch: 11   Global Step: 62990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:56,810-Speed 5586.64 samples/sec   Loss 3.8688   LearningRate 0.0199   Epoch: 11   Global Step: 63000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:25:58,625-Speed 5641.51 samples/sec   Loss 3.8895   LearningRate 0.0199   Epoch: 11   Global Step: 63010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:00,442-Speed 5637.26 samples/sec   Loss 3.9044   LearningRate 0.0199   Epoch: 11   Global Step: 63020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:02,285-Speed 5558.69 samples/sec   Loss 4.0403   LearningRate 0.0199   Epoch: 11   Global Step: 63030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:04,119-Speed 5584.23 samples/sec   Loss 4.0510   LearningRate 0.0199   Epoch: 11   Global Step: 63040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:05,938-Speed 5633.62 samples/sec   Loss 3.9087   LearningRate 0.0199   Epoch: 11   Global Step: 63050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:26:07,753-Speed 5643.62 samples/sec   Loss 4.0361   LearningRate 0.0198   Epoch: 11   Global Step: 63060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:26:09,566-Speed 5647.55 samples/sec   Loss 4.0251   LearningRate 0.0198   Epoch: 11   Global Step: 63070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:26:11,437-Speed 5475.38 samples/sec   Loss 4.0570   LearningRate 0.0198   Epoch: 11   Global Step: 63080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:13,266-Speed 5600.48 samples/sec   Loss 4.0507   LearningRate 0.0198   Epoch: 11   Global Step: 63090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:15,119-Speed 5527.70 samples/sec   Loss 4.0073   LearningRate 0.0198   Epoch: 11   Global Step: 63100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:16,936-Speed 5638.65 samples/sec   Loss 3.9061   LearningRate 0.0198   Epoch: 11   Global Step: 63110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:18,807-Speed 5474.29 samples/sec   Loss 4.1130   LearningRate 0.0198   Epoch: 11   Global Step: 63120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:20,633-Speed 5609.36 samples/sec   Loss 3.9275   LearningRate 0.0198   Epoch: 11   Global Step: 63130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:22,464-Speed 5595.25 samples/sec   Loss 4.0088   LearningRate 0.0198   Epoch: 11   Global Step: 63140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:24,285-Speed 5626.90 samples/sec   Loss 4.1730   LearningRate 0.0198   Epoch: 11   Global Step: 63150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:26,105-Speed 5626.29 samples/sec   Loss 3.8849   LearningRate 0.0198   Epoch: 11   Global Step: 63160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:27,919-Speed 5646.45 samples/sec   Loss 4.0218   LearningRate 0.0198   Epoch: 11   Global Step: 63170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:29,747-Speed 5604.90 samples/sec   Loss 3.9826   LearningRate 0.0198   Epoch: 11   Global Step: 63180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:26:31,559-Speed 5652.14 samples/sec   Loss 3.8888   LearningRate 0.0197   Epoch: 11   Global Step: 63190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:26:33,370-Speed 5655.23 samples/sec   Loss 3.9418   LearningRate 0.0197   Epoch: 11   Global Step: 63200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:26:35,188-Speed 5636.87 samples/sec   Loss 4.1116   LearningRate 0.0197   Epoch: 11   Global Step: 63210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:26:37,014-Speed 5610.54 samples/sec   Loss 4.0272   LearningRate 0.0197   Epoch: 11   Global Step: 63220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:38,858-Speed 5554.44 samples/sec   Loss 4.1030   LearningRate 0.0197   Epoch: 11   Global Step: 63230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:40,680-Speed 5621.23 samples/sec   Loss 4.0592   LearningRate 0.0197   Epoch: 11   Global Step: 63240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:42,521-Speed 5564.90 samples/sec   Loss 4.0461   LearningRate 0.0197   Epoch: 11   Global Step: 63250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:44,343-Speed 5622.46 samples/sec   Loss 4.0424   LearningRate 0.0197   Epoch: 11   Global Step: 63260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:46,158-Speed 5643.86 samples/sec   Loss 4.0179   LearningRate 0.0197   Epoch: 11   Global Step: 63270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:47,970-Speed 5653.97 samples/sec   Loss 4.0952   LearningRate 0.0197   Epoch: 11   Global Step: 63280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:49,788-Speed 5633.28 samples/sec   Loss 4.0781   LearningRate 0.0197   Epoch: 11   Global Step: 63290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:51,597-Speed 5661.48 samples/sec   Loss 4.1533   LearningRate 0.0197   Epoch: 11   Global Step: 63300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:53,405-Speed 5665.74 samples/sec   Loss 4.1865   LearningRate 0.0196   Epoch: 11   Global Step: 63310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:55,226-Speed 5627.51 samples/sec   Loss 3.8831   LearningRate 0.0196   Epoch: 11   Global Step: 63320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:57,040-Speed 5644.85 samples/sec   Loss 4.1552   LearningRate 0.0196   Epoch: 11   Global Step: 63330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:26:58,867-Speed 5608.57 samples/sec   Loss 4.1336   LearningRate 0.0196   Epoch: 11   Global Step: 63340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:00,701-Speed 5585.15 samples/sec   Loss 3.9843   LearningRate 0.0196   Epoch: 11   Global Step: 63350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:02,543-Speed 5559.51 samples/sec   Loss 3.9905   LearningRate 0.0196   Epoch: 11   Global Step: 63360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:04,350-Speed 5671.58 samples/sec   Loss 4.0433   LearningRate 0.0196   Epoch: 11   Global Step: 63370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:06,157-Speed 5666.56 samples/sec   Loss 3.9115   LearningRate 0.0196   Epoch: 11   Global Step: 63380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:07,990-Speed 5589.32 samples/sec   Loss 3.9862   LearningRate 0.0196   Epoch: 11   Global Step: 63390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:09,855-Speed 5490.59 samples/sec   Loss 3.9803   LearningRate 0.0196   Epoch: 11   Global Step: 63400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:11,676-Speed 5625.56 samples/sec   Loss 3.9852   LearningRate 0.0196   Epoch: 11   Global Step: 63410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:13,493-Speed 5639.13 samples/sec   Loss 4.0906   LearningRate 0.0196   Epoch: 11   Global Step: 63420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:27:15,298-Speed 5673.10 samples/sec   Loss 3.9095   LearningRate 0.0196   Epoch: 11   Global Step: 63430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:17,115-Speed 5637.32 samples/sec   Loss 4.0761   LearningRate 0.0195   Epoch: 11   Global Step: 63440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:18,939-Speed 5616.14 samples/sec   Loss 4.1187   LearningRate 0.0195   Epoch: 11   Global Step: 63450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:20,756-Speed 5639.14 samples/sec   Loss 4.0010   LearningRate 0.0195   Epoch: 11   Global Step: 63460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:22,568-Speed 5652.05 samples/sec   Loss 3.9719   LearningRate 0.0195   Epoch: 11   Global Step: 63470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:24,406-Speed 5575.85 samples/sec   Loss 4.0177   LearningRate 0.0195   Epoch: 11   Global Step: 63480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:26,213-Speed 5667.56 samples/sec   Loss 3.9930   LearningRate 0.0195   Epoch: 11   Global Step: 63490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:28,055-Speed 5560.04 samples/sec   Loss 4.1699   LearningRate 0.0195   Epoch: 11   Global Step: 63500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:29,861-Speed 5671.56 samples/sec   Loss 4.0923   LearningRate 0.0195   Epoch: 11   Global Step: 63510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:31,669-Speed 5667.47 samples/sec   Loss 4.1405   LearningRate 0.0195   Epoch: 11   Global Step: 63520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:27:33,477-Speed 5664.41 samples/sec   Loss 4.2071   LearningRate 0.0195   Epoch: 11   Global Step: 63530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:27:35,284-Speed 5670.24 samples/sec   Loss 4.1218   LearningRate 0.0195   Epoch: 11   Global Step: 63540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:27:37,090-Speed 5668.93 samples/sec   Loss 4.1105   LearningRate 0.0195   Epoch: 11   Global Step: 63550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:27:38,900-Speed 5659.62 samples/sec   Loss 4.0404   LearningRate 0.0195   Epoch: 11   Global Step: 63560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:27:40,705-Speed 5674.43 samples/sec   Loss 4.1002   LearningRate 0.0194   Epoch: 11   Global Step: 63570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:27:42,571-Speed 5489.74 samples/sec   Loss 3.9883   LearningRate 0.0194   Epoch: 11   Global Step: 63580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:27:44,437-Speed 5489.87 samples/sec   Loss 4.0967   LearningRate 0.0194   Epoch: 11   Global Step: 63590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:27:46,256-Speed 5633.60 samples/sec   Loss 4.1694   LearningRate 0.0194   Epoch: 11   Global Step: 63600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:27:48,093-Speed 5574.23 samples/sec   Loss 4.0558   LearningRate 0.0194   Epoch: 11   Global Step: 63610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:27:49,939-Speed 5548.46 samples/sec   Loss 4.1712   LearningRate 0.0194   Epoch: 11   Global Step: 63620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:51,767-Speed 5604.32 samples/sec   Loss 4.0259   LearningRate 0.0194   Epoch: 11   Global Step: 63630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:53,595-Speed 5602.92 samples/sec   Loss 4.1087   LearningRate 0.0194   Epoch: 11   Global Step: 63640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:55,427-Speed 5594.36 samples/sec   Loss 4.0275   LearningRate 0.0194   Epoch: 11   Global Step: 63650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:57,253-Speed 5607.55 samples/sec   Loss 4.0594   LearningRate 0.0194   Epoch: 11   Global Step: 63660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:27:59,070-Speed 5637.81 samples/sec   Loss 4.1048   LearningRate 0.0194   Epoch: 11   Global Step: 63670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:00,906-Speed 5577.61 samples/sec   Loss 4.1519   LearningRate 0.0194   Epoch: 11   Global Step: 63680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:02,724-Speed 5637.05 samples/sec   Loss 4.1429   LearningRate 0.0194   Epoch: 11   Global Step: 63690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:04,570-Speed 5548.63 samples/sec   Loss 4.1736   LearningRate 0.0193   Epoch: 11   Global Step: 63700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:06,464-Speed 5407.00 samples/sec   Loss 4.1476   LearningRate 0.0193   Epoch: 11   Global Step: 63710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:08,283-Speed 5634.10 samples/sec   Loss 4.1286   LearningRate 0.0193   Epoch: 11   Global Step: 63720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:10,112-Speed 5599.92 samples/sec   Loss 4.1550   LearningRate 0.0193   Epoch: 11   Global Step: 63730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:11,928-Speed 5641.56 samples/sec   Loss 4.0745   LearningRate 0.0193   Epoch: 11   Global Step: 63740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:13,734-Speed 5669.77 samples/sec   Loss 4.1118   LearningRate 0.0193   Epoch: 11   Global Step: 63750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:15,571-Speed 5575.52 samples/sec   Loss 4.1296   LearningRate 0.0193   Epoch: 11   Global Step: 63760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:17,384-Speed 5652.88 samples/sec   Loss 4.1628   LearningRate 0.0193   Epoch: 11   Global Step: 63770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:19,219-Speed 5579.59 samples/sec   Loss 4.1129   LearningRate 0.0193   Epoch: 11   Global Step: 63780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:21,027-Speed 5665.57 samples/sec   Loss 4.1635   LearningRate 0.0193   Epoch: 11   Global Step: 63790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:22,835-Speed 5667.65 samples/sec   Loss 4.0915   LearningRate 0.0193   Epoch: 11   Global Step: 63800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:24,647-Speed 5651.59 samples/sec   Loss 4.0543   LearningRate 0.0193   Epoch: 11   Global Step: 63810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:26,466-Speed 5631.34 samples/sec   Loss 4.0008   LearningRate 0.0193   Epoch: 11   Global Step: 63820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:28:28,282-Speed 5640.91 samples/sec   Loss 4.1615   LearningRate 0.0192   Epoch: 11   Global Step: 63830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:28:30,076-Speed 5709.13 samples/sec   Loss 4.0345   LearningRate 0.0192   Epoch: 11   Global Step: 63840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:31,901-Speed 5613.95 samples/sec   Loss 4.0253   LearningRate 0.0192   Epoch: 11   Global Step: 63850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:33,722-Speed 5626.69 samples/sec   Loss 4.0967   LearningRate 0.0192   Epoch: 11   Global Step: 63860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:35,542-Speed 5626.22 samples/sec   Loss 4.1207   LearningRate 0.0192   Epoch: 11   Global Step: 63870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:37,361-Speed 5631.14 samples/sec   Loss 4.0881   LearningRate 0.0192   Epoch: 11   Global Step: 63880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:39,191-Speed 5598.47 samples/sec   Loss 4.0871   LearningRate 0.0192   Epoch: 11   Global Step: 63890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:41,028-Speed 5577.67 samples/sec   Loss 4.1055   LearningRate 0.0192   Epoch: 11   Global Step: 63900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:42,860-Speed 5590.97 samples/sec   Loss 4.0241   LearningRate 0.0192   Epoch: 11   Global Step: 63910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:44,706-Speed 5546.34 samples/sec   Loss 4.0654   LearningRate 0.0192   Epoch: 11   Global Step: 63920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:46,532-Speed 5612.92 samples/sec   Loss 4.0299   LearningRate 0.0192   Epoch: 11   Global Step: 63930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:48,354-Speed 5621.73 samples/sec   Loss 4.2390   LearningRate 0.0192   Epoch: 11   Global Step: 63940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:50,173-Speed 5631.16 samples/sec   Loss 3.9651   LearningRate 0.0192   Epoch: 11   Global Step: 63950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:52,002-Speed 5600.98 samples/sec   Loss 4.0771   LearningRate 0.0191   Epoch: 11   Global Step: 63960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:53,834-Speed 5590.27 samples/sec   Loss 4.1275   LearningRate 0.0191   Epoch: 11   Global Step: 63970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:55,666-Speed 5590.69 samples/sec   Loss 4.1831   LearningRate 0.0191   Epoch: 11   Global Step: 63980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:57,499-Speed 5587.63 samples/sec   Loss 4.1252   LearningRate 0.0191   Epoch: 11   Global Step: 63990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:28:59,326-Speed 5608.62 samples/sec   Loss 4.1103   LearningRate 0.0191   Epoch: 11   Global Step: 64000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:29:25,328-[lfw][64000]XNorm: 22.442670
Training: 2022-04-27 05:29:25,329-[lfw][64000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-04-27 05:29:25,329-[lfw][64000]Accuracy-Highest: 0.99800
Training: 2022-04-27 05:29:55,458-[cfp_fp][64000]XNorm: 20.453871
Training: 2022-04-27 05:29:55,459-[cfp_fp][64000]Accuracy-Flip: 0.96386+-0.00965
Training: 2022-04-27 05:29:55,459-[cfp_fp][64000]Accuracy-Highest: 0.96386
Training: 2022-04-27 05:30:21,661-[agedb_30][64000]XNorm: 22.180480
Training: 2022-04-27 05:30:21,662-[agedb_30][64000]Accuracy-Flip: 0.97517+-0.00929
Training: 2022-04-27 05:30:21,662-[agedb_30][64000]Accuracy-Highest: 0.97667
Training: 2022-04-27 05:30:23,502-Speed 121.65 samples/sec   Loss 4.1059   LearningRate 0.0191   Epoch: 11   Global Step: 64010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:25,326-Speed 5615.28 samples/sec   Loss 4.1771   LearningRate 0.0191   Epoch: 11   Global Step: 64020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:27,131-Speed 5675.95 samples/sec   Loss 4.1508   LearningRate 0.0191   Epoch: 11   Global Step: 64030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:28,948-Speed 5638.49 samples/sec   Loss 4.0892   LearningRate 0.0191   Epoch: 11   Global Step: 64040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:30,774-Speed 5609.30 samples/sec   Loss 4.2496   LearningRate 0.0191   Epoch: 11   Global Step: 64050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:32,618-Speed 5554.25 samples/sec   Loss 4.0958   LearningRate 0.0191   Epoch: 11   Global Step: 64060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:34,426-Speed 5667.13 samples/sec   Loss 4.2357   LearningRate 0.0191   Epoch: 11   Global Step: 64070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:36,236-Speed 5657.60 samples/sec   Loss 4.0600   LearningRate 0.0191   Epoch: 11   Global Step: 64080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:38,107-Speed 5474.41 samples/sec   Loss 4.1062   LearningRate 0.0190   Epoch: 11   Global Step: 64090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:39,934-Speed 5607.40 samples/sec   Loss 4.0405   LearningRate 0.0190   Epoch: 11   Global Step: 64100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:41,840-Speed 5372.90 samples/sec   Loss 4.1547   LearningRate 0.0190   Epoch: 11   Global Step: 64110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:43,694-Speed 5525.19 samples/sec   Loss 4.2374   LearningRate 0.0190   Epoch: 11   Global Step: 64120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:45,529-Speed 5581.89 samples/sec   Loss 4.1658   LearningRate 0.0190   Epoch: 11   Global Step: 64130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:47,345-Speed 5641.91 samples/sec   Loss 4.1129   LearningRate 0.0190   Epoch: 11   Global Step: 64140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:30:49,197-Speed 5530.78 samples/sec   Loss 4.2278   LearningRate 0.0190   Epoch: 11   Global Step: 64150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:51,048-Speed 5534.94 samples/sec   Loss 4.0131   LearningRate 0.0190   Epoch: 11   Global Step: 64160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:52,904-Speed 5519.25 samples/sec   Loss 4.1590   LearningRate 0.0190   Epoch: 11   Global Step: 64170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:54,740-Speed 5577.87 samples/sec   Loss 4.3001   LearningRate 0.0190   Epoch: 11   Global Step: 64180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:56,610-Speed 5477.77 samples/sec   Loss 3.9834   LearningRate 0.0190   Epoch: 11   Global Step: 64190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:30:58,456-Speed 5550.74 samples/sec   Loss 4.1320   LearningRate 0.0190   Epoch: 11   Global Step: 64200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:00,269-Speed 5650.18 samples/sec   Loss 4.1945   LearningRate 0.0190   Epoch: 11   Global Step: 64210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:02,072-Speed 5680.64 samples/sec   Loss 4.1701   LearningRate 0.0189   Epoch: 11   Global Step: 64220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:31:03,891-Speed 5630.62 samples/sec   Loss 4.1292   LearningRate 0.0189   Epoch: 11   Global Step: 64230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:31:05,710-Speed 5632.86 samples/sec   Loss 4.2427   LearningRate 0.0189   Epoch: 11   Global Step: 64240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:31:07,546-Speed 5578.85 samples/sec   Loss 4.1388   LearningRate 0.0189   Epoch: 11   Global Step: 64250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:31:09,366-Speed 5625.98 samples/sec   Loss 4.1862   LearningRate 0.0189   Epoch: 11   Global Step: 64260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:31:11,186-Speed 5629.23 samples/sec   Loss 4.1106   LearningRate 0.0189   Epoch: 11   Global Step: 64270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:31:13,044-Speed 5513.20 samples/sec   Loss 4.1223   LearningRate 0.0189   Epoch: 11   Global Step: 64280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:31:14,861-Speed 5636.80 samples/sec   Loss 4.1109   LearningRate 0.0189   Epoch: 11   Global Step: 64290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:31:16,686-Speed 5613.32 samples/sec   Loss 4.3019   LearningRate 0.0189   Epoch: 11   Global Step: 64300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:31:18,511-Speed 5613.49 samples/sec   Loss 4.0709   LearningRate 0.0189   Epoch: 11   Global Step: 64310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:31:20,340-Speed 5600.67 samples/sec   Loss 4.3000   LearningRate 0.0189   Epoch: 11   Global Step: 64320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:22,155-Speed 5642.89 samples/sec   Loss 4.2332   LearningRate 0.0189   Epoch: 11   Global Step: 64330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:23,995-Speed 5567.46 samples/sec   Loss 4.1391   LearningRate 0.0189   Epoch: 11   Global Step: 64340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:25,825-Speed 5597.27 samples/sec   Loss 4.0957   LearningRate 0.0188   Epoch: 11   Global Step: 64350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:27,717-Speed 5416.17 samples/sec   Loss 4.1636   LearningRate 0.0188   Epoch: 11   Global Step: 64360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:29,541-Speed 5615.20 samples/sec   Loss 4.1003   LearningRate 0.0188   Epoch: 11   Global Step: 64370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:31,369-Speed 5602.56 samples/sec   Loss 4.0951   LearningRate 0.0188   Epoch: 11   Global Step: 64380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:33,217-Speed 5542.22 samples/sec   Loss 4.0991   LearningRate 0.0188   Epoch: 11   Global Step: 64390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:35,052-Speed 5581.73 samples/sec   Loss 4.1137   LearningRate 0.0188   Epoch: 11   Global Step: 64400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:36,879-Speed 5607.40 samples/sec   Loss 4.2700   LearningRate 0.0188   Epoch: 11   Global Step: 64410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:38,758-Speed 5452.77 samples/sec   Loss 4.1361   LearningRate 0.0188   Epoch: 11   Global Step: 64420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:31:40,594-Speed 5579.73 samples/sec   Loss 4.0916   LearningRate 0.0188   Epoch: 11   Global Step: 64430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:31:42,419-Speed 5610.55 samples/sec   Loss 4.1231   LearningRate 0.0188   Epoch: 11   Global Step: 64440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:44,252-Speed 5589.86 samples/sec   Loss 4.1385   LearningRate 0.0188   Epoch: 11   Global Step: 64450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:46,115-Speed 5497.89 samples/sec   Loss 4.1720   LearningRate 0.0188   Epoch: 11   Global Step: 64460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:47,968-Speed 5528.21 samples/sec   Loss 4.1653   LearningRate 0.0188   Epoch: 11   Global Step: 64470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:49,785-Speed 5636.48 samples/sec   Loss 4.1952   LearningRate 0.0187   Epoch: 11   Global Step: 64480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:51,617-Speed 5592.19 samples/sec   Loss 4.2180   LearningRate 0.0187   Epoch: 11   Global Step: 64490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:53,461-Speed 5554.11 samples/sec   Loss 4.1090   LearningRate 0.0187   Epoch: 11   Global Step: 64500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:55,309-Speed 5544.05 samples/sec   Loss 4.2547   LearningRate 0.0187   Epoch: 11   Global Step: 64510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:57,139-Speed 5599.03 samples/sec   Loss 4.1167   LearningRate 0.0187   Epoch: 11   Global Step: 64520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:31:58,970-Speed 5592.35 samples/sec   Loss 4.1619   LearningRate 0.0187   Epoch: 11   Global Step: 64530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:00,795-Speed 5614.55 samples/sec   Loss 4.0080   LearningRate 0.0187   Epoch: 11   Global Step: 64540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:32:02,610-Speed 5641.35 samples/sec   Loss 4.2670   LearningRate 0.0187   Epoch: 11   Global Step: 64550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:04,466-Speed 5519.47 samples/sec   Loss 4.1675   LearningRate 0.0187   Epoch: 11   Global Step: 64560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:06,302-Speed 5581.42 samples/sec   Loss 4.1814   LearningRate 0.0187   Epoch: 11   Global Step: 64570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:08,140-Speed 5572.30 samples/sec   Loss 4.1475   LearningRate 0.0187   Epoch: 11   Global Step: 64580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:10,062-Speed 5330.31 samples/sec   Loss 4.1850   LearningRate 0.0187   Epoch: 11   Global Step: 64590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:11,897-Speed 5582.26 samples/sec   Loss 4.1584   LearningRate 0.0187   Epoch: 11   Global Step: 64600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:13,761-Speed 5492.88 samples/sec   Loss 4.1018   LearningRate 0.0186   Epoch: 11   Global Step: 64610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:15,588-Speed 5608.55 samples/sec   Loss 4.1626   LearningRate 0.0186   Epoch: 11   Global Step: 64620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:17,408-Speed 5629.30 samples/sec   Loss 4.2070   LearningRate 0.0186   Epoch: 11   Global Step: 64630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:19,242-Speed 5584.79 samples/sec   Loss 4.1697   LearningRate 0.0186   Epoch: 11   Global Step: 64640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:21,060-Speed 5635.27 samples/sec   Loss 4.1142   LearningRate 0.0186   Epoch: 11   Global Step: 64650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:22,896-Speed 5579.57 samples/sec   Loss 4.0567   LearningRate 0.0186   Epoch: 11   Global Step: 64660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:24,724-Speed 5602.58 samples/sec   Loss 4.0652   LearningRate 0.0186   Epoch: 11   Global Step: 64670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:26,562-Speed 5572.64 samples/sec   Loss 4.2346   LearningRate 0.0186   Epoch: 11   Global Step: 64680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:28,395-Speed 5590.12 samples/sec   Loss 4.1418   LearningRate 0.0186   Epoch: 11   Global Step: 64690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:30,237-Speed 5559.60 samples/sec   Loss 4.1059   LearningRate 0.0186   Epoch: 11   Global Step: 64700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:32,051-Speed 5645.19 samples/sec   Loss 4.2225   LearningRate 0.0186   Epoch: 11   Global Step: 64710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:33,860-Speed 5662.38 samples/sec   Loss 4.1183   LearningRate 0.0186   Epoch: 11   Global Step: 64720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:35,692-Speed 5594.31 samples/sec   Loss 4.1162   LearningRate 0.0186   Epoch: 11   Global Step: 64730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:37,522-Speed 5596.55 samples/sec   Loss 4.0905   LearningRate 0.0186   Epoch: 11   Global Step: 64740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:39,327-Speed 5675.74 samples/sec   Loss 4.0654   LearningRate 0.0185   Epoch: 11   Global Step: 64750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:41,149-Speed 5621.66 samples/sec   Loss 4.1524   LearningRate 0.0185   Epoch: 11   Global Step: 64760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:42,956-Speed 5669.19 samples/sec   Loss 4.1784   LearningRate 0.0185   Epoch: 11   Global Step: 64770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:44,770-Speed 5645.84 samples/sec   Loss 4.0272   LearningRate 0.0185   Epoch: 11   Global Step: 64780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:46,588-Speed 5634.71 samples/sec   Loss 4.1407   LearningRate 0.0185   Epoch: 11   Global Step: 64790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:48,410-Speed 5624.08 samples/sec   Loss 4.0798   LearningRate 0.0185   Epoch: 11   Global Step: 64800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:50,223-Speed 5648.48 samples/sec   Loss 4.1817   LearningRate 0.0185   Epoch: 11   Global Step: 64810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:52,074-Speed 5533.74 samples/sec   Loss 4.1974   LearningRate 0.0185   Epoch: 11   Global Step: 64820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:53,896-Speed 5620.44 samples/sec   Loss 4.2074   LearningRate 0.0185   Epoch: 11   Global Step: 64830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:55,707-Speed 5656.56 samples/sec   Loss 4.1970   LearningRate 0.0185   Epoch: 11   Global Step: 64840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:57,513-Speed 5673.10 samples/sec   Loss 4.1508   LearningRate 0.0185   Epoch: 11   Global Step: 64850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:32:59,342-Speed 5599.92 samples/sec   Loss 4.2833   LearningRate 0.0185   Epoch: 11   Global Step: 64860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:01,160-Speed 5634.18 samples/sec   Loss 4.0664   LearningRate 0.0185   Epoch: 11   Global Step: 64870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:02,980-Speed 5629.78 samples/sec   Loss 4.1610   LearningRate 0.0184   Epoch: 11   Global Step: 64880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:04,821-Speed 5562.83 samples/sec   Loss 4.1911   LearningRate 0.0184   Epoch: 11   Global Step: 64890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:06,641-Speed 5629.59 samples/sec   Loss 4.2842   LearningRate 0.0184   Epoch: 11   Global Step: 64900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:08,447-Speed 5671.81 samples/sec   Loss 4.1578   LearningRate 0.0184   Epoch: 11   Global Step: 64910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:10,275-Speed 5603.35 samples/sec   Loss 4.0800   LearningRate 0.0184   Epoch: 11   Global Step: 64920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:12,097-Speed 5621.10 samples/sec   Loss 4.1625   LearningRate 0.0184   Epoch: 11   Global Step: 64930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:13,921-Speed 5615.59 samples/sec   Loss 4.1888   LearningRate 0.0184   Epoch: 11   Global Step: 64940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:15,746-Speed 5614.91 samples/sec   Loss 4.1677   LearningRate 0.0184   Epoch: 11   Global Step: 64950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:17,558-Speed 5653.63 samples/sec   Loss 4.1744   LearningRate 0.0184   Epoch: 11   Global Step: 64960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:19,372-Speed 5646.81 samples/sec   Loss 4.1425   LearningRate 0.0184   Epoch: 11   Global Step: 64970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:21,205-Speed 5585.86 samples/sec   Loss 4.1784   LearningRate 0.0184   Epoch: 11   Global Step: 64980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:23,059-Speed 5524.44 samples/sec   Loss 4.0353   LearningRate 0.0184   Epoch: 11   Global Step: 64990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:24,902-Speed 5558.62 samples/sec   Loss 4.1691   LearningRate 0.0184   Epoch: 11   Global Step: 65000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:26,725-Speed 5618.79 samples/sec   Loss 4.1127   LearningRate 0.0183   Epoch: 11   Global Step: 65010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:28,577-Speed 5532.51 samples/sec   Loss 4.1771   LearningRate 0.0183   Epoch: 11   Global Step: 65020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:30,509-Speed 5301.06 samples/sec   Loss 4.1369   LearningRate 0.0183   Epoch: 11   Global Step: 65030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:32,443-Speed 5298.71 samples/sec   Loss 4.2642   LearningRate 0.0183   Epoch: 11   Global Step: 65040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:34,285-Speed 5558.58 samples/sec   Loss 4.1399   LearningRate 0.0183   Epoch: 11   Global Step: 65050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:33:36,115-Speed 5597.16 samples/sec   Loss 4.1215   LearningRate 0.0183   Epoch: 11   Global Step: 65060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:33:37,920-Speed 5676.03 samples/sec   Loss 4.1928   LearningRate 0.0183   Epoch: 11   Global Step: 65070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:39,757-Speed 5574.80 samples/sec   Loss 4.1224   LearningRate 0.0183   Epoch: 11   Global Step: 65080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:41,572-Speed 5645.48 samples/sec   Loss 4.0452   LearningRate 0.0183   Epoch: 11   Global Step: 65090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:43,411-Speed 5570.16 samples/sec   Loss 4.3086   LearningRate 0.0183   Epoch: 11   Global Step: 65100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:45,234-Speed 5619.14 samples/sec   Loss 4.2315   LearningRate 0.0183   Epoch: 11   Global Step: 65110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:47,057-Speed 5618.43 samples/sec   Loss 4.1127   LearningRate 0.0183   Epoch: 11   Global Step: 65120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:48,869-Speed 5654.15 samples/sec   Loss 4.1508   LearningRate 0.0183   Epoch: 11   Global Step: 65130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:50,689-Speed 5627.13 samples/sec   Loss 4.0349   LearningRate 0.0182   Epoch: 11   Global Step: 65140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:52,531-Speed 5562.28 samples/sec   Loss 4.0835   LearningRate 0.0182   Epoch: 11   Global Step: 65150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:54,405-Speed 5466.65 samples/sec   Loss 4.1570   LearningRate 0.0182   Epoch: 11   Global Step: 65160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:33:56,299-Speed 5405.95 samples/sec   Loss 4.0571   LearningRate 0.0182   Epoch: 11   Global Step: 65170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:33:58,231-Speed 5302.11 samples/sec   Loss 4.2298   LearningRate 0.0182   Epoch: 11   Global Step: 65180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:34:00,084-Speed 5530.47 samples/sec   Loss 4.2315   LearningRate 0.0182   Epoch: 11   Global Step: 65190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:01,900-Speed 5639.20 samples/sec   Loss 4.1537   LearningRate 0.0182   Epoch: 11   Global Step: 65200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:03,797-Speed 5398.62 samples/sec   Loss 4.1004   LearningRate 0.0182   Epoch: 11   Global Step: 65210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:05,723-Speed 5318.73 samples/sec   Loss 4.1306   LearningRate 0.0182   Epoch: 11   Global Step: 65220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:07,649-Speed 5318.32 samples/sec   Loss 4.2766   LearningRate 0.0182   Epoch: 11   Global Step: 65230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:09,528-Speed 5452.88 samples/sec   Loss 4.2521   LearningRate 0.0182   Epoch: 11   Global Step: 65240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:11,445-Speed 5345.13 samples/sec   Loss 4.1292   LearningRate 0.0182   Epoch: 11   Global Step: 65250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:13,300-Speed 5521.30 samples/sec   Loss 4.2607   LearningRate 0.0182   Epoch: 11   Global Step: 65260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:15,143-Speed 5556.69 samples/sec   Loss 4.1133   LearningRate 0.0182   Epoch: 11   Global Step: 65270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:16,984-Speed 5565.67 samples/sec   Loss 4.1245   LearningRate 0.0181   Epoch: 11   Global Step: 65280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:18,803-Speed 5631.39 samples/sec   Loss 4.0628   LearningRate 0.0181   Epoch: 11   Global Step: 65290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:34:20,603-Speed 5693.10 samples/sec   Loss 4.2005   LearningRate 0.0181   Epoch: 11   Global Step: 65300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:22,416-Speed 5649.83 samples/sec   Loss 4.1739   LearningRate 0.0181   Epoch: 11   Global Step: 65310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:24,253-Speed 5576.97 samples/sec   Loss 4.1283   LearningRate 0.0181   Epoch: 11   Global Step: 65320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:26,075-Speed 5620.98 samples/sec   Loss 4.1365   LearningRate 0.0181   Epoch: 11   Global Step: 65330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:27,908-Speed 5590.12 samples/sec   Loss 4.1409   LearningRate 0.0181   Epoch: 11   Global Step: 65340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:29,727-Speed 5629.39 samples/sec   Loss 4.1912   LearningRate 0.0181   Epoch: 11   Global Step: 65350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:31,539-Speed 5655.23 samples/sec   Loss 4.1067   LearningRate 0.0181   Epoch: 11   Global Step: 65360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:33,364-Speed 5613.65 samples/sec   Loss 4.3103   LearningRate 0.0181   Epoch: 11   Global Step: 65370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:35,174-Speed 5658.12 samples/sec   Loss 4.1033   LearningRate 0.0181   Epoch: 11   Global Step: 65380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:36,999-Speed 5612.22 samples/sec   Loss 4.2638   LearningRate 0.0181   Epoch: 11   Global Step: 65390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:38,806-Speed 5669.53 samples/sec   Loss 4.0970   LearningRate 0.0181   Epoch: 11   Global Step: 65400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:40,647-Speed 5563.85 samples/sec   Loss 4.1349   LearningRate 0.0180   Epoch: 11   Global Step: 65410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:42,569-Speed 5330.94 samples/sec   Loss 4.1269   LearningRate 0.0180   Epoch: 11   Global Step: 65420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:44,383-Speed 5647.15 samples/sec   Loss 4.1085   LearningRate 0.0180   Epoch: 11   Global Step: 65430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:46,193-Speed 5658.31 samples/sec   Loss 4.0675   LearningRate 0.0180   Epoch: 11   Global Step: 65440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:48,028-Speed 5581.12 samples/sec   Loss 4.1550   LearningRate 0.0180   Epoch: 11   Global Step: 65450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:49,854-Speed 5610.04 samples/sec   Loss 4.1283   LearningRate 0.0180   Epoch: 11   Global Step: 65460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:51,687-Speed 5589.46 samples/sec   Loss 4.1895   LearningRate 0.0180   Epoch: 11   Global Step: 65470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:53,534-Speed 5544.94 samples/sec   Loss 4.1257   LearningRate 0.0180   Epoch: 11   Global Step: 65480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:55,380-Speed 5549.98 samples/sec   Loss 4.0664   LearningRate 0.0180   Epoch: 11   Global Step: 65490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:34:57,209-Speed 5599.86 samples/sec   Loss 4.0663   LearningRate 0.0180   Epoch: 11   Global Step: 65500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:34:59,035-Speed 5611.17 samples/sec   Loss 4.1951   LearningRate 0.0180   Epoch: 11   Global Step: 65510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:00,863-Speed 5603.71 samples/sec   Loss 4.1415   LearningRate 0.0180   Epoch: 11   Global Step: 65520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:02,673-Speed 5657.20 samples/sec   Loss 4.0903   LearningRate 0.0180   Epoch: 11   Global Step: 65530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:04,488-Speed 5646.13 samples/sec   Loss 4.1982   LearningRate 0.0179   Epoch: 11   Global Step: 65540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:06,318-Speed 5596.12 samples/sec   Loss 4.1841   LearningRate 0.0179   Epoch: 11   Global Step: 65550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:08,157-Speed 5570.92 samples/sec   Loss 4.2248   LearningRate 0.0179   Epoch: 11   Global Step: 65560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:09,965-Speed 5663.13 samples/sec   Loss 4.1415   LearningRate 0.0179   Epoch: 11   Global Step: 65570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:11,782-Speed 5638.16 samples/sec   Loss 4.1264   LearningRate 0.0179   Epoch: 11   Global Step: 65580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:13,619-Speed 5576.94 samples/sec   Loss 3.9973   LearningRate 0.0179   Epoch: 11   Global Step: 65590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:15,438-Speed 5630.86 samples/sec   Loss 4.0831   LearningRate 0.0179   Epoch: 11   Global Step: 65600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:17,239-Speed 5689.08 samples/sec   Loss 4.2105   LearningRate 0.0179   Epoch: 11   Global Step: 65610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:19,053-Speed 5647.63 samples/sec   Loss 4.1282   LearningRate 0.0179   Epoch: 11   Global Step: 65620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:20,883-Speed 5596.41 samples/sec   Loss 4.1283   LearningRate 0.0179   Epoch: 11   Global Step: 65630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:22,714-Speed 5593.83 samples/sec   Loss 4.1811   LearningRate 0.0179   Epoch: 11   Global Step: 65640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:24,547-Speed 5589.09 samples/sec   Loss 4.1638   LearningRate 0.0179   Epoch: 11   Global Step: 65650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:26,358-Speed 5654.04 samples/sec   Loss 4.0798   LearningRate 0.0179   Epoch: 11   Global Step: 65660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:28,171-Speed 5650.65 samples/sec   Loss 4.2358   LearningRate 0.0179   Epoch: 11   Global Step: 65670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:29,987-Speed 5641.83 samples/sec   Loss 4.0980   LearningRate 0.0178   Epoch: 11   Global Step: 65680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:31,809-Speed 5620.44 samples/sec   Loss 4.1575   LearningRate 0.0178   Epoch: 11   Global Step: 65690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:33,626-Speed 5637.20 samples/sec   Loss 4.1981   LearningRate 0.0178   Epoch: 11   Global Step: 65700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:35,451-Speed 5615.98 samples/sec   Loss 4.1157   LearningRate 0.0178   Epoch: 11   Global Step: 65710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:37,286-Speed 5580.83 samples/sec   Loss 4.1302   LearningRate 0.0178   Epoch: 11   Global Step: 65720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:39,115-Speed 5600.69 samples/sec   Loss 4.0947   LearningRate 0.0178   Epoch: 11   Global Step: 65730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:40,925-Speed 5661.85 samples/sec   Loss 4.1745   LearningRate 0.0178   Epoch: 11   Global Step: 65740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:42,749-Speed 5615.79 samples/sec   Loss 4.0883   LearningRate 0.0178   Epoch: 11   Global Step: 65750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:44,560-Speed 5653.71 samples/sec   Loss 4.0839   LearningRate 0.0178   Epoch: 11   Global Step: 65760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:46,391-Speed 5594.89 samples/sec   Loss 4.2247   LearningRate 0.0178   Epoch: 11   Global Step: 65770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:48,226-Speed 5581.27 samples/sec   Loss 4.1872   LearningRate 0.0178   Epoch: 11   Global Step: 65780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:50,052-Speed 5611.79 samples/sec   Loss 4.2334   LearningRate 0.0178   Epoch: 11   Global Step: 65790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:51,879-Speed 5605.30 samples/sec   Loss 4.1301   LearningRate 0.0178   Epoch: 11   Global Step: 65800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:53,717-Speed 5572.39 samples/sec   Loss 4.0461   LearningRate 0.0177   Epoch: 11   Global Step: 65810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:35:55,531-Speed 5646.22 samples/sec   Loss 4.1582   LearningRate 0.0177   Epoch: 11   Global Step: 65820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:57,349-Speed 5635.47 samples/sec   Loss 4.1553   LearningRate 0.0177   Epoch: 11   Global Step: 65830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:35:59,156-Speed 5668.85 samples/sec   Loss 4.1825   LearningRate 0.0177   Epoch: 11   Global Step: 65840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:00,976-Speed 5630.32 samples/sec   Loss 4.2167   LearningRate 0.0177   Epoch: 11   Global Step: 65850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:02,786-Speed 5659.99 samples/sec   Loss 4.2121   LearningRate 0.0177   Epoch: 11   Global Step: 65860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:04,606-Speed 5628.38 samples/sec   Loss 4.0992   LearningRate 0.0177   Epoch: 11   Global Step: 65870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:06,434-Speed 5601.82 samples/sec   Loss 3.9387   LearningRate 0.0177   Epoch: 11   Global Step: 65880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:08,253-Speed 5631.74 samples/sec   Loss 4.2410   LearningRate 0.0177   Epoch: 11   Global Step: 65890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:10,095-Speed 5560.91 samples/sec   Loss 4.0943   LearningRate 0.0177   Epoch: 11   Global Step: 65900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:11,926-Speed 5593.73 samples/sec   Loss 4.0715   LearningRate 0.0177   Epoch: 11   Global Step: 65910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:13,740-Speed 5645.74 samples/sec   Loss 4.0910   LearningRate 0.0177   Epoch: 11   Global Step: 65920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:36:15,546-Speed 5673.31 samples/sec   Loss 3.9667   LearningRate 0.0177   Epoch: 11   Global Step: 65930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:17,360-Speed 5645.09 samples/sec   Loss 4.1747   LearningRate 0.0177   Epoch: 11   Global Step: 65940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:19,174-Speed 5647.67 samples/sec   Loss 4.1379   LearningRate 0.0176   Epoch: 11   Global Step: 65950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:20,986-Speed 5652.83 samples/sec   Loss 4.0623   LearningRate 0.0176   Epoch: 11   Global Step: 65960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:22,855-Speed 5484.23 samples/sec   Loss 4.1739   LearningRate 0.0176   Epoch: 11   Global Step: 65970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:24,688-Speed 5587.86 samples/sec   Loss 4.0967   LearningRate 0.0176   Epoch: 11   Global Step: 65980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:26,525-Speed 5576.15 samples/sec   Loss 4.1166   LearningRate 0.0176   Epoch: 11   Global Step: 65990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:28,383-Speed 5512.95 samples/sec   Loss 4.0829   LearningRate 0.0176   Epoch: 11   Global Step: 66000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:36:58,473-[lfw][66000]XNorm: 22.807010
Training: 2022-04-27 05:36:58,474-[lfw][66000]Accuracy-Flip: 0.99717+-0.00279
Training: 2022-04-27 05:36:58,474-[lfw][66000]Accuracy-Highest: 0.99800
Training: 2022-04-27 05:37:28,797-[cfp_fp][66000]XNorm: 20.418062
Training: 2022-04-27 05:37:28,797-[cfp_fp][66000]Accuracy-Flip: 0.96314+-0.00722
Training: 2022-04-27 05:37:28,798-[cfp_fp][66000]Accuracy-Highest: 0.96386
Training: 2022-04-27 05:37:54,975-[agedb_30][66000]XNorm: 22.413620
Training: 2022-04-27 05:37:54,976-[agedb_30][66000]Accuracy-Flip: 0.97817+-0.00797
Training: 2022-04-27 05:37:54,976-[agedb_30][66000]Accuracy-Highest: 0.97817
Training: 2022-04-27 05:37:56,810-Speed 115.80 samples/sec   Loss 4.0261   LearningRate 0.0176   Epoch: 11   Global Step: 66010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:37:58,616-Speed 5673.71 samples/sec   Loss 4.0662   LearningRate 0.0176   Epoch: 11   Global Step: 66020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:00,423-Speed 5668.34 samples/sec   Loss 4.1404   LearningRate 0.0176   Epoch: 11   Global Step: 66030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:02,272-Speed 5540.91 samples/sec   Loss 4.1219   LearningRate 0.0176   Epoch: 11   Global Step: 66040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:04,098-Speed 5609.85 samples/sec   Loss 4.1520   LearningRate 0.0176   Epoch: 11   Global Step: 66050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:05,928-Speed 5596.93 samples/sec   Loss 4.0330   LearningRate 0.0176   Epoch: 11   Global Step: 66060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:07,733-Speed 5673.43 samples/sec   Loss 4.0165   LearningRate 0.0176   Epoch: 11   Global Step: 66070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:09,536-Speed 5682.91 samples/sec   Loss 4.0177   LearningRate 0.0175   Epoch: 11   Global Step: 66080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:11,345-Speed 5661.61 samples/sec   Loss 4.1571   LearningRate 0.0175   Epoch: 11   Global Step: 66090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:13,171-Speed 5610.08 samples/sec   Loss 4.0840   LearningRate 0.0175   Epoch: 11   Global Step: 66100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:14,981-Speed 5658.32 samples/sec   Loss 4.1630   LearningRate 0.0175   Epoch: 11   Global Step: 66110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:16,795-Speed 5645.66 samples/sec   Loss 4.1560   LearningRate 0.0175   Epoch: 11   Global Step: 66120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:18,619-Speed 5617.33 samples/sec   Loss 4.1818   LearningRate 0.0175   Epoch: 11   Global Step: 66130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:38:20,445-Speed 5610.42 samples/sec   Loss 4.1469   LearningRate 0.0175   Epoch: 11   Global Step: 66140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:22,269-Speed 5615.28 samples/sec   Loss 4.0392   LearningRate 0.0175   Epoch: 11   Global Step: 66150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:24,089-Speed 5629.83 samples/sec   Loss 4.1283   LearningRate 0.0175   Epoch: 11   Global Step: 66160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:25,918-Speed 5600.98 samples/sec   Loss 4.2726   LearningRate 0.0175   Epoch: 11   Global Step: 66170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:27,737-Speed 5628.73 samples/sec   Loss 4.1914   LearningRate 0.0175   Epoch: 11   Global Step: 66180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:29,554-Speed 5639.96 samples/sec   Loss 4.0146   LearningRate 0.0175   Epoch: 11   Global Step: 66190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:31,380-Speed 5608.49 samples/sec   Loss 4.1421   LearningRate 0.0175   Epoch: 11   Global Step: 66200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:33,201-Speed 5624.89 samples/sec   Loss 4.1473   LearningRate 0.0175   Epoch: 11   Global Step: 66210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:35,016-Speed 5643.27 samples/sec   Loss 4.1222   LearningRate 0.0174   Epoch: 11   Global Step: 66220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:36,832-Speed 5642.93 samples/sec   Loss 4.1242   LearningRate 0.0174   Epoch: 11   Global Step: 66230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:38,634-Speed 5682.86 samples/sec   Loss 4.1407   LearningRate 0.0174   Epoch: 11   Global Step: 66240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:40,443-Speed 5664.07 samples/sec   Loss 4.0492   LearningRate 0.0174   Epoch: 11   Global Step: 66250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:42,262-Speed 5629.52 samples/sec   Loss 4.1135   LearningRate 0.0174   Epoch: 11   Global Step: 66260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:44,075-Speed 5651.95 samples/sec   Loss 4.1630   LearningRate 0.0174   Epoch: 11   Global Step: 66270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:45,894-Speed 5631.42 samples/sec   Loss 3.9906   LearningRate 0.0174   Epoch: 11   Global Step: 66280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:47,718-Speed 5614.41 samples/sec   Loss 4.1569   LearningRate 0.0174   Epoch: 11   Global Step: 66290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:49,548-Speed 5598.48 samples/sec   Loss 4.1094   LearningRate 0.0174   Epoch: 11   Global Step: 66300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:51,362-Speed 5646.28 samples/sec   Loss 4.0198   LearningRate 0.0174   Epoch: 11   Global Step: 66310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:53,192-Speed 5596.71 samples/sec   Loss 4.1570   LearningRate 0.0174   Epoch: 11   Global Step: 66320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:55,018-Speed 5610.04 samples/sec   Loss 4.0664   LearningRate 0.0174   Epoch: 11   Global Step: 66330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:38:56,836-Speed 5633.88 samples/sec   Loss 4.1547   LearningRate 0.0174   Epoch: 11   Global Step: 66340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:38:58,659-Speed 5620.70 samples/sec   Loss 3.9959   LearningRate 0.0174   Epoch: 11   Global Step: 66350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:39:00,467-Speed 5663.47 samples/sec   Loss 4.0559   LearningRate 0.0173   Epoch: 11   Global Step: 66360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:02,304-Speed 5578.75 samples/sec   Loss 4.0537   LearningRate 0.0173   Epoch: 11   Global Step: 66370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:04,119-Speed 5642.50 samples/sec   Loss 4.0636   LearningRate 0.0173   Epoch: 11   Global Step: 66380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:05,966-Speed 5546.24 samples/sec   Loss 4.1413   LearningRate 0.0173   Epoch: 11   Global Step: 66390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:07,785-Speed 5633.12 samples/sec   Loss 3.9739   LearningRate 0.0173   Epoch: 11   Global Step: 66400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:09,613-Speed 5604.12 samples/sec   Loss 4.0612   LearningRate 0.0173   Epoch: 11   Global Step: 66410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:11,445-Speed 5591.68 samples/sec   Loss 4.2173   LearningRate 0.0173   Epoch: 11   Global Step: 66420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:13,260-Speed 5641.04 samples/sec   Loss 4.0281   LearningRate 0.0173   Epoch: 11   Global Step: 66430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:15,090-Speed 5599.57 samples/sec   Loss 4.0722   LearningRate 0.0173   Epoch: 11   Global Step: 66440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:16,933-Speed 5555.61 samples/sec   Loss 4.1937   LearningRate 0.0173   Epoch: 11   Global Step: 66450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:18,758-Speed 5613.10 samples/sec   Loss 4.0819   LearningRate 0.0173   Epoch: 11   Global Step: 66460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:39:20,587-Speed 5599.33 samples/sec   Loss 4.1622   LearningRate 0.0173   Epoch: 11   Global Step: 66470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:39:22,416-Speed 5602.68 samples/sec   Loss 4.0290   LearningRate 0.0173   Epoch: 11   Global Step: 66480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:39:24,236-Speed 5626.42 samples/sec   Loss 4.0413   LearningRate 0.0172   Epoch: 11   Global Step: 66490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:39:26,051-Speed 5644.32 samples/sec   Loss 4.1780   LearningRate 0.0172   Epoch: 11   Global Step: 66500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:27,867-Speed 5643.10 samples/sec   Loss 4.2047   LearningRate 0.0172   Epoch: 11   Global Step: 66510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:29,677-Speed 5657.18 samples/sec   Loss 4.1066   LearningRate 0.0172   Epoch: 11   Global Step: 66520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:31,504-Speed 5608.39 samples/sec   Loss 4.0607   LearningRate 0.0172   Epoch: 11   Global Step: 66530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:33,318-Speed 5645.38 samples/sec   Loss 4.0003   LearningRate 0.0172   Epoch: 11   Global Step: 66540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:35,133-Speed 5643.66 samples/sec   Loss 4.0947   LearningRate 0.0172   Epoch: 11   Global Step: 66550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:36,954-Speed 5627.59 samples/sec   Loss 3.9493   LearningRate 0.0172   Epoch: 11   Global Step: 66560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:38,837-Speed 5438.95 samples/sec   Loss 3.9587   LearningRate 0.0172   Epoch: 11   Global Step: 66570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:40,656-Speed 5630.79 samples/sec   Loss 4.0140   LearningRate 0.0172   Epoch: 11   Global Step: 66580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:42,499-Speed 5557.39 samples/sec   Loss 4.0564   LearningRate 0.0172   Epoch: 11   Global Step: 66590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:44,336-Speed 5575.51 samples/sec   Loss 4.1433   LearningRate 0.0172   Epoch: 11   Global Step: 66600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:46,151-Speed 5643.87 samples/sec   Loss 4.1818   LearningRate 0.0172   Epoch: 11   Global Step: 66610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:47,978-Speed 5608.76 samples/sec   Loss 4.1383   LearningRate 0.0172   Epoch: 11   Global Step: 66620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:49,786-Speed 5666.65 samples/sec   Loss 3.9265   LearningRate 0.0171   Epoch: 11   Global Step: 66630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:51,616-Speed 5595.01 samples/sec   Loss 4.0750   LearningRate 0.0171   Epoch: 11   Global Step: 66640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:53,426-Speed 5660.22 samples/sec   Loss 4.0406   LearningRate 0.0171   Epoch: 11   Global Step: 66650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:55,236-Speed 5661.80 samples/sec   Loss 4.0491   LearningRate 0.0171   Epoch: 11   Global Step: 66660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:57,062-Speed 5609.35 samples/sec   Loss 4.0872   LearningRate 0.0171   Epoch: 11   Global Step: 66670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:39:58,877-Speed 5642.74 samples/sec   Loss 4.0494   LearningRate 0.0171   Epoch: 11   Global Step: 66680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:00,689-Speed 5651.43 samples/sec   Loss 4.0618   LearningRate 0.0171   Epoch: 11   Global Step: 66690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:02,517-Speed 5605.37 samples/sec   Loss 4.0408   LearningRate 0.0171   Epoch: 11   Global Step: 66700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:40:04,338-Speed 5626.09 samples/sec   Loss 4.0910   LearningRate 0.0171   Epoch: 11   Global Step: 66710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:06,159-Speed 5622.76 samples/sec   Loss 4.0789   LearningRate 0.0171   Epoch: 11   Global Step: 66720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:07,982-Speed 5618.39 samples/sec   Loss 4.0634   LearningRate 0.0171   Epoch: 11   Global Step: 66730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:09,799-Speed 5638.33 samples/sec   Loss 4.0833   LearningRate 0.0171   Epoch: 11   Global Step: 66740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:11,607-Speed 5667.44 samples/sec   Loss 4.1838   LearningRate 0.0171   Epoch: 11   Global Step: 66750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:13,432-Speed 5611.21 samples/sec   Loss 4.0560   LearningRate 0.0171   Epoch: 11   Global Step: 66760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:15,240-Speed 5668.04 samples/sec   Loss 4.1470   LearningRate 0.0170   Epoch: 11   Global Step: 66770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:17,061-Speed 5625.25 samples/sec   Loss 4.1857   LearningRate 0.0170   Epoch: 11   Global Step: 66780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:18,886-Speed 5612.25 samples/sec   Loss 4.0526   LearningRate 0.0170   Epoch: 11   Global Step: 66790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:20,721-Speed 5579.97 samples/sec   Loss 4.1550   LearningRate 0.0170   Epoch: 11   Global Step: 66800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:22,542-Speed 5626.03 samples/sec   Loss 4.1369   LearningRate 0.0170   Epoch: 11   Global Step: 66810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:24,354-Speed 5654.48 samples/sec   Loss 4.0127   LearningRate 0.0170   Epoch: 11   Global Step: 66820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:26,171-Speed 5637.20 samples/sec   Loss 4.0808   LearningRate 0.0170   Epoch: 11   Global Step: 66830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:27,986-Speed 5641.86 samples/sec   Loss 4.1512   LearningRate 0.0170   Epoch: 11   Global Step: 66840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:29,803-Speed 5637.65 samples/sec   Loss 4.1790   LearningRate 0.0170   Epoch: 11   Global Step: 66850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:31,613-Speed 5661.97 samples/sec   Loss 3.9810   LearningRate 0.0170   Epoch: 11   Global Step: 66860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:33,433-Speed 5626.31 samples/sec   Loss 4.0135   LearningRate 0.0170   Epoch: 11   Global Step: 66870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:35,249-Speed 5642.41 samples/sec   Loss 4.0844   LearningRate 0.0170   Epoch: 11   Global Step: 66880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:37,070-Speed 5623.32 samples/sec   Loss 4.1064   LearningRate 0.0170   Epoch: 11   Global Step: 66890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:38,894-Speed 5616.31 samples/sec   Loss 4.0336   LearningRate 0.0170   Epoch: 11   Global Step: 66900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:40,720-Speed 5609.27 samples/sec   Loss 4.0717   LearningRate 0.0169   Epoch: 11   Global Step: 66910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:40:42,533-Speed 5651.78 samples/sec   Loss 4.0174   LearningRate 0.0169   Epoch: 11   Global Step: 66920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:40:44,365-Speed 5591.16 samples/sec   Loss 4.0409   LearningRate 0.0169   Epoch: 11   Global Step: 66930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:40:46,203-Speed 5572.03 samples/sec   Loss 3.9655   LearningRate 0.0169   Epoch: 11   Global Step: 66940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:40:48,015-Speed 5653.67 samples/sec   Loss 4.1258   LearningRate 0.0169   Epoch: 11   Global Step: 66950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:40:49,851-Speed 5579.04 samples/sec   Loss 4.0823   LearningRate 0.0169   Epoch: 11   Global Step: 66960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:40:51,684-Speed 5587.61 samples/sec   Loss 4.0760   LearningRate 0.0169   Epoch: 11   Global Step: 66970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:40:53,514-Speed 5597.84 samples/sec   Loss 4.1326   LearningRate 0.0169   Epoch: 11   Global Step: 66980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:40:55,347-Speed 5588.86 samples/sec   Loss 4.1049   LearningRate 0.0169   Epoch: 11   Global Step: 66990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:40:57,161-Speed 5646.28 samples/sec   Loss 4.0624   LearningRate 0.0169   Epoch: 11   Global Step: 67000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:40:58,976-Speed 5646.33 samples/sec   Loss 4.0650   LearningRate 0.0169   Epoch: 11   Global Step: 67010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:41:00,803-Speed 5606.90 samples/sec   Loss 4.0703   LearningRate 0.0169   Epoch: 11   Global Step: 67020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:02,630-Speed 5604.85 samples/sec   Loss 4.0422   LearningRate 0.0169   Epoch: 11   Global Step: 67030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:04,452-Speed 5623.22 samples/sec   Loss 4.2379   LearningRate 0.0168   Epoch: 11   Global Step: 67040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:06,298-Speed 5546.88 samples/sec   Loss 3.9984   LearningRate 0.0168   Epoch: 11   Global Step: 67050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:08,119-Speed 5627.36 samples/sec   Loss 3.9991   LearningRate 0.0168   Epoch: 11   Global Step: 67060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:09,941-Speed 5620.00 samples/sec   Loss 4.1391   LearningRate 0.0168   Epoch: 11   Global Step: 67070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:11,751-Speed 5659.88 samples/sec   Loss 4.2303   LearningRate 0.0168   Epoch: 11   Global Step: 67080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:13,571-Speed 5627.77 samples/sec   Loss 4.0349   LearningRate 0.0168   Epoch: 11   Global Step: 67090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:15,393-Speed 5622.04 samples/sec   Loss 4.1243   LearningRate 0.0168   Epoch: 11   Global Step: 67100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:17,219-Speed 5610.04 samples/sec   Loss 4.1459   LearningRate 0.0168   Epoch: 11   Global Step: 67110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:19,045-Speed 5610.13 samples/sec   Loss 4.0427   LearningRate 0.0168   Epoch: 11   Global Step: 67120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:41:20,877-Speed 5593.19 samples/sec   Loss 4.0164   LearningRate 0.0168   Epoch: 11   Global Step: 67130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:41:22,705-Speed 5603.24 samples/sec   Loss 4.0731   LearningRate 0.0168   Epoch: 11   Global Step: 67140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:24,535-Speed 5596.29 samples/sec   Loss 3.9351   LearningRate 0.0168   Epoch: 11   Global Step: 67150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:26,367-Speed 5592.32 samples/sec   Loss 3.9976   LearningRate 0.0168   Epoch: 11   Global Step: 67160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:28,208-Speed 5564.68 samples/sec   Loss 4.0873   LearningRate 0.0168   Epoch: 11   Global Step: 67170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:30,051-Speed 5557.44 samples/sec   Loss 4.0600   LearningRate 0.0167   Epoch: 11   Global Step: 67180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:31,887-Speed 5577.62 samples/sec   Loss 3.9802   LearningRate 0.0167   Epoch: 11   Global Step: 67190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:33,719-Speed 5591.87 samples/sec   Loss 4.1020   LearningRate 0.0167   Epoch: 11   Global Step: 67200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:35,544-Speed 5614.21 samples/sec   Loss 4.1500   LearningRate 0.0167   Epoch: 11   Global Step: 67210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:37,367-Speed 5617.23 samples/sec   Loss 3.9144   LearningRate 0.0167   Epoch: 11   Global Step: 67220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:39,189-Speed 5625.28 samples/sec   Loss 4.0076   LearningRate 0.0167   Epoch: 11   Global Step: 67230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:41,008-Speed 5631.11 samples/sec   Loss 4.0859   LearningRate 0.0167   Epoch: 11   Global Step: 67240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:41:42,815-Speed 5667.06 samples/sec   Loss 4.2236   LearningRate 0.0167   Epoch: 11   Global Step: 67250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:44,635-Speed 5627.82 samples/sec   Loss 4.0382   LearningRate 0.0167   Epoch: 11   Global Step: 67260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:46,450-Speed 5643.72 samples/sec   Loss 4.1873   LearningRate 0.0167   Epoch: 11   Global Step: 67270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:48,267-Speed 5639.11 samples/sec   Loss 4.0605   LearningRate 0.0167   Epoch: 11   Global Step: 67280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:50,081-Speed 5644.78 samples/sec   Loss 4.1379   LearningRate 0.0167   Epoch: 11   Global Step: 67290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:51,900-Speed 5634.04 samples/sec   Loss 3.8987   LearningRate 0.0167   Epoch: 11   Global Step: 67300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:53,734-Speed 5585.17 samples/sec   Loss 3.9920   LearningRate 0.0167   Epoch: 11   Global Step: 67310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:55,555-Speed 5624.89 samples/sec   Loss 4.1673   LearningRate 0.0166   Epoch: 11   Global Step: 67320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:57,384-Speed 5601.26 samples/sec   Loss 4.0126   LearningRate 0.0166   Epoch: 11   Global Step: 67330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:41:59,196-Speed 5651.98 samples/sec   Loss 4.0759   LearningRate 0.0166   Epoch: 11   Global Step: 67340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:01,014-Speed 5634.43 samples/sec   Loss 4.0003   LearningRate 0.0166   Epoch: 11   Global Step: 67350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:42:02,858-Speed 5554.49 samples/sec   Loss 4.0894   LearningRate 0.0166   Epoch: 11   Global Step: 67360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:42:04,683-Speed 5614.59 samples/sec   Loss 4.1011   LearningRate 0.0166   Epoch: 11   Global Step: 67370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:42:06,514-Speed 5593.06 samples/sec   Loss 4.0125   LearningRate 0.0166   Epoch: 11   Global Step: 67380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:42:08,334-Speed 5629.26 samples/sec   Loss 4.1065   LearningRate 0.0166   Epoch: 11   Global Step: 67390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:42:10,140-Speed 5671.60 samples/sec   Loss 4.0033   LearningRate 0.0166   Epoch: 11   Global Step: 67400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:11,950-Speed 5658.59 samples/sec   Loss 3.8839   LearningRate 0.0166   Epoch: 11   Global Step: 67410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:13,786-Speed 5580.54 samples/sec   Loss 4.0206   LearningRate 0.0166   Epoch: 11   Global Step: 67420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:15,596-Speed 5658.90 samples/sec   Loss 4.0138   LearningRate 0.0166   Epoch: 11   Global Step: 67430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:17,429-Speed 5588.91 samples/sec   Loss 4.0183   LearningRate 0.0166   Epoch: 11   Global Step: 67440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:19,241-Speed 5653.41 samples/sec   Loss 4.1507   LearningRate 0.0166   Epoch: 11   Global Step: 67450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:21,054-Speed 5649.53 samples/sec   Loss 4.0132   LearningRate 0.0165   Epoch: 11   Global Step: 67460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:22,866-Speed 5651.48 samples/sec   Loss 4.0787   LearningRate 0.0165   Epoch: 11   Global Step: 67470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:24,688-Speed 5623.90 samples/sec   Loss 4.0942   LearningRate 0.0165   Epoch: 11   Global Step: 67480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:26,526-Speed 5574.13 samples/sec   Loss 3.9828   LearningRate 0.0165   Epoch: 11   Global Step: 67490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:28,337-Speed 5655.85 samples/sec   Loss 3.9299   LearningRate 0.0165   Epoch: 11   Global Step: 67500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:42:30,141-Speed 5678.00 samples/sec   Loss 4.0806   LearningRate 0.0165   Epoch: 11   Global Step: 67510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:31,967-Speed 5609.46 samples/sec   Loss 3.9108   LearningRate 0.0165   Epoch: 11   Global Step: 67520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:33,779-Speed 5653.01 samples/sec   Loss 4.0409   LearningRate 0.0165   Epoch: 11   Global Step: 67530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:35,596-Speed 5635.94 samples/sec   Loss 4.1255   LearningRate 0.0165   Epoch: 11   Global Step: 67540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:37,425-Speed 5599.83 samples/sec   Loss 4.0674   LearningRate 0.0165   Epoch: 11   Global Step: 67550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:39,252-Speed 5608.96 samples/sec   Loss 4.0284   LearningRate 0.0165   Epoch: 11   Global Step: 67560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:41,064-Speed 5653.36 samples/sec   Loss 4.0883   LearningRate 0.0165   Epoch: 11   Global Step: 67570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:42,880-Speed 5638.99 samples/sec   Loss 4.0772   LearningRate 0.0165   Epoch: 11   Global Step: 67580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:44,694-Speed 5649.44 samples/sec   Loss 4.1107   LearningRate 0.0165   Epoch: 11   Global Step: 67590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:46,506-Speed 5653.56 samples/sec   Loss 3.9903   LearningRate 0.0164   Epoch: 11   Global Step: 67600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:48,326-Speed 5625.41 samples/sec   Loss 4.0287   LearningRate 0.0164   Epoch: 11   Global Step: 67610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:42:50,135-Speed 5664.80 samples/sec   Loss 3.9983   LearningRate 0.0164   Epoch: 11   Global Step: 67620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:51,955-Speed 5626.23 samples/sec   Loss 4.0533   LearningRate 0.0164   Epoch: 11   Global Step: 67630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:53,770-Speed 5644.76 samples/sec   Loss 4.0781   LearningRate 0.0164   Epoch: 11   Global Step: 67640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:55,581-Speed 5654.73 samples/sec   Loss 4.0782   LearningRate 0.0164   Epoch: 11   Global Step: 67650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:57,401-Speed 5629.75 samples/sec   Loss 3.9586   LearningRate 0.0164   Epoch: 11   Global Step: 67660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:42:59,215-Speed 5646.32 samples/sec   Loss 4.1006   LearningRate 0.0164   Epoch: 11   Global Step: 67670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:01,076-Speed 5503.45 samples/sec   Loss 4.0172   LearningRate 0.0164   Epoch: 11   Global Step: 67680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:02,897-Speed 5626.96 samples/sec   Loss 3.9895   LearningRate 0.0164   Epoch: 11   Global Step: 67690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:04,748-Speed 5533.94 samples/sec   Loss 4.1810   LearningRate 0.0164   Epoch: 11   Global Step: 67700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:06,576-Speed 5604.15 samples/sec   Loss 3.9820   LearningRate 0.0164   Epoch: 11   Global Step: 67710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:08,412-Speed 5579.49 samples/sec   Loss 4.0307   LearningRate 0.0164   Epoch: 11   Global Step: 67720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:43:10,241-Speed 5600.85 samples/sec   Loss 3.9025   LearningRate 0.0164   Epoch: 11   Global Step: 67730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:12,083-Speed 5560.81 samples/sec   Loss 4.0696   LearningRate 0.0163   Epoch: 11   Global Step: 67740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:13,937-Speed 5523.88 samples/sec   Loss 4.1559   LearningRate 0.0163   Epoch: 11   Global Step: 67750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:15,870-Speed 5298.89 samples/sec   Loss 3.9234   LearningRate 0.0163   Epoch: 11   Global Step: 67760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:17,704-Speed 5587.54 samples/sec   Loss 4.0383   LearningRate 0.0163   Epoch: 11   Global Step: 67770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:19,544-Speed 5564.73 samples/sec   Loss 4.0319   LearningRate 0.0163   Epoch: 11   Global Step: 67780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:21,367-Speed 5619.41 samples/sec   Loss 4.1196   LearningRate 0.0163   Epoch: 11   Global Step: 67790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:23,195-Speed 5603.74 samples/sec   Loss 3.8739   LearningRate 0.0163   Epoch: 11   Global Step: 67800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:25,014-Speed 5630.46 samples/sec   Loss 3.9422   LearningRate 0.0163   Epoch: 11   Global Step: 67810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:26,836-Speed 5623.44 samples/sec   Loss 4.2219   LearningRate 0.0163   Epoch: 11   Global Step: 67820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:28,644-Speed 5667.18 samples/sec   Loss 4.0331   LearningRate 0.0163   Epoch: 11   Global Step: 67830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:43:30,467-Speed 5619.09 samples/sec   Loss 4.0694   LearningRate 0.0163   Epoch: 11   Global Step: 67840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:32,313-Speed 5547.26 samples/sec   Loss 4.0691   LearningRate 0.0163   Epoch: 11   Global Step: 67850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:34,139-Speed 5609.67 samples/sec   Loss 4.1544   LearningRate 0.0163   Epoch: 11   Global Step: 67860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:35,960-Speed 5625.88 samples/sec   Loss 4.0198   LearningRate 0.0163   Epoch: 11   Global Step: 67870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:37,778-Speed 5635.38 samples/sec   Loss 4.1138   LearningRate 0.0162   Epoch: 11   Global Step: 67880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:39,589-Speed 5655.62 samples/sec   Loss 4.0376   LearningRate 0.0162   Epoch: 11   Global Step: 67890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:41,400-Speed 5654.51 samples/sec   Loss 3.9660   LearningRate 0.0162   Epoch: 11   Global Step: 67900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:43,236-Speed 5579.80 samples/sec   Loss 4.1833   LearningRate 0.0162   Epoch: 11   Global Step: 67910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:45,057-Speed 5627.17 samples/sec   Loss 4.0620   LearningRate 0.0162   Epoch: 11   Global Step: 67920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:46,892-Speed 5580.83 samples/sec   Loss 4.0288   LearningRate 0.0162   Epoch: 11   Global Step: 67930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:48,702-Speed 5658.07 samples/sec   Loss 4.0272   LearningRate 0.0162   Epoch: 11   Global Step: 67940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:50,516-Speed 5648.43 samples/sec   Loss 4.1174   LearningRate 0.0162   Epoch: 11   Global Step: 67950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:52,335-Speed 5630.86 samples/sec   Loss 4.0132   LearningRate 0.0162   Epoch: 11   Global Step: 67960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:54,190-Speed 5521.82 samples/sec   Loss 4.0802   LearningRate 0.0162   Epoch: 11   Global Step: 67970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:56,012-Speed 5624.07 samples/sec   Loss 3.9370   LearningRate 0.0162   Epoch: 11   Global Step: 67980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:57,845-Speed 5587.11 samples/sec   Loss 4.0809   LearningRate 0.0162   Epoch: 11   Global Step: 67990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:43:59,676-Speed 5593.17 samples/sec   Loss 4.0267   LearningRate 0.0162   Epoch: 11   Global Step: 68000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:44:26,106-[lfw][68000]XNorm: 22.783929
Training: 2022-04-27 05:44:26,106-[lfw][68000]Accuracy-Flip: 0.99800+-0.00277
Training: 2022-04-27 05:44:26,107-[lfw][68000]Accuracy-Highest: 0.99800
Training: 2022-04-27 05:44:56,279-[cfp_fp][68000]XNorm: 20.524930
Training: 2022-04-27 05:44:56,280-[cfp_fp][68000]Accuracy-Flip: 0.96257+-0.00889
Training: 2022-04-27 05:44:56,280-[cfp_fp][68000]Accuracy-Highest: 0.96386
Training: 2022-04-27 05:45:22,336-[agedb_30][68000]XNorm: 22.781881
Training: 2022-04-27 05:45:22,337-[agedb_30][68000]Accuracy-Flip: 0.97483+-0.00626
Training: 2022-04-27 05:45:22,337-[agedb_30][68000]Accuracy-Highest: 0.97817
Training: 2022-04-27 05:45:24,168-Speed 121.20 samples/sec   Loss 4.0359   LearningRate 0.0162   Epoch: 11   Global Step: 68010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:25,995-Speed 5606.67 samples/sec   Loss 3.9886   LearningRate 0.0161   Epoch: 11   Global Step: 68020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:27,825-Speed 5595.44 samples/sec   Loss 4.0592   LearningRate 0.0161   Epoch: 11   Global Step: 68030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:29,643-Speed 5633.74 samples/sec   Loss 3.9196   LearningRate 0.0161   Epoch: 11   Global Step: 68040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:45:31,446-Speed 5682.79 samples/sec   Loss 3.9861   LearningRate 0.0161   Epoch: 11   Global Step: 68050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:33,267-Speed 5623.63 samples/sec   Loss 4.0640   LearningRate 0.0161   Epoch: 11   Global Step: 68060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:35,087-Speed 5631.03 samples/sec   Loss 3.9414   LearningRate 0.0161   Epoch: 11   Global Step: 68070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:36,905-Speed 5631.85 samples/sec   Loss 3.9536   LearningRate 0.0161   Epoch: 11   Global Step: 68080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:38,711-Speed 5674.06 samples/sec   Loss 4.0717   LearningRate 0.0161   Epoch: 11   Global Step: 68090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:40,547-Speed 5577.05 samples/sec   Loss 3.8548   LearningRate 0.0161   Epoch: 11   Global Step: 68100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:42,360-Speed 5651.39 samples/sec   Loss 4.0474   LearningRate 0.0161   Epoch: 11   Global Step: 68110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:44,194-Speed 5585.82 samples/sec   Loss 3.9758   LearningRate 0.0161   Epoch: 11   Global Step: 68120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:46,023-Speed 5598.65 samples/sec   Loss 4.0273   LearningRate 0.0161   Epoch: 11   Global Step: 68130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:47,857-Speed 5586.17 samples/sec   Loss 3.9963   LearningRate 0.0161   Epoch: 11   Global Step: 68140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:49,691-Speed 5584.91 samples/sec   Loss 3.9529   LearningRate 0.0161   Epoch: 11   Global Step: 68150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:45:51,564-Speed 5468.99 samples/sec   Loss 4.0381   LearningRate 0.0161   Epoch: 11   Global Step: 68160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:53,378-Speed 5648.85 samples/sec   Loss 3.8302   LearningRate 0.0160   Epoch: 11   Global Step: 68170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:55,212-Speed 5583.71 samples/sec   Loss 4.0484   LearningRate 0.0160   Epoch: 11   Global Step: 68180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:57,044-Speed 5591.57 samples/sec   Loss 4.0249   LearningRate 0.0160   Epoch: 11   Global Step: 68190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:45:58,857-Speed 5650.78 samples/sec   Loss 4.0923   LearningRate 0.0160   Epoch: 11   Global Step: 68200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:00,666-Speed 5661.09 samples/sec   Loss 4.1515   LearningRate 0.0160   Epoch: 11   Global Step: 68210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:02,479-Speed 5652.58 samples/sec   Loss 3.9643   LearningRate 0.0160   Epoch: 11   Global Step: 68220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:04,349-Speed 5476.42 samples/sec   Loss 4.0473   LearningRate 0.0160   Epoch: 11   Global Step: 68230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:17,262-Speed 793.03 samples/sec   Loss 3.5282   LearningRate 0.0160   Epoch: 12   Global Step: 68240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:19,447-Speed 4688.10 samples/sec   Loss 3.2558   LearningRate 0.0160   Epoch: 12   Global Step: 68250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:21,284-Speed 5577.22 samples/sec   Loss 3.3061   LearningRate 0.0160   Epoch: 12   Global Step: 68260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:46:23,128-Speed 5556.52 samples/sec   Loss 3.3569   LearningRate 0.0160   Epoch: 12   Global Step: 68270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:46:24,958-Speed 5597.02 samples/sec   Loss 3.3194   LearningRate 0.0160   Epoch: 12   Global Step: 68280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:26,819-Speed 5505.14 samples/sec   Loss 3.3084   LearningRate 0.0160   Epoch: 12   Global Step: 68290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:28,670-Speed 5532.67 samples/sec   Loss 3.3332   LearningRate 0.0160   Epoch: 12   Global Step: 68300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:30,521-Speed 5535.38 samples/sec   Loss 3.3529   LearningRate 0.0159   Epoch: 12   Global Step: 68310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:32,347-Speed 5609.75 samples/sec   Loss 3.4192   LearningRate 0.0159   Epoch: 12   Global Step: 68320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:34,189-Speed 5559.22 samples/sec   Loss 3.3243   LearningRate 0.0159   Epoch: 12   Global Step: 68330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:36,029-Speed 5566.87 samples/sec   Loss 3.3765   LearningRate 0.0159   Epoch: 12   Global Step: 68340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:37,886-Speed 5516.55 samples/sec   Loss 3.2775   LearningRate 0.0159   Epoch: 12   Global Step: 68350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:39,728-Speed 5560.66 samples/sec   Loss 3.3495   LearningRate 0.0159   Epoch: 12   Global Step: 68360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:41,542-Speed 5646.74 samples/sec   Loss 3.4960   LearningRate 0.0159   Epoch: 12   Global Step: 68370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:43,385-Speed 5557.67 samples/sec   Loss 3.4469   LearningRate 0.0159   Epoch: 12   Global Step: 68380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:45,199-Speed 5647.75 samples/sec   Loss 3.4963   LearningRate 0.0159   Epoch: 12   Global Step: 68390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:47,028-Speed 5602.94 samples/sec   Loss 3.4108   LearningRate 0.0159   Epoch: 12   Global Step: 68400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:48,950-Speed 5328.24 samples/sec   Loss 3.5143   LearningRate 0.0159   Epoch: 12   Global Step: 68410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:50,811-Speed 5504.22 samples/sec   Loss 3.4531   LearningRate 0.0159   Epoch: 12   Global Step: 68420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:52,631-Speed 5628.96 samples/sec   Loss 3.3936   LearningRate 0.0159   Epoch: 12   Global Step: 68430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:54,458-Speed 5606.09 samples/sec   Loss 3.5374   LearningRate 0.0159   Epoch: 12   Global Step: 68440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:56,288-Speed 5599.12 samples/sec   Loss 3.3250   LearningRate 0.0158   Epoch: 12   Global Step: 68450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:58,122-Speed 5582.78 samples/sec   Loss 3.4187   LearningRate 0.0158   Epoch: 12   Global Step: 68460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:46:59,984-Speed 5502.46 samples/sec   Loss 3.3751   LearningRate 0.0158   Epoch: 12   Global Step: 68470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:01,810-Speed 5610.71 samples/sec   Loss 3.3792   LearningRate 0.0158   Epoch: 12   Global Step: 68480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:47:03,627-Speed 5635.06 samples/sec   Loss 3.4451   LearningRate 0.0158   Epoch: 12   Global Step: 68490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:47:05,446-Speed 5633.34 samples/sec   Loss 3.3779   LearningRate 0.0158   Epoch: 12   Global Step: 68500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:47:07,271-Speed 5612.43 samples/sec   Loss 3.4847   LearningRate 0.0158   Epoch: 12   Global Step: 68510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:09,120-Speed 5542.57 samples/sec   Loss 3.3724   LearningRate 0.0158   Epoch: 12   Global Step: 68520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:10,969-Speed 5539.80 samples/sec   Loss 3.3700   LearningRate 0.0158   Epoch: 12   Global Step: 68530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:12,801-Speed 5589.86 samples/sec   Loss 3.4956   LearningRate 0.0158   Epoch: 12   Global Step: 68540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:14,638-Speed 5575.87 samples/sec   Loss 3.4068   LearningRate 0.0158   Epoch: 12   Global Step: 68550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:16,453-Speed 5644.81 samples/sec   Loss 3.5586   LearningRate 0.0158   Epoch: 12   Global Step: 68560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:18,280-Speed 5605.89 samples/sec   Loss 3.4221   LearningRate 0.0158   Epoch: 12   Global Step: 68570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:20,115-Speed 5582.31 samples/sec   Loss 3.4997   LearningRate 0.0158   Epoch: 12   Global Step: 68580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:21,944-Speed 5600.71 samples/sec   Loss 3.5171   LearningRate 0.0157   Epoch: 12   Global Step: 68590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:23,771-Speed 5604.49 samples/sec   Loss 3.4724   LearningRate 0.0157   Epoch: 12   Global Step: 68600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:25,585-Speed 5648.27 samples/sec   Loss 3.4811   LearningRate 0.0157   Epoch: 12   Global Step: 68610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:47:27,407-Speed 5620.75 samples/sec   Loss 3.4821   LearningRate 0.0157   Epoch: 12   Global Step: 68620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:47:29,223-Speed 5642.62 samples/sec   Loss 3.4630   LearningRate 0.0157   Epoch: 12   Global Step: 68630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:47:31,040-Speed 5638.42 samples/sec   Loss 3.6473   LearningRate 0.0157   Epoch: 12   Global Step: 68640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:47:32,860-Speed 5628.41 samples/sec   Loss 3.4116   LearningRate 0.0157   Epoch: 12   Global Step: 68650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:34,687-Speed 5607.25 samples/sec   Loss 3.4968   LearningRate 0.0157   Epoch: 12   Global Step: 68660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:36,499-Speed 5652.13 samples/sec   Loss 3.6271   LearningRate 0.0157   Epoch: 12   Global Step: 68670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:38,315-Speed 5642.77 samples/sec   Loss 3.5448   LearningRate 0.0157   Epoch: 12   Global Step: 68680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:40,154-Speed 5568.32 samples/sec   Loss 3.5231   LearningRate 0.0157   Epoch: 12   Global Step: 68690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:41,979-Speed 5613.90 samples/sec   Loss 3.6093   LearningRate 0.0157   Epoch: 12   Global Step: 68700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:43,809-Speed 5594.90 samples/sec   Loss 3.4645   LearningRate 0.0157   Epoch: 12   Global Step: 68710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:45,628-Speed 5634.13 samples/sec   Loss 3.5540   LearningRate 0.0157   Epoch: 12   Global Step: 68720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:47,468-Speed 5566.37 samples/sec   Loss 3.4051   LearningRate 0.0157   Epoch: 12   Global Step: 68730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:49,294-Speed 5608.24 samples/sec   Loss 3.4567   LearningRate 0.0156   Epoch: 12   Global Step: 68740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:51,115-Speed 5624.57 samples/sec   Loss 3.5605   LearningRate 0.0156   Epoch: 12   Global Step: 68750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:47:52,944-Speed 5601.69 samples/sec   Loss 3.6068   LearningRate 0.0156   Epoch: 12   Global Step: 68760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:54,767-Speed 5619.23 samples/sec   Loss 3.5128   LearningRate 0.0156   Epoch: 12   Global Step: 68770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:56,585-Speed 5635.68 samples/sec   Loss 3.6318   LearningRate 0.0156   Epoch: 12   Global Step: 68780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:47:58,399-Speed 5646.20 samples/sec   Loss 3.5082   LearningRate 0.0156   Epoch: 12   Global Step: 68790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:00,209-Speed 5660.62 samples/sec   Loss 3.5466   LearningRate 0.0156   Epoch: 12   Global Step: 68800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:02,019-Speed 5658.83 samples/sec   Loss 3.6569   LearningRate 0.0156   Epoch: 12   Global Step: 68810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:03,843-Speed 5613.90 samples/sec   Loss 3.5126   LearningRate 0.0156   Epoch: 12   Global Step: 68820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:05,671-Speed 5605.54 samples/sec   Loss 3.4999   LearningRate 0.0156   Epoch: 12   Global Step: 68830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:07,499-Speed 5602.37 samples/sec   Loss 3.6157   LearningRate 0.0156   Epoch: 12   Global Step: 68840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:09,309-Speed 5659.18 samples/sec   Loss 3.5440   LearningRate 0.0156   Epoch: 12   Global Step: 68850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:11,146-Speed 5576.76 samples/sec   Loss 3.5319   LearningRate 0.0156   Epoch: 12   Global Step: 68860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:12,969-Speed 5617.74 samples/sec   Loss 3.5964   LearningRate 0.0156   Epoch: 12   Global Step: 68870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:14,789-Speed 5629.77 samples/sec   Loss 3.5642   LearningRate 0.0155   Epoch: 12   Global Step: 68880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:16,610-Speed 5626.26 samples/sec   Loss 3.6743   LearningRate 0.0155   Epoch: 12   Global Step: 68890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:18,430-Speed 5627.55 samples/sec   Loss 3.5420   LearningRate 0.0155   Epoch: 12   Global Step: 68900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:20,279-Speed 5539.62 samples/sec   Loss 3.4827   LearningRate 0.0155   Epoch: 12   Global Step: 68910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:22,111-Speed 5592.03 samples/sec   Loss 3.5546   LearningRate 0.0155   Epoch: 12   Global Step: 68920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:23,943-Speed 5592.86 samples/sec   Loss 3.5675   LearningRate 0.0155   Epoch: 12   Global Step: 68930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:25,757-Speed 5646.32 samples/sec   Loss 3.5439   LearningRate 0.0155   Epoch: 12   Global Step: 68940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:27,587-Speed 5597.05 samples/sec   Loss 3.5014   LearningRate 0.0155   Epoch: 12   Global Step: 68950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:29,408-Speed 5626.34 samples/sec   Loss 3.6196   LearningRate 0.0155   Epoch: 12   Global Step: 68960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:31,224-Speed 5640.55 samples/sec   Loss 3.5703   LearningRate 0.0155   Epoch: 12   Global Step: 68970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:33,053-Speed 5597.87 samples/sec   Loss 3.5233   LearningRate 0.0155   Epoch: 12   Global Step: 68980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:34,896-Speed 5558.39 samples/sec   Loss 3.5898   LearningRate 0.0155   Epoch: 12   Global Step: 68990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:36,716-Speed 5629.03 samples/sec   Loss 3.4973   LearningRate 0.0155   Epoch: 12   Global Step: 69000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:38,536-Speed 5627.85 samples/sec   Loss 3.6195   LearningRate 0.0155   Epoch: 12   Global Step: 69010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:40,346-Speed 5661.33 samples/sec   Loss 3.5077   LearningRate 0.0155   Epoch: 12   Global Step: 69020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:42,181-Speed 5580.99 samples/sec   Loss 3.5277   LearningRate 0.0154   Epoch: 12   Global Step: 69030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:43,999-Speed 5635.10 samples/sec   Loss 3.5388   LearningRate 0.0154   Epoch: 12   Global Step: 69040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:45,840-Speed 5563.83 samples/sec   Loss 3.6483   LearningRate 0.0154   Epoch: 12   Global Step: 69050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:47,679-Speed 5571.38 samples/sec   Loss 3.5528   LearningRate 0.0154   Epoch: 12   Global Step: 69060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:48:49,499-Speed 5626.83 samples/sec   Loss 3.5833   LearningRate 0.0154   Epoch: 12   Global Step: 69070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:51,346-Speed 5545.32 samples/sec   Loss 3.6973   LearningRate 0.0154   Epoch: 12   Global Step: 69080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:53,162-Speed 5639.86 samples/sec   Loss 3.6703   LearningRate 0.0154   Epoch: 12   Global Step: 69090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:54,988-Speed 5610.88 samples/sec   Loss 3.5797   LearningRate 0.0154   Epoch: 12   Global Step: 69100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:56,814-Speed 5608.50 samples/sec   Loss 3.6364   LearningRate 0.0154   Epoch: 12   Global Step: 69110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:48:58,636-Speed 5622.57 samples/sec   Loss 3.6010   LearningRate 0.0154   Epoch: 12   Global Step: 69120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:00,511-Speed 5464.78 samples/sec   Loss 3.7065   LearningRate 0.0154   Epoch: 12   Global Step: 69130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:02,346-Speed 5582.58 samples/sec   Loss 3.5551   LearningRate 0.0154   Epoch: 12   Global Step: 69140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:04,173-Speed 5604.77 samples/sec   Loss 3.5436   LearningRate 0.0154   Epoch: 12   Global Step: 69150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:06,569-Speed 4274.88 samples/sec   Loss 3.5484   LearningRate 0.0154   Epoch: 12   Global Step: 69160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:08,390-Speed 5625.33 samples/sec   Loss 3.6053   LearningRate 0.0153   Epoch: 12   Global Step: 69170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:10,222-Speed 5593.85 samples/sec   Loss 3.5667   LearningRate 0.0153   Epoch: 12   Global Step: 69180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:12,045-Speed 5617.10 samples/sec   Loss 3.5571   LearningRate 0.0153   Epoch: 12   Global Step: 69190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:13,857-Speed 5653.77 samples/sec   Loss 3.4414   LearningRate 0.0153   Epoch: 12   Global Step: 69200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:15,670-Speed 5649.53 samples/sec   Loss 3.5795   LearningRate 0.0153   Epoch: 12   Global Step: 69210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:17,480-Speed 5660.31 samples/sec   Loss 3.5972   LearningRate 0.0153   Epoch: 12   Global Step: 69220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:19,317-Speed 5574.06 samples/sec   Loss 3.6024   LearningRate 0.0153   Epoch: 12   Global Step: 69230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:21,143-Speed 5612.10 samples/sec   Loss 3.5827   LearningRate 0.0153   Epoch: 12   Global Step: 69240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:23,015-Speed 5471.04 samples/sec   Loss 3.6178   LearningRate 0.0153   Epoch: 12   Global Step: 69250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:24,856-Speed 5563.95 samples/sec   Loss 3.5637   LearningRate 0.0153   Epoch: 12   Global Step: 69260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:26,697-Speed 5563.97 samples/sec   Loss 3.5761   LearningRate 0.0153   Epoch: 12   Global Step: 69270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:49:28,542-Speed 5552.15 samples/sec   Loss 3.5583   LearningRate 0.0153   Epoch: 12   Global Step: 69280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:49:30,345-Speed 5683.39 samples/sec   Loss 3.5606   LearningRate 0.0153   Epoch: 12   Global Step: 69290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:32,163-Speed 5632.53 samples/sec   Loss 3.6264   LearningRate 0.0153   Epoch: 12   Global Step: 69300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:33,982-Speed 5632.36 samples/sec   Loss 3.6840   LearningRate 0.0153   Epoch: 12   Global Step: 69310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:35,798-Speed 5638.70 samples/sec   Loss 3.7003   LearningRate 0.0152   Epoch: 12   Global Step: 69320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:37,619-Speed 5625.11 samples/sec   Loss 3.6912   LearningRate 0.0152   Epoch: 12   Global Step: 69330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:39,444-Speed 5613.65 samples/sec   Loss 3.5602   LearningRate 0.0152   Epoch: 12   Global Step: 69340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:41,256-Speed 5653.62 samples/sec   Loss 3.6674   LearningRate 0.0152   Epoch: 12   Global Step: 69350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:43,069-Speed 5651.05 samples/sec   Loss 3.6498   LearningRate 0.0152   Epoch: 12   Global Step: 69360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:44,897-Speed 5601.57 samples/sec   Loss 3.5785   LearningRate 0.0152   Epoch: 12   Global Step: 69370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:46,732-Speed 5582.64 samples/sec   Loss 3.6561   LearningRate 0.0152   Epoch: 12   Global Step: 69380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:48,569-Speed 5578.46 samples/sec   Loss 3.6579   LearningRate 0.0152   Epoch: 12   Global Step: 69390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:49:50,397-Speed 5602.49 samples/sec   Loss 3.5901   LearningRate 0.0152   Epoch: 12   Global Step: 69400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:52,217-Speed 5626.60 samples/sec   Loss 3.6229   LearningRate 0.0152   Epoch: 12   Global Step: 69410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:54,104-Speed 5429.49 samples/sec   Loss 3.7408   LearningRate 0.0152   Epoch: 12   Global Step: 69420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:55,967-Speed 5498.47 samples/sec   Loss 3.6682   LearningRate 0.0152   Epoch: 12   Global Step: 69430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:57,795-Speed 5604.16 samples/sec   Loss 3.6554   LearningRate 0.0152   Epoch: 12   Global Step: 69440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:49:59,626-Speed 5594.20 samples/sec   Loss 3.6683   LearningRate 0.0152   Epoch: 12   Global Step: 69450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:01,456-Speed 5597.86 samples/sec   Loss 3.6127   LearningRate 0.0151   Epoch: 12   Global Step: 69460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:03,290-Speed 5585.24 samples/sec   Loss 3.5913   LearningRate 0.0151   Epoch: 12   Global Step: 69470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:05,120-Speed 5594.99 samples/sec   Loss 3.6389   LearningRate 0.0151   Epoch: 12   Global Step: 69480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:06,963-Speed 5561.57 samples/sec   Loss 3.6674   LearningRate 0.0151   Epoch: 12   Global Step: 69490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:08,791-Speed 5601.96 samples/sec   Loss 3.5311   LearningRate 0.0151   Epoch: 12   Global Step: 69500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:50:10,662-Speed 5475.58 samples/sec   Loss 3.6068   LearningRate 0.0151   Epoch: 12   Global Step: 69510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:50:12,489-Speed 5605.52 samples/sec   Loss 3.6586   LearningRate 0.0151   Epoch: 12   Global Step: 69520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:14,331-Speed 5561.75 samples/sec   Loss 3.5032   LearningRate 0.0151   Epoch: 12   Global Step: 69530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:16,147-Speed 5642.93 samples/sec   Loss 3.7403   LearningRate 0.0151   Epoch: 12   Global Step: 69540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:17,983-Speed 5578.56 samples/sec   Loss 3.6895   LearningRate 0.0151   Epoch: 12   Global Step: 69550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:19,862-Speed 5450.70 samples/sec   Loss 3.7084   LearningRate 0.0151   Epoch: 12   Global Step: 69560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:21,765-Speed 5381.91 samples/sec   Loss 3.5801   LearningRate 0.0151   Epoch: 12   Global Step: 69570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:23,584-Speed 5631.17 samples/sec   Loss 3.6862   LearningRate 0.0151   Epoch: 12   Global Step: 69580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:25,409-Speed 5611.64 samples/sec   Loss 3.6328   LearningRate 0.0151   Epoch: 12   Global Step: 69590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:27,232-Speed 5619.18 samples/sec   Loss 3.7559   LearningRate 0.0151   Epoch: 12   Global Step: 69600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:29,067-Speed 5585.21 samples/sec   Loss 3.7037   LearningRate 0.0150   Epoch: 12   Global Step: 69610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:30,881-Speed 5646.97 samples/sec   Loss 3.7406   LearningRate 0.0150   Epoch: 12   Global Step: 69620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:50:32,687-Speed 5670.84 samples/sec   Loss 3.5902   LearningRate 0.0150   Epoch: 12   Global Step: 69630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:34,503-Speed 5639.79 samples/sec   Loss 3.7122   LearningRate 0.0150   Epoch: 12   Global Step: 69640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:36,327-Speed 5618.47 samples/sec   Loss 3.5892   LearningRate 0.0150   Epoch: 12   Global Step: 69650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:38,146-Speed 5629.28 samples/sec   Loss 3.7504   LearningRate 0.0150   Epoch: 12   Global Step: 69660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:39,962-Speed 5642.33 samples/sec   Loss 3.6373   LearningRate 0.0150   Epoch: 12   Global Step: 69670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:41,783-Speed 5624.17 samples/sec   Loss 3.6794   LearningRate 0.0150   Epoch: 12   Global Step: 69680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:43,618-Speed 5583.03 samples/sec   Loss 3.6664   LearningRate 0.0150   Epoch: 12   Global Step: 69690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:45,469-Speed 5534.17 samples/sec   Loss 3.6712   LearningRate 0.0150   Epoch: 12   Global Step: 69700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:47,317-Speed 5540.79 samples/sec   Loss 3.5967   LearningRate 0.0150   Epoch: 12   Global Step: 69710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:49,142-Speed 5615.12 samples/sec   Loss 3.6649   LearningRate 0.0150   Epoch: 12   Global Step: 69720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:50,967-Speed 5613.70 samples/sec   Loss 3.6927   LearningRate 0.0150   Epoch: 12   Global Step: 69730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:52,810-Speed 5557.02 samples/sec   Loss 3.6206   LearningRate 0.0150   Epoch: 12   Global Step: 69740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:54,647-Speed 5577.52 samples/sec   Loss 3.5471   LearningRate 0.0149   Epoch: 12   Global Step: 69750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:56,495-Speed 5542.70 samples/sec   Loss 3.7185   LearningRate 0.0149   Epoch: 12   Global Step: 69760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:50:58,319-Speed 5616.04 samples/sec   Loss 3.7466   LearningRate 0.0149   Epoch: 12   Global Step: 69770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:00,154-Speed 5579.98 samples/sec   Loss 3.6655   LearningRate 0.0149   Epoch: 12   Global Step: 69780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:01,990-Speed 5580.98 samples/sec   Loss 3.6423   LearningRate 0.0149   Epoch: 12   Global Step: 69790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:03,821-Speed 5592.64 samples/sec   Loss 3.6833   LearningRate 0.0149   Epoch: 12   Global Step: 69800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:05,658-Speed 5575.89 samples/sec   Loss 3.6156   LearningRate 0.0149   Epoch: 12   Global Step: 69810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:07,482-Speed 5615.16 samples/sec   Loss 3.6373   LearningRate 0.0149   Epoch: 12   Global Step: 69820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:09,835-Speed 4354.73 samples/sec   Loss 3.6411   LearningRate 0.0149   Epoch: 12   Global Step: 69830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:51:12,581-Speed 3729.70 samples/sec   Loss 3.6334   LearningRate 0.0149   Epoch: 12   Global Step: 69840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:14,758-Speed 4705.16 samples/sec   Loss 3.6394   LearningRate 0.0149   Epoch: 12   Global Step: 69850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:16,585-Speed 5608.55 samples/sec   Loss 3.6380   LearningRate 0.0149   Epoch: 12   Global Step: 69860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:18,415-Speed 5597.31 samples/sec   Loss 3.6918   LearningRate 0.0149   Epoch: 12   Global Step: 69870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:20,236-Speed 5625.05 samples/sec   Loss 3.6183   LearningRate 0.0149   Epoch: 12   Global Step: 69880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:22,073-Speed 5573.62 samples/sec   Loss 3.5748   LearningRate 0.0149   Epoch: 12   Global Step: 69890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:23,904-Speed 5596.14 samples/sec   Loss 3.7036   LearningRate 0.0148   Epoch: 12   Global Step: 69900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:25,747-Speed 5557.19 samples/sec   Loss 3.6520   LearningRate 0.0148   Epoch: 12   Global Step: 69910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:27,594-Speed 5550.18 samples/sec   Loss 3.5940   LearningRate 0.0148   Epoch: 12   Global Step: 69920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:29,430-Speed 5577.33 samples/sec   Loss 3.7539   LearningRate 0.0148   Epoch: 12   Global Step: 69930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:31,269-Speed 5571.72 samples/sec   Loss 3.6495   LearningRate 0.0148   Epoch: 12   Global Step: 69940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:51:33,088-Speed 5631.51 samples/sec   Loss 3.5938   LearningRate 0.0148   Epoch: 12   Global Step: 69950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:34,901-Speed 5649.78 samples/sec   Loss 3.6673   LearningRate 0.0148   Epoch: 12   Global Step: 69960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:36,719-Speed 5635.71 samples/sec   Loss 3.5350   LearningRate 0.0148   Epoch: 12   Global Step: 69970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:38,532-Speed 5650.32 samples/sec   Loss 3.6612   LearningRate 0.0148   Epoch: 12   Global Step: 69980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:40,372-Speed 5566.88 samples/sec   Loss 3.6389   LearningRate 0.0148   Epoch: 12   Global Step: 69990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:51:42,183-Speed 5654.04 samples/sec   Loss 3.6137   LearningRate 0.0148   Epoch: 12   Global Step: 70000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:52:08,136-[lfw][70000]XNorm: 21.865118
Training: 2022-04-27 05:52:08,136-[lfw][70000]Accuracy-Flip: 0.99717+-0.00308
Training: 2022-04-27 05:52:08,137-[lfw][70000]Accuracy-Highest: 0.99800
Training: 2022-04-27 05:52:38,214-[cfp_fp][70000]XNorm: 19.692456
Training: 2022-04-27 05:52:38,215-[cfp_fp][70000]Accuracy-Flip: 0.96543+-0.00763
Training: 2022-04-27 05:52:38,215-[cfp_fp][70000]Accuracy-Highest: 0.96543
Training: 2022-04-27 05:53:04,170-[agedb_30][70000]XNorm: 21.599249
Training: 2022-04-27 05:53:04,171-[agedb_30][70000]Accuracy-Flip: 0.97917+-0.00724
Training: 2022-04-27 05:53:04,171-[agedb_30][70000]Accuracy-Highest: 0.97917
Training: 2022-04-27 05:53:06,004-Speed 122.17 samples/sec   Loss 3.7741   LearningRate 0.0148   Epoch: 12   Global Step: 70010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:07,812-Speed 5665.03 samples/sec   Loss 3.6307   LearningRate 0.0148   Epoch: 12   Global Step: 70020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:09,632-Speed 5631.02 samples/sec   Loss 3.6379   LearningRate 0.0148   Epoch: 12   Global Step: 70030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:11,449-Speed 5634.68 samples/sec   Loss 3.7226   LearningRate 0.0148   Epoch: 12   Global Step: 70040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:13,264-Speed 5643.98 samples/sec   Loss 3.6355   LearningRate 0.0147   Epoch: 12   Global Step: 70050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:15,100-Speed 5580.02 samples/sec   Loss 3.6602   LearningRate 0.0147   Epoch: 12   Global Step: 70060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:16,927-Speed 5605.39 samples/sec   Loss 3.6579   LearningRate 0.0147   Epoch: 12   Global Step: 70070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:18,749-Speed 5624.19 samples/sec   Loss 3.6306   LearningRate 0.0147   Epoch: 12   Global Step: 70080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:20,562-Speed 5647.61 samples/sec   Loss 3.6663   LearningRate 0.0147   Epoch: 12   Global Step: 70090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:22,377-Speed 5646.46 samples/sec   Loss 3.8069   LearningRate 0.0147   Epoch: 12   Global Step: 70100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:24,206-Speed 5600.32 samples/sec   Loss 3.6779   LearningRate 0.0147   Epoch: 12   Global Step: 70110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:26,023-Speed 5636.13 samples/sec   Loss 3.7836   LearningRate 0.0147   Epoch: 12   Global Step: 70120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:27,845-Speed 5622.79 samples/sec   Loss 3.6218   LearningRate 0.0147   Epoch: 12   Global Step: 70130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:29,682-Speed 5575.76 samples/sec   Loss 3.5986   LearningRate 0.0147   Epoch: 12   Global Step: 70140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:31,512-Speed 5596.86 samples/sec   Loss 3.6438   LearningRate 0.0147   Epoch: 12   Global Step: 70150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:53:33,348-Speed 5580.45 samples/sec   Loss 3.6329   LearningRate 0.0147   Epoch: 12   Global Step: 70160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:35,171-Speed 5619.89 samples/sec   Loss 3.6440   LearningRate 0.0147   Epoch: 12   Global Step: 70170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:36,995-Speed 5616.16 samples/sec   Loss 3.7462   LearningRate 0.0147   Epoch: 12   Global Step: 70180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:38,855-Speed 5507.36 samples/sec   Loss 3.7331   LearningRate 0.0147   Epoch: 12   Global Step: 70190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:40,734-Speed 5452.03 samples/sec   Loss 3.7226   LearningRate 0.0146   Epoch: 12   Global Step: 70200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:42,632-Speed 5396.96 samples/sec   Loss 3.6787   LearningRate 0.0146   Epoch: 12   Global Step: 70210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:44,488-Speed 5518.04 samples/sec   Loss 3.6771   LearningRate 0.0146   Epoch: 12   Global Step: 70220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:46,308-Speed 5629.57 samples/sec   Loss 3.6538   LearningRate 0.0146   Epoch: 12   Global Step: 70230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:48,124-Speed 5638.25 samples/sec   Loss 3.7374   LearningRate 0.0146   Epoch: 12   Global Step: 70240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:49,934-Speed 5658.67 samples/sec   Loss 3.6491   LearningRate 0.0146   Epoch: 12   Global Step: 70250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:51,762-Speed 5604.70 samples/sec   Loss 3.6023   LearningRate 0.0146   Epoch: 12   Global Step: 70260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:53:53,588-Speed 5609.69 samples/sec   Loss 3.7337   LearningRate 0.0146   Epoch: 12   Global Step: 70270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:55,454-Speed 5491.25 samples/sec   Loss 3.6742   LearningRate 0.0146   Epoch: 12   Global Step: 70280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:57,292-Speed 5571.19 samples/sec   Loss 3.5855   LearningRate 0.0146   Epoch: 12   Global Step: 70290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:53:59,137-Speed 5553.29 samples/sec   Loss 3.6416   LearningRate 0.0146   Epoch: 12   Global Step: 70300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:00,987-Speed 5536.82 samples/sec   Loss 3.6321   LearningRate 0.0146   Epoch: 12   Global Step: 70310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:02,827-Speed 5569.75 samples/sec   Loss 3.6805   LearningRate 0.0146   Epoch: 12   Global Step: 70320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:04,665-Speed 5572.98 samples/sec   Loss 3.6351   LearningRate 0.0146   Epoch: 12   Global Step: 70330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:06,503-Speed 5572.77 samples/sec   Loss 3.8148   LearningRate 0.0146   Epoch: 12   Global Step: 70340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:08,329-Speed 5607.19 samples/sec   Loss 3.6572   LearningRate 0.0145   Epoch: 12   Global Step: 70350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:10,171-Speed 5563.70 samples/sec   Loss 3.7544   LearningRate 0.0145   Epoch: 12   Global Step: 70360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:11,985-Speed 5645.34 samples/sec   Loss 3.6965   LearningRate 0.0145   Epoch: 12   Global Step: 70370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:13,900-Speed 5351.28 samples/sec   Loss 3.5360   LearningRate 0.0145   Epoch: 12   Global Step: 70380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:15,767-Speed 5485.88 samples/sec   Loss 3.6771   LearningRate 0.0145   Epoch: 12   Global Step: 70390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:17,622-Speed 5521.29 samples/sec   Loss 3.5467   LearningRate 0.0145   Epoch: 12   Global Step: 70400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:19,440-Speed 5636.09 samples/sec   Loss 3.6571   LearningRate 0.0145   Epoch: 12   Global Step: 70410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:21,269-Speed 5601.44 samples/sec   Loss 3.5917   LearningRate 0.0145   Epoch: 12   Global Step: 70420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:23,089-Speed 5626.56 samples/sec   Loss 3.5082   LearningRate 0.0145   Epoch: 12   Global Step: 70430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:24,908-Speed 5631.80 samples/sec   Loss 3.7445   LearningRate 0.0145   Epoch: 12   Global Step: 70440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:26,723-Speed 5642.60 samples/sec   Loss 3.5600   LearningRate 0.0145   Epoch: 12   Global Step: 70450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:28,533-Speed 5658.41 samples/sec   Loss 3.6932   LearningRate 0.0145   Epoch: 12   Global Step: 70460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:30,352-Speed 5634.03 samples/sec   Loss 3.6207   LearningRate 0.0145   Epoch: 12   Global Step: 70470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:54:32,155-Speed 5681.49 samples/sec   Loss 3.6291   LearningRate 0.0145   Epoch: 12   Global Step: 70480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:33,983-Speed 5603.22 samples/sec   Loss 3.5362   LearningRate 0.0145   Epoch: 12   Global Step: 70490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:35,798-Speed 5644.13 samples/sec   Loss 3.7800   LearningRate 0.0144   Epoch: 12   Global Step: 70500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:37,609-Speed 5654.09 samples/sec   Loss 3.5257   LearningRate 0.0144   Epoch: 12   Global Step: 70510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:39,431-Speed 5622.89 samples/sec   Loss 3.6877   LearningRate 0.0144   Epoch: 12   Global Step: 70520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:41,264-Speed 5587.65 samples/sec   Loss 3.7460   LearningRate 0.0144   Epoch: 12   Global Step: 70530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:43,089-Speed 5614.19 samples/sec   Loss 3.7438   LearningRate 0.0144   Epoch: 12   Global Step: 70540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:44,910-Speed 5624.01 samples/sec   Loss 3.7802   LearningRate 0.0144   Epoch: 12   Global Step: 70550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:46,735-Speed 5614.05 samples/sec   Loss 3.6041   LearningRate 0.0144   Epoch: 12   Global Step: 70560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:48,566-Speed 5595.45 samples/sec   Loss 3.6759   LearningRate 0.0144   Epoch: 12   Global Step: 70570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:50,391-Speed 5611.83 samples/sec   Loss 3.7163   LearningRate 0.0144   Epoch: 12   Global Step: 70580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:52,213-Speed 5622.41 samples/sec   Loss 3.7144   LearningRate 0.0144   Epoch: 12   Global Step: 70590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:54,041-Speed 5603.27 samples/sec   Loss 3.5370   LearningRate 0.0144   Epoch: 12   Global Step: 70600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:55,878-Speed 5575.89 samples/sec   Loss 3.7178   LearningRate 0.0144   Epoch: 12   Global Step: 70610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:57,771-Speed 5410.48 samples/sec   Loss 3.7629   LearningRate 0.0144   Epoch: 12   Global Step: 70620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:54:59,606-Speed 5584.52 samples/sec   Loss 3.6978   LearningRate 0.0144   Epoch: 12   Global Step: 70630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:01,435-Speed 5598.33 samples/sec   Loss 3.6550   LearningRate 0.0144   Epoch: 12   Global Step: 70640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:03,259-Speed 5616.60 samples/sec   Loss 3.6699   LearningRate 0.0143   Epoch: 12   Global Step: 70650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:05,078-Speed 5631.35 samples/sec   Loss 3.6170   LearningRate 0.0143   Epoch: 12   Global Step: 70660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:06,889-Speed 5656.36 samples/sec   Loss 3.5884   LearningRate 0.0143   Epoch: 12   Global Step: 70670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:08,722-Speed 5588.23 samples/sec   Loss 3.6604   LearningRate 0.0143   Epoch: 12   Global Step: 70680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:55:10,530-Speed 5665.02 samples/sec   Loss 3.6583   LearningRate 0.0143   Epoch: 12   Global Step: 70690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:12,342-Speed 5655.58 samples/sec   Loss 3.6974   LearningRate 0.0143   Epoch: 12   Global Step: 70700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:14,165-Speed 5618.12 samples/sec   Loss 3.7395   LearningRate 0.0143   Epoch: 12   Global Step: 70710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:15,990-Speed 5613.42 samples/sec   Loss 3.6832   LearningRate 0.0143   Epoch: 12   Global Step: 70720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:17,826-Speed 5578.48 samples/sec   Loss 3.7270   LearningRate 0.0143   Epoch: 12   Global Step: 70730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:19,660-Speed 5583.35 samples/sec   Loss 3.6823   LearningRate 0.0143   Epoch: 12   Global Step: 70740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:21,499-Speed 5572.44 samples/sec   Loss 3.5945   LearningRate 0.0143   Epoch: 12   Global Step: 70750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:23,339-Speed 5564.87 samples/sec   Loss 3.5989   LearningRate 0.0143   Epoch: 12   Global Step: 70760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:25,173-Speed 5587.54 samples/sec   Loss 3.6251   LearningRate 0.0143   Epoch: 12   Global Step: 70770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:27,007-Speed 5583.58 samples/sec   Loss 3.7668   LearningRate 0.0143   Epoch: 12   Global Step: 70780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:28,843-Speed 5580.79 samples/sec   Loss 3.6546   LearningRate 0.0143   Epoch: 12   Global Step: 70790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:55:30,658-Speed 5643.28 samples/sec   Loss 3.5438   LearningRate 0.0142   Epoch: 12   Global Step: 70800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:55:32,463-Speed 5676.00 samples/sec   Loss 3.7212   LearningRate 0.0142   Epoch: 12   Global Step: 70810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:34,274-Speed 5655.86 samples/sec   Loss 3.7784   LearningRate 0.0142   Epoch: 12   Global Step: 70820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:36,095-Speed 5624.18 samples/sec   Loss 3.6687   LearningRate 0.0142   Epoch: 12   Global Step: 70830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:37,928-Speed 5589.13 samples/sec   Loss 3.6763   LearningRate 0.0142   Epoch: 12   Global Step: 70840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:39,747-Speed 5631.90 samples/sec   Loss 3.6546   LearningRate 0.0142   Epoch: 12   Global Step: 70850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:41,560-Speed 5649.39 samples/sec   Loss 3.6766   LearningRate 0.0142   Epoch: 12   Global Step: 70860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:43,392-Speed 5592.15 samples/sec   Loss 3.6104   LearningRate 0.0142   Epoch: 12   Global Step: 70870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:45,211-Speed 5630.42 samples/sec   Loss 3.6142   LearningRate 0.0142   Epoch: 12   Global Step: 70880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:47,026-Speed 5645.48 samples/sec   Loss 3.5580   LearningRate 0.0142   Epoch: 12   Global Step: 70890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:48,845-Speed 5630.21 samples/sec   Loss 3.6450   LearningRate 0.0142   Epoch: 12   Global Step: 70900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:50,683-Speed 5574.42 samples/sec   Loss 3.7261   LearningRate 0.0142   Epoch: 12   Global Step: 70910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:52,512-Speed 5598.45 samples/sec   Loss 3.7843   LearningRate 0.0142   Epoch: 12   Global Step: 70920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:55:54,333-Speed 5626.01 samples/sec   Loss 3.6327   LearningRate 0.0142   Epoch: 12   Global Step: 70930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:55:56,162-Speed 5600.54 samples/sec   Loss 3.7608   LearningRate 0.0142   Epoch: 12   Global Step: 70940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:55:57,997-Speed 5581.03 samples/sec   Loss 3.7202   LearningRate 0.0141   Epoch: 12   Global Step: 70950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:55:59,820-Speed 5620.91 samples/sec   Loss 3.6580   LearningRate 0.0141   Epoch: 12   Global Step: 70960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:56:01,670-Speed 5537.19 samples/sec   Loss 3.7076   LearningRate 0.0141   Epoch: 12   Global Step: 70970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:56:03,503-Speed 5588.18 samples/sec   Loss 3.5943   LearningRate 0.0141   Epoch: 12   Global Step: 70980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:56:05,344-Speed 5562.02 samples/sec   Loss 3.5847   LearningRate 0.0141   Epoch: 12   Global Step: 70990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:56:07,165-Speed 5627.68 samples/sec   Loss 3.5361   LearningRate 0.0141   Epoch: 12   Global Step: 71000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:56:08,997-Speed 5590.57 samples/sec   Loss 3.5788   LearningRate 0.0141   Epoch: 12   Global Step: 71010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:56:10,816-Speed 5632.19 samples/sec   Loss 3.8058   LearningRate 0.0141   Epoch: 12   Global Step: 71020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 05:56:12,634-Speed 5633.28 samples/sec   Loss 3.6152   LearningRate 0.0141   Epoch: 12   Global Step: 71030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:14,509-Speed 5463.70 samples/sec   Loss 3.6652   LearningRate 0.0141   Epoch: 12   Global Step: 71040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:16,419-Speed 5363.64 samples/sec   Loss 3.7881   LearningRate 0.0141   Epoch: 12   Global Step: 71050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:18,328-Speed 5366.89 samples/sec   Loss 3.5508   LearningRate 0.0141   Epoch: 12   Global Step: 71060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:20,179-Speed 5532.82 samples/sec   Loss 3.7637   LearningRate 0.0141   Epoch: 12   Global Step: 71070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:22,005-Speed 5609.02 samples/sec   Loss 3.6718   LearningRate 0.0141   Epoch: 12   Global Step: 71080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:23,827-Speed 5623.00 samples/sec   Loss 3.7138   LearningRate 0.0141   Epoch: 12   Global Step: 71090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:25,664-Speed 5576.30 samples/sec   Loss 3.6612   LearningRate 0.0140   Epoch: 12   Global Step: 71100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:27,486-Speed 5620.74 samples/sec   Loss 3.6472   LearningRate 0.0140   Epoch: 12   Global Step: 71110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:29,309-Speed 5620.12 samples/sec   Loss 3.6166   LearningRate 0.0140   Epoch: 12   Global Step: 71120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:31,144-Speed 5584.29 samples/sec   Loss 3.6685   LearningRate 0.0140   Epoch: 12   Global Step: 71130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:56:32,954-Speed 5657.51 samples/sec   Loss 3.8220   LearningRate 0.0140   Epoch: 12   Global Step: 71140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:34,773-Speed 5632.86 samples/sec   Loss 3.6823   LearningRate 0.0140   Epoch: 12   Global Step: 71150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:36,599-Speed 5609.64 samples/sec   Loss 3.6076   LearningRate 0.0140   Epoch: 12   Global Step: 71160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:38,419-Speed 5627.65 samples/sec   Loss 3.7500   LearningRate 0.0140   Epoch: 12   Global Step: 71170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:40,263-Speed 5555.99 samples/sec   Loss 3.7096   LearningRate 0.0140   Epoch: 12   Global Step: 71180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:42,081-Speed 5632.82 samples/sec   Loss 3.5926   LearningRate 0.0140   Epoch: 12   Global Step: 71190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:43,904-Speed 5618.77 samples/sec   Loss 3.7035   LearningRate 0.0140   Epoch: 12   Global Step: 71200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:45,748-Speed 5555.45 samples/sec   Loss 3.7151   LearningRate 0.0140   Epoch: 12   Global Step: 71210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:47,592-Speed 5554.33 samples/sec   Loss 3.5813   LearningRate 0.0140   Epoch: 12   Global Step: 71220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:49,415-Speed 5619.64 samples/sec   Loss 3.7663   LearningRate 0.0140   Epoch: 12   Global Step: 71230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:51,226-Speed 5654.50 samples/sec   Loss 3.5934   LearningRate 0.0140   Epoch: 12   Global Step: 71240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:53,113-Speed 5429.07 samples/sec   Loss 3.7442   LearningRate 0.0139   Epoch: 12   Global Step: 71250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:55,037-Speed 5324.80 samples/sec   Loss 3.7097   LearningRate 0.0139   Epoch: 12   Global Step: 71260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:56,870-Speed 5587.77 samples/sec   Loss 3.6126   LearningRate 0.0139   Epoch: 12   Global Step: 71270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:56:58,697-Speed 5608.33 samples/sec   Loss 3.7716   LearningRate 0.0139   Epoch: 12   Global Step: 71280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:00,544-Speed 5546.79 samples/sec   Loss 3.7190   LearningRate 0.0139   Epoch: 12   Global Step: 71290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:02,397-Speed 5525.36 samples/sec   Loss 3.6633   LearningRate 0.0139   Epoch: 12   Global Step: 71300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:04,236-Speed 5570.43 samples/sec   Loss 3.6338   LearningRate 0.0139   Epoch: 12   Global Step: 71310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:06,055-Speed 5633.28 samples/sec   Loss 3.6155   LearningRate 0.0139   Epoch: 12   Global Step: 71320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:07,879-Speed 5614.36 samples/sec   Loss 3.6787   LearningRate 0.0139   Epoch: 12   Global Step: 71330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:09,693-Speed 5648.77 samples/sec   Loss 3.6690   LearningRate 0.0139   Epoch: 12   Global Step: 71340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:11,520-Speed 5607.09 samples/sec   Loss 3.6964   LearningRate 0.0139   Epoch: 12   Global Step: 71350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:13,351-Speed 5593.28 samples/sec   Loss 3.6526   LearningRate 0.0139   Epoch: 12   Global Step: 71360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:15,180-Speed 5600.44 samples/sec   Loss 3.6826   LearningRate 0.0139   Epoch: 12   Global Step: 71370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:16,995-Speed 5644.78 samples/sec   Loss 3.8524   LearningRate 0.0139   Epoch: 12   Global Step: 71380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:18,835-Speed 5567.49 samples/sec   Loss 3.6108   LearningRate 0.0139   Epoch: 12   Global Step: 71390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:20,677-Speed 5560.05 samples/sec   Loss 3.6645   LearningRate 0.0138   Epoch: 12   Global Step: 71400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:22,502-Speed 5613.89 samples/sec   Loss 3.6754   LearningRate 0.0138   Epoch: 12   Global Step: 71410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:24,331-Speed 5598.48 samples/sec   Loss 3.6858   LearningRate 0.0138   Epoch: 12   Global Step: 71420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:26,170-Speed 5570.30 samples/sec   Loss 3.6117   LearningRate 0.0138   Epoch: 12   Global Step: 71430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:27,995-Speed 5614.56 samples/sec   Loss 3.6246   LearningRate 0.0138   Epoch: 12   Global Step: 71440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:57:29,800-Speed 5672.96 samples/sec   Loss 3.6506   LearningRate 0.0138   Epoch: 12   Global Step: 71450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:31,639-Speed 5569.94 samples/sec   Loss 3.6090   LearningRate 0.0138   Epoch: 12   Global Step: 71460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:33,474-Speed 5583.30 samples/sec   Loss 3.6147   LearningRate 0.0138   Epoch: 12   Global Step: 71470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:35,315-Speed 5564.34 samples/sec   Loss 3.8033   LearningRate 0.0138   Epoch: 12   Global Step: 71480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:37,241-Speed 5317.99 samples/sec   Loss 3.6407   LearningRate 0.0138   Epoch: 12   Global Step: 71490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:39,164-Speed 5327.50 samples/sec   Loss 3.6099   LearningRate 0.0138   Epoch: 12   Global Step: 71500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:41,004-Speed 5568.85 samples/sec   Loss 3.6246   LearningRate 0.0138   Epoch: 12   Global Step: 71510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:42,847-Speed 5556.16 samples/sec   Loss 3.6070   LearningRate 0.0138   Epoch: 12   Global Step: 71520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:44,666-Speed 5632.16 samples/sec   Loss 3.6909   LearningRate 0.0138   Epoch: 12   Global Step: 71530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:46,492-Speed 5609.34 samples/sec   Loss 3.6883   LearningRate 0.0138   Epoch: 12   Global Step: 71540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:48,330-Speed 5575.08 samples/sec   Loss 3.5969   LearningRate 0.0138   Epoch: 12   Global Step: 71550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:57:50,170-Speed 5564.47 samples/sec   Loss 3.6731   LearningRate 0.0137   Epoch: 12   Global Step: 71560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:52,001-Speed 5594.04 samples/sec   Loss 3.5463   LearningRate 0.0137   Epoch: 12   Global Step: 71570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:53,824-Speed 5620.25 samples/sec   Loss 3.5795   LearningRate 0.0137   Epoch: 12   Global Step: 71580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:55,661-Speed 5575.91 samples/sec   Loss 3.7568   LearningRate 0.0137   Epoch: 12   Global Step: 71590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:57,501-Speed 5566.44 samples/sec   Loss 3.6323   LearningRate 0.0137   Epoch: 12   Global Step: 71600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:57:59,359-Speed 5514.52 samples/sec   Loss 3.7333   LearningRate 0.0137   Epoch: 12   Global Step: 71610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:01,195-Speed 5578.89 samples/sec   Loss 3.7078   LearningRate 0.0137   Epoch: 12   Global Step: 71620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:03,026-Speed 5594.43 samples/sec   Loss 3.6881   LearningRate 0.0137   Epoch: 12   Global Step: 71630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:04,887-Speed 5506.00 samples/sec   Loss 3.5576   LearningRate 0.0137   Epoch: 12   Global Step: 71640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:06,737-Speed 5536.86 samples/sec   Loss 3.7043   LearningRate 0.0137   Epoch: 12   Global Step: 71650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:08,573-Speed 5577.25 samples/sec   Loss 3.7599   LearningRate 0.0137   Epoch: 12   Global Step: 71660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:10,393-Speed 5627.54 samples/sec   Loss 3.7436   LearningRate 0.0137   Epoch: 12   Global Step: 71670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:12,227-Speed 5586.88 samples/sec   Loss 3.7428   LearningRate 0.0137   Epoch: 12   Global Step: 71680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:14,051-Speed 5615.59 samples/sec   Loss 3.7268   LearningRate 0.0137   Epoch: 12   Global Step: 71690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:15,894-Speed 5557.53 samples/sec   Loss 3.6672   LearningRate 0.0137   Epoch: 12   Global Step: 71700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:17,723-Speed 5601.69 samples/sec   Loss 3.6941   LearningRate 0.0136   Epoch: 12   Global Step: 71710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:19,550-Speed 5607.05 samples/sec   Loss 3.7641   LearningRate 0.0136   Epoch: 12   Global Step: 71720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:21,472-Speed 5330.23 samples/sec   Loss 3.5677   LearningRate 0.0136   Epoch: 12   Global Step: 71730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:23,394-Speed 5328.43 samples/sec   Loss 3.7131   LearningRate 0.0136   Epoch: 12   Global Step: 71740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:25,316-Speed 5328.91 samples/sec   Loss 3.5687   LearningRate 0.0136   Epoch: 12   Global Step: 71750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:27,221-Speed 5376.24 samples/sec   Loss 3.6388   LearningRate 0.0136   Epoch: 12   Global Step: 71760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:29,144-Speed 5327.81 samples/sec   Loss 3.7026   LearningRate 0.0136   Epoch: 12   Global Step: 71770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:31,067-Speed 5326.40 samples/sec   Loss 3.7063   LearningRate 0.0136   Epoch: 12   Global Step: 71780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:32,914-Speed 5545.74 samples/sec   Loss 3.6617   LearningRate 0.0136   Epoch: 12   Global Step: 71790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:34,737-Speed 5620.16 samples/sec   Loss 3.6563   LearningRate 0.0136   Epoch: 12   Global Step: 71800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:36,563-Speed 5608.82 samples/sec   Loss 3.6069   LearningRate 0.0136   Epoch: 12   Global Step: 71810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:38,385-Speed 5622.79 samples/sec   Loss 3.6928   LearningRate 0.0136   Epoch: 12   Global Step: 71820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:40,228-Speed 5557.03 samples/sec   Loss 3.5889   LearningRate 0.0136   Epoch: 12   Global Step: 71830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:42,059-Speed 5595.63 samples/sec   Loss 3.5561   LearningRate 0.0136   Epoch: 12   Global Step: 71840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:43,896-Speed 5576.98 samples/sec   Loss 3.6468   LearningRate 0.0136   Epoch: 12   Global Step: 71850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:45,735-Speed 5570.49 samples/sec   Loss 3.6844   LearningRate 0.0135   Epoch: 12   Global Step: 71860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 05:58:47,565-Speed 5598.51 samples/sec   Loss 3.6441   LearningRate 0.0135   Epoch: 12   Global Step: 71870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:49,391-Speed 5610.28 samples/sec   Loss 3.7332   LearningRate 0.0135   Epoch: 12   Global Step: 71880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:51,229-Speed 5572.23 samples/sec   Loss 3.6629   LearningRate 0.0135   Epoch: 12   Global Step: 71890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:53,047-Speed 5634.01 samples/sec   Loss 3.7071   LearningRate 0.0135   Epoch: 12   Global Step: 71900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:54,884-Speed 5576.88 samples/sec   Loss 3.6923   LearningRate 0.0135   Epoch: 12   Global Step: 71910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:56,733-Speed 5541.31 samples/sec   Loss 3.7206   LearningRate 0.0135   Epoch: 12   Global Step: 71920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:58:58,567-Speed 5582.81 samples/sec   Loss 3.5797   LearningRate 0.0135   Epoch: 12   Global Step: 71930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:59:00,411-Speed 5555.87 samples/sec   Loss 3.7053   LearningRate 0.0135   Epoch: 12   Global Step: 71940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:59:02,243-Speed 5591.87 samples/sec   Loss 3.5710   LearningRate 0.0135   Epoch: 12   Global Step: 71950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:59:04,071-Speed 5604.00 samples/sec   Loss 3.6314   LearningRate 0.0135   Epoch: 12   Global Step: 71960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:59:05,891-Speed 5628.35 samples/sec   Loss 3.6507   LearningRate 0.0135   Epoch: 12   Global Step: 71970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:59:07,721-Speed 5598.73 samples/sec   Loss 3.5902   LearningRate 0.0135   Epoch: 12   Global Step: 71980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:59:09,544-Speed 5619.69 samples/sec   Loss 3.6358   LearningRate 0.0135   Epoch: 12   Global Step: 71990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:59:11,377-Speed 5586.12 samples/sec   Loss 3.6165   LearningRate 0.0135   Epoch: 12   Global Step: 72000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 05:59:37,629-[lfw][72000]XNorm: 21.907372
Training: 2022-04-27 05:59:37,629-[lfw][72000]Accuracy-Flip: 0.99767+-0.00300
Training: 2022-04-27 05:59:37,630-[lfw][72000]Accuracy-Highest: 0.99800
Training: 2022-04-27 06:00:08,141-[cfp_fp][72000]XNorm: 19.808723
Training: 2022-04-27 06:00:08,142-[cfp_fp][72000]Accuracy-Flip: 0.96386+-0.01002
Training: 2022-04-27 06:00:08,142-[cfp_fp][72000]Accuracy-Highest: 0.96543
Training: 2022-04-27 06:00:34,470-[agedb_30][72000]XNorm: 21.776772
Training: 2022-04-27 06:00:34,470-[agedb_30][72000]Accuracy-Flip: 0.97883+-0.00882
Training: 2022-04-27 06:00:34,471-[agedb_30][72000]Accuracy-Highest: 0.97917
Training: 2022-04-27 06:00:36,379-Speed 120.47 samples/sec   Loss 3.7351   LearningRate 0.0135   Epoch: 12   Global Step: 72010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:00:38,190-Speed 5656.74 samples/sec   Loss 3.6815   LearningRate 0.0134   Epoch: 12   Global Step: 72020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:00:40,029-Speed 5570.94 samples/sec   Loss 3.6099   LearningRate 0.0134   Epoch: 12   Global Step: 72030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:00:41,917-Speed 5423.53 samples/sec   Loss 3.6174   LearningRate 0.0134   Epoch: 12   Global Step: 72040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:00:43,800-Speed 5441.59 samples/sec   Loss 3.6142   LearningRate 0.0134   Epoch: 12   Global Step: 72050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:00:45,625-Speed 5609.73 samples/sec   Loss 3.6035   LearningRate 0.0134   Epoch: 12   Global Step: 72060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:00:47,429-Speed 5680.06 samples/sec   Loss 3.5804   LearningRate 0.0134   Epoch: 12   Global Step: 72070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:00:49,257-Speed 5602.67 samples/sec   Loss 3.5877   LearningRate 0.0134   Epoch: 12   Global Step: 72080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:00:51,074-Speed 5637.40 samples/sec   Loss 3.5866   LearningRate 0.0134   Epoch: 12   Global Step: 72090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:00:52,884-Speed 5660.43 samples/sec   Loss 3.6825   LearningRate 0.0134   Epoch: 12   Global Step: 72100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:00:54,701-Speed 5635.77 samples/sec   Loss 3.5286   LearningRate 0.0134   Epoch: 12   Global Step: 72110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:00:56,526-Speed 5612.57 samples/sec   Loss 3.5627   LearningRate 0.0134   Epoch: 12   Global Step: 72120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 06:00:58,342-Speed 5640.61 samples/sec   Loss 3.5888   LearningRate 0.0134   Epoch: 12   Global Step: 72130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 06:01:00,164-Speed 5622.68 samples/sec   Loss 3.6896   LearningRate 0.0134   Epoch: 12   Global Step: 72140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 06:01:01,989-Speed 5614.42 samples/sec   Loss 3.6889   LearningRate 0.0134   Epoch: 12   Global Step: 72150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 06:01:03,801-Speed 5651.51 samples/sec   Loss 3.6710   LearningRate 0.0134   Epoch: 12   Global Step: 72160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 06:01:05,640-Speed 5571.23 samples/sec   Loss 3.5687   LearningRate 0.0133   Epoch: 12   Global Step: 72170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 06:01:07,452-Speed 5654.35 samples/sec   Loss 3.6610   LearningRate 0.0133   Epoch: 12   Global Step: 72180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 06:01:09,279-Speed 5604.76 samples/sec   Loss 3.6533   LearningRate 0.0133   Epoch: 12   Global Step: 72190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 06:01:11,095-Speed 5641.06 samples/sec   Loss 3.5383   LearningRate 0.0133   Epoch: 12   Global Step: 72200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 06:01:12,911-Speed 5641.83 samples/sec   Loss 3.6292   LearningRate 0.0133   Epoch: 12   Global Step: 72210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 06:01:14,722-Speed 5655.04 samples/sec   Loss 3.7124   LearningRate 0.0133   Epoch: 12   Global Step: 72220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:16,530-Speed 5665.99 samples/sec   Loss 3.7635   LearningRate 0.0133   Epoch: 12   Global Step: 72230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:18,345-Speed 5643.16 samples/sec   Loss 3.5492   LearningRate 0.0133   Epoch: 12   Global Step: 72240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:20,213-Speed 5485.34 samples/sec   Loss 3.7207   LearningRate 0.0133   Epoch: 12   Global Step: 72250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:22,136-Speed 5325.73 samples/sec   Loss 3.5894   LearningRate 0.0133   Epoch: 12   Global Step: 72260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:24,002-Speed 5488.84 samples/sec   Loss 3.6154   LearningRate 0.0133   Epoch: 12   Global Step: 72270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:25,820-Speed 5634.50 samples/sec   Loss 3.5666   LearningRate 0.0133   Epoch: 12   Global Step: 72280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:27,639-Speed 5632.08 samples/sec   Loss 3.6472   LearningRate 0.0133   Epoch: 12   Global Step: 72290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:29,453-Speed 5645.90 samples/sec   Loss 3.5759   LearningRate 0.0133   Epoch: 12   Global Step: 72300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:31,278-Speed 5614.85 samples/sec   Loss 3.6489   LearningRate 0.0133   Epoch: 12   Global Step: 72310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:33,102-Speed 5614.76 samples/sec   Loss 3.6548   LearningRate 0.0133   Epoch: 12   Global Step: 72320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:34,938-Speed 5579.04 samples/sec   Loss 3.6993   LearningRate 0.0132   Epoch: 12   Global Step: 72330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:36,764-Speed 5608.35 samples/sec   Loss 3.6561   LearningRate 0.0132   Epoch: 12   Global Step: 72340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:38,575-Speed 5656.26 samples/sec   Loss 3.6066   LearningRate 0.0132   Epoch: 12   Global Step: 72350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:40,398-Speed 5620.02 samples/sec   Loss 3.5577   LearningRate 0.0132   Epoch: 12   Global Step: 72360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:42,224-Speed 5611.54 samples/sec   Loss 3.5985   LearningRate 0.0132   Epoch: 12   Global Step: 72370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:44,060-Speed 5578.56 samples/sec   Loss 3.6475   LearningRate 0.0132   Epoch: 12   Global Step: 72380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:45,876-Speed 5640.03 samples/sec   Loss 3.5924   LearningRate 0.0132   Epoch: 12   Global Step: 72390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:47,700-Speed 5618.54 samples/sec   Loss 3.6302   LearningRate 0.0132   Epoch: 12   Global Step: 72400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:49,526-Speed 5608.35 samples/sec   Loss 3.5708   LearningRate 0.0132   Epoch: 12   Global Step: 72410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:51,362-Speed 5578.41 samples/sec   Loss 3.5766   LearningRate 0.0132   Epoch: 12   Global Step: 72420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 06:01:53,204-Speed 5561.86 samples/sec   Loss 3.5325   LearningRate 0.0132   Epoch: 12   Global Step: 72430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:55,041-Speed 5574.18 samples/sec   Loss 3.5403   LearningRate 0.0132   Epoch: 12   Global Step: 72440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:56,861-Speed 5630.52 samples/sec   Loss 3.6170   LearningRate 0.0132   Epoch: 12   Global Step: 72450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:01:58,685-Speed 5614.68 samples/sec   Loss 3.6839   LearningRate 0.0132   Epoch: 12   Global Step: 72460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:00,510-Speed 5611.78 samples/sec   Loss 3.5080   LearningRate 0.0132   Epoch: 12   Global Step: 72470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:02,341-Speed 5596.94 samples/sec   Loss 3.6466   LearningRate 0.0132   Epoch: 12   Global Step: 72480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:04,161-Speed 5626.05 samples/sec   Loss 3.4963   LearningRate 0.0131   Epoch: 12   Global Step: 72490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:05,992-Speed 5595.65 samples/sec   Loss 3.6210   LearningRate 0.0131   Epoch: 12   Global Step: 72500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:07,811-Speed 5629.31 samples/sec   Loss 3.6355   LearningRate 0.0131   Epoch: 12   Global Step: 72510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:09,621-Speed 5662.48 samples/sec   Loss 3.6398   LearningRate 0.0131   Epoch: 12   Global Step: 72520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:11,432-Speed 5656.54 samples/sec   Loss 3.5389   LearningRate 0.0131   Epoch: 12   Global Step: 72530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:13,243-Speed 5656.83 samples/sec   Loss 3.6228   LearningRate 0.0131   Epoch: 12   Global Step: 72540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:15,062-Speed 5630.09 samples/sec   Loss 3.6139   LearningRate 0.0131   Epoch: 12   Global Step: 72550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:16,887-Speed 5612.79 samples/sec   Loss 3.5629   LearningRate 0.0131   Epoch: 12   Global Step: 72560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:18,698-Speed 5657.19 samples/sec   Loss 3.6139   LearningRate 0.0131   Epoch: 12   Global Step: 72570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:20,528-Speed 5597.73 samples/sec   Loss 3.6245   LearningRate 0.0131   Epoch: 12   Global Step: 72580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:22,364-Speed 5578.89 samples/sec   Loss 3.5709   LearningRate 0.0131   Epoch: 12   Global Step: 72590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:24,201-Speed 5575.29 samples/sec   Loss 3.4997   LearningRate 0.0131   Epoch: 12   Global Step: 72600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:26,068-Speed 5487.86 samples/sec   Loss 3.6125   LearningRate 0.0131   Epoch: 12   Global Step: 72610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:27,896-Speed 5601.14 samples/sec   Loss 3.6580   LearningRate 0.0131   Epoch: 12   Global Step: 72620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:29,756-Speed 5508.31 samples/sec   Loss 3.5800   LearningRate 0.0131   Epoch: 12   Global Step: 72630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 06:02:31,572-Speed 5641.73 samples/sec   Loss 3.6044   LearningRate 0.0130   Epoch: 12   Global Step: 72640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:33,387-Speed 5643.70 samples/sec   Loss 3.6628   LearningRate 0.0130   Epoch: 12   Global Step: 72650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:35,205-Speed 5634.13 samples/sec   Loss 3.6013   LearningRate 0.0130   Epoch: 12   Global Step: 72660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:37,017-Speed 5651.95 samples/sec   Loss 3.5365   LearningRate 0.0130   Epoch: 12   Global Step: 72670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:38,839-Speed 5623.79 samples/sec   Loss 3.5361   LearningRate 0.0130   Epoch: 12   Global Step: 72680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:40,674-Speed 5581.51 samples/sec   Loss 3.6520   LearningRate 0.0130   Epoch: 12   Global Step: 72690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:42,483-Speed 5661.50 samples/sec   Loss 3.6598   LearningRate 0.0130   Epoch: 12   Global Step: 72700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:44,330-Speed 5545.75 samples/sec   Loss 3.6417   LearningRate 0.0130   Epoch: 12   Global Step: 72710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:46,169-Speed 5570.03 samples/sec   Loss 3.5939   LearningRate 0.0130   Epoch: 12   Global Step: 72720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:47,994-Speed 5614.26 samples/sec   Loss 3.5109   LearningRate 0.0130   Epoch: 12   Global Step: 72730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:49,821-Speed 5606.11 samples/sec   Loss 3.6049   LearningRate 0.0130   Epoch: 12   Global Step: 72740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:51,639-Speed 5633.69 samples/sec   Loss 3.5551   LearningRate 0.0130   Epoch: 12   Global Step: 72750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:53,489-Speed 5538.05 samples/sec   Loss 3.7208   LearningRate 0.0130   Epoch: 12   Global Step: 72760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:55,303-Speed 5647.79 samples/sec   Loss 3.5098   LearningRate 0.0130   Epoch: 12   Global Step: 72770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:57,130-Speed 5606.45 samples/sec   Loss 3.5407   LearningRate 0.0130   Epoch: 12   Global Step: 72780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:02:58,956-Speed 5609.69 samples/sec   Loss 3.6752   LearningRate 0.0130   Epoch: 12   Global Step: 72790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:00,773-Speed 5636.82 samples/sec   Loss 3.5484   LearningRate 0.0129   Epoch: 12   Global Step: 72800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:02,588-Speed 5643.12 samples/sec   Loss 3.6242   LearningRate 0.0129   Epoch: 12   Global Step: 72810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:04,410-Speed 5622.73 samples/sec   Loss 3.6491   LearningRate 0.0129   Epoch: 12   Global Step: 72820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:06,287-Speed 5457.17 samples/sec   Loss 3.5706   LearningRate 0.0129   Epoch: 12   Global Step: 72830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:08,113-Speed 5608.78 samples/sec   Loss 3.5832   LearningRate 0.0129   Epoch: 12   Global Step: 72840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 06:03:09,928-Speed 5645.29 samples/sec   Loss 3.5217   LearningRate 0.0129   Epoch: 12   Global Step: 72850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 06:03:11,731-Speed 5682.29 samples/sec   Loss 3.5577   LearningRate 0.0129   Epoch: 12   Global Step: 72860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:13,556-Speed 5613.29 samples/sec   Loss 3.5527   LearningRate 0.0129   Epoch: 12   Global Step: 72870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:15,383-Speed 5606.26 samples/sec   Loss 3.6711   LearningRate 0.0129   Epoch: 12   Global Step: 72880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:17,212-Speed 5599.45 samples/sec   Loss 3.6287   LearningRate 0.0129   Epoch: 12   Global Step: 72890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:19,047-Speed 5585.46 samples/sec   Loss 3.5181   LearningRate 0.0129   Epoch: 12   Global Step: 72900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:20,879-Speed 5590.49 samples/sec   Loss 3.7087   LearningRate 0.0129   Epoch: 12   Global Step: 72910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:22,692-Speed 5647.84 samples/sec   Loss 3.6757   LearningRate 0.0129   Epoch: 12   Global Step: 72920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:24,508-Speed 5642.90 samples/sec   Loss 3.5768   LearningRate 0.0129   Epoch: 12   Global Step: 72930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:26,328-Speed 5628.52 samples/sec   Loss 3.6803   LearningRate 0.0129   Epoch: 12   Global Step: 72940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:28,138-Speed 5658.11 samples/sec   Loss 3.5095   LearningRate 0.0129   Epoch: 12   Global Step: 72950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:29,938-Speed 5691.77 samples/sec   Loss 3.5486   LearningRate 0.0128   Epoch: 12   Global Step: 72960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:31,752-Speed 5647.36 samples/sec   Loss 3.6744   LearningRate 0.0128   Epoch: 12   Global Step: 72970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:33,586-Speed 5583.75 samples/sec   Loss 3.6300   LearningRate 0.0128   Epoch: 12   Global Step: 72980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:35,413-Speed 5607.13 samples/sec   Loss 3.6880   LearningRate 0.0128   Epoch: 12   Global Step: 72990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:37,235-Speed 5621.61 samples/sec   Loss 3.6395   LearningRate 0.0128   Epoch: 12   Global Step: 73000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:39,044-Speed 5662.78 samples/sec   Loss 3.7277   LearningRate 0.0128   Epoch: 12   Global Step: 73010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:40,862-Speed 5634.00 samples/sec   Loss 3.6745   LearningRate 0.0128   Epoch: 12   Global Step: 73020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:42,681-Speed 5632.20 samples/sec   Loss 3.6313   LearningRate 0.0128   Epoch: 12   Global Step: 73030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:44,507-Speed 5609.90 samples/sec   Loss 3.6353   LearningRate 0.0128   Epoch: 12   Global Step: 73040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:46,360-Speed 5528.14 samples/sec   Loss 3.4840   LearningRate 0.0128   Epoch: 12   Global Step: 73050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:48,170-Speed 5661.00 samples/sec   Loss 3.6462   LearningRate 0.0128   Epoch: 12   Global Step: 73060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:49,992-Speed 5621.72 samples/sec   Loss 3.6832   LearningRate 0.0128   Epoch: 12   Global Step: 73070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:51,825-Speed 5585.52 samples/sec   Loss 3.6225   LearningRate 0.0128   Epoch: 12   Global Step: 73080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:53,675-Speed 5539.44 samples/sec   Loss 3.5711   LearningRate 0.0128   Epoch: 12   Global Step: 73090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:55,501-Speed 5609.12 samples/sec   Loss 3.5759   LearningRate 0.0128   Epoch: 12   Global Step: 73100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:57,323-Speed 5621.45 samples/sec   Loss 3.5282   LearningRate 0.0128   Epoch: 12   Global Step: 73110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:03:59,145-Speed 5621.11 samples/sec   Loss 3.7241   LearningRate 0.0127   Epoch: 12   Global Step: 73120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:00,977-Speed 5593.04 samples/sec   Loss 3.5618   LearningRate 0.0127   Epoch: 12   Global Step: 73130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:02,814-Speed 5575.53 samples/sec   Loss 3.6412   LearningRate 0.0127   Epoch: 12   Global Step: 73140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:04,630-Speed 5641.78 samples/sec   Loss 3.6559   LearningRate 0.0127   Epoch: 12   Global Step: 73150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:06,435-Speed 5673.04 samples/sec   Loss 3.5886   LearningRate 0.0127   Epoch: 12   Global Step: 73160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:08,254-Speed 5633.32 samples/sec   Loss 3.5567   LearningRate 0.0127   Epoch: 12   Global Step: 73170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:10,092-Speed 5573.54 samples/sec   Loss 3.6227   LearningRate 0.0127   Epoch: 12   Global Step: 73180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:11,903-Speed 5655.58 samples/sec   Loss 3.4938   LearningRate 0.0127   Epoch: 12   Global Step: 73190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:13,724-Speed 5624.76 samples/sec   Loss 3.6276   LearningRate 0.0127   Epoch: 12   Global Step: 73200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:15,545-Speed 5626.11 samples/sec   Loss 3.5708   LearningRate 0.0127   Epoch: 12   Global Step: 73210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:17,364-Speed 5630.93 samples/sec   Loss 3.4671   LearningRate 0.0127   Epoch: 12   Global Step: 73220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:19,184-Speed 5626.90 samples/sec   Loss 3.5487   LearningRate 0.0127   Epoch: 12   Global Step: 73230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:21,006-Speed 5624.22 samples/sec   Loss 3.4993   LearningRate 0.0127   Epoch: 12   Global Step: 73240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:22,838-Speed 5590.20 samples/sec   Loss 3.5367   LearningRate 0.0127   Epoch: 12   Global Step: 73250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:24,657-Speed 5631.38 samples/sec   Loss 3.6368   LearningRate 0.0127   Epoch: 12   Global Step: 73260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 06:04:26,473-Speed 5642.78 samples/sec   Loss 3.5861   LearningRate 0.0127   Epoch: 12   Global Step: 73270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:28,289-Speed 5639.79 samples/sec   Loss 3.5868   LearningRate 0.0126   Epoch: 12   Global Step: 73280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:30,105-Speed 5640.17 samples/sec   Loss 3.5788   LearningRate 0.0126   Epoch: 12   Global Step: 73290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:31,928-Speed 5620.18 samples/sec   Loss 3.5926   LearningRate 0.0126   Epoch: 12   Global Step: 73300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:33,741-Speed 5647.67 samples/sec   Loss 3.5986   LearningRate 0.0126   Epoch: 12   Global Step: 73310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:35,560-Speed 5631.83 samples/sec   Loss 3.5423   LearningRate 0.0126   Epoch: 12   Global Step: 73320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:37,367-Speed 5668.78 samples/sec   Loss 3.5262   LearningRate 0.0126   Epoch: 12   Global Step: 73330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:39,190-Speed 5619.84 samples/sec   Loss 3.6998   LearningRate 0.0126   Epoch: 12   Global Step: 73340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:41,034-Speed 5555.31 samples/sec   Loss 3.5739   LearningRate 0.0126   Epoch: 12   Global Step: 73350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:42,855-Speed 5623.08 samples/sec   Loss 3.5748   LearningRate 0.0126   Epoch: 12   Global Step: 73360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:44,668-Speed 5650.08 samples/sec   Loss 3.5495   LearningRate 0.0126   Epoch: 12   Global Step: 73370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:46,481-Speed 5651.60 samples/sec   Loss 3.5479   LearningRate 0.0126   Epoch: 12   Global Step: 73380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:48,291-Speed 5661.53 samples/sec   Loss 3.5598   LearningRate 0.0126   Epoch: 12   Global Step: 73390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:50,156-Speed 5490.34 samples/sec   Loss 3.5331   LearningRate 0.0126   Epoch: 12   Global Step: 73400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:51,989-Speed 5589.29 samples/sec   Loss 3.5583   LearningRate 0.0126   Epoch: 12   Global Step: 73410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:53,835-Speed 5547.76 samples/sec   Loss 3.5466   LearningRate 0.0126   Epoch: 12   Global Step: 73420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:55,672-Speed 5575.57 samples/sec   Loss 3.5270   LearningRate 0.0126   Epoch: 12   Global Step: 73430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:57,494-Speed 5623.34 samples/sec   Loss 3.5416   LearningRate 0.0125   Epoch: 12   Global Step: 73440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:04:59,320-Speed 5608.98 samples/sec   Loss 3.5140   LearningRate 0.0125   Epoch: 12   Global Step: 73450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:01,146-Speed 5608.92 samples/sec   Loss 3.5220   LearningRate 0.0125   Epoch: 12   Global Step: 73460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:02,964-Speed 5636.00 samples/sec   Loss 3.5401   LearningRate 0.0125   Epoch: 12   Global Step: 73470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 06:05:04,773-Speed 5662.51 samples/sec   Loss 3.4073   LearningRate 0.0125   Epoch: 12   Global Step: 73480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:06,589-Speed 5640.41 samples/sec   Loss 3.5462   LearningRate 0.0125   Epoch: 12   Global Step: 73490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:08,412-Speed 5618.38 samples/sec   Loss 3.6665   LearningRate 0.0125   Epoch: 12   Global Step: 73500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:10,226-Speed 5648.24 samples/sec   Loss 3.4653   LearningRate 0.0125   Epoch: 12   Global Step: 73510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:12,046-Speed 5628.13 samples/sec   Loss 3.6293   LearningRate 0.0125   Epoch: 12   Global Step: 73520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:13,860-Speed 5647.35 samples/sec   Loss 3.5193   LearningRate 0.0125   Epoch: 12   Global Step: 73530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:15,668-Speed 5663.72 samples/sec   Loss 3.5628   LearningRate 0.0125   Epoch: 12   Global Step: 73540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:17,520-Speed 5533.25 samples/sec   Loss 3.5718   LearningRate 0.0125   Epoch: 12   Global Step: 73550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:19,358-Speed 5572.62 samples/sec   Loss 3.4294   LearningRate 0.0125   Epoch: 12   Global Step: 73560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:21,210-Speed 5530.30 samples/sec   Loss 3.5675   LearningRate 0.0125   Epoch: 12   Global Step: 73570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:23,034-Speed 5616.10 samples/sec   Loss 3.5138   LearningRate 0.0125   Epoch: 12   Global Step: 73580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:24,872-Speed 5572.55 samples/sec   Loss 3.6458   LearningRate 0.0125   Epoch: 12   Global Step: 73590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:26,707-Speed 5581.82 samples/sec   Loss 3.3997   LearningRate 0.0124   Epoch: 12   Global Step: 73600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:28,533-Speed 5609.67 samples/sec   Loss 3.4863   LearningRate 0.0124   Epoch: 12   Global Step: 73610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:30,360-Speed 5609.47 samples/sec   Loss 3.5266   LearningRate 0.0124   Epoch: 12   Global Step: 73620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:32,213-Speed 5528.44 samples/sec   Loss 3.4802   LearningRate 0.0124   Epoch: 12   Global Step: 73630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:34,047-Speed 5583.07 samples/sec   Loss 3.5136   LearningRate 0.0124   Epoch: 12   Global Step: 73640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:35,862-Speed 5643.59 samples/sec   Loss 3.5619   LearningRate 0.0124   Epoch: 12   Global Step: 73650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:37,692-Speed 5597.03 samples/sec   Loss 3.4493   LearningRate 0.0124   Epoch: 12   Global Step: 73660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:39,529-Speed 5576.19 samples/sec   Loss 3.5472   LearningRate 0.0124   Epoch: 12   Global Step: 73670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:41,338-Speed 5663.35 samples/sec   Loss 3.5575   LearningRate 0.0124   Epoch: 12   Global Step: 73680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:43,159-Speed 5624.43 samples/sec   Loss 3.4424   LearningRate 0.0124   Epoch: 12   Global Step: 73690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:44,986-Speed 5608.12 samples/sec   Loss 3.4252   LearningRate 0.0124   Epoch: 12   Global Step: 73700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:46,800-Speed 5645.97 samples/sec   Loss 3.4503   LearningRate 0.0124   Epoch: 12   Global Step: 73710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:48,630-Speed 5599.89 samples/sec   Loss 3.5784   LearningRate 0.0124   Epoch: 12   Global Step: 73720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:50,443-Speed 5650.08 samples/sec   Loss 3.5460   LearningRate 0.0124   Epoch: 12   Global Step: 73730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:52,250-Speed 5666.40 samples/sec   Loss 3.5763   LearningRate 0.0124   Epoch: 12   Global Step: 73740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:54,060-Speed 5660.78 samples/sec   Loss 3.4613   LearningRate 0.0124   Epoch: 12   Global Step: 73750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:55,877-Speed 5637.25 samples/sec   Loss 3.5146   LearningRate 0.0123   Epoch: 12   Global Step: 73760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:57,695-Speed 5633.77 samples/sec   Loss 3.5379   LearningRate 0.0123   Epoch: 12   Global Step: 73770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:05:59,555-Speed 5505.80 samples/sec   Loss 3.6010   LearningRate 0.0123   Epoch: 12   Global Step: 73780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:06:01,421-Speed 5491.75 samples/sec   Loss 3.4767   LearningRate 0.0123   Epoch: 12   Global Step: 73790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:06:03,308-Speed 5427.36 samples/sec   Loss 3.4448   LearningRate 0.0123   Epoch: 12   Global Step: 73800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:06:05,127-Speed 5632.85 samples/sec   Loss 3.5469   LearningRate 0.0123   Epoch: 12   Global Step: 73810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:06:06,942-Speed 5640.80 samples/sec   Loss 3.6000   LearningRate 0.0123   Epoch: 12   Global Step: 73820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:06:08,768-Speed 5610.93 samples/sec   Loss 3.6282   LearningRate 0.0123   Epoch: 12   Global Step: 73830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:06:10,588-Speed 5628.74 samples/sec   Loss 3.3539   LearningRate 0.0123   Epoch: 12   Global Step: 73840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:06:12,433-Speed 5553.84 samples/sec   Loss 3.5555   LearningRate 0.0123   Epoch: 12   Global Step: 73850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:06:14,258-Speed 5612.89 samples/sec   Loss 3.5496   LearningRate 0.0123   Epoch: 12   Global Step: 73860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:06:16,071-Speed 5649.62 samples/sec   Loss 3.5099   LearningRate 0.0123   Epoch: 12   Global Step: 73870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:06:17,893-Speed 5622.30 samples/sec   Loss 3.6031   LearningRate 0.0123   Epoch: 12   Global Step: 73880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:06:19,720-Speed 5605.84 samples/sec   Loss 3.4014   LearningRate 0.0123   Epoch: 12   Global Step: 73890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:06:21,536-Speed 5639.15 samples/sec   Loss 3.4222   LearningRate 0.0123   Epoch: 12   Global Step: 73900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:06:23,427-Speed 5419.15 samples/sec   Loss 3.4035   LearningRate 0.0123   Epoch: 12   Global Step: 73910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:06:35,146-Speed 873.85 samples/sec   Loss 3.2095   LearningRate 0.0122   Epoch: 13   Global Step: 73920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:06:36,995-Speed 5541.68 samples/sec   Loss 2.8364   LearningRate 0.0122   Epoch: 13   Global Step: 73930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:06:38,824-Speed 5598.40 samples/sec   Loss 2.8550   LearningRate 0.0122   Epoch: 13   Global Step: 73940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:06:40,705-Speed 5445.29 samples/sec   Loss 2.8814   LearningRate 0.0122   Epoch: 13   Global Step: 73950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:06:42,589-Speed 5438.23 samples/sec   Loss 2.9108   LearningRate 0.0122   Epoch: 13   Global Step: 73960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:06:44,418-Speed 5603.15 samples/sec   Loss 2.9216   LearningRate 0.0122   Epoch: 13   Global Step: 73970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:06:46,234-Speed 5638.41 samples/sec   Loss 2.8080   LearningRate 0.0122   Epoch: 13   Global Step: 73980   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:06:48,064-Speed 5598.69 samples/sec   Loss 2.8353   LearningRate 0.0122   Epoch: 13   Global Step: 73990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:06:49,898-Speed 5585.69 samples/sec   Loss 2.9502   LearningRate 0.0122   Epoch: 13   Global Step: 74000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:07:16,352-[lfw][74000]XNorm: 22.042191
Training: 2022-04-27 06:07:16,352-[lfw][74000]Accuracy-Flip: 0.99733+-0.00327
Training: 2022-04-27 06:07:16,353-[lfw][74000]Accuracy-Highest: 0.99800
Training: 2022-04-27 06:07:47,008-[cfp_fp][74000]XNorm: 19.978743
Training: 2022-04-27 06:07:47,009-[cfp_fp][74000]Accuracy-Flip: 0.95914+-0.01046
Training: 2022-04-27 06:07:47,009-[cfp_fp][74000]Accuracy-Highest: 0.96543
Training: 2022-04-27 06:08:13,414-[agedb_30][74000]XNorm: 21.946677
Training: 2022-04-27 06:08:13,415-[agedb_30][74000]Accuracy-Flip: 0.97683+-0.00529
Training: 2022-04-27 06:08:13,415-[agedb_30][74000]Accuracy-Highest: 0.97917
Training: 2022-04-27 06:08:15,244-Speed 119.98 samples/sec   Loss 2.8585   LearningRate 0.0122   Epoch: 13   Global Step: 74010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:08:17,061-Speed 5638.26 samples/sec   Loss 2.9743   LearningRate 0.0122   Epoch: 13   Global Step: 74020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:08:18,878-Speed 5636.22 samples/sec   Loss 2.9519   LearningRate 0.0122   Epoch: 13   Global Step: 74030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:08:20,700-Speed 5621.87 samples/sec   Loss 2.9285   LearningRate 0.0122   Epoch: 13   Global Step: 74040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:08:22,512-Speed 5653.31 samples/sec   Loss 2.8828   LearningRate 0.0122   Epoch: 13   Global Step: 74050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:08:24,345-Speed 5589.09 samples/sec   Loss 2.9773   LearningRate 0.0122   Epoch: 13   Global Step: 74060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 06:08:26,188-Speed 5559.64 samples/sec   Loss 2.9241   LearningRate 0.0122   Epoch: 13   Global Step: 74070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:28,013-Speed 5613.71 samples/sec   Loss 2.8985   LearningRate 0.0122   Epoch: 13   Global Step: 74080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:29,829-Speed 5637.79 samples/sec   Loss 2.8883   LearningRate 0.0121   Epoch: 13   Global Step: 74090   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:08:31,649-Speed 5627.90 samples/sec   Loss 3.0163   LearningRate 0.0121   Epoch: 13   Global Step: 74100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:33,481-Speed 5591.71 samples/sec   Loss 2.8685   LearningRate 0.0121   Epoch: 13   Global Step: 74110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:35,300-Speed 5632.34 samples/sec   Loss 2.9357   LearningRate 0.0121   Epoch: 13   Global Step: 74120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:37,138-Speed 5573.58 samples/sec   Loss 2.9911   LearningRate 0.0121   Epoch: 13   Global Step: 74130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:38,962-Speed 5614.20 samples/sec   Loss 2.8796   LearningRate 0.0121   Epoch: 13   Global Step: 74140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:40,801-Speed 5573.46 samples/sec   Loss 2.9803   LearningRate 0.0121   Epoch: 13   Global Step: 74150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:42,632-Speed 5592.56 samples/sec   Loss 2.9298   LearningRate 0.0121   Epoch: 13   Global Step: 74160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:44,453-Speed 5625.56 samples/sec   Loss 3.0325   LearningRate 0.0121   Epoch: 13   Global Step: 74170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:46,277-Speed 5618.06 samples/sec   Loss 2.9818   LearningRate 0.0121   Epoch: 13   Global Step: 74180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:48,091-Speed 5645.51 samples/sec   Loss 2.9616   LearningRate 0.0121   Epoch: 13   Global Step: 74190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:49,893-Speed 5685.99 samples/sec   Loss 3.0393   LearningRate 0.0121   Epoch: 13   Global Step: 74200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:51,712-Speed 5629.97 samples/sec   Loss 2.9320   LearningRate 0.0121   Epoch: 13   Global Step: 74210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:53,532-Speed 5627.57 samples/sec   Loss 3.0314   LearningRate 0.0121   Epoch: 13   Global Step: 74220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:08:55,358-Speed 5611.12 samples/sec   Loss 2.9535   LearningRate 0.0121   Epoch: 13   Global Step: 74230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:08:57,164-Speed 5670.86 samples/sec   Loss 2.9504   LearningRate 0.0121   Epoch: 13   Global Step: 74240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:08:59,007-Speed 5556.86 samples/sec   Loss 3.0340   LearningRate 0.0120   Epoch: 13   Global Step: 74250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:09:00,839-Speed 5592.60 samples/sec   Loss 2.9700   LearningRate 0.0120   Epoch: 13   Global Step: 74260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:09:02,654-Speed 5643.69 samples/sec   Loss 2.9952   LearningRate 0.0120   Epoch: 13   Global Step: 74270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:09:04,468-Speed 5648.35 samples/sec   Loss 3.0304   LearningRate 0.0120   Epoch: 13   Global Step: 74280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:09:06,280-Speed 5651.74 samples/sec   Loss 2.9860   LearningRate 0.0120   Epoch: 13   Global Step: 74290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:09:08,093-Speed 5649.08 samples/sec   Loss 3.0692   LearningRate 0.0120   Epoch: 13   Global Step: 74300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:09:09,909-Speed 5643.07 samples/sec   Loss 3.0436   LearningRate 0.0120   Epoch: 13   Global Step: 74310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:09:11,762-Speed 5527.90 samples/sec   Loss 3.0901   LearningRate 0.0120   Epoch: 13   Global Step: 74320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:09:13,612-Speed 5534.76 samples/sec   Loss 3.0245   LearningRate 0.0120   Epoch: 13   Global Step: 74330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:15,444-Speed 5591.44 samples/sec   Loss 3.0009   LearningRate 0.0120   Epoch: 13   Global Step: 74340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:17,260-Speed 5642.29 samples/sec   Loss 3.0597   LearningRate 0.0120   Epoch: 13   Global Step: 74350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:19,089-Speed 5599.84 samples/sec   Loss 2.9544   LearningRate 0.0120   Epoch: 13   Global Step: 74360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:20,941-Speed 5532.36 samples/sec   Loss 3.0365   LearningRate 0.0120   Epoch: 13   Global Step: 74370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:22,772-Speed 5591.59 samples/sec   Loss 2.9892   LearningRate 0.0120   Epoch: 13   Global Step: 74380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:24,588-Speed 5643.73 samples/sec   Loss 3.0777   LearningRate 0.0120   Epoch: 13   Global Step: 74390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:26,405-Speed 5636.45 samples/sec   Loss 3.0011   LearningRate 0.0120   Epoch: 13   Global Step: 74400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:28,243-Speed 5572.14 samples/sec   Loss 3.0755   LearningRate 0.0119   Epoch: 13   Global Step: 74410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:30,064-Speed 5626.95 samples/sec   Loss 3.0551   LearningRate 0.0119   Epoch: 13   Global Step: 74420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:31,873-Speed 5661.15 samples/sec   Loss 2.9760   LearningRate 0.0119   Epoch: 13   Global Step: 74430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:33,702-Speed 5600.04 samples/sec   Loss 3.0782   LearningRate 0.0119   Epoch: 13   Global Step: 74440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:35,520-Speed 5634.79 samples/sec   Loss 3.0229   LearningRate 0.0119   Epoch: 13   Global Step: 74450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:37,331-Speed 5657.57 samples/sec   Loss 3.0392   LearningRate 0.0119   Epoch: 13   Global Step: 74460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:39,156-Speed 5610.54 samples/sec   Loss 3.0100   LearningRate 0.0119   Epoch: 13   Global Step: 74470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:40,979-Speed 5619.98 samples/sec   Loss 3.0158   LearningRate 0.0119   Epoch: 13   Global Step: 74480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:42,806-Speed 5605.90 samples/sec   Loss 3.1844   LearningRate 0.0119   Epoch: 13   Global Step: 74490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:44,629-Speed 5620.55 samples/sec   Loss 3.0100   LearningRate 0.0119   Epoch: 13   Global Step: 74500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:46,448-Speed 5631.81 samples/sec   Loss 3.0355   LearningRate 0.0119   Epoch: 13   Global Step: 74510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:48,288-Speed 5567.30 samples/sec   Loss 2.9949   LearningRate 0.0119   Epoch: 13   Global Step: 74520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:50,144-Speed 5518.87 samples/sec   Loss 3.1019   LearningRate 0.0119   Epoch: 13   Global Step: 74530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:09:51,963-Speed 5630.58 samples/sec   Loss 3.0998   LearningRate 0.0119   Epoch: 13   Global Step: 74540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:53,834-Speed 5474.89 samples/sec   Loss 3.0075   LearningRate 0.0119   Epoch: 13   Global Step: 74550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:55,656-Speed 5623.27 samples/sec   Loss 3.0555   LearningRate 0.0119   Epoch: 13   Global Step: 74560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:57,488-Speed 5589.75 samples/sec   Loss 3.0336   LearningRate 0.0119   Epoch: 13   Global Step: 74570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:09:59,358-Speed 5477.92 samples/sec   Loss 3.0633   LearningRate 0.0118   Epoch: 13   Global Step: 74580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:01,451-Speed 4895.95 samples/sec   Loss 3.0885   LearningRate 0.0118   Epoch: 13   Global Step: 74590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:03,374-Speed 5326.68 samples/sec   Loss 3.1218   LearningRate 0.0118   Epoch: 13   Global Step: 74600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:05,280-Speed 5372.86 samples/sec   Loss 3.1142   LearningRate 0.0118   Epoch: 13   Global Step: 74610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:07,095-Speed 5643.45 samples/sec   Loss 3.0387   LearningRate 0.0118   Epoch: 13   Global Step: 74620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:08,909-Speed 5649.44 samples/sec   Loss 3.1248   LearningRate 0.0118   Epoch: 13   Global Step: 74630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:10,723-Speed 5646.35 samples/sec   Loss 3.0958   LearningRate 0.0118   Epoch: 13   Global Step: 74640   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:10:12,525-Speed 5683.92 samples/sec   Loss 3.0918   LearningRate 0.0118   Epoch: 13   Global Step: 74650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:14,346-Speed 5626.39 samples/sec   Loss 3.0701   LearningRate 0.0118   Epoch: 13   Global Step: 74660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:16,157-Speed 5656.27 samples/sec   Loss 3.0556   LearningRate 0.0118   Epoch: 13   Global Step: 74670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:17,965-Speed 5663.23 samples/sec   Loss 2.9564   LearningRate 0.0118   Epoch: 13   Global Step: 74680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:19,784-Speed 5631.53 samples/sec   Loss 3.0598   LearningRate 0.0118   Epoch: 13   Global Step: 74690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:21,605-Speed 5626.61 samples/sec   Loss 3.0157   LearningRate 0.0118   Epoch: 13   Global Step: 74700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:23,416-Speed 5655.60 samples/sec   Loss 3.1559   LearningRate 0.0118   Epoch: 13   Global Step: 74710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:25,222-Speed 5670.08 samples/sec   Loss 3.0939   LearningRate 0.0118   Epoch: 13   Global Step: 74720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:27,032-Speed 5660.82 samples/sec   Loss 3.1043   LearningRate 0.0118   Epoch: 13   Global Step: 74730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:28,858-Speed 5607.91 samples/sec   Loss 3.1017   LearningRate 0.0117   Epoch: 13   Global Step: 74740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:30,667-Speed 5663.80 samples/sec   Loss 3.1065   LearningRate 0.0117   Epoch: 13   Global Step: 74750   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:10:32,472-Speed 5677.83 samples/sec   Loss 3.1513   LearningRate 0.0117   Epoch: 13   Global Step: 74760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:34,285-Speed 5649.35 samples/sec   Loss 3.0832   LearningRate 0.0117   Epoch: 13   Global Step: 74770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:36,107-Speed 5621.60 samples/sec   Loss 3.1520   LearningRate 0.0117   Epoch: 13   Global Step: 74780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:37,929-Speed 5622.37 samples/sec   Loss 3.1499   LearningRate 0.0117   Epoch: 13   Global Step: 74790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:39,749-Speed 5627.73 samples/sec   Loss 3.0031   LearningRate 0.0117   Epoch: 13   Global Step: 74800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:41,559-Speed 5657.94 samples/sec   Loss 3.0944   LearningRate 0.0117   Epoch: 13   Global Step: 74810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:43,397-Speed 5575.60 samples/sec   Loss 3.0335   LearningRate 0.0117   Epoch: 13   Global Step: 74820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:45,212-Speed 5641.21 samples/sec   Loss 3.1629   LearningRate 0.0117   Epoch: 13   Global Step: 74830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:47,037-Speed 5614.52 samples/sec   Loss 3.0634   LearningRate 0.0117   Epoch: 13   Global Step: 74840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:48,868-Speed 5592.31 samples/sec   Loss 3.2085   LearningRate 0.0117   Epoch: 13   Global Step: 74850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:50,700-Speed 5591.92 samples/sec   Loss 2.9375   LearningRate 0.0117   Epoch: 13   Global Step: 74860   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:10:52,517-Speed 5638.52 samples/sec   Loss 3.0560   LearningRate 0.0117   Epoch: 13   Global Step: 74870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:54,341-Speed 5615.73 samples/sec   Loss 3.0832   LearningRate 0.0117   Epoch: 13   Global Step: 74880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:56,243-Speed 5385.52 samples/sec   Loss 3.1551   LearningRate 0.0117   Epoch: 13   Global Step: 74890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:58,161-Speed 5340.77 samples/sec   Loss 3.1650   LearningRate 0.0117   Epoch: 13   Global Step: 74900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:10:59,979-Speed 5634.68 samples/sec   Loss 3.0245   LearningRate 0.0116   Epoch: 13   Global Step: 74910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:01,820-Speed 5566.16 samples/sec   Loss 3.1911   LearningRate 0.0116   Epoch: 13   Global Step: 74920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:03,639-Speed 5629.66 samples/sec   Loss 3.0598   LearningRate 0.0116   Epoch: 13   Global Step: 74930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:05,457-Speed 5636.01 samples/sec   Loss 3.1354   LearningRate 0.0116   Epoch: 13   Global Step: 74940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:07,273-Speed 5638.30 samples/sec   Loss 3.1674   LearningRate 0.0116   Epoch: 13   Global Step: 74950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:09,100-Speed 5607.11 samples/sec   Loss 3.0551   LearningRate 0.0116   Epoch: 13   Global Step: 74960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:10,926-Speed 5609.52 samples/sec   Loss 3.0416   LearningRate 0.0116   Epoch: 13   Global Step: 74970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:11:12,743-Speed 5638.08 samples/sec   Loss 3.1167   LearningRate 0.0116   Epoch: 13   Global Step: 74980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:14,572-Speed 5600.43 samples/sec   Loss 3.1358   LearningRate 0.0116   Epoch: 13   Global Step: 74990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:16,393-Speed 5627.03 samples/sec   Loss 3.1331   LearningRate 0.0116   Epoch: 13   Global Step: 75000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:18,203-Speed 5658.29 samples/sec   Loss 3.1539   LearningRate 0.0116   Epoch: 13   Global Step: 75010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:20,017-Speed 5647.73 samples/sec   Loss 3.0132   LearningRate 0.0116   Epoch: 13   Global Step: 75020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:21,830-Speed 5650.30 samples/sec   Loss 3.0255   LearningRate 0.0116   Epoch: 13   Global Step: 75030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:23,644-Speed 5647.00 samples/sec   Loss 3.1156   LearningRate 0.0116   Epoch: 13   Global Step: 75040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:25,463-Speed 5631.94 samples/sec   Loss 3.1292   LearningRate 0.0116   Epoch: 13   Global Step: 75050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:27,304-Speed 5563.87 samples/sec   Loss 3.1482   LearningRate 0.0116   Epoch: 13   Global Step: 75060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:29,131-Speed 5604.49 samples/sec   Loss 3.3105   LearningRate 0.0116   Epoch: 13   Global Step: 75070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:30,966-Speed 5582.59 samples/sec   Loss 3.1419   LearningRate 0.0115   Epoch: 13   Global Step: 75080   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:11:32,793-Speed 5606.81 samples/sec   Loss 3.0131   LearningRate 0.0115   Epoch: 13   Global Step: 75090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:34,626-Speed 5590.14 samples/sec   Loss 3.0964   LearningRate 0.0115   Epoch: 13   Global Step: 75100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:36,454-Speed 5602.85 samples/sec   Loss 3.1470   LearningRate 0.0115   Epoch: 13   Global Step: 75110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:38,269-Speed 5643.46 samples/sec   Loss 3.1819   LearningRate 0.0115   Epoch: 13   Global Step: 75120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:40,082-Speed 5650.92 samples/sec   Loss 2.9800   LearningRate 0.0115   Epoch: 13   Global Step: 75130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:41,896-Speed 5647.66 samples/sec   Loss 3.1565   LearningRate 0.0115   Epoch: 13   Global Step: 75140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:43,712-Speed 5638.21 samples/sec   Loss 3.2309   LearningRate 0.0115   Epoch: 13   Global Step: 75150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:45,539-Speed 5607.17 samples/sec   Loss 3.1053   LearningRate 0.0115   Epoch: 13   Global Step: 75160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:47,362-Speed 5618.82 samples/sec   Loss 3.2080   LearningRate 0.0115   Epoch: 13   Global Step: 75170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:49,204-Speed 5560.32 samples/sec   Loss 3.2118   LearningRate 0.0115   Epoch: 13   Global Step: 75180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:51,024-Speed 5630.37 samples/sec   Loss 3.1995   LearningRate 0.0115   Epoch: 13   Global Step: 75190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:52,852-Speed 5603.89 samples/sec   Loss 3.1282   LearningRate 0.0115   Epoch: 13   Global Step: 75200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:54,705-Speed 5526.45 samples/sec   Loss 3.1657   LearningRate 0.0115   Epoch: 13   Global Step: 75210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:56,524-Speed 5631.15 samples/sec   Loss 2.9877   LearningRate 0.0115   Epoch: 13   Global Step: 75220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:11:58,338-Speed 5649.87 samples/sec   Loss 3.1259   LearningRate 0.0115   Epoch: 13   Global Step: 75230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:00,152-Speed 5645.00 samples/sec   Loss 3.1659   LearningRate 0.0114   Epoch: 13   Global Step: 75240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:02,000-Speed 5543.21 samples/sec   Loss 3.2069   LearningRate 0.0114   Epoch: 13   Global Step: 75250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:03,820-Speed 5627.41 samples/sec   Loss 3.1757   LearningRate 0.0114   Epoch: 13   Global Step: 75260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:05,633-Speed 5652.14 samples/sec   Loss 3.1555   LearningRate 0.0114   Epoch: 13   Global Step: 75270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:07,450-Speed 5635.83 samples/sec   Loss 3.0970   LearningRate 0.0114   Epoch: 13   Global Step: 75280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:09,266-Speed 5641.90 samples/sec   Loss 3.0688   LearningRate 0.0114   Epoch: 13   Global Step: 75290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:11,091-Speed 5610.54 samples/sec   Loss 3.1363   LearningRate 0.0114   Epoch: 13   Global Step: 75300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:12,919-Speed 5606.19 samples/sec   Loss 3.1751   LearningRate 0.0114   Epoch: 13   Global Step: 75310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:14,758-Speed 5567.00 samples/sec   Loss 3.1900   LearningRate 0.0114   Epoch: 13   Global Step: 75320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:16,580-Speed 5626.10 samples/sec   Loss 3.1085   LearningRate 0.0114   Epoch: 13   Global Step: 75330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:18,394-Speed 5646.29 samples/sec   Loss 3.2697   LearningRate 0.0114   Epoch: 13   Global Step: 75340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:20,239-Speed 5552.08 samples/sec   Loss 3.1521   LearningRate 0.0114   Epoch: 13   Global Step: 75350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:22,072-Speed 5588.62 samples/sec   Loss 3.0800   LearningRate 0.0114   Epoch: 13   Global Step: 75360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:23,900-Speed 5602.28 samples/sec   Loss 3.1793   LearningRate 0.0114   Epoch: 13   Global Step: 75370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:25,731-Speed 5594.94 samples/sec   Loss 3.0312   LearningRate 0.0114   Epoch: 13   Global Step: 75380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:27,567-Speed 5578.47 samples/sec   Loss 3.1249   LearningRate 0.0114   Epoch: 13   Global Step: 75390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:29,409-Speed 5560.57 samples/sec   Loss 3.1799   LearningRate 0.0114   Epoch: 13   Global Step: 75400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:31,307-Speed 5397.79 samples/sec   Loss 3.0859   LearningRate 0.0113   Epoch: 13   Global Step: 75410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:33,117-Speed 5659.21 samples/sec   Loss 3.1828   LearningRate 0.0113   Epoch: 13   Global Step: 75420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:34,958-Speed 5564.02 samples/sec   Loss 3.1250   LearningRate 0.0113   Epoch: 13   Global Step: 75430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:36,801-Speed 5558.36 samples/sec   Loss 3.1640   LearningRate 0.0113   Epoch: 13   Global Step: 75440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:38,653-Speed 5532.04 samples/sec   Loss 3.0948   LearningRate 0.0113   Epoch: 13   Global Step: 75450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:40,477-Speed 5616.49 samples/sec   Loss 3.2059   LearningRate 0.0113   Epoch: 13   Global Step: 75460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:42,292-Speed 5642.42 samples/sec   Loss 3.2045   LearningRate 0.0113   Epoch: 13   Global Step: 75470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:44,108-Speed 5641.98 samples/sec   Loss 3.2102   LearningRate 0.0113   Epoch: 13   Global Step: 75480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:45,917-Speed 5660.86 samples/sec   Loss 3.0962   LearningRate 0.0113   Epoch: 13   Global Step: 75490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:47,740-Speed 5621.38 samples/sec   Loss 3.1564   LearningRate 0.0113   Epoch: 13   Global Step: 75500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:49,553-Speed 5647.62 samples/sec   Loss 3.2450   LearningRate 0.0113   Epoch: 13   Global Step: 75510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:51,387-Speed 5586.45 samples/sec   Loss 3.1933   LearningRate 0.0113   Epoch: 13   Global Step: 75520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:53,250-Speed 5498.59 samples/sec   Loss 3.2699   LearningRate 0.0113   Epoch: 13   Global Step: 75530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:55,072-Speed 5622.80 samples/sec   Loss 3.0938   LearningRate 0.0113   Epoch: 13   Global Step: 75540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:56,918-Speed 5549.59 samples/sec   Loss 3.2161   LearningRate 0.0113   Epoch: 13   Global Step: 75550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:12:58,760-Speed 5560.35 samples/sec   Loss 3.0855   LearningRate 0.0113   Epoch: 13   Global Step: 75560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:00,590-Speed 5599.06 samples/sec   Loss 3.1909   LearningRate 0.0113   Epoch: 13   Global Step: 75570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:02,415-Speed 5612.27 samples/sec   Loss 3.1767   LearningRate 0.0112   Epoch: 13   Global Step: 75580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:04,231-Speed 5640.86 samples/sec   Loss 3.2231   LearningRate 0.0112   Epoch: 13   Global Step: 75590   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:13:06,051-Speed 5628.06 samples/sec   Loss 3.1746   LearningRate 0.0112   Epoch: 13   Global Step: 75600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:07,881-Speed 5595.54 samples/sec   Loss 3.2913   LearningRate 0.0112   Epoch: 13   Global Step: 75610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:09,699-Speed 5636.40 samples/sec   Loss 3.1554   LearningRate 0.0112   Epoch: 13   Global Step: 75620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:11,539-Speed 5566.89 samples/sec   Loss 3.0817   LearningRate 0.0112   Epoch: 13   Global Step: 75630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:13,391-Speed 5530.47 samples/sec   Loss 3.1641   LearningRate 0.0112   Epoch: 13   Global Step: 75640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:15,216-Speed 5612.75 samples/sec   Loss 3.2308   LearningRate 0.0112   Epoch: 13   Global Step: 75650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:17,045-Speed 5599.47 samples/sec   Loss 3.1718   LearningRate 0.0112   Epoch: 13   Global Step: 75660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:18,859-Speed 5646.87 samples/sec   Loss 3.1473   LearningRate 0.0112   Epoch: 13   Global Step: 75670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:20,677-Speed 5634.53 samples/sec   Loss 3.0761   LearningRate 0.0112   Epoch: 13   Global Step: 75680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:22,510-Speed 5589.18 samples/sec   Loss 3.0220   LearningRate 0.0112   Epoch: 13   Global Step: 75690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:24,311-Speed 5689.44 samples/sec   Loss 3.2356   LearningRate 0.0112   Epoch: 13   Global Step: 75700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:26,120-Speed 5662.80 samples/sec   Loss 3.1555   LearningRate 0.0112   Epoch: 13   Global Step: 75710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:27,935-Speed 5643.49 samples/sec   Loss 3.2341   LearningRate 0.0112   Epoch: 13   Global Step: 75720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:29,749-Speed 5646.58 samples/sec   Loss 3.1507   LearningRate 0.0112   Epoch: 13   Global Step: 75730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:31,580-Speed 5594.47 samples/sec   Loss 3.1900   LearningRate 0.0112   Epoch: 13   Global Step: 75740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:33,415-Speed 5580.49 samples/sec   Loss 3.2230   LearningRate 0.0111   Epoch: 13   Global Step: 75750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:35,251-Speed 5579.74 samples/sec   Loss 3.2018   LearningRate 0.0111   Epoch: 13   Global Step: 75760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:37,079-Speed 5602.13 samples/sec   Loss 3.2378   LearningRate 0.0111   Epoch: 13   Global Step: 75770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:38,897-Speed 5636.08 samples/sec   Loss 3.3169   LearningRate 0.0111   Epoch: 13   Global Step: 75780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:40,718-Speed 5624.96 samples/sec   Loss 3.2672   LearningRate 0.0111   Epoch: 13   Global Step: 75790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:42,535-Speed 5638.41 samples/sec   Loss 3.1899   LearningRate 0.0111   Epoch: 13   Global Step: 75800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:44,364-Speed 5600.30 samples/sec   Loss 3.1967   LearningRate 0.0111   Epoch: 13   Global Step: 75810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:46,179-Speed 5644.41 samples/sec   Loss 3.3121   LearningRate 0.0111   Epoch: 13   Global Step: 75820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:48,018-Speed 5570.99 samples/sec   Loss 3.2560   LearningRate 0.0111   Epoch: 13   Global Step: 75830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:49,839-Speed 5624.56 samples/sec   Loss 3.1888   LearningRate 0.0111   Epoch: 13   Global Step: 75840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:51,670-Speed 5593.94 samples/sec   Loss 3.2734   LearningRate 0.0111   Epoch: 13   Global Step: 75850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:53,481-Speed 5657.59 samples/sec   Loss 3.1744   LearningRate 0.0111   Epoch: 13   Global Step: 75860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:55,307-Speed 5607.42 samples/sec   Loss 3.1267   LearningRate 0.0111   Epoch: 13   Global Step: 75870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:57,144-Speed 5575.53 samples/sec   Loss 3.1724   LearningRate 0.0111   Epoch: 13   Global Step: 75880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:13:58,969-Speed 5613.49 samples/sec   Loss 3.1432   LearningRate 0.0111   Epoch: 13   Global Step: 75890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:14:00,786-Speed 5636.89 samples/sec   Loss 3.2163   LearningRate 0.0111   Epoch: 13   Global Step: 75900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:14:02,612-Speed 5609.05 samples/sec   Loss 3.2025   LearningRate 0.0111   Epoch: 13   Global Step: 75910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:14:04,435-Speed 5621.92 samples/sec   Loss 3.2734   LearningRate 0.0110   Epoch: 13   Global Step: 75920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:14:06,262-Speed 5607.27 samples/sec   Loss 3.2749   LearningRate 0.0110   Epoch: 13   Global Step: 75930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:14:08,113-Speed 5532.25 samples/sec   Loss 3.1258   LearningRate 0.0110   Epoch: 13   Global Step: 75940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:14:09,969-Speed 5518.86 samples/sec   Loss 3.0513   LearningRate 0.0110   Epoch: 13   Global Step: 75950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:14:11,799-Speed 5598.55 samples/sec   Loss 3.2046   LearningRate 0.0110   Epoch: 13   Global Step: 75960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:14:13,616-Speed 5638.42 samples/sec   Loss 3.1237   LearningRate 0.0110   Epoch: 13   Global Step: 75970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:14:15,474-Speed 5512.75 samples/sec   Loss 3.1061   LearningRate 0.0110   Epoch: 13   Global Step: 75980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:14:17,292-Speed 5633.29 samples/sec   Loss 2.9823   LearningRate 0.0110   Epoch: 13   Global Step: 75990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:14:19,114-Speed 5622.98 samples/sec   Loss 3.1666   LearningRate 0.0110   Epoch: 13   Global Step: 76000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:14:45,637-[lfw][76000]XNorm: 22.480857
Training: 2022-04-27 06:14:45,637-[lfw][76000]Accuracy-Flip: 0.99733+-0.00309
Training: 2022-04-27 06:14:45,638-[lfw][76000]Accuracy-Highest: 0.99800
Training: 2022-04-27 06:15:16,357-[cfp_fp][76000]XNorm: 20.312722
Training: 2022-04-27 06:15:16,357-[cfp_fp][76000]Accuracy-Flip: 0.96657+-0.00600
Training: 2022-04-27 06:15:16,358-[cfp_fp][76000]Accuracy-Highest: 0.96657
Training: 2022-04-27 06:15:42,926-[agedb_30][76000]XNorm: 22.537930
Training: 2022-04-27 06:15:42,926-[agedb_30][76000]Accuracy-Flip: 0.97933+-0.00680
Training: 2022-04-27 06:15:42,927-[agedb_30][76000]Accuracy-Highest: 0.97933
Training: 2022-04-27 06:15:44,740-Speed 119.59 samples/sec   Loss 3.2192   LearningRate 0.0110   Epoch: 13   Global Step: 76010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:15:46,558-Speed 5636.47 samples/sec   Loss 3.1610   LearningRate 0.0110   Epoch: 13   Global Step: 76020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:15:48,383-Speed 5611.63 samples/sec   Loss 3.1845   LearningRate 0.0110   Epoch: 13   Global Step: 76030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:15:50,205-Speed 5623.27 samples/sec   Loss 3.1842   LearningRate 0.0110   Epoch: 13   Global Step: 76040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:15:52,013-Speed 5664.29 samples/sec   Loss 3.1947   LearningRate 0.0110   Epoch: 13   Global Step: 76050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:15:53,824-Speed 5658.12 samples/sec   Loss 3.0325   LearningRate 0.0110   Epoch: 13   Global Step: 76060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:15:55,711-Speed 5426.71 samples/sec   Loss 3.1020   LearningRate 0.0110   Epoch: 13   Global Step: 76070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:15:57,518-Speed 5669.34 samples/sec   Loss 3.0667   LearningRate 0.0110   Epoch: 13   Global Step: 76080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:15:59,339-Speed 5624.47 samples/sec   Loss 3.1940   LearningRate 0.0109   Epoch: 13   Global Step: 76090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:01,225-Speed 5430.57 samples/sec   Loss 3.1720   LearningRate 0.0109   Epoch: 13   Global Step: 76100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:03,022-Speed 5700.48 samples/sec   Loss 3.2277   LearningRate 0.0109   Epoch: 13   Global Step: 76110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:04,848-Speed 5610.16 samples/sec   Loss 3.2500   LearningRate 0.0109   Epoch: 13   Global Step: 76120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:06,679-Speed 5596.48 samples/sec   Loss 3.1608   LearningRate 0.0109   Epoch: 13   Global Step: 76130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:08,497-Speed 5632.60 samples/sec   Loss 3.2804   LearningRate 0.0109   Epoch: 13   Global Step: 76140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:10,313-Speed 5640.91 samples/sec   Loss 3.1058   LearningRate 0.0109   Epoch: 13   Global Step: 76150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:12,130-Speed 5636.88 samples/sec   Loss 3.2166   LearningRate 0.0109   Epoch: 13   Global Step: 76160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:13,955-Speed 5613.50 samples/sec   Loss 3.1363   LearningRate 0.0109   Epoch: 13   Global Step: 76170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:15,780-Speed 5612.84 samples/sec   Loss 3.2661   LearningRate 0.0109   Epoch: 13   Global Step: 76180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:17,599-Speed 5632.11 samples/sec   Loss 3.1670   LearningRate 0.0109   Epoch: 13   Global Step: 76190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:19,409-Speed 5657.18 samples/sec   Loss 3.0980   LearningRate 0.0109   Epoch: 13   Global Step: 76200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:21,218-Speed 5664.13 samples/sec   Loss 3.2082   LearningRate 0.0109   Epoch: 13   Global Step: 76210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:23,041-Speed 5617.39 samples/sec   Loss 3.2395   LearningRate 0.0109   Epoch: 13   Global Step: 76220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:24,874-Speed 5590.25 samples/sec   Loss 3.2090   LearningRate 0.0109   Epoch: 13   Global Step: 76230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:26,723-Speed 5539.32 samples/sec   Loss 3.1198   LearningRate 0.0109   Epoch: 13   Global Step: 76240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:28,556-Speed 5587.14 samples/sec   Loss 3.1932   LearningRate 0.0109   Epoch: 13   Global Step: 76250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:30,389-Speed 5590.55 samples/sec   Loss 3.1359   LearningRate 0.0109   Epoch: 13   Global Step: 76260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:32,214-Speed 5613.32 samples/sec   Loss 3.1879   LearningRate 0.0108   Epoch: 13   Global Step: 76270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:34,026-Speed 5651.59 samples/sec   Loss 3.2333   LearningRate 0.0108   Epoch: 13   Global Step: 76280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:35,849-Speed 5620.01 samples/sec   Loss 3.2509   LearningRate 0.0108   Epoch: 13   Global Step: 76290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:37,659-Speed 5657.88 samples/sec   Loss 3.2118   LearningRate 0.0108   Epoch: 13   Global Step: 76300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:39,486-Speed 5606.40 samples/sec   Loss 3.2410   LearningRate 0.0108   Epoch: 13   Global Step: 76310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:16:41,290-Speed 5680.54 samples/sec   Loss 3.2171   LearningRate 0.0108   Epoch: 13   Global Step: 76320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:43,103-Speed 5649.19 samples/sec   Loss 3.2268   LearningRate 0.0108   Epoch: 13   Global Step: 76330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:44,914-Speed 5656.06 samples/sec   Loss 3.1864   LearningRate 0.0108   Epoch: 13   Global Step: 76340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:46,730-Speed 5640.11 samples/sec   Loss 3.1138   LearningRate 0.0108   Epoch: 13   Global Step: 76350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:48,542-Speed 5652.14 samples/sec   Loss 3.2296   LearningRate 0.0108   Epoch: 13   Global Step: 76360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:50,364-Speed 5623.86 samples/sec   Loss 3.2077   LearningRate 0.0108   Epoch: 13   Global Step: 76370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:52,186-Speed 5622.56 samples/sec   Loss 3.1257   LearningRate 0.0108   Epoch: 13   Global Step: 76380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:54,006-Speed 5629.36 samples/sec   Loss 3.1094   LearningRate 0.0108   Epoch: 13   Global Step: 76390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:55,842-Speed 5579.16 samples/sec   Loss 3.1729   LearningRate 0.0108   Epoch: 13   Global Step: 76400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:57,666-Speed 5615.27 samples/sec   Loss 3.1543   LearningRate 0.0108   Epoch: 13   Global Step: 76410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:16:59,467-Speed 5686.59 samples/sec   Loss 3.1642   LearningRate 0.0108   Epoch: 13   Global Step: 76420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:01,282-Speed 5643.27 samples/sec   Loss 3.2876   LearningRate 0.0108   Epoch: 13   Global Step: 76430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:03,097-Speed 5645.44 samples/sec   Loss 3.1817   LearningRate 0.0107   Epoch: 13   Global Step: 76440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:04,913-Speed 5638.30 samples/sec   Loss 3.2999   LearningRate 0.0107   Epoch: 13   Global Step: 76450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:06,765-Speed 5532.49 samples/sec   Loss 3.1026   LearningRate 0.0107   Epoch: 13   Global Step: 76460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:08,580-Speed 5643.54 samples/sec   Loss 3.1901   LearningRate 0.0107   Epoch: 13   Global Step: 76470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:10,394-Speed 5645.47 samples/sec   Loss 3.0701   LearningRate 0.0107   Epoch: 13   Global Step: 76480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:12,222-Speed 5604.52 samples/sec   Loss 3.2317   LearningRate 0.0107   Epoch: 13   Global Step: 76490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:14,044-Speed 5622.23 samples/sec   Loss 3.1760   LearningRate 0.0107   Epoch: 13   Global Step: 76500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:15,906-Speed 5502.43 samples/sec   Loss 3.2176   LearningRate 0.0107   Epoch: 13   Global Step: 76510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:17,717-Speed 5657.87 samples/sec   Loss 3.0989   LearningRate 0.0107   Epoch: 13   Global Step: 76520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:19,545-Speed 5601.71 samples/sec   Loss 3.2129   LearningRate 0.0107   Epoch: 13   Global Step: 76530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:21,370-Speed 5614.11 samples/sec   Loss 3.3102   LearningRate 0.0107   Epoch: 13   Global Step: 76540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:23,220-Speed 5536.83 samples/sec   Loss 3.1038   LearningRate 0.0107   Epoch: 13   Global Step: 76550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:25,050-Speed 5597.45 samples/sec   Loss 3.1778   LearningRate 0.0107   Epoch: 13   Global Step: 76560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:26,894-Speed 5555.17 samples/sec   Loss 3.1590   LearningRate 0.0107   Epoch: 13   Global Step: 76570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:28,711-Speed 5635.59 samples/sec   Loss 3.1158   LearningRate 0.0107   Epoch: 13   Global Step: 76580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:30,533-Speed 5621.93 samples/sec   Loss 3.2116   LearningRate 0.0107   Epoch: 13   Global Step: 76590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:32,357-Speed 5614.87 samples/sec   Loss 3.1653   LearningRate 0.0107   Epoch: 13   Global Step: 76600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:34,180-Speed 5619.06 samples/sec   Loss 3.2298   LearningRate 0.0106   Epoch: 13   Global Step: 76610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:35,995-Speed 5643.55 samples/sec   Loss 3.1728   LearningRate 0.0106   Epoch: 13   Global Step: 76620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:37,825-Speed 5601.21 samples/sec   Loss 3.1811   LearningRate 0.0106   Epoch: 13   Global Step: 76630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:39,663-Speed 5572.76 samples/sec   Loss 3.2791   LearningRate 0.0106   Epoch: 13   Global Step: 76640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:41,505-Speed 5559.07 samples/sec   Loss 3.1483   LearningRate 0.0106   Epoch: 13   Global Step: 76650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:43,324-Speed 5632.94 samples/sec   Loss 3.2219   LearningRate 0.0106   Epoch: 13   Global Step: 76660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:45,141-Speed 5636.99 samples/sec   Loss 3.2520   LearningRate 0.0106   Epoch: 13   Global Step: 76670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:46,961-Speed 5627.07 samples/sec   Loss 3.1643   LearningRate 0.0106   Epoch: 13   Global Step: 76680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:48,795-Speed 5586.34 samples/sec   Loss 3.2734   LearningRate 0.0106   Epoch: 13   Global Step: 76690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:50,624-Speed 5600.84 samples/sec   Loss 3.1713   LearningRate 0.0106   Epoch: 13   Global Step: 76700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:52,498-Speed 5464.43 samples/sec   Loss 3.0955   LearningRate 0.0106   Epoch: 13   Global Step: 76710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:54,344-Speed 5551.26 samples/sec   Loss 3.1416   LearningRate 0.0106   Epoch: 13   Global Step: 76720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:56,153-Speed 5660.37 samples/sec   Loss 3.3051   LearningRate 0.0106   Epoch: 13   Global Step: 76730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:57,967-Speed 5646.77 samples/sec   Loss 3.1617   LearningRate 0.0106   Epoch: 13   Global Step: 76740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:17:59,812-Speed 5554.63 samples/sec   Loss 3.2178   LearningRate 0.0106   Epoch: 13   Global Step: 76750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:01,629-Speed 5635.48 samples/sec   Loss 3.1447   LearningRate 0.0106   Epoch: 13   Global Step: 76760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:03,457-Speed 5603.84 samples/sec   Loss 3.1453   LearningRate 0.0106   Epoch: 13   Global Step: 76770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:05,285-Speed 5604.08 samples/sec   Loss 3.2737   LearningRate 0.0106   Epoch: 13   Global Step: 76780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:07,117-Speed 5592.99 samples/sec   Loss 3.1197   LearningRate 0.0105   Epoch: 13   Global Step: 76790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:08,945-Speed 5601.90 samples/sec   Loss 3.1747   LearningRate 0.0105   Epoch: 13   Global Step: 76800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:10,802-Speed 5517.39 samples/sec   Loss 3.1445   LearningRate 0.0105   Epoch: 13   Global Step: 76810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:12,630-Speed 5602.06 samples/sec   Loss 3.1329   LearningRate 0.0105   Epoch: 13   Global Step: 76820   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:18:14,430-Speed 5690.43 samples/sec   Loss 3.2872   LearningRate 0.0105   Epoch: 13   Global Step: 76830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:16,249-Speed 5632.98 samples/sec   Loss 3.1971   LearningRate 0.0105   Epoch: 13   Global Step: 76840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:18,152-Speed 5382.60 samples/sec   Loss 3.1447   LearningRate 0.0105   Epoch: 13   Global Step: 76850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:20,098-Speed 5263.11 samples/sec   Loss 3.1172   LearningRate 0.0105   Epoch: 13   Global Step: 76860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:21,932-Speed 5585.84 samples/sec   Loss 3.0666   LearningRate 0.0105   Epoch: 13   Global Step: 76870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:23,759-Speed 5606.90 samples/sec   Loss 3.1078   LearningRate 0.0105   Epoch: 13   Global Step: 76880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:25,586-Speed 5607.65 samples/sec   Loss 3.1723   LearningRate 0.0105   Epoch: 13   Global Step: 76890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:27,406-Speed 5625.77 samples/sec   Loss 3.1015   LearningRate 0.0105   Epoch: 13   Global Step: 76900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:29,235-Speed 5603.34 samples/sec   Loss 3.1190   LearningRate 0.0105   Epoch: 13   Global Step: 76910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:31,059-Speed 5613.93 samples/sec   Loss 3.2040   LearningRate 0.0105   Epoch: 13   Global Step: 76920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:32,871-Speed 5653.33 samples/sec   Loss 3.1503   LearningRate 0.0105   Epoch: 13   Global Step: 76930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:34,714-Speed 5557.25 samples/sec   Loss 3.1643   LearningRate 0.0105   Epoch: 13   Global Step: 76940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:36,519-Speed 5675.27 samples/sec   Loss 3.1161   LearningRate 0.0105   Epoch: 13   Global Step: 76950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:38,351-Speed 5592.08 samples/sec   Loss 3.1628   LearningRate 0.0104   Epoch: 13   Global Step: 76960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:40,186-Speed 5582.73 samples/sec   Loss 3.0821   LearningRate 0.0104   Epoch: 13   Global Step: 76970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:42,010-Speed 5617.45 samples/sec   Loss 3.2289   LearningRate 0.0104   Epoch: 13   Global Step: 76980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:43,851-Speed 5563.19 samples/sec   Loss 3.0887   LearningRate 0.0104   Epoch: 13   Global Step: 76990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:45,706-Speed 5521.68 samples/sec   Loss 3.0439   LearningRate 0.0104   Epoch: 13   Global Step: 77000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:47,543-Speed 5575.42 samples/sec   Loss 3.1367   LearningRate 0.0104   Epoch: 13   Global Step: 77010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:49,356-Speed 5649.71 samples/sec   Loss 3.3571   LearningRate 0.0104   Epoch: 13   Global Step: 77020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:51,166-Speed 5662.12 samples/sec   Loss 3.2133   LearningRate 0.0104   Epoch: 13   Global Step: 77030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:52,984-Speed 5632.93 samples/sec   Loss 3.2622   LearningRate 0.0104   Epoch: 13   Global Step: 77040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:54,805-Speed 5626.35 samples/sec   Loss 3.1972   LearningRate 0.0104   Epoch: 13   Global Step: 77050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:56,613-Speed 5662.84 samples/sec   Loss 3.1639   LearningRate 0.0104   Epoch: 13   Global Step: 77060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:18:58,427-Speed 5649.85 samples/sec   Loss 3.1144   LearningRate 0.0104   Epoch: 13   Global Step: 77070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:00,255-Speed 5602.13 samples/sec   Loss 3.2680   LearningRate 0.0104   Epoch: 13   Global Step: 77080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:02,073-Speed 5635.76 samples/sec   Loss 3.1273   LearningRate 0.0104   Epoch: 13   Global Step: 77090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:03,900-Speed 5604.84 samples/sec   Loss 3.3080   LearningRate 0.0104   Epoch: 13   Global Step: 77100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:05,729-Speed 5603.10 samples/sec   Loss 3.1712   LearningRate 0.0104   Epoch: 13   Global Step: 77110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:07,542-Speed 5649.84 samples/sec   Loss 3.1137   LearningRate 0.0104   Epoch: 13   Global Step: 77120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:09,356-Speed 5644.92 samples/sec   Loss 3.2399   LearningRate 0.0104   Epoch: 13   Global Step: 77130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:11,191-Speed 5584.15 samples/sec   Loss 3.1625   LearningRate 0.0103   Epoch: 13   Global Step: 77140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:13,009-Speed 5633.36 samples/sec   Loss 3.1166   LearningRate 0.0103   Epoch: 13   Global Step: 77150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:14,827-Speed 5633.65 samples/sec   Loss 3.0976   LearningRate 0.0103   Epoch: 13   Global Step: 77160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:16,641-Speed 5646.77 samples/sec   Loss 3.1868   LearningRate 0.0103   Epoch: 13   Global Step: 77170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:18,465-Speed 5615.42 samples/sec   Loss 3.2144   LearningRate 0.0103   Epoch: 13   Global Step: 77180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:20,304-Speed 5572.06 samples/sec   Loss 3.1602   LearningRate 0.0103   Epoch: 13   Global Step: 77190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:22,121-Speed 5637.98 samples/sec   Loss 3.2846   LearningRate 0.0103   Epoch: 13   Global Step: 77200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:23,946-Speed 5612.25 samples/sec   Loss 3.1630   LearningRate 0.0103   Epoch: 13   Global Step: 77210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:25,761-Speed 5644.63 samples/sec   Loss 3.0920   LearningRate 0.0103   Epoch: 13   Global Step: 77220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:27,593-Speed 5591.19 samples/sec   Loss 3.1506   LearningRate 0.0103   Epoch: 13   Global Step: 77230   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:19:29,396-Speed 5680.10 samples/sec   Loss 3.1876   LearningRate 0.0103   Epoch: 13   Global Step: 77240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:31,230-Speed 5586.46 samples/sec   Loss 3.1890   LearningRate 0.0103   Epoch: 13   Global Step: 77250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:33,054-Speed 5616.70 samples/sec   Loss 3.0875   LearningRate 0.0103   Epoch: 13   Global Step: 77260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:34,876-Speed 5621.22 samples/sec   Loss 3.1046   LearningRate 0.0103   Epoch: 13   Global Step: 77270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:36,708-Speed 5589.97 samples/sec   Loss 3.1851   LearningRate 0.0103   Epoch: 13   Global Step: 77280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:38,538-Speed 5597.82 samples/sec   Loss 3.1329   LearningRate 0.0103   Epoch: 13   Global Step: 77290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:40,363-Speed 5611.83 samples/sec   Loss 3.0834   LearningRate 0.0103   Epoch: 13   Global Step: 77300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:42,187-Speed 5618.37 samples/sec   Loss 3.2092   LearningRate 0.0103   Epoch: 13   Global Step: 77310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:44,004-Speed 5635.95 samples/sec   Loss 3.2547   LearningRate 0.0102   Epoch: 13   Global Step: 77320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:45,819-Speed 5643.30 samples/sec   Loss 3.2605   LearningRate 0.0102   Epoch: 13   Global Step: 77330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:47,648-Speed 5602.71 samples/sec   Loss 3.0288   LearningRate 0.0102   Epoch: 13   Global Step: 77340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:49,483-Speed 5581.88 samples/sec   Loss 3.1550   LearningRate 0.0102   Epoch: 13   Global Step: 77350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:51,293-Speed 5658.51 samples/sec   Loss 3.2143   LearningRate 0.0102   Epoch: 13   Global Step: 77360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:53,121-Speed 5605.29 samples/sec   Loss 3.1559   LearningRate 0.0102   Epoch: 13   Global Step: 77370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:54,942-Speed 5623.65 samples/sec   Loss 3.0765   LearningRate 0.0102   Epoch: 13   Global Step: 77380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:56,758-Speed 5640.87 samples/sec   Loss 3.1751   LearningRate 0.0102   Epoch: 13   Global Step: 77390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:19:58,608-Speed 5536.93 samples/sec   Loss 3.1226   LearningRate 0.0102   Epoch: 13   Global Step: 77400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:00,454-Speed 5548.37 samples/sec   Loss 3.1416   LearningRate 0.0102   Epoch: 13   Global Step: 77410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:02,320-Speed 5491.29 samples/sec   Loss 3.1669   LearningRate 0.0102   Epoch: 13   Global Step: 77420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:04,154-Speed 5583.77 samples/sec   Loss 3.1179   LearningRate 0.0102   Epoch: 13   Global Step: 77430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:05,983-Speed 5599.15 samples/sec   Loss 3.1673   LearningRate 0.0102   Epoch: 13   Global Step: 77440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:20:07,805-Speed 5622.91 samples/sec   Loss 3.1382   LearningRate 0.0102   Epoch: 13   Global Step: 77450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:09,627-Speed 5622.39 samples/sec   Loss 3.1673   LearningRate 0.0102   Epoch: 13   Global Step: 77460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:11,463-Speed 5579.07 samples/sec   Loss 3.2164   LearningRate 0.0102   Epoch: 13   Global Step: 77470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:13,283-Speed 5629.53 samples/sec   Loss 3.2658   LearningRate 0.0102   Epoch: 13   Global Step: 77480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:15,109-Speed 5609.99 samples/sec   Loss 3.1378   LearningRate 0.0101   Epoch: 13   Global Step: 77490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:16,921-Speed 5652.42 samples/sec   Loss 3.2423   LearningRate 0.0101   Epoch: 13   Global Step: 77500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:18,741-Speed 5626.70 samples/sec   Loss 3.1438   LearningRate 0.0101   Epoch: 13   Global Step: 77510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:20,569-Speed 5605.85 samples/sec   Loss 3.1695   LearningRate 0.0101   Epoch: 13   Global Step: 77520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:22,381-Speed 5654.25 samples/sec   Loss 3.1755   LearningRate 0.0101   Epoch: 13   Global Step: 77530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:24,230-Speed 5538.62 samples/sec   Loss 3.0719   LearningRate 0.0101   Epoch: 13   Global Step: 77540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:26,041-Speed 5655.17 samples/sec   Loss 3.1311   LearningRate 0.0101   Epoch: 13   Global Step: 77550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:27,853-Speed 5652.19 samples/sec   Loss 3.2555   LearningRate 0.0101   Epoch: 13   Global Step: 77560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:29,667-Speed 5649.41 samples/sec   Loss 3.1021   LearningRate 0.0101   Epoch: 13   Global Step: 77570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:31,482-Speed 5644.88 samples/sec   Loss 3.1495   LearningRate 0.0101   Epoch: 13   Global Step: 77580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:33,318-Speed 5579.16 samples/sec   Loss 3.2431   LearningRate 0.0101   Epoch: 13   Global Step: 77590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:35,137-Speed 5630.20 samples/sec   Loss 3.3041   LearningRate 0.0101   Epoch: 13   Global Step: 77600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:36,952-Speed 5642.94 samples/sec   Loss 3.2893   LearningRate 0.0101   Epoch: 13   Global Step: 77610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:38,772-Speed 5629.06 samples/sec   Loss 3.2195   LearningRate 0.0101   Epoch: 13   Global Step: 77620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:40,609-Speed 5576.75 samples/sec   Loss 3.2884   LearningRate 0.0101   Epoch: 13   Global Step: 77630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:42,438-Speed 5598.68 samples/sec   Loss 3.0691   LearningRate 0.0101   Epoch: 13   Global Step: 77640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:44,253-Speed 5643.72 samples/sec   Loss 3.1709   LearningRate 0.0101   Epoch: 13   Global Step: 77650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:46,099-Speed 5549.76 samples/sec   Loss 3.2866   LearningRate 0.0101   Epoch: 13   Global Step: 77660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:47,914-Speed 5643.78 samples/sec   Loss 3.2249   LearningRate 0.0100   Epoch: 13   Global Step: 77670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:49,728-Speed 5646.13 samples/sec   Loss 3.2084   LearningRate 0.0100   Epoch: 13   Global Step: 77680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:51,545-Speed 5639.08 samples/sec   Loss 3.1901   LearningRate 0.0100   Epoch: 13   Global Step: 77690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:53,355-Speed 5659.12 samples/sec   Loss 3.1538   LearningRate 0.0100   Epoch: 13   Global Step: 77700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:55,183-Speed 5602.93 samples/sec   Loss 3.1789   LearningRate 0.0100   Epoch: 13   Global Step: 77710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:57,018-Speed 5584.35 samples/sec   Loss 3.1045   LearningRate 0.0100   Epoch: 13   Global Step: 77720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:20:58,835-Speed 5637.37 samples/sec   Loss 3.1364   LearningRate 0.0100   Epoch: 13   Global Step: 77730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:00,648-Speed 5648.05 samples/sec   Loss 3.2277   LearningRate 0.0100   Epoch: 13   Global Step: 77740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:02,479-Speed 5594.51 samples/sec   Loss 3.1974   LearningRate 0.0100   Epoch: 13   Global Step: 77750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:04,296-Speed 5637.17 samples/sec   Loss 3.1627   LearningRate 0.0100   Epoch: 13   Global Step: 77760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:06,120-Speed 5614.96 samples/sec   Loss 3.2451   LearningRate 0.0100   Epoch: 13   Global Step: 77770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:07,934-Speed 5647.05 samples/sec   Loss 3.1677   LearningRate 0.0100   Epoch: 13   Global Step: 77780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:09,760-Speed 5609.00 samples/sec   Loss 3.0349   LearningRate 0.0100   Epoch: 13   Global Step: 77790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:11,589-Speed 5600.96 samples/sec   Loss 3.1191   LearningRate 0.0100   Epoch: 13   Global Step: 77800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:13,416-Speed 5606.99 samples/sec   Loss 3.1318   LearningRate 0.0100   Epoch: 13   Global Step: 77810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:15,228-Speed 5654.07 samples/sec   Loss 3.1214   LearningRate 0.0100   Epoch: 13   Global Step: 77820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:17,060-Speed 5591.11 samples/sec   Loss 3.2822   LearningRate 0.0100   Epoch: 13   Global Step: 77830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:18,890-Speed 5599.51 samples/sec   Loss 3.2057   LearningRate 0.0100   Epoch: 13   Global Step: 77840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:20,711-Speed 5623.83 samples/sec   Loss 3.1556   LearningRate 0.0099   Epoch: 13   Global Step: 77850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:21:22,530-Speed 5632.29 samples/sec   Loss 3.1710   LearningRate 0.0099   Epoch: 13   Global Step: 77860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:24,349-Speed 5631.82 samples/sec   Loss 3.0500   LearningRate 0.0099   Epoch: 13   Global Step: 77870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:26,162-Speed 5649.98 samples/sec   Loss 3.1885   LearningRate 0.0099   Epoch: 13   Global Step: 77880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:27,984-Speed 5621.45 samples/sec   Loss 3.1613   LearningRate 0.0099   Epoch: 13   Global Step: 77890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:29,793-Speed 5662.69 samples/sec   Loss 3.1534   LearningRate 0.0099   Epoch: 13   Global Step: 77900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:31,604-Speed 5655.23 samples/sec   Loss 3.1107   LearningRate 0.0099   Epoch: 13   Global Step: 77910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:33,414-Speed 5660.14 samples/sec   Loss 3.1897   LearningRate 0.0099   Epoch: 13   Global Step: 77920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:35,230-Speed 5639.02 samples/sec   Loss 3.0785   LearningRate 0.0099   Epoch: 13   Global Step: 77930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:37,048-Speed 5634.25 samples/sec   Loss 3.1017   LearningRate 0.0099   Epoch: 13   Global Step: 77940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:38,861-Speed 5651.29 samples/sec   Loss 3.3543   LearningRate 0.0099   Epoch: 13   Global Step: 77950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:40,691-Speed 5597.80 samples/sec   Loss 3.3864   LearningRate 0.0099   Epoch: 13   Global Step: 77960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:42,521-Speed 5597.38 samples/sec   Loss 3.1575   LearningRate 0.0099   Epoch: 13   Global Step: 77970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:44,348-Speed 5607.41 samples/sec   Loss 3.1280   LearningRate 0.0099   Epoch: 13   Global Step: 77980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:46,167-Speed 5631.37 samples/sec   Loss 3.1218   LearningRate 0.0099   Epoch: 13   Global Step: 77990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:21:47,988-Speed 5624.69 samples/sec   Loss 3.2023   LearningRate 0.0099   Epoch: 13   Global Step: 78000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:22:14,152-[lfw][78000]XNorm: 22.107671
Training: 2022-04-27 06:22:14,152-[lfw][78000]Accuracy-Flip: 0.99767+-0.00260
Training: 2022-04-27 06:22:14,153-[lfw][78000]Accuracy-Highest: 0.99800
Training: 2022-04-27 06:22:44,487-[cfp_fp][78000]XNorm: 20.108242
Training: 2022-04-27 06:22:44,488-[cfp_fp][78000]Accuracy-Flip: 0.96943+-0.00860
Training: 2022-04-27 06:22:44,488-[cfp_fp][78000]Accuracy-Highest: 0.96943
Training: 2022-04-27 06:23:10,722-[agedb_30][78000]XNorm: 22.114465
Training: 2022-04-27 06:23:10,723-[agedb_30][78000]Accuracy-Flip: 0.97633+-0.00785
Training: 2022-04-27 06:23:10,723-[agedb_30][78000]Accuracy-Highest: 0.97933
Training: 2022-04-27 06:23:12,547-Speed 121.10 samples/sec   Loss 3.1804   LearningRate 0.0099   Epoch: 13   Global Step: 78010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:14,381-Speed 5587.30 samples/sec   Loss 3.1165   LearningRate 0.0099   Epoch: 13   Global Step: 78020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:16,200-Speed 5629.67 samples/sec   Loss 3.1369   LearningRate 0.0098   Epoch: 13   Global Step: 78030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:18,027-Speed 5607.22 samples/sec   Loss 2.9713   LearningRate 0.0098   Epoch: 13   Global Step: 78040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:19,847-Speed 5627.11 samples/sec   Loss 3.2287   LearningRate 0.0098   Epoch: 13   Global Step: 78050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:21,647-Speed 5692.61 samples/sec   Loss 3.1772   LearningRate 0.0098   Epoch: 13   Global Step: 78060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:23,480-Speed 5587.72 samples/sec   Loss 3.1251   LearningRate 0.0098   Epoch: 13   Global Step: 78070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:25,307-Speed 5608.03 samples/sec   Loss 3.1798   LearningRate 0.0098   Epoch: 13   Global Step: 78080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:27,147-Speed 5566.93 samples/sec   Loss 3.2302   LearningRate 0.0098   Epoch: 13   Global Step: 78090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:29,000-Speed 5526.67 samples/sec   Loss 3.0805   LearningRate 0.0098   Epoch: 13   Global Step: 78100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:30,806-Speed 5673.23 samples/sec   Loss 3.0584   LearningRate 0.0098   Epoch: 13   Global Step: 78110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:32,636-Speed 5597.09 samples/sec   Loss 3.1085   LearningRate 0.0098   Epoch: 13   Global Step: 78120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:34,476-Speed 5566.50 samples/sec   Loss 3.1274   LearningRate 0.0098   Epoch: 13   Global Step: 78130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:36,301-Speed 5613.25 samples/sec   Loss 3.1534   LearningRate 0.0098   Epoch: 13   Global Step: 78140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:38,142-Speed 5561.70 samples/sec   Loss 3.0922   LearningRate 0.0098   Epoch: 13   Global Step: 78150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:39,949-Speed 5671.44 samples/sec   Loss 3.0799   LearningRate 0.0098   Epoch: 13   Global Step: 78160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:41,762-Speed 5648.78 samples/sec   Loss 3.1042   LearningRate 0.0098   Epoch: 13   Global Step: 78170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:43,592-Speed 5596.98 samples/sec   Loss 3.2598   LearningRate 0.0098   Epoch: 13   Global Step: 78180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:45,412-Speed 5629.37 samples/sec   Loss 3.1319   LearningRate 0.0098   Epoch: 13   Global Step: 78190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:47,249-Speed 5575.54 samples/sec   Loss 3.1932   LearningRate 0.0098   Epoch: 13   Global Step: 78200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:49,087-Speed 5572.14 samples/sec   Loss 3.1226   LearningRate 0.0098   Epoch: 13   Global Step: 78210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:50,920-Speed 5590.14 samples/sec   Loss 3.1560   LearningRate 0.0097   Epoch: 13   Global Step: 78220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:52,759-Speed 5569.10 samples/sec   Loss 2.9904   LearningRate 0.0097   Epoch: 13   Global Step: 78230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:54,597-Speed 5573.31 samples/sec   Loss 2.9770   LearningRate 0.0097   Epoch: 13   Global Step: 78240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:56,435-Speed 5572.75 samples/sec   Loss 3.1198   LearningRate 0.0097   Epoch: 13   Global Step: 78250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:23:58,255-Speed 5626.49 samples/sec   Loss 3.0945   LearningRate 0.0097   Epoch: 13   Global Step: 78260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:24:00,078-Speed 5621.58 samples/sec   Loss 3.0997   LearningRate 0.0097   Epoch: 13   Global Step: 78270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:01,917-Speed 5570.30 samples/sec   Loss 3.1348   LearningRate 0.0097   Epoch: 13   Global Step: 78280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:03,741-Speed 5615.64 samples/sec   Loss 3.0764   LearningRate 0.0097   Epoch: 13   Global Step: 78290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:05,558-Speed 5637.23 samples/sec   Loss 3.0921   LearningRate 0.0097   Epoch: 13   Global Step: 78300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:07,375-Speed 5639.27 samples/sec   Loss 3.1763   LearningRate 0.0097   Epoch: 13   Global Step: 78310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:09,183-Speed 5665.26 samples/sec   Loss 3.1507   LearningRate 0.0097   Epoch: 13   Global Step: 78320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:10,995-Speed 5650.81 samples/sec   Loss 3.1487   LearningRate 0.0097   Epoch: 13   Global Step: 78330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:12,808-Speed 5651.26 samples/sec   Loss 3.1544   LearningRate 0.0097   Epoch: 13   Global Step: 78340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:14,626-Speed 5633.35 samples/sec   Loss 3.1651   LearningRate 0.0097   Epoch: 13   Global Step: 78350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:16,444-Speed 5635.84 samples/sec   Loss 3.2241   LearningRate 0.0097   Epoch: 13   Global Step: 78360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:18,258-Speed 5645.68 samples/sec   Loss 3.0820   LearningRate 0.0097   Epoch: 13   Global Step: 78370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:20,093-Speed 5583.06 samples/sec   Loss 3.2459   LearningRate 0.0097   Epoch: 13   Global Step: 78380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:21,923-Speed 5598.52 samples/sec   Loss 3.1131   LearningRate 0.0097   Epoch: 13   Global Step: 78390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:23,749-Speed 5608.90 samples/sec   Loss 3.1449   LearningRate 0.0096   Epoch: 13   Global Step: 78400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:25,657-Speed 5368.73 samples/sec   Loss 3.1110   LearningRate 0.0096   Epoch: 13   Global Step: 78410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:27,504-Speed 5549.15 samples/sec   Loss 3.0848   LearningRate 0.0096   Epoch: 13   Global Step: 78420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:29,430-Speed 5316.40 samples/sec   Loss 3.1122   LearningRate 0.0096   Epoch: 13   Global Step: 78430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:31,319-Speed 5422.65 samples/sec   Loss 3.1801   LearningRate 0.0096   Epoch: 13   Global Step: 78440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:33,164-Speed 5553.47 samples/sec   Loss 3.0818   LearningRate 0.0096   Epoch: 13   Global Step: 78450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:34,996-Speed 5591.43 samples/sec   Loss 3.2032   LearningRate 0.0096   Epoch: 13   Global Step: 78460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:36,815-Speed 5630.39 samples/sec   Loss 3.2297   LearningRate 0.0096   Epoch: 13   Global Step: 78470   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:24:38,623-Speed 5666.28 samples/sec   Loss 3.1776   LearningRate 0.0096   Epoch: 13   Global Step: 78480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:40,456-Speed 5589.11 samples/sec   Loss 3.1482   LearningRate 0.0096   Epoch: 13   Global Step: 78490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:42,279-Speed 5616.63 samples/sec   Loss 2.9060   LearningRate 0.0096   Epoch: 13   Global Step: 78500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:44,095-Speed 5640.49 samples/sec   Loss 3.1430   LearningRate 0.0096   Epoch: 13   Global Step: 78510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:45,922-Speed 5607.44 samples/sec   Loss 3.1637   LearningRate 0.0096   Epoch: 13   Global Step: 78520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:47,773-Speed 5534.54 samples/sec   Loss 3.0902   LearningRate 0.0096   Epoch: 13   Global Step: 78530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:49,606-Speed 5589.85 samples/sec   Loss 3.1520   LearningRate 0.0096   Epoch: 13   Global Step: 78540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:51,440-Speed 5585.37 samples/sec   Loss 3.0438   LearningRate 0.0096   Epoch: 13   Global Step: 78550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:53,260-Speed 5628.14 samples/sec   Loss 3.1916   LearningRate 0.0096   Epoch: 13   Global Step: 78560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:55,090-Speed 5595.56 samples/sec   Loss 2.9703   LearningRate 0.0096   Epoch: 13   Global Step: 78570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:56,932-Speed 5563.93 samples/sec   Loss 3.0716   LearningRate 0.0095   Epoch: 13   Global Step: 78580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:24:58,773-Speed 5562.50 samples/sec   Loss 3.1498   LearningRate 0.0095   Epoch: 13   Global Step: 78590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:00,664-Speed 5416.17 samples/sec   Loss 3.1068   LearningRate 0.0095   Epoch: 13   Global Step: 78600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:02,514-Speed 5536.63 samples/sec   Loss 3.1042   LearningRate 0.0095   Epoch: 13   Global Step: 78610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:04,342-Speed 5605.52 samples/sec   Loss 3.2181   LearningRate 0.0095   Epoch: 13   Global Step: 78620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:06,164-Speed 5622.13 samples/sec   Loss 2.9937   LearningRate 0.0095   Epoch: 13   Global Step: 78630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:07,984-Speed 5626.86 samples/sec   Loss 3.2153   LearningRate 0.0095   Epoch: 13   Global Step: 78640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:09,814-Speed 5600.34 samples/sec   Loss 3.0810   LearningRate 0.0095   Epoch: 13   Global Step: 78650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:11,658-Speed 5554.86 samples/sec   Loss 3.1475   LearningRate 0.0095   Epoch: 13   Global Step: 78660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:13,512-Speed 5523.20 samples/sec   Loss 3.2523   LearningRate 0.0095   Epoch: 13   Global Step: 78670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:15,326-Speed 5648.85 samples/sec   Loss 3.0464   LearningRate 0.0095   Epoch: 13   Global Step: 78680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:17,159-Speed 5585.97 samples/sec   Loss 3.1136   LearningRate 0.0095   Epoch: 13   Global Step: 78690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:18,992-Speed 5591.10 samples/sec   Loss 3.1022   LearningRate 0.0095   Epoch: 13   Global Step: 78700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:20,827-Speed 5581.70 samples/sec   Loss 3.0570   LearningRate 0.0095   Epoch: 13   Global Step: 78710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:22,654-Speed 5606.84 samples/sec   Loss 3.0801   LearningRate 0.0095   Epoch: 13   Global Step: 78720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:24,482-Speed 5602.19 samples/sec   Loss 3.1621   LearningRate 0.0095   Epoch: 13   Global Step: 78730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:26,295-Speed 5649.29 samples/sec   Loss 3.2029   LearningRate 0.0095   Epoch: 13   Global Step: 78740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:28,113-Speed 5636.55 samples/sec   Loss 3.0764   LearningRate 0.0095   Epoch: 13   Global Step: 78750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:29,943-Speed 5597.13 samples/sec   Loss 3.1483   LearningRate 0.0095   Epoch: 13   Global Step: 78760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:31,868-Speed 5322.37 samples/sec   Loss 2.9786   LearningRate 0.0094   Epoch: 13   Global Step: 78770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:33,752-Speed 5437.10 samples/sec   Loss 3.1824   LearningRate 0.0094   Epoch: 13   Global Step: 78780   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:25:35,560-Speed 5665.40 samples/sec   Loss 3.1558   LearningRate 0.0094   Epoch: 13   Global Step: 78790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:37,412-Speed 5529.63 samples/sec   Loss 3.2244   LearningRate 0.0094   Epoch: 13   Global Step: 78800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:25:39,250-Speed 5572.13 samples/sec   Loss 2.9813   LearningRate 0.0094   Epoch: 13   Global Step: 78810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:25:41,086-Speed 5579.86 samples/sec   Loss 3.1393   LearningRate 0.0094   Epoch: 13   Global Step: 78820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:25:42,919-Speed 5589.52 samples/sec   Loss 3.0371   LearningRate 0.0094   Epoch: 13   Global Step: 78830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:25:44,743-Speed 5614.41 samples/sec   Loss 3.0898   LearningRate 0.0094   Epoch: 13   Global Step: 78840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:25:46,574-Speed 5593.53 samples/sec   Loss 3.0856   LearningRate 0.0094   Epoch: 13   Global Step: 78850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:25:48,393-Speed 5632.03 samples/sec   Loss 3.0339   LearningRate 0.0094   Epoch: 13   Global Step: 78860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:25:50,234-Speed 5563.35 samples/sec   Loss 3.0570   LearningRate 0.0094   Epoch: 13   Global Step: 78870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:25:52,053-Speed 5633.84 samples/sec   Loss 3.1556   LearningRate 0.0094   Epoch: 13   Global Step: 78880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:25:53,888-Speed 5583.05 samples/sec   Loss 3.1092   LearningRate 0.0094   Epoch: 13   Global Step: 78890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:25:55,707-Speed 5631.12 samples/sec   Loss 3.1993   LearningRate 0.0094   Epoch: 13   Global Step: 78900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:57,543-Speed 5577.74 samples/sec   Loss 3.1474   LearningRate 0.0094   Epoch: 13   Global Step: 78910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:25:59,396-Speed 5527.73 samples/sec   Loss 3.0010   LearningRate 0.0094   Epoch: 13   Global Step: 78920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:01,231-Speed 5584.65 samples/sec   Loss 3.1012   LearningRate 0.0094   Epoch: 13   Global Step: 78930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:03,064-Speed 5586.44 samples/sec   Loss 3.0223   LearningRate 0.0094   Epoch: 13   Global Step: 78940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:04,897-Speed 5588.40 samples/sec   Loss 2.9832   LearningRate 0.0093   Epoch: 13   Global Step: 78950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:06,746-Speed 5539.77 samples/sec   Loss 3.0362   LearningRate 0.0093   Epoch: 13   Global Step: 78960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:08,567-Speed 5626.74 samples/sec   Loss 3.0773   LearningRate 0.0093   Epoch: 13   Global Step: 78970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:10,397-Speed 5596.84 samples/sec   Loss 3.0794   LearningRate 0.0093   Epoch: 13   Global Step: 78980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:12,230-Speed 5587.91 samples/sec   Loss 3.0694   LearningRate 0.0093   Epoch: 13   Global Step: 78990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:14,031-Speed 5689.05 samples/sec   Loss 2.9932   LearningRate 0.0093   Epoch: 13   Global Step: 79000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:15,844-Speed 5649.98 samples/sec   Loss 3.1218   LearningRate 0.0093   Epoch: 13   Global Step: 79010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:17,682-Speed 5571.31 samples/sec   Loss 3.1642   LearningRate 0.0093   Epoch: 13   Global Step: 79020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:19,497-Speed 5645.66 samples/sec   Loss 2.9571   LearningRate 0.0093   Epoch: 13   Global Step: 79030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:21,324-Speed 5604.75 samples/sec   Loss 3.0517   LearningRate 0.0093   Epoch: 13   Global Step: 79040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:23,168-Speed 5557.47 samples/sec   Loss 3.1575   LearningRate 0.0093   Epoch: 13   Global Step: 79050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:25,001-Speed 5585.84 samples/sec   Loss 3.0474   LearningRate 0.0093   Epoch: 13   Global Step: 79060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:26,840-Speed 5571.44 samples/sec   Loss 3.0525   LearningRate 0.0093   Epoch: 13   Global Step: 79070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:28,667-Speed 5607.58 samples/sec   Loss 3.0767   LearningRate 0.0093   Epoch: 13   Global Step: 79080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:30,500-Speed 5586.47 samples/sec   Loss 3.1379   LearningRate 0.0093   Epoch: 13   Global Step: 79090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:32,316-Speed 5642.31 samples/sec   Loss 3.0973   LearningRate 0.0093   Epoch: 13   Global Step: 79100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:34,140-Speed 5615.15 samples/sec   Loss 3.1189   LearningRate 0.0093   Epoch: 13   Global Step: 79110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:35,982-Speed 5561.97 samples/sec   Loss 3.1245   LearningRate 0.0093   Epoch: 13   Global Step: 79120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:37,810-Speed 5602.56 samples/sec   Loss 3.1728   LearningRate 0.0093   Epoch: 13   Global Step: 79130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:39,637-Speed 5609.22 samples/sec   Loss 2.9669   LearningRate 0.0092   Epoch: 13   Global Step: 79140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:41,463-Speed 5609.29 samples/sec   Loss 3.0826   LearningRate 0.0092   Epoch: 13   Global Step: 79150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:43,271-Speed 5665.22 samples/sec   Loss 3.1966   LearningRate 0.0092   Epoch: 13   Global Step: 79160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:45,102-Speed 5593.90 samples/sec   Loss 3.0350   LearningRate 0.0092   Epoch: 13   Global Step: 79170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:46,928-Speed 5610.87 samples/sec   Loss 3.1505   LearningRate 0.0092   Epoch: 13   Global Step: 79180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:48,742-Speed 5644.94 samples/sec   Loss 3.1635   LearningRate 0.0092   Epoch: 13   Global Step: 79190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:50,567-Speed 5614.86 samples/sec   Loss 2.9794   LearningRate 0.0092   Epoch: 13   Global Step: 79200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:52,380-Speed 5650.50 samples/sec   Loss 3.1230   LearningRate 0.0092   Epoch: 13   Global Step: 79210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:54,200-Speed 5627.45 samples/sec   Loss 3.0098   LearningRate 0.0092   Epoch: 13   Global Step: 79220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:56,035-Speed 5583.10 samples/sec   Loss 3.0903   LearningRate 0.0092   Epoch: 13   Global Step: 79230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:57,847-Speed 5651.51 samples/sec   Loss 3.1034   LearningRate 0.0092   Epoch: 13   Global Step: 79240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:26:59,668-Speed 5625.67 samples/sec   Loss 3.1360   LearningRate 0.0092   Epoch: 13   Global Step: 79250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:01,518-Speed 5536.52 samples/sec   Loss 3.1063   LearningRate 0.0092   Epoch: 13   Global Step: 79260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:03,366-Speed 5544.91 samples/sec   Loss 3.0730   LearningRate 0.0092   Epoch: 13   Global Step: 79270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:05,200-Speed 5586.11 samples/sec   Loss 3.2944   LearningRate 0.0092   Epoch: 13   Global Step: 79280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:07,026-Speed 5609.48 samples/sec   Loss 3.1384   LearningRate 0.0092   Epoch: 13   Global Step: 79290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:08,843-Speed 5635.49 samples/sec   Loss 3.0454   LearningRate 0.0092   Epoch: 13   Global Step: 79300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:10,656-Speed 5651.10 samples/sec   Loss 3.0608   LearningRate 0.0092   Epoch: 13   Global Step: 79310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:12,481-Speed 5612.80 samples/sec   Loss 2.9834   LearningRate 0.0092   Epoch: 13   Global Step: 79320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:14,324-Speed 5557.92 samples/sec   Loss 3.0983   LearningRate 0.0091   Epoch: 13   Global Step: 79330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:16,137-Speed 5650.26 samples/sec   Loss 3.1103   LearningRate 0.0091   Epoch: 13   Global Step: 79340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:17,955-Speed 5633.65 samples/sec   Loss 3.1127   LearningRate 0.0091   Epoch: 13   Global Step: 79350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:19,777-Speed 5621.39 samples/sec   Loss 3.0927   LearningRate 0.0091   Epoch: 13   Global Step: 79360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:21,605-Speed 5604.96 samples/sec   Loss 2.9989   LearningRate 0.0091   Epoch: 13   Global Step: 79370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:23,441-Speed 5581.09 samples/sec   Loss 3.0729   LearningRate 0.0091   Epoch: 13   Global Step: 79380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:25,281-Speed 5565.35 samples/sec   Loss 3.0597   LearningRate 0.0091   Epoch: 13   Global Step: 79390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:27,085-Speed 5677.94 samples/sec   Loss 3.1341   LearningRate 0.0091   Epoch: 13   Global Step: 79400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:28,899-Speed 5646.10 samples/sec   Loss 3.0121   LearningRate 0.0091   Epoch: 13   Global Step: 79410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:30,719-Speed 5627.54 samples/sec   Loss 3.0863   LearningRate 0.0091   Epoch: 13   Global Step: 79420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:32,548-Speed 5602.94 samples/sec   Loss 3.1486   LearningRate 0.0091   Epoch: 13   Global Step: 79430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:34,369-Speed 5624.65 samples/sec   Loss 3.0912   LearningRate 0.0091   Epoch: 13   Global Step: 79440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:36,209-Speed 5566.25 samples/sec   Loss 3.0433   LearningRate 0.0091   Epoch: 13   Global Step: 79450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:38,037-Speed 5603.26 samples/sec   Loss 3.0184   LearningRate 0.0091   Epoch: 13   Global Step: 79460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:39,863-Speed 5611.52 samples/sec   Loss 3.1382   LearningRate 0.0091   Epoch: 13   Global Step: 79470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:41,675-Speed 5654.10 samples/sec   Loss 2.9810   LearningRate 0.0091   Epoch: 13   Global Step: 79480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:43,499-Speed 5616.74 samples/sec   Loss 3.0112   LearningRate 0.0091   Epoch: 13   Global Step: 79490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:45,324-Speed 5612.62 samples/sec   Loss 3.0422   LearningRate 0.0091   Epoch: 13   Global Step: 79500   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:27:47,215-Speed 5415.47 samples/sec   Loss 3.2039   LearningRate 0.0090   Epoch: 13   Global Step: 79510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:49,043-Speed 5604.56 samples/sec   Loss 3.1363   LearningRate 0.0090   Epoch: 13   Global Step: 79520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:50,874-Speed 5595.00 samples/sec   Loss 3.0258   LearningRate 0.0090   Epoch: 13   Global Step: 79530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:52,699-Speed 5613.51 samples/sec   Loss 3.1058   LearningRate 0.0090   Epoch: 13   Global Step: 79540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:54,519-Speed 5626.71 samples/sec   Loss 2.8801   LearningRate 0.0090   Epoch: 13   Global Step: 79550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:56,340-Speed 5625.71 samples/sec   Loss 3.0525   LearningRate 0.0090   Epoch: 13   Global Step: 79560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:58,173-Speed 5587.67 samples/sec   Loss 3.0144   LearningRate 0.0090   Epoch: 13   Global Step: 79570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:27:59,988-Speed 5642.01 samples/sec   Loss 3.1131   LearningRate 0.0090   Epoch: 13   Global Step: 79580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:01,816-Speed 5604.51 samples/sec   Loss 3.0663   LearningRate 0.0090   Epoch: 13   Global Step: 79590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:03,699-Speed 5441.31 samples/sec   Loss 3.0415   LearningRate 0.0090   Epoch: 13   Global Step: 79600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:15,201-Speed 890.38 samples/sec   Loss 2.7140   LearningRate 0.0090   Epoch: 14   Global Step: 79610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:17,052-Speed 5535.03 samples/sec   Loss 2.5417   LearningRate 0.0090   Epoch: 14   Global Step: 79620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:18,901-Speed 5540.43 samples/sec   Loss 2.4799   LearningRate 0.0090   Epoch: 14   Global Step: 79630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:20,720-Speed 5630.64 samples/sec   Loss 2.4051   LearningRate 0.0090   Epoch: 14   Global Step: 79640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:22,821-Speed 4874.23 samples/sec   Loss 2.3981   LearningRate 0.0090   Epoch: 14   Global Step: 79650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:24,670-Speed 5541.43 samples/sec   Loss 2.4373   LearningRate 0.0090   Epoch: 14   Global Step: 79660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:26,491-Speed 5624.78 samples/sec   Loss 2.4530   LearningRate 0.0090   Epoch: 14   Global Step: 79670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:28,333-Speed 5560.14 samples/sec   Loss 2.3599   LearningRate 0.0090   Epoch: 14   Global Step: 79680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:30,166-Speed 5590.27 samples/sec   Loss 2.4640   LearningRate 0.0090   Epoch: 14   Global Step: 79690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:31,995-Speed 5598.43 samples/sec   Loss 2.4447   LearningRate 0.0089   Epoch: 14   Global Step: 79700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:33,846-Speed 5537.11 samples/sec   Loss 2.4108   LearningRate 0.0089   Epoch: 14   Global Step: 79710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:35,658-Speed 5652.73 samples/sec   Loss 2.5049   LearningRate 0.0089   Epoch: 14   Global Step: 79720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:37,480-Speed 5622.24 samples/sec   Loss 2.4285   LearningRate 0.0089   Epoch: 14   Global Step: 79730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:39,300-Speed 5627.14 samples/sec   Loss 2.3776   LearningRate 0.0089   Epoch: 14   Global Step: 79740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:41,114-Speed 5648.97 samples/sec   Loss 2.5663   LearningRate 0.0089   Epoch: 14   Global Step: 79750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:42,962-Speed 5540.37 samples/sec   Loss 2.2988   LearningRate 0.0089   Epoch: 14   Global Step: 79760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:44,793-Speed 5596.35 samples/sec   Loss 2.4592   LearningRate 0.0089   Epoch: 14   Global Step: 79770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:46,610-Speed 5636.43 samples/sec   Loss 2.4305   LearningRate 0.0089   Epoch: 14   Global Step: 79780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:48,446-Speed 5580.67 samples/sec   Loss 2.5396   LearningRate 0.0089   Epoch: 14   Global Step: 79790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:50,267-Speed 5622.51 samples/sec   Loss 2.4922   LearningRate 0.0089   Epoch: 14   Global Step: 79800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:52,093-Speed 5611.95 samples/sec   Loss 2.6125   LearningRate 0.0089   Epoch: 14   Global Step: 79810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:53,924-Speed 5594.97 samples/sec   Loss 2.4930   LearningRate 0.0089   Epoch: 14   Global Step: 79820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:55,824-Speed 5392.24 samples/sec   Loss 2.4609   LearningRate 0.0089   Epoch: 14   Global Step: 79830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:57,653-Speed 5598.49 samples/sec   Loss 2.5205   LearningRate 0.0089   Epoch: 14   Global Step: 79840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:28:59,473-Speed 5630.57 samples/sec   Loss 2.4954   LearningRate 0.0089   Epoch: 14   Global Step: 79850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:01,298-Speed 5612.54 samples/sec   Loss 2.3699   LearningRate 0.0089   Epoch: 14   Global Step: 79860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:03,134-Speed 5577.83 samples/sec   Loss 2.4299   LearningRate 0.0089   Epoch: 14   Global Step: 79870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:04,965-Speed 5596.25 samples/sec   Loss 2.5378   LearningRate 0.0089   Epoch: 14   Global Step: 79880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:06,786-Speed 5624.33 samples/sec   Loss 2.5353   LearningRate 0.0088   Epoch: 14   Global Step: 79890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:08,632-Speed 5549.51 samples/sec   Loss 2.5685   LearningRate 0.0088   Epoch: 14   Global Step: 79900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:10,484-Speed 5529.03 samples/sec   Loss 2.5203   LearningRate 0.0088   Epoch: 14   Global Step: 79910   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:29:12,338-Speed 5527.48 samples/sec   Loss 2.5097   LearningRate 0.0088   Epoch: 14   Global Step: 79920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:14,178-Speed 5566.29 samples/sec   Loss 2.5782   LearningRate 0.0088   Epoch: 14   Global Step: 79930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:16,011-Speed 5588.59 samples/sec   Loss 2.3907   LearningRate 0.0088   Epoch: 14   Global Step: 79940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:17,823-Speed 5653.79 samples/sec   Loss 2.6297   LearningRate 0.0088   Epoch: 14   Global Step: 79950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:19,667-Speed 5555.70 samples/sec   Loss 2.5892   LearningRate 0.0088   Epoch: 14   Global Step: 79960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:21,504-Speed 5575.75 samples/sec   Loss 2.6074   LearningRate 0.0088   Epoch: 14   Global Step: 79970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:23,344-Speed 5568.19 samples/sec   Loss 2.6253   LearningRate 0.0088   Epoch: 14   Global Step: 79980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:25,186-Speed 5558.69 samples/sec   Loss 2.5336   LearningRate 0.0088   Epoch: 14   Global Step: 79990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:27,020-Speed 5585.86 samples/sec   Loss 2.3842   LearningRate 0.0088   Epoch: 14   Global Step: 80000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:29:53,097-[lfw][80000]XNorm: 21.572815
Training: 2022-04-27 06:29:53,097-[lfw][80000]Accuracy-Flip: 0.99783+-0.00308
Training: 2022-04-27 06:29:53,098-[lfw][80000]Accuracy-Highest: 0.99800
Training: 2022-04-27 06:30:23,335-[cfp_fp][80000]XNorm: 20.100877
Training: 2022-04-27 06:30:23,336-[cfp_fp][80000]Accuracy-Flip: 0.96543+-0.00792
Training: 2022-04-27 06:30:23,336-[cfp_fp][80000]Accuracy-Highest: 0.96943
Training: 2022-04-27 06:30:49,423-[agedb_30][80000]XNorm: 21.515273
Training: 2022-04-27 06:30:49,423-[agedb_30][80000]Accuracy-Flip: 0.98017+-0.00765
Training: 2022-04-27 06:30:49,424-[agedb_30][80000]Accuracy-Highest: 0.98017
Training: 2022-04-27 06:30:51,330-Speed 121.46 samples/sec   Loss 2.5493   LearningRate 0.0088   Epoch: 14   Global Step: 80010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:30:53,206-Speed 5458.12 samples/sec   Loss 2.5403   LearningRate 0.0088   Epoch: 14   Global Step: 80020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:30:55,017-Speed 5657.06 samples/sec   Loss 2.5544   LearningRate 0.0088   Epoch: 14   Global Step: 80030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:30:56,846-Speed 5601.78 samples/sec   Loss 2.4983   LearningRate 0.0088   Epoch: 14   Global Step: 80040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:30:58,678-Speed 5591.05 samples/sec   Loss 2.5965   LearningRate 0.0088   Epoch: 14   Global Step: 80050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:00,574-Speed 5403.10 samples/sec   Loss 2.6477   LearningRate 0.0088   Epoch: 14   Global Step: 80060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:02,422-Speed 5541.52 samples/sec   Loss 2.5105   LearningRate 0.0088   Epoch: 14   Global Step: 80070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:04,236-Speed 5646.85 samples/sec   Loss 2.5666   LearningRate 0.0088   Epoch: 14   Global Step: 80080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:06,044-Speed 5665.81 samples/sec   Loss 2.5044   LearningRate 0.0087   Epoch: 14   Global Step: 80090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:07,862-Speed 5634.90 samples/sec   Loss 2.5494   LearningRate 0.0087   Epoch: 14   Global Step: 80100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:09,699-Speed 5574.34 samples/sec   Loss 2.4999   LearningRate 0.0087   Epoch: 14   Global Step: 80110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:11,538-Speed 5571.25 samples/sec   Loss 2.5314   LearningRate 0.0087   Epoch: 14   Global Step: 80120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:13,404-Speed 5489.14 samples/sec   Loss 2.5580   LearningRate 0.0087   Epoch: 14   Global Step: 80130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:15,234-Speed 5598.51 samples/sec   Loss 2.5478   LearningRate 0.0087   Epoch: 14   Global Step: 80140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:17,074-Speed 5567.64 samples/sec   Loss 2.5406   LearningRate 0.0087   Epoch: 14   Global Step: 80150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:18,896-Speed 5621.85 samples/sec   Loss 2.4578   LearningRate 0.0087   Epoch: 14   Global Step: 80160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:20,739-Speed 5557.72 samples/sec   Loss 2.6078   LearningRate 0.0087   Epoch: 14   Global Step: 80170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:22,567-Speed 5603.65 samples/sec   Loss 2.5359   LearningRate 0.0087   Epoch: 14   Global Step: 80180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:24,378-Speed 5656.22 samples/sec   Loss 2.6744   LearningRate 0.0087   Epoch: 14   Global Step: 80190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:26,192-Speed 5645.78 samples/sec   Loss 2.5919   LearningRate 0.0087   Epoch: 14   Global Step: 80200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:27,998-Speed 5671.06 samples/sec   Loss 2.6020   LearningRate 0.0087   Epoch: 14   Global Step: 80210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:29,817-Speed 5633.25 samples/sec   Loss 2.5346   LearningRate 0.0087   Epoch: 14   Global Step: 80220   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:31:31,626-Speed 5661.34 samples/sec   Loss 2.5246   LearningRate 0.0087   Epoch: 14   Global Step: 80230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:33,471-Speed 5553.40 samples/sec   Loss 2.5538   LearningRate 0.0087   Epoch: 14   Global Step: 80240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:35,281-Speed 5657.46 samples/sec   Loss 2.6539   LearningRate 0.0087   Epoch: 14   Global Step: 80250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:37,091-Speed 5659.75 samples/sec   Loss 2.5824   LearningRate 0.0087   Epoch: 14   Global Step: 80260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:38,899-Speed 5667.57 samples/sec   Loss 2.5378   LearningRate 0.0087   Epoch: 14   Global Step: 80270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:40,723-Speed 5616.27 samples/sec   Loss 2.5661   LearningRate 0.0086   Epoch: 14   Global Step: 80280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:42,557-Speed 5583.13 samples/sec   Loss 2.6032   LearningRate 0.0086   Epoch: 14   Global Step: 80290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:44,368-Speed 5657.98 samples/sec   Loss 2.4726   LearningRate 0.0086   Epoch: 14   Global Step: 80300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:46,182-Speed 5647.09 samples/sec   Loss 2.5398   LearningRate 0.0086   Epoch: 14   Global Step: 80310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:48,020-Speed 5571.84 samples/sec   Loss 2.5625   LearningRate 0.0086   Epoch: 14   Global Step: 80320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:49,841-Speed 5625.62 samples/sec   Loss 2.5828   LearningRate 0.0086   Epoch: 14   Global Step: 80330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:51,667-Speed 5609.13 samples/sec   Loss 2.5556   LearningRate 0.0086   Epoch: 14   Global Step: 80340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:53,481-Speed 5647.71 samples/sec   Loss 2.5860   LearningRate 0.0086   Epoch: 14   Global Step: 80350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:55,296-Speed 5643.82 samples/sec   Loss 2.7246   LearningRate 0.0086   Epoch: 14   Global Step: 80360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:57,121-Speed 5610.50 samples/sec   Loss 2.6573   LearningRate 0.0086   Epoch: 14   Global Step: 80370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:31:58,949-Speed 5604.14 samples/sec   Loss 2.5630   LearningRate 0.0086   Epoch: 14   Global Step: 80380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:00,791-Speed 5564.08 samples/sec   Loss 2.5024   LearningRate 0.0086   Epoch: 14   Global Step: 80390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:02,623-Speed 5592.01 samples/sec   Loss 2.6206   LearningRate 0.0086   Epoch: 14   Global Step: 80400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:04,441-Speed 5634.08 samples/sec   Loss 2.6888   LearningRate 0.0086   Epoch: 14   Global Step: 80410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:06,265-Speed 5615.62 samples/sec   Loss 2.5449   LearningRate 0.0086   Epoch: 14   Global Step: 80420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:08,059-Speed 5709.11 samples/sec   Loss 2.4515   LearningRate 0.0086   Epoch: 14   Global Step: 80430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:09,885-Speed 5608.63 samples/sec   Loss 2.6062   LearningRate 0.0086   Epoch: 14   Global Step: 80440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:11,727-Speed 5560.86 samples/sec   Loss 2.6769   LearningRate 0.0086   Epoch: 14   Global Step: 80450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:13,566-Speed 5570.94 samples/sec   Loss 2.6321   LearningRate 0.0086   Epoch: 14   Global Step: 80460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:15,392-Speed 5609.17 samples/sec   Loss 2.5478   LearningRate 0.0085   Epoch: 14   Global Step: 80470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:17,216-Speed 5616.96 samples/sec   Loss 2.5696   LearningRate 0.0085   Epoch: 14   Global Step: 80480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:19,021-Speed 5674.05 samples/sec   Loss 2.6134   LearningRate 0.0085   Epoch: 14   Global Step: 80490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:20,830-Speed 5662.28 samples/sec   Loss 2.6026   LearningRate 0.0085   Epoch: 14   Global Step: 80500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:22,647-Speed 5638.43 samples/sec   Loss 2.6213   LearningRate 0.0085   Epoch: 14   Global Step: 80510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:24,470-Speed 5619.05 samples/sec   Loss 2.6929   LearningRate 0.0085   Epoch: 14   Global Step: 80520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:26,285-Speed 5644.12 samples/sec   Loss 2.6161   LearningRate 0.0085   Epoch: 14   Global Step: 80530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:32:28,101-Speed 5639.21 samples/sec   Loss 2.6082   LearningRate 0.0085   Epoch: 14   Global Step: 80540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:29,931-Speed 5598.03 samples/sec   Loss 2.6579   LearningRate 0.0085   Epoch: 14   Global Step: 80550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:31,749-Speed 5635.36 samples/sec   Loss 2.6596   LearningRate 0.0085   Epoch: 14   Global Step: 80560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:33,566-Speed 5638.83 samples/sec   Loss 2.7042   LearningRate 0.0085   Epoch: 14   Global Step: 80570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:35,391-Speed 5611.89 samples/sec   Loss 2.6147   LearningRate 0.0085   Epoch: 14   Global Step: 80580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:37,200-Speed 5661.23 samples/sec   Loss 2.5555   LearningRate 0.0085   Epoch: 14   Global Step: 80590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:39,023-Speed 5620.16 samples/sec   Loss 2.4880   LearningRate 0.0085   Epoch: 14   Global Step: 80600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:40,835-Speed 5651.85 samples/sec   Loss 2.7156   LearningRate 0.0085   Epoch: 14   Global Step: 80610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:42,661-Speed 5609.29 samples/sec   Loss 2.6421   LearningRate 0.0085   Epoch: 14   Global Step: 80620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:44,481-Speed 5628.12 samples/sec   Loss 2.6440   LearningRate 0.0085   Epoch: 14   Global Step: 80630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:46,289-Speed 5668.84 samples/sec   Loss 2.5904   LearningRate 0.0085   Epoch: 14   Global Step: 80640   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:32:48,091-Speed 5683.35 samples/sec   Loss 2.6400   LearningRate 0.0085   Epoch: 14   Global Step: 80650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:49,919-Speed 5603.08 samples/sec   Loss 2.5493   LearningRate 0.0085   Epoch: 14   Global Step: 80660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:51,760-Speed 5565.00 samples/sec   Loss 2.6076   LearningRate 0.0084   Epoch: 14   Global Step: 80670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:53,584-Speed 5614.55 samples/sec   Loss 2.6607   LearningRate 0.0084   Epoch: 14   Global Step: 80680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:55,414-Speed 5598.28 samples/sec   Loss 2.6424   LearningRate 0.0084   Epoch: 14   Global Step: 80690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:57,245-Speed 5594.87 samples/sec   Loss 2.6506   LearningRate 0.0084   Epoch: 14   Global Step: 80700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:32:59,067-Speed 5622.16 samples/sec   Loss 2.6532   LearningRate 0.0084   Epoch: 14   Global Step: 80710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:00,885-Speed 5632.47 samples/sec   Loss 2.6301   LearningRate 0.0084   Epoch: 14   Global Step: 80720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:02,716-Speed 5595.69 samples/sec   Loss 2.6032   LearningRate 0.0084   Epoch: 14   Global Step: 80730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:04,545-Speed 5600.49 samples/sec   Loss 2.6862   LearningRate 0.0084   Epoch: 14   Global Step: 80740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:06,364-Speed 5630.86 samples/sec   Loss 2.6966   LearningRate 0.0084   Epoch: 14   Global Step: 80750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:08,181-Speed 5637.60 samples/sec   Loss 2.6161   LearningRate 0.0084   Epoch: 14   Global Step: 80760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:10,006-Speed 5613.37 samples/sec   Loss 2.6772   LearningRate 0.0084   Epoch: 14   Global Step: 80770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:11,830-Speed 5617.76 samples/sec   Loss 2.6328   LearningRate 0.0084   Epoch: 14   Global Step: 80780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:13,664-Speed 5582.55 samples/sec   Loss 2.6517   LearningRate 0.0084   Epoch: 14   Global Step: 80790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:15,482-Speed 5635.11 samples/sec   Loss 2.6404   LearningRate 0.0084   Epoch: 14   Global Step: 80800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:17,319-Speed 5577.57 samples/sec   Loss 2.6402   LearningRate 0.0084   Epoch: 14   Global Step: 80810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:19,153-Speed 5585.11 samples/sec   Loss 2.5943   LearningRate 0.0084   Epoch: 14   Global Step: 80820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:20,978-Speed 5613.17 samples/sec   Loss 2.6991   LearningRate 0.0084   Epoch: 14   Global Step: 80830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:22,802-Speed 5615.93 samples/sec   Loss 2.6580   LearningRate 0.0084   Epoch: 14   Global Step: 80840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:24,637-Speed 5581.31 samples/sec   Loss 2.6909   LearningRate 0.0084   Epoch: 14   Global Step: 80850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:26,500-Speed 5499.38 samples/sec   Loss 2.6280   LearningRate 0.0083   Epoch: 14   Global Step: 80860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:28,349-Speed 5539.58 samples/sec   Loss 2.5851   LearningRate 0.0083   Epoch: 14   Global Step: 80870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:30,190-Speed 5564.04 samples/sec   Loss 2.5787   LearningRate 0.0083   Epoch: 14   Global Step: 80880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:32,018-Speed 5603.26 samples/sec   Loss 2.6209   LearningRate 0.0083   Epoch: 14   Global Step: 80890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:33,833-Speed 5644.03 samples/sec   Loss 2.6508   LearningRate 0.0083   Epoch: 14   Global Step: 80900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:35,656-Speed 5620.21 samples/sec   Loss 2.6802   LearningRate 0.0083   Epoch: 14   Global Step: 80910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:37,469-Speed 5650.98 samples/sec   Loss 2.6678   LearningRate 0.0083   Epoch: 14   Global Step: 80920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:39,281-Speed 5652.17 samples/sec   Loss 2.5954   LearningRate 0.0083   Epoch: 14   Global Step: 80930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:41,110-Speed 5601.33 samples/sec   Loss 2.6897   LearningRate 0.0083   Epoch: 14   Global Step: 80940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:42,920-Speed 5659.90 samples/sec   Loss 2.6591   LearningRate 0.0083   Epoch: 14   Global Step: 80950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:44,734-Speed 5646.34 samples/sec   Loss 2.6514   LearningRate 0.0083   Epoch: 14   Global Step: 80960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:46,562-Speed 5603.97 samples/sec   Loss 2.7448   LearningRate 0.0083   Epoch: 14   Global Step: 80970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:48,393-Speed 5594.61 samples/sec   Loss 2.6299   LearningRate 0.0083   Epoch: 14   Global Step: 80980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:50,208-Speed 5641.45 samples/sec   Loss 2.5948   LearningRate 0.0083   Epoch: 14   Global Step: 80990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:52,034-Speed 5611.55 samples/sec   Loss 2.7235   LearningRate 0.0083   Epoch: 14   Global Step: 81000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:53,857-Speed 5619.83 samples/sec   Loss 2.6646   LearningRate 0.0083   Epoch: 14   Global Step: 81010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:55,673-Speed 5639.86 samples/sec   Loss 2.7433   LearningRate 0.0083   Epoch: 14   Global Step: 81020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:57,492-Speed 5630.40 samples/sec   Loss 2.6813   LearningRate 0.0083   Epoch: 14   Global Step: 81030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:33:59,314-Speed 5622.59 samples/sec   Loss 2.6954   LearningRate 0.0083   Epoch: 14   Global Step: 81040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:01,135-Speed 5625.72 samples/sec   Loss 2.7293   LearningRate 0.0083   Epoch: 14   Global Step: 81050   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:34:02,943-Speed 5664.73 samples/sec   Loss 2.5844   LearningRate 0.0082   Epoch: 14   Global Step: 81060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:04,773-Speed 5599.81 samples/sec   Loss 2.6387   LearningRate 0.0082   Epoch: 14   Global Step: 81070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:06,593-Speed 5626.10 samples/sec   Loss 2.6054   LearningRate 0.0082   Epoch: 14   Global Step: 81080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:08,407-Speed 5647.33 samples/sec   Loss 2.6985   LearningRate 0.0082   Epoch: 14   Global Step: 81090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:10,220-Speed 5648.74 samples/sec   Loss 2.6797   LearningRate 0.0082   Epoch: 14   Global Step: 81100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:12,041-Speed 5626.40 samples/sec   Loss 2.6442   LearningRate 0.0082   Epoch: 14   Global Step: 81110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:13,878-Speed 5575.95 samples/sec   Loss 2.7806   LearningRate 0.0082   Epoch: 14   Global Step: 81120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:15,693-Speed 5644.56 samples/sec   Loss 2.6337   LearningRate 0.0082   Epoch: 14   Global Step: 81130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:17,517-Speed 5615.65 samples/sec   Loss 2.7110   LearningRate 0.0082   Epoch: 14   Global Step: 81140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:19,363-Speed 5549.43 samples/sec   Loss 2.6514   LearningRate 0.0082   Epoch: 14   Global Step: 81150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:21,176-Speed 5650.40 samples/sec   Loss 2.6824   LearningRate 0.0082   Epoch: 14   Global Step: 81160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:22,995-Speed 5629.54 samples/sec   Loss 2.5746   LearningRate 0.0082   Epoch: 14   Global Step: 81170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:24,823-Speed 5604.09 samples/sec   Loss 2.6901   LearningRate 0.0082   Epoch: 14   Global Step: 81180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:26,640-Speed 5637.97 samples/sec   Loss 2.6439   LearningRate 0.0082   Epoch: 14   Global Step: 81190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:28,480-Speed 5568.55 samples/sec   Loss 2.6839   LearningRate 0.0082   Epoch: 14   Global Step: 81200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:30,321-Speed 5563.77 samples/sec   Loss 2.6643   LearningRate 0.0082   Epoch: 14   Global Step: 81210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:32,152-Speed 5593.29 samples/sec   Loss 2.6027   LearningRate 0.0082   Epoch: 14   Global Step: 81220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:33,974-Speed 5621.97 samples/sec   Loss 2.6151   LearningRate 0.0082   Epoch: 14   Global Step: 81230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:35,796-Speed 5622.05 samples/sec   Loss 2.7181   LearningRate 0.0082   Epoch: 14   Global Step: 81240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:37,693-Speed 5402.13 samples/sec   Loss 2.6963   LearningRate 0.0082   Epoch: 14   Global Step: 81250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:39,556-Speed 5497.26 samples/sec   Loss 2.6792   LearningRate 0.0081   Epoch: 14   Global Step: 81260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:41,381-Speed 5615.69 samples/sec   Loss 2.5918   LearningRate 0.0081   Epoch: 14   Global Step: 81270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:43,232-Speed 5533.22 samples/sec   Loss 2.7227   LearningRate 0.0081   Epoch: 14   Global Step: 81280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:45,090-Speed 5511.38 samples/sec   Loss 2.6744   LearningRate 0.0081   Epoch: 14   Global Step: 81290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:46,921-Speed 5596.59 samples/sec   Loss 2.6425   LearningRate 0.0081   Epoch: 14   Global Step: 81300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:48,754-Speed 5586.29 samples/sec   Loss 2.7675   LearningRate 0.0081   Epoch: 14   Global Step: 81310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:50,591-Speed 5577.28 samples/sec   Loss 2.6551   LearningRate 0.0081   Epoch: 14   Global Step: 81320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:52,443-Speed 5529.37 samples/sec   Loss 2.6578   LearningRate 0.0081   Epoch: 14   Global Step: 81330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:54,318-Speed 5464.73 samples/sec   Loss 2.5974   LearningRate 0.0081   Epoch: 14   Global Step: 81340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:56,143-Speed 5612.15 samples/sec   Loss 2.6856   LearningRate 0.0081   Epoch: 14   Global Step: 81350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:34:57,966-Speed 5619.77 samples/sec   Loss 2.6348   LearningRate 0.0081   Epoch: 14   Global Step: 81360   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:34:59,807-Speed 5564.67 samples/sec   Loss 2.5623   LearningRate 0.0081   Epoch: 14   Global Step: 81370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:01,640-Speed 5587.53 samples/sec   Loss 2.5640   LearningRate 0.0081   Epoch: 14   Global Step: 81380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:03,462-Speed 5624.03 samples/sec   Loss 2.6257   LearningRate 0.0081   Epoch: 14   Global Step: 81390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:05,270-Speed 5665.73 samples/sec   Loss 2.6826   LearningRate 0.0081   Epoch: 14   Global Step: 81400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:07,081-Speed 5654.55 samples/sec   Loss 2.5440   LearningRate 0.0081   Epoch: 14   Global Step: 81410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:08,891-Speed 5658.98 samples/sec   Loss 2.6861   LearningRate 0.0081   Epoch: 14   Global Step: 81420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:10,715-Speed 5618.23 samples/sec   Loss 2.7037   LearningRate 0.0081   Epoch: 14   Global Step: 81430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:12,554-Speed 5568.96 samples/sec   Loss 2.7588   LearningRate 0.0081   Epoch: 14   Global Step: 81440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:14,383-Speed 5600.77 samples/sec   Loss 2.6221   LearningRate 0.0081   Epoch: 14   Global Step: 81450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:16,207-Speed 5614.78 samples/sec   Loss 2.7288   LearningRate 0.0080   Epoch: 14   Global Step: 81460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:18,031-Speed 5617.29 samples/sec   Loss 2.6613   LearningRate 0.0080   Epoch: 14   Global Step: 81470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:19,847-Speed 5641.76 samples/sec   Loss 2.6255   LearningRate 0.0080   Epoch: 14   Global Step: 81480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:21,738-Speed 5417.14 samples/sec   Loss 2.7011   LearningRate 0.0080   Epoch: 14   Global Step: 81490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:23,555-Speed 5636.76 samples/sec   Loss 2.7350   LearningRate 0.0080   Epoch: 14   Global Step: 81500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:25,372-Speed 5636.76 samples/sec   Loss 2.6532   LearningRate 0.0080   Epoch: 14   Global Step: 81510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:27,201-Speed 5601.60 samples/sec   Loss 2.6426   LearningRate 0.0080   Epoch: 14   Global Step: 81520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:29,011-Speed 5658.50 samples/sec   Loss 2.7860   LearningRate 0.0080   Epoch: 14   Global Step: 81530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:30,824-Speed 5650.46 samples/sec   Loss 2.6391   LearningRate 0.0080   Epoch: 14   Global Step: 81540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:32,633-Speed 5662.33 samples/sec   Loss 2.6096   LearningRate 0.0080   Epoch: 14   Global Step: 81550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:34,449-Speed 5641.57 samples/sec   Loss 2.6360   LearningRate 0.0080   Epoch: 14   Global Step: 81560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:36,275-Speed 5609.09 samples/sec   Loss 2.7322   LearningRate 0.0080   Epoch: 14   Global Step: 81570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:38,156-Speed 5445.83 samples/sec   Loss 2.6985   LearningRate 0.0080   Epoch: 14   Global Step: 81580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:39,991-Speed 5581.14 samples/sec   Loss 2.6547   LearningRate 0.0080   Epoch: 14   Global Step: 81590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:41,827-Speed 5580.45 samples/sec   Loss 2.7208   LearningRate 0.0080   Epoch: 14   Global Step: 81600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:43,746-Speed 5339.12 samples/sec   Loss 2.5985   LearningRate 0.0080   Epoch: 14   Global Step: 81610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:45,617-Speed 5472.92 samples/sec   Loss 2.7426   LearningRate 0.0080   Epoch: 14   Global Step: 81620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:47,463-Speed 5549.73 samples/sec   Loss 2.7479   LearningRate 0.0080   Epoch: 14   Global Step: 81630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:49,298-Speed 5581.35 samples/sec   Loss 2.6560   LearningRate 0.0080   Epoch: 14   Global Step: 81640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:51,122-Speed 5617.91 samples/sec   Loss 2.6768   LearningRate 0.0080   Epoch: 14   Global Step: 81650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:52,987-Speed 5490.68 samples/sec   Loss 2.8112   LearningRate 0.0079   Epoch: 14   Global Step: 81660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:54,813-Speed 5610.39 samples/sec   Loss 2.7612   LearningRate 0.0079   Epoch: 14   Global Step: 81670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:56,637-Speed 5616.84 samples/sec   Loss 2.6742   LearningRate 0.0079   Epoch: 14   Global Step: 81680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:35:58,454-Speed 5637.14 samples/sec   Loss 2.6324   LearningRate 0.0079   Epoch: 14   Global Step: 81690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:00,267-Speed 5649.19 samples/sec   Loss 2.6393   LearningRate 0.0079   Epoch: 14   Global Step: 81700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:02,091-Speed 5617.67 samples/sec   Loss 2.5426   LearningRate 0.0079   Epoch: 14   Global Step: 81710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:03,899-Speed 5665.49 samples/sec   Loss 2.8641   LearningRate 0.0079   Epoch: 14   Global Step: 81720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:05,724-Speed 5612.37 samples/sec   Loss 2.6688   LearningRate 0.0079   Epoch: 14   Global Step: 81730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:07,540-Speed 5642.09 samples/sec   Loss 2.6387   LearningRate 0.0079   Epoch: 14   Global Step: 81740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:09,366-Speed 5609.05 samples/sec   Loss 2.6579   LearningRate 0.0079   Epoch: 14   Global Step: 81750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:11,179-Speed 5649.40 samples/sec   Loss 2.6414   LearningRate 0.0079   Epoch: 14   Global Step: 81760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:12,993-Speed 5647.86 samples/sec   Loss 2.6118   LearningRate 0.0079   Epoch: 14   Global Step: 81770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:14,804-Speed 5653.79 samples/sec   Loss 2.6959   LearningRate 0.0079   Epoch: 14   Global Step: 81780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:16,630-Speed 5611.89 samples/sec   Loss 2.7080   LearningRate 0.0079   Epoch: 14   Global Step: 81790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:18,456-Speed 5609.34 samples/sec   Loss 2.7896   LearningRate 0.0079   Epoch: 14   Global Step: 81800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:36:20,283-Speed 5607.04 samples/sec   Loss 2.6712   LearningRate 0.0079   Epoch: 14   Global Step: 81810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:36:22,100-Speed 5634.54 samples/sec   Loss 2.8046   LearningRate 0.0079   Epoch: 14   Global Step: 81820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:36:23,921-Speed 5626.95 samples/sec   Loss 2.7270   LearningRate 0.0079   Epoch: 14   Global Step: 81830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:36:25,751-Speed 5597.76 samples/sec   Loss 2.6447   LearningRate 0.0079   Epoch: 14   Global Step: 81840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:36:27,590-Speed 5570.78 samples/sec   Loss 2.7330   LearningRate 0.0079   Epoch: 14   Global Step: 81850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:36:29,430-Speed 5567.31 samples/sec   Loss 2.6851   LearningRate 0.0078   Epoch: 14   Global Step: 81860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:36:31,240-Speed 5659.44 samples/sec   Loss 2.5062   LearningRate 0.0078   Epoch: 14   Global Step: 81870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:36:33,083-Speed 5556.18 samples/sec   Loss 2.7399   LearningRate 0.0078   Epoch: 14   Global Step: 81880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:36:34,902-Speed 5633.13 samples/sec   Loss 2.7603   LearningRate 0.0078   Epoch: 14   Global Step: 81890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:36:36,727-Speed 5613.63 samples/sec   Loss 2.8064   LearningRate 0.0078   Epoch: 14   Global Step: 81900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:38,569-Speed 5558.30 samples/sec   Loss 2.6745   LearningRate 0.0078   Epoch: 14   Global Step: 81910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:40,394-Speed 5612.63 samples/sec   Loss 2.7181   LearningRate 0.0078   Epoch: 14   Global Step: 81920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:42,231-Speed 5579.26 samples/sec   Loss 2.6737   LearningRate 0.0078   Epoch: 14   Global Step: 81930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:44,069-Speed 5570.24 samples/sec   Loss 2.7276   LearningRate 0.0078   Epoch: 14   Global Step: 81940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:45,926-Speed 5516.61 samples/sec   Loss 2.7333   LearningRate 0.0078   Epoch: 14   Global Step: 81950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:47,772-Speed 5548.86 samples/sec   Loss 2.7095   LearningRate 0.0078   Epoch: 14   Global Step: 81960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:49,609-Speed 5576.34 samples/sec   Loss 2.7680   LearningRate 0.0078   Epoch: 14   Global Step: 81970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:51,436-Speed 5608.71 samples/sec   Loss 2.6426   LearningRate 0.0078   Epoch: 14   Global Step: 81980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:53,262-Speed 5609.21 samples/sec   Loss 2.8085   LearningRate 0.0078   Epoch: 14   Global Step: 81990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:36:55,101-Speed 5570.05 samples/sec   Loss 2.6660   LearningRate 0.0078   Epoch: 14   Global Step: 82000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:37:21,391-[lfw][82000]XNorm: 21.839809
Training: 2022-04-27 06:37:21,392-[lfw][82000]Accuracy-Flip: 0.99750+-0.00261
Training: 2022-04-27 06:37:21,392-[lfw][82000]Accuracy-Highest: 0.99800
Training: 2022-04-27 06:37:51,847-[cfp_fp][82000]XNorm: 19.821888
Training: 2022-04-27 06:37:51,848-[cfp_fp][82000]Accuracy-Flip: 0.96857+-0.00761
Training: 2022-04-27 06:37:51,848-[cfp_fp][82000]Accuracy-Highest: 0.96943
Training: 2022-04-27 06:38:18,164-[agedb_30][82000]XNorm: 21.807388
Training: 2022-04-27 06:38:18,165-[agedb_30][82000]Accuracy-Flip: 0.97933+-0.00873
Training: 2022-04-27 06:38:18,165-[agedb_30][82000]Accuracy-Highest: 0.98017
Training: 2022-04-27 06:38:20,041-Speed 120.56 samples/sec   Loss 2.7373   LearningRate 0.0078   Epoch: 14   Global Step: 82010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:21,868-Speed 5605.94 samples/sec   Loss 2.8470   LearningRate 0.0078   Epoch: 14   Global Step: 82020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:23,687-Speed 5629.30 samples/sec   Loss 2.7982   LearningRate 0.0078   Epoch: 14   Global Step: 82030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:25,500-Speed 5649.47 samples/sec   Loss 2.7957   LearningRate 0.0078   Epoch: 14   Global Step: 82040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:27,314-Speed 5647.91 samples/sec   Loss 2.7459   LearningRate 0.0078   Epoch: 14   Global Step: 82050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:29,138-Speed 5617.91 samples/sec   Loss 2.7259   LearningRate 0.0078   Epoch: 14   Global Step: 82060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:30,953-Speed 5642.46 samples/sec   Loss 2.6716   LearningRate 0.0077   Epoch: 14   Global Step: 82070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:32,792-Speed 5569.69 samples/sec   Loss 2.6854   LearningRate 0.0077   Epoch: 14   Global Step: 82080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:34,599-Speed 5669.82 samples/sec   Loss 2.5657   LearningRate 0.0077   Epoch: 14   Global Step: 82090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:36,444-Speed 5550.07 samples/sec   Loss 2.5992   LearningRate 0.0077   Epoch: 14   Global Step: 82100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:38,268-Speed 5617.91 samples/sec   Loss 2.7919   LearningRate 0.0077   Epoch: 14   Global Step: 82110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:40,075-Speed 5667.92 samples/sec   Loss 2.6596   LearningRate 0.0077   Epoch: 14   Global Step: 82120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:41,899-Speed 5616.15 samples/sec   Loss 2.6829   LearningRate 0.0077   Epoch: 14   Global Step: 82130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:43,725-Speed 5607.63 samples/sec   Loss 2.7246   LearningRate 0.0077   Epoch: 14   Global Step: 82140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:45,540-Speed 5645.12 samples/sec   Loss 2.6864   LearningRate 0.0077   Epoch: 14   Global Step: 82150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:47,383-Speed 5558.74 samples/sec   Loss 2.7558   LearningRate 0.0077   Epoch: 14   Global Step: 82160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:49,229-Speed 5547.89 samples/sec   Loss 2.6646   LearningRate 0.0077   Epoch: 14   Global Step: 82170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:51,057-Speed 5605.54 samples/sec   Loss 2.7474   LearningRate 0.0077   Epoch: 14   Global Step: 82180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:52,884-Speed 5606.71 samples/sec   Loss 2.7870   LearningRate 0.0077   Epoch: 14   Global Step: 82190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:54,722-Speed 5572.31 samples/sec   Loss 2.6446   LearningRate 0.0077   Epoch: 14   Global Step: 82200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:38:56,539-Speed 5638.13 samples/sec   Loss 2.6277   LearningRate 0.0077   Epoch: 14   Global Step: 82210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:38:58,347-Speed 5663.66 samples/sec   Loss 2.6927   LearningRate 0.0077   Epoch: 14   Global Step: 82220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:00,175-Speed 5605.32 samples/sec   Loss 2.6425   LearningRate 0.0077   Epoch: 14   Global Step: 82230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:02,001-Speed 5608.76 samples/sec   Loss 2.6224   LearningRate 0.0077   Epoch: 14   Global Step: 82240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:03,836-Speed 5582.21 samples/sec   Loss 2.7527   LearningRate 0.0077   Epoch: 14   Global Step: 82250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:05,659-Speed 5620.10 samples/sec   Loss 2.7135   LearningRate 0.0077   Epoch: 14   Global Step: 82260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:07,475-Speed 5640.32 samples/sec   Loss 2.6210   LearningRate 0.0076   Epoch: 14   Global Step: 82270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:09,316-Speed 5565.57 samples/sec   Loss 2.6925   LearningRate 0.0076   Epoch: 14   Global Step: 82280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:11,132-Speed 5638.55 samples/sec   Loss 2.7719   LearningRate 0.0076   Epoch: 14   Global Step: 82290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:12,949-Speed 5637.83 samples/sec   Loss 2.7166   LearningRate 0.0076   Epoch: 14   Global Step: 82300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:14,751-Speed 5685.02 samples/sec   Loss 2.6750   LearningRate 0.0076   Epoch: 14   Global Step: 82310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:16,580-Speed 5599.99 samples/sec   Loss 2.6227   LearningRate 0.0076   Epoch: 14   Global Step: 82320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:18,405-Speed 5613.15 samples/sec   Loss 2.7521   LearningRate 0.0076   Epoch: 14   Global Step: 82330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:20,228-Speed 5617.40 samples/sec   Loss 2.7377   LearningRate 0.0076   Epoch: 14   Global Step: 82340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:22,069-Speed 5564.06 samples/sec   Loss 2.7015   LearningRate 0.0076   Epoch: 14   Global Step: 82350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:23,890-Speed 5625.21 samples/sec   Loss 2.6515   LearningRate 0.0076   Epoch: 14   Global Step: 82360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:25,706-Speed 5642.04 samples/sec   Loss 2.7520   LearningRate 0.0076   Epoch: 14   Global Step: 82370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:27,531-Speed 5611.73 samples/sec   Loss 2.6887   LearningRate 0.0076   Epoch: 14   Global Step: 82380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:29,362-Speed 5594.22 samples/sec   Loss 2.6301   LearningRate 0.0076   Epoch: 14   Global Step: 82390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:31,200-Speed 5576.08 samples/sec   Loss 2.6647   LearningRate 0.0076   Epoch: 14   Global Step: 82400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:33,018-Speed 5633.94 samples/sec   Loss 2.6344   LearningRate 0.0076   Epoch: 14   Global Step: 82410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:34,843-Speed 5611.57 samples/sec   Loss 2.7101   LearningRate 0.0076   Epoch: 14   Global Step: 82420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:36,669-Speed 5609.61 samples/sec   Loss 2.7256   LearningRate 0.0076   Epoch: 14   Global Step: 82430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:38,512-Speed 5560.28 samples/sec   Loss 2.6763   LearningRate 0.0076   Epoch: 14   Global Step: 82440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:40,333-Speed 5622.33 samples/sec   Loss 2.6189   LearningRate 0.0076   Epoch: 14   Global Step: 82450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:42,170-Speed 5576.28 samples/sec   Loss 2.6847   LearningRate 0.0076   Epoch: 14   Global Step: 82460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:43,995-Speed 5612.16 samples/sec   Loss 2.6296   LearningRate 0.0076   Epoch: 14   Global Step: 82470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:45,828-Speed 5588.02 samples/sec   Loss 2.6817   LearningRate 0.0075   Epoch: 14   Global Step: 82480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:47,655-Speed 5607.33 samples/sec   Loss 2.7218   LearningRate 0.0075   Epoch: 14   Global Step: 82490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:49,518-Speed 5498.92 samples/sec   Loss 2.6633   LearningRate 0.0075   Epoch: 14   Global Step: 82500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:51,350-Speed 5591.46 samples/sec   Loss 2.6887   LearningRate 0.0075   Epoch: 14   Global Step: 82510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:53,169-Speed 5633.58 samples/sec   Loss 2.7310   LearningRate 0.0075   Epoch: 14   Global Step: 82520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:54,996-Speed 5606.55 samples/sec   Loss 2.6777   LearningRate 0.0075   Epoch: 14   Global Step: 82530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:56,816-Speed 5627.56 samples/sec   Loss 2.7379   LearningRate 0.0075   Epoch: 14   Global Step: 82540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:39:58,634-Speed 5635.54 samples/sec   Loss 2.7386   LearningRate 0.0075   Epoch: 14   Global Step: 82550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:00,467-Speed 5587.96 samples/sec   Loss 2.6198   LearningRate 0.0075   Epoch: 14   Global Step: 82560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:02,283-Speed 5640.97 samples/sec   Loss 2.7870   LearningRate 0.0075   Epoch: 14   Global Step: 82570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:04,110-Speed 5606.82 samples/sec   Loss 2.5995   LearningRate 0.0075   Epoch: 14   Global Step: 82580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:05,935-Speed 5612.28 samples/sec   Loss 2.7519   LearningRate 0.0075   Epoch: 14   Global Step: 82590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:07,763-Speed 5603.67 samples/sec   Loss 2.7366   LearningRate 0.0075   Epoch: 14   Global Step: 82600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:09,579-Speed 5641.12 samples/sec   Loss 2.7266   LearningRate 0.0075   Epoch: 14   Global Step: 82610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:11,399-Speed 5627.81 samples/sec   Loss 2.5890   LearningRate 0.0075   Epoch: 14   Global Step: 82620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:13,266-Speed 5483.90 samples/sec   Loss 2.7095   LearningRate 0.0075   Epoch: 14   Global Step: 82630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:15,090-Speed 5616.77 samples/sec   Loss 2.7008   LearningRate 0.0075   Epoch: 14   Global Step: 82640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:16,928-Speed 5573.03 samples/sec   Loss 2.6654   LearningRate 0.0075   Epoch: 14   Global Step: 82650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:18,781-Speed 5530.16 samples/sec   Loss 2.6664   LearningRate 0.0075   Epoch: 14   Global Step: 82660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:20,605-Speed 5614.08 samples/sec   Loss 2.6383   LearningRate 0.0075   Epoch: 14   Global Step: 82670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:22,433-Speed 5603.11 samples/sec   Loss 2.6662   LearningRate 0.0075   Epoch: 14   Global Step: 82680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:24,291-Speed 5514.90 samples/sec   Loss 2.6801   LearningRate 0.0074   Epoch: 14   Global Step: 82690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:26,109-Speed 5634.58 samples/sec   Loss 2.6465   LearningRate 0.0074   Epoch: 14   Global Step: 82700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:27,913-Speed 5677.15 samples/sec   Loss 2.6234   LearningRate 0.0074   Epoch: 14   Global Step: 82710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:29,734-Speed 5624.95 samples/sec   Loss 2.7013   LearningRate 0.0074   Epoch: 14   Global Step: 82720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:31,559-Speed 5612.57 samples/sec   Loss 2.7149   LearningRate 0.0074   Epoch: 14   Global Step: 82730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:33,377-Speed 5634.36 samples/sec   Loss 2.7435   LearningRate 0.0074   Epoch: 14   Global Step: 82740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:35,194-Speed 5639.02 samples/sec   Loss 2.7435   LearningRate 0.0074   Epoch: 14   Global Step: 82750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:37,021-Speed 5606.68 samples/sec   Loss 2.6208   LearningRate 0.0074   Epoch: 14   Global Step: 82760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:38,861-Speed 5568.19 samples/sec   Loss 2.7513   LearningRate 0.0074   Epoch: 14   Global Step: 82770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:40,679-Speed 5633.58 samples/sec   Loss 2.7506   LearningRate 0.0074   Epoch: 14   Global Step: 82780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:42,499-Speed 5626.87 samples/sec   Loss 2.7034   LearningRate 0.0074   Epoch: 14   Global Step: 82790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:44,325-Speed 5609.96 samples/sec   Loss 2.6048   LearningRate 0.0074   Epoch: 14   Global Step: 82800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:46,133-Speed 5665.34 samples/sec   Loss 2.7114   LearningRate 0.0074   Epoch: 14   Global Step: 82810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:47,953-Speed 5629.66 samples/sec   Loss 2.7167   LearningRate 0.0074   Epoch: 14   Global Step: 82820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:49,767-Speed 5645.54 samples/sec   Loss 2.8593   LearningRate 0.0074   Epoch: 14   Global Step: 82830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:51,593-Speed 5610.14 samples/sec   Loss 2.7151   LearningRate 0.0074   Epoch: 14   Global Step: 82840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:53,455-Speed 5500.42 samples/sec   Loss 2.6705   LearningRate 0.0074   Epoch: 14   Global Step: 82850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:55,284-Speed 5602.68 samples/sec   Loss 2.6260   LearningRate 0.0074   Epoch: 14   Global Step: 82860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:57,105-Speed 5623.20 samples/sec   Loss 2.6487   LearningRate 0.0074   Epoch: 14   Global Step: 82870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:40:58,934-Speed 5599.77 samples/sec   Loss 2.6384   LearningRate 0.0074   Epoch: 14   Global Step: 82880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:00,750-Speed 5641.95 samples/sec   Loss 2.5977   LearningRate 0.0073   Epoch: 14   Global Step: 82890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:02,574-Speed 5616.83 samples/sec   Loss 2.6478   LearningRate 0.0073   Epoch: 14   Global Step: 82900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:04,396-Speed 5622.68 samples/sec   Loss 2.7669   LearningRate 0.0073   Epoch: 14   Global Step: 82910   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:41:06,206-Speed 5659.42 samples/sec   Loss 2.7189   LearningRate 0.0073   Epoch: 14   Global Step: 82920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:08,018-Speed 5654.30 samples/sec   Loss 2.7360   LearningRate 0.0073   Epoch: 14   Global Step: 82930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:09,849-Speed 5594.17 samples/sec   Loss 2.6590   LearningRate 0.0073   Epoch: 14   Global Step: 82940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:11,663-Speed 5643.80 samples/sec   Loss 2.7152   LearningRate 0.0073   Epoch: 14   Global Step: 82950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:13,482-Speed 5632.62 samples/sec   Loss 2.7553   LearningRate 0.0073   Epoch: 14   Global Step: 82960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:15,307-Speed 5613.86 samples/sec   Loss 2.6020   LearningRate 0.0073   Epoch: 14   Global Step: 82970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:17,126-Speed 5630.78 samples/sec   Loss 2.6798   LearningRate 0.0073   Epoch: 14   Global Step: 82980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:18,936-Speed 5660.03 samples/sec   Loss 2.6929   LearningRate 0.0073   Epoch: 14   Global Step: 82990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:20,755-Speed 5629.54 samples/sec   Loss 2.5959   LearningRate 0.0073   Epoch: 14   Global Step: 83000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:22,573-Speed 5636.31 samples/sec   Loss 2.5891   LearningRate 0.0073   Epoch: 14   Global Step: 83010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:24,399-Speed 5610.17 samples/sec   Loss 2.6728   LearningRate 0.0073   Epoch: 14   Global Step: 83020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:26,212-Speed 5650.02 samples/sec   Loss 2.6180   LearningRate 0.0073   Epoch: 14   Global Step: 83030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:28,025-Speed 5648.64 samples/sec   Loss 2.8103   LearningRate 0.0073   Epoch: 14   Global Step: 83040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:29,846-Speed 5624.61 samples/sec   Loss 2.7299   LearningRate 0.0073   Epoch: 14   Global Step: 83050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:31,678-Speed 5593.51 samples/sec   Loss 2.6272   LearningRate 0.0073   Epoch: 14   Global Step: 83060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:33,488-Speed 5656.97 samples/sec   Loss 2.6588   LearningRate 0.0073   Epoch: 14   Global Step: 83070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:35,304-Speed 5642.48 samples/sec   Loss 2.7562   LearningRate 0.0073   Epoch: 14   Global Step: 83080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:37,121-Speed 5635.91 samples/sec   Loss 2.6569   LearningRate 0.0073   Epoch: 14   Global Step: 83090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:41:38,957-Speed 5580.46 samples/sec   Loss 2.7221   LearningRate 0.0072   Epoch: 14   Global Step: 83100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:41:40,779-Speed 5620.82 samples/sec   Loss 2.6524   LearningRate 0.0072   Epoch: 14   Global Step: 83110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:41:42,591-Speed 5652.76 samples/sec   Loss 2.6699   LearningRate 0.0072   Epoch: 14   Global Step: 83120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:41:44,414-Speed 5619.97 samples/sec   Loss 2.7128   LearningRate 0.0072   Epoch: 14   Global Step: 83130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:41:46,225-Speed 5658.11 samples/sec   Loss 2.6122   LearningRate 0.0072   Epoch: 14   Global Step: 83140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:41:48,046-Speed 5625.92 samples/sec   Loss 2.7026   LearningRate 0.0072   Epoch: 14   Global Step: 83150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:41:49,858-Speed 5653.43 samples/sec   Loss 2.7167   LearningRate 0.0072   Epoch: 14   Global Step: 83160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:41:51,724-Speed 5489.27 samples/sec   Loss 2.7475   LearningRate 0.0072   Epoch: 14   Global Step: 83170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:41:53,603-Speed 5450.02 samples/sec   Loss 2.6022   LearningRate 0.0072   Epoch: 14   Global Step: 83180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:41:55,445-Speed 5561.98 samples/sec   Loss 2.6815   LearningRate 0.0072   Epoch: 14   Global Step: 83190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:57,299-Speed 5525.01 samples/sec   Loss 2.6801   LearningRate 0.0072   Epoch: 14   Global Step: 83200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:41:59,130-Speed 5593.41 samples/sec   Loss 2.6469   LearningRate 0.0072   Epoch: 14   Global Step: 83210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:00,968-Speed 5573.13 samples/sec   Loss 2.7056   LearningRate 0.0072   Epoch: 14   Global Step: 83220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:02,807-Speed 5570.50 samples/sec   Loss 2.6595   LearningRate 0.0072   Epoch: 14   Global Step: 83230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:04,651-Speed 5556.04 samples/sec   Loss 2.6726   LearningRate 0.0072   Epoch: 14   Global Step: 83240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:06,474-Speed 5617.10 samples/sec   Loss 2.7448   LearningRate 0.0072   Epoch: 14   Global Step: 83250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:08,290-Speed 5643.10 samples/sec   Loss 2.6488   LearningRate 0.0072   Epoch: 14   Global Step: 83260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:10,140-Speed 5536.12 samples/sec   Loss 2.6764   LearningRate 0.0072   Epoch: 14   Global Step: 83270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:11,961-Speed 5625.45 samples/sec   Loss 2.5847   LearningRate 0.0072   Epoch: 14   Global Step: 83280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:13,785-Speed 5616.64 samples/sec   Loss 2.6389   LearningRate 0.0072   Epoch: 14   Global Step: 83290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:42:15,589-Speed 5675.82 samples/sec   Loss 2.5873   LearningRate 0.0072   Epoch: 14   Global Step: 83300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:17,415-Speed 5612.38 samples/sec   Loss 2.7126   LearningRate 0.0072   Epoch: 14   Global Step: 83310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:19,248-Speed 5587.52 samples/sec   Loss 2.6321   LearningRate 0.0071   Epoch: 14   Global Step: 83320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:21,068-Speed 5627.21 samples/sec   Loss 2.6771   LearningRate 0.0071   Epoch: 14   Global Step: 83330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:22,890-Speed 5623.63 samples/sec   Loss 2.6589   LearningRate 0.0071   Epoch: 14   Global Step: 83340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:24,724-Speed 5582.21 samples/sec   Loss 2.6939   LearningRate 0.0071   Epoch: 14   Global Step: 83350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:26,542-Speed 5635.43 samples/sec   Loss 2.5897   LearningRate 0.0071   Epoch: 14   Global Step: 83360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:28,364-Speed 5622.19 samples/sec   Loss 2.6019   LearningRate 0.0071   Epoch: 14   Global Step: 83370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:30,177-Speed 5652.35 samples/sec   Loss 2.6322   LearningRate 0.0071   Epoch: 14   Global Step: 83380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:32,011-Speed 5585.37 samples/sec   Loss 2.7290   LearningRate 0.0071   Epoch: 14   Global Step: 83390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:33,818-Speed 5667.59 samples/sec   Loss 2.6604   LearningRate 0.0071   Epoch: 14   Global Step: 83400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:35,649-Speed 5596.17 samples/sec   Loss 2.7264   LearningRate 0.0071   Epoch: 14   Global Step: 83410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:37,479-Speed 5595.40 samples/sec   Loss 2.5909   LearningRate 0.0071   Epoch: 14   Global Step: 83420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:39,305-Speed 5611.03 samples/sec   Loss 2.6985   LearningRate 0.0071   Epoch: 14   Global Step: 83430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:41,131-Speed 5608.71 samples/sec   Loss 2.6158   LearningRate 0.0071   Epoch: 14   Global Step: 83440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:42,960-Speed 5601.20 samples/sec   Loss 2.5564   LearningRate 0.0071   Epoch: 14   Global Step: 83450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:44,795-Speed 5579.97 samples/sec   Loss 2.7593   LearningRate 0.0071   Epoch: 14   Global Step: 83460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:46,633-Speed 5575.13 samples/sec   Loss 2.6954   LearningRate 0.0071   Epoch: 14   Global Step: 83470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:48,471-Speed 5573.32 samples/sec   Loss 2.4597   LearningRate 0.0071   Epoch: 14   Global Step: 83480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:50,310-Speed 5570.43 samples/sec   Loss 2.6270   LearningRate 0.0071   Epoch: 14   Global Step: 83490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:52,130-Speed 5626.53 samples/sec   Loss 2.5593   LearningRate 0.0071   Epoch: 14   Global Step: 83500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:53,951-Speed 5624.99 samples/sec   Loss 2.6342   LearningRate 0.0071   Epoch: 14   Global Step: 83510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:55,776-Speed 5614.95 samples/sec   Loss 2.6691   LearningRate 0.0071   Epoch: 14   Global Step: 83520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:57,620-Speed 5553.30 samples/sec   Loss 2.6969   LearningRate 0.0070   Epoch: 14   Global Step: 83530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:42:59,437-Speed 5639.80 samples/sec   Loss 2.7596   LearningRate 0.0070   Epoch: 14   Global Step: 83540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:01,277-Speed 5567.39 samples/sec   Loss 2.7612   LearningRate 0.0070   Epoch: 14   Global Step: 83550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:03,125-Speed 5540.54 samples/sec   Loss 2.7060   LearningRate 0.0070   Epoch: 14   Global Step: 83560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:04,943-Speed 5637.22 samples/sec   Loss 2.6652   LearningRate 0.0070   Epoch: 14   Global Step: 83570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:06,766-Speed 5616.17 samples/sec   Loss 2.6612   LearningRate 0.0070   Epoch: 14   Global Step: 83580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:08,583-Speed 5637.45 samples/sec   Loss 2.5481   LearningRate 0.0070   Epoch: 14   Global Step: 83590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:10,390-Speed 5669.24 samples/sec   Loss 2.6439   LearningRate 0.0070   Epoch: 14   Global Step: 83600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:12,228-Speed 5573.90 samples/sec   Loss 2.6683   LearningRate 0.0070   Epoch: 14   Global Step: 83610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:14,057-Speed 5601.26 samples/sec   Loss 2.6370   LearningRate 0.0070   Epoch: 14   Global Step: 83620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:15,884-Speed 5606.10 samples/sec   Loss 2.6307   LearningRate 0.0070   Epoch: 14   Global Step: 83630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:17,731-Speed 5545.37 samples/sec   Loss 2.6809   LearningRate 0.0070   Epoch: 14   Global Step: 83640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:19,565-Speed 5586.23 samples/sec   Loss 2.5866   LearningRate 0.0070   Epoch: 14   Global Step: 83650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:21,387-Speed 5623.43 samples/sec   Loss 2.6670   LearningRate 0.0070   Epoch: 14   Global Step: 83660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:23,227-Speed 5564.18 samples/sec   Loss 2.5566   LearningRate 0.0070   Epoch: 14   Global Step: 83670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:25,063-Speed 5579.05 samples/sec   Loss 2.7552   LearningRate 0.0070   Epoch: 14   Global Step: 83680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:26,891-Speed 5604.43 samples/sec   Loss 2.6978   LearningRate 0.0070   Epoch: 14   Global Step: 83690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:28,717-Speed 5609.57 samples/sec   Loss 2.5905   LearningRate 0.0070   Epoch: 14   Global Step: 83700   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:43:30,534-Speed 5637.31 samples/sec   Loss 2.6565   LearningRate 0.0070   Epoch: 14   Global Step: 83710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:32,393-Speed 5512.22 samples/sec   Loss 2.6497   LearningRate 0.0070   Epoch: 14   Global Step: 83720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:34,209-Speed 5637.56 samples/sec   Loss 2.6513   LearningRate 0.0070   Epoch: 14   Global Step: 83730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:36,035-Speed 5613.25 samples/sec   Loss 2.6481   LearningRate 0.0070   Epoch: 14   Global Step: 83740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:37,847-Speed 5652.68 samples/sec   Loss 2.6149   LearningRate 0.0069   Epoch: 14   Global Step: 83750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:39,668-Speed 5624.04 samples/sec   Loss 2.6596   LearningRate 0.0069   Epoch: 14   Global Step: 83760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:41,494-Speed 5610.52 samples/sec   Loss 2.6330   LearningRate 0.0069   Epoch: 14   Global Step: 83770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:43,315-Speed 5624.22 samples/sec   Loss 2.4894   LearningRate 0.0069   Epoch: 14   Global Step: 83780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:45,146-Speed 5594.77 samples/sec   Loss 2.5485   LearningRate 0.0069   Epoch: 14   Global Step: 83790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:46,971-Speed 5611.78 samples/sec   Loss 2.6688   LearningRate 0.0069   Epoch: 14   Global Step: 83800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:48,816-Speed 5553.94 samples/sec   Loss 2.6589   LearningRate 0.0069   Epoch: 14   Global Step: 83810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:50,644-Speed 5604.19 samples/sec   Loss 2.6538   LearningRate 0.0069   Epoch: 14   Global Step: 83820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:52,476-Speed 5590.24 samples/sec   Loss 2.6860   LearningRate 0.0069   Epoch: 14   Global Step: 83830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:54,310-Speed 5585.96 samples/sec   Loss 2.5738   LearningRate 0.0069   Epoch: 14   Global Step: 83840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:56,132-Speed 5621.64 samples/sec   Loss 2.5919   LearningRate 0.0069   Epoch: 14   Global Step: 83850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:57,966-Speed 5586.46 samples/sec   Loss 2.6158   LearningRate 0.0069   Epoch: 14   Global Step: 83860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:43:59,816-Speed 5536.47 samples/sec   Loss 2.6846   LearningRate 0.0069   Epoch: 14   Global Step: 83870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:44:01,661-Speed 5552.03 samples/sec   Loss 2.6813   LearningRate 0.0069   Epoch: 14   Global Step: 83880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:44:03,489-Speed 5604.55 samples/sec   Loss 2.6300   LearningRate 0.0069   Epoch: 14   Global Step: 83890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:44:05,310-Speed 5624.85 samples/sec   Loss 2.6099   LearningRate 0.0069   Epoch: 14   Global Step: 83900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:44:07,135-Speed 5611.33 samples/sec   Loss 2.7045   LearningRate 0.0069   Epoch: 14   Global Step: 83910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:44:08,978-Speed 5557.75 samples/sec   Loss 2.7896   LearningRate 0.0069   Epoch: 14   Global Step: 83920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:44:10,800-Speed 5622.29 samples/sec   Loss 2.7009   LearningRate 0.0069   Epoch: 14   Global Step: 83930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:44:12,637-Speed 5577.00 samples/sec   Loss 2.6710   LearningRate 0.0069   Epoch: 14   Global Step: 83940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:44:14,452-Speed 5644.38 samples/sec   Loss 2.6449   LearningRate 0.0069   Epoch: 14   Global Step: 83950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:44:16,260-Speed 5664.87 samples/sec   Loss 2.7128   LearningRate 0.0068   Epoch: 14   Global Step: 83960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:44:18,088-Speed 5604.39 samples/sec   Loss 2.6024   LearningRate 0.0068   Epoch: 14   Global Step: 83970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:44:19,912-Speed 5615.24 samples/sec   Loss 2.5303   LearningRate 0.0068   Epoch: 14   Global Step: 83980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:44:21,727-Speed 5644.27 samples/sec   Loss 2.7667   LearningRate 0.0068   Epoch: 14   Global Step: 83990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:44:23,571-Speed 5556.45 samples/sec   Loss 2.5808   LearningRate 0.0068   Epoch: 14   Global Step: 84000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:44:49,760-[lfw][84000]XNorm: 22.024569
Training: 2022-04-27 06:44:49,760-[lfw][84000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-04-27 06:44:49,761-[lfw][84000]Accuracy-Highest: 0.99800
Training: 2022-04-27 06:45:20,393-[cfp_fp][84000]XNorm: 20.142442
Training: 2022-04-27 06:45:20,393-[cfp_fp][84000]Accuracy-Flip: 0.97114+-0.00728
Training: 2022-04-27 06:45:20,394-[cfp_fp][84000]Accuracy-Highest: 0.97114
Training: 2022-04-27 06:45:46,759-[agedb_30][84000]XNorm: 22.211608
Training: 2022-04-27 06:45:46,760-[agedb_30][84000]Accuracy-Flip: 0.98117+-0.00730
Training: 2022-04-27 06:45:46,760-[agedb_30][84000]Accuracy-Highest: 0.98117
Training: 2022-04-27 06:45:48,574-Speed 120.47 samples/sec   Loss 2.5513   LearningRate 0.0068   Epoch: 14   Global Step: 84010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:45:50,380-Speed 5673.84 samples/sec   Loss 2.6131   LearningRate 0.0068   Epoch: 14   Global Step: 84020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:45:52,216-Speed 5577.93 samples/sec   Loss 2.6097   LearningRate 0.0068   Epoch: 14   Global Step: 84030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:45:54,026-Speed 5660.10 samples/sec   Loss 2.7747   LearningRate 0.0068   Epoch: 14   Global Step: 84040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:45:55,839-Speed 5650.24 samples/sec   Loss 2.6565   LearningRate 0.0068   Epoch: 14   Global Step: 84050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:45:57,664-Speed 5612.97 samples/sec   Loss 2.6472   LearningRate 0.0068   Epoch: 14   Global Step: 84060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:45:59,473-Speed 5660.26 samples/sec   Loss 2.5819   LearningRate 0.0068   Epoch: 14   Global Step: 84070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:01,281-Speed 5667.19 samples/sec   Loss 2.7385   LearningRate 0.0068   Epoch: 14   Global Step: 84080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:03,113-Speed 5589.58 samples/sec   Loss 2.6308   LearningRate 0.0068   Epoch: 14   Global Step: 84090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:04,924-Speed 5657.00 samples/sec   Loss 2.6345   LearningRate 0.0068   Epoch: 14   Global Step: 84100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:06,724-Speed 5690.78 samples/sec   Loss 2.6060   LearningRate 0.0068   Epoch: 14   Global Step: 84110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:08,543-Speed 5630.46 samples/sec   Loss 2.5064   LearningRate 0.0068   Epoch: 14   Global Step: 84120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:10,363-Speed 5628.45 samples/sec   Loss 2.6954   LearningRate 0.0068   Epoch: 14   Global Step: 84130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:12,177-Speed 5646.77 samples/sec   Loss 2.6785   LearningRate 0.0068   Epoch: 14   Global Step: 84140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:14,008-Speed 5594.30 samples/sec   Loss 2.6318   LearningRate 0.0068   Epoch: 14   Global Step: 84150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:15,864-Speed 5518.06 samples/sec   Loss 2.7280   LearningRate 0.0068   Epoch: 14   Global Step: 84160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:17,725-Speed 5506.68 samples/sec   Loss 2.6775   LearningRate 0.0068   Epoch: 14   Global Step: 84170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:19,550-Speed 5614.26 samples/sec   Loss 2.5960   LearningRate 0.0067   Epoch: 14   Global Step: 84180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:21,364-Speed 5644.85 samples/sec   Loss 2.5573   LearningRate 0.0067   Epoch: 14   Global Step: 84190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:23,181-Speed 5638.56 samples/sec   Loss 2.6570   LearningRate 0.0067   Epoch: 14   Global Step: 84200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:25,001-Speed 5626.66 samples/sec   Loss 2.5749   LearningRate 0.0067   Epoch: 14   Global Step: 84210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:26,808-Speed 5669.39 samples/sec   Loss 2.6012   LearningRate 0.0067   Epoch: 14   Global Step: 84220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:28,624-Speed 5644.80 samples/sec   Loss 2.6836   LearningRate 0.0067   Epoch: 14   Global Step: 84230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:30,460-Speed 5578.14 samples/sec   Loss 2.6388   LearningRate 0.0067   Epoch: 14   Global Step: 84240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:32,276-Speed 5639.48 samples/sec   Loss 2.6885   LearningRate 0.0067   Epoch: 14   Global Step: 84250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:34,105-Speed 5600.17 samples/sec   Loss 2.6109   LearningRate 0.0067   Epoch: 14   Global Step: 84260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:35,920-Speed 5645.60 samples/sec   Loss 2.6135   LearningRate 0.0067   Epoch: 14   Global Step: 84270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:37,786-Speed 5491.27 samples/sec   Loss 2.5886   LearningRate 0.0067   Epoch: 14   Global Step: 84280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:39,619-Speed 5588.14 samples/sec   Loss 2.6574   LearningRate 0.0067   Epoch: 14   Global Step: 84290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:41,457-Speed 5572.10 samples/sec   Loss 2.5803   LearningRate 0.0067   Epoch: 14   Global Step: 84300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:43,261-Speed 5679.56 samples/sec   Loss 2.6717   LearningRate 0.0067   Epoch: 14   Global Step: 84310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:45,094-Speed 5587.22 samples/sec   Loss 2.6070   LearningRate 0.0067   Epoch: 14   Global Step: 84320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:46,915-Speed 5624.38 samples/sec   Loss 2.5739   LearningRate 0.0067   Epoch: 14   Global Step: 84330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:48,746-Speed 5593.56 samples/sec   Loss 2.6426   LearningRate 0.0067   Epoch: 14   Global Step: 84340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:50,621-Speed 5463.23 samples/sec   Loss 2.6037   LearningRate 0.0067   Epoch: 14   Global Step: 84350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:52,542-Speed 5334.06 samples/sec   Loss 2.5716   LearningRate 0.0067   Epoch: 14   Global Step: 84360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:54,382-Speed 5566.73 samples/sec   Loss 2.5081   LearningRate 0.0067   Epoch: 14   Global Step: 84370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:56,211-Speed 5600.03 samples/sec   Loss 2.6336   LearningRate 0.0067   Epoch: 14   Global Step: 84380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:46:58,045-Speed 5589.11 samples/sec   Loss 2.7026   LearningRate 0.0067   Epoch: 14   Global Step: 84390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:46:59,885-Speed 5566.55 samples/sec   Loss 2.5572   LearningRate 0.0066   Epoch: 14   Global Step: 84400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:47:01,726-Speed 5564.05 samples/sec   Loss 2.7601   LearningRate 0.0066   Epoch: 14   Global Step: 84410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:47:03,544-Speed 5633.04 samples/sec   Loss 2.6320   LearningRate 0.0066   Epoch: 14   Global Step: 84420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:47:05,391-Speed 5545.36 samples/sec   Loss 2.5367   LearningRate 0.0066   Epoch: 14   Global Step: 84430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:47:07,260-Speed 5480.12 samples/sec   Loss 2.6715   LearningRate 0.0066   Epoch: 14   Global Step: 84440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:47:09,111-Speed 5536.03 samples/sec   Loss 2.6428   LearningRate 0.0066   Epoch: 14   Global Step: 84450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:47:10,931-Speed 5628.22 samples/sec   Loss 2.7021   LearningRate 0.0066   Epoch: 14   Global Step: 84460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:47:12,767-Speed 5577.65 samples/sec   Loss 2.6363   LearningRate 0.0066   Epoch: 14   Global Step: 84470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:47:14,594-Speed 5608.74 samples/sec   Loss 2.6622   LearningRate 0.0066   Epoch: 14   Global Step: 84480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:47:16,419-Speed 5611.56 samples/sec   Loss 2.6062   LearningRate 0.0066   Epoch: 14   Global Step: 84490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:18,272-Speed 5529.36 samples/sec   Loss 2.5801   LearningRate 0.0066   Epoch: 14   Global Step: 84500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:20,104-Speed 5589.79 samples/sec   Loss 2.6283   LearningRate 0.0066   Epoch: 14   Global Step: 84510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:21,933-Speed 5601.20 samples/sec   Loss 2.6897   LearningRate 0.0066   Epoch: 14   Global Step: 84520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:23,832-Speed 5394.16 samples/sec   Loss 2.5790   LearningRate 0.0066   Epoch: 14   Global Step: 84530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:25,756-Speed 5324.09 samples/sec   Loss 2.6156   LearningRate 0.0066   Epoch: 14   Global Step: 84540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:27,609-Speed 5529.68 samples/sec   Loss 2.5165   LearningRate 0.0066   Epoch: 14   Global Step: 84550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:29,433-Speed 5614.99 samples/sec   Loss 2.6698   LearningRate 0.0066   Epoch: 14   Global Step: 84560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:31,241-Speed 5664.25 samples/sec   Loss 2.6042   LearningRate 0.0066   Epoch: 14   Global Step: 84570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:33,078-Speed 5576.64 samples/sec   Loss 2.5851   LearningRate 0.0066   Epoch: 14   Global Step: 84580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:34,894-Speed 5641.45 samples/sec   Loss 2.6226   LearningRate 0.0066   Epoch: 14   Global Step: 84590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:36,715-Speed 5625.73 samples/sec   Loss 2.5954   LearningRate 0.0066   Epoch: 14   Global Step: 84600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:38,566-Speed 5534.44 samples/sec   Loss 2.6095   LearningRate 0.0066   Epoch: 14   Global Step: 84610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:40,400-Speed 5583.93 samples/sec   Loss 2.5809   LearningRate 0.0065   Epoch: 14   Global Step: 84620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:42,227-Speed 5608.59 samples/sec   Loss 2.6671   LearningRate 0.0065   Epoch: 14   Global Step: 84630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:44,055-Speed 5602.89 samples/sec   Loss 2.5965   LearningRate 0.0065   Epoch: 14   Global Step: 84640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:45,907-Speed 5532.39 samples/sec   Loss 2.6509   LearningRate 0.0065   Epoch: 14   Global Step: 84650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:47,728-Speed 5625.08 samples/sec   Loss 2.5883   LearningRate 0.0065   Epoch: 14   Global Step: 84660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:49,537-Speed 5661.72 samples/sec   Loss 2.6662   LearningRate 0.0065   Epoch: 14   Global Step: 84670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:51,366-Speed 5600.53 samples/sec   Loss 2.5802   LearningRate 0.0065   Epoch: 14   Global Step: 84680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:53,185-Speed 5629.08 samples/sec   Loss 2.4683   LearningRate 0.0065   Epoch: 14   Global Step: 84690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:55,001-Speed 5640.91 samples/sec   Loss 2.6148   LearningRate 0.0065   Epoch: 14   Global Step: 84700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:56,840-Speed 5572.20 samples/sec   Loss 2.5323   LearningRate 0.0065   Epoch: 14   Global Step: 84710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:47:58,679-Speed 5567.96 samples/sec   Loss 2.4876   LearningRate 0.0065   Epoch: 14   Global Step: 84720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:00,504-Speed 5614.06 samples/sec   Loss 2.5191   LearningRate 0.0065   Epoch: 14   Global Step: 84730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:02,322-Speed 5633.80 samples/sec   Loss 2.5679   LearningRate 0.0065   Epoch: 14   Global Step: 84740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:04,149-Speed 5608.55 samples/sec   Loss 2.7268   LearningRate 0.0065   Epoch: 14   Global Step: 84750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:05,981-Speed 5591.70 samples/sec   Loss 2.5880   LearningRate 0.0065   Epoch: 14   Global Step: 84760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:07,826-Speed 5549.70 samples/sec   Loss 2.5985   LearningRate 0.0065   Epoch: 14   Global Step: 84770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:09,679-Speed 5529.99 samples/sec   Loss 2.6411   LearningRate 0.0065   Epoch: 14   Global Step: 84780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:11,490-Speed 5656.01 samples/sec   Loss 2.6510   LearningRate 0.0065   Epoch: 14   Global Step: 84790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:13,299-Speed 5660.28 samples/sec   Loss 2.5470   LearningRate 0.0065   Epoch: 14   Global Step: 84800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:15,120-Speed 5628.23 samples/sec   Loss 2.5932   LearningRate 0.0065   Epoch: 14   Global Step: 84810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:16,952-Speed 5589.08 samples/sec   Loss 2.5257   LearningRate 0.0065   Epoch: 14   Global Step: 84820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:18,761-Speed 5663.34 samples/sec   Loss 2.6099   LearningRate 0.0065   Epoch: 14   Global Step: 84830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:48:20,595-Speed 5584.46 samples/sec   Loss 2.6484   LearningRate 0.0064   Epoch: 14   Global Step: 84840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:48:22,404-Speed 5661.79 samples/sec   Loss 2.6330   LearningRate 0.0064   Epoch: 14   Global Step: 84850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:48:24,238-Speed 5587.85 samples/sec   Loss 2.6793   LearningRate 0.0064   Epoch: 14   Global Step: 84860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:48:26,100-Speed 5500.09 samples/sec   Loss 2.6867   LearningRate 0.0064   Epoch: 14   Global Step: 84870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:48:27,931-Speed 5595.04 samples/sec   Loss 2.5743   LearningRate 0.0064   Epoch: 14   Global Step: 84880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:48:29,756-Speed 5611.04 samples/sec   Loss 2.6052   LearningRate 0.0064   Epoch: 14   Global Step: 84890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:48:31,595-Speed 5572.41 samples/sec   Loss 2.5374   LearningRate 0.0064   Epoch: 14   Global Step: 84900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:48:33,401-Speed 5671.04 samples/sec   Loss 2.6160   LearningRate 0.0064   Epoch: 14   Global Step: 84910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:48:35,219-Speed 5633.50 samples/sec   Loss 2.6342   LearningRate 0.0064   Epoch: 14   Global Step: 84920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:48:37,036-Speed 5638.78 samples/sec   Loss 2.6894   LearningRate 0.0064   Epoch: 14   Global Step: 84930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:38,856-Speed 5628.12 samples/sec   Loss 2.6456   LearningRate 0.0064   Epoch: 14   Global Step: 84940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:40,677-Speed 5623.43 samples/sec   Loss 2.5845   LearningRate 0.0064   Epoch: 14   Global Step: 84950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:42,497-Speed 5629.56 samples/sec   Loss 2.6143   LearningRate 0.0064   Epoch: 14   Global Step: 84960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:44,344-Speed 5546.69 samples/sec   Loss 2.6411   LearningRate 0.0064   Epoch: 14   Global Step: 84970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:46,172-Speed 5603.41 samples/sec   Loss 2.6651   LearningRate 0.0064   Epoch: 14   Global Step: 84980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:48,013-Speed 5564.95 samples/sec   Loss 2.5621   LearningRate 0.0064   Epoch: 14   Global Step: 84990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:49,846-Speed 5586.81 samples/sec   Loss 2.5248   LearningRate 0.0064   Epoch: 14   Global Step: 85000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:51,681-Speed 5582.58 samples/sec   Loss 2.5623   LearningRate 0.0064   Epoch: 14   Global Step: 85010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:53,511-Speed 5598.37 samples/sec   Loss 2.6380   LearningRate 0.0064   Epoch: 14   Global Step: 85020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:55,327-Speed 5640.55 samples/sec   Loss 2.5419   LearningRate 0.0064   Epoch: 14   Global Step: 85030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:57,153-Speed 5607.27 samples/sec   Loss 2.5507   LearningRate 0.0064   Epoch: 14   Global Step: 85040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:48:58,966-Speed 5651.46 samples/sec   Loss 2.5960   LearningRate 0.0064   Epoch: 14   Global Step: 85050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:00,792-Speed 5609.15 samples/sec   Loss 2.6111   LearningRate 0.0064   Epoch: 14   Global Step: 85060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:02,613-Speed 5624.40 samples/sec   Loss 2.5768   LearningRate 0.0063   Epoch: 14   Global Step: 85070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:04,432-Speed 5633.65 samples/sec   Loss 2.5767   LearningRate 0.0063   Epoch: 14   Global Step: 85080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:06,251-Speed 5630.00 samples/sec   Loss 2.5588   LearningRate 0.0063   Epoch: 14   Global Step: 85090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:08,069-Speed 5634.07 samples/sec   Loss 2.6024   LearningRate 0.0063   Epoch: 14   Global Step: 85100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:09,881-Speed 5655.23 samples/sec   Loss 2.5219   LearningRate 0.0063   Epoch: 14   Global Step: 85110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:11,733-Speed 5531.81 samples/sec   Loss 2.4864   LearningRate 0.0063   Epoch: 14   Global Step: 85120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:13,557-Speed 5615.16 samples/sec   Loss 2.4748   LearningRate 0.0063   Epoch: 14   Global Step: 85130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:15,395-Speed 5571.83 samples/sec   Loss 2.6555   LearningRate 0.0063   Epoch: 14   Global Step: 85140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:17,222-Speed 5608.87 samples/sec   Loss 2.5302   LearningRate 0.0063   Epoch: 14   Global Step: 85150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:19,063-Speed 5563.55 samples/sec   Loss 2.5971   LearningRate 0.0063   Epoch: 14   Global Step: 85160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:20,879-Speed 5638.09 samples/sec   Loss 2.5704   LearningRate 0.0063   Epoch: 14   Global Step: 85170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:22,690-Speed 5655.99 samples/sec   Loss 2.5099   LearningRate 0.0063   Epoch: 14   Global Step: 85180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:24,514-Speed 5616.38 samples/sec   Loss 2.5158   LearningRate 0.0063   Epoch: 14   Global Step: 85190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:26,346-Speed 5592.21 samples/sec   Loss 2.6176   LearningRate 0.0063   Epoch: 14   Global Step: 85200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:28,164-Speed 5632.82 samples/sec   Loss 2.4966   LearningRate 0.0063   Epoch: 14   Global Step: 85210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:29,976-Speed 5653.85 samples/sec   Loss 2.5662   LearningRate 0.0063   Epoch: 14   Global Step: 85220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:31,803-Speed 5608.87 samples/sec   Loss 2.6640   LearningRate 0.0063   Epoch: 14   Global Step: 85230   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:49:33,607-Speed 5677.45 samples/sec   Loss 2.6414   LearningRate 0.0063   Epoch: 14   Global Step: 85240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:35,432-Speed 5613.60 samples/sec   Loss 2.4453   LearningRate 0.0063   Epoch: 14   Global Step: 85250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:37,259-Speed 5607.83 samples/sec   Loss 2.6367   LearningRate 0.0063   Epoch: 14   Global Step: 85260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:39,100-Speed 5564.58 samples/sec   Loss 2.6090   LearningRate 0.0063   Epoch: 14   Global Step: 85270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:41,020-Speed 5332.76 samples/sec   Loss 2.5434   LearningRate 0.0063   Epoch: 14   Global Step: 85280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:42,832-Speed 5653.43 samples/sec   Loss 2.5866   LearningRate 0.0063   Epoch: 14   Global Step: 85290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:54,343-Speed 889.65 samples/sec   Loss 2.0116   LearningRate 0.0062   Epoch: 15   Global Step: 85300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:56,310-Speed 5210.06 samples/sec   Loss 1.9755   LearningRate 0.0062   Epoch: 15   Global Step: 85310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:49:58,163-Speed 5527.66 samples/sec   Loss 2.0267   LearningRate 0.0062   Epoch: 15   Global Step: 85320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:00,003-Speed 5566.26 samples/sec   Loss 2.0159   LearningRate 0.0062   Epoch: 15   Global Step: 85330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:01,838-Speed 5580.75 samples/sec   Loss 1.9352   LearningRate 0.0062   Epoch: 15   Global Step: 85340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:03,676-Speed 5576.19 samples/sec   Loss 2.0299   LearningRate 0.0062   Epoch: 15   Global Step: 85350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:05,495-Speed 5632.22 samples/sec   Loss 2.0057   LearningRate 0.0062   Epoch: 15   Global Step: 85360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:07,311-Speed 5640.57 samples/sec   Loss 1.8824   LearningRate 0.0062   Epoch: 15   Global Step: 85370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:09,142-Speed 5594.61 samples/sec   Loss 1.9820   LearningRate 0.0062   Epoch: 15   Global Step: 85380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:10,971-Speed 5599.16 samples/sec   Loss 1.9518   LearningRate 0.0062   Epoch: 15   Global Step: 85390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:12,814-Speed 5557.74 samples/sec   Loss 1.8721   LearningRate 0.0062   Epoch: 15   Global Step: 85400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:14,735-Speed 5334.64 samples/sec   Loss 2.0760   LearningRate 0.0062   Epoch: 15   Global Step: 85410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:16,596-Speed 5503.75 samples/sec   Loss 1.9916   LearningRate 0.0062   Epoch: 15   Global Step: 85420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:18,431-Speed 5582.76 samples/sec   Loss 2.0600   LearningRate 0.0062   Epoch: 15   Global Step: 85430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:20,251-Speed 5627.47 samples/sec   Loss 1.9937   LearningRate 0.0062   Epoch: 15   Global Step: 85440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:22,090-Speed 5569.95 samples/sec   Loss 1.9375   LearningRate 0.0062   Epoch: 15   Global Step: 85450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:23,919-Speed 5599.90 samples/sec   Loss 2.0550   LearningRate 0.0062   Epoch: 15   Global Step: 85460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:25,745-Speed 5610.68 samples/sec   Loss 2.0321   LearningRate 0.0062   Epoch: 15   Global Step: 85470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:27,598-Speed 5527.62 samples/sec   Loss 2.1211   LearningRate 0.0062   Epoch: 15   Global Step: 85480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:29,419-Speed 5624.53 samples/sec   Loss 2.0959   LearningRate 0.0062   Epoch: 15   Global Step: 85490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:31,241-Speed 5623.22 samples/sec   Loss 2.0064   LearningRate 0.0062   Epoch: 15   Global Step: 85500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:33,075-Speed 5584.10 samples/sec   Loss 1.9898   LearningRate 0.0062   Epoch: 15   Global Step: 85510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:34,903-Speed 5604.37 samples/sec   Loss 2.0049   LearningRate 0.0061   Epoch: 15   Global Step: 85520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:36,846-Speed 5270.85 samples/sec   Loss 2.1517   LearningRate 0.0061   Epoch: 15   Global Step: 85530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:38,686-Speed 5567.88 samples/sec   Loss 1.9408   LearningRate 0.0061   Epoch: 15   Global Step: 85540   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:50:40,499-Speed 5649.67 samples/sec   Loss 2.0618   LearningRate 0.0061   Epoch: 15   Global Step: 85550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:42,340-Speed 5565.77 samples/sec   Loss 1.9871   LearningRate 0.0061   Epoch: 15   Global Step: 85560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:44,181-Speed 5563.55 samples/sec   Loss 1.9389   LearningRate 0.0061   Epoch: 15   Global Step: 85570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:46,001-Speed 5628.19 samples/sec   Loss 2.0446   LearningRate 0.0061   Epoch: 15   Global Step: 85580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:47,825-Speed 5616.33 samples/sec   Loss 2.0150   LearningRate 0.0061   Epoch: 15   Global Step: 85590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:49,663-Speed 5573.00 samples/sec   Loss 2.0128   LearningRate 0.0061   Epoch: 15   Global Step: 85600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:51,506-Speed 5557.45 samples/sec   Loss 1.9882   LearningRate 0.0061   Epoch: 15   Global Step: 85610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:53,355-Speed 5539.10 samples/sec   Loss 2.1135   LearningRate 0.0061   Epoch: 15   Global Step: 85620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:55,182-Speed 5607.44 samples/sec   Loss 1.9278   LearningRate 0.0061   Epoch: 15   Global Step: 85630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:57,025-Speed 5558.10 samples/sec   Loss 1.9915   LearningRate 0.0061   Epoch: 15   Global Step: 85640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:50:58,869-Speed 5555.37 samples/sec   Loss 2.0626   LearningRate 0.0061   Epoch: 15   Global Step: 85650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:00,711-Speed 5560.21 samples/sec   Loss 2.0905   LearningRate 0.0061   Epoch: 15   Global Step: 85660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:02,546-Speed 5583.52 samples/sec   Loss 2.2154   LearningRate 0.0061   Epoch: 15   Global Step: 85670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:04,388-Speed 5560.20 samples/sec   Loss 1.9661   LearningRate 0.0061   Epoch: 15   Global Step: 85680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:06,211-Speed 5619.88 samples/sec   Loss 2.0248   LearningRate 0.0061   Epoch: 15   Global Step: 85690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:08,054-Speed 5557.90 samples/sec   Loss 2.0430   LearningRate 0.0061   Epoch: 15   Global Step: 85700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:09,867-Speed 5649.45 samples/sec   Loss 2.0308   LearningRate 0.0061   Epoch: 15   Global Step: 85710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:11,695-Speed 5605.21 samples/sec   Loss 2.0825   LearningRate 0.0061   Epoch: 15   Global Step: 85720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:13,554-Speed 5509.27 samples/sec   Loss 2.1556   LearningRate 0.0061   Epoch: 15   Global Step: 85730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:15,373-Speed 5632.69 samples/sec   Loss 2.0923   LearningRate 0.0061   Epoch: 15   Global Step: 85740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:17,181-Speed 5664.25 samples/sec   Loss 1.9388   LearningRate 0.0060   Epoch: 15   Global Step: 85750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:19,014-Speed 5587.78 samples/sec   Loss 1.9458   LearningRate 0.0060   Epoch: 15   Global Step: 85760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:20,840-Speed 5610.95 samples/sec   Loss 2.0412   LearningRate 0.0060   Epoch: 15   Global Step: 85770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:51:22,674-Speed 5586.08 samples/sec   Loss 2.0337   LearningRate 0.0060   Epoch: 15   Global Step: 85780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:51:24,510-Speed 5578.71 samples/sec   Loss 2.0914   LearningRate 0.0060   Epoch: 15   Global Step: 85790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:51:26,329-Speed 5633.97 samples/sec   Loss 1.9600   LearningRate 0.0060   Epoch: 15   Global Step: 85800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:51:28,157-Speed 5603.11 samples/sec   Loss 2.1200   LearningRate 0.0060   Epoch: 15   Global Step: 85810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:51:30,000-Speed 5558.04 samples/sec   Loss 2.0049   LearningRate 0.0060   Epoch: 15   Global Step: 85820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:51:31,859-Speed 5508.92 samples/sec   Loss 2.0686   LearningRate 0.0060   Epoch: 15   Global Step: 85830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:51:33,677-Speed 5637.45 samples/sec   Loss 2.0484   LearningRate 0.0060   Epoch: 15   Global Step: 85840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:51:35,500-Speed 5619.33 samples/sec   Loss 2.0797   LearningRate 0.0060   Epoch: 15   Global Step: 85850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:51:37,323-Speed 5618.72 samples/sec   Loss 2.0786   LearningRate 0.0060   Epoch: 15   Global Step: 85860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:51:39,138-Speed 5643.46 samples/sec   Loss 2.0327   LearningRate 0.0060   Epoch: 15   Global Step: 85870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:40,972-Speed 5582.89 samples/sec   Loss 2.1208   LearningRate 0.0060   Epoch: 15   Global Step: 85880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:42,799-Speed 5608.83 samples/sec   Loss 2.0279   LearningRate 0.0060   Epoch: 15   Global Step: 85890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:44,614-Speed 5642.24 samples/sec   Loss 2.0078   LearningRate 0.0060   Epoch: 15   Global Step: 85900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:46,433-Speed 5631.13 samples/sec   Loss 2.1016   LearningRate 0.0060   Epoch: 15   Global Step: 85910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:48,242-Speed 5663.93 samples/sec   Loss 2.0810   LearningRate 0.0060   Epoch: 15   Global Step: 85920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:50,072-Speed 5596.32 samples/sec   Loss 2.1386   LearningRate 0.0060   Epoch: 15   Global Step: 85930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:51,887-Speed 5656.12 samples/sec   Loss 2.1297   LearningRate 0.0060   Epoch: 15   Global Step: 85940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:53,711-Speed 5616.75 samples/sec   Loss 2.1668   LearningRate 0.0060   Epoch: 15   Global Step: 85950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:55,534-Speed 5620.10 samples/sec   Loss 2.1806   LearningRate 0.0060   Epoch: 15   Global Step: 85960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:51:57,358-Speed 5616.45 samples/sec   Loss 2.0848   LearningRate 0.0060   Epoch: 15   Global Step: 85970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:51:59,175-Speed 5636.50 samples/sec   Loss 2.0757   LearningRate 0.0060   Epoch: 15   Global Step: 85980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:52:01,003-Speed 5602.65 samples/sec   Loss 2.1096   LearningRate 0.0059   Epoch: 15   Global Step: 85990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:52:02,823-Speed 5628.43 samples/sec   Loss 2.1358   LearningRate 0.0059   Epoch: 15   Global Step: 86000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:52:28,877-[lfw][86000]XNorm: 22.929928
Training: 2022-04-27 06:52:28,877-[lfw][86000]Accuracy-Flip: 0.99767+-0.00249
Training: 2022-04-27 06:52:28,878-[lfw][86000]Accuracy-Highest: 0.99800
Training: 2022-04-27 06:52:59,058-[cfp_fp][86000]XNorm: 21.044778
Training: 2022-04-27 06:52:59,058-[cfp_fp][86000]Accuracy-Flip: 0.97357+-0.00806
Training: 2022-04-27 06:52:59,059-[cfp_fp][86000]Accuracy-Highest: 0.97357
Training: 2022-04-27 06:53:25,118-[agedb_30][86000]XNorm: 22.775292
Training: 2022-04-27 06:53:25,119-[agedb_30][86000]Accuracy-Flip: 0.97967+-0.00802
Training: 2022-04-27 06:53:25,119-[agedb_30][86000]Accuracy-Highest: 0.98117
Training: 2022-04-27 06:53:26,976-Speed 121.69 samples/sec   Loss 2.0402   LearningRate 0.0059   Epoch: 15   Global Step: 86010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:28,787-Speed 5655.39 samples/sec   Loss 2.1865   LearningRate 0.0059   Epoch: 15   Global Step: 86020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:30,592-Speed 5673.03 samples/sec   Loss 2.0303   LearningRate 0.0059   Epoch: 15   Global Step: 86030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:32,408-Speed 5642.07 samples/sec   Loss 2.0746   LearningRate 0.0059   Epoch: 15   Global Step: 86040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:34,224-Speed 5641.36 samples/sec   Loss 1.9870   LearningRate 0.0059   Epoch: 15   Global Step: 86050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:36,048-Speed 5614.25 samples/sec   Loss 2.0376   LearningRate 0.0059   Epoch: 15   Global Step: 86060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:37,862-Speed 5647.34 samples/sec   Loss 2.0254   LearningRate 0.0059   Epoch: 15   Global Step: 86070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:39,658-Speed 5703.03 samples/sec   Loss 2.1255   LearningRate 0.0059   Epoch: 15   Global Step: 86080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:41,472-Speed 5647.96 samples/sec   Loss 2.0680   LearningRate 0.0059   Epoch: 15   Global Step: 86090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:43,284-Speed 5653.14 samples/sec   Loss 2.1092   LearningRate 0.0059   Epoch: 15   Global Step: 86100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:45,110-Speed 5609.40 samples/sec   Loss 2.1103   LearningRate 0.0059   Epoch: 15   Global Step: 86110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:46,920-Speed 5659.60 samples/sec   Loss 2.1186   LearningRate 0.0059   Epoch: 15   Global Step: 86120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:48,726-Speed 5671.18 samples/sec   Loss 2.1459   LearningRate 0.0059   Epoch: 15   Global Step: 86130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:50,544-Speed 5633.07 samples/sec   Loss 2.0053   LearningRate 0.0059   Epoch: 15   Global Step: 86140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:52,368-Speed 5616.29 samples/sec   Loss 2.0870   LearningRate 0.0059   Epoch: 15   Global Step: 86150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:54,217-Speed 5540.83 samples/sec   Loss 2.1051   LearningRate 0.0059   Epoch: 15   Global Step: 86160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:56,059-Speed 5561.72 samples/sec   Loss 2.1185   LearningRate 0.0059   Epoch: 15   Global Step: 86170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:53:57,877-Speed 5633.94 samples/sec   Loss 2.0779   LearningRate 0.0059   Epoch: 15   Global Step: 86180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:53:59,703-Speed 5608.37 samples/sec   Loss 2.1111   LearningRate 0.0059   Epoch: 15   Global Step: 86190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:54:01,546-Speed 5560.81 samples/sec   Loss 2.1781   LearningRate 0.0059   Epoch: 15   Global Step: 86200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:54:03,379-Speed 5587.78 samples/sec   Loss 2.1638   LearningRate 0.0059   Epoch: 15   Global Step: 86210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:54:05,215-Speed 5577.28 samples/sec   Loss 2.1265   LearningRate 0.0058   Epoch: 15   Global Step: 86220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:54:07,032-Speed 5638.15 samples/sec   Loss 2.0110   LearningRate 0.0058   Epoch: 15   Global Step: 86230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:54:08,861-Speed 5600.74 samples/sec   Loss 2.1130   LearningRate 0.0058   Epoch: 15   Global Step: 86240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:54:10,696-Speed 5583.63 samples/sec   Loss 2.0459   LearningRate 0.0058   Epoch: 15   Global Step: 86250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:54:12,514-Speed 5633.82 samples/sec   Loss 2.2006   LearningRate 0.0058   Epoch: 15   Global Step: 86260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:54:14,329-Speed 5644.80 samples/sec   Loss 2.1895   LearningRate 0.0058   Epoch: 15   Global Step: 86270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:54:16,146-Speed 5637.95 samples/sec   Loss 2.1463   LearningRate 0.0058   Epoch: 15   Global Step: 86280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:17,977-Speed 5592.10 samples/sec   Loss 2.1214   LearningRate 0.0058   Epoch: 15   Global Step: 86290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:19,802-Speed 5615.06 samples/sec   Loss 2.1382   LearningRate 0.0058   Epoch: 15   Global Step: 86300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:21,626-Speed 5614.26 samples/sec   Loss 2.1494   LearningRate 0.0058   Epoch: 15   Global Step: 86310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:23,463-Speed 5575.73 samples/sec   Loss 2.1444   LearningRate 0.0058   Epoch: 15   Global Step: 86320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:25,316-Speed 5527.53 samples/sec   Loss 2.1491   LearningRate 0.0058   Epoch: 15   Global Step: 86330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:27,154-Speed 5575.17 samples/sec   Loss 2.2616   LearningRate 0.0058   Epoch: 15   Global Step: 86340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:28,986-Speed 5592.92 samples/sec   Loss 2.1763   LearningRate 0.0058   Epoch: 15   Global Step: 86350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:30,853-Speed 5486.52 samples/sec   Loss 2.1084   LearningRate 0.0058   Epoch: 15   Global Step: 86360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:32,676-Speed 5618.50 samples/sec   Loss 2.1854   LearningRate 0.0058   Epoch: 15   Global Step: 86370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:34,478-Speed 5684.68 samples/sec   Loss 2.2200   LearningRate 0.0058   Epoch: 15   Global Step: 86380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:36,329-Speed 5532.71 samples/sec   Loss 2.1528   LearningRate 0.0058   Epoch: 15   Global Step: 86390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:38,159-Speed 5598.56 samples/sec   Loss 2.0905   LearningRate 0.0058   Epoch: 15   Global Step: 86400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:39,989-Speed 5599.80 samples/sec   Loss 2.1203   LearningRate 0.0058   Epoch: 15   Global Step: 86410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:41,859-Speed 5475.49 samples/sec   Loss 2.1240   LearningRate 0.0058   Epoch: 15   Global Step: 86420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:43,785-Speed 5319.51 samples/sec   Loss 2.1421   LearningRate 0.0058   Epoch: 15   Global Step: 86430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:45,607-Speed 5620.84 samples/sec   Loss 2.0460   LearningRate 0.0058   Epoch: 15   Global Step: 86440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:47,478-Speed 5475.88 samples/sec   Loss 2.1705   LearningRate 0.0058   Epoch: 15   Global Step: 86450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:49,364-Speed 5433.30 samples/sec   Loss 1.9865   LearningRate 0.0057   Epoch: 15   Global Step: 86460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:51,216-Speed 5530.27 samples/sec   Loss 2.1136   LearningRate 0.0057   Epoch: 15   Global Step: 86470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:53,131-Speed 5350.57 samples/sec   Loss 2.2682   LearningRate 0.0057   Epoch: 15   Global Step: 86480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:55,015-Speed 5437.84 samples/sec   Loss 2.1597   LearningRate 0.0057   Epoch: 15   Global Step: 86490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:56,854-Speed 5569.62 samples/sec   Loss 2.1872   LearningRate 0.0057   Epoch: 15   Global Step: 86500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:54:58,674-Speed 5627.71 samples/sec   Loss 2.2565   LearningRate 0.0057   Epoch: 15   Global Step: 86510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:00,515-Speed 5563.09 samples/sec   Loss 2.1679   LearningRate 0.0057   Epoch: 15   Global Step: 86520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:02,396-Speed 5446.62 samples/sec   Loss 2.2005   LearningRate 0.0057   Epoch: 15   Global Step: 86530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:04,204-Speed 5666.25 samples/sec   Loss 2.0462   LearningRate 0.0057   Epoch: 15   Global Step: 86540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:06,018-Speed 5645.04 samples/sec   Loss 2.1116   LearningRate 0.0057   Epoch: 15   Global Step: 86550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:07,857-Speed 5570.55 samples/sec   Loss 2.1363   LearningRate 0.0057   Epoch: 15   Global Step: 86560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:09,693-Speed 5579.09 samples/sec   Loss 2.2602   LearningRate 0.0057   Epoch: 15   Global Step: 86570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:11,515-Speed 5623.46 samples/sec   Loss 2.1223   LearningRate 0.0057   Epoch: 15   Global Step: 86580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:13,358-Speed 5557.30 samples/sec   Loss 2.1748   LearningRate 0.0057   Epoch: 15   Global Step: 86590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:15,196-Speed 5573.19 samples/sec   Loss 2.2249   LearningRate 0.0057   Epoch: 15   Global Step: 86600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:17,021-Speed 5612.82 samples/sec   Loss 2.1894   LearningRate 0.0057   Epoch: 15   Global Step: 86610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:18,852-Speed 5595.89 samples/sec   Loss 2.1932   LearningRate 0.0057   Epoch: 15   Global Step: 86620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:20,671-Speed 5632.04 samples/sec   Loss 2.1664   LearningRate 0.0057   Epoch: 15   Global Step: 86630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:22,509-Speed 5571.20 samples/sec   Loss 2.0942   LearningRate 0.0057   Epoch: 15   Global Step: 86640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:24,324-Speed 5645.17 samples/sec   Loss 2.1823   LearningRate 0.0057   Epoch: 15   Global Step: 86650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:26,172-Speed 5541.59 samples/sec   Loss 2.1287   LearningRate 0.0057   Epoch: 15   Global Step: 86660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:27,988-Speed 5640.49 samples/sec   Loss 2.2125   LearningRate 0.0057   Epoch: 15   Global Step: 86670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:29,797-Speed 5662.07 samples/sec   Loss 2.0875   LearningRate 0.0057   Epoch: 15   Global Step: 86680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:55:31,596-Speed 5696.41 samples/sec   Loss 2.2101   LearningRate 0.0056   Epoch: 15   Global Step: 86690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:33,417-Speed 5623.02 samples/sec   Loss 2.1897   LearningRate 0.0056   Epoch: 15   Global Step: 86700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:35,235-Speed 5634.59 samples/sec   Loss 2.1950   LearningRate 0.0056   Epoch: 15   Global Step: 86710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:37,067-Speed 5592.12 samples/sec   Loss 2.1172   LearningRate 0.0056   Epoch: 15   Global Step: 86720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:38,875-Speed 5667.48 samples/sec   Loss 2.1656   LearningRate 0.0056   Epoch: 15   Global Step: 86730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:40,682-Speed 5666.28 samples/sec   Loss 2.1912   LearningRate 0.0056   Epoch: 15   Global Step: 86740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:42,495-Speed 5650.66 samples/sec   Loss 2.1388   LearningRate 0.0056   Epoch: 15   Global Step: 86750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:44,322-Speed 5605.45 samples/sec   Loss 2.1590   LearningRate 0.0056   Epoch: 15   Global Step: 86760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:46,164-Speed 5562.60 samples/sec   Loss 2.1299   LearningRate 0.0056   Epoch: 15   Global Step: 86770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:48,005-Speed 5563.55 samples/sec   Loss 2.0321   LearningRate 0.0056   Epoch: 15   Global Step: 86780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:49,818-Speed 5647.92 samples/sec   Loss 2.1530   LearningRate 0.0056   Epoch: 15   Global Step: 86790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:51,646-Speed 5606.74 samples/sec   Loss 2.1921   LearningRate 0.0056   Epoch: 15   Global Step: 86800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:53,453-Speed 5667.28 samples/sec   Loss 2.1252   LearningRate 0.0056   Epoch: 15   Global Step: 86810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:55,268-Speed 5644.68 samples/sec   Loss 2.1614   LearningRate 0.0056   Epoch: 15   Global Step: 86820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:57,097-Speed 5600.83 samples/sec   Loss 2.2855   LearningRate 0.0056   Epoch: 15   Global Step: 86830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:55:58,916-Speed 5630.86 samples/sec   Loss 2.1968   LearningRate 0.0056   Epoch: 15   Global Step: 86840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:00,746-Speed 5599.69 samples/sec   Loss 2.0245   LearningRate 0.0056   Epoch: 15   Global Step: 86850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:02,555-Speed 5659.73 samples/sec   Loss 2.2508   LearningRate 0.0056   Epoch: 15   Global Step: 86860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:04,371-Speed 5642.21 samples/sec   Loss 2.2915   LearningRate 0.0056   Epoch: 15   Global Step: 86870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:06,235-Speed 5493.86 samples/sec   Loss 2.1983   LearningRate 0.0056   Epoch: 15   Global Step: 86880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:08,061-Speed 5609.85 samples/sec   Loss 2.2859   LearningRate 0.0056   Epoch: 15   Global Step: 86890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:09,876-Speed 5643.76 samples/sec   Loss 2.1532   LearningRate 0.0056   Epoch: 15   Global Step: 86900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:11,704-Speed 5605.31 samples/sec   Loss 2.1817   LearningRate 0.0056   Epoch: 15   Global Step: 86910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:13,539-Speed 5580.22 samples/sec   Loss 2.2524   LearningRate 0.0056   Epoch: 15   Global Step: 86920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:15,350-Speed 5657.04 samples/sec   Loss 2.1999   LearningRate 0.0055   Epoch: 15   Global Step: 86930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:17,169-Speed 5632.95 samples/sec   Loss 2.1122   LearningRate 0.0055   Epoch: 15   Global Step: 86940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:18,995-Speed 5610.04 samples/sec   Loss 2.2064   LearningRate 0.0055   Epoch: 15   Global Step: 86950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:20,818-Speed 5619.18 samples/sec   Loss 2.1783   LearningRate 0.0055   Epoch: 15   Global Step: 86960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:22,628-Speed 5657.32 samples/sec   Loss 2.1267   LearningRate 0.0055   Epoch: 15   Global Step: 86970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:24,448-Speed 5628.43 samples/sec   Loss 2.0952   LearningRate 0.0055   Epoch: 15   Global Step: 86980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:26,246-Speed 5698.55 samples/sec   Loss 2.2614   LearningRate 0.0055   Epoch: 15   Global Step: 86990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:28,064-Speed 5633.63 samples/sec   Loss 2.1687   LearningRate 0.0055   Epoch: 15   Global Step: 87000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:29,885-Speed 5625.31 samples/sec   Loss 2.1627   LearningRate 0.0055   Epoch: 15   Global Step: 87010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:31,730-Speed 5554.51 samples/sec   Loss 2.2317   LearningRate 0.0055   Epoch: 15   Global Step: 87020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:33,568-Speed 5573.35 samples/sec   Loss 2.0718   LearningRate 0.0055   Epoch: 15   Global Step: 87030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:35,486-Speed 5339.03 samples/sec   Loss 2.1591   LearningRate 0.0055   Epoch: 15   Global Step: 87040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:37,377-Speed 5418.84 samples/sec   Loss 2.1902   LearningRate 0.0055   Epoch: 15   Global Step: 87050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:39,278-Speed 5389.12 samples/sec   Loss 2.2289   LearningRate 0.0055   Epoch: 15   Global Step: 87060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:41,214-Speed 5289.92 samples/sec   Loss 2.1828   LearningRate 0.0055   Epoch: 15   Global Step: 87070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:43,062-Speed 5544.65 samples/sec   Loss 2.1686   LearningRate 0.0055   Epoch: 15   Global Step: 87080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:44,904-Speed 5560.96 samples/sec   Loss 2.2984   LearningRate 0.0055   Epoch: 15   Global Step: 87090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:46,733-Speed 5598.45 samples/sec   Loss 2.2052   LearningRate 0.0055   Epoch: 15   Global Step: 87100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:48,600-Speed 5487.88 samples/sec   Loss 2.1936   LearningRate 0.0055   Epoch: 15   Global Step: 87110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:50,421-Speed 5622.60 samples/sec   Loss 2.0682   LearningRate 0.0055   Epoch: 15   Global Step: 87120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:52,251-Speed 5600.27 samples/sec   Loss 2.1940   LearningRate 0.0055   Epoch: 15   Global Step: 87130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:54,084-Speed 5588.12 samples/sec   Loss 2.2478   LearningRate 0.0055   Epoch: 15   Global Step: 87140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:55,977-Speed 5411.26 samples/sec   Loss 2.2196   LearningRate 0.0055   Epoch: 15   Global Step: 87150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:57,887-Speed 5362.03 samples/sec   Loss 2.2099   LearningRate 0.0055   Epoch: 15   Global Step: 87160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:56:59,720-Speed 5587.99 samples/sec   Loss 2.1208   LearningRate 0.0055   Epoch: 15   Global Step: 87170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:01,538-Speed 5635.80 samples/sec   Loss 2.1677   LearningRate 0.0054   Epoch: 15   Global Step: 87180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:03,348-Speed 5660.14 samples/sec   Loss 2.2846   LearningRate 0.0054   Epoch: 15   Global Step: 87190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:05,158-Speed 5659.48 samples/sec   Loss 2.3086   LearningRate 0.0054   Epoch: 15   Global Step: 87200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:06,981-Speed 5617.54 samples/sec   Loss 2.2349   LearningRate 0.0054   Epoch: 15   Global Step: 87210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:08,820-Speed 5572.75 samples/sec   Loss 2.2294   LearningRate 0.0054   Epoch: 15   Global Step: 87220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:10,645-Speed 5612.86 samples/sec   Loss 2.1731   LearningRate 0.0054   Epoch: 15   Global Step: 87230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:12,460-Speed 5643.43 samples/sec   Loss 2.1518   LearningRate 0.0054   Epoch: 15   Global Step: 87240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:14,278-Speed 5631.48 samples/sec   Loss 2.2446   LearningRate 0.0054   Epoch: 15   Global Step: 87250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:16,096-Speed 5636.39 samples/sec   Loss 2.1200   LearningRate 0.0054   Epoch: 15   Global Step: 87260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:17,917-Speed 5623.84 samples/sec   Loss 2.1965   LearningRate 0.0054   Epoch: 15   Global Step: 87270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:19,730-Speed 5648.91 samples/sec   Loss 2.2119   LearningRate 0.0054   Epoch: 15   Global Step: 87280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:21,567-Speed 5576.66 samples/sec   Loss 2.1911   LearningRate 0.0054   Epoch: 15   Global Step: 87290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:23,397-Speed 5600.90 samples/sec   Loss 2.2308   LearningRate 0.0054   Epoch: 15   Global Step: 87300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:25,205-Speed 5663.39 samples/sec   Loss 2.1876   LearningRate 0.0054   Epoch: 15   Global Step: 87310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:27,010-Speed 5675.37 samples/sec   Loss 2.1769   LearningRate 0.0054   Epoch: 15   Global Step: 87320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:28,840-Speed 5597.89 samples/sec   Loss 2.2806   LearningRate 0.0054   Epoch: 15   Global Step: 87330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:30,688-Speed 5543.68 samples/sec   Loss 2.0566   LearningRate 0.0054   Epoch: 15   Global Step: 87340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:32,501-Speed 5649.86 samples/sec   Loss 2.1866   LearningRate 0.0054   Epoch: 15   Global Step: 87350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:34,326-Speed 5612.56 samples/sec   Loss 2.1388   LearningRate 0.0054   Epoch: 15   Global Step: 87360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:36,157-Speed 5593.10 samples/sec   Loss 2.1340   LearningRate 0.0054   Epoch: 15   Global Step: 87370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:37,967-Speed 5660.71 samples/sec   Loss 2.3084   LearningRate 0.0054   Epoch: 15   Global Step: 87380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:39,789-Speed 5621.79 samples/sec   Loss 2.1089   LearningRate 0.0054   Epoch: 15   Global Step: 87390   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:57:41,629-Speed 5569.01 samples/sec   Loss 2.1403   LearningRate 0.0054   Epoch: 15   Global Step: 87400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:43,483-Speed 5524.90 samples/sec   Loss 2.1820   LearningRate 0.0054   Epoch: 15   Global Step: 87410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:45,309-Speed 5607.91 samples/sec   Loss 2.1366   LearningRate 0.0053   Epoch: 15   Global Step: 87420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:47,162-Speed 5529.07 samples/sec   Loss 2.3236   LearningRate 0.0053   Epoch: 15   Global Step: 87430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:48,971-Speed 5662.01 samples/sec   Loss 2.2402   LearningRate 0.0053   Epoch: 15   Global Step: 87440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:50,794-Speed 5619.83 samples/sec   Loss 2.1626   LearningRate 0.0053   Epoch: 15   Global Step: 87450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:52,618-Speed 5616.34 samples/sec   Loss 2.1508   LearningRate 0.0053   Epoch: 15   Global Step: 87460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:54,433-Speed 5644.51 samples/sec   Loss 2.2310   LearningRate 0.0053   Epoch: 15   Global Step: 87470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:56,248-Speed 5642.38 samples/sec   Loss 2.1042   LearningRate 0.0053   Epoch: 15   Global Step: 87480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:58,083-Speed 5582.48 samples/sec   Loss 2.1765   LearningRate 0.0053   Epoch: 15   Global Step: 87490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:57:59,898-Speed 5645.22 samples/sec   Loss 2.2630   LearningRate 0.0053   Epoch: 15   Global Step: 87500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:01,725-Speed 5605.06 samples/sec   Loss 2.2741   LearningRate 0.0053   Epoch: 15   Global Step: 87510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:03,556-Speed 5594.84 samples/sec   Loss 2.1954   LearningRate 0.0053   Epoch: 15   Global Step: 87520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:05,381-Speed 5613.16 samples/sec   Loss 2.1853   LearningRate 0.0053   Epoch: 15   Global Step: 87530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:07,217-Speed 5578.45 samples/sec   Loss 2.1319   LearningRate 0.0053   Epoch: 15   Global Step: 87540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:09,065-Speed 5544.26 samples/sec   Loss 2.1235   LearningRate 0.0053   Epoch: 15   Global Step: 87550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:10,895-Speed 5598.22 samples/sec   Loss 2.2071   LearningRate 0.0053   Epoch: 15   Global Step: 87560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:12,723-Speed 5604.46 samples/sec   Loss 2.1983   LearningRate 0.0053   Epoch: 15   Global Step: 87570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:14,549-Speed 5609.85 samples/sec   Loss 2.1703   LearningRate 0.0053   Epoch: 15   Global Step: 87580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:16,372-Speed 5615.88 samples/sec   Loss 2.2274   LearningRate 0.0053   Epoch: 15   Global Step: 87590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:18,183-Speed 5656.67 samples/sec   Loss 2.2079   LearningRate 0.0053   Epoch: 15   Global Step: 87600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:20,007-Speed 5615.83 samples/sec   Loss 2.2568   LearningRate 0.0053   Epoch: 15   Global Step: 87610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:21,825-Speed 5636.94 samples/sec   Loss 2.1241   LearningRate 0.0053   Epoch: 15   Global Step: 87620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:23,652-Speed 5606.07 samples/sec   Loss 2.1898   LearningRate 0.0053   Epoch: 15   Global Step: 87630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:25,472-Speed 5626.32 samples/sec   Loss 2.1691   LearningRate 0.0053   Epoch: 15   Global Step: 87640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:27,284-Speed 5653.28 samples/sec   Loss 2.1492   LearningRate 0.0053   Epoch: 15   Global Step: 87650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:29,102-Speed 5634.08 samples/sec   Loss 2.2339   LearningRate 0.0053   Epoch: 15   Global Step: 87660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:30,910-Speed 5667.73 samples/sec   Loss 2.2902   LearningRate 0.0052   Epoch: 15   Global Step: 87670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:32,756-Speed 5548.21 samples/sec   Loss 2.1500   LearningRate 0.0052   Epoch: 15   Global Step: 87680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:34,569-Speed 5649.97 samples/sec   Loss 2.1545   LearningRate 0.0052   Epoch: 15   Global Step: 87690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:36,384-Speed 5645.85 samples/sec   Loss 2.2021   LearningRate 0.0052   Epoch: 15   Global Step: 87700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:38,198-Speed 5646.09 samples/sec   Loss 2.0105   LearningRate 0.0052   Epoch: 15   Global Step: 87710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:40,027-Speed 5599.71 samples/sec   Loss 2.2100   LearningRate 0.0052   Epoch: 15   Global Step: 87720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:41,852-Speed 5615.26 samples/sec   Loss 2.1805   LearningRate 0.0052   Epoch: 15   Global Step: 87730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:43,664-Speed 5650.65 samples/sec   Loss 2.2757   LearningRate 0.0052   Epoch: 15   Global Step: 87740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:45,470-Speed 5673.24 samples/sec   Loss 2.2585   LearningRate 0.0052   Epoch: 15   Global Step: 87750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:47,300-Speed 5598.12 samples/sec   Loss 2.1485   LearningRate 0.0052   Epoch: 15   Global Step: 87760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:49,114-Speed 5643.91 samples/sec   Loss 2.2281   LearningRate 0.0052   Epoch: 15   Global Step: 87770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:50,987-Speed 5469.85 samples/sec   Loss 2.1316   LearningRate 0.0052   Epoch: 15   Global Step: 87780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:52,817-Speed 5599.58 samples/sec   Loss 2.1647   LearningRate 0.0052   Epoch: 15   Global Step: 87790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:54,643-Speed 5608.66 samples/sec   Loss 2.1616   LearningRate 0.0052   Epoch: 15   Global Step: 87800   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 06:58:56,445-Speed 5684.99 samples/sec   Loss 2.2761   LearningRate 0.0052   Epoch: 15   Global Step: 87810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:58:58,276-Speed 5594.82 samples/sec   Loss 2.2536   LearningRate 0.0052   Epoch: 15   Global Step: 87820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:00,087-Speed 5655.34 samples/sec   Loss 2.2605   LearningRate 0.0052   Epoch: 15   Global Step: 87830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:01,915-Speed 5603.91 samples/sec   Loss 2.2484   LearningRate 0.0052   Epoch: 15   Global Step: 87840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:03,730-Speed 5645.49 samples/sec   Loss 2.2407   LearningRate 0.0052   Epoch: 15   Global Step: 87850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:05,546-Speed 5638.85 samples/sec   Loss 2.1837   LearningRate 0.0052   Epoch: 15   Global Step: 87860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:07,371-Speed 5612.90 samples/sec   Loss 2.2447   LearningRate 0.0052   Epoch: 15   Global Step: 87870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:09,198-Speed 5606.02 samples/sec   Loss 2.0300   LearningRate 0.0052   Epoch: 15   Global Step: 87880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:11,043-Speed 5552.29 samples/sec   Loss 2.2141   LearningRate 0.0052   Epoch: 15   Global Step: 87890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:12,867-Speed 5617.78 samples/sec   Loss 2.2641   LearningRate 0.0052   Epoch: 15   Global Step: 87900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:14,674-Speed 5667.81 samples/sec   Loss 2.2807   LearningRate 0.0052   Epoch: 15   Global Step: 87910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:16,501-Speed 5607.74 samples/sec   Loss 2.1397   LearningRate 0.0051   Epoch: 15   Global Step: 87920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:18,316-Speed 5642.91 samples/sec   Loss 2.2066   LearningRate 0.0051   Epoch: 15   Global Step: 87930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:20,156-Speed 5567.46 samples/sec   Loss 2.1846   LearningRate 0.0051   Epoch: 15   Global Step: 87940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:21,992-Speed 5579.00 samples/sec   Loss 2.1123   LearningRate 0.0051   Epoch: 15   Global Step: 87950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:23,823-Speed 5593.92 samples/sec   Loss 2.1423   LearningRate 0.0051   Epoch: 15   Global Step: 87960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:25,655-Speed 5592.28 samples/sec   Loss 2.1719   LearningRate 0.0051   Epoch: 15   Global Step: 87970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:27,533-Speed 5453.76 samples/sec   Loss 2.1825   LearningRate 0.0051   Epoch: 15   Global Step: 87980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:29,345-Speed 5653.70 samples/sec   Loss 2.2140   LearningRate 0.0051   Epoch: 15   Global Step: 87990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 06:59:31,147-Speed 5685.52 samples/sec   Loss 2.2411   LearningRate 0.0051   Epoch: 15   Global Step: 88000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 06:59:57,399-[lfw][88000]XNorm: 22.511046
Training: 2022-04-27 06:59:57,399-[lfw][88000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-04-27 06:59:57,400-[lfw][88000]Accuracy-Highest: 0.99800
Training: 2022-04-27 07:00:27,907-[cfp_fp][88000]XNorm: 20.966420
Training: 2022-04-27 07:00:27,907-[cfp_fp][88000]Accuracy-Flip: 0.97357+-0.00754
Training: 2022-04-27 07:00:27,908-[cfp_fp][88000]Accuracy-Highest: 0.97357
Training: 2022-04-27 07:00:54,188-[agedb_30][88000]XNorm: 22.440152
Training: 2022-04-27 07:00:54,189-[agedb_30][88000]Accuracy-Flip: 0.97967+-0.00980
Training: 2022-04-27 07:00:54,189-[agedb_30][88000]Accuracy-Highest: 0.98117
Training: 2022-04-27 07:00:56,043-Speed 120.62 samples/sec   Loss 2.0603   LearningRate 0.0051   Epoch: 15   Global Step: 88010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 07:00:57,958-Speed 5349.67 samples/sec   Loss 2.2235   LearningRate 0.0051   Epoch: 15   Global Step: 88020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 07:00:59,797-Speed 5568.87 samples/sec   Loss 2.0918   LearningRate 0.0051   Epoch: 15   Global Step: 88030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 07:01:01,606-Speed 5663.36 samples/sec   Loss 2.1937   LearningRate 0.0051   Epoch: 15   Global Step: 88040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 07:01:03,418-Speed 5651.94 samples/sec   Loss 2.2387   LearningRate 0.0051   Epoch: 15   Global Step: 88050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 07:01:05,224-Speed 5670.78 samples/sec   Loss 2.2703   LearningRate 0.0051   Epoch: 15   Global Step: 88060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 07:01:07,026-Speed 5684.89 samples/sec   Loss 2.1501   LearningRate 0.0051   Epoch: 15   Global Step: 88070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 07:01:08,871-Speed 5551.36 samples/sec   Loss 2.2396   LearningRate 0.0051   Epoch: 15   Global Step: 88080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 07:01:10,677-Speed 5674.31 samples/sec   Loss 2.1074   LearningRate 0.0051   Epoch: 15   Global Step: 88090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 07:01:12,488-Speed 5656.19 samples/sec   Loss 2.3323   LearningRate 0.0051   Epoch: 15   Global Step: 88100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:14,308-Speed 5627.00 samples/sec   Loss 2.1219   LearningRate 0.0051   Epoch: 15   Global Step: 88110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:16,129-Speed 5623.53 samples/sec   Loss 2.2364   LearningRate 0.0051   Epoch: 15   Global Step: 88120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:17,936-Speed 5671.04 samples/sec   Loss 2.1872   LearningRate 0.0051   Epoch: 15   Global Step: 88130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:19,756-Speed 5628.72 samples/sec   Loss 2.1690   LearningRate 0.0051   Epoch: 15   Global Step: 88140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:21,582-Speed 5609.21 samples/sec   Loss 2.1099   LearningRate 0.0051   Epoch: 15   Global Step: 88150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:23,436-Speed 5525.95 samples/sec   Loss 2.0679   LearningRate 0.0051   Epoch: 15   Global Step: 88160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:25,288-Speed 5530.41 samples/sec   Loss 2.2667   LearningRate 0.0050   Epoch: 15   Global Step: 88170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:27,128-Speed 5568.26 samples/sec   Loss 2.0958   LearningRate 0.0050   Epoch: 15   Global Step: 88180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:28,956-Speed 5600.60 samples/sec   Loss 2.2113   LearningRate 0.0050   Epoch: 15   Global Step: 88190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:30,785-Speed 5602.58 samples/sec   Loss 2.1065   LearningRate 0.0050   Epoch: 15   Global Step: 88200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:32,655-Speed 5477.98 samples/sec   Loss 2.1935   LearningRate 0.0050   Epoch: 15   Global Step: 88210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:34,464-Speed 5660.13 samples/sec   Loss 2.2453   LearningRate 0.0050   Epoch: 15   Global Step: 88220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:36,294-Speed 5597.77 samples/sec   Loss 2.2137   LearningRate 0.0050   Epoch: 15   Global Step: 88230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:38,126-Speed 5592.70 samples/sec   Loss 2.3070   LearningRate 0.0050   Epoch: 15   Global Step: 88240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:39,968-Speed 5561.08 samples/sec   Loss 2.2106   LearningRate 0.0050   Epoch: 15   Global Step: 88250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:41,794-Speed 5608.89 samples/sec   Loss 2.2055   LearningRate 0.0050   Epoch: 15   Global Step: 88260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:43,613-Speed 5630.62 samples/sec   Loss 2.2088   LearningRate 0.0050   Epoch: 15   Global Step: 88270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:45,445-Speed 5591.58 samples/sec   Loss 2.0566   LearningRate 0.0050   Epoch: 15   Global Step: 88280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:47,266-Speed 5625.75 samples/sec   Loss 2.1702   LearningRate 0.0050   Epoch: 15   Global Step: 88290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:49,078-Speed 5653.34 samples/sec   Loss 2.2700   LearningRate 0.0050   Epoch: 15   Global Step: 88300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:51,009-Speed 5304.67 samples/sec   Loss 2.2093   LearningRate 0.0050   Epoch: 15   Global Step: 88310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:52,944-Speed 5294.11 samples/sec   Loss 2.1890   LearningRate 0.0050   Epoch: 15   Global Step: 88320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:54,778-Speed 5586.40 samples/sec   Loss 2.2527   LearningRate 0.0050   Epoch: 15   Global Step: 88330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:56,612-Speed 5584.94 samples/sec   Loss 2.0351   LearningRate 0.0050   Epoch: 15   Global Step: 88340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:01:58,448-Speed 5578.16 samples/sec   Loss 2.1924   LearningRate 0.0050   Epoch: 15   Global Step: 88350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:00,299-Speed 5534.43 samples/sec   Loss 2.1343   LearningRate 0.0050   Epoch: 15   Global Step: 88360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:02,141-Speed 5559.72 samples/sec   Loss 2.2169   LearningRate 0.0050   Epoch: 15   Global Step: 88370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:04,048-Speed 5373.71 samples/sec   Loss 2.1402   LearningRate 0.0050   Epoch: 15   Global Step: 88380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:05,896-Speed 5542.99 samples/sec   Loss 2.2397   LearningRate 0.0050   Epoch: 15   Global Step: 88390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:07,694-Speed 5695.43 samples/sec   Loss 2.1295   LearningRate 0.0050   Epoch: 15   Global Step: 88400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:09,512-Speed 5638.37 samples/sec   Loss 2.2286   LearningRate 0.0050   Epoch: 15   Global Step: 88410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:11,397-Speed 5434.64 samples/sec   Loss 2.2194   LearningRate 0.0049   Epoch: 15   Global Step: 88420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:13,229-Speed 5590.06 samples/sec   Loss 2.2124   LearningRate 0.0049   Epoch: 15   Global Step: 88430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:15,060-Speed 5595.92 samples/sec   Loss 2.0515   LearningRate 0.0049   Epoch: 15   Global Step: 88440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:16,874-Speed 5644.65 samples/sec   Loss 2.1393   LearningRate 0.0049   Epoch: 15   Global Step: 88450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:18,687-Speed 5651.87 samples/sec   Loss 2.2354   LearningRate 0.0049   Epoch: 15   Global Step: 88460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:20,495-Speed 5666.00 samples/sec   Loss 2.2680   LearningRate 0.0049   Epoch: 15   Global Step: 88470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:22,305-Speed 5657.46 samples/sec   Loss 2.2333   LearningRate 0.0049   Epoch: 15   Global Step: 88480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:24,121-Speed 5639.99 samples/sec   Loss 2.1917   LearningRate 0.0049   Epoch: 15   Global Step: 88490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:25,925-Speed 5680.26 samples/sec   Loss 2.1831   LearningRate 0.0049   Epoch: 15   Global Step: 88500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:27,736-Speed 5654.94 samples/sec   Loss 2.2320   LearningRate 0.0049   Epoch: 15   Global Step: 88510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:29,553-Speed 5640.02 samples/sec   Loss 2.1810   LearningRate 0.0049   Epoch: 15   Global Step: 88520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:31,373-Speed 5629.13 samples/sec   Loss 2.2225   LearningRate 0.0049   Epoch: 15   Global Step: 88530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:33,206-Speed 5587.12 samples/sec   Loss 2.1590   LearningRate 0.0049   Epoch: 15   Global Step: 88540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:35,017-Speed 5657.31 samples/sec   Loss 2.1568   LearningRate 0.0049   Epoch: 15   Global Step: 88550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:36,852-Speed 5581.16 samples/sec   Loss 2.1863   LearningRate 0.0049   Epoch: 15   Global Step: 88560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:38,677-Speed 5611.58 samples/sec   Loss 2.1933   LearningRate 0.0049   Epoch: 15   Global Step: 88570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:40,491-Speed 5648.87 samples/sec   Loss 2.1847   LearningRate 0.0049   Epoch: 15   Global Step: 88580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:42,313-Speed 5620.42 samples/sec   Loss 2.1926   LearningRate 0.0049   Epoch: 15   Global Step: 88590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:44,125-Speed 5653.09 samples/sec   Loss 2.1130   LearningRate 0.0049   Epoch: 15   Global Step: 88600   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 07:02:45,942-Speed 5637.20 samples/sec   Loss 2.3753   LearningRate 0.0049   Epoch: 15   Global Step: 88610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:47,766-Speed 5615.53 samples/sec   Loss 2.1990   LearningRate 0.0049   Epoch: 15   Global Step: 88620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:49,580-Speed 5646.95 samples/sec   Loss 2.0932   LearningRate 0.0049   Epoch: 15   Global Step: 88630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:51,398-Speed 5637.07 samples/sec   Loss 2.2071   LearningRate 0.0049   Epoch: 15   Global Step: 88640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:53,215-Speed 5637.06 samples/sec   Loss 2.1875   LearningRate 0.0049   Epoch: 15   Global Step: 88650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:55,044-Speed 5599.74 samples/sec   Loss 2.1676   LearningRate 0.0049   Epoch: 15   Global Step: 88660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:56,894-Speed 5538.27 samples/sec   Loss 2.0995   LearningRate 0.0049   Epoch: 15   Global Step: 88670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:02:58,725-Speed 5593.78 samples/sec   Loss 2.2179   LearningRate 0.0048   Epoch: 15   Global Step: 88680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:00,539-Speed 5647.20 samples/sec   Loss 2.1339   LearningRate 0.0048   Epoch: 15   Global Step: 88690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:02,361-Speed 5622.39 samples/sec   Loss 2.1838   LearningRate 0.0048   Epoch: 15   Global Step: 88700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:04,188-Speed 5605.14 samples/sec   Loss 2.0964   LearningRate 0.0048   Epoch: 15   Global Step: 88710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:05,994-Speed 5671.77 samples/sec   Loss 2.1467   LearningRate 0.0048   Epoch: 15   Global Step: 88720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:07,813-Speed 5631.45 samples/sec   Loss 2.1576   LearningRate 0.0048   Epoch: 15   Global Step: 88730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:09,644-Speed 5594.86 samples/sec   Loss 2.1972   LearningRate 0.0048   Epoch: 15   Global Step: 88740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:11,466-Speed 5623.70 samples/sec   Loss 2.1677   LearningRate 0.0048   Epoch: 15   Global Step: 88750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:13,281-Speed 5642.97 samples/sec   Loss 2.2237   LearningRate 0.0048   Epoch: 15   Global Step: 88760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:15,105-Speed 5616.02 samples/sec   Loss 2.1933   LearningRate 0.0048   Epoch: 15   Global Step: 88770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:16,913-Speed 5666.94 samples/sec   Loss 2.1822   LearningRate 0.0048   Epoch: 15   Global Step: 88780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:18,737-Speed 5613.76 samples/sec   Loss 2.2749   LearningRate 0.0048   Epoch: 15   Global Step: 88790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:20,559-Speed 5623.10 samples/sec   Loss 2.1673   LearningRate 0.0048   Epoch: 15   Global Step: 88800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:22,369-Speed 5660.30 samples/sec   Loss 2.1519   LearningRate 0.0048   Epoch: 15   Global Step: 88810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:24,189-Speed 5627.17 samples/sec   Loss 2.2020   LearningRate 0.0048   Epoch: 15   Global Step: 88820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:26,007-Speed 5635.29 samples/sec   Loss 2.3039   LearningRate 0.0048   Epoch: 15   Global Step: 88830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:27,820-Speed 5648.64 samples/sec   Loss 2.1733   LearningRate 0.0048   Epoch: 15   Global Step: 88840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:29,637-Speed 5638.42 samples/sec   Loss 2.1549   LearningRate 0.0048   Epoch: 15   Global Step: 88850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:31,450-Speed 5649.48 samples/sec   Loss 2.1679   LearningRate 0.0048   Epoch: 15   Global Step: 88860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:33,268-Speed 5633.67 samples/sec   Loss 2.2046   LearningRate 0.0048   Epoch: 15   Global Step: 88870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:35,082-Speed 5649.10 samples/sec   Loss 2.2404   LearningRate 0.0048   Epoch: 15   Global Step: 88880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:36,934-Speed 5530.67 samples/sec   Loss 2.1444   LearningRate 0.0048   Epoch: 15   Global Step: 88890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:38,747-Speed 5650.77 samples/sec   Loss 2.1401   LearningRate 0.0048   Epoch: 15   Global Step: 88900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:40,554-Speed 5669.72 samples/sec   Loss 2.2226   LearningRate 0.0048   Epoch: 15   Global Step: 88910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:42,383-Speed 5598.74 samples/sec   Loss 2.1203   LearningRate 0.0048   Epoch: 15   Global Step: 88920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:44,195-Speed 5654.69 samples/sec   Loss 2.1580   LearningRate 0.0048   Epoch: 15   Global Step: 88930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:46,018-Speed 5616.59 samples/sec   Loss 2.2465   LearningRate 0.0047   Epoch: 15   Global Step: 88940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:47,859-Speed 5564.95 samples/sec   Loss 2.1916   LearningRate 0.0047   Epoch: 15   Global Step: 88950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:49,686-Speed 5605.45 samples/sec   Loss 2.2578   LearningRate 0.0047   Epoch: 15   Global Step: 88960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:51,523-Speed 5576.99 samples/sec   Loss 2.1505   LearningRate 0.0047   Epoch: 15   Global Step: 88970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:53,347-Speed 5616.43 samples/sec   Loss 2.1417   LearningRate 0.0047   Epoch: 15   Global Step: 88980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:55,183-Speed 5580.35 samples/sec   Loss 2.1423   LearningRate 0.0047   Epoch: 15   Global Step: 88990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:56,997-Speed 5647.42 samples/sec   Loss 2.1979   LearningRate 0.0047   Epoch: 15   Global Step: 89000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:03:58,811-Speed 5645.39 samples/sec   Loss 2.1028   LearningRate 0.0047   Epoch: 15   Global Step: 89010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:00,623-Speed 5654.72 samples/sec   Loss 2.2645   LearningRate 0.0047   Epoch: 15   Global Step: 89020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:02,436-Speed 5648.87 samples/sec   Loss 2.1961   LearningRate 0.0047   Epoch: 15   Global Step: 89030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:04,266-Speed 5598.79 samples/sec   Loss 2.0691   LearningRate 0.0047   Epoch: 15   Global Step: 89040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:06,083-Speed 5638.19 samples/sec   Loss 2.2009   LearningRate 0.0047   Epoch: 15   Global Step: 89050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:07,895-Speed 5652.31 samples/sec   Loss 2.2123   LearningRate 0.0047   Epoch: 15   Global Step: 89060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:09,707-Speed 5652.44 samples/sec   Loss 2.0821   LearningRate 0.0047   Epoch: 15   Global Step: 89070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:11,523-Speed 5642.16 samples/sec   Loss 2.2171   LearningRate 0.0047   Epoch: 15   Global Step: 89080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:13,350-Speed 5605.72 samples/sec   Loss 2.2563   LearningRate 0.0047   Epoch: 15   Global Step: 89090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:15,170-Speed 5626.14 samples/sec   Loss 2.1128   LearningRate 0.0047   Epoch: 15   Global Step: 89100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:16,995-Speed 5615.27 samples/sec   Loss 2.1802   LearningRate 0.0047   Epoch: 15   Global Step: 89110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:18,802-Speed 5668.06 samples/sec   Loss 2.1803   LearningRate 0.0047   Epoch: 15   Global Step: 89120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:20,612-Speed 5660.81 samples/sec   Loss 2.1434   LearningRate 0.0047   Epoch: 15   Global Step: 89130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:22,430-Speed 5634.18 samples/sec   Loss 2.1866   LearningRate 0.0047   Epoch: 15   Global Step: 89140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:24,272-Speed 5560.22 samples/sec   Loss 2.1544   LearningRate 0.0047   Epoch: 15   Global Step: 89150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:26,092-Speed 5628.46 samples/sec   Loss 2.1897   LearningRate 0.0047   Epoch: 15   Global Step: 89160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:27,917-Speed 5613.58 samples/sec   Loss 2.2370   LearningRate 0.0047   Epoch: 15   Global Step: 89170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:29,734-Speed 5635.51 samples/sec   Loss 2.0739   LearningRate 0.0047   Epoch: 15   Global Step: 89180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:31,558-Speed 5616.51 samples/sec   Loss 2.1992   LearningRate 0.0047   Epoch: 15   Global Step: 89190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:33,420-Speed 5501.76 samples/sec   Loss 2.1493   LearningRate 0.0046   Epoch: 15   Global Step: 89200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:35,325-Speed 5376.25 samples/sec   Loss 2.1443   LearningRate 0.0046   Epoch: 15   Global Step: 89210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:37,246-Speed 5334.74 samples/sec   Loss 2.1604   LearningRate 0.0046   Epoch: 15   Global Step: 89220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:39,169-Speed 5325.56 samples/sec   Loss 2.2186   LearningRate 0.0046   Epoch: 15   Global Step: 89230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:41,038-Speed 5481.23 samples/sec   Loss 2.1250   LearningRate 0.0046   Epoch: 15   Global Step: 89240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:42,857-Speed 5629.29 samples/sec   Loss 2.1576   LearningRate 0.0046   Epoch: 15   Global Step: 89250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:44,680-Speed 5620.53 samples/sec   Loss 2.2950   LearningRate 0.0046   Epoch: 15   Global Step: 89260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:46,494-Speed 5647.35 samples/sec   Loss 2.2113   LearningRate 0.0046   Epoch: 15   Global Step: 89270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:48,331-Speed 5573.65 samples/sec   Loss 2.1681   LearningRate 0.0046   Epoch: 15   Global Step: 89280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:50,157-Speed 5611.39 samples/sec   Loss 2.2060   LearningRate 0.0046   Epoch: 15   Global Step: 89290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:51,987-Speed 5596.24 samples/sec   Loss 2.1011   LearningRate 0.0046   Epoch: 15   Global Step: 89300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:53,824-Speed 5576.88 samples/sec   Loss 2.0487   LearningRate 0.0046   Epoch: 15   Global Step: 89310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:55,641-Speed 5640.54 samples/sec   Loss 2.1218   LearningRate 0.0046   Epoch: 15   Global Step: 89320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:57,474-Speed 5586.34 samples/sec   Loss 2.2080   LearningRate 0.0046   Epoch: 15   Global Step: 89330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:04:59,297-Speed 5617.96 samples/sec   Loss 2.1677   LearningRate 0.0046   Epoch: 15   Global Step: 89340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:01,147-Speed 5538.78 samples/sec   Loss 2.1087   LearningRate 0.0046   Epoch: 15   Global Step: 89350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:02,971-Speed 5614.79 samples/sec   Loss 2.1240   LearningRate 0.0046   Epoch: 15   Global Step: 89360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:04,785-Speed 5648.77 samples/sec   Loss 2.1023   LearningRate 0.0046   Epoch: 15   Global Step: 89370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:06,620-Speed 5582.18 samples/sec   Loss 2.1949   LearningRate 0.0046   Epoch: 15   Global Step: 89380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:08,443-Speed 5616.75 samples/sec   Loss 2.1795   LearningRate 0.0046   Epoch: 15   Global Step: 89390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:10,276-Speed 5590.53 samples/sec   Loss 2.1073   LearningRate 0.0046   Epoch: 15   Global Step: 89400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:12,104-Speed 5602.25 samples/sec   Loss 2.2462   LearningRate 0.0046   Epoch: 15   Global Step: 89410   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 07:05:13,911-Speed 5671.60 samples/sec   Loss 2.1129   LearningRate 0.0046   Epoch: 15   Global Step: 89420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:15,724-Speed 5648.86 samples/sec   Loss 2.1681   LearningRate 0.0046   Epoch: 15   Global Step: 89430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:17,542-Speed 5634.13 samples/sec   Loss 2.2560   LearningRate 0.0046   Epoch: 15   Global Step: 89440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:19,353-Speed 5656.36 samples/sec   Loss 2.1352   LearningRate 0.0046   Epoch: 15   Global Step: 89450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:21,166-Speed 5649.17 samples/sec   Loss 2.1240   LearningRate 0.0046   Epoch: 15   Global Step: 89460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:22,988-Speed 5622.67 samples/sec   Loss 2.1389   LearningRate 0.0045   Epoch: 15   Global Step: 89470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:24,803-Speed 5643.08 samples/sec   Loss 2.1804   LearningRate 0.0045   Epoch: 15   Global Step: 89480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:26,628-Speed 5612.77 samples/sec   Loss 2.1456   LearningRate 0.0045   Epoch: 15   Global Step: 89490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:28,457-Speed 5600.69 samples/sec   Loss 2.2088   LearningRate 0.0045   Epoch: 15   Global Step: 89500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:30,294-Speed 5577.10 samples/sec   Loss 2.1387   LearningRate 0.0045   Epoch: 15   Global Step: 89510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:32,103-Speed 5660.34 samples/sec   Loss 2.2782   LearningRate 0.0045   Epoch: 15   Global Step: 89520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:33,922-Speed 5632.16 samples/sec   Loss 2.1381   LearningRate 0.0045   Epoch: 15   Global Step: 89530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:35,736-Speed 5646.56 samples/sec   Loss 2.2067   LearningRate 0.0045   Epoch: 15   Global Step: 89540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:37,557-Speed 5626.80 samples/sec   Loss 2.1803   LearningRate 0.0045   Epoch: 15   Global Step: 89550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:39,392-Speed 5582.70 samples/sec   Loss 2.1121   LearningRate 0.0045   Epoch: 15   Global Step: 89560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:41,210-Speed 5634.35 samples/sec   Loss 2.0794   LearningRate 0.0045   Epoch: 15   Global Step: 89570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:43,022-Speed 5652.62 samples/sec   Loss 2.2160   LearningRate 0.0045   Epoch: 15   Global Step: 89580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:44,876-Speed 5524.94 samples/sec   Loss 2.1227   LearningRate 0.0045   Epoch: 15   Global Step: 89590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:46,693-Speed 5636.89 samples/sec   Loss 2.1621   LearningRate 0.0045   Epoch: 15   Global Step: 89600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:48,529-Speed 5581.07 samples/sec   Loss 2.1941   LearningRate 0.0045   Epoch: 15   Global Step: 89610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:50,357-Speed 5602.51 samples/sec   Loss 2.1043   LearningRate 0.0045   Epoch: 15   Global Step: 89620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:52,192-Speed 5581.27 samples/sec   Loss 2.0759   LearningRate 0.0045   Epoch: 15   Global Step: 89630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:54,055-Speed 5498.41 samples/sec   Loss 2.0997   LearningRate 0.0045   Epoch: 15   Global Step: 89640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:55,893-Speed 5572.71 samples/sec   Loss 2.2201   LearningRate 0.0045   Epoch: 15   Global Step: 89650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:57,748-Speed 5524.49 samples/sec   Loss 2.1604   LearningRate 0.0045   Epoch: 15   Global Step: 89660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:05:59,642-Speed 5408.78 samples/sec   Loss 2.1815   LearningRate 0.0045   Epoch: 15   Global Step: 89670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:06:01,456-Speed 5645.04 samples/sec   Loss 2.1824   LearningRate 0.0045   Epoch: 15   Global Step: 89680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:06:03,277-Speed 5625.13 samples/sec   Loss 2.1279   LearningRate 0.0045   Epoch: 15   Global Step: 89690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:06:05,095-Speed 5633.66 samples/sec   Loss 2.1115   LearningRate 0.0045   Epoch: 15   Global Step: 89700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:06:06,919-Speed 5618.33 samples/sec   Loss 2.0863   LearningRate 0.0045   Epoch: 15   Global Step: 89710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:06:08,732-Speed 5650.40 samples/sec   Loss 2.1267   LearningRate 0.0045   Epoch: 15   Global Step: 89720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:06:10,554-Speed 5620.02 samples/sec   Loss 2.1286   LearningRate 0.0045   Epoch: 15   Global Step: 89730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:06:12,374-Speed 5628.35 samples/sec   Loss 2.1876   LearningRate 0.0044   Epoch: 15   Global Step: 89740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:06:14,194-Speed 5628.36 samples/sec   Loss 2.0218   LearningRate 0.0044   Epoch: 15   Global Step: 89750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:06:16,017-Speed 5619.92 samples/sec   Loss 2.2426   LearningRate 0.0044   Epoch: 15   Global Step: 89760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:06:17,849-Speed 5593.43 samples/sec   Loss 2.2036   LearningRate 0.0044   Epoch: 15   Global Step: 89770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 07:06:19,658-Speed 5661.67 samples/sec   Loss 2.0495   LearningRate 0.0044   Epoch: 15   Global Step: 89780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 07:06:21,473-Speed 5643.86 samples/sec   Loss 2.1373   LearningRate 0.0044   Epoch: 15   Global Step: 89790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 07:06:23,289-Speed 5640.15 samples/sec   Loss 2.1373   LearningRate 0.0044   Epoch: 15   Global Step: 89800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 07:06:25,105-Speed 5639.83 samples/sec   Loss 2.0650   LearningRate 0.0044   Epoch: 15   Global Step: 89810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 07:06:26,914-Speed 5662.16 samples/sec   Loss 2.0896   LearningRate 0.0044   Epoch: 15   Global Step: 89820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:06:28,734-Speed 5628.68 samples/sec   Loss 2.0425   LearningRate 0.0044   Epoch: 15   Global Step: 89830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:06:30,558-Speed 5616.17 samples/sec   Loss 2.1810   LearningRate 0.0044   Epoch: 15   Global Step: 89840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:06:32,380-Speed 5623.04 samples/sec   Loss 2.1892   LearningRate 0.0044   Epoch: 15   Global Step: 89850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:06:34,217-Speed 5574.87 samples/sec   Loss 2.1874   LearningRate 0.0044   Epoch: 15   Global Step: 89860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:06:36,043-Speed 5609.38 samples/sec   Loss 2.1301   LearningRate 0.0044   Epoch: 15   Global Step: 89870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:06:37,941-Speed 5399.13 samples/sec   Loss 2.1391   LearningRate 0.0044   Epoch: 15   Global Step: 89880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:06:39,884-Speed 5272.92 samples/sec   Loss 2.1430   LearningRate 0.0044   Epoch: 15   Global Step: 89890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:06:41,813-Speed 5307.73 samples/sec   Loss 2.0907   LearningRate 0.0044   Epoch: 15   Global Step: 89900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:06:43,662-Speed 5539.56 samples/sec   Loss 2.0996   LearningRate 0.0044   Epoch: 15   Global Step: 89910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:06:45,494-Speed 5594.35 samples/sec   Loss 2.1382   LearningRate 0.0044   Epoch: 15   Global Step: 89920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:06:47,310-Speed 5640.47 samples/sec   Loss 2.1569   LearningRate 0.0044   Epoch: 15   Global Step: 89930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:06:49,143-Speed 5585.91 samples/sec   Loss 2.1118   LearningRate 0.0044   Epoch: 15   Global Step: 89940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:06:50,962-Speed 5631.93 samples/sec   Loss 2.2027   LearningRate 0.0044   Epoch: 15   Global Step: 89950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:06:52,791-Speed 5599.62 samples/sec   Loss 2.1602   LearningRate 0.0044   Epoch: 15   Global Step: 89960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:06:54,608-Speed 5637.58 samples/sec   Loss 2.1506   LearningRate 0.0044   Epoch: 15   Global Step: 89970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:06:56,492-Speed 5436.73 samples/sec   Loss 2.0516   LearningRate 0.0044   Epoch: 15   Global Step: 89980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:06:58,416-Speed 5325.52 samples/sec   Loss 2.1733   LearningRate 0.0044   Epoch: 15   Global Step: 89990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:07:00,312-Speed 5403.50 samples/sec   Loss 2.0500   LearningRate 0.0044   Epoch: 15   Global Step: 90000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:07:26,372-[lfw][90000]XNorm: 22.569485
Training: 2022-04-27 07:07:26,373-[lfw][90000]Accuracy-Flip: 0.99783+-0.00308
Training: 2022-04-27 07:07:26,373-[lfw][90000]Accuracy-Highest: 0.99800
Training: 2022-04-27 07:07:56,594-[cfp_fp][90000]XNorm: 21.390203
Training: 2022-04-27 07:07:56,595-[cfp_fp][90000]Accuracy-Flip: 0.97486+-0.00743
Training: 2022-04-27 07:07:56,595-[cfp_fp][90000]Accuracy-Highest: 0.97486
Training: 2022-04-27 07:08:22,687-[agedb_30][90000]XNorm: 22.610529
Training: 2022-04-27 07:08:22,687-[agedb_30][90000]Accuracy-Flip: 0.98083+-0.00647
Training: 2022-04-27 07:08:22,688-[agedb_30][90000]Accuracy-Highest: 0.98117
Training: 2022-04-27 07:08:24,514-Speed 121.61 samples/sec   Loss 2.2206   LearningRate 0.0043   Epoch: 15   Global Step: 90010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:08:26,334-Speed 5626.76 samples/sec   Loss 2.2108   LearningRate 0.0043   Epoch: 15   Global Step: 90020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:08:28,138-Speed 5680.23 samples/sec   Loss 2.1805   LearningRate 0.0043   Epoch: 15   Global Step: 90030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:08:29,965-Speed 5606.49 samples/sec   Loss 2.0648   LearningRate 0.0043   Epoch: 15   Global Step: 90040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:08:31,777-Speed 5652.73 samples/sec   Loss 2.1073   LearningRate 0.0043   Epoch: 15   Global Step: 90050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:33,611-Speed 5584.56 samples/sec   Loss 2.1777   LearningRate 0.0043   Epoch: 15   Global Step: 90060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:35,420-Speed 5662.89 samples/sec   Loss 2.2701   LearningRate 0.0043   Epoch: 15   Global Step: 90070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:37,240-Speed 5629.39 samples/sec   Loss 2.1318   LearningRate 0.0043   Epoch: 15   Global Step: 90080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:39,052-Speed 5653.04 samples/sec   Loss 2.2030   LearningRate 0.0043   Epoch: 15   Global Step: 90090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:40,863-Speed 5656.24 samples/sec   Loss 2.1717   LearningRate 0.0043   Epoch: 15   Global Step: 90100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:42,673-Speed 5656.81 samples/sec   Loss 2.1975   LearningRate 0.0043   Epoch: 15   Global Step: 90110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:44,505-Speed 5593.11 samples/sec   Loss 2.1097   LearningRate 0.0043   Epoch: 15   Global Step: 90120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:46,334-Speed 5600.92 samples/sec   Loss 2.1265   LearningRate 0.0043   Epoch: 15   Global Step: 90130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:48,147-Speed 5650.57 samples/sec   Loss 2.0930   LearningRate 0.0043   Epoch: 15   Global Step: 90140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:49,960-Speed 5649.33 samples/sec   Loss 2.2121   LearningRate 0.0043   Epoch: 15   Global Step: 90150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:51,784-Speed 5615.63 samples/sec   Loss 2.0657   LearningRate 0.0043   Epoch: 15   Global Step: 90160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:53,630-Speed 5549.27 samples/sec   Loss 2.2076   LearningRate 0.0043   Epoch: 15   Global Step: 90170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:55,441-Speed 5655.29 samples/sec   Loss 2.0857   LearningRate 0.0043   Epoch: 15   Global Step: 90180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:57,255-Speed 5649.47 samples/sec   Loss 2.0827   LearningRate 0.0043   Epoch: 15   Global Step: 90190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:08:59,060-Speed 5674.08 samples/sec   Loss 2.0122   LearningRate 0.0043   Epoch: 15   Global Step: 90200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:00,876-Speed 5640.00 samples/sec   Loss 2.1522   LearningRate 0.0043   Epoch: 15   Global Step: 90210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:02,701-Speed 5612.37 samples/sec   Loss 2.1135   LearningRate 0.0043   Epoch: 15   Global Step: 90220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:04,518-Speed 5637.98 samples/sec   Loss 2.1075   LearningRate 0.0043   Epoch: 15   Global Step: 90230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:06,344-Speed 5611.12 samples/sec   Loss 2.1578   LearningRate 0.0043   Epoch: 15   Global Step: 90240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:08,195-Speed 5533.52 samples/sec   Loss 2.0528   LearningRate 0.0043   Epoch: 15   Global Step: 90250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:10,031-Speed 5578.28 samples/sec   Loss 2.0443   LearningRate 0.0043   Epoch: 15   Global Step: 90260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:11,872-Speed 5565.22 samples/sec   Loss 2.1692   LearningRate 0.0043   Epoch: 15   Global Step: 90270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:13,691-Speed 5632.21 samples/sec   Loss 2.2213   LearningRate 0.0042   Epoch: 15   Global Step: 90280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:15,580-Speed 5421.71 samples/sec   Loss 2.1127   LearningRate 0.0042   Epoch: 15   Global Step: 90290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:17,409-Speed 5601.85 samples/sec   Loss 2.1247   LearningRate 0.0042   Epoch: 15   Global Step: 90300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:19,226-Speed 5637.12 samples/sec   Loss 2.2434   LearningRate 0.0042   Epoch: 15   Global Step: 90310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:21,051-Speed 5611.36 samples/sec   Loss 2.0607   LearningRate 0.0042   Epoch: 15   Global Step: 90320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:22,869-Speed 5634.07 samples/sec   Loss 2.1706   LearningRate 0.0042   Epoch: 15   Global Step: 90330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:24,690-Speed 5625.97 samples/sec   Loss 2.2333   LearningRate 0.0042   Epoch: 15   Global Step: 90340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:26,509-Speed 5629.83 samples/sec   Loss 2.2057   LearningRate 0.0042   Epoch: 15   Global Step: 90350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:28,328-Speed 5634.48 samples/sec   Loss 2.1170   LearningRate 0.0042   Epoch: 15   Global Step: 90360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:30,162-Speed 5585.54 samples/sec   Loss 2.1409   LearningRate 0.0042   Epoch: 15   Global Step: 90370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:31,970-Speed 5663.92 samples/sec   Loss 2.1448   LearningRate 0.0042   Epoch: 15   Global Step: 90380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:33,800-Speed 5597.36 samples/sec   Loss 2.0587   LearningRate 0.0042   Epoch: 15   Global Step: 90390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:35,620-Speed 5628.35 samples/sec   Loss 2.0762   LearningRate 0.0042   Epoch: 15   Global Step: 90400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:37,442-Speed 5622.84 samples/sec   Loss 2.1334   LearningRate 0.0042   Epoch: 15   Global Step: 90410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:39,255-Speed 5648.81 samples/sec   Loss 2.1749   LearningRate 0.0042   Epoch: 15   Global Step: 90420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:09:41,085-Speed 5598.76 samples/sec   Loss 2.1865   LearningRate 0.0042   Epoch: 15   Global Step: 90430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:09:42,926-Speed 5564.38 samples/sec   Loss 2.1375   LearningRate 0.0042   Epoch: 15   Global Step: 90440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:09:44,745-Speed 5634.56 samples/sec   Loss 2.1456   LearningRate 0.0042   Epoch: 15   Global Step: 90450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:09:46,562-Speed 5636.46 samples/sec   Loss 2.1855   LearningRate 0.0042   Epoch: 15   Global Step: 90460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:09:48,402-Speed 5567.42 samples/sec   Loss 2.1926   LearningRate 0.0042   Epoch: 15   Global Step: 90470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:09:50,244-Speed 5562.43 samples/sec   Loss 2.1604   LearningRate 0.0042   Epoch: 15   Global Step: 90480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:09:52,095-Speed 5532.42 samples/sec   Loss 2.0872   LearningRate 0.0042   Epoch: 15   Global Step: 90490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:09:53,920-Speed 5613.13 samples/sec   Loss 2.1719   LearningRate 0.0042   Epoch: 15   Global Step: 90500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:09:55,741-Speed 5624.63 samples/sec   Loss 2.1324   LearningRate 0.0042   Epoch: 15   Global Step: 90510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:09:57,551-Speed 5658.95 samples/sec   Loss 2.1847   LearningRate 0.0042   Epoch: 15   Global Step: 90520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:09:59,375-Speed 5616.59 samples/sec   Loss 2.1708   LearningRate 0.0042   Epoch: 15   Global Step: 90530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:01,214-Speed 5571.20 samples/sec   Loss 2.1270   LearningRate 0.0042   Epoch: 15   Global Step: 90540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:03,047-Speed 5588.75 samples/sec   Loss 2.1205   LearningRate 0.0042   Epoch: 15   Global Step: 90550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:04,865-Speed 5633.75 samples/sec   Loss 2.1618   LearningRate 0.0041   Epoch: 15   Global Step: 90560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:06,703-Speed 5571.75 samples/sec   Loss 2.1094   LearningRate 0.0041   Epoch: 15   Global Step: 90570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:08,614-Speed 5362.00 samples/sec   Loss 2.1500   LearningRate 0.0041   Epoch: 15   Global Step: 90580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:10,437-Speed 5617.72 samples/sec   Loss 2.2123   LearningRate 0.0041   Epoch: 15   Global Step: 90590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:12,272-Speed 5582.03 samples/sec   Loss 2.2056   LearningRate 0.0041   Epoch: 15   Global Step: 90600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:14,088-Speed 5643.23 samples/sec   Loss 2.1722   LearningRate 0.0041   Epoch: 15   Global Step: 90610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:15,901-Speed 5649.37 samples/sec   Loss 2.0862   LearningRate 0.0041   Epoch: 15   Global Step: 90620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:17,726-Speed 5613.83 samples/sec   Loss 2.0219   LearningRate 0.0041   Epoch: 15   Global Step: 90630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:19,540-Speed 5644.27 samples/sec   Loss 2.0988   LearningRate 0.0041   Epoch: 15   Global Step: 90640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:21,359-Speed 5633.28 samples/sec   Loss 2.1196   LearningRate 0.0041   Epoch: 15   Global Step: 90650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:23,166-Speed 5666.42 samples/sec   Loss 2.1189   LearningRate 0.0041   Epoch: 15   Global Step: 90660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:25,000-Speed 5586.46 samples/sec   Loss 2.1146   LearningRate 0.0041   Epoch: 15   Global Step: 90670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:26,828-Speed 5602.69 samples/sec   Loss 2.1872   LearningRate 0.0041   Epoch: 15   Global Step: 90680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:28,675-Speed 5548.41 samples/sec   Loss 2.1295   LearningRate 0.0041   Epoch: 15   Global Step: 90690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:30,490-Speed 5641.97 samples/sec   Loss 2.1416   LearningRate 0.0041   Epoch: 15   Global Step: 90700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:32,308-Speed 5633.00 samples/sec   Loss 2.0347   LearningRate 0.0041   Epoch: 15   Global Step: 90710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:34,119-Speed 5657.64 samples/sec   Loss 2.0514   LearningRate 0.0041   Epoch: 15   Global Step: 90720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:10:35,964-Speed 5554.21 samples/sec   Loss 2.1951   LearningRate 0.0041   Epoch: 15   Global Step: 90730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:10:37,789-Speed 5611.45 samples/sec   Loss 2.0957   LearningRate 0.0041   Epoch: 15   Global Step: 90740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:10:39,601-Speed 5653.53 samples/sec   Loss 2.1840   LearningRate 0.0041   Epoch: 15   Global Step: 90750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:10:41,428-Speed 5605.46 samples/sec   Loss 2.1375   LearningRate 0.0041   Epoch: 15   Global Step: 90760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:10:43,321-Speed 5412.66 samples/sec   Loss 2.2279   LearningRate 0.0041   Epoch: 15   Global Step: 90770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:10:45,171-Speed 5537.47 samples/sec   Loss 2.1111   LearningRate 0.0041   Epoch: 15   Global Step: 90780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:10:47,018-Speed 5546.41 samples/sec   Loss 2.0359   LearningRate 0.0041   Epoch: 15   Global Step: 90790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:10:48,844-Speed 5608.16 samples/sec   Loss 2.1851   LearningRate 0.0041   Epoch: 15   Global Step: 90800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:10:50,660-Speed 5639.66 samples/sec   Loss 2.0921   LearningRate 0.0041   Epoch: 15   Global Step: 90810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:10:52,472-Speed 5652.73 samples/sec   Loss 2.1306   LearningRate 0.0041   Epoch: 15   Global Step: 90820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:54,278-Speed 5674.36 samples/sec   Loss 2.1177   LearningRate 0.0041   Epoch: 15   Global Step: 90830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:56,098-Speed 5628.44 samples/sec   Loss 2.0502   LearningRate 0.0040   Epoch: 15   Global Step: 90840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:57,912-Speed 5646.96 samples/sec   Loss 2.1705   LearningRate 0.0040   Epoch: 15   Global Step: 90850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:10:59,726-Speed 5645.32 samples/sec   Loss 2.2531   LearningRate 0.0040   Epoch: 15   Global Step: 90860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:01,556-Speed 5597.45 samples/sec   Loss 2.1588   LearningRate 0.0040   Epoch: 15   Global Step: 90870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:03,380-Speed 5617.59 samples/sec   Loss 2.0937   LearningRate 0.0040   Epoch: 15   Global Step: 90880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:05,208-Speed 5604.19 samples/sec   Loss 2.1950   LearningRate 0.0040   Epoch: 15   Global Step: 90890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:07,028-Speed 5628.78 samples/sec   Loss 2.1942   LearningRate 0.0040   Epoch: 15   Global Step: 90900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:08,840-Speed 5651.15 samples/sec   Loss 2.0421   LearningRate 0.0040   Epoch: 15   Global Step: 90910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:10,641-Speed 5689.92 samples/sec   Loss 2.1458   LearningRate 0.0040   Epoch: 15   Global Step: 90920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:12,454-Speed 5649.05 samples/sec   Loss 2.1688   LearningRate 0.0040   Epoch: 15   Global Step: 90930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:14,284-Speed 5598.08 samples/sec   Loss 2.1743   LearningRate 0.0040   Epoch: 15   Global Step: 90940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:16,121-Speed 5574.89 samples/sec   Loss 2.1602   LearningRate 0.0040   Epoch: 15   Global Step: 90950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:17,929-Speed 5666.21 samples/sec   Loss 2.1208   LearningRate 0.0040   Epoch: 15   Global Step: 90960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:19,892-Speed 5218.15 samples/sec   Loss 2.0837   LearningRate 0.0040   Epoch: 15   Global Step: 90970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:36,846-Speed 604.04 samples/sec   Loss 1.9688   LearningRate 0.0040   Epoch: 16   Global Step: 90980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:38,845-Speed 5125.33 samples/sec   Loss 1.6200   LearningRate 0.0040   Epoch: 16   Global Step: 90990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:40,847-Speed 5117.63 samples/sec   Loss 1.5882   LearningRate 0.0040   Epoch: 16   Global Step: 91000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:42,824-Speed 5181.61 samples/sec   Loss 1.6368   LearningRate 0.0040   Epoch: 16   Global Step: 91010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:44,639-Speed 5643.66 samples/sec   Loss 1.6377   LearningRate 0.0040   Epoch: 16   Global Step: 91020   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:11:46,478-Speed 5570.50 samples/sec   Loss 1.5363   LearningRate 0.0040   Epoch: 16   Global Step: 91030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:48,308-Speed 5599.75 samples/sec   Loss 1.6753   LearningRate 0.0040   Epoch: 16   Global Step: 91040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:50,115-Speed 5667.24 samples/sec   Loss 1.5501   LearningRate 0.0040   Epoch: 16   Global Step: 91050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:51,936-Speed 5624.60 samples/sec   Loss 1.5906   LearningRate 0.0040   Epoch: 16   Global Step: 91060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:53,750-Speed 5647.07 samples/sec   Loss 1.5742   LearningRate 0.0040   Epoch: 16   Global Step: 91070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:55,573-Speed 5617.24 samples/sec   Loss 1.6411   LearningRate 0.0040   Epoch: 16   Global Step: 91080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:57,387-Speed 5648.53 samples/sec   Loss 1.5176   LearningRate 0.0040   Epoch: 16   Global Step: 91090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:11:59,216-Speed 5600.99 samples/sec   Loss 1.6433   LearningRate 0.0040   Epoch: 16   Global Step: 91100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:01,047-Speed 5594.05 samples/sec   Loss 1.6306   LearningRate 0.0040   Epoch: 16   Global Step: 91110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:02,880-Speed 5590.52 samples/sec   Loss 1.5547   LearningRate 0.0039   Epoch: 16   Global Step: 91120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:04,725-Speed 5551.17 samples/sec   Loss 1.6565   LearningRate 0.0039   Epoch: 16   Global Step: 91130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:06,605-Speed 5447.70 samples/sec   Loss 1.5125   LearningRate 0.0039   Epoch: 16   Global Step: 91140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:08,466-Speed 5508.97 samples/sec   Loss 1.5511   LearningRate 0.0039   Epoch: 16   Global Step: 91150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:10,291-Speed 5612.41 samples/sec   Loss 1.5416   LearningRate 0.0039   Epoch: 16   Global Step: 91160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:12,110-Speed 5630.10 samples/sec   Loss 1.5992   LearningRate 0.0039   Epoch: 16   Global Step: 91170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:13,940-Speed 5596.80 samples/sec   Loss 1.6185   LearningRate 0.0039   Epoch: 16   Global Step: 91180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:15,772-Speed 5591.50 samples/sec   Loss 1.5716   LearningRate 0.0039   Epoch: 16   Global Step: 91190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:17,609-Speed 5577.85 samples/sec   Loss 1.5453   LearningRate 0.0039   Epoch: 16   Global Step: 91200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:19,441-Speed 5592.28 samples/sec   Loss 1.6863   LearningRate 0.0039   Epoch: 16   Global Step: 91210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:21,270-Speed 5600.38 samples/sec   Loss 1.5394   LearningRate 0.0039   Epoch: 16   Global Step: 91220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:23,090-Speed 5625.40 samples/sec   Loss 1.5662   LearningRate 0.0039   Epoch: 16   Global Step: 91230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:24,947-Speed 5516.44 samples/sec   Loss 1.7360   LearningRate 0.0039   Epoch: 16   Global Step: 91240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:26,793-Speed 5549.98 samples/sec   Loss 1.7222   LearningRate 0.0039   Epoch: 16   Global Step: 91250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:28,630-Speed 5577.00 samples/sec   Loss 1.5906   LearningRate 0.0039   Epoch: 16   Global Step: 91260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:30,464-Speed 5584.06 samples/sec   Loss 1.5800   LearningRate 0.0039   Epoch: 16   Global Step: 91270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:32,278-Speed 5645.91 samples/sec   Loss 1.5693   LearningRate 0.0039   Epoch: 16   Global Step: 91280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:34,106-Speed 5605.37 samples/sec   Loss 1.5725   LearningRate 0.0039   Epoch: 16   Global Step: 91290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:35,950-Speed 5552.56 samples/sec   Loss 1.6078   LearningRate 0.0039   Epoch: 16   Global Step: 91300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:37,787-Speed 5577.81 samples/sec   Loss 1.5708   LearningRate 0.0039   Epoch: 16   Global Step: 91310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:39,615-Speed 5604.58 samples/sec   Loss 1.7357   LearningRate 0.0039   Epoch: 16   Global Step: 91320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:41,454-Speed 5570.68 samples/sec   Loss 1.6105   LearningRate 0.0039   Epoch: 16   Global Step: 91330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:43,371-Speed 5344.13 samples/sec   Loss 1.6054   LearningRate 0.0039   Epoch: 16   Global Step: 91340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:45,259-Speed 5425.72 samples/sec   Loss 1.5896   LearningRate 0.0039   Epoch: 16   Global Step: 91350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:47,074-Speed 5642.90 samples/sec   Loss 1.6103   LearningRate 0.0039   Epoch: 16   Global Step: 91360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:48,913-Speed 5569.75 samples/sec   Loss 1.6286   LearningRate 0.0039   Epoch: 16   Global Step: 91370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:50,735-Speed 5622.66 samples/sec   Loss 1.5725   LearningRate 0.0039   Epoch: 16   Global Step: 91380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:52,573-Speed 5570.12 samples/sec   Loss 1.6096   LearningRate 0.0039   Epoch: 16   Global Step: 91390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:54,395-Speed 5623.13 samples/sec   Loss 1.6176   LearningRate 0.0039   Epoch: 16   Global Step: 91400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:56,228-Speed 5586.90 samples/sec   Loss 1.6386   LearningRate 0.0038   Epoch: 16   Global Step: 91410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:58,052-Speed 5618.21 samples/sec   Loss 1.6779   LearningRate 0.0038   Epoch: 16   Global Step: 91420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:12:59,885-Speed 5586.95 samples/sec   Loss 1.7031   LearningRate 0.0038   Epoch: 16   Global Step: 91430   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:13:01,706-Speed 5625.09 samples/sec   Loss 1.6120   LearningRate 0.0038   Epoch: 16   Global Step: 91440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:03,529-Speed 5621.44 samples/sec   Loss 1.5984   LearningRate 0.0038   Epoch: 16   Global Step: 91450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:05,344-Speed 5642.02 samples/sec   Loss 1.5723   LearningRate 0.0038   Epoch: 16   Global Step: 91460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:07,163-Speed 5632.22 samples/sec   Loss 1.6198   LearningRate 0.0038   Epoch: 16   Global Step: 91470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:08,985-Speed 5621.42 samples/sec   Loss 1.6454   LearningRate 0.0038   Epoch: 16   Global Step: 91480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:10,794-Speed 5662.14 samples/sec   Loss 1.5867   LearningRate 0.0038   Epoch: 16   Global Step: 91490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:12,602-Speed 5667.38 samples/sec   Loss 1.7237   LearningRate 0.0038   Epoch: 16   Global Step: 91500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:14,422-Speed 5626.97 samples/sec   Loss 1.6510   LearningRate 0.0038   Epoch: 16   Global Step: 91510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:16,229-Speed 5669.86 samples/sec   Loss 1.5818   LearningRate 0.0038   Epoch: 16   Global Step: 91520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:18,038-Speed 5663.46 samples/sec   Loss 1.6330   LearningRate 0.0038   Epoch: 16   Global Step: 91530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:19,841-Speed 5680.08 samples/sec   Loss 1.6647   LearningRate 0.0038   Epoch: 16   Global Step: 91540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:21,651-Speed 5657.37 samples/sec   Loss 1.6413   LearningRate 0.0038   Epoch: 16   Global Step: 91550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:23,471-Speed 5628.15 samples/sec   Loss 1.6247   LearningRate 0.0038   Epoch: 16   Global Step: 91560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:25,311-Speed 5567.53 samples/sec   Loss 1.5541   LearningRate 0.0038   Epoch: 16   Global Step: 91570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:27,122-Speed 5657.40 samples/sec   Loss 1.6735   LearningRate 0.0038   Epoch: 16   Global Step: 91580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:28,936-Speed 5647.58 samples/sec   Loss 1.6268   LearningRate 0.0038   Epoch: 16   Global Step: 91590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:30,741-Speed 5674.54 samples/sec   Loss 1.6493   LearningRate 0.0038   Epoch: 16   Global Step: 91600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:32,561-Speed 5629.77 samples/sec   Loss 1.7138   LearningRate 0.0038   Epoch: 16   Global Step: 91610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:34,389-Speed 5602.69 samples/sec   Loss 1.6276   LearningRate 0.0038   Epoch: 16   Global Step: 91620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:36,213-Speed 5616.47 samples/sec   Loss 1.5259   LearningRate 0.0038   Epoch: 16   Global Step: 91630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:38,034-Speed 5623.84 samples/sec   Loss 1.6847   LearningRate 0.0038   Epoch: 16   Global Step: 91640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:39,860-Speed 5611.96 samples/sec   Loss 1.7511   LearningRate 0.0038   Epoch: 16   Global Step: 91650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:41,680-Speed 5627.40 samples/sec   Loss 1.6074   LearningRate 0.0038   Epoch: 16   Global Step: 91660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:43,505-Speed 5612.94 samples/sec   Loss 1.6431   LearningRate 0.0038   Epoch: 16   Global Step: 91670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:45,319-Speed 5646.30 samples/sec   Loss 1.6317   LearningRate 0.0038   Epoch: 16   Global Step: 91680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:47,136-Speed 5637.06 samples/sec   Loss 1.7238   LearningRate 0.0038   Epoch: 16   Global Step: 91690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:48,979-Speed 5557.64 samples/sec   Loss 1.7042   LearningRate 0.0037   Epoch: 16   Global Step: 91700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:50,805-Speed 5611.83 samples/sec   Loss 1.6109   LearningRate 0.0037   Epoch: 16   Global Step: 91710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:52,618-Speed 5650.63 samples/sec   Loss 1.6300   LearningRate 0.0037   Epoch: 16   Global Step: 91720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:54,435-Speed 5635.07 samples/sec   Loss 1.6588   LearningRate 0.0037   Epoch: 16   Global Step: 91730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:56,241-Speed 5671.44 samples/sec   Loss 1.5928   LearningRate 0.0037   Epoch: 16   Global Step: 91740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:58,049-Speed 5666.26 samples/sec   Loss 1.6902   LearningRate 0.0037   Epoch: 16   Global Step: 91750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:13:59,859-Speed 5659.83 samples/sec   Loss 1.6639   LearningRate 0.0037   Epoch: 16   Global Step: 91760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:01,690-Speed 5595.27 samples/sec   Loss 1.6048   LearningRate 0.0037   Epoch: 16   Global Step: 91770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:03,500-Speed 5657.31 samples/sec   Loss 1.6389   LearningRate 0.0037   Epoch: 16   Global Step: 91780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:05,312-Speed 5653.08 samples/sec   Loss 1.6450   LearningRate 0.0037   Epoch: 16   Global Step: 91790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:07,128-Speed 5643.90 samples/sec   Loss 1.6385   LearningRate 0.0037   Epoch: 16   Global Step: 91800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:08,959-Speed 5593.86 samples/sec   Loss 1.7345   LearningRate 0.0037   Epoch: 16   Global Step: 91810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:10,794-Speed 5581.67 samples/sec   Loss 1.6705   LearningRate 0.0037   Epoch: 16   Global Step: 91820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:12,603-Speed 5662.03 samples/sec   Loss 1.6531   LearningRate 0.0037   Epoch: 16   Global Step: 91830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:14,411-Speed 5666.36 samples/sec   Loss 1.6645   LearningRate 0.0037   Epoch: 16   Global Step: 91840   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:14:16,210-Speed 5695.24 samples/sec   Loss 1.5756   LearningRate 0.0037   Epoch: 16   Global Step: 91850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:18,023-Speed 5649.96 samples/sec   Loss 1.6789   LearningRate 0.0037   Epoch: 16   Global Step: 91860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:19,832-Speed 5660.02 samples/sec   Loss 1.6856   LearningRate 0.0037   Epoch: 16   Global Step: 91870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:21,676-Speed 5554.50 samples/sec   Loss 1.5757   LearningRate 0.0037   Epoch: 16   Global Step: 91880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:23,505-Speed 5601.50 samples/sec   Loss 1.6535   LearningRate 0.0037   Epoch: 16   Global Step: 91890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:25,317-Speed 5652.35 samples/sec   Loss 1.6464   LearningRate 0.0037   Epoch: 16   Global Step: 91900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:27,131-Speed 5647.51 samples/sec   Loss 1.6703   LearningRate 0.0037   Epoch: 16   Global Step: 91910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:28,967-Speed 5580.84 samples/sec   Loss 1.7428   LearningRate 0.0037   Epoch: 16   Global Step: 91920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:30,787-Speed 5626.54 samples/sec   Loss 1.5945   LearningRate 0.0037   Epoch: 16   Global Step: 91930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:32,630-Speed 5560.50 samples/sec   Loss 1.6669   LearningRate 0.0037   Epoch: 16   Global Step: 91940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:34,448-Speed 5632.87 samples/sec   Loss 1.6298   LearningRate 0.0037   Epoch: 16   Global Step: 91950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:36,290-Speed 5562.04 samples/sec   Loss 1.7162   LearningRate 0.0037   Epoch: 16   Global Step: 91960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:38,138-Speed 5541.47 samples/sec   Loss 1.6404   LearningRate 0.0037   Epoch: 16   Global Step: 91970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:39,956-Speed 5635.16 samples/sec   Loss 1.6579   LearningRate 0.0037   Epoch: 16   Global Step: 91980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:41,773-Speed 5637.43 samples/sec   Loss 1.6880   LearningRate 0.0037   Epoch: 16   Global Step: 91990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:14:43,607-Speed 5584.84 samples/sec   Loss 1.6948   LearningRate 0.0036   Epoch: 16   Global Step: 92000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:15:09,658-[lfw][92000]XNorm: 21.828823
Training: 2022-04-27 07:15:09,659-[lfw][92000]Accuracy-Flip: 0.99717+-0.00279
Training: 2022-04-27 07:15:09,659-[lfw][92000]Accuracy-Highest: 0.99800
Training: 2022-04-27 07:15:39,836-[cfp_fp][92000]XNorm: 20.339098
Training: 2022-04-27 07:15:39,836-[cfp_fp][92000]Accuracy-Flip: 0.97557+-0.00594
Training: 2022-04-27 07:15:39,837-[cfp_fp][92000]Accuracy-Highest: 0.97557
Training: 2022-04-27 07:16:05,869-[agedb_30][92000]XNorm: 21.785522
Training: 2022-04-27 07:16:05,870-[agedb_30][92000]Accuracy-Flip: 0.98167+-0.00715
Training: 2022-04-27 07:16:05,870-[agedb_30][92000]Accuracy-Highest: 0.98167
Training: 2022-04-27 07:16:07,702-Speed 121.77 samples/sec   Loss 1.6111   LearningRate 0.0036   Epoch: 16   Global Step: 92010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:09,513-Speed 5656.55 samples/sec   Loss 1.6362   LearningRate 0.0036   Epoch: 16   Global Step: 92020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:11,322-Speed 5660.90 samples/sec   Loss 1.6244   LearningRate 0.0036   Epoch: 16   Global Step: 92030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:13,149-Speed 5605.17 samples/sec   Loss 1.7146   LearningRate 0.0036   Epoch: 16   Global Step: 92040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:14,940-Speed 5720.46 samples/sec   Loss 1.6954   LearningRate 0.0036   Epoch: 16   Global Step: 92050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:16,750-Speed 5659.51 samples/sec   Loss 1.6277   LearningRate 0.0036   Epoch: 16   Global Step: 92060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:18,559-Speed 5661.83 samples/sec   Loss 1.6821   LearningRate 0.0036   Epoch: 16   Global Step: 92070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:20,397-Speed 5573.70 samples/sec   Loss 1.5939   LearningRate 0.0036   Epoch: 16   Global Step: 92080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:22,238-Speed 5565.48 samples/sec   Loss 1.5971   LearningRate 0.0036   Epoch: 16   Global Step: 92090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:24,073-Speed 5580.65 samples/sec   Loss 1.6517   LearningRate 0.0036   Epoch: 16   Global Step: 92100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:25,891-Speed 5634.57 samples/sec   Loss 1.6631   LearningRate 0.0036   Epoch: 16   Global Step: 92110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:27,704-Speed 5649.85 samples/sec   Loss 1.6765   LearningRate 0.0036   Epoch: 16   Global Step: 92120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:29,547-Speed 5558.00 samples/sec   Loss 1.7684   LearningRate 0.0036   Epoch: 16   Global Step: 92130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:31,375-Speed 5605.79 samples/sec   Loss 1.6865   LearningRate 0.0036   Epoch: 16   Global Step: 92140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:33,188-Speed 5649.29 samples/sec   Loss 1.7359   LearningRate 0.0036   Epoch: 16   Global Step: 92150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:35,003-Speed 5644.18 samples/sec   Loss 1.6810   LearningRate 0.0036   Epoch: 16   Global Step: 92160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:36,824-Speed 5626.51 samples/sec   Loss 1.7166   LearningRate 0.0036   Epoch: 16   Global Step: 92170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:38,657-Speed 5588.68 samples/sec   Loss 1.6172   LearningRate 0.0036   Epoch: 16   Global Step: 92180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:40,506-Speed 5538.06 samples/sec   Loss 1.6924   LearningRate 0.0036   Epoch: 16   Global Step: 92190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:42,315-Speed 5664.67 samples/sec   Loss 1.5996   LearningRate 0.0036   Epoch: 16   Global Step: 92200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:44,137-Speed 5619.63 samples/sec   Loss 1.6688   LearningRate 0.0036   Epoch: 16   Global Step: 92210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:45,969-Speed 5592.08 samples/sec   Loss 1.7416   LearningRate 0.0036   Epoch: 16   Global Step: 92220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:47,784-Speed 5645.49 samples/sec   Loss 1.5631   LearningRate 0.0036   Epoch: 16   Global Step: 92230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:49,600-Speed 5638.30 samples/sec   Loss 1.7200   LearningRate 0.0036   Epoch: 16   Global Step: 92240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:51,416-Speed 5640.65 samples/sec   Loss 1.7085   LearningRate 0.0036   Epoch: 16   Global Step: 92250   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:16:53,219-Speed 5682.35 samples/sec   Loss 1.6513   LearningRate 0.0036   Epoch: 16   Global Step: 92260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:55,026-Speed 5670.88 samples/sec   Loss 1.6588   LearningRate 0.0036   Epoch: 16   Global Step: 92270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:56,838-Speed 5651.94 samples/sec   Loss 1.6068   LearningRate 0.0036   Epoch: 16   Global Step: 92280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:16:58,644-Speed 5671.69 samples/sec   Loss 1.6333   LearningRate 0.0036   Epoch: 16   Global Step: 92290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:00,447-Speed 5681.94 samples/sec   Loss 1.6847   LearningRate 0.0035   Epoch: 16   Global Step: 92300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:02,269-Speed 5622.30 samples/sec   Loss 1.7292   LearningRate 0.0035   Epoch: 16   Global Step: 92310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:04,088-Speed 5629.57 samples/sec   Loss 1.6766   LearningRate 0.0035   Epoch: 16   Global Step: 92320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:05,916-Speed 5603.94 samples/sec   Loss 1.7493   LearningRate 0.0035   Epoch: 16   Global Step: 92330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:07,743-Speed 5606.78 samples/sec   Loss 1.6163   LearningRate 0.0035   Epoch: 16   Global Step: 92340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:09,577-Speed 5585.35 samples/sec   Loss 1.6090   LearningRate 0.0035   Epoch: 16   Global Step: 92350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:11,387-Speed 5657.97 samples/sec   Loss 1.6838   LearningRate 0.0035   Epoch: 16   Global Step: 92360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:13,201-Speed 5649.30 samples/sec   Loss 1.7548   LearningRate 0.0035   Epoch: 16   Global Step: 92370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:15,026-Speed 5610.41 samples/sec   Loss 1.6908   LearningRate 0.0035   Epoch: 16   Global Step: 92380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:16,855-Speed 5602.89 samples/sec   Loss 1.7784   LearningRate 0.0035   Epoch: 16   Global Step: 92390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:18,669-Speed 5647.51 samples/sec   Loss 1.6892   LearningRate 0.0035   Epoch: 16   Global Step: 92400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:20,492-Speed 5619.74 samples/sec   Loss 1.7082   LearningRate 0.0035   Epoch: 16   Global Step: 92410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:22,364-Speed 5471.82 samples/sec   Loss 1.7807   LearningRate 0.0035   Epoch: 16   Global Step: 92420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:24,189-Speed 5613.06 samples/sec   Loss 1.6377   LearningRate 0.0035   Epoch: 16   Global Step: 92430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:25,996-Speed 5666.97 samples/sec   Loss 1.6889   LearningRate 0.0035   Epoch: 16   Global Step: 92440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:27,805-Speed 5663.12 samples/sec   Loss 1.7323   LearningRate 0.0035   Epoch: 16   Global Step: 92450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:29,606-Speed 5686.25 samples/sec   Loss 1.7498   LearningRate 0.0035   Epoch: 16   Global Step: 92460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:31,423-Speed 5638.40 samples/sec   Loss 1.6852   LearningRate 0.0035   Epoch: 16   Global Step: 92470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:33,261-Speed 5572.56 samples/sec   Loss 1.7173   LearningRate 0.0035   Epoch: 16   Global Step: 92480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:35,094-Speed 5588.44 samples/sec   Loss 1.7468   LearningRate 0.0035   Epoch: 16   Global Step: 92490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:17:36,911-Speed 5637.78 samples/sec   Loss 1.7346   LearningRate 0.0035   Epoch: 16   Global Step: 92500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:17:38,750-Speed 5570.07 samples/sec   Loss 1.6346   LearningRate 0.0035   Epoch: 16   Global Step: 92510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:17:40,576-Speed 5610.24 samples/sec   Loss 1.6308   LearningRate 0.0035   Epoch: 16   Global Step: 92520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:17:42,404-Speed 5604.49 samples/sec   Loss 1.5990   LearningRate 0.0035   Epoch: 16   Global Step: 92530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:17:44,213-Speed 5661.29 samples/sec   Loss 1.7661   LearningRate 0.0035   Epoch: 16   Global Step: 92540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:17:46,027-Speed 5647.00 samples/sec   Loss 1.7564   LearningRate 0.0035   Epoch: 16   Global Step: 92550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:17:47,844-Speed 5637.51 samples/sec   Loss 1.6152   LearningRate 0.0035   Epoch: 16   Global Step: 92560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:17:49,662-Speed 5633.91 samples/sec   Loss 1.7465   LearningRate 0.0035   Epoch: 16   Global Step: 92570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:17:51,489-Speed 5608.53 samples/sec   Loss 1.7232   LearningRate 0.0035   Epoch: 16   Global Step: 92580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:17:53,350-Speed 5502.81 samples/sec   Loss 1.7102   LearningRate 0.0035   Epoch: 16   Global Step: 92590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:55,206-Speed 5518.51 samples/sec   Loss 1.6396   LearningRate 0.0034   Epoch: 16   Global Step: 92600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:57,067-Speed 5504.84 samples/sec   Loss 1.6939   LearningRate 0.0034   Epoch: 16   Global Step: 92610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:17:58,901-Speed 5584.81 samples/sec   Loss 1.6584   LearningRate 0.0034   Epoch: 16   Global Step: 92620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:00,706-Speed 5675.34 samples/sec   Loss 1.7467   LearningRate 0.0034   Epoch: 16   Global Step: 92630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:02,526-Speed 5630.12 samples/sec   Loss 1.6319   LearningRate 0.0034   Epoch: 16   Global Step: 92640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:04,365-Speed 5569.99 samples/sec   Loss 1.7578   LearningRate 0.0034   Epoch: 16   Global Step: 92650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:06,278-Speed 5355.46 samples/sec   Loss 1.7108   LearningRate 0.0034   Epoch: 16   Global Step: 92660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:08,089-Speed 5656.05 samples/sec   Loss 1.6885   LearningRate 0.0034   Epoch: 16   Global Step: 92670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:09,905-Speed 5638.38 samples/sec   Loss 1.6332   LearningRate 0.0034   Epoch: 16   Global Step: 92680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:11,711-Speed 5673.06 samples/sec   Loss 1.7025   LearningRate 0.0034   Epoch: 16   Global Step: 92690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:13,556-Speed 5552.51 samples/sec   Loss 1.7002   LearningRate 0.0034   Epoch: 16   Global Step: 92700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:15,385-Speed 5598.59 samples/sec   Loss 1.7151   LearningRate 0.0034   Epoch: 16   Global Step: 92710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:17,219-Speed 5586.17 samples/sec   Loss 1.7687   LearningRate 0.0034   Epoch: 16   Global Step: 92720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:19,041-Speed 5620.92 samples/sec   Loss 1.7380   LearningRate 0.0034   Epoch: 16   Global Step: 92730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:20,876-Speed 5583.06 samples/sec   Loss 1.7389   LearningRate 0.0034   Epoch: 16   Global Step: 92740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:22,692-Speed 5640.23 samples/sec   Loss 1.5988   LearningRate 0.0034   Epoch: 16   Global Step: 92750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:24,534-Speed 5563.92 samples/sec   Loss 1.6560   LearningRate 0.0034   Epoch: 16   Global Step: 92760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:26,363-Speed 5599.68 samples/sec   Loss 1.6916   LearningRate 0.0034   Epoch: 16   Global Step: 92770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:28,195-Speed 5592.90 samples/sec   Loss 1.6527   LearningRate 0.0034   Epoch: 16   Global Step: 92780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:30,011-Speed 5638.98 samples/sec   Loss 1.7753   LearningRate 0.0034   Epoch: 16   Global Step: 92790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:31,893-Speed 5441.79 samples/sec   Loss 1.7874   LearningRate 0.0034   Epoch: 16   Global Step: 92800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:33,741-Speed 5543.15 samples/sec   Loss 1.6815   LearningRate 0.0034   Epoch: 16   Global Step: 92810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:35,554-Speed 5649.15 samples/sec   Loss 1.8089   LearningRate 0.0034   Epoch: 16   Global Step: 92820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:37,389-Speed 5583.56 samples/sec   Loss 1.7326   LearningRate 0.0034   Epoch: 16   Global Step: 92830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:39,210-Speed 5623.96 samples/sec   Loss 1.6815   LearningRate 0.0034   Epoch: 16   Global Step: 92840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:41,073-Speed 5500.40 samples/sec   Loss 1.6738   LearningRate 0.0034   Epoch: 16   Global Step: 92850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:42,907-Speed 5583.14 samples/sec   Loss 1.7099   LearningRate 0.0034   Epoch: 16   Global Step: 92860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:44,725-Speed 5635.58 samples/sec   Loss 1.7205   LearningRate 0.0034   Epoch: 16   Global Step: 92870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:46,556-Speed 5594.59 samples/sec   Loss 1.7082   LearningRate 0.0034   Epoch: 16   Global Step: 92880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:48,371-Speed 5643.38 samples/sec   Loss 1.6862   LearningRate 0.0034   Epoch: 16   Global Step: 92890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:50,202-Speed 5596.57 samples/sec   Loss 1.6918   LearningRate 0.0034   Epoch: 16   Global Step: 92900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:52,014-Speed 5652.24 samples/sec   Loss 1.7086   LearningRate 0.0033   Epoch: 16   Global Step: 92910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:53,843-Speed 5600.63 samples/sec   Loss 1.6854   LearningRate 0.0033   Epoch: 16   Global Step: 92920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:55,676-Speed 5589.54 samples/sec   Loss 1.6696   LearningRate 0.0033   Epoch: 16   Global Step: 92930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:57,489-Speed 5649.07 samples/sec   Loss 1.7124   LearningRate 0.0033   Epoch: 16   Global Step: 92940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:18:59,315-Speed 5608.02 samples/sec   Loss 1.7395   LearningRate 0.0033   Epoch: 16   Global Step: 92950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:01,132-Speed 5639.11 samples/sec   Loss 1.6658   LearningRate 0.0033   Epoch: 16   Global Step: 92960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:02,941-Speed 5662.35 samples/sec   Loss 1.7845   LearningRate 0.0033   Epoch: 16   Global Step: 92970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:04,757-Speed 5638.17 samples/sec   Loss 1.7378   LearningRate 0.0033   Epoch: 16   Global Step: 92980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:06,585-Speed 5606.16 samples/sec   Loss 1.6982   LearningRate 0.0033   Epoch: 16   Global Step: 92990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:08,408-Speed 5620.47 samples/sec   Loss 1.7654   LearningRate 0.0033   Epoch: 16   Global Step: 93000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:10,235-Speed 5607.03 samples/sec   Loss 1.7113   LearningRate 0.0033   Epoch: 16   Global Step: 93010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:12,064-Speed 5597.95 samples/sec   Loss 1.6741   LearningRate 0.0033   Epoch: 16   Global Step: 93020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:13,887-Speed 5618.73 samples/sec   Loss 1.7552   LearningRate 0.0033   Epoch: 16   Global Step: 93030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:15,707-Speed 5629.52 samples/sec   Loss 1.7276   LearningRate 0.0033   Epoch: 16   Global Step: 93040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:17,521-Speed 5647.92 samples/sec   Loss 1.8275   LearningRate 0.0033   Epoch: 16   Global Step: 93050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:19,339-Speed 5634.08 samples/sec   Loss 1.7635   LearningRate 0.0033   Epoch: 16   Global Step: 93060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:21,171-Speed 5590.41 samples/sec   Loss 1.7143   LearningRate 0.0033   Epoch: 16   Global Step: 93070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:22,987-Speed 5639.38 samples/sec   Loss 1.7373   LearningRate 0.0033   Epoch: 16   Global Step: 93080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:24,805-Speed 5635.75 samples/sec   Loss 1.7013   LearningRate 0.0033   Epoch: 16   Global Step: 93090   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:19:26,623-Speed 5636.03 samples/sec   Loss 1.7521   LearningRate 0.0033   Epoch: 16   Global Step: 93100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:28,425-Speed 5683.44 samples/sec   Loss 1.6906   LearningRate 0.0033   Epoch: 16   Global Step: 93110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:19:30,242-Speed 5638.78 samples/sec   Loss 1.6373   LearningRate 0.0033   Epoch: 16   Global Step: 93120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:19:32,058-Speed 5638.27 samples/sec   Loss 1.7364   LearningRate 0.0033   Epoch: 16   Global Step: 93130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:19:33,877-Speed 5634.05 samples/sec   Loss 1.8087   LearningRate 0.0033   Epoch: 16   Global Step: 93140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:19:35,687-Speed 5659.30 samples/sec   Loss 1.7876   LearningRate 0.0033   Epoch: 16   Global Step: 93150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:19:37,510-Speed 5619.84 samples/sec   Loss 1.6330   LearningRate 0.0033   Epoch: 16   Global Step: 93160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:19:39,339-Speed 5598.53 samples/sec   Loss 1.7263   LearningRate 0.0033   Epoch: 16   Global Step: 93170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:19:41,160-Speed 5624.19 samples/sec   Loss 1.6928   LearningRate 0.0033   Epoch: 16   Global Step: 93180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:19:42,974-Speed 5646.82 samples/sec   Loss 1.7167   LearningRate 0.0033   Epoch: 16   Global Step: 93190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:19:44,789-Speed 5643.91 samples/sec   Loss 1.7484   LearningRate 0.0033   Epoch: 16   Global Step: 93200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:19:46,601-Speed 5654.88 samples/sec   Loss 1.7247   LearningRate 0.0033   Epoch: 16   Global Step: 93210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:48,410-Speed 5661.75 samples/sec   Loss 1.8342   LearningRate 0.0032   Epoch: 16   Global Step: 93220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:50,232-Speed 5621.23 samples/sec   Loss 1.6730   LearningRate 0.0032   Epoch: 16   Global Step: 93230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:52,053-Speed 5624.54 samples/sec   Loss 1.6808   LearningRate 0.0032   Epoch: 16   Global Step: 93240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:53,877-Speed 5619.31 samples/sec   Loss 1.7570   LearningRate 0.0032   Epoch: 16   Global Step: 93250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:55,686-Speed 5660.42 samples/sec   Loss 1.7706   LearningRate 0.0032   Epoch: 16   Global Step: 93260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:57,508-Speed 5623.12 samples/sec   Loss 1.7493   LearningRate 0.0032   Epoch: 16   Global Step: 93270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:19:59,326-Speed 5635.35 samples/sec   Loss 1.6599   LearningRate 0.0032   Epoch: 16   Global Step: 93280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:01,148-Speed 5621.91 samples/sec   Loss 1.6860   LearningRate 0.0032   Epoch: 16   Global Step: 93290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:02,960-Speed 5653.18 samples/sec   Loss 1.7500   LearningRate 0.0032   Epoch: 16   Global Step: 93300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:04,757-Speed 5698.71 samples/sec   Loss 1.7686   LearningRate 0.0032   Epoch: 16   Global Step: 93310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:06,593-Speed 5582.82 samples/sec   Loss 1.6586   LearningRate 0.0032   Epoch: 16   Global Step: 93320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:08,409-Speed 5640.11 samples/sec   Loss 1.6728   LearningRate 0.0032   Epoch: 16   Global Step: 93330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:10,222-Speed 5648.91 samples/sec   Loss 1.6980   LearningRate 0.0032   Epoch: 16   Global Step: 93340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:12,041-Speed 5631.01 samples/sec   Loss 1.6548   LearningRate 0.0032   Epoch: 16   Global Step: 93350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:13,864-Speed 5620.70 samples/sec   Loss 1.6852   LearningRate 0.0032   Epoch: 16   Global Step: 93360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:15,695-Speed 5594.42 samples/sec   Loss 1.6653   LearningRate 0.0032   Epoch: 16   Global Step: 93370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:17,530-Speed 5582.34 samples/sec   Loss 1.7695   LearningRate 0.0032   Epoch: 16   Global Step: 93380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:19,338-Speed 5664.94 samples/sec   Loss 1.6916   LearningRate 0.0032   Epoch: 16   Global Step: 93390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:21,149-Speed 5656.99 samples/sec   Loss 1.6751   LearningRate 0.0032   Epoch: 16   Global Step: 93400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:22,960-Speed 5654.93 samples/sec   Loss 1.6728   LearningRate 0.0032   Epoch: 16   Global Step: 93410   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:20:24,784-Speed 5617.44 samples/sec   Loss 1.7158   LearningRate 0.0032   Epoch: 16   Global Step: 93420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:26,597-Speed 5650.72 samples/sec   Loss 1.7127   LearningRate 0.0032   Epoch: 16   Global Step: 93430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:28,413-Speed 5639.92 samples/sec   Loss 1.6824   LearningRate 0.0032   Epoch: 16   Global Step: 93440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:30,225-Speed 5651.33 samples/sec   Loss 1.7477   LearningRate 0.0032   Epoch: 16   Global Step: 93450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:32,040-Speed 5643.38 samples/sec   Loss 1.7508   LearningRate 0.0032   Epoch: 16   Global Step: 93460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:33,877-Speed 5577.00 samples/sec   Loss 1.7419   LearningRate 0.0032   Epoch: 16   Global Step: 93470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:35,692-Speed 5644.38 samples/sec   Loss 1.7826   LearningRate 0.0032   Epoch: 16   Global Step: 93480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:37,510-Speed 5637.12 samples/sec   Loss 1.6385   LearningRate 0.0032   Epoch: 16   Global Step: 93490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:39,333-Speed 5616.70 samples/sec   Loss 1.6758   LearningRate 0.0032   Epoch: 16   Global Step: 93500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:41,163-Speed 5598.80 samples/sec   Loss 1.6635   LearningRate 0.0032   Epoch: 16   Global Step: 93510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:42,986-Speed 5617.26 samples/sec   Loss 1.7185   LearningRate 0.0032   Epoch: 16   Global Step: 93520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:44,797-Speed 5657.67 samples/sec   Loss 1.8146   LearningRate 0.0032   Epoch: 16   Global Step: 93530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:46,606-Speed 5661.93 samples/sec   Loss 1.6634   LearningRate 0.0031   Epoch: 16   Global Step: 93540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:48,437-Speed 5594.94 samples/sec   Loss 1.6444   LearningRate 0.0031   Epoch: 16   Global Step: 93550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:50,256-Speed 5629.46 samples/sec   Loss 1.7872   LearningRate 0.0031   Epoch: 16   Global Step: 93560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:52,076-Speed 5628.55 samples/sec   Loss 1.7383   LearningRate 0.0031   Epoch: 16   Global Step: 93570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:53,911-Speed 5582.89 samples/sec   Loss 1.6435   LearningRate 0.0031   Epoch: 16   Global Step: 93580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:55,721-Speed 5658.97 samples/sec   Loss 1.6152   LearningRate 0.0031   Epoch: 16   Global Step: 93590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:57,545-Speed 5617.61 samples/sec   Loss 1.7199   LearningRate 0.0031   Epoch: 16   Global Step: 93600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:20:59,360-Speed 5642.78 samples/sec   Loss 1.7264   LearningRate 0.0031   Epoch: 16   Global Step: 93610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:01,166-Speed 5672.11 samples/sec   Loss 1.7307   LearningRate 0.0031   Epoch: 16   Global Step: 93620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:03,004-Speed 5572.76 samples/sec   Loss 1.7653   LearningRate 0.0031   Epoch: 16   Global Step: 93630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:04,838-Speed 5586.32 samples/sec   Loss 1.7261   LearningRate 0.0031   Epoch: 16   Global Step: 93640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:06,671-Speed 5588.71 samples/sec   Loss 1.6677   LearningRate 0.0031   Epoch: 16   Global Step: 93650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:08,492-Speed 5623.12 samples/sec   Loss 1.7162   LearningRate 0.0031   Epoch: 16   Global Step: 93660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:10,313-Speed 5625.15 samples/sec   Loss 1.6759   LearningRate 0.0031   Epoch: 16   Global Step: 93670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:12,127-Speed 5646.86 samples/sec   Loss 1.7332   LearningRate 0.0031   Epoch: 16   Global Step: 93680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:13,946-Speed 5631.38 samples/sec   Loss 1.8190   LearningRate 0.0031   Epoch: 16   Global Step: 93690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:15,786-Speed 5565.92 samples/sec   Loss 1.7273   LearningRate 0.0031   Epoch: 16   Global Step: 93700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:17,615-Speed 5602.30 samples/sec   Loss 1.7585   LearningRate 0.0031   Epoch: 16   Global Step: 93710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:19,429-Speed 5648.95 samples/sec   Loss 1.6866   LearningRate 0.0031   Epoch: 16   Global Step: 93720   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:21:21,230-Speed 5685.50 samples/sec   Loss 1.7202   LearningRate 0.0031   Epoch: 16   Global Step: 93730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:23,045-Speed 5646.00 samples/sec   Loss 1.7427   LearningRate 0.0031   Epoch: 16   Global Step: 93740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:24,858-Speed 5649.68 samples/sec   Loss 1.6992   LearningRate 0.0031   Epoch: 16   Global Step: 93750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:26,697-Speed 5568.43 samples/sec   Loss 1.7654   LearningRate 0.0031   Epoch: 16   Global Step: 93760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:28,531-Speed 5585.19 samples/sec   Loss 1.6850   LearningRate 0.0031   Epoch: 16   Global Step: 93770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:30,359-Speed 5603.47 samples/sec   Loss 1.6620   LearningRate 0.0031   Epoch: 16   Global Step: 93780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:32,179-Speed 5628.05 samples/sec   Loss 1.7143   LearningRate 0.0031   Epoch: 16   Global Step: 93790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:33,989-Speed 5661.50 samples/sec   Loss 1.7068   LearningRate 0.0031   Epoch: 16   Global Step: 93800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:21:35,819-Speed 5597.26 samples/sec   Loss 1.6784   LearningRate 0.0031   Epoch: 16   Global Step: 93810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:21:37,669-Speed 5538.17 samples/sec   Loss 1.7509   LearningRate 0.0031   Epoch: 16   Global Step: 93820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:21:39,598-Speed 5309.36 samples/sec   Loss 1.7797   LearningRate 0.0031   Epoch: 16   Global Step: 93830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:21:41,466-Speed 5484.55 samples/sec   Loss 1.7519   LearningRate 0.0031   Epoch: 16   Global Step: 93840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:21:43,289-Speed 5617.46 samples/sec   Loss 1.7404   LearningRate 0.0031   Epoch: 16   Global Step: 93850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:21:45,104-Speed 5644.11 samples/sec   Loss 1.6503   LearningRate 0.0030   Epoch: 16   Global Step: 93860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:21:46,927-Speed 5620.96 samples/sec   Loss 1.7022   LearningRate 0.0030   Epoch: 16   Global Step: 93870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:21:48,759-Speed 5589.81 samples/sec   Loss 1.7387   LearningRate 0.0030   Epoch: 16   Global Step: 93880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:21:50,573-Speed 5648.65 samples/sec   Loss 1.7865   LearningRate 0.0030   Epoch: 16   Global Step: 93890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:21:52,384-Speed 5655.46 samples/sec   Loss 1.7271   LearningRate 0.0030   Epoch: 16   Global Step: 93900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:54,202-Speed 5633.71 samples/sec   Loss 1.7649   LearningRate 0.0030   Epoch: 16   Global Step: 93910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:56,038-Speed 5577.86 samples/sec   Loss 1.7537   LearningRate 0.0030   Epoch: 16   Global Step: 93920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:57,854-Speed 5642.39 samples/sec   Loss 1.6784   LearningRate 0.0030   Epoch: 16   Global Step: 93930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:21:59,676-Speed 5622.87 samples/sec   Loss 1.7278   LearningRate 0.0030   Epoch: 16   Global Step: 93940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:22:01,487-Speed 5653.30 samples/sec   Loss 1.7055   LearningRate 0.0030   Epoch: 16   Global Step: 93950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:22:03,303-Speed 5641.69 samples/sec   Loss 1.7704   LearningRate 0.0030   Epoch: 16   Global Step: 93960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:22:05,142-Speed 5571.26 samples/sec   Loss 1.7379   LearningRate 0.0030   Epoch: 16   Global Step: 93970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:22:06,971-Speed 5601.99 samples/sec   Loss 1.7499   LearningRate 0.0030   Epoch: 16   Global Step: 93980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:22:08,841-Speed 5477.04 samples/sec   Loss 1.8010   LearningRate 0.0030   Epoch: 16   Global Step: 93990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:22:10,730-Speed 5421.82 samples/sec   Loss 1.7082   LearningRate 0.0030   Epoch: 16   Global Step: 94000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:22:37,013-[lfw][94000]XNorm: 21.855137
Training: 2022-04-27 07:22:37,014-[lfw][94000]Accuracy-Flip: 0.99750+-0.00327
Training: 2022-04-27 07:22:37,014-[lfw][94000]Accuracy-Highest: 0.99800
Training: 2022-04-27 07:23:07,511-[cfp_fp][94000]XNorm: 20.663132
Training: 2022-04-27 07:23:07,512-[cfp_fp][94000]Accuracy-Flip: 0.97557+-0.00744
Training: 2022-04-27 07:23:07,512-[cfp_fp][94000]Accuracy-Highest: 0.97557
Training: 2022-04-27 07:23:33,833-[agedb_30][94000]XNorm: 21.931744
Training: 2022-04-27 07:23:33,833-[agedb_30][94000]Accuracy-Flip: 0.97983+-0.00728
Training: 2022-04-27 07:23:33,834-[agedb_30][94000]Accuracy-Highest: 0.98167
Training: 2022-04-27 07:23:35,646-Speed 120.59 samples/sec   Loss 1.6699   LearningRate 0.0030   Epoch: 16   Global Step: 94010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:23:37,469-Speed 5619.81 samples/sec   Loss 1.7534   LearningRate 0.0030   Epoch: 16   Global Step: 94020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:23:39,285-Speed 5642.19 samples/sec   Loss 1.7101   LearningRate 0.0030   Epoch: 16   Global Step: 94030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:23:41,105-Speed 5626.16 samples/sec   Loss 1.6266   LearningRate 0.0030   Epoch: 16   Global Step: 94040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:23:42,934-Speed 5600.31 samples/sec   Loss 1.6716   LearningRate 0.0030   Epoch: 16   Global Step: 94050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:23:44,755-Speed 5627.08 samples/sec   Loss 1.7798   LearningRate 0.0030   Epoch: 16   Global Step: 94060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:23:46,587-Speed 5590.64 samples/sec   Loss 1.6104   LearningRate 0.0030   Epoch: 16   Global Step: 94070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:23:48,395-Speed 5665.34 samples/sec   Loss 1.6588   LearningRate 0.0030   Epoch: 16   Global Step: 94080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:23:50,203-Speed 5664.54 samples/sec   Loss 1.7501   LearningRate 0.0030   Epoch: 16   Global Step: 94090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:23:51,994-Speed 5720.54 samples/sec   Loss 1.7712   LearningRate 0.0030   Epoch: 16   Global Step: 94100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:23:53,800-Speed 5671.99 samples/sec   Loss 1.6006   LearningRate 0.0030   Epoch: 16   Global Step: 94110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:23:55,602-Speed 5684.12 samples/sec   Loss 1.6904   LearningRate 0.0030   Epoch: 16   Global Step: 94120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:23:57,413-Speed 5656.40 samples/sec   Loss 1.6522   LearningRate 0.0030   Epoch: 16   Global Step: 94130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:23:59,217-Speed 5676.48 samples/sec   Loss 1.7095   LearningRate 0.0030   Epoch: 16   Global Step: 94140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:01,036-Speed 5631.57 samples/sec   Loss 1.6906   LearningRate 0.0030   Epoch: 16   Global Step: 94150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:02,837-Speed 5689.80 samples/sec   Loss 1.6736   LearningRate 0.0030   Epoch: 16   Global Step: 94160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:04,643-Speed 5672.60 samples/sec   Loss 1.6390   LearningRate 0.0030   Epoch: 16   Global Step: 94170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:06,447-Speed 5677.09 samples/sec   Loss 1.6715   LearningRate 0.0030   Epoch: 16   Global Step: 94180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:08,278-Speed 5594.62 samples/sec   Loss 1.7514   LearningRate 0.0029   Epoch: 16   Global Step: 94190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:10,074-Speed 5702.36 samples/sec   Loss 1.6907   LearningRate 0.0029   Epoch: 16   Global Step: 94200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:11,915-Speed 5565.17 samples/sec   Loss 1.5860   LearningRate 0.0029   Epoch: 16   Global Step: 94210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:13,760-Speed 5550.61 samples/sec   Loss 1.7616   LearningRate 0.0029   Epoch: 16   Global Step: 94220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:15,576-Speed 5641.64 samples/sec   Loss 1.7647   LearningRate 0.0029   Epoch: 16   Global Step: 94230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:17,394-Speed 5634.79 samples/sec   Loss 1.7141   LearningRate 0.0029   Epoch: 16   Global Step: 94240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:19,203-Speed 5662.13 samples/sec   Loss 1.7859   LearningRate 0.0029   Epoch: 16   Global Step: 94250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:21,021-Speed 5633.82 samples/sec   Loss 1.7211   LearningRate 0.0029   Epoch: 16   Global Step: 94260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:22,843-Speed 5624.22 samples/sec   Loss 1.7356   LearningRate 0.0029   Epoch: 16   Global Step: 94270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:24,654-Speed 5656.01 samples/sec   Loss 1.7352   LearningRate 0.0029   Epoch: 16   Global Step: 94280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:26,465-Speed 5655.65 samples/sec   Loss 1.7116   LearningRate 0.0029   Epoch: 16   Global Step: 94290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:28,284-Speed 5630.40 samples/sec   Loss 1.6494   LearningRate 0.0029   Epoch: 16   Global Step: 94300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:30,092-Speed 5668.58 samples/sec   Loss 1.8207   LearningRate 0.0029   Epoch: 16   Global Step: 94310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:31,902-Speed 5658.71 samples/sec   Loss 1.8033   LearningRate 0.0029   Epoch: 16   Global Step: 94320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:33,715-Speed 5650.11 samples/sec   Loss 1.6853   LearningRate 0.0029   Epoch: 16   Global Step: 94330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:35,536-Speed 5624.46 samples/sec   Loss 1.7151   LearningRate 0.0029   Epoch: 16   Global Step: 94340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:37,359-Speed 5619.06 samples/sec   Loss 1.6994   LearningRate 0.0029   Epoch: 16   Global Step: 94350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:39,181-Speed 5622.21 samples/sec   Loss 1.6958   LearningRate 0.0029   Epoch: 16   Global Step: 94360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:41,005-Speed 5616.70 samples/sec   Loss 1.7912   LearningRate 0.0029   Epoch: 16   Global Step: 94370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:42,819-Speed 5646.54 samples/sec   Loss 1.7727   LearningRate 0.0029   Epoch: 16   Global Step: 94380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:44,632-Speed 5649.60 samples/sec   Loss 1.7188   LearningRate 0.0029   Epoch: 16   Global Step: 94390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:46,454-Speed 5624.10 samples/sec   Loss 1.7800   LearningRate 0.0029   Epoch: 16   Global Step: 94400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:48,272-Speed 5632.41 samples/sec   Loss 1.7912   LearningRate 0.0029   Epoch: 16   Global Step: 94410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:50,096-Speed 5617.79 samples/sec   Loss 1.6485   LearningRate 0.0029   Epoch: 16   Global Step: 94420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:51,903-Speed 5667.04 samples/sec   Loss 1.7291   LearningRate 0.0029   Epoch: 16   Global Step: 94430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:53,722-Speed 5630.48 samples/sec   Loss 1.8040   LearningRate 0.0029   Epoch: 16   Global Step: 94440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:55,537-Speed 5643.60 samples/sec   Loss 1.8031   LearningRate 0.0029   Epoch: 16   Global Step: 94450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:57,388-Speed 5533.54 samples/sec   Loss 1.7253   LearningRate 0.0029   Epoch: 16   Global Step: 94460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:24:59,202-Speed 5648.70 samples/sec   Loss 1.7432   LearningRate 0.0029   Epoch: 16   Global Step: 94470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:01,025-Speed 5619.12 samples/sec   Loss 1.8390   LearningRate 0.0029   Epoch: 16   Global Step: 94480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:02,848-Speed 5618.61 samples/sec   Loss 1.8297   LearningRate 0.0029   Epoch: 16   Global Step: 94490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:04,692-Speed 5557.59 samples/sec   Loss 1.6210   LearningRate 0.0029   Epoch: 16   Global Step: 94500   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:25:06,489-Speed 5699.96 samples/sec   Loss 1.7042   LearningRate 0.0029   Epoch: 16   Global Step: 94510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:08,297-Speed 5665.71 samples/sec   Loss 1.6707   LearningRate 0.0029   Epoch: 16   Global Step: 94520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:10,138-Speed 5563.33 samples/sec   Loss 1.7029   LearningRate 0.0028   Epoch: 16   Global Step: 94530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:12,029-Speed 5416.85 samples/sec   Loss 1.7375   LearningRate 0.0028   Epoch: 16   Global Step: 94540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:13,866-Speed 5576.97 samples/sec   Loss 1.7536   LearningRate 0.0028   Epoch: 16   Global Step: 94550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:15,700-Speed 5584.75 samples/sec   Loss 1.8786   LearningRate 0.0028   Epoch: 16   Global Step: 94560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:17,566-Speed 5490.86 samples/sec   Loss 1.6440   LearningRate 0.0028   Epoch: 16   Global Step: 94570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:19,485-Speed 5337.19 samples/sec   Loss 1.6845   LearningRate 0.0028   Epoch: 16   Global Step: 94580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:21,334-Speed 5542.77 samples/sec   Loss 1.7150   LearningRate 0.0028   Epoch: 16   Global Step: 94590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:23,149-Speed 5642.09 samples/sec   Loss 1.7406   LearningRate 0.0028   Epoch: 16   Global Step: 94600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:24,967-Speed 5633.36 samples/sec   Loss 1.7869   LearningRate 0.0028   Epoch: 16   Global Step: 94610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:26,788-Speed 5627.93 samples/sec   Loss 1.6550   LearningRate 0.0028   Epoch: 16   Global Step: 94620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:28,615-Speed 5606.42 samples/sec   Loss 1.6972   LearningRate 0.0028   Epoch: 16   Global Step: 94630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:30,437-Speed 5619.79 samples/sec   Loss 1.6563   LearningRate 0.0028   Epoch: 16   Global Step: 94640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:32,251-Speed 5648.53 samples/sec   Loss 1.6897   LearningRate 0.0028   Epoch: 16   Global Step: 94650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:34,072-Speed 5625.85 samples/sec   Loss 1.7183   LearningRate 0.0028   Epoch: 16   Global Step: 94660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:35,891-Speed 5631.00 samples/sec   Loss 1.6819   LearningRate 0.0028   Epoch: 16   Global Step: 94670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:37,736-Speed 5550.18 samples/sec   Loss 1.7807   LearningRate 0.0028   Epoch: 16   Global Step: 94680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:39,557-Speed 5628.30 samples/sec   Loss 1.7289   LearningRate 0.0028   Epoch: 16   Global Step: 94690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:41,394-Speed 5575.60 samples/sec   Loss 1.7045   LearningRate 0.0028   Epoch: 16   Global Step: 94700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:43,210-Speed 5641.29 samples/sec   Loss 1.6732   LearningRate 0.0028   Epoch: 16   Global Step: 94710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:45,026-Speed 5638.54 samples/sec   Loss 1.7216   LearningRate 0.0028   Epoch: 16   Global Step: 94720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:46,848-Speed 5622.16 samples/sec   Loss 1.6822   LearningRate 0.0028   Epoch: 16   Global Step: 94730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:48,664-Speed 5640.64 samples/sec   Loss 1.7453   LearningRate 0.0028   Epoch: 16   Global Step: 94740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:50,481-Speed 5637.87 samples/sec   Loss 1.6609   LearningRate 0.0028   Epoch: 16   Global Step: 94750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:52,303-Speed 5622.73 samples/sec   Loss 1.7235   LearningRate 0.0028   Epoch: 16   Global Step: 94760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:54,116-Speed 5651.12 samples/sec   Loss 1.7500   LearningRate 0.0028   Epoch: 16   Global Step: 94770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:55,930-Speed 5644.51 samples/sec   Loss 1.6932   LearningRate 0.0028   Epoch: 16   Global Step: 94780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:57,745-Speed 5643.02 samples/sec   Loss 1.7243   LearningRate 0.0028   Epoch: 16   Global Step: 94790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:25:59,578-Speed 5590.07 samples/sec   Loss 1.7186   LearningRate 0.0028   Epoch: 16   Global Step: 94800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:01,435-Speed 5516.81 samples/sec   Loss 1.6923   LearningRate 0.0028   Epoch: 16   Global Step: 94810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:03,269-Speed 5584.94 samples/sec   Loss 1.7113   LearningRate 0.0028   Epoch: 16   Global Step: 94820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:05,109-Speed 5567.17 samples/sec   Loss 1.6567   LearningRate 0.0028   Epoch: 16   Global Step: 94830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:06,914-Speed 5676.13 samples/sec   Loss 1.7465   LearningRate 0.0028   Epoch: 16   Global Step: 94840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:08,720-Speed 5670.81 samples/sec   Loss 1.6889   LearningRate 0.0028   Epoch: 16   Global Step: 94850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:10,533-Speed 5649.95 samples/sec   Loss 1.7374   LearningRate 0.0028   Epoch: 16   Global Step: 94860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:12,356-Speed 5620.67 samples/sec   Loss 1.6166   LearningRate 0.0027   Epoch: 16   Global Step: 94870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:14,175-Speed 5631.17 samples/sec   Loss 1.7820   LearningRate 0.0027   Epoch: 16   Global Step: 94880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:15,996-Speed 5622.83 samples/sec   Loss 1.7833   LearningRate 0.0027   Epoch: 16   Global Step: 94890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:17,811-Speed 5651.23 samples/sec   Loss 1.7161   LearningRate 0.0027   Epoch: 16   Global Step: 94900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:19,613-Speed 5682.41 samples/sec   Loss 1.7164   LearningRate 0.0027   Epoch: 16   Global Step: 94910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:21,423-Speed 5660.01 samples/sec   Loss 1.6735   LearningRate 0.0027   Epoch: 16   Global Step: 94920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:23,233-Speed 5657.91 samples/sec   Loss 1.6381   LearningRate 0.0027   Epoch: 16   Global Step: 94930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:25,043-Speed 5660.82 samples/sec   Loss 1.6882   LearningRate 0.0027   Epoch: 16   Global Step: 94940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:26,853-Speed 5660.94 samples/sec   Loss 1.7207   LearningRate 0.0027   Epoch: 16   Global Step: 94950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:28,663-Speed 5656.89 samples/sec   Loss 1.6800   LearningRate 0.0027   Epoch: 16   Global Step: 94960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:30,488-Speed 5614.61 samples/sec   Loss 1.6614   LearningRate 0.0027   Epoch: 16   Global Step: 94970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:32,298-Speed 5657.65 samples/sec   Loss 1.7135   LearningRate 0.0027   Epoch: 16   Global Step: 94980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:34,116-Speed 5634.68 samples/sec   Loss 1.6953   LearningRate 0.0027   Epoch: 16   Global Step: 94990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:35,925-Speed 5664.22 samples/sec   Loss 1.6684   LearningRate 0.0027   Epoch: 16   Global Step: 95000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:37,735-Speed 5656.83 samples/sec   Loss 1.7072   LearningRate 0.0027   Epoch: 16   Global Step: 95010   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:26:39,535-Speed 5693.03 samples/sec   Loss 1.6350   LearningRate 0.0027   Epoch: 16   Global Step: 95020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:41,348-Speed 5648.23 samples/sec   Loss 1.7025   LearningRate 0.0027   Epoch: 16   Global Step: 95030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:43,167-Speed 5632.50 samples/sec   Loss 1.6005   LearningRate 0.0027   Epoch: 16   Global Step: 95040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:44,980-Speed 5648.94 samples/sec   Loss 1.6682   LearningRate 0.0027   Epoch: 16   Global Step: 95050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:46,788-Speed 5669.12 samples/sec   Loss 1.5989   LearningRate 0.0027   Epoch: 16   Global Step: 95060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:48,602-Speed 5644.99 samples/sec   Loss 1.7315   LearningRate 0.0027   Epoch: 16   Global Step: 95070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:50,422-Speed 5628.65 samples/sec   Loss 1.6410   LearningRate 0.0027   Epoch: 16   Global Step: 95080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:52,229-Speed 5669.15 samples/sec   Loss 1.7984   LearningRate 0.0027   Epoch: 16   Global Step: 95090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:54,035-Speed 5670.30 samples/sec   Loss 1.6860   LearningRate 0.0027   Epoch: 16   Global Step: 95100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:55,847-Speed 5654.00 samples/sec   Loss 1.6326   LearningRate 0.0027   Epoch: 16   Global Step: 95110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:57,659-Speed 5653.25 samples/sec   Loss 1.5589   LearningRate 0.0027   Epoch: 16   Global Step: 95120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:26:59,483-Speed 5616.73 samples/sec   Loss 1.6679   LearningRate 0.0027   Epoch: 16   Global Step: 95130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:27:01,299-Speed 5639.99 samples/sec   Loss 1.6998   LearningRate 0.0027   Epoch: 16   Global Step: 95140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:27:03,112-Speed 5648.87 samples/sec   Loss 1.7520   LearningRate 0.0027   Epoch: 16   Global Step: 95150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:27:04,918-Speed 5672.12 samples/sec   Loss 1.6982   LearningRate 0.0027   Epoch: 16   Global Step: 95160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:27:06,726-Speed 5666.50 samples/sec   Loss 1.5724   LearningRate 0.0027   Epoch: 16   Global Step: 95170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:27:08,539-Speed 5650.42 samples/sec   Loss 1.7767   LearningRate 0.0027   Epoch: 16   Global Step: 95180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:27:10,358-Speed 5631.95 samples/sec   Loss 1.7642   LearningRate 0.0027   Epoch: 16   Global Step: 95190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:27:12,172-Speed 5645.52 samples/sec   Loss 1.7138   LearningRate 0.0027   Epoch: 16   Global Step: 95200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:27:13,982-Speed 5660.72 samples/sec   Loss 1.6734   LearningRate 0.0026   Epoch: 16   Global Step: 95210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:27:15,809-Speed 5605.78 samples/sec   Loss 1.7914   LearningRate 0.0026   Epoch: 16   Global Step: 95220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:27:17,618-Speed 5662.64 samples/sec   Loss 1.6700   LearningRate 0.0026   Epoch: 16   Global Step: 95230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:19,449-Speed 5593.97 samples/sec   Loss 1.6492   LearningRate 0.0026   Epoch: 16   Global Step: 95240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:21,284-Speed 5581.89 samples/sec   Loss 1.8269   LearningRate 0.0026   Epoch: 16   Global Step: 95250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:23,107-Speed 5618.92 samples/sec   Loss 1.7101   LearningRate 0.0026   Epoch: 16   Global Step: 95260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:24,919-Speed 5655.63 samples/sec   Loss 1.7398   LearningRate 0.0026   Epoch: 16   Global Step: 95270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:26,744-Speed 5612.23 samples/sec   Loss 1.5651   LearningRate 0.0026   Epoch: 16   Global Step: 95280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:28,562-Speed 5633.29 samples/sec   Loss 1.7572   LearningRate 0.0026   Epoch: 16   Global Step: 95290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:30,367-Speed 5676.14 samples/sec   Loss 1.6439   LearningRate 0.0026   Epoch: 16   Global Step: 95300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:32,188-Speed 5624.14 samples/sec   Loss 1.6972   LearningRate 0.0026   Epoch: 16   Global Step: 95310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:33,994-Speed 5672.35 samples/sec   Loss 1.7093   LearningRate 0.0026   Epoch: 16   Global Step: 95320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:35,808-Speed 5648.03 samples/sec   Loss 1.6451   LearningRate 0.0026   Epoch: 16   Global Step: 95330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:27:37,615-Speed 5669.52 samples/sec   Loss 1.7144   LearningRate 0.0026   Epoch: 16   Global Step: 95340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:27:39,422-Speed 5668.76 samples/sec   Loss 1.7203   LearningRate 0.0026   Epoch: 16   Global Step: 95350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:27:41,238-Speed 5638.84 samples/sec   Loss 1.6276   LearningRate 0.0026   Epoch: 16   Global Step: 95360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:27:43,067-Speed 5599.79 samples/sec   Loss 1.7180   LearningRate 0.0026   Epoch: 16   Global Step: 95370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:44,989-Speed 5331.77 samples/sec   Loss 1.5989   LearningRate 0.0026   Epoch: 16   Global Step: 95380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:46,876-Speed 5426.75 samples/sec   Loss 1.6926   LearningRate 0.0026   Epoch: 16   Global Step: 95390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:48,682-Speed 5671.68 samples/sec   Loss 1.6426   LearningRate 0.0026   Epoch: 16   Global Step: 95400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:50,492-Speed 5661.78 samples/sec   Loss 1.6701   LearningRate 0.0026   Epoch: 16   Global Step: 95410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:52,301-Speed 5662.31 samples/sec   Loss 1.7381   LearningRate 0.0026   Epoch: 16   Global Step: 95420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:54,109-Speed 5667.00 samples/sec   Loss 1.6864   LearningRate 0.0026   Epoch: 16   Global Step: 95430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:55,924-Speed 5642.42 samples/sec   Loss 1.6100   LearningRate 0.0026   Epoch: 16   Global Step: 95440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:57,734-Speed 5660.15 samples/sec   Loss 1.7506   LearningRate 0.0026   Epoch: 16   Global Step: 95450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:27:59,542-Speed 5666.23 samples/sec   Loss 1.8065   LearningRate 0.0026   Epoch: 16   Global Step: 95460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:28:01,363-Speed 5623.58 samples/sec   Loss 1.7122   LearningRate 0.0026   Epoch: 16   Global Step: 95470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:03,184-Speed 5625.72 samples/sec   Loss 1.6505   LearningRate 0.0026   Epoch: 16   Global Step: 95480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:05,020-Speed 5580.64 samples/sec   Loss 1.6542   LearningRate 0.0026   Epoch: 16   Global Step: 95490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:06,837-Speed 5635.25 samples/sec   Loss 1.6824   LearningRate 0.0026   Epoch: 16   Global Step: 95500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:08,654-Speed 5638.38 samples/sec   Loss 1.6069   LearningRate 0.0026   Epoch: 16   Global Step: 95510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:10,469-Speed 5643.94 samples/sec   Loss 1.7059   LearningRate 0.0026   Epoch: 16   Global Step: 95520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:28:12,291-Speed 5622.03 samples/sec   Loss 1.7807   LearningRate 0.0026   Epoch: 16   Global Step: 95530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:28:14,134-Speed 5558.14 samples/sec   Loss 1.6918   LearningRate 0.0026   Epoch: 16   Global Step: 95540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:28:15,949-Speed 5642.95 samples/sec   Loss 1.7778   LearningRate 0.0026   Epoch: 16   Global Step: 95550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:28:17,785-Speed 5579.30 samples/sec   Loss 1.7191   LearningRate 0.0026   Epoch: 16   Global Step: 95560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:28:19,646-Speed 5505.91 samples/sec   Loss 1.6781   LearningRate 0.0025   Epoch: 16   Global Step: 95570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:28:21,462-Speed 5641.89 samples/sec   Loss 1.7217   LearningRate 0.0025   Epoch: 16   Global Step: 95580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:28:23,276-Speed 5644.05 samples/sec   Loss 1.6039   LearningRate 0.0025   Epoch: 16   Global Step: 95590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:28:25,099-Speed 5620.79 samples/sec   Loss 1.6606   LearningRate 0.0025   Epoch: 16   Global Step: 95600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:28:26,937-Speed 5573.25 samples/sec   Loss 1.7246   LearningRate 0.0025   Epoch: 16   Global Step: 95610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:28:28,772-Speed 5581.17 samples/sec   Loss 1.6491   LearningRate 0.0025   Epoch: 16   Global Step: 95620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:30,596-Speed 5615.62 samples/sec   Loss 1.7601   LearningRate 0.0025   Epoch: 16   Global Step: 95630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:32,418-Speed 5621.66 samples/sec   Loss 1.5783   LearningRate 0.0025   Epoch: 16   Global Step: 95640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:34,227-Speed 5663.98 samples/sec   Loss 1.6130   LearningRate 0.0025   Epoch: 16   Global Step: 95650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:36,043-Speed 5639.00 samples/sec   Loss 1.5986   LearningRate 0.0025   Epoch: 16   Global Step: 95660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:37,867-Speed 5616.98 samples/sec   Loss 1.7091   LearningRate 0.0025   Epoch: 16   Global Step: 95670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:39,674-Speed 5669.56 samples/sec   Loss 1.7549   LearningRate 0.0025   Epoch: 16   Global Step: 95680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:41,485-Speed 5655.29 samples/sec   Loss 1.7229   LearningRate 0.0025   Epoch: 16   Global Step: 95690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:43,295-Speed 5658.56 samples/sec   Loss 1.6087   LearningRate 0.0025   Epoch: 16   Global Step: 95700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:45,109-Speed 5648.88 samples/sec   Loss 1.7617   LearningRate 0.0025   Epoch: 16   Global Step: 95710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:46,939-Speed 5596.02 samples/sec   Loss 1.7335   LearningRate 0.0025   Epoch: 16   Global Step: 95720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:48,755-Speed 5641.16 samples/sec   Loss 1.7092   LearningRate 0.0025   Epoch: 16   Global Step: 95730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:50,576-Speed 5625.72 samples/sec   Loss 1.6399   LearningRate 0.0025   Epoch: 16   Global Step: 95740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:52,385-Speed 5661.67 samples/sec   Loss 1.6922   LearningRate 0.0025   Epoch: 16   Global Step: 95750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:54,189-Speed 5677.31 samples/sec   Loss 1.6971   LearningRate 0.0025   Epoch: 16   Global Step: 95760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:55,997-Speed 5667.63 samples/sec   Loss 1.7405   LearningRate 0.0025   Epoch: 16   Global Step: 95770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:57,819-Speed 5621.88 samples/sec   Loss 1.7328   LearningRate 0.0025   Epoch: 16   Global Step: 95780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:28:59,650-Speed 5594.60 samples/sec   Loss 1.7028   LearningRate 0.0025   Epoch: 16   Global Step: 95790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:29:01,471-Speed 5624.57 samples/sec   Loss 1.5968   LearningRate 0.0025   Epoch: 16   Global Step: 95800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:29:03,291-Speed 5628.91 samples/sec   Loss 1.7580   LearningRate 0.0025   Epoch: 16   Global Step: 95810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:29:05,105-Speed 5647.35 samples/sec   Loss 1.6651   LearningRate 0.0025   Epoch: 16   Global Step: 95820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:29:06,924-Speed 5630.29 samples/sec   Loss 1.7638   LearningRate 0.0025   Epoch: 16   Global Step: 95830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:29:08,730-Speed 5672.49 samples/sec   Loss 1.6734   LearningRate 0.0025   Epoch: 16   Global Step: 95840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:29:10,540-Speed 5660.20 samples/sec   Loss 1.8053   LearningRate 0.0025   Epoch: 16   Global Step: 95850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:29:12,349-Speed 5660.60 samples/sec   Loss 1.6434   LearningRate 0.0025   Epoch: 16   Global Step: 95860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:29:14,159-Speed 5659.95 samples/sec   Loss 1.6611   LearningRate 0.0025   Epoch: 16   Global Step: 95870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:29:15,972-Speed 5649.43 samples/sec   Loss 1.8022   LearningRate 0.0025   Epoch: 16   Global Step: 95880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:29:17,794-Speed 5622.50 samples/sec   Loss 1.6671   LearningRate 0.0025   Epoch: 16   Global Step: 95890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:29:19,618-Speed 5615.62 samples/sec   Loss 1.7142   LearningRate 0.0025   Epoch: 16   Global Step: 95900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:29:21,433-Speed 5645.21 samples/sec   Loss 1.7199   LearningRate 0.0025   Epoch: 16   Global Step: 95910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:29:23,260-Speed 5605.70 samples/sec   Loss 1.6306   LearningRate 0.0025   Epoch: 16   Global Step: 95920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:29:25,067-Speed 5670.38 samples/sec   Loss 1.7600   LearningRate 0.0024   Epoch: 16   Global Step: 95930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:29:26,905-Speed 5573.80 samples/sec   Loss 1.7312   LearningRate 0.0024   Epoch: 16   Global Step: 95940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:29:28,714-Speed 5659.67 samples/sec   Loss 1.7305   LearningRate 0.0024   Epoch: 16   Global Step: 95950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:29:30,521-Speed 5668.95 samples/sec   Loss 1.7465   LearningRate 0.0024   Epoch: 16   Global Step: 95960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:29:32,333-Speed 5654.24 samples/sec   Loss 1.6571   LearningRate 0.0024   Epoch: 16   Global Step: 95970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:29:34,143-Speed 5657.28 samples/sec   Loss 1.6967   LearningRate 0.0024   Epoch: 16   Global Step: 95980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:29:35,955-Speed 5653.86 samples/sec   Loss 1.5713   LearningRate 0.0024   Epoch: 16   Global Step: 95990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:29:37,761-Speed 5673.32 samples/sec   Loss 1.5802   LearningRate 0.0024   Epoch: 16   Global Step: 96000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:30:03,939-[lfw][96000]XNorm: 22.476795
Training: 2022-04-27 07:30:03,940-[lfw][96000]Accuracy-Flip: 0.99767+-0.00300
Training: 2022-04-27 07:30:03,940-[lfw][96000]Accuracy-Highest: 0.99800
Training: 2022-04-27 07:30:34,160-[cfp_fp][96000]XNorm: 21.217829
Training: 2022-04-27 07:30:34,161-[cfp_fp][96000]Accuracy-Flip: 0.97614+-0.00642
Training: 2022-04-27 07:30:34,161-[cfp_fp][96000]Accuracy-Highest: 0.97614
Training: 2022-04-27 07:31:00,243-[agedb_30][96000]XNorm: 22.530527
Training: 2022-04-27 07:31:00,244-[agedb_30][96000]Accuracy-Flip: 0.98050+-0.00799
Training: 2022-04-27 07:31:00,244-[agedb_30][96000]Accuracy-Highest: 0.98167
Training: 2022-04-27 07:31:02,083-Speed 121.44 samples/sec   Loss 1.7249   LearningRate 0.0024   Epoch: 16   Global Step: 96010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:03,894-Speed 5657.16 samples/sec   Loss 1.6462   LearningRate 0.0024   Epoch: 16   Global Step: 96020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:05,720-Speed 5608.13 samples/sec   Loss 1.6859   LearningRate 0.0024   Epoch: 16   Global Step: 96030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:07,562-Speed 5562.89 samples/sec   Loss 1.6874   LearningRate 0.0024   Epoch: 16   Global Step: 96040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:09,368-Speed 5672.06 samples/sec   Loss 1.6453   LearningRate 0.0024   Epoch: 16   Global Step: 96050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:11,181-Speed 5648.66 samples/sec   Loss 1.6614   LearningRate 0.0024   Epoch: 16   Global Step: 96060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:12,996-Speed 5643.09 samples/sec   Loss 1.7513   LearningRate 0.0024   Epoch: 16   Global Step: 96070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:14,824-Speed 5606.14 samples/sec   Loss 1.6619   LearningRate 0.0024   Epoch: 16   Global Step: 96080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:16,631-Speed 5667.22 samples/sec   Loss 1.7497   LearningRate 0.0024   Epoch: 16   Global Step: 96090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:18,444-Speed 5649.65 samples/sec   Loss 1.5791   LearningRate 0.0024   Epoch: 16   Global Step: 96100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:20,263-Speed 5631.49 samples/sec   Loss 1.6241   LearningRate 0.0024   Epoch: 16   Global Step: 96110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:22,065-Speed 5686.17 samples/sec   Loss 1.7423   LearningRate 0.0024   Epoch: 16   Global Step: 96120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:23,904-Speed 5569.84 samples/sec   Loss 1.7259   LearningRate 0.0024   Epoch: 16   Global Step: 96130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:25,712-Speed 5662.48 samples/sec   Loss 1.6485   LearningRate 0.0024   Epoch: 16   Global Step: 96140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:27,582-Speed 5479.14 samples/sec   Loss 1.6065   LearningRate 0.0024   Epoch: 16   Global Step: 96150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:29,407-Speed 5612.65 samples/sec   Loss 1.7213   LearningRate 0.0024   Epoch: 16   Global Step: 96160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:31,233-Speed 5611.36 samples/sec   Loss 1.6510   LearningRate 0.0024   Epoch: 16   Global Step: 96170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:33,041-Speed 5666.60 samples/sec   Loss 1.6226   LearningRate 0.0024   Epoch: 16   Global Step: 96180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:34,857-Speed 5640.52 samples/sec   Loss 1.6994   LearningRate 0.0024   Epoch: 16   Global Step: 96190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:36,664-Speed 5667.59 samples/sec   Loss 1.7110   LearningRate 0.0024   Epoch: 16   Global Step: 96200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:38,472-Speed 5667.92 samples/sec   Loss 1.6883   LearningRate 0.0024   Epoch: 16   Global Step: 96210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:40,284-Speed 5652.75 samples/sec   Loss 1.6940   LearningRate 0.0024   Epoch: 16   Global Step: 96220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:42,108-Speed 5613.68 samples/sec   Loss 1.6559   LearningRate 0.0024   Epoch: 16   Global Step: 96230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:44,019-Speed 5360.58 samples/sec   Loss 1.6693   LearningRate 0.0024   Epoch: 16   Global Step: 96240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:45,892-Speed 5469.79 samples/sec   Loss 1.6954   LearningRate 0.0024   Epoch: 16   Global Step: 96250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:47,699-Speed 5667.81 samples/sec   Loss 1.6699   LearningRate 0.0024   Epoch: 16   Global Step: 96260   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:31:49,519-Speed 5628.15 samples/sec   Loss 1.7031   LearningRate 0.0024   Epoch: 16   Global Step: 96270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:51,332-Speed 5650.58 samples/sec   Loss 1.5457   LearningRate 0.0024   Epoch: 16   Global Step: 96280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:53,143-Speed 5657.72 samples/sec   Loss 1.6649   LearningRate 0.0023   Epoch: 16   Global Step: 96290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:54,964-Speed 5625.08 samples/sec   Loss 1.7667   LearningRate 0.0023   Epoch: 16   Global Step: 96300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:56,782-Speed 5634.05 samples/sec   Loss 1.6235   LearningRate 0.0023   Epoch: 16   Global Step: 96310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:31:58,594-Speed 5652.59 samples/sec   Loss 1.7142   LearningRate 0.0023   Epoch: 16   Global Step: 96320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:00,414-Speed 5627.88 samples/sec   Loss 1.7010   LearningRate 0.0023   Epoch: 16   Global Step: 96330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:02,228-Speed 5648.59 samples/sec   Loss 1.7594   LearningRate 0.0023   Epoch: 16   Global Step: 96340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:04,050-Speed 5621.15 samples/sec   Loss 1.7276   LearningRate 0.0023   Epoch: 16   Global Step: 96350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:05,886-Speed 5580.04 samples/sec   Loss 1.7376   LearningRate 0.0023   Epoch: 16   Global Step: 96360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:07,692-Speed 5670.05 samples/sec   Loss 1.7051   LearningRate 0.0023   Epoch: 16   Global Step: 96370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:09,509-Speed 5639.41 samples/sec   Loss 1.7042   LearningRate 0.0023   Epoch: 16   Global Step: 96380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:11,327-Speed 5633.58 samples/sec   Loss 1.7085   LearningRate 0.0023   Epoch: 16   Global Step: 96390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:13,140-Speed 5649.85 samples/sec   Loss 1.6737   LearningRate 0.0023   Epoch: 16   Global Step: 96400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:32:14,949-Speed 5666.19 samples/sec   Loss 1.6595   LearningRate 0.0023   Epoch: 16   Global Step: 96410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:32:16,774-Speed 5611.25 samples/sec   Loss 1.7193   LearningRate 0.0023   Epoch: 16   Global Step: 96420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:32:18,608-Speed 5584.13 samples/sec   Loss 1.5853   LearningRate 0.0023   Epoch: 16   Global Step: 96430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:32:20,413-Speed 5677.38 samples/sec   Loss 1.6699   LearningRate 0.0023   Epoch: 16   Global Step: 96440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:32:22,235-Speed 5619.55 samples/sec   Loss 1.6857   LearningRate 0.0023   Epoch: 16   Global Step: 96450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:32:24,060-Speed 5614.31 samples/sec   Loss 1.7666   LearningRate 0.0023   Epoch: 16   Global Step: 96460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:32:25,905-Speed 5551.63 samples/sec   Loss 1.6387   LearningRate 0.0023   Epoch: 16   Global Step: 96470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:32:27,715-Speed 5659.23 samples/sec   Loss 1.6910   LearningRate 0.0023   Epoch: 16   Global Step: 96480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:32:29,534-Speed 5630.85 samples/sec   Loss 1.6210   LearningRate 0.0023   Epoch: 16   Global Step: 96490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:32:31,373-Speed 5569.31 samples/sec   Loss 1.7376   LearningRate 0.0023   Epoch: 16   Global Step: 96500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:33,196-Speed 5622.04 samples/sec   Loss 1.6775   LearningRate 0.0023   Epoch: 16   Global Step: 96510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:35,027-Speed 5593.09 samples/sec   Loss 1.6996   LearningRate 0.0023   Epoch: 16   Global Step: 96520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:36,847-Speed 5629.65 samples/sec   Loss 1.7240   LearningRate 0.0023   Epoch: 16   Global Step: 96530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:38,680-Speed 5586.58 samples/sec   Loss 1.6999   LearningRate 0.0023   Epoch: 16   Global Step: 96540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:40,505-Speed 5614.63 samples/sec   Loss 1.6617   LearningRate 0.0023   Epoch: 16   Global Step: 96550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:42,333-Speed 5604.11 samples/sec   Loss 1.6656   LearningRate 0.0023   Epoch: 16   Global Step: 96560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:44,161-Speed 5601.74 samples/sec   Loss 1.6267   LearningRate 0.0023   Epoch: 16   Global Step: 96570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:45,977-Speed 5640.27 samples/sec   Loss 1.7105   LearningRate 0.0023   Epoch: 16   Global Step: 96580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:47,794-Speed 5637.46 samples/sec   Loss 1.6743   LearningRate 0.0023   Epoch: 16   Global Step: 96590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:49,680-Speed 5431.55 samples/sec   Loss 1.7301   LearningRate 0.0023   Epoch: 16   Global Step: 96600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:51,626-Speed 5264.98 samples/sec   Loss 1.6317   LearningRate 0.0023   Epoch: 16   Global Step: 96610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:53,460-Speed 5584.14 samples/sec   Loss 1.6320   LearningRate 0.0023   Epoch: 16   Global Step: 96620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:55,293-Speed 5590.83 samples/sec   Loss 1.6568   LearningRate 0.0023   Epoch: 16   Global Step: 96630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:57,107-Speed 5645.08 samples/sec   Loss 1.6649   LearningRate 0.0023   Epoch: 16   Global Step: 96640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:32:58,933-Speed 5611.62 samples/sec   Loss 1.6638   LearningRate 0.0023   Epoch: 16   Global Step: 96650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:00,804-Speed 5474.30 samples/sec   Loss 1.6025   LearningRate 0.0023   Epoch: 16   Global Step: 96660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:12,919-Speed 845.31 samples/sec   Loss 1.3002   LearningRate 0.0022   Epoch: 17   Global Step: 96670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:14,775-Speed 5521.62 samples/sec   Loss 1.3107   LearningRate 0.0022   Epoch: 17   Global Step: 96680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:16,788-Speed 5089.37 samples/sec   Loss 1.1992   LearningRate 0.0022   Epoch: 17   Global Step: 96690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:18,632-Speed 5554.67 samples/sec   Loss 1.2697   LearningRate 0.0022   Epoch: 17   Global Step: 96700   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:33:20,578-Speed 5263.58 samples/sec   Loss 1.2583   LearningRate 0.0022   Epoch: 17   Global Step: 96710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:22,422-Speed 5554.09 samples/sec   Loss 1.2437   LearningRate 0.0022   Epoch: 17   Global Step: 96720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:24,259-Speed 5574.95 samples/sec   Loss 1.1716   LearningRate 0.0022   Epoch: 17   Global Step: 96730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:26,096-Speed 5578.35 samples/sec   Loss 1.2721   LearningRate 0.0022   Epoch: 17   Global Step: 96740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:27,928-Speed 5590.11 samples/sec   Loss 1.1610   LearningRate 0.0022   Epoch: 17   Global Step: 96750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:29,756-Speed 5606.40 samples/sec   Loss 1.2472   LearningRate 0.0022   Epoch: 17   Global Step: 96760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:31,590-Speed 5582.75 samples/sec   Loss 1.2153   LearningRate 0.0022   Epoch: 17   Global Step: 96770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:33,441-Speed 5534.39 samples/sec   Loss 1.3194   LearningRate 0.0022   Epoch: 17   Global Step: 96780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:35,309-Speed 5483.48 samples/sec   Loss 1.2484   LearningRate 0.0022   Epoch: 17   Global Step: 96790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:37,143-Speed 5588.14 samples/sec   Loss 1.1824   LearningRate 0.0022   Epoch: 17   Global Step: 96800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:38,995-Speed 5531.06 samples/sec   Loss 1.2667   LearningRate 0.0022   Epoch: 17   Global Step: 96810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:40,813-Speed 5632.91 samples/sec   Loss 1.1601   LearningRate 0.0022   Epoch: 17   Global Step: 96820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:42,636-Speed 5617.15 samples/sec   Loss 1.2921   LearningRate 0.0022   Epoch: 17   Global Step: 96830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:44,462-Speed 5610.59 samples/sec   Loss 1.3051   LearningRate 0.0022   Epoch: 17   Global Step: 96840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:46,291-Speed 5601.24 samples/sec   Loss 1.2392   LearningRate 0.0022   Epoch: 17   Global Step: 96850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:48,140-Speed 5540.49 samples/sec   Loss 1.2897   LearningRate 0.0022   Epoch: 17   Global Step: 96860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:49,969-Speed 5599.53 samples/sec   Loss 1.2265   LearningRate 0.0022   Epoch: 17   Global Step: 96870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:33:51,779-Speed 5661.39 samples/sec   Loss 1.2452   LearningRate 0.0022   Epoch: 17   Global Step: 96880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:33:53,601-Speed 5621.25 samples/sec   Loss 1.1845   LearningRate 0.0022   Epoch: 17   Global Step: 96890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:33:55,426-Speed 5614.43 samples/sec   Loss 1.2852   LearningRate 0.0022   Epoch: 17   Global Step: 96900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:33:57,262-Speed 5576.37 samples/sec   Loss 1.3398   LearningRate 0.0022   Epoch: 17   Global Step: 96910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:33:59,099-Speed 5577.20 samples/sec   Loss 1.2652   LearningRate 0.0022   Epoch: 17   Global Step: 96920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:00,930-Speed 5594.71 samples/sec   Loss 1.2483   LearningRate 0.0022   Epoch: 17   Global Step: 96930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:02,757-Speed 5605.41 samples/sec   Loss 1.1881   LearningRate 0.0022   Epoch: 17   Global Step: 96940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:04,592-Speed 5582.48 samples/sec   Loss 1.3068   LearningRate 0.0022   Epoch: 17   Global Step: 96950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:06,431-Speed 5570.68 samples/sec   Loss 1.2932   LearningRate 0.0022   Epoch: 17   Global Step: 96960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:08,259-Speed 5606.45 samples/sec   Loss 1.2477   LearningRate 0.0022   Epoch: 17   Global Step: 96970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:10,104-Speed 5549.43 samples/sec   Loss 1.3392   LearningRate 0.0022   Epoch: 17   Global Step: 96980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:11,931-Speed 5606.40 samples/sec   Loss 1.1695   LearningRate 0.0022   Epoch: 17   Global Step: 96990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:13,777-Speed 5551.10 samples/sec   Loss 1.3732   LearningRate 0.0022   Epoch: 17   Global Step: 97000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:15,639-Speed 5499.36 samples/sec   Loss 1.2661   LearningRate 0.0022   Epoch: 17   Global Step: 97010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:17,476-Speed 5576.23 samples/sec   Loss 1.2625   LearningRate 0.0022   Epoch: 17   Global Step: 97020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:19,298-Speed 5621.74 samples/sec   Loss 1.3229   LearningRate 0.0022   Epoch: 17   Global Step: 97030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:21,149-Speed 5535.06 samples/sec   Loss 1.2562   LearningRate 0.0022   Epoch: 17   Global Step: 97040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:22,990-Speed 5565.43 samples/sec   Loss 1.2702   LearningRate 0.0021   Epoch: 17   Global Step: 97050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:24,817-Speed 5606.26 samples/sec   Loss 1.3009   LearningRate 0.0021   Epoch: 17   Global Step: 97060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:26,715-Speed 5396.59 samples/sec   Loss 1.2674   LearningRate 0.0021   Epoch: 17   Global Step: 97070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:28,533-Speed 5635.49 samples/sec   Loss 1.2800   LearningRate 0.0021   Epoch: 17   Global Step: 97080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:30,357-Speed 5616.47 samples/sec   Loss 1.2677   LearningRate 0.0021   Epoch: 17   Global Step: 97090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:32,195-Speed 5574.61 samples/sec   Loss 1.2243   LearningRate 0.0021   Epoch: 17   Global Step: 97100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:34,012-Speed 5635.12 samples/sec   Loss 1.2002   LearningRate 0.0021   Epoch: 17   Global Step: 97110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:35,835-Speed 5618.69 samples/sec   Loss 1.2346   LearningRate 0.0021   Epoch: 17   Global Step: 97120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:37,684-Speed 5541.18 samples/sec   Loss 1.3136   LearningRate 0.0021   Epoch: 17   Global Step: 97130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:39,552-Speed 5481.75 samples/sec   Loss 1.3275   LearningRate 0.0021   Epoch: 17   Global Step: 97140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:34:41,452-Speed 5392.83 samples/sec   Loss 1.2408   LearningRate 0.0021   Epoch: 17   Global Step: 97150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:43,294-Speed 5560.79 samples/sec   Loss 1.3131   LearningRate 0.0021   Epoch: 17   Global Step: 97160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:45,149-Speed 5520.68 samples/sec   Loss 1.2782   LearningRate 0.0021   Epoch: 17   Global Step: 97170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:46,995-Speed 5548.78 samples/sec   Loss 1.3043   LearningRate 0.0021   Epoch: 17   Global Step: 97180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:48,841-Speed 5551.77 samples/sec   Loss 1.3228   LearningRate 0.0021   Epoch: 17   Global Step: 97190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:50,719-Speed 5455.51 samples/sec   Loss 1.2443   LearningRate 0.0021   Epoch: 17   Global Step: 97200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:52,607-Speed 5424.48 samples/sec   Loss 1.1875   LearningRate 0.0021   Epoch: 17   Global Step: 97210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:54,448-Speed 5563.76 samples/sec   Loss 1.2715   LearningRate 0.0021   Epoch: 17   Global Step: 97220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:56,309-Speed 5503.75 samples/sec   Loss 1.3186   LearningRate 0.0021   Epoch: 17   Global Step: 97230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:58,133-Speed 5617.10 samples/sec   Loss 1.2190   LearningRate 0.0021   Epoch: 17   Global Step: 97240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:34:59,985-Speed 5530.10 samples/sec   Loss 1.3437   LearningRate 0.0021   Epoch: 17   Global Step: 97250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:01,840-Speed 5522.11 samples/sec   Loss 1.2527   LearningRate 0.0021   Epoch: 17   Global Step: 97260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:03,668-Speed 5603.61 samples/sec   Loss 1.2762   LearningRate 0.0021   Epoch: 17   Global Step: 97270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:35:05,515-Speed 5546.03 samples/sec   Loss 1.2859   LearningRate 0.0021   Epoch: 17   Global Step: 97280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:35:07,356-Speed 5565.84 samples/sec   Loss 1.3372   LearningRate 0.0021   Epoch: 17   Global Step: 97290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:35:09,197-Speed 5561.72 samples/sec   Loss 1.2303   LearningRate 0.0021   Epoch: 17   Global Step: 97300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:35:11,076-Speed 5453.96 samples/sec   Loss 1.2529   LearningRate 0.0021   Epoch: 17   Global Step: 97310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:35:12,913-Speed 5574.16 samples/sec   Loss 1.2128   LearningRate 0.0021   Epoch: 17   Global Step: 97320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:35:14,749-Speed 5580.86 samples/sec   Loss 1.2748   LearningRate 0.0021   Epoch: 17   Global Step: 97330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:35:16,585-Speed 5579.85 samples/sec   Loss 1.3097   LearningRate 0.0021   Epoch: 17   Global Step: 97340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:35:18,415-Speed 5594.80 samples/sec   Loss 1.1626   LearningRate 0.0021   Epoch: 17   Global Step: 97350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:35:20,265-Speed 5538.19 samples/sec   Loss 1.3627   LearningRate 0.0021   Epoch: 17   Global Step: 97360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:35:22,105-Speed 5566.33 samples/sec   Loss 1.2750   LearningRate 0.0021   Epoch: 17   Global Step: 97370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:23,959-Speed 5525.03 samples/sec   Loss 1.3242   LearningRate 0.0021   Epoch: 17   Global Step: 97380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:25,807-Speed 5544.35 samples/sec   Loss 1.2799   LearningRate 0.0021   Epoch: 17   Global Step: 97390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:27,630-Speed 5617.52 samples/sec   Loss 1.3131   LearningRate 0.0021   Epoch: 17   Global Step: 97400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:29,470-Speed 5568.22 samples/sec   Loss 1.2840   LearningRate 0.0021   Epoch: 17   Global Step: 97410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:31,296-Speed 5608.21 samples/sec   Loss 1.2326   LearningRate 0.0021   Epoch: 17   Global Step: 97420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:33,126-Speed 5599.44 samples/sec   Loss 1.2945   LearningRate 0.0021   Epoch: 17   Global Step: 97430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:34,956-Speed 5598.30 samples/sec   Loss 1.3289   LearningRate 0.0020   Epoch: 17   Global Step: 97440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:36,793-Speed 5574.62 samples/sec   Loss 1.3437   LearningRate 0.0020   Epoch: 17   Global Step: 97450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:38,666-Speed 5471.01 samples/sec   Loss 1.2491   LearningRate 0.0020   Epoch: 17   Global Step: 97460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:40,570-Speed 5378.65 samples/sec   Loss 1.2939   LearningRate 0.0020   Epoch: 17   Global Step: 97470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:42,417-Speed 5546.34 samples/sec   Loss 1.3328   LearningRate 0.0020   Epoch: 17   Global Step: 97480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:44,255-Speed 5571.36 samples/sec   Loss 1.2946   LearningRate 0.0020   Epoch: 17   Global Step: 97490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:46,079-Speed 5617.54 samples/sec   Loss 1.2653   LearningRate 0.0020   Epoch: 17   Global Step: 97500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:47,900-Speed 5624.06 samples/sec   Loss 1.2428   LearningRate 0.0020   Epoch: 17   Global Step: 97510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:49,740-Speed 5567.92 samples/sec   Loss 1.2196   LearningRate 0.0020   Epoch: 17   Global Step: 97520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:51,580-Speed 5567.78 samples/sec   Loss 1.2269   LearningRate 0.0020   Epoch: 17   Global Step: 97530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:53,416-Speed 5576.96 samples/sec   Loss 1.2854   LearningRate 0.0020   Epoch: 17   Global Step: 97540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:55,294-Speed 5456.50 samples/sec   Loss 1.2571   LearningRate 0.0020   Epoch: 17   Global Step: 97550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:57,137-Speed 5558.00 samples/sec   Loss 1.3102   LearningRate 0.0020   Epoch: 17   Global Step: 97560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:35:58,965-Speed 5606.26 samples/sec   Loss 1.2908   LearningRate 0.0020   Epoch: 17   Global Step: 97570   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:36:00,789-Speed 5614.18 samples/sec   Loss 1.3055   LearningRate 0.0020   Epoch: 17   Global Step: 97580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:02,638-Speed 5541.46 samples/sec   Loss 1.2589   LearningRate 0.0020   Epoch: 17   Global Step: 97590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:04,484-Speed 5548.97 samples/sec   Loss 1.2938   LearningRate 0.0020   Epoch: 17   Global Step: 97600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:06,341-Speed 5515.17 samples/sec   Loss 1.2903   LearningRate 0.0020   Epoch: 17   Global Step: 97610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:08,219-Speed 5454.24 samples/sec   Loss 1.1949   LearningRate 0.0020   Epoch: 17   Global Step: 97620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:10,127-Speed 5368.63 samples/sec   Loss 1.2806   LearningRate 0.0020   Epoch: 17   Global Step: 97630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:11,961-Speed 5585.88 samples/sec   Loss 1.3294   LearningRate 0.0020   Epoch: 17   Global Step: 97640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:13,829-Speed 5483.83 samples/sec   Loss 1.3213   LearningRate 0.0020   Epoch: 17   Global Step: 97650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:15,744-Speed 5351.04 samples/sec   Loss 1.2484   LearningRate 0.0020   Epoch: 17   Global Step: 97660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:17,609-Speed 5490.81 samples/sec   Loss 1.2291   LearningRate 0.0020   Epoch: 17   Global Step: 97670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:19,425-Speed 5640.35 samples/sec   Loss 1.3257   LearningRate 0.0020   Epoch: 17   Global Step: 97680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:21,287-Speed 5502.34 samples/sec   Loss 1.3012   LearningRate 0.0020   Epoch: 17   Global Step: 97690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:23,115-Speed 5602.11 samples/sec   Loss 1.2342   LearningRate 0.0020   Epoch: 17   Global Step: 97700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:24,949-Speed 5585.30 samples/sec   Loss 1.3076   LearningRate 0.0020   Epoch: 17   Global Step: 97710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:26,790-Speed 5564.85 samples/sec   Loss 1.2492   LearningRate 0.0020   Epoch: 17   Global Step: 97720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:28,618-Speed 5603.44 samples/sec   Loss 1.2582   LearningRate 0.0020   Epoch: 17   Global Step: 97730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:30,463-Speed 5550.69 samples/sec   Loss 1.1899   LearningRate 0.0020   Epoch: 17   Global Step: 97740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:32,328-Speed 5494.80 samples/sec   Loss 1.3297   LearningRate 0.0020   Epoch: 17   Global Step: 97750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:34,188-Speed 5505.25 samples/sec   Loss 1.2309   LearningRate 0.0020   Epoch: 17   Global Step: 97760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:36,013-Speed 5614.33 samples/sec   Loss 1.3133   LearningRate 0.0020   Epoch: 17   Global Step: 97770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:37,833-Speed 5627.78 samples/sec   Loss 1.3590   LearningRate 0.0020   Epoch: 17   Global Step: 97780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:39,664-Speed 5593.61 samples/sec   Loss 1.3091   LearningRate 0.0020   Epoch: 17   Global Step: 97790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:41,518-Speed 5527.66 samples/sec   Loss 1.2165   LearningRate 0.0020   Epoch: 17   Global Step: 97800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:43,352-Speed 5583.65 samples/sec   Loss 1.2979   LearningRate 0.0020   Epoch: 17   Global Step: 97810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:45,222-Speed 5478.76 samples/sec   Loss 1.3133   LearningRate 0.0020   Epoch: 17   Global Step: 97820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:47,042-Speed 5626.33 samples/sec   Loss 1.2149   LearningRate 0.0020   Epoch: 17   Global Step: 97830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:48,862-Speed 5628.42 samples/sec   Loss 1.2668   LearningRate 0.0019   Epoch: 17   Global Step: 97840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:36:50,673-Speed 5657.23 samples/sec   Loss 1.3308   LearningRate 0.0019   Epoch: 17   Global Step: 97850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:36:52,498-Speed 5612.80 samples/sec   Loss 1.2635   LearningRate 0.0019   Epoch: 17   Global Step: 97860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:36:54,408-Speed 5362.49 samples/sec   Loss 1.2944   LearningRate 0.0019   Epoch: 17   Global Step: 97870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:36:56,241-Speed 5589.43 samples/sec   Loss 1.3054   LearningRate 0.0019   Epoch: 17   Global Step: 97880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:36:58,064-Speed 5616.68 samples/sec   Loss 1.2533   LearningRate 0.0019   Epoch: 17   Global Step: 97890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:36:59,921-Speed 5516.87 samples/sec   Loss 1.3249   LearningRate 0.0019   Epoch: 17   Global Step: 97900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:37:01,762-Speed 5564.24 samples/sec   Loss 1.2586   LearningRate 0.0019   Epoch: 17   Global Step: 97910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:37:03,602-Speed 5567.73 samples/sec   Loss 1.2930   LearningRate 0.0019   Epoch: 17   Global Step: 97920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:37:05,421-Speed 5632.28 samples/sec   Loss 1.2627   LearningRate 0.0019   Epoch: 17   Global Step: 97930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:37:07,248-Speed 5604.84 samples/sec   Loss 1.2996   LearningRate 0.0019   Epoch: 17   Global Step: 97940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:37:09,068-Speed 5630.31 samples/sec   Loss 1.2713   LearningRate 0.0019   Epoch: 17   Global Step: 97950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:37:10,899-Speed 5594.81 samples/sec   Loss 1.2432   LearningRate 0.0019   Epoch: 17   Global Step: 97960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:37:12,754-Speed 5522.70 samples/sec   Loss 1.3261   LearningRate 0.0019   Epoch: 17   Global Step: 97970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:37:14,630-Speed 5457.41 samples/sec   Loss 1.2472   LearningRate 0.0019   Epoch: 17   Global Step: 97980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:37:16,497-Speed 5487.22 samples/sec   Loss 1.3022   LearningRate 0.0019   Epoch: 17   Global Step: 97990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:37:18,326-Speed 5601.60 samples/sec   Loss 1.2835   LearningRate 0.0019   Epoch: 17   Global Step: 98000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:37:44,383-[lfw][98000]XNorm: 22.298069
Training: 2022-04-27 07:37:44,383-[lfw][98000]Accuracy-Flip: 0.99800+-0.00221
Training: 2022-04-27 07:37:44,384-[lfw][98000]Accuracy-Highest: 0.99800
Training: 2022-04-27 07:38:14,592-[cfp_fp][98000]XNorm: 21.089435
Training: 2022-04-27 07:38:14,592-[cfp_fp][98000]Accuracy-Flip: 0.97929+-0.00674
Training: 2022-04-27 07:38:14,593-[cfp_fp][98000]Accuracy-Highest: 0.97929
Training: 2022-04-27 07:38:40,682-[agedb_30][98000]XNorm: 22.387062
Training: 2022-04-27 07:38:40,682-[agedb_30][98000]Accuracy-Flip: 0.98183+-0.00529
Training: 2022-04-27 07:38:40,683-[agedb_30][98000]Accuracy-Highest: 0.98183
Training: 2022-04-27 07:38:42,530-Speed 121.61 samples/sec   Loss 1.2492   LearningRate 0.0019   Epoch: 17   Global Step: 98010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:38:44,361-Speed 5593.36 samples/sec   Loss 1.3550   LearningRate 0.0019   Epoch: 17   Global Step: 98020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:38:46,188-Speed 5607.85 samples/sec   Loss 1.2428   LearningRate 0.0019   Epoch: 17   Global Step: 98030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:38:48,022-Speed 5584.24 samples/sec   Loss 1.2606   LearningRate 0.0019   Epoch: 17   Global Step: 98040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:38:49,838-Speed 5642.88 samples/sec   Loss 1.3021   LearningRate 0.0019   Epoch: 17   Global Step: 98050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:38:51,686-Speed 5541.89 samples/sec   Loss 1.3133   LearningRate 0.0019   Epoch: 17   Global Step: 98060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:38:53,518-Speed 5591.88 samples/sec   Loss 1.3800   LearningRate 0.0019   Epoch: 17   Global Step: 98070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:38:55,364-Speed 5548.19 samples/sec   Loss 1.2590   LearningRate 0.0019   Epoch: 17   Global Step: 98080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:38:57,183-Speed 5630.11 samples/sec   Loss 1.3365   LearningRate 0.0019   Epoch: 17   Global Step: 98090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:38:59,014-Speed 5596.00 samples/sec   Loss 1.2442   LearningRate 0.0019   Epoch: 17   Global Step: 98100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:00,839-Speed 5614.91 samples/sec   Loss 1.3150   LearningRate 0.0019   Epoch: 17   Global Step: 98110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:02,679-Speed 5564.53 samples/sec   Loss 1.2674   LearningRate 0.0019   Epoch: 17   Global Step: 98120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:04,504-Speed 5613.00 samples/sec   Loss 1.3033   LearningRate 0.0019   Epoch: 17   Global Step: 98130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:39:06,334-Speed 5597.29 samples/sec   Loss 1.2358   LearningRate 0.0019   Epoch: 17   Global Step: 98140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:39:08,169-Speed 5583.82 samples/sec   Loss 1.2367   LearningRate 0.0019   Epoch: 17   Global Step: 98150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:39:10,017-Speed 5543.60 samples/sec   Loss 1.3444   LearningRate 0.0019   Epoch: 17   Global Step: 98160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:39:11,837-Speed 5625.84 samples/sec   Loss 1.3116   LearningRate 0.0019   Epoch: 17   Global Step: 98170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:39:13,667-Speed 5596.95 samples/sec   Loss 1.2066   LearningRate 0.0019   Epoch: 17   Global Step: 98180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:39:15,497-Speed 5597.48 samples/sec   Loss 1.2792   LearningRate 0.0019   Epoch: 17   Global Step: 98190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:39:17,356-Speed 5510.26 samples/sec   Loss 1.3050   LearningRate 0.0019   Epoch: 17   Global Step: 98200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:39:19,237-Speed 5448.02 samples/sec   Loss 1.2645   LearningRate 0.0019   Epoch: 17   Global Step: 98210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:39:21,079-Speed 5562.07 samples/sec   Loss 1.3254   LearningRate 0.0019   Epoch: 17   Global Step: 98220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:39:22,911-Speed 5590.28 samples/sec   Loss 1.2327   LearningRate 0.0019   Epoch: 17   Global Step: 98230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:24,749-Speed 5574.18 samples/sec   Loss 1.3114   LearningRate 0.0019   Epoch: 17   Global Step: 98240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:26,565-Speed 5640.19 samples/sec   Loss 1.3041   LearningRate 0.0019   Epoch: 17   Global Step: 98250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:28,406-Speed 5563.91 samples/sec   Loss 1.3206   LearningRate 0.0018   Epoch: 17   Global Step: 98260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:30,268-Speed 5502.34 samples/sec   Loss 1.2936   LearningRate 0.0018   Epoch: 17   Global Step: 98270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:32,091-Speed 5618.14 samples/sec   Loss 1.2433   LearningRate 0.0018   Epoch: 17   Global Step: 98280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:33,932-Speed 5562.69 samples/sec   Loss 1.3531   LearningRate 0.0018   Epoch: 17   Global Step: 98290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:35,781-Speed 5540.12 samples/sec   Loss 1.2284   LearningRate 0.0018   Epoch: 17   Global Step: 98300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:37,596-Speed 5643.84 samples/sec   Loss 1.2622   LearningRate 0.0018   Epoch: 17   Global Step: 98310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:39,436-Speed 5566.55 samples/sec   Loss 1.3174   LearningRate 0.0018   Epoch: 17   Global Step: 98320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:41,289-Speed 5529.76 samples/sec   Loss 1.3257   LearningRate 0.0018   Epoch: 17   Global Step: 98330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:43,115-Speed 5608.29 samples/sec   Loss 1.3406   LearningRate 0.0018   Epoch: 17   Global Step: 98340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:44,947-Speed 5592.25 samples/sec   Loss 1.2976   LearningRate 0.0018   Epoch: 17   Global Step: 98350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:46,804-Speed 5516.19 samples/sec   Loss 1.3832   LearningRate 0.0018   Epoch: 17   Global Step: 98360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:48,655-Speed 5533.15 samples/sec   Loss 1.3060   LearningRate 0.0018   Epoch: 17   Global Step: 98370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:50,476-Speed 5627.11 samples/sec   Loss 1.2981   LearningRate 0.0018   Epoch: 17   Global Step: 98380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:52,317-Speed 5563.73 samples/sec   Loss 1.2955   LearningRate 0.0018   Epoch: 17   Global Step: 98390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:54,145-Speed 5601.93 samples/sec   Loss 1.2935   LearningRate 0.0018   Epoch: 17   Global Step: 98400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:55,966-Speed 5626.07 samples/sec   Loss 1.3753   LearningRate 0.0018   Epoch: 17   Global Step: 98410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:57,793-Speed 5605.20 samples/sec   Loss 1.3170   LearningRate 0.0018   Epoch: 17   Global Step: 98420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:39:59,613-Speed 5630.02 samples/sec   Loss 1.1996   LearningRate 0.0018   Epoch: 17   Global Step: 98430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:01,452-Speed 5570.65 samples/sec   Loss 1.3549   LearningRate 0.0018   Epoch: 17   Global Step: 98440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:03,276-Speed 5616.85 samples/sec   Loss 1.1759   LearningRate 0.0018   Epoch: 17   Global Step: 98450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:05,103-Speed 5607.45 samples/sec   Loss 1.3662   LearningRate 0.0018   Epoch: 17   Global Step: 98460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:06,926-Speed 5617.54 samples/sec   Loss 1.2885   LearningRate 0.0018   Epoch: 17   Global Step: 98470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:08,749-Speed 5621.41 samples/sec   Loss 1.3065   LearningRate 0.0018   Epoch: 17   Global Step: 98480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:10,590-Speed 5562.18 samples/sec   Loss 1.2946   LearningRate 0.0018   Epoch: 17   Global Step: 98490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:12,408-Speed 5635.01 samples/sec   Loss 1.3078   LearningRate 0.0018   Epoch: 17   Global Step: 98500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:14,241-Speed 5588.74 samples/sec   Loss 1.3384   LearningRate 0.0018   Epoch: 17   Global Step: 98510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:16,060-Speed 5630.75 samples/sec   Loss 1.3110   LearningRate 0.0018   Epoch: 17   Global Step: 98520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:40:17,887-Speed 5605.41 samples/sec   Loss 1.3242   LearningRate 0.0018   Epoch: 17   Global Step: 98530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:40:19,716-Speed 5601.37 samples/sec   Loss 1.2069   LearningRate 0.0018   Epoch: 17   Global Step: 98540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:40:21,530-Speed 5649.70 samples/sec   Loss 1.2603   LearningRate 0.0018   Epoch: 17   Global Step: 98550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:40:23,363-Speed 5589.04 samples/sec   Loss 1.3171   LearningRate 0.0018   Epoch: 17   Global Step: 98560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:40:25,187-Speed 5615.79 samples/sec   Loss 1.4091   LearningRate 0.0018   Epoch: 17   Global Step: 98570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:40:27,023-Speed 5579.69 samples/sec   Loss 1.2954   LearningRate 0.0018   Epoch: 17   Global Step: 98580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:40:28,871-Speed 5543.33 samples/sec   Loss 1.3214   LearningRate 0.0018   Epoch: 17   Global Step: 98590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:40:30,691-Speed 5628.09 samples/sec   Loss 1.3229   LearningRate 0.0018   Epoch: 17   Global Step: 98600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:40:32,507-Speed 5639.57 samples/sec   Loss 1.3132   LearningRate 0.0018   Epoch: 17   Global Step: 98610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:40:34,324-Speed 5637.70 samples/sec   Loss 1.3014   LearningRate 0.0018   Epoch: 17   Global Step: 98620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:36,144-Speed 5629.49 samples/sec   Loss 1.3285   LearningRate 0.0018   Epoch: 17   Global Step: 98630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:37,964-Speed 5626.70 samples/sec   Loss 1.1698   LearningRate 0.0018   Epoch: 17   Global Step: 98640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:39,799-Speed 5584.84 samples/sec   Loss 1.2358   LearningRate 0.0018   Epoch: 17   Global Step: 98650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:41,638-Speed 5567.37 samples/sec   Loss 1.3179   LearningRate 0.0018   Epoch: 17   Global Step: 98660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:43,504-Speed 5490.11 samples/sec   Loss 1.2742   LearningRate 0.0018   Epoch: 17   Global Step: 98670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:45,350-Speed 5551.39 samples/sec   Loss 1.3358   LearningRate 0.0017   Epoch: 17   Global Step: 98680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:47,196-Speed 5548.14 samples/sec   Loss 1.3580   LearningRate 0.0017   Epoch: 17   Global Step: 98690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:49,045-Speed 5541.58 samples/sec   Loss 1.3151   LearningRate 0.0017   Epoch: 17   Global Step: 98700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:40:50,855-Speed 5659.95 samples/sec   Loss 1.3889   LearningRate 0.0017   Epoch: 17   Global Step: 98710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:40:52,749-Speed 5406.73 samples/sec   Loss 1.2734   LearningRate 0.0017   Epoch: 17   Global Step: 98720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:40:54,603-Speed 5526.38 samples/sec   Loss 1.2922   LearningRate 0.0017   Epoch: 17   Global Step: 98730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:40:56,463-Speed 5508.51 samples/sec   Loss 1.3051   LearningRate 0.0017   Epoch: 17   Global Step: 98740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:40:58,287-Speed 5614.48 samples/sec   Loss 1.2489   LearningRate 0.0017   Epoch: 17   Global Step: 98750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:00,147-Speed 5506.73 samples/sec   Loss 1.3409   LearningRate 0.0017   Epoch: 17   Global Step: 98760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:01,977-Speed 5596.60 samples/sec   Loss 1.3676   LearningRate 0.0017   Epoch: 17   Global Step: 98770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:03,815-Speed 5575.21 samples/sec   Loss 1.2685   LearningRate 0.0017   Epoch: 17   Global Step: 98780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:05,662-Speed 5546.90 samples/sec   Loss 1.3131   LearningRate 0.0017   Epoch: 17   Global Step: 98790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:07,492-Speed 5597.53 samples/sec   Loss 1.2961   LearningRate 0.0017   Epoch: 17   Global Step: 98800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:09,323-Speed 5593.60 samples/sec   Loss 1.4315   LearningRate 0.0017   Epoch: 17   Global Step: 98810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:11,160-Speed 5577.63 samples/sec   Loss 1.3493   LearningRate 0.0017   Epoch: 17   Global Step: 98820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:13,042-Speed 5440.76 samples/sec   Loss 1.3737   LearningRate 0.0017   Epoch: 17   Global Step: 98830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:14,900-Speed 5513.28 samples/sec   Loss 1.3014   LearningRate 0.0017   Epoch: 17   Global Step: 98840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:16,726-Speed 5611.27 samples/sec   Loss 1.3413   LearningRate 0.0017   Epoch: 17   Global Step: 98850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:18,544-Speed 5632.86 samples/sec   Loss 1.2841   LearningRate 0.0017   Epoch: 17   Global Step: 98860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:20,362-Speed 5636.28 samples/sec   Loss 1.3820   LearningRate 0.0017   Epoch: 17   Global Step: 98870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:22,196-Speed 5583.64 samples/sec   Loss 1.2383   LearningRate 0.0017   Epoch: 17   Global Step: 98880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:24,032-Speed 5580.45 samples/sec   Loss 1.2663   LearningRate 0.0017   Epoch: 17   Global Step: 98890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:25,853-Speed 5623.77 samples/sec   Loss 1.3051   LearningRate 0.0017   Epoch: 17   Global Step: 98900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:27,697-Speed 5556.20 samples/sec   Loss 1.2837   LearningRate 0.0017   Epoch: 17   Global Step: 98910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:29,546-Speed 5538.87 samples/sec   Loss 1.3631   LearningRate 0.0017   Epoch: 17   Global Step: 98920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:31,387-Speed 5564.56 samples/sec   Loss 1.3501   LearningRate 0.0017   Epoch: 17   Global Step: 98930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:33,228-Speed 5564.75 samples/sec   Loss 1.2666   LearningRate 0.0017   Epoch: 17   Global Step: 98940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:35,057-Speed 5600.35 samples/sec   Loss 1.2433   LearningRate 0.0017   Epoch: 17   Global Step: 98950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:36,865-Speed 5667.15 samples/sec   Loss 1.3612   LearningRate 0.0017   Epoch: 17   Global Step: 98960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:38,688-Speed 5618.92 samples/sec   Loss 1.3319   LearningRate 0.0017   Epoch: 17   Global Step: 98970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:40,506-Speed 5631.78 samples/sec   Loss 1.2259   LearningRate 0.0017   Epoch: 17   Global Step: 98980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:42,336-Speed 5599.71 samples/sec   Loss 1.3287   LearningRate 0.0017   Epoch: 17   Global Step: 98990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:44,156-Speed 5628.57 samples/sec   Loss 1.2853   LearningRate 0.0017   Epoch: 17   Global Step: 99000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:45,991-Speed 5579.36 samples/sec   Loss 1.3143   LearningRate 0.0017   Epoch: 17   Global Step: 99010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:47,828-Speed 5577.05 samples/sec   Loss 1.2461   LearningRate 0.0017   Epoch: 17   Global Step: 99020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:49,644-Speed 5640.50 samples/sec   Loss 1.3418   LearningRate 0.0017   Epoch: 17   Global Step: 99030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:51,467-Speed 5619.83 samples/sec   Loss 1.2857   LearningRate 0.0017   Epoch: 17   Global Step: 99040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:53,287-Speed 5630.15 samples/sec   Loss 1.3209   LearningRate 0.0017   Epoch: 17   Global Step: 99050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:41:55,104-Speed 5637.15 samples/sec   Loss 1.2910   LearningRate 0.0017   Epoch: 17   Global Step: 99060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:56,919-Speed 5645.96 samples/sec   Loss 1.3311   LearningRate 0.0017   Epoch: 17   Global Step: 99070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:41:58,818-Speed 5394.16 samples/sec   Loss 1.2527   LearningRate 0.0017   Epoch: 17   Global Step: 99080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:00,641-Speed 5618.96 samples/sec   Loss 1.2427   LearningRate 0.0017   Epoch: 17   Global Step: 99090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:02,478-Speed 5574.24 samples/sec   Loss 1.2682   LearningRate 0.0017   Epoch: 17   Global Step: 99100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:04,326-Speed 5544.28 samples/sec   Loss 1.2853   LearningRate 0.0017   Epoch: 17   Global Step: 99110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:06,188-Speed 5502.44 samples/sec   Loss 1.3831   LearningRate 0.0016   Epoch: 17   Global Step: 99120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:08,028-Speed 5565.29 samples/sec   Loss 1.4121   LearningRate 0.0016   Epoch: 17   Global Step: 99130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:09,855-Speed 5608.62 samples/sec   Loss 1.2986   LearningRate 0.0016   Epoch: 17   Global Step: 99140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:11,698-Speed 5558.51 samples/sec   Loss 1.3315   LearningRate 0.0016   Epoch: 17   Global Step: 99150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:13,503-Speed 5673.17 samples/sec   Loss 1.3619   LearningRate 0.0016   Epoch: 17   Global Step: 99160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:15,343-Speed 5568.34 samples/sec   Loss 1.3670   LearningRate 0.0016   Epoch: 17   Global Step: 99170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:17,182-Speed 5570.52 samples/sec   Loss 1.2413   LearningRate 0.0016   Epoch: 17   Global Step: 99180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:19,032-Speed 5536.30 samples/sec   Loss 1.3004   LearningRate 0.0016   Epoch: 17   Global Step: 99190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:20,884-Speed 5528.75 samples/sec   Loss 1.2808   LearningRate 0.0016   Epoch: 17   Global Step: 99200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:22,710-Speed 5611.24 samples/sec   Loss 1.2948   LearningRate 0.0016   Epoch: 17   Global Step: 99210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:24,530-Speed 5627.22 samples/sec   Loss 1.2813   LearningRate 0.0016   Epoch: 17   Global Step: 99220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:42:26,368-Speed 5574.56 samples/sec   Loss 1.4011   LearningRate 0.0016   Epoch: 17   Global Step: 99230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:42:28,238-Speed 5476.51 samples/sec   Loss 1.3691   LearningRate 0.0016   Epoch: 17   Global Step: 99240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:42:30,084-Speed 5550.37 samples/sec   Loss 1.3215   LearningRate 0.0016   Epoch: 17   Global Step: 99250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:42:31,930-Speed 5547.82 samples/sec   Loss 1.3273   LearningRate 0.0016   Epoch: 17   Global Step: 99260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:42:33,764-Speed 5586.31 samples/sec   Loss 1.2487   LearningRate 0.0016   Epoch: 17   Global Step: 99270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:42:35,597-Speed 5590.19 samples/sec   Loss 1.3617   LearningRate 0.0016   Epoch: 17   Global Step: 99280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:42:37,425-Speed 5602.89 samples/sec   Loss 1.3588   LearningRate 0.0016   Epoch: 17   Global Step: 99290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:42:39,259-Speed 5584.36 samples/sec   Loss 1.3203   LearningRate 0.0016   Epoch: 17   Global Step: 99300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:42:41,092-Speed 5589.04 samples/sec   Loss 1.2628   LearningRate 0.0016   Epoch: 17   Global Step: 99310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:42:42,911-Speed 5632.09 samples/sec   Loss 1.3324   LearningRate 0.0016   Epoch: 17   Global Step: 99320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:44,748-Speed 5575.01 samples/sec   Loss 1.2717   LearningRate 0.0016   Epoch: 17   Global Step: 99330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:46,569-Speed 5625.52 samples/sec   Loss 1.3742   LearningRate 0.0016   Epoch: 17   Global Step: 99340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:48,397-Speed 5603.26 samples/sec   Loss 1.4195   LearningRate 0.0016   Epoch: 17   Global Step: 99350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:50,244-Speed 5547.10 samples/sec   Loss 1.2568   LearningRate 0.0016   Epoch: 17   Global Step: 99360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:52,080-Speed 5577.64 samples/sec   Loss 1.3350   LearningRate 0.0016   Epoch: 17   Global Step: 99370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:53,917-Speed 5578.41 samples/sec   Loss 1.2238   LearningRate 0.0016   Epoch: 17   Global Step: 99380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:55,761-Speed 5553.39 samples/sec   Loss 1.2396   LearningRate 0.0016   Epoch: 17   Global Step: 99390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:57,594-Speed 5590.56 samples/sec   Loss 1.3374   LearningRate 0.0016   Epoch: 17   Global Step: 99400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:42:59,449-Speed 5519.96 samples/sec   Loss 1.3870   LearningRate 0.0016   Epoch: 17   Global Step: 99410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:01,286-Speed 5575.98 samples/sec   Loss 1.3480   LearningRate 0.0016   Epoch: 17   Global Step: 99420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:03,114-Speed 5606.16 samples/sec   Loss 1.3655   LearningRate 0.0016   Epoch: 17   Global Step: 99430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:04,940-Speed 5608.85 samples/sec   Loss 1.3463   LearningRate 0.0016   Epoch: 17   Global Step: 99440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:06,773-Speed 5587.46 samples/sec   Loss 1.3159   LearningRate 0.0016   Epoch: 17   Global Step: 99450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:08,591-Speed 5636.40 samples/sec   Loss 1.3219   LearningRate 0.0016   Epoch: 17   Global Step: 99460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:10,415-Speed 5612.84 samples/sec   Loss 1.2232   LearningRate 0.0016   Epoch: 17   Global Step: 99470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:12,287-Speed 5472.87 samples/sec   Loss 1.3597   LearningRate 0.0016   Epoch: 17   Global Step: 99480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:14,122-Speed 5583.55 samples/sec   Loss 1.3451   LearningRate 0.0016   Epoch: 17   Global Step: 99490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:15,979-Speed 5517.53 samples/sec   Loss 1.3170   LearningRate 0.0016   Epoch: 17   Global Step: 99500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:17,818-Speed 5568.90 samples/sec   Loss 1.3264   LearningRate 0.0016   Epoch: 17   Global Step: 99510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:19,652-Speed 5586.05 samples/sec   Loss 1.3640   LearningRate 0.0016   Epoch: 17   Global Step: 99520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:21,470-Speed 5632.64 samples/sec   Loss 1.3341   LearningRate 0.0016   Epoch: 17   Global Step: 99530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:23,296-Speed 5612.00 samples/sec   Loss 1.3204   LearningRate 0.0016   Epoch: 17   Global Step: 99540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:25,126-Speed 5596.77 samples/sec   Loss 1.2834   LearningRate 0.0016   Epoch: 17   Global Step: 99550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:26,964-Speed 5573.64 samples/sec   Loss 1.2001   LearningRate 0.0016   Epoch: 17   Global Step: 99560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:28,795-Speed 5596.80 samples/sec   Loss 1.3205   LearningRate 0.0015   Epoch: 17   Global Step: 99570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:43:30,666-Speed 5472.84 samples/sec   Loss 1.3253   LearningRate 0.0015   Epoch: 17   Global Step: 99580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:43:32,520-Speed 5526.02 samples/sec   Loss 1.2860   LearningRate 0.0015   Epoch: 17   Global Step: 99590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:43:34,407-Speed 5426.90 samples/sec   Loss 1.4130   LearningRate 0.0015   Epoch: 17   Global Step: 99600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:43:36,332-Speed 5322.68 samples/sec   Loss 1.3356   LearningRate 0.0015   Epoch: 17   Global Step: 99610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:43:38,199-Speed 5486.14 samples/sec   Loss 1.2507   LearningRate 0.0015   Epoch: 17   Global Step: 99620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:43:40,054-Speed 5524.28 samples/sec   Loss 1.2647   LearningRate 0.0015   Epoch: 17   Global Step: 99630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:43:41,894-Speed 5565.64 samples/sec   Loss 1.2895   LearningRate 0.0015   Epoch: 17   Global Step: 99640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:43:43,737-Speed 5557.84 samples/sec   Loss 1.3781   LearningRate 0.0015   Epoch: 17   Global Step: 99650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:43:45,569-Speed 5593.25 samples/sec   Loss 1.2576   LearningRate 0.0015   Epoch: 17   Global Step: 99660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:43:47,390-Speed 5624.40 samples/sec   Loss 1.3955   LearningRate 0.0015   Epoch: 17   Global Step: 99670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:49,293-Speed 5381.51 samples/sec   Loss 1.3700   LearningRate 0.0015   Epoch: 17   Global Step: 99680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:51,122-Speed 5601.67 samples/sec   Loss 1.2964   LearningRate 0.0015   Epoch: 17   Global Step: 99690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:52,970-Speed 5540.58 samples/sec   Loss 1.2668   LearningRate 0.0015   Epoch: 17   Global Step: 99700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:54,816-Speed 5551.63 samples/sec   Loss 1.3788   LearningRate 0.0015   Epoch: 17   Global Step: 99710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:43:56,667-Speed 5535.97 samples/sec   Loss 1.2635   LearningRate 0.0015   Epoch: 17   Global Step: 99720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:43:58,506-Speed 5569.59 samples/sec   Loss 1.2001   LearningRate 0.0015   Epoch: 17   Global Step: 99730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:00,334-Speed 5604.29 samples/sec   Loss 1.2708   LearningRate 0.0015   Epoch: 17   Global Step: 99740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:02,161-Speed 5606.48 samples/sec   Loss 1.3301   LearningRate 0.0015   Epoch: 17   Global Step: 99750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:03,988-Speed 5606.63 samples/sec   Loss 1.3844   LearningRate 0.0015   Epoch: 17   Global Step: 99760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:05,820-Speed 5591.80 samples/sec   Loss 1.2266   LearningRate 0.0015   Epoch: 17   Global Step: 99770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:07,684-Speed 5495.66 samples/sec   Loss 1.3464   LearningRate 0.0015   Epoch: 17   Global Step: 99780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:09,574-Speed 5420.78 samples/sec   Loss 1.2322   LearningRate 0.0015   Epoch: 17   Global Step: 99790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:11,405-Speed 5592.36 samples/sec   Loss 1.2722   LearningRate 0.0015   Epoch: 17   Global Step: 99800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:13,237-Speed 5592.46 samples/sec   Loss 1.4131   LearningRate 0.0015   Epoch: 17   Global Step: 99810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:15,065-Speed 5604.88 samples/sec   Loss 1.3067   LearningRate 0.0015   Epoch: 17   Global Step: 99820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:44:16,888-Speed 5616.41 samples/sec   Loss 1.3885   LearningRate 0.0015   Epoch: 17   Global Step: 99830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:44:18,711-Speed 5619.41 samples/sec   Loss 1.3565   LearningRate 0.0015   Epoch: 17   Global Step: 99840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:44:20,542-Speed 5594.16 samples/sec   Loss 1.3163   LearningRate 0.0015   Epoch: 17   Global Step: 99850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:44:22,367-Speed 5612.94 samples/sec   Loss 1.2958   LearningRate 0.0015   Epoch: 17   Global Step: 99860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:44:24,196-Speed 5601.05 samples/sec   Loss 1.3414   LearningRate 0.0015   Epoch: 17   Global Step: 99870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:44:26,029-Speed 5588.60 samples/sec   Loss 1.3774   LearningRate 0.0015   Epoch: 17   Global Step: 99880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:44:27,875-Speed 5549.51 samples/sec   Loss 1.2510   LearningRate 0.0015   Epoch: 17   Global Step: 99890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:44:29,722-Speed 5545.19 samples/sec   Loss 1.3722   LearningRate 0.0015   Epoch: 17   Global Step: 99900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:44:31,557-Speed 5582.87 samples/sec   Loss 1.3486   LearningRate 0.0015   Epoch: 17   Global Step: 99910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:44:33,387-Speed 5596.92 samples/sec   Loss 1.3087   LearningRate 0.0015   Epoch: 17   Global Step: 99920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:44:35,212-Speed 5614.09 samples/sec   Loss 1.3121   LearningRate 0.0015   Epoch: 17   Global Step: 99930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:44:37,034-Speed 5621.18 samples/sec   Loss 1.3030   LearningRate 0.0015   Epoch: 17   Global Step: 99940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:38,886-Speed 5531.68 samples/sec   Loss 1.3618   LearningRate 0.0015   Epoch: 17   Global Step: 99950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:40,722-Speed 5578.43 samples/sec   Loss 1.2842   LearningRate 0.0015   Epoch: 17   Global Step: 99960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:42,570-Speed 5542.76 samples/sec   Loss 1.3603   LearningRate 0.0015   Epoch: 17   Global Step: 99970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:44,412-Speed 5561.76 samples/sec   Loss 1.2993   LearningRate 0.0015   Epoch: 17   Global Step: 99980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:46,249-Speed 5576.55 samples/sec   Loss 1.2814   LearningRate 0.0015   Epoch: 17   Global Step: 99990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:44:48,080-Speed 5596.27 samples/sec   Loss 1.2462   LearningRate 0.0015   Epoch: 17   Global Step: 100000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:45:14,464-[lfw][100000]XNorm: 22.185568
Training: 2022-04-27 07:45:14,464-[lfw][100000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-27 07:45:14,465-[lfw][100000]Accuracy-Highest: 0.99800
Training: 2022-04-27 07:45:45,005-[cfp_fp][100000]XNorm: 21.126163
Training: 2022-04-27 07:45:45,005-[cfp_fp][100000]Accuracy-Flip: 0.97743+-0.00635
Training: 2022-04-27 07:45:45,006-[cfp_fp][100000]Accuracy-Highest: 0.97929
Training: 2022-04-27 07:46:11,427-[agedb_30][100000]XNorm: 22.106058
Training: 2022-04-27 07:46:11,428-[agedb_30][100000]Accuracy-Flip: 0.98083+-0.00574
Training: 2022-04-27 07:46:11,428-[agedb_30][100000]Accuracy-Highest: 0.98183
Training: 2022-04-27 07:46:13,256-Speed 120.22 samples/sec   Loss 1.3943   LearningRate 0.0015   Epoch: 17   Global Step: 100010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:46:15,075-Speed 5630.50 samples/sec   Loss 1.2623   LearningRate 0.0015   Epoch: 17   Global Step: 100020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:46:16,899-Speed 5618.00 samples/sec   Loss 1.2216   LearningRate 0.0014   Epoch: 17   Global Step: 100030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:46:18,751-Speed 5529.86 samples/sec   Loss 1.3304   LearningRate 0.0014   Epoch: 17   Global Step: 100040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:20,581-Speed 5597.37 samples/sec   Loss 1.2703   LearningRate 0.0014   Epoch: 17   Global Step: 100050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:22,438-Speed 5515.55 samples/sec   Loss 1.3174   LearningRate 0.0014   Epoch: 17   Global Step: 100060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:24,275-Speed 5576.68 samples/sec   Loss 1.2212   LearningRate 0.0014   Epoch: 17   Global Step: 100070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:26,095-Speed 5629.04 samples/sec   Loss 1.3692   LearningRate 0.0014   Epoch: 17   Global Step: 100080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:27,936-Speed 5561.91 samples/sec   Loss 1.3343   LearningRate 0.0014   Epoch: 17   Global Step: 100090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:29,775-Speed 5569.36 samples/sec   Loss 1.3914   LearningRate 0.0014   Epoch: 17   Global Step: 100100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:31,618-Speed 5557.79 samples/sec   Loss 1.2761   LearningRate 0.0014   Epoch: 17   Global Step: 100110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:33,442-Speed 5617.18 samples/sec   Loss 1.3358   LearningRate 0.0014   Epoch: 17   Global Step: 100120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:35,264-Speed 5623.74 samples/sec   Loss 1.2907   LearningRate 0.0014   Epoch: 17   Global Step: 100130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:37,102-Speed 5571.46 samples/sec   Loss 1.3200   LearningRate 0.0014   Epoch: 17   Global Step: 100140   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:46:38,928-Speed 5608.32 samples/sec   Loss 1.3584   LearningRate 0.0014   Epoch: 17   Global Step: 100150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:40,774-Speed 5549.73 samples/sec   Loss 1.3133   LearningRate 0.0014   Epoch: 17   Global Step: 100160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:42,600-Speed 5610.92 samples/sec   Loss 1.3226   LearningRate 0.0014   Epoch: 17   Global Step: 100170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:44,442-Speed 5561.34 samples/sec   Loss 1.2271   LearningRate 0.0014   Epoch: 17   Global Step: 100180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:46,263-Speed 5624.51 samples/sec   Loss 1.3029   LearningRate 0.0014   Epoch: 17   Global Step: 100190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:48,091-Speed 5603.06 samples/sec   Loss 1.3209   LearningRate 0.0014   Epoch: 17   Global Step: 100200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:49,926-Speed 5583.36 samples/sec   Loss 1.3278   LearningRate 0.0014   Epoch: 17   Global Step: 100210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:51,757-Speed 5594.34 samples/sec   Loss 1.3711   LearningRate 0.0014   Epoch: 17   Global Step: 100220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:53,592-Speed 5583.37 samples/sec   Loss 1.3412   LearningRate 0.0014   Epoch: 17   Global Step: 100230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:55,419-Speed 5604.87 samples/sec   Loss 1.2360   LearningRate 0.0014   Epoch: 17   Global Step: 100240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:57,269-Speed 5536.04 samples/sec   Loss 1.3593   LearningRate 0.0014   Epoch: 17   Global Step: 100250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:46:59,108-Speed 5569.95 samples/sec   Loss 1.2622   LearningRate 0.0014   Epoch: 17   Global Step: 100260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:00,955-Speed 5546.07 samples/sec   Loss 1.2898   LearningRate 0.0014   Epoch: 17   Global Step: 100270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:02,789-Speed 5585.79 samples/sec   Loss 1.2542   LearningRate 0.0014   Epoch: 17   Global Step: 100280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:04,643-Speed 5525.80 samples/sec   Loss 1.3821   LearningRate 0.0014   Epoch: 17   Global Step: 100290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:06,498-Speed 5524.19 samples/sec   Loss 1.3843   LearningRate 0.0014   Epoch: 17   Global Step: 100300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:08,351-Speed 5525.88 samples/sec   Loss 1.3222   LearningRate 0.0014   Epoch: 17   Global Step: 100310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:10,196-Speed 5554.48 samples/sec   Loss 1.3129   LearningRate 0.0014   Epoch: 17   Global Step: 100320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:12,029-Speed 5588.35 samples/sec   Loss 1.2951   LearningRate 0.0014   Epoch: 17   Global Step: 100330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:13,853-Speed 5615.76 samples/sec   Loss 1.4157   LearningRate 0.0014   Epoch: 17   Global Step: 100340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:15,688-Speed 5580.27 samples/sec   Loss 1.2913   LearningRate 0.0014   Epoch: 17   Global Step: 100350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:17,524-Speed 5578.89 samples/sec   Loss 1.3769   LearningRate 0.0014   Epoch: 17   Global Step: 100360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:47:19,358-Speed 5586.93 samples/sec   Loss 1.3675   LearningRate 0.0014   Epoch: 17   Global Step: 100370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:47:21,194-Speed 5579.59 samples/sec   Loss 1.4061   LearningRate 0.0014   Epoch: 17   Global Step: 100380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:47:23,035-Speed 5564.27 samples/sec   Loss 1.3748   LearningRate 0.0014   Epoch: 17   Global Step: 100390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:47:24,869-Speed 5583.16 samples/sec   Loss 1.3242   LearningRate 0.0014   Epoch: 17   Global Step: 100400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:47:26,715-Speed 5551.93 samples/sec   Loss 1.4535   LearningRate 0.0014   Epoch: 17   Global Step: 100410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:47:28,544-Speed 5599.08 samples/sec   Loss 1.2930   LearningRate 0.0014   Epoch: 17   Global Step: 100420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:47:30,388-Speed 5556.56 samples/sec   Loss 1.3225   LearningRate 0.0014   Epoch: 17   Global Step: 100430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:47:32,214-Speed 5609.96 samples/sec   Loss 1.3121   LearningRate 0.0014   Epoch: 17   Global Step: 100440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:47:34,050-Speed 5576.49 samples/sec   Loss 1.3775   LearningRate 0.0014   Epoch: 17   Global Step: 100450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:47:35,898-Speed 5545.12 samples/sec   Loss 1.2907   LearningRate 0.0014   Epoch: 17   Global Step: 100460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:37,737-Speed 5568.74 samples/sec   Loss 1.3150   LearningRate 0.0014   Epoch: 17   Global Step: 100470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:39,566-Speed 5599.00 samples/sec   Loss 1.3028   LearningRate 0.0014   Epoch: 17   Global Step: 100480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:41,390-Speed 5617.81 samples/sec   Loss 1.3158   LearningRate 0.0014   Epoch: 17   Global Step: 100490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:43,244-Speed 5523.43 samples/sec   Loss 1.3201   LearningRate 0.0014   Epoch: 17   Global Step: 100500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:45,077-Speed 5590.47 samples/sec   Loss 1.2457   LearningRate 0.0013   Epoch: 17   Global Step: 100510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:46,911-Speed 5584.52 samples/sec   Loss 1.3441   LearningRate 0.0013   Epoch: 17   Global Step: 100520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:48,737-Speed 5609.46 samples/sec   Loss 1.3649   LearningRate 0.0013   Epoch: 17   Global Step: 100530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:50,608-Speed 5475.67 samples/sec   Loss 1.3140   LearningRate 0.0013   Epoch: 17   Global Step: 100540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:52,439-Speed 5594.90 samples/sec   Loss 1.2967   LearningRate 0.0013   Epoch: 17   Global Step: 100550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:54,257-Speed 5632.67 samples/sec   Loss 1.2827   LearningRate 0.0013   Epoch: 17   Global Step: 100560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:56,101-Speed 5556.90 samples/sec   Loss 1.2916   LearningRate 0.0013   Epoch: 17   Global Step: 100570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:57,937-Speed 5577.64 samples/sec   Loss 1.3139   LearningRate 0.0013   Epoch: 17   Global Step: 100580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:47:59,766-Speed 5599.71 samples/sec   Loss 1.2469   LearningRate 0.0013   Epoch: 17   Global Step: 100590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:01,609-Speed 5558.60 samples/sec   Loss 1.2795   LearningRate 0.0013   Epoch: 17   Global Step: 100600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:03,456-Speed 5546.47 samples/sec   Loss 1.3656   LearningRate 0.0013   Epoch: 17   Global Step: 100610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:05,287-Speed 5595.13 samples/sec   Loss 1.3438   LearningRate 0.0013   Epoch: 17   Global Step: 100620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:07,130-Speed 5557.26 samples/sec   Loss 1.2827   LearningRate 0.0013   Epoch: 17   Global Step: 100630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:08,955-Speed 5613.67 samples/sec   Loss 1.3138   LearningRate 0.0013   Epoch: 17   Global Step: 100640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:10,788-Speed 5586.21 samples/sec   Loss 1.3306   LearningRate 0.0013   Epoch: 17   Global Step: 100650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:12,625-Speed 5579.25 samples/sec   Loss 1.3371   LearningRate 0.0013   Epoch: 17   Global Step: 100660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:14,473-Speed 5541.51 samples/sec   Loss 1.2516   LearningRate 0.0013   Epoch: 17   Global Step: 100670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:16,297-Speed 5615.10 samples/sec   Loss 1.2286   LearningRate 0.0013   Epoch: 17   Global Step: 100680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:18,147-Speed 5538.57 samples/sec   Loss 1.2598   LearningRate 0.0013   Epoch: 17   Global Step: 100690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:20,021-Speed 5464.97 samples/sec   Loss 1.2625   LearningRate 0.0013   Epoch: 17   Global Step: 100700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:21,869-Speed 5544.22 samples/sec   Loss 1.3644   LearningRate 0.0013   Epoch: 17   Global Step: 100710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:23,713-Speed 5555.15 samples/sec   Loss 1.3557   LearningRate 0.0013   Epoch: 17   Global Step: 100720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:25,552-Speed 5568.05 samples/sec   Loss 1.3482   LearningRate 0.0013   Epoch: 17   Global Step: 100730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:27,394-Speed 5561.23 samples/sec   Loss 1.2623   LearningRate 0.0013   Epoch: 17   Global Step: 100740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:29,229-Speed 5581.36 samples/sec   Loss 1.2936   LearningRate 0.0013   Epoch: 17   Global Step: 100750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:31,064-Speed 5584.04 samples/sec   Loss 1.4124   LearningRate 0.0013   Epoch: 17   Global Step: 100760   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:48:32,880-Speed 5643.88 samples/sec   Loss 1.3482   LearningRate 0.0013   Epoch: 17   Global Step: 100770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:34,740-Speed 5508.02 samples/sec   Loss 1.3176   LearningRate 0.0013   Epoch: 17   Global Step: 100780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:36,563-Speed 5617.78 samples/sec   Loss 1.4013   LearningRate 0.0013   Epoch: 17   Global Step: 100790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:38,393-Speed 5597.06 samples/sec   Loss 1.2633   LearningRate 0.0013   Epoch: 17   Global Step: 100800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:40,218-Speed 5612.66 samples/sec   Loss 1.2977   LearningRate 0.0013   Epoch: 17   Global Step: 100810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:42,048-Speed 5599.76 samples/sec   Loss 1.3016   LearningRate 0.0013   Epoch: 17   Global Step: 100820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:43,884-Speed 5579.59 samples/sec   Loss 1.2840   LearningRate 0.0013   Epoch: 17   Global Step: 100830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:45,725-Speed 5563.49 samples/sec   Loss 1.3377   LearningRate 0.0013   Epoch: 17   Global Step: 100840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:47,587-Speed 5500.63 samples/sec   Loss 1.3472   LearningRate 0.0013   Epoch: 17   Global Step: 100850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:49,428-Speed 5563.38 samples/sec   Loss 1.3070   LearningRate 0.0013   Epoch: 17   Global Step: 100860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:51,251-Speed 5621.17 samples/sec   Loss 1.3210   LearningRate 0.0013   Epoch: 17   Global Step: 100870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:53,120-Speed 5477.96 samples/sec   Loss 1.3416   LearningRate 0.0013   Epoch: 17   Global Step: 100880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:54,958-Speed 5576.34 samples/sec   Loss 1.3318   LearningRate 0.0013   Epoch: 17   Global Step: 100890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:56,794-Speed 5577.70 samples/sec   Loss 1.3251   LearningRate 0.0013   Epoch: 17   Global Step: 100900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:48:58,621-Speed 5608.69 samples/sec   Loss 1.2766   LearningRate 0.0013   Epoch: 17   Global Step: 100910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:00,464-Speed 5557.55 samples/sec   Loss 1.2846   LearningRate 0.0013   Epoch: 17   Global Step: 100920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:02,308-Speed 5554.31 samples/sec   Loss 1.3429   LearningRate 0.0013   Epoch: 17   Global Step: 100930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:04,132-Speed 5614.77 samples/sec   Loss 1.3078   LearningRate 0.0013   Epoch: 17   Global Step: 100940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:05,992-Speed 5506.37 samples/sec   Loss 1.3674   LearningRate 0.0013   Epoch: 17   Global Step: 100950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:07,842-Speed 5537.30 samples/sec   Loss 1.2956   LearningRate 0.0013   Epoch: 17   Global Step: 100960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:09,679-Speed 5577.57 samples/sec   Loss 1.3269   LearningRate 0.0013   Epoch: 17   Global Step: 100970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:11,521-Speed 5559.33 samples/sec   Loss 1.2376   LearningRate 0.0013   Epoch: 17   Global Step: 100980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:13,359-Speed 5573.19 samples/sec   Loss 1.2968   LearningRate 0.0013   Epoch: 17   Global Step: 100990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:15,199-Speed 5567.30 samples/sec   Loss 1.2417   LearningRate 0.0013   Epoch: 17   Global Step: 101000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:17,065-Speed 5489.46 samples/sec   Loss 1.3243   LearningRate 0.0012   Epoch: 17   Global Step: 101010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:18,923-Speed 5515.01 samples/sec   Loss 1.2962   LearningRate 0.0012   Epoch: 17   Global Step: 101020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:20,776-Speed 5527.85 samples/sec   Loss 1.2754   LearningRate 0.0012   Epoch: 17   Global Step: 101030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:22,624-Speed 5542.95 samples/sec   Loss 1.3226   LearningRate 0.0012   Epoch: 17   Global Step: 101040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:24,465-Speed 5565.91 samples/sec   Loss 1.3305   LearningRate 0.0012   Epoch: 17   Global Step: 101050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:26,296-Speed 5594.45 samples/sec   Loss 1.3132   LearningRate 0.0012   Epoch: 17   Global Step: 101060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:28,111-Speed 5642.03 samples/sec   Loss 1.2856   LearningRate 0.0012   Epoch: 17   Global Step: 101070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:29,956-Speed 5553.38 samples/sec   Loss 1.2697   LearningRate 0.0012   Epoch: 17   Global Step: 101080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:31,801-Speed 5552.17 samples/sec   Loss 1.2965   LearningRate 0.0012   Epoch: 17   Global Step: 101090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:33,636-Speed 5579.63 samples/sec   Loss 1.2813   LearningRate 0.0012   Epoch: 17   Global Step: 101100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:35,468-Speed 5592.59 samples/sec   Loss 1.2883   LearningRate 0.0012   Epoch: 17   Global Step: 101110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:37,291-Speed 5618.03 samples/sec   Loss 1.3188   LearningRate 0.0012   Epoch: 17   Global Step: 101120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:39,141-Speed 5537.56 samples/sec   Loss 1.3167   LearningRate 0.0012   Epoch: 17   Global Step: 101130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:40,995-Speed 5525.76 samples/sec   Loss 1.2976   LearningRate 0.0012   Epoch: 17   Global Step: 101140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:42,853-Speed 5513.02 samples/sec   Loss 1.3406   LearningRate 0.0012   Epoch: 17   Global Step: 101150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:44,686-Speed 5587.56 samples/sec   Loss 1.3105   LearningRate 0.0012   Epoch: 17   Global Step: 101160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:46,519-Speed 5589.57 samples/sec   Loss 1.2606   LearningRate 0.0012   Epoch: 17   Global Step: 101170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:48,344-Speed 5612.78 samples/sec   Loss 1.3382   LearningRate 0.0012   Epoch: 17   Global Step: 101180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:50,203-Speed 5510.90 samples/sec   Loss 1.3176   LearningRate 0.0012   Epoch: 17   Global Step: 101190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:52,081-Speed 5452.42 samples/sec   Loss 1.3371   LearningRate 0.0012   Epoch: 17   Global Step: 101200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:53,935-Speed 5525.51 samples/sec   Loss 1.2766   LearningRate 0.0012   Epoch: 17   Global Step: 101210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:55,768-Speed 5587.54 samples/sec   Loss 1.3043   LearningRate 0.0012   Epoch: 17   Global Step: 101220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:57,594-Speed 5611.06 samples/sec   Loss 1.2039   LearningRate 0.0012   Epoch: 17   Global Step: 101230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:49:59,433-Speed 5567.81 samples/sec   Loss 1.3275   LearningRate 0.0012   Epoch: 17   Global Step: 101240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:01,360-Speed 5315.59 samples/sec   Loss 1.3170   LearningRate 0.0012   Epoch: 17   Global Step: 101250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:03,247-Speed 5430.02 samples/sec   Loss 1.3538   LearningRate 0.0012   Epoch: 17   Global Step: 101260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:05,076-Speed 5600.28 samples/sec   Loss 1.2048   LearningRate 0.0012   Epoch: 17   Global Step: 101270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:06,923-Speed 5546.33 samples/sec   Loss 1.2973   LearningRate 0.0012   Epoch: 17   Global Step: 101280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:08,761-Speed 5573.56 samples/sec   Loss 1.3456   LearningRate 0.0012   Epoch: 17   Global Step: 101290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:10,603-Speed 5562.47 samples/sec   Loss 1.3258   LearningRate 0.0012   Epoch: 17   Global Step: 101300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:12,442-Speed 5570.69 samples/sec   Loss 1.3215   LearningRate 0.0012   Epoch: 17   Global Step: 101310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:14,285-Speed 5555.38 samples/sec   Loss 1.2832   LearningRate 0.0012   Epoch: 17   Global Step: 101320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:16,126-Speed 5565.45 samples/sec   Loss 1.3893   LearningRate 0.0012   Epoch: 17   Global Step: 101330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:17,953-Speed 5605.41 samples/sec   Loss 1.2385   LearningRate 0.0012   Epoch: 17   Global Step: 101340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:19,794-Speed 5563.75 samples/sec   Loss 1.3038   LearningRate 0.0012   Epoch: 17   Global Step: 101350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:21,621-Speed 5606.52 samples/sec   Loss 1.2151   LearningRate 0.0012   Epoch: 17   Global Step: 101360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:23,464-Speed 5559.52 samples/sec   Loss 1.2573   LearningRate 0.0012   Epoch: 17   Global Step: 101370   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:50:25,286-Speed 5622.57 samples/sec   Loss 1.2451   LearningRate 0.0012   Epoch: 17   Global Step: 101380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:27,147-Speed 5504.36 samples/sec   Loss 1.3611   LearningRate 0.0012   Epoch: 17   Global Step: 101390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:28,991-Speed 5553.61 samples/sec   Loss 1.3413   LearningRate 0.0012   Epoch: 17   Global Step: 101400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:30,835-Speed 5554.56 samples/sec   Loss 1.2812   LearningRate 0.0012   Epoch: 17   Global Step: 101410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:32,689-Speed 5526.81 samples/sec   Loss 1.3450   LearningRate 0.0012   Epoch: 17   Global Step: 101420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:34,522-Speed 5587.44 samples/sec   Loss 1.2456   LearningRate 0.0012   Epoch: 17   Global Step: 101430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:36,373-Speed 5534.96 samples/sec   Loss 1.2920   LearningRate 0.0012   Epoch: 17   Global Step: 101440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:38,209-Speed 5577.85 samples/sec   Loss 1.3146   LearningRate 0.0012   Epoch: 17   Global Step: 101450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:40,038-Speed 5600.29 samples/sec   Loss 1.2891   LearningRate 0.0012   Epoch: 17   Global Step: 101460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:41,891-Speed 5532.99 samples/sec   Loss 1.2711   LearningRate 0.0012   Epoch: 17   Global Step: 101470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:43,704-Speed 5648.94 samples/sec   Loss 1.3292   LearningRate 0.0012   Epoch: 17   Global Step: 101480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:45,547-Speed 5557.15 samples/sec   Loss 1.3181   LearningRate 0.0012   Epoch: 17   Global Step: 101490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:47,399-Speed 5531.45 samples/sec   Loss 1.3226   LearningRate 0.0012   Epoch: 17   Global Step: 101500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:49,237-Speed 5574.04 samples/sec   Loss 1.3415   LearningRate 0.0012   Epoch: 17   Global Step: 101510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:51,072-Speed 5583.18 samples/sec   Loss 1.3064   LearningRate 0.0012   Epoch: 17   Global Step: 101520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:52,914-Speed 5561.92 samples/sec   Loss 1.3729   LearningRate 0.0011   Epoch: 17   Global Step: 101530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:54,842-Speed 5313.42 samples/sec   Loss 1.3272   LearningRate 0.0011   Epoch: 17   Global Step: 101540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:56,690-Speed 5540.85 samples/sec   Loss 1.3183   LearningRate 0.0011   Epoch: 17   Global Step: 101550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:50:58,583-Speed 5411.40 samples/sec   Loss 1.3633   LearningRate 0.0011   Epoch: 17   Global Step: 101560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:00,414-Speed 5595.32 samples/sec   Loss 1.1846   LearningRate 0.0011   Epoch: 17   Global Step: 101570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:02,245-Speed 5593.15 samples/sec   Loss 1.2991   LearningRate 0.0011   Epoch: 17   Global Step: 101580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:04,078-Speed 5590.44 samples/sec   Loss 1.2920   LearningRate 0.0011   Epoch: 17   Global Step: 101590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:05,974-Speed 5400.83 samples/sec   Loss 1.2465   LearningRate 0.0011   Epoch: 17   Global Step: 101600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:07,822-Speed 5542.57 samples/sec   Loss 1.2933   LearningRate 0.0011   Epoch: 17   Global Step: 101610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:09,654-Speed 5592.06 samples/sec   Loss 1.3737   LearningRate 0.0011   Epoch: 17   Global Step: 101620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:11,491-Speed 5578.69 samples/sec   Loss 1.3083   LearningRate 0.0011   Epoch: 17   Global Step: 101630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:13,328-Speed 5574.37 samples/sec   Loss 1.3706   LearningRate 0.0011   Epoch: 17   Global Step: 101640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:15,167-Speed 5572.01 samples/sec   Loss 1.3454   LearningRate 0.0011   Epoch: 17   Global Step: 101650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:16,992-Speed 5612.85 samples/sec   Loss 1.2992   LearningRate 0.0011   Epoch: 17   Global Step: 101660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:18,831-Speed 5572.64 samples/sec   Loss 1.3467   LearningRate 0.0011   Epoch: 17   Global Step: 101670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:20,672-Speed 5561.83 samples/sec   Loss 1.2248   LearningRate 0.0011   Epoch: 17   Global Step: 101680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:22,519-Speed 5547.57 samples/sec   Loss 1.2716   LearningRate 0.0011   Epoch: 17   Global Step: 101690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:24,359-Speed 5566.64 samples/sec   Loss 1.2760   LearningRate 0.0011   Epoch: 17   Global Step: 101700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:26,189-Speed 5597.07 samples/sec   Loss 1.3035   LearningRate 0.0011   Epoch: 17   Global Step: 101710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:28,042-Speed 5526.80 samples/sec   Loss 1.3466   LearningRate 0.0011   Epoch: 17   Global Step: 101720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:29,886-Speed 5554.90 samples/sec   Loss 1.3070   LearningRate 0.0011   Epoch: 17   Global Step: 101730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:31,748-Speed 5501.24 samples/sec   Loss 1.3161   LearningRate 0.0011   Epoch: 17   Global Step: 101740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:33,584-Speed 5580.31 samples/sec   Loss 1.2989   LearningRate 0.0011   Epoch: 17   Global Step: 101750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:35,410-Speed 5610.02 samples/sec   Loss 1.2263   LearningRate 0.0011   Epoch: 17   Global Step: 101760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:37,246-Speed 5580.54 samples/sec   Loss 1.3112   LearningRate 0.0011   Epoch: 17   Global Step: 101770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:51:39,081-Speed 5580.54 samples/sec   Loss 1.3905   LearningRate 0.0011   Epoch: 17   Global Step: 101780   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:51:40,902-Speed 5625.23 samples/sec   Loss 1.3653   LearningRate 0.0011   Epoch: 17   Global Step: 101790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:51:42,730-Speed 5605.63 samples/sec   Loss 1.3171   LearningRate 0.0011   Epoch: 17   Global Step: 101800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:51:44,575-Speed 5549.67 samples/sec   Loss 1.2688   LearningRate 0.0011   Epoch: 17   Global Step: 101810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:51:46,412-Speed 5575.50 samples/sec   Loss 1.3315   LearningRate 0.0011   Epoch: 17   Global Step: 101820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:51:48,241-Speed 5601.95 samples/sec   Loss 1.3750   LearningRate 0.0011   Epoch: 17   Global Step: 101830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:51:50,104-Speed 5498.40 samples/sec   Loss 1.3373   LearningRate 0.0011   Epoch: 17   Global Step: 101840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:51:51,945-Speed 5562.82 samples/sec   Loss 1.2829   LearningRate 0.0011   Epoch: 17   Global Step: 101850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:51:53,776-Speed 5594.95 samples/sec   Loss 1.2874   LearningRate 0.0011   Epoch: 17   Global Step: 101860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:51:55,601-Speed 5613.57 samples/sec   Loss 1.2610   LearningRate 0.0011   Epoch: 17   Global Step: 101870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:51:57,435-Speed 5586.62 samples/sec   Loss 1.2058   LearningRate 0.0011   Epoch: 17   Global Step: 101880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:51:59,279-Speed 5554.17 samples/sec   Loss 1.2537   LearningRate 0.0011   Epoch: 17   Global Step: 101890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:52:01,134-Speed 5521.95 samples/sec   Loss 1.3111   LearningRate 0.0011   Epoch: 17   Global Step: 101900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:52:02,995-Speed 5505.90 samples/sec   Loss 1.2884   LearningRate 0.0011   Epoch: 17   Global Step: 101910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:52:04,856-Speed 5503.79 samples/sec   Loss 1.3263   LearningRate 0.0011   Epoch: 17   Global Step: 101920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:52:06,690-Speed 5584.70 samples/sec   Loss 1.2549   LearningRate 0.0011   Epoch: 17   Global Step: 101930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:52:08,546-Speed 5518.34 samples/sec   Loss 1.2506   LearningRate 0.0011   Epoch: 17   Global Step: 101940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:52:10,398-Speed 5531.49 samples/sec   Loss 1.3138   LearningRate 0.0011   Epoch: 17   Global Step: 101950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:52:12,230-Speed 5591.80 samples/sec   Loss 1.3516   LearningRate 0.0011   Epoch: 17   Global Step: 101960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:52:14,066-Speed 5578.30 samples/sec   Loss 1.2858   LearningRate 0.0011   Epoch: 17   Global Step: 101970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:52:15,913-Speed 5547.48 samples/sec   Loss 1.3012   LearningRate 0.0011   Epoch: 17   Global Step: 101980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:52:17,730-Speed 5637.00 samples/sec   Loss 1.2669   LearningRate 0.0011   Epoch: 17   Global Step: 101990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:52:19,568-Speed 5571.81 samples/sec   Loss 1.3307   LearningRate 0.0011   Epoch: 17   Global Step: 102000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:52:45,797-[lfw][102000]XNorm: 21.973692
Training: 2022-04-27 07:52:45,798-[lfw][102000]Accuracy-Flip: 0.99750+-0.00261
Training: 2022-04-27 07:52:45,798-[lfw][102000]Accuracy-Highest: 0.99800
Training: 2022-04-27 07:53:16,214-[cfp_fp][102000]XNorm: 21.134001
Training: 2022-04-27 07:53:16,215-[cfp_fp][102000]Accuracy-Flip: 0.97886+-0.00682
Training: 2022-04-27 07:53:16,215-[cfp_fp][102000]Accuracy-Highest: 0.97929
Training: 2022-04-27 07:53:42,529-[agedb_30][102000]XNorm: 22.020916
Training: 2022-04-27 07:53:42,529-[agedb_30][102000]Accuracy-Flip: 0.98100+-0.00680
Training: 2022-04-27 07:53:42,529-[agedb_30][102000]Accuracy-Highest: 0.98183
Training: 2022-04-27 07:53:44,384-Speed 120.73 samples/sec   Loss 1.2003   LearningRate 0.0011   Epoch: 17   Global Step: 102010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:53:46,214-Speed 5598.73 samples/sec   Loss 1.2926   LearningRate 0.0011   Epoch: 17   Global Step: 102020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:53:48,060-Speed 5548.92 samples/sec   Loss 1.3242   LearningRate 0.0011   Epoch: 17   Global Step: 102030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:53:49,877-Speed 5635.07 samples/sec   Loss 1.3168   LearningRate 0.0011   Epoch: 17   Global Step: 102040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:53:51,697-Speed 5629.21 samples/sec   Loss 1.2361   LearningRate 0.0011   Epoch: 17   Global Step: 102050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:53:53,517-Speed 5629.16 samples/sec   Loss 1.3409   LearningRate 0.0011   Epoch: 17   Global Step: 102060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:53:55,332-Speed 5644.27 samples/sec   Loss 1.2373   LearningRate 0.0010   Epoch: 17   Global Step: 102070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:53:57,155-Speed 5619.05 samples/sec   Loss 1.3487   LearningRate 0.0010   Epoch: 17   Global Step: 102080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:53:58,961-Speed 5670.48 samples/sec   Loss 1.4083   LearningRate 0.0010   Epoch: 17   Global Step: 102090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:00,786-Speed 5612.94 samples/sec   Loss 1.3311   LearningRate 0.0010   Epoch: 17   Global Step: 102100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:02,599-Speed 5650.62 samples/sec   Loss 1.2817   LearningRate 0.0010   Epoch: 17   Global Step: 102110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:04,413-Speed 5647.02 samples/sec   Loss 1.2352   LearningRate 0.0010   Epoch: 17   Global Step: 102120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:06,243-Speed 5596.21 samples/sec   Loss 1.2806   LearningRate 0.0010   Epoch: 17   Global Step: 102130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:08,071-Speed 5605.47 samples/sec   Loss 1.3994   LearningRate 0.0010   Epoch: 17   Global Step: 102140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:09,890-Speed 5630.46 samples/sec   Loss 1.2987   LearningRate 0.0010   Epoch: 17   Global Step: 102150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:11,730-Speed 5567.32 samples/sec   Loss 1.2510   LearningRate 0.0010   Epoch: 17   Global Step: 102160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:13,547-Speed 5636.33 samples/sec   Loss 1.2541   LearningRate 0.0010   Epoch: 17   Global Step: 102170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:15,385-Speed 5572.82 samples/sec   Loss 1.1958   LearningRate 0.0010   Epoch: 17   Global Step: 102180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:17,215-Speed 5597.73 samples/sec   Loss 1.4077   LearningRate 0.0010   Epoch: 17   Global Step: 102190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:19,068-Speed 5533.53 samples/sec   Loss 1.2842   LearningRate 0.0010   Epoch: 17   Global Step: 102200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:20,891-Speed 5616.57 samples/sec   Loss 1.2704   LearningRate 0.0010   Epoch: 17   Global Step: 102210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:22,728-Speed 5577.38 samples/sec   Loss 1.3264   LearningRate 0.0010   Epoch: 17   Global Step: 102220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:24,570-Speed 5559.57 samples/sec   Loss 1.2319   LearningRate 0.0010   Epoch: 17   Global Step: 102230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:26,397-Speed 5608.46 samples/sec   Loss 1.3643   LearningRate 0.0010   Epoch: 17   Global Step: 102240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:28,228-Speed 5594.22 samples/sec   Loss 1.3350   LearningRate 0.0010   Epoch: 17   Global Step: 102250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:30,047-Speed 5630.23 samples/sec   Loss 1.3264   LearningRate 0.0010   Epoch: 17   Global Step: 102260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:31,875-Speed 5604.90 samples/sec   Loss 1.3136   LearningRate 0.0010   Epoch: 17   Global Step: 102270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:33,718-Speed 5556.78 samples/sec   Loss 1.2868   LearningRate 0.0010   Epoch: 17   Global Step: 102280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:35,540-Speed 5621.98 samples/sec   Loss 1.2896   LearningRate 0.0010   Epoch: 17   Global Step: 102290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:37,358-Speed 5636.39 samples/sec   Loss 1.3198   LearningRate 0.0010   Epoch: 17   Global Step: 102300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:39,182-Speed 5619.00 samples/sec   Loss 1.2137   LearningRate 0.0010   Epoch: 17   Global Step: 102310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:41,021-Speed 5571.98 samples/sec   Loss 1.2610   LearningRate 0.0010   Epoch: 17   Global Step: 102320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:42,844-Speed 5618.53 samples/sec   Loss 1.3642   LearningRate 0.0010   Epoch: 17   Global Step: 102330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:44,743-Speed 5393.39 samples/sec   Loss 1.3162   LearningRate 0.0010   Epoch: 17   Global Step: 102340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:56,150-Speed 897.76 samples/sec   Loss 1.1820   LearningRate 0.0010   Epoch: 18   Global Step: 102350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:58,045-Speed 5404.57 samples/sec   Loss 1.0219   LearningRate 0.0010   Epoch: 18   Global Step: 102360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:54:59,918-Speed 5470.60 samples/sec   Loss 1.0493   LearningRate 0.0010   Epoch: 18   Global Step: 102370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:55:01,735-Speed 5635.58 samples/sec   Loss 0.9767   LearningRate 0.0010   Epoch: 18   Global Step: 102380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:55:03,601-Speed 5491.61 samples/sec   Loss 1.0362   LearningRate 0.0010   Epoch: 18   Global Step: 102390   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:55:05,456-Speed 5522.72 samples/sec   Loss 1.0232   LearningRate 0.0010   Epoch: 18   Global Step: 102400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:55:07,281-Speed 5613.62 samples/sec   Loss 0.9464   LearningRate 0.0010   Epoch: 18   Global Step: 102410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:09,106-Speed 5611.71 samples/sec   Loss 1.0031   LearningRate 0.0010   Epoch: 18   Global Step: 102420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:10,944-Speed 5574.21 samples/sec   Loss 1.0077   LearningRate 0.0010   Epoch: 18   Global Step: 102430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:12,810-Speed 5487.23 samples/sec   Loss 0.9662   LearningRate 0.0010   Epoch: 18   Global Step: 102440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:14,657-Speed 5546.97 samples/sec   Loss 0.9396   LearningRate 0.0010   Epoch: 18   Global Step: 102450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:16,484-Speed 5605.72 samples/sec   Loss 1.0221   LearningRate 0.0010   Epoch: 18   Global Step: 102460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:18,318-Speed 5585.08 samples/sec   Loss 1.0043   LearningRate 0.0010   Epoch: 18   Global Step: 102470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 07:55:20,162-Speed 5554.80 samples/sec   Loss 1.0858   LearningRate 0.0010   Epoch: 18   Global Step: 102480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 07:55:22,006-Speed 5555.48 samples/sec   Loss 0.9606   LearningRate 0.0010   Epoch: 18   Global Step: 102490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 07:55:23,859-Speed 5529.26 samples/sec   Loss 1.0015   LearningRate 0.0010   Epoch: 18   Global Step: 102500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 07:55:25,685-Speed 5608.83 samples/sec   Loss 1.0054   LearningRate 0.0010   Epoch: 18   Global Step: 102510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 07:55:27,521-Speed 5578.99 samples/sec   Loss 1.0659   LearningRate 0.0010   Epoch: 18   Global Step: 102520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 07:55:29,354-Speed 5590.70 samples/sec   Loss 1.0562   LearningRate 0.0010   Epoch: 18   Global Step: 102530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 07:55:31,174-Speed 5625.91 samples/sec   Loss 0.9357   LearningRate 0.0010   Epoch: 18   Global Step: 102540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 07:55:33,028-Speed 5525.82 samples/sec   Loss 1.0102   LearningRate 0.0010   Epoch: 18   Global Step: 102550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 07:55:34,865-Speed 5574.68 samples/sec   Loss 1.1214   LearningRate 0.0010   Epoch: 18   Global Step: 102560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 07:55:36,685-Speed 5630.53 samples/sec   Loss 0.9694   LearningRate 0.0010   Epoch: 18   Global Step: 102570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:38,529-Speed 5555.06 samples/sec   Loss 1.0175   LearningRate 0.0010   Epoch: 18   Global Step: 102580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:40,379-Speed 5537.05 samples/sec   Loss 1.0447   LearningRate 0.0010   Epoch: 18   Global Step: 102590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:42,201-Speed 5622.22 samples/sec   Loss 0.9637   LearningRate 0.0010   Epoch: 18   Global Step: 102600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:44,040-Speed 5569.02 samples/sec   Loss 0.9533   LearningRate 0.0010   Epoch: 18   Global Step: 102610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:45,906-Speed 5488.96 samples/sec   Loss 0.9830   LearningRate 0.0010   Epoch: 18   Global Step: 102620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:47,750-Speed 5556.41 samples/sec   Loss 1.0114   LearningRate 0.0010   Epoch: 18   Global Step: 102630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:49,599-Speed 5540.30 samples/sec   Loss 0.9802   LearningRate 0.0009   Epoch: 18   Global Step: 102640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:51,437-Speed 5571.18 samples/sec   Loss 0.9988   LearningRate 0.0009   Epoch: 18   Global Step: 102650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:53,262-Speed 5613.02 samples/sec   Loss 1.0013   LearningRate 0.0009   Epoch: 18   Global Step: 102660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:55:55,132-Speed 5480.19 samples/sec   Loss 0.9888   LearningRate 0.0009   Epoch: 18   Global Step: 102670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:55:56,990-Speed 5511.97 samples/sec   Loss 1.0504   LearningRate 0.0009   Epoch: 18   Global Step: 102680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:55:58,861-Speed 5474.41 samples/sec   Loss 0.9441   LearningRate 0.0009   Epoch: 18   Global Step: 102690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:00,725-Speed 5496.86 samples/sec   Loss 0.9987   LearningRate 0.0009   Epoch: 18   Global Step: 102700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:02,578-Speed 5527.26 samples/sec   Loss 1.0021   LearningRate 0.0009   Epoch: 18   Global Step: 102710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:04,419-Speed 5562.08 samples/sec   Loss 1.0044   LearningRate 0.0009   Epoch: 18   Global Step: 102720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:06,252-Speed 5589.58 samples/sec   Loss 1.0582   LearningRate 0.0009   Epoch: 18   Global Step: 102730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:08,106-Speed 5524.15 samples/sec   Loss 1.0269   LearningRate 0.0009   Epoch: 18   Global Step: 102740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:09,957-Speed 5535.03 samples/sec   Loss 0.9887   LearningRate 0.0009   Epoch: 18   Global Step: 102750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:11,777-Speed 5630.70 samples/sec   Loss 1.0588   LearningRate 0.0009   Epoch: 18   Global Step: 102760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:13,595-Speed 5631.94 samples/sec   Loss 0.9912   LearningRate 0.0009   Epoch: 18   Global Step: 102770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:15,423-Speed 5604.48 samples/sec   Loss 1.0604   LearningRate 0.0009   Epoch: 18   Global Step: 102780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:17,257-Speed 5584.72 samples/sec   Loss 0.9339   LearningRate 0.0009   Epoch: 18   Global Step: 102790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:19,115-Speed 5514.86 samples/sec   Loss 1.0507   LearningRate 0.0009   Epoch: 18   Global Step: 102800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:20,977-Speed 5498.48 samples/sec   Loss 1.0599   LearningRate 0.0009   Epoch: 18   Global Step: 102810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:22,816-Speed 5571.80 samples/sec   Loss 1.0483   LearningRate 0.0009   Epoch: 18   Global Step: 102820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:24,689-Speed 5466.79 samples/sec   Loss 1.0886   LearningRate 0.0009   Epoch: 18   Global Step: 102830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:26,532-Speed 5559.53 samples/sec   Loss 1.1020   LearningRate 0.0009   Epoch: 18   Global Step: 102840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:28,361-Speed 5601.88 samples/sec   Loss 0.9842   LearningRate 0.0009   Epoch: 18   Global Step: 102850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:30,196-Speed 5580.39 samples/sec   Loss 0.9914   LearningRate 0.0009   Epoch: 18   Global Step: 102860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:32,031-Speed 5582.63 samples/sec   Loss 1.0278   LearningRate 0.0009   Epoch: 18   Global Step: 102870   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:56:33,852-Speed 5624.34 samples/sec   Loss 1.0549   LearningRate 0.0009   Epoch: 18   Global Step: 102880   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:56:35,697-Speed 5552.37 samples/sec   Loss 1.0151   LearningRate 0.0009   Epoch: 18   Global Step: 102890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:37,543-Speed 5550.83 samples/sec   Loss 1.0708   LearningRate 0.0009   Epoch: 18   Global Step: 102900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:39,391-Speed 5541.45 samples/sec   Loss 0.9517   LearningRate 0.0009   Epoch: 18   Global Step: 102910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:41,481-Speed 4901.94 samples/sec   Loss 0.9987   LearningRate 0.0009   Epoch: 18   Global Step: 102920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:43,337-Speed 5517.84 samples/sec   Loss 0.9706   LearningRate 0.0009   Epoch: 18   Global Step: 102930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:45,166-Speed 5601.36 samples/sec   Loss 0.9552   LearningRate 0.0009   Epoch: 18   Global Step: 102940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:47,011-Speed 5552.55 samples/sec   Loss 0.9950   LearningRate 0.0009   Epoch: 18   Global Step: 102950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:48,832-Speed 5623.64 samples/sec   Loss 1.0199   LearningRate 0.0009   Epoch: 18   Global Step: 102960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:50,664-Speed 5592.93 samples/sec   Loss 1.0077   LearningRate 0.0009   Epoch: 18   Global Step: 102970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:52,504-Speed 5565.79 samples/sec   Loss 1.0684   LearningRate 0.0009   Epoch: 18   Global Step: 102980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:54,332-Speed 5603.75 samples/sec   Loss 1.0763   LearningRate 0.0009   Epoch: 18   Global Step: 102990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:56,166-Speed 5586.59 samples/sec   Loss 1.0240   LearningRate 0.0009   Epoch: 18   Global Step: 103000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:58,019-Speed 5528.61 samples/sec   Loss 0.9381   LearningRate 0.0009   Epoch: 18   Global Step: 103010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:56:59,861-Speed 5561.25 samples/sec   Loss 1.0730   LearningRate 0.0009   Epoch: 18   Global Step: 103020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:01,702-Speed 5564.91 samples/sec   Loss 1.0282   LearningRate 0.0009   Epoch: 18   Global Step: 103030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:03,537-Speed 5580.67 samples/sec   Loss 1.0208   LearningRate 0.0009   Epoch: 18   Global Step: 103040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:05,373-Speed 5579.51 samples/sec   Loss 0.9601   LearningRate 0.0009   Epoch: 18   Global Step: 103050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:07,219-Speed 5547.25 samples/sec   Loss 1.0599   LearningRate 0.0009   Epoch: 18   Global Step: 103060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:09,051-Speed 5594.23 samples/sec   Loss 1.0006   LearningRate 0.0009   Epoch: 18   Global Step: 103070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:10,886-Speed 5580.79 samples/sec   Loss 0.9659   LearningRate 0.0009   Epoch: 18   Global Step: 103080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:12,700-Speed 5646.62 samples/sec   Loss 0.9316   LearningRate 0.0009   Epoch: 18   Global Step: 103090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:14,531-Speed 5596.06 samples/sec   Loss 1.1055   LearningRate 0.0009   Epoch: 18   Global Step: 103100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:16,368-Speed 5573.21 samples/sec   Loss 1.0659   LearningRate 0.0009   Epoch: 18   Global Step: 103110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:18,187-Speed 5634.58 samples/sec   Loss 1.0158   LearningRate 0.0009   Epoch: 18   Global Step: 103120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:20,038-Speed 5532.09 samples/sec   Loss 1.0933   LearningRate 0.0009   Epoch: 18   Global Step: 103130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:21,870-Speed 5592.48 samples/sec   Loss 1.0709   LearningRate 0.0009   Epoch: 18   Global Step: 103140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:23,731-Speed 5504.83 samples/sec   Loss 0.9931   LearningRate 0.0009   Epoch: 18   Global Step: 103150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:25,572-Speed 5564.14 samples/sec   Loss 1.0661   LearningRate 0.0009   Epoch: 18   Global Step: 103160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:27,418-Speed 5549.12 samples/sec   Loss 1.0308   LearningRate 0.0009   Epoch: 18   Global Step: 103170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:29,252-Speed 5584.73 samples/sec   Loss 1.0587   LearningRate 0.0009   Epoch: 18   Global Step: 103180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:31,072-Speed 5627.83 samples/sec   Loss 1.0299   LearningRate 0.0009   Epoch: 18   Global Step: 103190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:32,890-Speed 5634.73 samples/sec   Loss 0.9866   LearningRate 0.0009   Epoch: 18   Global Step: 103200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:34,739-Speed 5538.84 samples/sec   Loss 0.9706   LearningRate 0.0009   Epoch: 18   Global Step: 103210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:36,586-Speed 5545.40 samples/sec   Loss 1.0151   LearningRate 0.0009   Epoch: 18   Global Step: 103220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:38,431-Speed 5552.16 samples/sec   Loss 1.0217   LearningRate 0.0009   Epoch: 18   Global Step: 103230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:40,271-Speed 5566.47 samples/sec   Loss 1.0064   LearningRate 0.0008   Epoch: 18   Global Step: 103240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:42,125-Speed 5527.76 samples/sec   Loss 1.0164   LearningRate 0.0008   Epoch: 18   Global Step: 103250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:43,964-Speed 5570.09 samples/sec   Loss 0.9826   LearningRate 0.0008   Epoch: 18   Global Step: 103260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:45,808-Speed 5553.37 samples/sec   Loss 1.0281   LearningRate 0.0008   Epoch: 18   Global Step: 103270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:47,629-Speed 5626.43 samples/sec   Loss 1.0529   LearningRate 0.0008   Epoch: 18   Global Step: 103280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:49,448-Speed 5630.92 samples/sec   Loss 1.0731   LearningRate 0.0008   Epoch: 18   Global Step: 103290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:51,285-Speed 5576.94 samples/sec   Loss 1.0377   LearningRate 0.0008   Epoch: 18   Global Step: 103300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:53,118-Speed 5587.85 samples/sec   Loss 1.0118   LearningRate 0.0008   Epoch: 18   Global Step: 103310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:54,981-Speed 5498.41 samples/sec   Loss 1.0225   LearningRate 0.0008   Epoch: 18   Global Step: 103320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:56,844-Speed 5498.32 samples/sec   Loss 1.0088   LearningRate 0.0008   Epoch: 18   Global Step: 103330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:57:58,685-Speed 5565.07 samples/sec   Loss 1.0463   LearningRate 0.0008   Epoch: 18   Global Step: 103340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:00,514-Speed 5599.71 samples/sec   Loss 1.0218   LearningRate 0.0008   Epoch: 18   Global Step: 103350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:02,356-Speed 5562.11 samples/sec   Loss 1.0113   LearningRate 0.0008   Epoch: 18   Global Step: 103360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:04,190-Speed 5584.07 samples/sec   Loss 0.9756   LearningRate 0.0008   Epoch: 18   Global Step: 103370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:06,011-Speed 5624.74 samples/sec   Loss 1.0271   LearningRate 0.0008   Epoch: 18   Global Step: 103380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:07,840-Speed 5601.80 samples/sec   Loss 1.0817   LearningRate 0.0008   Epoch: 18   Global Step: 103390   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:58:09,653-Speed 5651.18 samples/sec   Loss 1.0502   LearningRate 0.0008   Epoch: 18   Global Step: 103400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:11,481-Speed 5601.37 samples/sec   Loss 1.0251   LearningRate 0.0008   Epoch: 18   Global Step: 103410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:58:13,332-Speed 5535.84 samples/sec   Loss 0.9920   LearningRate 0.0008   Epoch: 18   Global Step: 103420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:58:15,171-Speed 5570.19 samples/sec   Loss 1.0174   LearningRate 0.0008   Epoch: 18   Global Step: 103430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:58:17,015-Speed 5554.81 samples/sec   Loss 0.9810   LearningRate 0.0008   Epoch: 18   Global Step: 103440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:58:18,855-Speed 5567.40 samples/sec   Loss 1.0782   LearningRate 0.0008   Epoch: 18   Global Step: 103450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:58:20,711-Speed 5518.06 samples/sec   Loss 1.0463   LearningRate 0.0008   Epoch: 18   Global Step: 103460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:58:22,534-Speed 5619.64 samples/sec   Loss 1.0155   LearningRate 0.0008   Epoch: 18   Global Step: 103470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:58:24,376-Speed 5560.88 samples/sec   Loss 1.0037   LearningRate 0.0008   Epoch: 18   Global Step: 103480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:58:26,204-Speed 5603.17 samples/sec   Loss 1.0388   LearningRate 0.0008   Epoch: 18   Global Step: 103490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:58:28,031-Speed 5607.06 samples/sec   Loss 1.0062   LearningRate 0.0008   Epoch: 18   Global Step: 103500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 07:58:29,855-Speed 5617.00 samples/sec   Loss 1.0292   LearningRate 0.0008   Epoch: 18   Global Step: 103510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:31,694-Speed 5570.25 samples/sec   Loss 1.0865   LearningRate 0.0008   Epoch: 18   Global Step: 103520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:33,524-Speed 5597.02 samples/sec   Loss 0.9890   LearningRate 0.0008   Epoch: 18   Global Step: 103530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:35,355-Speed 5592.64 samples/sec   Loss 1.0452   LearningRate 0.0008   Epoch: 18   Global Step: 103540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:37,193-Speed 5572.76 samples/sec   Loss 1.0462   LearningRate 0.0008   Epoch: 18   Global Step: 103550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:39,022-Speed 5603.01 samples/sec   Loss 1.0016   LearningRate 0.0008   Epoch: 18   Global Step: 103560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:40,856-Speed 5583.54 samples/sec   Loss 1.0388   LearningRate 0.0008   Epoch: 18   Global Step: 103570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:42,677-Speed 5626.25 samples/sec   Loss 0.9499   LearningRate 0.0008   Epoch: 18   Global Step: 103580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:44,499-Speed 5620.95 samples/sec   Loss 1.0483   LearningRate 0.0008   Epoch: 18   Global Step: 103590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:46,325-Speed 5610.00 samples/sec   Loss 1.0364   LearningRate 0.0008   Epoch: 18   Global Step: 103600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:48,148-Speed 5623.99 samples/sec   Loss 0.9702   LearningRate 0.0008   Epoch: 18   Global Step: 103610   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:58:49,974-Speed 5609.19 samples/sec   Loss 1.0615   LearningRate 0.0008   Epoch: 18   Global Step: 103620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:51,816-Speed 5561.71 samples/sec   Loss 0.9431   LearningRate 0.0008   Epoch: 18   Global Step: 103630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:53,656-Speed 5564.22 samples/sec   Loss 1.0484   LearningRate 0.0008   Epoch: 18   Global Step: 103640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:55,490-Speed 5587.37 samples/sec   Loss 1.0075   LearningRate 0.0008   Epoch: 18   Global Step: 103650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:57,316-Speed 5609.04 samples/sec   Loss 1.1037   LearningRate 0.0008   Epoch: 18   Global Step: 103660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:58:59,154-Speed 5574.61 samples/sec   Loss 1.0753   LearningRate 0.0008   Epoch: 18   Global Step: 103670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:01,062-Speed 5368.03 samples/sec   Loss 0.9682   LearningRate 0.0008   Epoch: 18   Global Step: 103680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:02,925-Speed 5497.72 samples/sec   Loss 1.0394   LearningRate 0.0008   Epoch: 18   Global Step: 103690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:04,760-Speed 5583.38 samples/sec   Loss 1.0374   LearningRate 0.0008   Epoch: 18   Global Step: 103700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:06,612-Speed 5529.52 samples/sec   Loss 1.0018   LearningRate 0.0008   Epoch: 18   Global Step: 103710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:08,439-Speed 5607.40 samples/sec   Loss 1.0032   LearningRate 0.0008   Epoch: 18   Global Step: 103720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:10,269-Speed 5596.38 samples/sec   Loss 1.0681   LearningRate 0.0008   Epoch: 18   Global Step: 103730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:12,100-Speed 5596.73 samples/sec   Loss 1.0889   LearningRate 0.0008   Epoch: 18   Global Step: 103740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:13,940-Speed 5565.66 samples/sec   Loss 1.0668   LearningRate 0.0008   Epoch: 18   Global Step: 103750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:15,789-Speed 5539.07 samples/sec   Loss 1.0242   LearningRate 0.0008   Epoch: 18   Global Step: 103760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:17,630-Speed 5564.86 samples/sec   Loss 1.0376   LearningRate 0.0008   Epoch: 18   Global Step: 103770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:19,463-Speed 5589.17 samples/sec   Loss 1.0415   LearningRate 0.0008   Epoch: 18   Global Step: 103780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:21,335-Speed 5470.91 samples/sec   Loss 1.1068   LearningRate 0.0008   Epoch: 18   Global Step: 103790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:23,160-Speed 5612.69 samples/sec   Loss 1.0472   LearningRate 0.0008   Epoch: 18   Global Step: 103800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:24,986-Speed 5609.08 samples/sec   Loss 0.9841   LearningRate 0.0008   Epoch: 18   Global Step: 103810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:26,808-Speed 5623.46 samples/sec   Loss 1.0483   LearningRate 0.0008   Epoch: 18   Global Step: 103820   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 07:59:28,620-Speed 5653.85 samples/sec   Loss 1.0729   LearningRate 0.0008   Epoch: 18   Global Step: 103830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:30,459-Speed 5569.69 samples/sec   Loss 0.9850   LearningRate 0.0008   Epoch: 18   Global Step: 103840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:32,293-Speed 5584.94 samples/sec   Loss 1.0240   LearningRate 0.0008   Epoch: 18   Global Step: 103850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:34,119-Speed 5612.28 samples/sec   Loss 1.0729   LearningRate 0.0008   Epoch: 18   Global Step: 103860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:35,976-Speed 5514.21 samples/sec   Loss 0.9934   LearningRate 0.0008   Epoch: 18   Global Step: 103870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:37,811-Speed 5583.62 samples/sec   Loss 1.0409   LearningRate 0.0007   Epoch: 18   Global Step: 103880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:39,649-Speed 5572.90 samples/sec   Loss 1.0355   LearningRate 0.0007   Epoch: 18   Global Step: 103890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:41,484-Speed 5581.84 samples/sec   Loss 1.1386   LearningRate 0.0007   Epoch: 18   Global Step: 103900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:43,384-Speed 5391.43 samples/sec   Loss 0.9868   LearningRate 0.0007   Epoch: 18   Global Step: 103910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:45,307-Speed 5326.16 samples/sec   Loss 0.9636   LearningRate 0.0007   Epoch: 18   Global Step: 103920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:47,143-Speed 5578.27 samples/sec   Loss 1.1160   LearningRate 0.0007   Epoch: 18   Global Step: 103930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:48,964-Speed 5627.16 samples/sec   Loss 0.9392   LearningRate 0.0007   Epoch: 18   Global Step: 103940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:50,787-Speed 5616.18 samples/sec   Loss 1.0570   LearningRate 0.0007   Epoch: 18   Global Step: 103950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:52,624-Speed 5576.80 samples/sec   Loss 1.0191   LearningRate 0.0007   Epoch: 18   Global Step: 103960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:54,458-Speed 5584.57 samples/sec   Loss 1.0025   LearningRate 0.0007   Epoch: 18   Global Step: 103970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:56,287-Speed 5601.53 samples/sec   Loss 1.1198   LearningRate 0.0007   Epoch: 18   Global Step: 103980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:58,113-Speed 5609.87 samples/sec   Loss 1.0771   LearningRate 0.0007   Epoch: 18   Global Step: 103990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 07:59:59,929-Speed 5642.91 samples/sec   Loss 1.0455   LearningRate 0.0007   Epoch: 18   Global Step: 104000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:00:26,056-[lfw][104000]XNorm: 21.937785
Training: 2022-04-27 08:00:26,057-[lfw][104000]Accuracy-Flip: 0.99817+-0.00252
Training: 2022-04-27 08:00:26,057-[lfw][104000]Accuracy-Highest: 0.99817
Training: 2022-04-27 08:00:56,312-[cfp_fp][104000]XNorm: 21.038418
Training: 2022-04-27 08:00:56,312-[cfp_fp][104000]Accuracy-Flip: 0.97814+-0.00645
Training: 2022-04-27 08:00:56,313-[cfp_fp][104000]Accuracy-Highest: 0.97929
Training: 2022-04-27 08:01:22,464-[agedb_30][104000]XNorm: 22.024103
Training: 2022-04-27 08:01:22,465-[agedb_30][104000]Accuracy-Flip: 0.98133+-0.00640
Training: 2022-04-27 08:01:22,465-[agedb_30][104000]Accuracy-Highest: 0.98183
Training: 2022-04-27 08:01:24,296-Speed 121.37 samples/sec   Loss 0.9541   LearningRate 0.0007   Epoch: 18   Global Step: 104010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:26,113-Speed 5638.03 samples/sec   Loss 1.0262   LearningRate 0.0007   Epoch: 18   Global Step: 104020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:27,929-Speed 5641.27 samples/sec   Loss 0.9317   LearningRate 0.0007   Epoch: 18   Global Step: 104030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:29,746-Speed 5637.71 samples/sec   Loss 0.9943   LearningRate 0.0007   Epoch: 18   Global Step: 104040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:31,562-Speed 5640.02 samples/sec   Loss 1.0729   LearningRate 0.0007   Epoch: 18   Global Step: 104050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:33,398-Speed 5578.29 samples/sec   Loss 1.0019   LearningRate 0.0007   Epoch: 18   Global Step: 104060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:35,211-Speed 5652.37 samples/sec   Loss 1.0129   LearningRate 0.0007   Epoch: 18   Global Step: 104070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:37,029-Speed 5631.47 samples/sec   Loss 1.0375   LearningRate 0.0007   Epoch: 18   Global Step: 104080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:38,851-Speed 5623.80 samples/sec   Loss 0.9718   LearningRate 0.0007   Epoch: 18   Global Step: 104090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:40,712-Speed 5503.48 samples/sec   Loss 0.9917   LearningRate 0.0007   Epoch: 18   Global Step: 104100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:42,543-Speed 5596.25 samples/sec   Loss 1.0478   LearningRate 0.0007   Epoch: 18   Global Step: 104110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:44,358-Speed 5641.79 samples/sec   Loss 1.0711   LearningRate 0.0007   Epoch: 18   Global Step: 104120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:46,167-Speed 5663.16 samples/sec   Loss 1.0105   LearningRate 0.0007   Epoch: 18   Global Step: 104130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:47,991-Speed 5614.77 samples/sec   Loss 1.0131   LearningRate 0.0007   Epoch: 18   Global Step: 104140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:49,806-Speed 5642.86 samples/sec   Loss 0.9935   LearningRate 0.0007   Epoch: 18   Global Step: 104150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:51,630-Speed 5619.32 samples/sec   Loss 1.0330   LearningRate 0.0007   Epoch: 18   Global Step: 104160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:53,484-Speed 5523.32 samples/sec   Loss 0.9973   LearningRate 0.0007   Epoch: 18   Global Step: 104170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:55,311-Speed 5609.05 samples/sec   Loss 1.0635   LearningRate 0.0007   Epoch: 18   Global Step: 104180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:57,135-Speed 5614.50 samples/sec   Loss 1.0349   LearningRate 0.0007   Epoch: 18   Global Step: 104190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:01:58,955-Speed 5628.23 samples/sec   Loss 1.1048   LearningRate 0.0007   Epoch: 18   Global Step: 104200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:00,773-Speed 5632.63 samples/sec   Loss 0.9975   LearningRate 0.0007   Epoch: 18   Global Step: 104210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:02,610-Speed 5578.87 samples/sec   Loss 1.0452   LearningRate 0.0007   Epoch: 18   Global Step: 104220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:04,417-Speed 5666.73 samples/sec   Loss 0.9872   LearningRate 0.0007   Epoch: 18   Global Step: 104230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:06,247-Speed 5598.66 samples/sec   Loss 1.0215   LearningRate 0.0007   Epoch: 18   Global Step: 104240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:08,088-Speed 5564.22 samples/sec   Loss 1.0248   LearningRate 0.0007   Epoch: 18   Global Step: 104250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:09,924-Speed 5577.62 samples/sec   Loss 1.0174   LearningRate 0.0007   Epoch: 18   Global Step: 104260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:11,740-Speed 5642.43 samples/sec   Loss 1.0753   LearningRate 0.0007   Epoch: 18   Global Step: 104270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:13,567-Speed 5604.63 samples/sec   Loss 1.0515   LearningRate 0.0007   Epoch: 18   Global Step: 104280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:15,393-Speed 5610.04 samples/sec   Loss 1.0877   LearningRate 0.0007   Epoch: 18   Global Step: 104290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:17,213-Speed 5630.22 samples/sec   Loss 1.0131   LearningRate 0.0007   Epoch: 18   Global Step: 104300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:19,054-Speed 5564.28 samples/sec   Loss 1.0918   LearningRate 0.0007   Epoch: 18   Global Step: 104310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:20,875-Speed 5623.94 samples/sec   Loss 1.0005   LearningRate 0.0007   Epoch: 18   Global Step: 104320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:22,704-Speed 5600.53 samples/sec   Loss 0.9662   LearningRate 0.0007   Epoch: 18   Global Step: 104330   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 08:02:24,517-Speed 5651.15 samples/sec   Loss 0.9910   LearningRate 0.0007   Epoch: 18   Global Step: 104340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:26,342-Speed 5613.00 samples/sec   Loss 1.0847   LearningRate 0.0007   Epoch: 18   Global Step: 104350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:28,164-Speed 5620.78 samples/sec   Loss 1.0312   LearningRate 0.0007   Epoch: 18   Global Step: 104360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:29,980-Speed 5640.70 samples/sec   Loss 1.0021   LearningRate 0.0007   Epoch: 18   Global Step: 104370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:31,799-Speed 5630.86 samples/sec   Loss 0.9874   LearningRate 0.0007   Epoch: 18   Global Step: 104380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:33,622-Speed 5621.31 samples/sec   Loss 1.0183   LearningRate 0.0007   Epoch: 18   Global Step: 104390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:35,442-Speed 5627.77 samples/sec   Loss 1.0111   LearningRate 0.0007   Epoch: 18   Global Step: 104400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:37,278-Speed 5579.01 samples/sec   Loss 1.0107   LearningRate 0.0007   Epoch: 18   Global Step: 104410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:39,103-Speed 5614.24 samples/sec   Loss 1.1240   LearningRate 0.0007   Epoch: 18   Global Step: 104420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:40,936-Speed 5585.77 samples/sec   Loss 1.0056   LearningRate 0.0007   Epoch: 18   Global Step: 104430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:42,746-Speed 5659.99 samples/sec   Loss 1.0209   LearningRate 0.0007   Epoch: 18   Global Step: 104440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:44,572-Speed 5609.24 samples/sec   Loss 1.0343   LearningRate 0.0007   Epoch: 18   Global Step: 104450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:46,392-Speed 5628.54 samples/sec   Loss 1.0150   LearningRate 0.0007   Epoch: 18   Global Step: 104460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:48,236-Speed 5554.48 samples/sec   Loss 1.0381   LearningRate 0.0007   Epoch: 18   Global Step: 104470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:50,058-Speed 5622.95 samples/sec   Loss 1.0003   LearningRate 0.0007   Epoch: 18   Global Step: 104480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:51,882-Speed 5615.18 samples/sec   Loss 1.0416   LearningRate 0.0007   Epoch: 18   Global Step: 104490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:53,706-Speed 5616.26 samples/sec   Loss 0.9945   LearningRate 0.0007   Epoch: 18   Global Step: 104500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:55,542-Speed 5580.36 samples/sec   Loss 1.0468   LearningRate 0.0007   Epoch: 18   Global Step: 104510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:57,372-Speed 5596.22 samples/sec   Loss 0.9960   LearningRate 0.0007   Epoch: 18   Global Step: 104520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:02:59,198-Speed 5611.07 samples/sec   Loss 0.9860   LearningRate 0.0007   Epoch: 18   Global Step: 104530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:01,012-Speed 5647.31 samples/sec   Loss 1.0803   LearningRate 0.0007   Epoch: 18   Global Step: 104540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:02,846-Speed 5584.68 samples/sec   Loss 1.0506   LearningRate 0.0007   Epoch: 18   Global Step: 104550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:04,668-Speed 5621.60 samples/sec   Loss 1.0778   LearningRate 0.0006   Epoch: 18   Global Step: 104560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:06,503-Speed 5582.74 samples/sec   Loss 1.0833   LearningRate 0.0006   Epoch: 18   Global Step: 104570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:03:08,342-Speed 5571.12 samples/sec   Loss 0.9907   LearningRate 0.0006   Epoch: 18   Global Step: 104580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:03:10,166-Speed 5616.30 samples/sec   Loss 0.9890   LearningRate 0.0006   Epoch: 18   Global Step: 104590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:03:11,995-Speed 5599.31 samples/sec   Loss 0.9655   LearningRate 0.0006   Epoch: 18   Global Step: 104600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:03:13,817-Speed 5623.29 samples/sec   Loss 1.1457   LearningRate 0.0006   Epoch: 18   Global Step: 104610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:03:15,645-Speed 5601.71 samples/sec   Loss 1.0292   LearningRate 0.0006   Epoch: 18   Global Step: 104620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:03:17,465-Speed 5628.06 samples/sec   Loss 0.9896   LearningRate 0.0006   Epoch: 18   Global Step: 104630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:03:19,281-Speed 5640.91 samples/sec   Loss 1.0467   LearningRate 0.0006   Epoch: 18   Global Step: 104640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:03:21,108-Speed 5605.20 samples/sec   Loss 0.9798   LearningRate 0.0006   Epoch: 18   Global Step: 104650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:03:22,922-Speed 5649.49 samples/sec   Loss 1.0690   LearningRate 0.0006   Epoch: 18   Global Step: 104660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:03:24,782-Speed 5506.54 samples/sec   Loss 1.0619   LearningRate 0.0006   Epoch: 18   Global Step: 104670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:26,601-Speed 5631.77 samples/sec   Loss 1.0440   LearningRate 0.0006   Epoch: 18   Global Step: 104680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:28,422-Speed 5626.13 samples/sec   Loss 1.0287   LearningRate 0.0006   Epoch: 18   Global Step: 104690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:30,261-Speed 5569.71 samples/sec   Loss 1.0182   LearningRate 0.0006   Epoch: 18   Global Step: 104700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:32,081-Speed 5626.76 samples/sec   Loss 1.0969   LearningRate 0.0006   Epoch: 18   Global Step: 104710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:33,900-Speed 5631.71 samples/sec   Loss 1.0050   LearningRate 0.0006   Epoch: 18   Global Step: 104720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:35,726-Speed 5610.01 samples/sec   Loss 1.0993   LearningRate 0.0006   Epoch: 18   Global Step: 104730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:37,559-Speed 5587.00 samples/sec   Loss 0.9770   LearningRate 0.0006   Epoch: 18   Global Step: 104740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:39,384-Speed 5614.04 samples/sec   Loss 1.0506   LearningRate 0.0006   Epoch: 18   Global Step: 104750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:41,205-Speed 5626.29 samples/sec   Loss 1.0049   LearningRate 0.0006   Epoch: 18   Global Step: 104760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:43,020-Speed 5643.84 samples/sec   Loss 1.1262   LearningRate 0.0006   Epoch: 18   Global Step: 104770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:44,849-Speed 5599.26 samples/sec   Loss 0.9807   LearningRate 0.0006   Epoch: 18   Global Step: 104780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:46,680-Speed 5595.59 samples/sec   Loss 0.9978   LearningRate 0.0006   Epoch: 18   Global Step: 104790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:48,509-Speed 5601.50 samples/sec   Loss 1.0036   LearningRate 0.0006   Epoch: 18   Global Step: 104800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:50,331-Speed 5620.51 samples/sec   Loss 1.0675   LearningRate 0.0006   Epoch: 18   Global Step: 104810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:52,174-Speed 5557.12 samples/sec   Loss 0.9902   LearningRate 0.0006   Epoch: 18   Global Step: 104820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:53,995-Speed 5625.32 samples/sec   Loss 0.9928   LearningRate 0.0006   Epoch: 18   Global Step: 104830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:55,826-Speed 5594.29 samples/sec   Loss 1.0614   LearningRate 0.0006   Epoch: 18   Global Step: 104840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:57,653-Speed 5608.84 samples/sec   Loss 1.0938   LearningRate 0.0006   Epoch: 18   Global Step: 104850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:03:59,488-Speed 5582.28 samples/sec   Loss 1.0971   LearningRate 0.0006   Epoch: 18   Global Step: 104860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:01,302-Speed 5646.32 samples/sec   Loss 1.0277   LearningRate 0.0006   Epoch: 18   Global Step: 104870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:03,136-Speed 5584.39 samples/sec   Loss 1.1356   LearningRate 0.0006   Epoch: 18   Global Step: 104880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:04,960-Speed 5615.76 samples/sec   Loss 1.0617   LearningRate 0.0006   Epoch: 18   Global Step: 104890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:06,780-Speed 5630.04 samples/sec   Loss 1.0217   LearningRate 0.0006   Epoch: 18   Global Step: 104900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:08,622-Speed 5564.41 samples/sec   Loss 0.9526   LearningRate 0.0006   Epoch: 18   Global Step: 104910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:10,469-Speed 5544.67 samples/sec   Loss 1.0019   LearningRate 0.0006   Epoch: 18   Global Step: 104920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:12,290-Speed 5625.08 samples/sec   Loss 0.9927   LearningRate 0.0006   Epoch: 18   Global Step: 104930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:14,125-Speed 5582.88 samples/sec   Loss 1.0255   LearningRate 0.0006   Epoch: 18   Global Step: 104940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:15,951-Speed 5609.14 samples/sec   Loss 1.0332   LearningRate 0.0006   Epoch: 18   Global Step: 104950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:17,776-Speed 5613.14 samples/sec   Loss 0.9997   LearningRate 0.0006   Epoch: 18   Global Step: 104960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:19,618-Speed 5562.25 samples/sec   Loss 0.9670   LearningRate 0.0006   Epoch: 18   Global Step: 104970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:21,450-Speed 5590.14 samples/sec   Loss 0.9671   LearningRate 0.0006   Epoch: 18   Global Step: 104980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:23,298-Speed 5542.00 samples/sec   Loss 0.9880   LearningRate 0.0006   Epoch: 18   Global Step: 104990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:25,139-Speed 5565.52 samples/sec   Loss 1.0014   LearningRate 0.0006   Epoch: 18   Global Step: 105000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:26,959-Speed 5627.26 samples/sec   Loss 1.0303   LearningRate 0.0006   Epoch: 18   Global Step: 105010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:28,799-Speed 5569.60 samples/sec   Loss 1.0750   LearningRate 0.0006   Epoch: 18   Global Step: 105020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:30,629-Speed 5595.29 samples/sec   Loss 1.0340   LearningRate 0.0006   Epoch: 18   Global Step: 105030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:32,449-Speed 5629.81 samples/sec   Loss 1.0645   LearningRate 0.0006   Epoch: 18   Global Step: 105040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:34,268-Speed 5631.94 samples/sec   Loss 0.9922   LearningRate 0.0006   Epoch: 18   Global Step: 105050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:36,094-Speed 5609.21 samples/sec   Loss 1.0084   LearningRate 0.0006   Epoch: 18   Global Step: 105060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:37,918-Speed 5614.28 samples/sec   Loss 1.0241   LearningRate 0.0006   Epoch: 18   Global Step: 105070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:39,749-Speed 5595.74 samples/sec   Loss 1.0481   LearningRate 0.0006   Epoch: 18   Global Step: 105080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:41,609-Speed 5507.53 samples/sec   Loss 1.0029   LearningRate 0.0006   Epoch: 18   Global Step: 105090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:43,514-Speed 5376.94 samples/sec   Loss 1.0933   LearningRate 0.0006   Epoch: 18   Global Step: 105100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:45,351-Speed 5574.97 samples/sec   Loss 0.9843   LearningRate 0.0006   Epoch: 18   Global Step: 105110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:47,168-Speed 5640.76 samples/sec   Loss 1.0433   LearningRate 0.0006   Epoch: 18   Global Step: 105120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:49,014-Speed 5547.49 samples/sec   Loss 1.0215   LearningRate 0.0006   Epoch: 18   Global Step: 105130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:50,834-Speed 5628.32 samples/sec   Loss 1.0326   LearningRate 0.0006   Epoch: 18   Global Step: 105140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:52,673-Speed 5570.59 samples/sec   Loss 1.1659   LearningRate 0.0006   Epoch: 18   Global Step: 105150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:54,494-Speed 5626.57 samples/sec   Loss 1.0376   LearningRate 0.0006   Epoch: 18   Global Step: 105160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:56,312-Speed 5634.35 samples/sec   Loss 1.0401   LearningRate 0.0006   Epoch: 18   Global Step: 105170   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 08:04:58,129-Speed 5636.97 samples/sec   Loss 1.0202   LearningRate 0.0006   Epoch: 18   Global Step: 105180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:04:59,980-Speed 5534.31 samples/sec   Loss 0.9740   LearningRate 0.0006   Epoch: 18   Global Step: 105190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:01,818-Speed 5571.93 samples/sec   Loss 1.0278   LearningRate 0.0006   Epoch: 18   Global Step: 105200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:05:03,659-Speed 5563.06 samples/sec   Loss 1.0775   LearningRate 0.0006   Epoch: 18   Global Step: 105210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:05:05,489-Speed 5599.16 samples/sec   Loss 1.0972   LearningRate 0.0006   Epoch: 18   Global Step: 105220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:05:07,311-Speed 5621.77 samples/sec   Loss 1.0403   LearningRate 0.0006   Epoch: 18   Global Step: 105230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:05:09,157-Speed 5550.18 samples/sec   Loss 0.9780   LearningRate 0.0006   Epoch: 18   Global Step: 105240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:05:10,979-Speed 5622.13 samples/sec   Loss 1.0080   LearningRate 0.0006   Epoch: 18   Global Step: 105250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:05:12,797-Speed 5634.90 samples/sec   Loss 1.0856   LearningRate 0.0006   Epoch: 18   Global Step: 105260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:05:14,662-Speed 5491.78 samples/sec   Loss 1.0630   LearningRate 0.0006   Epoch: 18   Global Step: 105270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:05:16,521-Speed 5510.97 samples/sec   Loss 1.0912   LearningRate 0.0006   Epoch: 18   Global Step: 105280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:05:18,347-Speed 5609.36 samples/sec   Loss 1.0675   LearningRate 0.0005   Epoch: 18   Global Step: 105290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:05:20,176-Speed 5599.52 samples/sec   Loss 0.9936   LearningRate 0.0005   Epoch: 18   Global Step: 105300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:22,023-Speed 5546.30 samples/sec   Loss 0.9425   LearningRate 0.0005   Epoch: 18   Global Step: 105310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:23,859-Speed 5577.66 samples/sec   Loss 1.0334   LearningRate 0.0005   Epoch: 18   Global Step: 105320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:25,689-Speed 5597.74 samples/sec   Loss 1.0573   LearningRate 0.0005   Epoch: 18   Global Step: 105330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:27,529-Speed 5569.27 samples/sec   Loss 0.9516   LearningRate 0.0005   Epoch: 18   Global Step: 105340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:29,364-Speed 5580.24 samples/sec   Loss 1.0418   LearningRate 0.0005   Epoch: 18   Global Step: 105350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:31,184-Speed 5630.18 samples/sec   Loss 1.0445   LearningRate 0.0005   Epoch: 18   Global Step: 105360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:33,010-Speed 5611.37 samples/sec   Loss 1.0557   LearningRate 0.0005   Epoch: 18   Global Step: 105370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:34,837-Speed 5604.23 samples/sec   Loss 0.9727   LearningRate 0.0005   Epoch: 18   Global Step: 105380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:36,690-Speed 5529.73 samples/sec   Loss 0.9136   LearningRate 0.0005   Epoch: 18   Global Step: 105390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:38,548-Speed 5510.87 samples/sec   Loss 1.0495   LearningRate 0.0005   Epoch: 18   Global Step: 105400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:40,397-Speed 5541.34 samples/sec   Loss 0.9163   LearningRate 0.0005   Epoch: 18   Global Step: 105410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:42,270-Speed 5469.90 samples/sec   Loss 0.9643   LearningRate 0.0005   Epoch: 18   Global Step: 105420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:44,186-Speed 5346.49 samples/sec   Loss 1.0354   LearningRate 0.0005   Epoch: 18   Global Step: 105430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:46,028-Speed 5561.04 samples/sec   Loss 0.9942   LearningRate 0.0005   Epoch: 18   Global Step: 105440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:47,897-Speed 5479.93 samples/sec   Loss 1.0187   LearningRate 0.0005   Epoch: 18   Global Step: 105450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:49,745-Speed 5540.70 samples/sec   Loss 1.0810   LearningRate 0.0005   Epoch: 18   Global Step: 105460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:05:51,569-Speed 5616.73 samples/sec   Loss 1.0546   LearningRate 0.0005   Epoch: 18   Global Step: 105470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:05:53,457-Speed 5426.94 samples/sec   Loss 1.0643   LearningRate 0.0005   Epoch: 18   Global Step: 105480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:05:55,278-Speed 5624.91 samples/sec   Loss 1.1060   LearningRate 0.0005   Epoch: 18   Global Step: 105490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:05:57,099-Speed 5624.48 samples/sec   Loss 0.9120   LearningRate 0.0005   Epoch: 18   Global Step: 105500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:05:58,940-Speed 5566.02 samples/sec   Loss 1.0676   LearningRate 0.0005   Epoch: 18   Global Step: 105510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:06:00,759-Speed 5629.82 samples/sec   Loss 0.9861   LearningRate 0.0005   Epoch: 18   Global Step: 105520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:06:02,590-Speed 5595.65 samples/sec   Loss 1.1058   LearningRate 0.0005   Epoch: 18   Global Step: 105530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:06:04,407-Speed 5636.50 samples/sec   Loss 1.0489   LearningRate 0.0005   Epoch: 18   Global Step: 105540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:06:06,246-Speed 5570.58 samples/sec   Loss 1.0468   LearningRate 0.0005   Epoch: 18   Global Step: 105550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:06:08,075-Speed 5599.94 samples/sec   Loss 1.0385   LearningRate 0.0005   Epoch: 18   Global Step: 105560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 08:06:09,902-Speed 5605.95 samples/sec   Loss 0.9901   LearningRate 0.0005   Epoch: 18   Global Step: 105570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:11,750-Speed 5545.18 samples/sec   Loss 0.9225   LearningRate 0.0005   Epoch: 18   Global Step: 105580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:13,591-Speed 5561.97 samples/sec   Loss 1.0231   LearningRate 0.0005   Epoch: 18   Global Step: 105590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:15,420-Speed 5602.19 samples/sec   Loss 1.0238   LearningRate 0.0005   Epoch: 18   Global Step: 105600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:17,249-Speed 5601.79 samples/sec   Loss 1.0517   LearningRate 0.0005   Epoch: 18   Global Step: 105610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:19,128-Speed 5449.84 samples/sec   Loss 1.0521   LearningRate 0.0005   Epoch: 18   Global Step: 105620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:20,955-Speed 5608.37 samples/sec   Loss 1.0500   LearningRate 0.0005   Epoch: 18   Global Step: 105630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:22,820-Speed 5492.28 samples/sec   Loss 1.0501   LearningRate 0.0005   Epoch: 18   Global Step: 105640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:24,636-Speed 5640.84 samples/sec   Loss 1.0125   LearningRate 0.0005   Epoch: 18   Global Step: 105650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:26,460-Speed 5614.06 samples/sec   Loss 0.9783   LearningRate 0.0005   Epoch: 18   Global Step: 105660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:28,299-Speed 5571.04 samples/sec   Loss 1.0494   LearningRate 0.0005   Epoch: 18   Global Step: 105670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:30,124-Speed 5613.06 samples/sec   Loss 1.0361   LearningRate 0.0005   Epoch: 18   Global Step: 105680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:31,968-Speed 5555.91 samples/sec   Loss 1.0821   LearningRate 0.0005   Epoch: 18   Global Step: 105690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:33,806-Speed 5570.27 samples/sec   Loss 1.0110   LearningRate 0.0005   Epoch: 18   Global Step: 105700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:35,648-Speed 5563.39 samples/sec   Loss 1.0151   LearningRate 0.0005   Epoch: 18   Global Step: 105710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:37,480-Speed 5590.93 samples/sec   Loss 0.9386   LearningRate 0.0005   Epoch: 18   Global Step: 105720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:39,312-Speed 5592.76 samples/sec   Loss 1.0082   LearningRate 0.0005   Epoch: 18   Global Step: 105730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:41,135-Speed 5618.66 samples/sec   Loss 0.9919   LearningRate 0.0005   Epoch: 18   Global Step: 105740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:42,954-Speed 5631.90 samples/sec   Loss 1.0752   LearningRate 0.0005   Epoch: 18   Global Step: 105750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 08:06:44,783-Speed 5597.61 samples/sec   Loss 1.0834   LearningRate 0.0005   Epoch: 18   Global Step: 105760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:06:46,661-Speed 5454.77 samples/sec   Loss 1.0170   LearningRate 0.0005   Epoch: 18   Global Step: 105770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:06:48,488-Speed 5608.03 samples/sec   Loss 1.0505   LearningRate 0.0005   Epoch: 18   Global Step: 105780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:06:50,318-Speed 5597.99 samples/sec   Loss 1.0219   LearningRate 0.0005   Epoch: 18   Global Step: 105790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:06:52,145-Speed 5606.19 samples/sec   Loss 1.0387   LearningRate 0.0005   Epoch: 18   Global Step: 105800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:06:53,982-Speed 5575.42 samples/sec   Loss 0.9854   LearningRate 0.0005   Epoch: 18   Global Step: 105810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:06:55,821-Speed 5570.22 samples/sec   Loss 1.0980   LearningRate 0.0005   Epoch: 18   Global Step: 105820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:06:57,642-Speed 5625.17 samples/sec   Loss 1.0979   LearningRate 0.0005   Epoch: 18   Global Step: 105830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:06:59,489-Speed 5544.75 samples/sec   Loss 1.0566   LearningRate 0.0005   Epoch: 18   Global Step: 105840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:01,324-Speed 5582.44 samples/sec   Loss 1.0832   LearningRate 0.0005   Epoch: 18   Global Step: 105850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:03,163-Speed 5572.71 samples/sec   Loss 1.0498   LearningRate 0.0005   Epoch: 18   Global Step: 105860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:04,980-Speed 5637.46 samples/sec   Loss 0.9651   LearningRate 0.0005   Epoch: 18   Global Step: 105870   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:07:06,788-Speed 5663.05 samples/sec   Loss 1.0346   LearningRate 0.0005   Epoch: 18   Global Step: 105880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:08,612-Speed 5618.12 samples/sec   Loss 0.9743   LearningRate 0.0005   Epoch: 18   Global Step: 105890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:10,437-Speed 5613.51 samples/sec   Loss 1.1036   LearningRate 0.0005   Epoch: 18   Global Step: 105900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:12,260-Speed 5618.55 samples/sec   Loss 1.0265   LearningRate 0.0005   Epoch: 18   Global Step: 105910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:14,081-Speed 5624.67 samples/sec   Loss 0.9915   LearningRate 0.0005   Epoch: 18   Global Step: 105920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:15,895-Speed 5645.74 samples/sec   Loss 1.0326   LearningRate 0.0005   Epoch: 18   Global Step: 105930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:17,713-Speed 5634.58 samples/sec   Loss 1.0514   LearningRate 0.0005   Epoch: 18   Global Step: 105940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:19,531-Speed 5633.53 samples/sec   Loss 1.1153   LearningRate 0.0005   Epoch: 18   Global Step: 105950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:21,364-Speed 5590.89 samples/sec   Loss 1.0830   LearningRate 0.0005   Epoch: 18   Global Step: 105960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:23,204-Speed 5565.92 samples/sec   Loss 1.0930   LearningRate 0.0005   Epoch: 18   Global Step: 105970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:25,018-Speed 5648.11 samples/sec   Loss 1.0692   LearningRate 0.0005   Epoch: 18   Global Step: 105980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:26,835-Speed 5636.46 samples/sec   Loss 1.0666   LearningRate 0.0005   Epoch: 18   Global Step: 105990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:28,659-Speed 5614.79 samples/sec   Loss 1.0788   LearningRate 0.0005   Epoch: 18   Global Step: 106000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:07:55,096-[lfw][106000]XNorm: 21.998031
Training: 2022-04-27 08:07:55,096-[lfw][106000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-27 08:07:55,097-[lfw][106000]Accuracy-Highest: 0.99817
Training: 2022-04-27 08:08:25,657-[cfp_fp][106000]XNorm: 21.255755
Training: 2022-04-27 08:08:25,658-[cfp_fp][106000]Accuracy-Flip: 0.97814+-0.00645
Training: 2022-04-27 08:08:25,658-[cfp_fp][106000]Accuracy-Highest: 0.97929
Training: 2022-04-27 08:08:52,051-[agedb_30][106000]XNorm: 22.189983
Training: 2022-04-27 08:08:52,052-[agedb_30][106000]Accuracy-Flip: 0.98233+-0.00620
Training: 2022-04-27 08:08:52,052-[agedb_30][106000]Accuracy-Highest: 0.98233
Training: 2022-04-27 08:08:53,918-Speed 120.11 samples/sec   Loss 0.9803   LearningRate 0.0005   Epoch: 18   Global Step: 106010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:08:55,738-Speed 5627.29 samples/sec   Loss 1.0809   LearningRate 0.0005   Epoch: 18   Global Step: 106020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:08:57,606-Speed 5483.49 samples/sec   Loss 1.0784   LearningRate 0.0005   Epoch: 18   Global Step: 106030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:08:59,415-Speed 5661.80 samples/sec   Loss 1.0296   LearningRate 0.0005   Epoch: 18   Global Step: 106040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:01,230-Speed 5646.02 samples/sec   Loss 0.9690   LearningRate 0.0005   Epoch: 18   Global Step: 106050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:03,052-Speed 5619.95 samples/sec   Loss 1.0370   LearningRate 0.0005   Epoch: 18   Global Step: 106060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:04,882-Speed 5599.18 samples/sec   Loss 1.0989   LearningRate 0.0005   Epoch: 18   Global Step: 106070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:06,682-Speed 5688.59 samples/sec   Loss 0.9538   LearningRate 0.0005   Epoch: 18   Global Step: 106080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:08,503-Speed 5625.39 samples/sec   Loss 1.0803   LearningRate 0.0005   Epoch: 18   Global Step: 106090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:10,331-Speed 5603.72 samples/sec   Loss 1.0719   LearningRate 0.0004   Epoch: 18   Global Step: 106100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:12,156-Speed 5611.83 samples/sec   Loss 1.0248   LearningRate 0.0004   Epoch: 18   Global Step: 106110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:14,004-Speed 5544.52 samples/sec   Loss 0.9879   LearningRate 0.0004   Epoch: 18   Global Step: 106120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:15,820-Speed 5640.30 samples/sec   Loss 1.0702   LearningRate 0.0004   Epoch: 18   Global Step: 106130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:17,675-Speed 5522.74 samples/sec   Loss 1.1148   LearningRate 0.0004   Epoch: 18   Global Step: 106140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:19,554-Speed 5449.66 samples/sec   Loss 0.9949   LearningRate 0.0004   Epoch: 18   Global Step: 106150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:21,390-Speed 5580.15 samples/sec   Loss 1.0078   LearningRate 0.0004   Epoch: 18   Global Step: 106160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:23,217-Speed 5606.93 samples/sec   Loss 0.9958   LearningRate 0.0004   Epoch: 18   Global Step: 106170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:25,044-Speed 5606.09 samples/sec   Loss 0.9526   LearningRate 0.0004   Epoch: 18   Global Step: 106180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:26,875-Speed 5595.75 samples/sec   Loss 1.1155   LearningRate 0.0004   Epoch: 18   Global Step: 106190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:28,692-Speed 5638.89 samples/sec   Loss 0.9941   LearningRate 0.0004   Epoch: 18   Global Step: 106200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:30,522-Speed 5596.37 samples/sec   Loss 0.9876   LearningRate 0.0004   Epoch: 18   Global Step: 106210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:32,361-Speed 5569.18 samples/sec   Loss 1.0613   LearningRate 0.0004   Epoch: 18   Global Step: 106220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:34,199-Speed 5573.02 samples/sec   Loss 1.0833   LearningRate 0.0004   Epoch: 18   Global Step: 106230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:36,065-Speed 5490.15 samples/sec   Loss 1.0668   LearningRate 0.0004   Epoch: 18   Global Step: 106240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:37,930-Speed 5492.02 samples/sec   Loss 0.9975   LearningRate 0.0004   Epoch: 18   Global Step: 106250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:39,748-Speed 5633.29 samples/sec   Loss 1.0651   LearningRate 0.0004   Epoch: 18   Global Step: 106260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:41,582-Speed 5585.57 samples/sec   Loss 1.0692   LearningRate 0.0004   Epoch: 18   Global Step: 106270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:43,408-Speed 5612.47 samples/sec   Loss 1.0309   LearningRate 0.0004   Epoch: 18   Global Step: 106280   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:09:45,226-Speed 5632.43 samples/sec   Loss 1.0745   LearningRate 0.0004   Epoch: 18   Global Step: 106290   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:09:47,043-Speed 5639.07 samples/sec   Loss 0.9964   LearningRate 0.0004   Epoch: 18   Global Step: 106300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:48,869-Speed 5609.21 samples/sec   Loss 1.0253   LearningRate 0.0004   Epoch: 18   Global Step: 106310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:50,701-Speed 5592.34 samples/sec   Loss 1.0465   LearningRate 0.0004   Epoch: 18   Global Step: 106320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:52,525-Speed 5614.13 samples/sec   Loss 1.0027   LearningRate 0.0004   Epoch: 18   Global Step: 106330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:54,346-Speed 5626.65 samples/sec   Loss 1.0826   LearningRate 0.0004   Epoch: 18   Global Step: 106340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:56,206-Speed 5507.54 samples/sec   Loss 1.0311   LearningRate 0.0004   Epoch: 18   Global Step: 106350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:58,052-Speed 5547.84 samples/sec   Loss 1.0584   LearningRate 0.0004   Epoch: 18   Global Step: 106360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:09:59,875-Speed 5619.35 samples/sec   Loss 0.9177   LearningRate 0.0004   Epoch: 18   Global Step: 106370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:10:01,722-Speed 5545.31 samples/sec   Loss 1.0282   LearningRate 0.0004   Epoch: 18   Global Step: 106380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:03,562-Speed 5567.79 samples/sec   Loss 1.0807   LearningRate 0.0004   Epoch: 18   Global Step: 106390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:05,401-Speed 5569.14 samples/sec   Loss 1.0120   LearningRate 0.0004   Epoch: 18   Global Step: 106400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:07,223-Speed 5622.11 samples/sec   Loss 1.0571   LearningRate 0.0004   Epoch: 18   Global Step: 106410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:09,054-Speed 5595.19 samples/sec   Loss 1.0999   LearningRate 0.0004   Epoch: 18   Global Step: 106420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:10,883-Speed 5601.51 samples/sec   Loss 1.0239   LearningRate 0.0004   Epoch: 18   Global Step: 106430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:12,729-Speed 5547.87 samples/sec   Loss 1.0654   LearningRate 0.0004   Epoch: 18   Global Step: 106440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:14,556-Speed 5606.62 samples/sec   Loss 1.0533   LearningRate 0.0004   Epoch: 18   Global Step: 106450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:16,372-Speed 5641.78 samples/sec   Loss 1.0440   LearningRate 0.0004   Epoch: 18   Global Step: 106460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:18,225-Speed 5527.40 samples/sec   Loss 1.0377   LearningRate 0.0004   Epoch: 18   Global Step: 106470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:20,065-Speed 5566.44 samples/sec   Loss 1.0564   LearningRate 0.0004   Epoch: 18   Global Step: 106480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:10:21,901-Speed 5578.70 samples/sec   Loss 1.1069   LearningRate 0.0004   Epoch: 18   Global Step: 106490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:10:23,728-Speed 5605.38 samples/sec   Loss 1.0104   LearningRate 0.0004   Epoch: 18   Global Step: 106500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:10:25,581-Speed 5530.63 samples/sec   Loss 1.0250   LearningRate 0.0004   Epoch: 18   Global Step: 106510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:10:27,437-Speed 5516.97 samples/sec   Loss 1.0727   LearningRate 0.0004   Epoch: 18   Global Step: 106520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:10:29,288-Speed 5537.18 samples/sec   Loss 1.0292   LearningRate 0.0004   Epoch: 18   Global Step: 106530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:10:31,105-Speed 5636.26 samples/sec   Loss 1.0482   LearningRate 0.0004   Epoch: 18   Global Step: 106540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:10:32,928-Speed 5619.82 samples/sec   Loss 0.9758   LearningRate 0.0004   Epoch: 18   Global Step: 106550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:10:34,754-Speed 5609.69 samples/sec   Loss 1.0381   LearningRate 0.0004   Epoch: 18   Global Step: 106560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:10:36,579-Speed 5611.37 samples/sec   Loss 1.0806   LearningRate 0.0004   Epoch: 18   Global Step: 106570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:10:38,415-Speed 5579.91 samples/sec   Loss 0.9880   LearningRate 0.0004   Epoch: 18   Global Step: 106580   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:10:40,232-Speed 5635.85 samples/sec   Loss 1.0161   LearningRate 0.0004   Epoch: 18   Global Step: 106590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:10:42,063-Speed 5596.08 samples/sec   Loss 1.1029   LearningRate 0.0004   Epoch: 18   Global Step: 106600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:10:43,877-Speed 5647.64 samples/sec   Loss 1.0377   LearningRate 0.0004   Epoch: 18   Global Step: 106610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:45,695-Speed 5634.13 samples/sec   Loss 1.0157   LearningRate 0.0004   Epoch: 18   Global Step: 106620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:47,544-Speed 5537.81 samples/sec   Loss 1.0757   LearningRate 0.0004   Epoch: 18   Global Step: 106630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:49,372-Speed 5604.79 samples/sec   Loss 1.0606   LearningRate 0.0004   Epoch: 18   Global Step: 106640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:51,204-Speed 5591.33 samples/sec   Loss 1.0389   LearningRate 0.0004   Epoch: 18   Global Step: 106650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:53,029-Speed 5613.48 samples/sec   Loss 1.0612   LearningRate 0.0004   Epoch: 18   Global Step: 106660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:54,882-Speed 5527.12 samples/sec   Loss 1.1131   LearningRate 0.0004   Epoch: 18   Global Step: 106670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:56,699-Speed 5640.19 samples/sec   Loss 1.0468   LearningRate 0.0004   Epoch: 18   Global Step: 106680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:10:58,526-Speed 5606.64 samples/sec   Loss 1.0429   LearningRate 0.0004   Epoch: 18   Global Step: 106690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:11:00,347-Speed 5622.81 samples/sec   Loss 1.0403   LearningRate 0.0004   Epoch: 18   Global Step: 106700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:11:02,175-Speed 5603.47 samples/sec   Loss 1.0471   LearningRate 0.0004   Epoch: 18   Global Step: 106710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:03,999-Speed 5616.72 samples/sec   Loss 1.0420   LearningRate 0.0004   Epoch: 18   Global Step: 106720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:11:05,821-Speed 5621.83 samples/sec   Loss 1.0623   LearningRate 0.0004   Epoch: 18   Global Step: 106730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:11:07,665-Speed 5555.91 samples/sec   Loss 1.0923   LearningRate 0.0004   Epoch: 18   Global Step: 106740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:11:09,496-Speed 5595.13 samples/sec   Loss 1.0024   LearningRate 0.0004   Epoch: 18   Global Step: 106750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:11:11,313-Speed 5636.40 samples/sec   Loss 1.0662   LearningRate 0.0004   Epoch: 18   Global Step: 106760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:11:13,157-Speed 5554.78 samples/sec   Loss 1.0078   LearningRate 0.0004   Epoch: 18   Global Step: 106770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:11:14,987-Speed 5598.90 samples/sec   Loss 1.0736   LearningRate 0.0004   Epoch: 18   Global Step: 106780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:11:16,820-Speed 5588.84 samples/sec   Loss 1.0062   LearningRate 0.0004   Epoch: 18   Global Step: 106790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:11:18,665-Speed 5552.39 samples/sec   Loss 1.0178   LearningRate 0.0004   Epoch: 18   Global Step: 106800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:11:20,487-Speed 5620.48 samples/sec   Loss 1.0094   LearningRate 0.0004   Epoch: 18   Global Step: 106810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:11:22,324-Speed 5576.14 samples/sec   Loss 1.0517   LearningRate 0.0004   Epoch: 18   Global Step: 106820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:24,174-Speed 5537.27 samples/sec   Loss 1.1117   LearningRate 0.0004   Epoch: 18   Global Step: 106830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:26,015-Speed 5564.52 samples/sec   Loss 1.0234   LearningRate 0.0004   Epoch: 18   Global Step: 106840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:27,887-Speed 5472.03 samples/sec   Loss 1.0827   LearningRate 0.0004   Epoch: 18   Global Step: 106850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:29,740-Speed 5526.09 samples/sec   Loss 0.9605   LearningRate 0.0004   Epoch: 18   Global Step: 106860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:31,572-Speed 5592.59 samples/sec   Loss 0.9996   LearningRate 0.0004   Epoch: 18   Global Step: 106870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:33,403-Speed 5594.79 samples/sec   Loss 1.0765   LearningRate 0.0004   Epoch: 18   Global Step: 106880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:35,226-Speed 5618.63 samples/sec   Loss 1.0103   LearningRate 0.0004   Epoch: 18   Global Step: 106890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:37,071-Speed 5551.85 samples/sec   Loss 0.9768   LearningRate 0.0004   Epoch: 18   Global Step: 106900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:38,917-Speed 5549.15 samples/sec   Loss 1.0107   LearningRate 0.0004   Epoch: 18   Global Step: 106910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:40,753-Speed 5578.08 samples/sec   Loss 1.1139   LearningRate 0.0004   Epoch: 18   Global Step: 106920   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:11:42,588-Speed 5584.20 samples/sec   Loss 1.0817   LearningRate 0.0004   Epoch: 18   Global Step: 106930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:44,411-Speed 5616.64 samples/sec   Loss 1.0882   LearningRate 0.0004   Epoch: 18   Global Step: 106940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:46,230-Speed 5632.03 samples/sec   Loss 1.1369   LearningRate 0.0004   Epoch: 18   Global Step: 106950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:48,059-Speed 5599.96 samples/sec   Loss 0.9998   LearningRate 0.0004   Epoch: 18   Global Step: 106960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:49,893-Speed 5587.92 samples/sec   Loss 1.0469   LearningRate 0.0004   Epoch: 18   Global Step: 106970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:51,734-Speed 5561.38 samples/sec   Loss 0.9501   LearningRate 0.0004   Epoch: 18   Global Step: 106980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:53,554-Speed 5627.57 samples/sec   Loss 1.1440   LearningRate 0.0004   Epoch: 18   Global Step: 106990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:55,392-Speed 5574.84 samples/sec   Loss 1.0401   LearningRate 0.0003   Epoch: 18   Global Step: 107000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:57,250-Speed 5515.60 samples/sec   Loss 1.0361   LearningRate 0.0003   Epoch: 18   Global Step: 107010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:11:59,113-Speed 5496.11 samples/sec   Loss 1.0556   LearningRate 0.0003   Epoch: 18   Global Step: 107020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:00,938-Speed 5614.79 samples/sec   Loss 1.0693   LearningRate 0.0003   Epoch: 18   Global Step: 107030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:02,757-Speed 5629.69 samples/sec   Loss 1.0598   LearningRate 0.0003   Epoch: 18   Global Step: 107040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:04,591-Speed 5587.64 samples/sec   Loss 1.0050   LearningRate 0.0003   Epoch: 18   Global Step: 107050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:06,436-Speed 5551.76 samples/sec   Loss 1.0262   LearningRate 0.0003   Epoch: 18   Global Step: 107060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:08,252-Speed 5638.06 samples/sec   Loss 1.0315   LearningRate 0.0003   Epoch: 18   Global Step: 107070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:10,102-Speed 5537.29 samples/sec   Loss 1.0322   LearningRate 0.0003   Epoch: 18   Global Step: 107080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:11,946-Speed 5556.94 samples/sec   Loss 1.0318   LearningRate 0.0003   Epoch: 18   Global Step: 107090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:13,799-Speed 5527.79 samples/sec   Loss 1.0507   LearningRate 0.0003   Epoch: 18   Global Step: 107100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:15,620-Speed 5623.63 samples/sec   Loss 1.0369   LearningRate 0.0003   Epoch: 18   Global Step: 107110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:17,472-Speed 5533.04 samples/sec   Loss 1.0797   LearningRate 0.0003   Epoch: 18   Global Step: 107120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:19,313-Speed 5562.66 samples/sec   Loss 1.0886   LearningRate 0.0003   Epoch: 18   Global Step: 107130   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:12:21,125-Speed 5652.46 samples/sec   Loss 1.0655   LearningRate 0.0003   Epoch: 18   Global Step: 107140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:22,957-Speed 5593.27 samples/sec   Loss 1.0535   LearningRate 0.0003   Epoch: 18   Global Step: 107150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:24,781-Speed 5614.85 samples/sec   Loss 1.0903   LearningRate 0.0003   Epoch: 18   Global Step: 107160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:26,625-Speed 5554.61 samples/sec   Loss 1.0142   LearningRate 0.0003   Epoch: 18   Global Step: 107170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:28,466-Speed 5565.79 samples/sec   Loss 1.0382   LearningRate 0.0003   Epoch: 18   Global Step: 107180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:30,294-Speed 5603.72 samples/sec   Loss 1.0268   LearningRate 0.0003   Epoch: 18   Global Step: 107190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:32,193-Speed 5391.74 samples/sec   Loss 1.0262   LearningRate 0.0003   Epoch: 18   Global Step: 107200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:34,024-Speed 5595.65 samples/sec   Loss 1.0823   LearningRate 0.0003   Epoch: 18   Global Step: 107210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:35,848-Speed 5614.66 samples/sec   Loss 1.0616   LearningRate 0.0003   Epoch: 18   Global Step: 107220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:37,682-Speed 5585.98 samples/sec   Loss 0.9939   LearningRate 0.0003   Epoch: 18   Global Step: 107230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:39,528-Speed 5550.74 samples/sec   Loss 1.0026   LearningRate 0.0003   Epoch: 18   Global Step: 107240   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:12:41,365-Speed 5576.47 samples/sec   Loss 1.0816   LearningRate 0.0003   Epoch: 18   Global Step: 107250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:43,224-Speed 5509.99 samples/sec   Loss 1.0020   LearningRate 0.0003   Epoch: 18   Global Step: 107260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:45,064-Speed 5565.39 samples/sec   Loss 1.0026   LearningRate 0.0003   Epoch: 18   Global Step: 107270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:46,895-Speed 5594.34 samples/sec   Loss 1.0181   LearningRate 0.0003   Epoch: 18   Global Step: 107280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:48,739-Speed 5557.28 samples/sec   Loss 1.0152   LearningRate 0.0003   Epoch: 18   Global Step: 107290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:50,612-Speed 5467.62 samples/sec   Loss 1.0583   LearningRate 0.0003   Epoch: 18   Global Step: 107300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:52,458-Speed 5551.00 samples/sec   Loss 0.9787   LearningRate 0.0003   Epoch: 18   Global Step: 107310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:54,278-Speed 5627.99 samples/sec   Loss 1.0319   LearningRate 0.0003   Epoch: 18   Global Step: 107320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:56,119-Speed 5563.57 samples/sec   Loss 1.0331   LearningRate 0.0003   Epoch: 18   Global Step: 107330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:57,949-Speed 5595.63 samples/sec   Loss 1.0498   LearningRate 0.0003   Epoch: 18   Global Step: 107340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:12:59,765-Speed 5642.26 samples/sec   Loss 1.0448   LearningRate 0.0003   Epoch: 18   Global Step: 107350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:01,599-Speed 5583.68 samples/sec   Loss 0.9742   LearningRate 0.0003   Epoch: 18   Global Step: 107360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:03,433-Speed 5588.45 samples/sec   Loss 0.9538   LearningRate 0.0003   Epoch: 18   Global Step: 107370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:05,254-Speed 5625.30 samples/sec   Loss 1.0130   LearningRate 0.0003   Epoch: 18   Global Step: 107380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:07,079-Speed 5613.06 samples/sec   Loss 0.9747   LearningRate 0.0003   Epoch: 18   Global Step: 107390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:08,918-Speed 5569.21 samples/sec   Loss 1.0746   LearningRate 0.0003   Epoch: 18   Global Step: 107400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:10,746-Speed 5602.57 samples/sec   Loss 0.9647   LearningRate 0.0003   Epoch: 18   Global Step: 107410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:12,588-Speed 5562.25 samples/sec   Loss 1.0677   LearningRate 0.0003   Epoch: 18   Global Step: 107420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:14,411-Speed 5616.58 samples/sec   Loss 1.0584   LearningRate 0.0003   Epoch: 18   Global Step: 107430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:16,231-Speed 5629.36 samples/sec   Loss 0.9686   LearningRate 0.0003   Epoch: 18   Global Step: 107440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:18,039-Speed 5666.36 samples/sec   Loss 0.9924   LearningRate 0.0003   Epoch: 18   Global Step: 107450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:19,870-Speed 5592.75 samples/sec   Loss 1.0704   LearningRate 0.0003   Epoch: 18   Global Step: 107460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:21,695-Speed 5613.94 samples/sec   Loss 1.0538   LearningRate 0.0003   Epoch: 18   Global Step: 107470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:23,540-Speed 5551.78 samples/sec   Loss 0.9912   LearningRate 0.0003   Epoch: 18   Global Step: 107480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:25,375-Speed 5582.70 samples/sec   Loss 1.0384   LearningRate 0.0003   Epoch: 18   Global Step: 107490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:27,203-Speed 5604.64 samples/sec   Loss 1.0242   LearningRate 0.0003   Epoch: 18   Global Step: 107500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:29,033-Speed 5598.60 samples/sec   Loss 1.0494   LearningRate 0.0003   Epoch: 18   Global Step: 107510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:30,894-Speed 5503.36 samples/sec   Loss 1.1280   LearningRate 0.0003   Epoch: 18   Global Step: 107520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:32,733-Speed 5568.81 samples/sec   Loss 1.0061   LearningRate 0.0003   Epoch: 18   Global Step: 107530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:34,568-Speed 5581.99 samples/sec   Loss 1.0555   LearningRate 0.0003   Epoch: 18   Global Step: 107540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:36,396-Speed 5605.30 samples/sec   Loss 1.0451   LearningRate 0.0003   Epoch: 18   Global Step: 107550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:38,240-Speed 5552.74 samples/sec   Loss 1.1147   LearningRate 0.0003   Epoch: 18   Global Step: 107560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:40,086-Speed 5551.32 samples/sec   Loss 1.0554   LearningRate 0.0003   Epoch: 18   Global Step: 107570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:41,934-Speed 5542.73 samples/sec   Loss 1.0652   LearningRate 0.0003   Epoch: 18   Global Step: 107580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:43,768-Speed 5584.92 samples/sec   Loss 1.0545   LearningRate 0.0003   Epoch: 18   Global Step: 107590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:45,596-Speed 5603.42 samples/sec   Loss 0.9781   LearningRate 0.0003   Epoch: 18   Global Step: 107600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:47,425-Speed 5600.90 samples/sec   Loss 0.9775   LearningRate 0.0003   Epoch: 18   Global Step: 107610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:49,266-Speed 5565.60 samples/sec   Loss 1.0621   LearningRate 0.0003   Epoch: 18   Global Step: 107620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:51,117-Speed 5532.81 samples/sec   Loss 0.9984   LearningRate 0.0003   Epoch: 18   Global Step: 107630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:52,940-Speed 5618.86 samples/sec   Loss 1.0506   LearningRate 0.0003   Epoch: 18   Global Step: 107640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:54,775-Speed 5582.91 samples/sec   Loss 0.9917   LearningRate 0.0003   Epoch: 18   Global Step: 107650   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:13:56,609-Speed 5584.79 samples/sec   Loss 1.0591   LearningRate 0.0003   Epoch: 18   Global Step: 107660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:13:58,448-Speed 5569.80 samples/sec   Loss 0.9825   LearningRate 0.0003   Epoch: 18   Global Step: 107670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:00,273-Speed 5611.77 samples/sec   Loss 1.0730   LearningRate 0.0003   Epoch: 18   Global Step: 107680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:02,110-Speed 5576.59 samples/sec   Loss 1.1013   LearningRate 0.0003   Epoch: 18   Global Step: 107690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:03,933-Speed 5620.64 samples/sec   Loss 1.0336   LearningRate 0.0003   Epoch: 18   Global Step: 107700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:05,778-Speed 5551.87 samples/sec   Loss 1.0449   LearningRate 0.0003   Epoch: 18   Global Step: 107710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:07,634-Speed 5517.48 samples/sec   Loss 1.0360   LearningRate 0.0003   Epoch: 18   Global Step: 107720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:09,459-Speed 5614.47 samples/sec   Loss 0.9924   LearningRate 0.0003   Epoch: 18   Global Step: 107730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:11,277-Speed 5635.68 samples/sec   Loss 1.0640   LearningRate 0.0003   Epoch: 18   Global Step: 107740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:13,104-Speed 5606.52 samples/sec   Loss 0.9713   LearningRate 0.0003   Epoch: 18   Global Step: 107750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:14,944-Speed 5565.46 samples/sec   Loss 1.0014   LearningRate 0.0003   Epoch: 18   Global Step: 107760   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:14:16,760-Speed 5641.78 samples/sec   Loss 1.0198   LearningRate 0.0003   Epoch: 18   Global Step: 107770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:18,607-Speed 5543.76 samples/sec   Loss 1.0988   LearningRate 0.0003   Epoch: 18   Global Step: 107780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:20,447-Speed 5567.97 samples/sec   Loss 1.0816   LearningRate 0.0003   Epoch: 18   Global Step: 107790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:22,276-Speed 5599.59 samples/sec   Loss 1.0192   LearningRate 0.0003   Epoch: 18   Global Step: 107800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:24,132-Speed 5520.24 samples/sec   Loss 0.9998   LearningRate 0.0003   Epoch: 18   Global Step: 107810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:25,960-Speed 5602.02 samples/sec   Loss 1.0488   LearningRate 0.0003   Epoch: 18   Global Step: 107820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:27,798-Speed 5574.78 samples/sec   Loss 1.0845   LearningRate 0.0003   Epoch: 18   Global Step: 107830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:29,624-Speed 5608.26 samples/sec   Loss 1.0183   LearningRate 0.0003   Epoch: 18   Global Step: 107840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:31,449-Speed 5613.43 samples/sec   Loss 1.0963   LearningRate 0.0003   Epoch: 18   Global Step: 107850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:33,301-Speed 5533.60 samples/sec   Loss 1.0606   LearningRate 0.0003   Epoch: 18   Global Step: 107860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:35,130-Speed 5598.05 samples/sec   Loss 1.0552   LearningRate 0.0003   Epoch: 18   Global Step: 107870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:36,970-Speed 5567.52 samples/sec   Loss 1.0606   LearningRate 0.0003   Epoch: 18   Global Step: 107880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:38,813-Speed 5559.75 samples/sec   Loss 0.9883   LearningRate 0.0003   Epoch: 18   Global Step: 107890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:40,638-Speed 5612.43 samples/sec   Loss 0.9943   LearningRate 0.0003   Epoch: 18   Global Step: 107900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:42,460-Speed 5622.20 samples/sec   Loss 1.0836   LearningRate 0.0003   Epoch: 18   Global Step: 107910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:44,292-Speed 5590.54 samples/sec   Loss 1.0847   LearningRate 0.0003   Epoch: 18   Global Step: 107920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:46,134-Speed 5561.93 samples/sec   Loss 0.9946   LearningRate 0.0003   Epoch: 18   Global Step: 107930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:47,983-Speed 5539.32 samples/sec   Loss 1.0120   LearningRate 0.0003   Epoch: 18   Global Step: 107940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:49,827-Speed 5554.16 samples/sec   Loss 1.0175   LearningRate 0.0003   Epoch: 18   Global Step: 107950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:51,648-Speed 5627.27 samples/sec   Loss 1.0610   LearningRate 0.0003   Epoch: 18   Global Step: 107960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:53,474-Speed 5608.18 samples/sec   Loss 0.9859   LearningRate 0.0003   Epoch: 18   Global Step: 107970   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:14:55,317-Speed 5559.45 samples/sec   Loss 1.0916   LearningRate 0.0003   Epoch: 18   Global Step: 107980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:57,157-Speed 5566.53 samples/sec   Loss 1.0393   LearningRate 0.0003   Epoch: 18   Global Step: 107990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:14:59,011-Speed 5525.82 samples/sec   Loss 1.0476   LearningRate 0.0003   Epoch: 18   Global Step: 108000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:15:25,010-[lfw][108000]XNorm: 21.932818
Training: 2022-04-27 08:15:25,010-[lfw][108000]Accuracy-Flip: 0.99750+-0.00250
Training: 2022-04-27 08:15:25,011-[lfw][108000]Accuracy-Highest: 0.99817
Training: 2022-04-27 08:15:55,192-[cfp_fp][108000]XNorm: 21.231755
Training: 2022-04-27 08:15:55,193-[cfp_fp][108000]Accuracy-Flip: 0.98100+-0.00599
Training: 2022-04-27 08:15:55,193-[cfp_fp][108000]Accuracy-Highest: 0.98100
Training: 2022-04-27 08:16:21,245-[agedb_30][108000]XNorm: 22.029147
Training: 2022-04-27 08:16:21,246-[agedb_30][108000]Accuracy-Flip: 0.98100+-0.00638
Training: 2022-04-27 08:16:21,246-[agedb_30][108000]Accuracy-Highest: 0.98233
Training: 2022-04-27 08:16:23,088-Speed 121.79 samples/sec   Loss 1.0637   LearningRate 0.0003   Epoch: 18   Global Step: 108010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:16:24,925-Speed 5575.79 samples/sec   Loss 0.9437   LearningRate 0.0003   Epoch: 18   Global Step: 108020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:16:26,811-Speed 5431.78 samples/sec   Loss 1.0033   LearningRate 0.0003   Epoch: 18   Global Step: 108030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:16:40,657-Speed 739.64 samples/sec   Loss 0.8670   LearningRate 0.0002   Epoch: 19   Global Step: 108040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:16:42,535-Speed 5453.01 samples/sec   Loss 0.8399   LearningRate 0.0002   Epoch: 19   Global Step: 108050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:16:44,384-Speed 5540.80 samples/sec   Loss 0.9233   LearningRate 0.0002   Epoch: 19   Global Step: 108060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:16:46,226-Speed 5562.63 samples/sec   Loss 0.8606   LearningRate 0.0002   Epoch: 19   Global Step: 108070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:16:48,052-Speed 5607.99 samples/sec   Loss 0.9119   LearningRate 0.0002   Epoch: 19   Global Step: 108080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:16:49,903-Speed 5533.89 samples/sec   Loss 0.8394   LearningRate 0.0002   Epoch: 19   Global Step: 108090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:16:51,717-Speed 5648.47 samples/sec   Loss 0.8780   LearningRate 0.0002   Epoch: 19   Global Step: 108100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:16:53,550-Speed 5587.56 samples/sec   Loss 0.9224   LearningRate 0.0002   Epoch: 19   Global Step: 108110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:16:55,393-Speed 5557.25 samples/sec   Loss 0.8704   LearningRate 0.0002   Epoch: 19   Global Step: 108120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:16:57,219-Speed 5610.71 samples/sec   Loss 0.9583   LearningRate 0.0002   Epoch: 19   Global Step: 108130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:16:59,049-Speed 5596.70 samples/sec   Loss 0.8984   LearningRate 0.0002   Epoch: 19   Global Step: 108140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:00,886-Speed 5575.25 samples/sec   Loss 0.8742   LearningRate 0.0002   Epoch: 19   Global Step: 108150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:02,710-Speed 5616.54 samples/sec   Loss 0.9216   LearningRate 0.0002   Epoch: 19   Global Step: 108160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:04,546-Speed 5582.32 samples/sec   Loss 0.8863   LearningRate 0.0002   Epoch: 19   Global Step: 108170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:06,363-Speed 5636.78 samples/sec   Loss 0.8983   LearningRate 0.0002   Epoch: 19   Global Step: 108180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:08,198-Speed 5581.74 samples/sec   Loss 0.8668   LearningRate 0.0002   Epoch: 19   Global Step: 108190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:10,023-Speed 5614.47 samples/sec   Loss 0.9647   LearningRate 0.0002   Epoch: 19   Global Step: 108200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:11,856-Speed 5585.70 samples/sec   Loss 0.9405   LearningRate 0.0002   Epoch: 19   Global Step: 108210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:13,692-Speed 5580.31 samples/sec   Loss 0.8785   LearningRate 0.0002   Epoch: 19   Global Step: 108220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:15,558-Speed 5487.85 samples/sec   Loss 0.8598   LearningRate 0.0002   Epoch: 19   Global Step: 108230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:17,379-Speed 5626.49 samples/sec   Loss 0.8992   LearningRate 0.0002   Epoch: 19   Global Step: 108240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:19,221-Speed 5562.18 samples/sec   Loss 0.8982   LearningRate 0.0002   Epoch: 19   Global Step: 108250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:21,064-Speed 5556.07 samples/sec   Loss 0.8872   LearningRate 0.0002   Epoch: 19   Global Step: 108260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:22,919-Speed 5522.34 samples/sec   Loss 0.9153   LearningRate 0.0002   Epoch: 19   Global Step: 108270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:24,760-Speed 5563.22 samples/sec   Loss 0.9313   LearningRate 0.0002   Epoch: 19   Global Step: 108280   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:17:26,579-Speed 5635.72 samples/sec   Loss 0.9736   LearningRate 0.0002   Epoch: 19   Global Step: 108290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:28,415-Speed 5577.33 samples/sec   Loss 0.8999   LearningRate 0.0002   Epoch: 19   Global Step: 108300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:30,258-Speed 5558.45 samples/sec   Loss 0.8726   LearningRate 0.0002   Epoch: 19   Global Step: 108310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:32,098-Speed 5568.33 samples/sec   Loss 0.9330   LearningRate 0.0002   Epoch: 19   Global Step: 108320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:33,941-Speed 5558.02 samples/sec   Loss 0.9114   LearningRate 0.0002   Epoch: 19   Global Step: 108330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:35,758-Speed 5637.70 samples/sec   Loss 0.7691   LearningRate 0.0002   Epoch: 19   Global Step: 108340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:37,581-Speed 5617.68 samples/sec   Loss 0.8953   LearningRate 0.0002   Epoch: 19   Global Step: 108350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:39,405-Speed 5614.88 samples/sec   Loss 0.9119   LearningRate 0.0002   Epoch: 19   Global Step: 108360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:41,236-Speed 5594.33 samples/sec   Loss 0.8610   LearningRate 0.0002   Epoch: 19   Global Step: 108370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:43,065-Speed 5602.73 samples/sec   Loss 0.8533   LearningRate 0.0002   Epoch: 19   Global Step: 108380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:44,896-Speed 5594.39 samples/sec   Loss 0.8775   LearningRate 0.0002   Epoch: 19   Global Step: 108390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:46,737-Speed 5563.76 samples/sec   Loss 0.9518   LearningRate 0.0002   Epoch: 19   Global Step: 108400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:48,577-Speed 5567.03 samples/sec   Loss 0.8608   LearningRate 0.0002   Epoch: 19   Global Step: 108410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:50,410-Speed 5587.53 samples/sec   Loss 0.9605   LearningRate 0.0002   Epoch: 19   Global Step: 108420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:52,245-Speed 5582.85 samples/sec   Loss 0.9138   LearningRate 0.0002   Epoch: 19   Global Step: 108430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:54,073-Speed 5603.37 samples/sec   Loss 0.8719   LearningRate 0.0002   Epoch: 19   Global Step: 108440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:55,892-Speed 5632.42 samples/sec   Loss 0.8855   LearningRate 0.0002   Epoch: 19   Global Step: 108450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:57,729-Speed 5576.90 samples/sec   Loss 0.8039   LearningRate 0.0002   Epoch: 19   Global Step: 108460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:17:59,563-Speed 5582.52 samples/sec   Loss 0.8880   LearningRate 0.0002   Epoch: 19   Global Step: 108470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:01,412-Speed 5539.70 samples/sec   Loss 0.9461   LearningRate 0.0002   Epoch: 19   Global Step: 108480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:03,231-Speed 5632.50 samples/sec   Loss 0.8824   LearningRate 0.0002   Epoch: 19   Global Step: 108490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:05,071-Speed 5566.98 samples/sec   Loss 0.9179   LearningRate 0.0002   Epoch: 19   Global Step: 108500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:06,887-Speed 5640.77 samples/sec   Loss 0.9244   LearningRate 0.0002   Epoch: 19   Global Step: 108510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:08,703-Speed 5642.66 samples/sec   Loss 0.9006   LearningRate 0.0002   Epoch: 19   Global Step: 108520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:10,533-Speed 5595.31 samples/sec   Loss 0.8868   LearningRate 0.0002   Epoch: 19   Global Step: 108530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:12,372-Speed 5572.56 samples/sec   Loss 0.8840   LearningRate 0.0002   Epoch: 19   Global Step: 108540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:14,183-Speed 5654.97 samples/sec   Loss 0.8696   LearningRate 0.0002   Epoch: 19   Global Step: 108550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:16,004-Speed 5626.01 samples/sec   Loss 0.8631   LearningRate 0.0002   Epoch: 19   Global Step: 108560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:17,835-Speed 5595.05 samples/sec   Loss 0.8580   LearningRate 0.0002   Epoch: 19   Global Step: 108570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:19,689-Speed 5524.69 samples/sec   Loss 0.8466   LearningRate 0.0002   Epoch: 19   Global Step: 108580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:21,583-Speed 5408.53 samples/sec   Loss 0.8953   LearningRate 0.0002   Epoch: 19   Global Step: 108590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:23,469-Speed 5429.91 samples/sec   Loss 0.9242   LearningRate 0.0002   Epoch: 19   Global Step: 108600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:25,291-Speed 5621.33 samples/sec   Loss 0.9093   LearningRate 0.0002   Epoch: 19   Global Step: 108610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:27,114-Speed 5620.58 samples/sec   Loss 0.8455   LearningRate 0.0002   Epoch: 19   Global Step: 108620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:28,945-Speed 5594.38 samples/sec   Loss 0.9136   LearningRate 0.0002   Epoch: 19   Global Step: 108630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:30,759-Speed 5646.18 samples/sec   Loss 0.9262   LearningRate 0.0002   Epoch: 19   Global Step: 108640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:32,586-Speed 5607.98 samples/sec   Loss 0.8771   LearningRate 0.0002   Epoch: 19   Global Step: 108650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:34,410-Speed 5616.61 samples/sec   Loss 0.9548   LearningRate 0.0002   Epoch: 19   Global Step: 108660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:36,232-Speed 5621.02 samples/sec   Loss 0.9270   LearningRate 0.0002   Epoch: 19   Global Step: 108670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:38,053-Speed 5626.19 samples/sec   Loss 0.9657   LearningRate 0.0002   Epoch: 19   Global Step: 108680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:39,862-Speed 5660.31 samples/sec   Loss 0.9177   LearningRate 0.0002   Epoch: 19   Global Step: 108690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:41,684-Speed 5622.73 samples/sec   Loss 0.9406   LearningRate 0.0002   Epoch: 19   Global Step: 108700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:43,500-Speed 5638.98 samples/sec   Loss 0.9163   LearningRate 0.0002   Epoch: 19   Global Step: 108710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:45,315-Speed 5644.21 samples/sec   Loss 0.8440   LearningRate 0.0002   Epoch: 19   Global Step: 108720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:47,148-Speed 5587.96 samples/sec   Loss 0.8117   LearningRate 0.0002   Epoch: 19   Global Step: 108730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:48,966-Speed 5635.88 samples/sec   Loss 0.8772   LearningRate 0.0002   Epoch: 19   Global Step: 108740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:50,785-Speed 5630.83 samples/sec   Loss 0.9066   LearningRate 0.0002   Epoch: 19   Global Step: 108750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:52,623-Speed 5573.98 samples/sec   Loss 0.8991   LearningRate 0.0002   Epoch: 19   Global Step: 108760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:54,449-Speed 5608.31 samples/sec   Loss 0.8419   LearningRate 0.0002   Epoch: 19   Global Step: 108770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:56,268-Speed 5631.38 samples/sec   Loss 0.9772   LearningRate 0.0002   Epoch: 19   Global Step: 108780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:18:58,118-Speed 5540.03 samples/sec   Loss 0.8814   LearningRate 0.0002   Epoch: 19   Global Step: 108790   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:18:59,951-Speed 5588.18 samples/sec   Loss 0.8269   LearningRate 0.0002   Epoch: 19   Global Step: 108800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:01,803-Speed 5530.76 samples/sec   Loss 0.9024   LearningRate 0.0002   Epoch: 19   Global Step: 108810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:03,641-Speed 5570.86 samples/sec   Loss 0.8698   LearningRate 0.0002   Epoch: 19   Global Step: 108820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:05,461-Speed 5630.40 samples/sec   Loss 0.8965   LearningRate 0.0002   Epoch: 19   Global Step: 108830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:07,283-Speed 5620.81 samples/sec   Loss 0.9131   LearningRate 0.0002   Epoch: 19   Global Step: 108840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:09,106-Speed 5620.29 samples/sec   Loss 0.9285   LearningRate 0.0002   Epoch: 19   Global Step: 108850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:10,930-Speed 5614.86 samples/sec   Loss 0.8512   LearningRate 0.0002   Epoch: 19   Global Step: 108860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:12,759-Speed 5601.56 samples/sec   Loss 0.8355   LearningRate 0.0002   Epoch: 19   Global Step: 108870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:14,582-Speed 5618.79 samples/sec   Loss 0.9119   LearningRate 0.0002   Epoch: 19   Global Step: 108880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:16,423-Speed 5564.62 samples/sec   Loss 0.9087   LearningRate 0.0002   Epoch: 19   Global Step: 108890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:18,249-Speed 5608.82 samples/sec   Loss 0.9667   LearningRate 0.0002   Epoch: 19   Global Step: 108900   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:19:20,062-Speed 5651.11 samples/sec   Loss 0.9572   LearningRate 0.0002   Epoch: 19   Global Step: 108910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:21,884-Speed 5622.83 samples/sec   Loss 0.9335   LearningRate 0.0002   Epoch: 19   Global Step: 108920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:23,716-Speed 5589.91 samples/sec   Loss 0.9312   LearningRate 0.0002   Epoch: 19   Global Step: 108930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:25,534-Speed 5635.88 samples/sec   Loss 0.9264   LearningRate 0.0002   Epoch: 19   Global Step: 108940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:27,382-Speed 5541.98 samples/sec   Loss 0.9123   LearningRate 0.0002   Epoch: 19   Global Step: 108950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:29,226-Speed 5553.56 samples/sec   Loss 0.9676   LearningRate 0.0002   Epoch: 19   Global Step: 108960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:31,058-Speed 5592.72 samples/sec   Loss 0.9674   LearningRate 0.0002   Epoch: 19   Global Step: 108970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:32,887-Speed 5600.74 samples/sec   Loss 0.9429   LearningRate 0.0002   Epoch: 19   Global Step: 108980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:34,718-Speed 5594.75 samples/sec   Loss 0.9039   LearningRate 0.0002   Epoch: 19   Global Step: 108990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:36,550-Speed 5592.93 samples/sec   Loss 0.9261   LearningRate 0.0002   Epoch: 19   Global Step: 109000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:38,377-Speed 5606.29 samples/sec   Loss 0.8783   LearningRate 0.0002   Epoch: 19   Global Step: 109010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:40,205-Speed 5603.59 samples/sec   Loss 0.9323   LearningRate 0.0002   Epoch: 19   Global Step: 109020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:42,033-Speed 5602.44 samples/sec   Loss 0.9315   LearningRate 0.0002   Epoch: 19   Global Step: 109030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:43,874-Speed 5563.42 samples/sec   Loss 0.8709   LearningRate 0.0002   Epoch: 19   Global Step: 109040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:45,700-Speed 5609.52 samples/sec   Loss 0.8251   LearningRate 0.0002   Epoch: 19   Global Step: 109050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:47,526-Speed 5609.74 samples/sec   Loss 0.8730   LearningRate 0.0002   Epoch: 19   Global Step: 109060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:49,347-Speed 5627.00 samples/sec   Loss 0.8494   LearningRate 0.0002   Epoch: 19   Global Step: 109070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:51,167-Speed 5625.98 samples/sec   Loss 0.9895   LearningRate 0.0002   Epoch: 19   Global Step: 109080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:53,014-Speed 5548.02 samples/sec   Loss 0.9301   LearningRate 0.0002   Epoch: 19   Global Step: 109090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:54,834-Speed 5626.57 samples/sec   Loss 0.9094   LearningRate 0.0002   Epoch: 19   Global Step: 109100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:56,644-Speed 5660.88 samples/sec   Loss 0.8488   LearningRate 0.0002   Epoch: 19   Global Step: 109110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:19:58,480-Speed 5577.74 samples/sec   Loss 0.9009   LearningRate 0.0002   Epoch: 19   Global Step: 109120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:00,308-Speed 5603.86 samples/sec   Loss 0.9650   LearningRate 0.0002   Epoch: 19   Global Step: 109130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:02,143-Speed 5582.63 samples/sec   Loss 0.9119   LearningRate 0.0002   Epoch: 19   Global Step: 109140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:03,958-Speed 5644.12 samples/sec   Loss 0.8563   LearningRate 0.0002   Epoch: 19   Global Step: 109150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:20:05,781-Speed 5620.19 samples/sec   Loss 0.8959   LearningRate 0.0002   Epoch: 19   Global Step: 109160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:20:07,615-Speed 5584.27 samples/sec   Loss 0.8654   LearningRate 0.0002   Epoch: 19   Global Step: 109170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:20:09,444-Speed 5600.14 samples/sec   Loss 0.9284   LearningRate 0.0002   Epoch: 19   Global Step: 109180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:20:11,268-Speed 5617.10 samples/sec   Loss 0.8738   LearningRate 0.0002   Epoch: 19   Global Step: 109190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:20:13,100-Speed 5589.36 samples/sec   Loss 0.8241   LearningRate 0.0002   Epoch: 19   Global Step: 109200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:20:14,925-Speed 5614.34 samples/sec   Loss 0.8661   LearningRate 0.0002   Epoch: 19   Global Step: 109210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:20:16,747-Speed 5620.15 samples/sec   Loss 0.9100   LearningRate 0.0002   Epoch: 19   Global Step: 109220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:20:18,580-Speed 5589.54 samples/sec   Loss 0.9288   LearningRate 0.0002   Epoch: 19   Global Step: 109230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:20:20,426-Speed 5550.94 samples/sec   Loss 0.9168   LearningRate 0.0002   Epoch: 19   Global Step: 109240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:20:22,249-Speed 5617.15 samples/sec   Loss 0.8380   LearningRate 0.0002   Epoch: 19   Global Step: 109250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:24,083-Speed 5585.80 samples/sec   Loss 0.8275   LearningRate 0.0002   Epoch: 19   Global Step: 109260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:25,928-Speed 5552.79 samples/sec   Loss 0.8250   LearningRate 0.0002   Epoch: 19   Global Step: 109270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:27,758-Speed 5596.95 samples/sec   Loss 0.9467   LearningRate 0.0002   Epoch: 19   Global Step: 109280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:29,591-Speed 5588.10 samples/sec   Loss 0.8830   LearningRate 0.0002   Epoch: 19   Global Step: 109290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:31,432-Speed 5564.43 samples/sec   Loss 0.8516   LearningRate 0.0002   Epoch: 19   Global Step: 109300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:33,273-Speed 5562.33 samples/sec   Loss 0.8467   LearningRate 0.0002   Epoch: 19   Global Step: 109310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:35,122-Speed 5552.58 samples/sec   Loss 0.8999   LearningRate 0.0001   Epoch: 19   Global Step: 109320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:36,970-Speed 5542.50 samples/sec   Loss 0.8416   LearningRate 0.0001   Epoch: 19   Global Step: 109330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:38,802-Speed 5589.07 samples/sec   Loss 0.9442   LearningRate 0.0001   Epoch: 19   Global Step: 109340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:40,622-Speed 5631.35 samples/sec   Loss 0.9613   LearningRate 0.0001   Epoch: 19   Global Step: 109350   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:20:42,438-Speed 5640.65 samples/sec   Loss 0.8851   LearningRate 0.0001   Epoch: 19   Global Step: 109360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:44,261-Speed 5618.59 samples/sec   Loss 0.9271   LearningRate 0.0001   Epoch: 19   Global Step: 109370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:46,080-Speed 5631.50 samples/sec   Loss 0.9331   LearningRate 0.0001   Epoch: 19   Global Step: 109380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:47,932-Speed 5529.20 samples/sec   Loss 0.8566   LearningRate 0.0001   Epoch: 19   Global Step: 109390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:49,780-Speed 5544.64 samples/sec   Loss 0.9146   LearningRate 0.0001   Epoch: 19   Global Step: 109400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:51,612-Speed 5591.13 samples/sec   Loss 0.9149   LearningRate 0.0001   Epoch: 19   Global Step: 109410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:53,454-Speed 5560.41 samples/sec   Loss 0.8773   LearningRate 0.0001   Epoch: 19   Global Step: 109420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:55,351-Speed 5399.70 samples/sec   Loss 0.8473   LearningRate 0.0001   Epoch: 19   Global Step: 109430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:57,189-Speed 5573.58 samples/sec   Loss 0.9404   LearningRate 0.0001   Epoch: 19   Global Step: 109440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:20:59,020-Speed 5595.47 samples/sec   Loss 0.9023   LearningRate 0.0001   Epoch: 19   Global Step: 109450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:00,867-Speed 5544.56 samples/sec   Loss 0.9103   LearningRate 0.0001   Epoch: 19   Global Step: 109460   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:21:02,691-Speed 5617.16 samples/sec   Loss 0.8987   LearningRate 0.0001   Epoch: 19   Global Step: 109470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:04,517-Speed 5611.14 samples/sec   Loss 0.9311   LearningRate 0.0001   Epoch: 19   Global Step: 109480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:06,345-Speed 5603.48 samples/sec   Loss 0.8869   LearningRate 0.0001   Epoch: 19   Global Step: 109490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:08,166-Speed 5623.39 samples/sec   Loss 0.8800   LearningRate 0.0001   Epoch: 19   Global Step: 109500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:10,001-Speed 5584.26 samples/sec   Loss 0.8930   LearningRate 0.0001   Epoch: 19   Global Step: 109510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:11,832-Speed 5593.82 samples/sec   Loss 0.8770   LearningRate 0.0001   Epoch: 19   Global Step: 109520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:13,650-Speed 5635.17 samples/sec   Loss 0.8493   LearningRate 0.0001   Epoch: 19   Global Step: 109530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:15,483-Speed 5586.08 samples/sec   Loss 0.8870   LearningRate 0.0001   Epoch: 19   Global Step: 109540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:17,304-Speed 5625.38 samples/sec   Loss 0.8504   LearningRate 0.0001   Epoch: 19   Global Step: 109550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:19,126-Speed 5623.04 samples/sec   Loss 0.8983   LearningRate 0.0001   Epoch: 19   Global Step: 109560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:20,981-Speed 5523.00 samples/sec   Loss 0.8813   LearningRate 0.0001   Epoch: 19   Global Step: 109570   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:21:22,807-Speed 5609.52 samples/sec   Loss 0.9535   LearningRate 0.0001   Epoch: 19   Global Step: 109580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:24,655-Speed 5542.30 samples/sec   Loss 0.9411   LearningRate 0.0001   Epoch: 19   Global Step: 109590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:26,487-Speed 5591.20 samples/sec   Loss 0.8634   LearningRate 0.0001   Epoch: 19   Global Step: 109600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:28,323-Speed 5579.54 samples/sec   Loss 0.9083   LearningRate 0.0001   Epoch: 19   Global Step: 109610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:30,149-Speed 5609.82 samples/sec   Loss 0.9143   LearningRate 0.0001   Epoch: 19   Global Step: 109620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:31,986-Speed 5576.47 samples/sec   Loss 0.8787   LearningRate 0.0001   Epoch: 19   Global Step: 109630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:33,818-Speed 5591.50 samples/sec   Loss 0.9038   LearningRate 0.0001   Epoch: 19   Global Step: 109640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:35,638-Speed 5628.93 samples/sec   Loss 0.8910   LearningRate 0.0001   Epoch: 19   Global Step: 109650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:37,461-Speed 5616.77 samples/sec   Loss 0.8881   LearningRate 0.0001   Epoch: 19   Global Step: 109660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:39,320-Speed 5511.74 samples/sec   Loss 0.9127   LearningRate 0.0001   Epoch: 19   Global Step: 109670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:41,151-Speed 5594.30 samples/sec   Loss 0.9372   LearningRate 0.0001   Epoch: 19   Global Step: 109680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:42,993-Speed 5559.99 samples/sec   Loss 0.8548   LearningRate 0.0001   Epoch: 19   Global Step: 109690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:44,831-Speed 5572.62 samples/sec   Loss 0.9021   LearningRate 0.0001   Epoch: 19   Global Step: 109700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:46,671-Speed 5567.62 samples/sec   Loss 0.9176   LearningRate 0.0001   Epoch: 19   Global Step: 109710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:48,513-Speed 5559.95 samples/sec   Loss 0.8574   LearningRate 0.0001   Epoch: 19   Global Step: 109720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:50,344-Speed 5595.60 samples/sec   Loss 0.9404   LearningRate 0.0001   Epoch: 19   Global Step: 109730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:52,180-Speed 5580.50 samples/sec   Loss 0.9608   LearningRate 0.0001   Epoch: 19   Global Step: 109740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:54,044-Speed 5495.50 samples/sec   Loss 0.8677   LearningRate 0.0001   Epoch: 19   Global Step: 109750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:55,894-Speed 5536.24 samples/sec   Loss 0.9225   LearningRate 0.0001   Epoch: 19   Global Step: 109760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:57,744-Speed 5537.31 samples/sec   Loss 0.9428   LearningRate 0.0001   Epoch: 19   Global Step: 109770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:21:59,571-Speed 5606.41 samples/sec   Loss 0.8972   LearningRate 0.0001   Epoch: 19   Global Step: 109780   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:22:01,390-Speed 5632.09 samples/sec   Loss 0.8742   LearningRate 0.0001   Epoch: 19   Global Step: 109790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:03,242-Speed 5529.37 samples/sec   Loss 0.9105   LearningRate 0.0001   Epoch: 19   Global Step: 109800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:05,072-Speed 5599.25 samples/sec   Loss 0.8888   LearningRate 0.0001   Epoch: 19   Global Step: 109810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:06,900-Speed 5604.36 samples/sec   Loss 0.8566   LearningRate 0.0001   Epoch: 19   Global Step: 109820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:08,739-Speed 5567.44 samples/sec   Loss 0.8975   LearningRate 0.0001   Epoch: 19   Global Step: 109830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:10,594-Speed 5525.10 samples/sec   Loss 0.8667   LearningRate 0.0001   Epoch: 19   Global Step: 109840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:12,450-Speed 5516.65 samples/sec   Loss 0.9446   LearningRate 0.0001   Epoch: 19   Global Step: 109850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:14,286-Speed 5579.62 samples/sec   Loss 0.9225   LearningRate 0.0001   Epoch: 19   Global Step: 109860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:16,127-Speed 5564.33 samples/sec   Loss 0.8698   LearningRate 0.0001   Epoch: 19   Global Step: 109870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:17,946-Speed 5631.09 samples/sec   Loss 0.8461   LearningRate 0.0001   Epoch: 19   Global Step: 109880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:19,773-Speed 5608.77 samples/sec   Loss 0.8857   LearningRate 0.0001   Epoch: 19   Global Step: 109890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:21,616-Speed 5555.64 samples/sec   Loss 0.8835   LearningRate 0.0001   Epoch: 19   Global Step: 109900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:23,490-Speed 5467.89 samples/sec   Loss 0.9047   LearningRate 0.0001   Epoch: 19   Global Step: 109910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:25,314-Speed 5616.34 samples/sec   Loss 0.8615   LearningRate 0.0001   Epoch: 19   Global Step: 109920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:27,157-Speed 5557.07 samples/sec   Loss 0.9030   LearningRate 0.0001   Epoch: 19   Global Step: 109930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:29,013-Speed 5519.32 samples/sec   Loss 0.9220   LearningRate 0.0001   Epoch: 19   Global Step: 109940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:30,847-Speed 5584.96 samples/sec   Loss 0.9238   LearningRate 0.0001   Epoch: 19   Global Step: 109950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:32,679-Speed 5591.39 samples/sec   Loss 0.8983   LearningRate 0.0001   Epoch: 19   Global Step: 109960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:34,523-Speed 5556.71 samples/sec   Loss 0.9055   LearningRate 0.0001   Epoch: 19   Global Step: 109970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:36,391-Speed 5482.02 samples/sec   Loss 0.9275   LearningRate 0.0001   Epoch: 19   Global Step: 109980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:38,212-Speed 5625.69 samples/sec   Loss 0.9376   LearningRate 0.0001   Epoch: 19   Global Step: 109990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:22:40,066-Speed 5524.04 samples/sec   Loss 0.9827   LearningRate 0.0001   Epoch: 19   Global Step: 110000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:23:06,239-[lfw][110000]XNorm: 21.958611
Training: 2022-04-27 08:23:06,240-[lfw][110000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-27 08:23:06,240-[lfw][110000]Accuracy-Highest: 0.99817
Training: 2022-04-27 08:23:36,577-[cfp_fp][110000]XNorm: 21.303466
Training: 2022-04-27 08:23:36,578-[cfp_fp][110000]Accuracy-Flip: 0.97971+-0.00556
Training: 2022-04-27 08:23:36,578-[cfp_fp][110000]Accuracy-Highest: 0.98100
Training: 2022-04-27 08:24:02,790-[agedb_30][110000]XNorm: 22.110064
Training: 2022-04-27 08:24:02,790-[agedb_30][110000]Accuracy-Flip: 0.98217+-0.00610
Training: 2022-04-27 08:24:02,791-[agedb_30][110000]Accuracy-Highest: 0.98233
Training: 2022-04-27 08:24:04,623-Speed 121.10 samples/sec   Loss 0.8766   LearningRate 0.0001   Epoch: 19   Global Step: 110010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:06,478-Speed 5521.27 samples/sec   Loss 0.7726   LearningRate 0.0001   Epoch: 19   Global Step: 110020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:08,378-Speed 5391.09 samples/sec   Loss 0.8861   LearningRate 0.0001   Epoch: 19   Global Step: 110030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:10,228-Speed 5535.66 samples/sec   Loss 0.9379   LearningRate 0.0001   Epoch: 19   Global Step: 110040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:12,090-Speed 5503.42 samples/sec   Loss 0.9194   LearningRate 0.0001   Epoch: 19   Global Step: 110050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:13,947-Speed 5515.78 samples/sec   Loss 0.9042   LearningRate 0.0001   Epoch: 19   Global Step: 110060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:15,779-Speed 5590.37 samples/sec   Loss 0.8612   LearningRate 0.0001   Epoch: 19   Global Step: 110070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:17,593-Speed 5646.18 samples/sec   Loss 0.9473   LearningRate 0.0001   Epoch: 19   Global Step: 110080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:19,435-Speed 5561.96 samples/sec   Loss 0.9098   LearningRate 0.0001   Epoch: 19   Global Step: 110090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:21,249-Speed 5646.48 samples/sec   Loss 0.9094   LearningRate 0.0001   Epoch: 19   Global Step: 110100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:23,078-Speed 5601.87 samples/sec   Loss 0.9111   LearningRate 0.0001   Epoch: 19   Global Step: 110110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:24,900-Speed 5621.59 samples/sec   Loss 0.8879   LearningRate 0.0001   Epoch: 19   Global Step: 110120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:26,737-Speed 5575.82 samples/sec   Loss 0.9179   LearningRate 0.0001   Epoch: 19   Global Step: 110130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:28,566-Speed 5600.06 samples/sec   Loss 0.8814   LearningRate 0.0001   Epoch: 19   Global Step: 110140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:30,406-Speed 5567.11 samples/sec   Loss 0.8566   LearningRate 0.0001   Epoch: 19   Global Step: 110150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:32,233-Speed 5607.95 samples/sec   Loss 0.8863   LearningRate 0.0001   Epoch: 19   Global Step: 110160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:34,052-Speed 5632.03 samples/sec   Loss 0.8666   LearningRate 0.0001   Epoch: 19   Global Step: 110170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:35,889-Speed 5575.17 samples/sec   Loss 0.8920   LearningRate 0.0001   Epoch: 19   Global Step: 110180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:37,727-Speed 5573.01 samples/sec   Loss 0.8435   LearningRate 0.0001   Epoch: 19   Global Step: 110190   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:24:39,534-Speed 5668.07 samples/sec   Loss 0.8592   LearningRate 0.0001   Epoch: 19   Global Step: 110200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:41,408-Speed 5467.20 samples/sec   Loss 1.0199   LearningRate 0.0001   Epoch: 19   Global Step: 110210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:43,233-Speed 5613.55 samples/sec   Loss 0.8814   LearningRate 0.0001   Epoch: 19   Global Step: 110220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:45,099-Speed 5487.74 samples/sec   Loss 0.9026   LearningRate 0.0001   Epoch: 19   Global Step: 110230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:46,926-Speed 5607.83 samples/sec   Loss 0.8704   LearningRate 0.0001   Epoch: 19   Global Step: 110240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:48,762-Speed 5577.45 samples/sec   Loss 0.9268   LearningRate 0.0001   Epoch: 19   Global Step: 110250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:50,592-Speed 5600.12 samples/sec   Loss 0.8913   LearningRate 0.0001   Epoch: 19   Global Step: 110260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:52,421-Speed 5598.85 samples/sec   Loss 0.8649   LearningRate 0.0001   Epoch: 19   Global Step: 110270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:54,314-Speed 5410.33 samples/sec   Loss 0.8818   LearningRate 0.0001   Epoch: 19   Global Step: 110280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:56,163-Speed 5543.00 samples/sec   Loss 0.8715   LearningRate 0.0001   Epoch: 19   Global Step: 110290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:57,992-Speed 5599.57 samples/sec   Loss 0.9493   LearningRate 0.0001   Epoch: 19   Global Step: 110300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:24:59,827-Speed 5582.61 samples/sec   Loss 0.9414   LearningRate 0.0001   Epoch: 19   Global Step: 110310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:01,708-Speed 5444.70 samples/sec   Loss 0.8249   LearningRate 0.0001   Epoch: 19   Global Step: 110320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:03,557-Speed 5541.04 samples/sec   Loss 0.8857   LearningRate 0.0001   Epoch: 19   Global Step: 110330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:05,382-Speed 5613.80 samples/sec   Loss 0.9497   LearningRate 0.0001   Epoch: 19   Global Step: 110340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:07,204-Speed 5621.29 samples/sec   Loss 0.9075   LearningRate 0.0001   Epoch: 19   Global Step: 110350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:09,041-Speed 5575.92 samples/sec   Loss 0.9098   LearningRate 0.0001   Epoch: 19   Global Step: 110360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:10,874-Speed 5586.86 samples/sec   Loss 0.9704   LearningRate 0.0001   Epoch: 19   Global Step: 110370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:12,751-Speed 5457.87 samples/sec   Loss 0.8967   LearningRate 0.0001   Epoch: 19   Global Step: 110380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:14,579-Speed 5605.18 samples/sec   Loss 0.9691   LearningRate 0.0001   Epoch: 19   Global Step: 110390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:16,417-Speed 5570.98 samples/sec   Loss 0.8612   LearningRate 0.0001   Epoch: 19   Global Step: 110400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:18,256-Speed 5572.06 samples/sec   Loss 0.9367   LearningRate 0.0001   Epoch: 19   Global Step: 110410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:20,106-Speed 5538.30 samples/sec   Loss 0.9410   LearningRate 0.0001   Epoch: 19   Global Step: 110420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:21,937-Speed 5594.82 samples/sec   Loss 0.9191   LearningRate 0.0001   Epoch: 19   Global Step: 110430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:23,765-Speed 5602.51 samples/sec   Loss 0.9018   LearningRate 0.0001   Epoch: 19   Global Step: 110440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:25,598-Speed 5586.28 samples/sec   Loss 0.9166   LearningRate 0.0001   Epoch: 19   Global Step: 110450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:27,440-Speed 5562.74 samples/sec   Loss 0.9060   LearningRate 0.0001   Epoch: 19   Global Step: 110460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:29,253-Speed 5647.96 samples/sec   Loss 0.9390   LearningRate 0.0001   Epoch: 19   Global Step: 110470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:31,075-Speed 5623.11 samples/sec   Loss 0.8355   LearningRate 0.0001   Epoch: 19   Global Step: 110480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:32,892-Speed 5638.11 samples/sec   Loss 0.8555   LearningRate 0.0001   Epoch: 19   Global Step: 110490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:34,711-Speed 5632.58 samples/sec   Loss 0.8456   LearningRate 0.0001   Epoch: 19   Global Step: 110500   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:25:36,536-Speed 5612.68 samples/sec   Loss 0.8690   LearningRate 0.0001   Epoch: 19   Global Step: 110510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:38,374-Speed 5572.78 samples/sec   Loss 0.8647   LearningRate 0.0001   Epoch: 19   Global Step: 110520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:40,204-Speed 5597.57 samples/sec   Loss 0.9714   LearningRate 0.0001   Epoch: 19   Global Step: 110530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:42,031-Speed 5607.72 samples/sec   Loss 0.8980   LearningRate 0.0001   Epoch: 19   Global Step: 110540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:43,853-Speed 5622.85 samples/sec   Loss 0.9194   LearningRate 0.0001   Epoch: 19   Global Step: 110550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:45,683-Speed 5596.81 samples/sec   Loss 0.8756   LearningRate 0.0001   Epoch: 19   Global Step: 110560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:47,521-Speed 5572.87 samples/sec   Loss 0.9355   LearningRate 0.0001   Epoch: 19   Global Step: 110570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:49,353-Speed 5590.26 samples/sec   Loss 0.9309   LearningRate 0.0001   Epoch: 19   Global Step: 110580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:51,178-Speed 5612.06 samples/sec   Loss 0.8916   LearningRate 0.0001   Epoch: 19   Global Step: 110590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:53,004-Speed 5611.16 samples/sec   Loss 0.9394   LearningRate 0.0001   Epoch: 19   Global Step: 110600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:54,836-Speed 5591.41 samples/sec   Loss 0.9227   LearningRate 0.0001   Epoch: 19   Global Step: 110610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:56,661-Speed 5612.37 samples/sec   Loss 0.9691   LearningRate 0.0001   Epoch: 19   Global Step: 110620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:25:58,494-Speed 5588.37 samples/sec   Loss 0.9187   LearningRate 0.0001   Epoch: 19   Global Step: 110630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:00,322-Speed 5604.68 samples/sec   Loss 0.8837   LearningRate 0.0001   Epoch: 19   Global Step: 110640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:02,149-Speed 5606.93 samples/sec   Loss 0.9076   LearningRate 0.0001   Epoch: 19   Global Step: 110650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:03,987-Speed 5573.75 samples/sec   Loss 0.9188   LearningRate 0.0001   Epoch: 19   Global Step: 110660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:05,806-Speed 5629.58 samples/sec   Loss 0.9303   LearningRate 0.0001   Epoch: 19   Global Step: 110670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:07,646-Speed 5567.64 samples/sec   Loss 0.8511   LearningRate 0.0001   Epoch: 19   Global Step: 110680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:09,502-Speed 5518.26 samples/sec   Loss 0.8993   LearningRate 0.0001   Epoch: 19   Global Step: 110690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:11,325-Speed 5619.60 samples/sec   Loss 0.9135   LearningRate 0.0001   Epoch: 19   Global Step: 110700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:13,149-Speed 5616.13 samples/sec   Loss 0.8594   LearningRate 0.0001   Epoch: 19   Global Step: 110710   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:26:15,001-Speed 5531.30 samples/sec   Loss 0.9683   LearningRate 0.0001   Epoch: 19   Global Step: 110720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:16,830-Speed 5601.59 samples/sec   Loss 0.8446   LearningRate 0.0001   Epoch: 19   Global Step: 110730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:18,712-Speed 5440.89 samples/sec   Loss 0.9203   LearningRate 0.0001   Epoch: 19   Global Step: 110740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:20,566-Speed 5526.75 samples/sec   Loss 0.8919   LearningRate 0.0001   Epoch: 19   Global Step: 110750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:22,407-Speed 5563.17 samples/sec   Loss 0.9479   LearningRate 0.0001   Epoch: 19   Global Step: 110760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:24,243-Speed 5579.91 samples/sec   Loss 0.8953   LearningRate 0.0001   Epoch: 19   Global Step: 110770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:26,074-Speed 5593.17 samples/sec   Loss 0.8906   LearningRate 0.0001   Epoch: 19   Global Step: 110780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:27,917-Speed 5560.56 samples/sec   Loss 0.8807   LearningRate 0.0001   Epoch: 19   Global Step: 110790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:29,756-Speed 5570.19 samples/sec   Loss 0.9166   LearningRate 0.0001   Epoch: 19   Global Step: 110800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:31,578-Speed 5620.36 samples/sec   Loss 0.9278   LearningRate 0.0001   Epoch: 19   Global Step: 110810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:26:33,421-Speed 5559.77 samples/sec   Loss 0.8910   LearningRate 0.0001   Epoch: 19   Global Step: 110820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:26:35,244-Speed 5617.97 samples/sec   Loss 0.8909   LearningRate 0.0001   Epoch: 19   Global Step: 110830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:26:37,096-Speed 5531.05 samples/sec   Loss 0.9296   LearningRate 0.0001   Epoch: 19   Global Step: 110840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:26:38,928-Speed 5591.72 samples/sec   Loss 0.9313   LearningRate 0.0001   Epoch: 19   Global Step: 110850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:26:40,767-Speed 5569.39 samples/sec   Loss 0.9419   LearningRate 0.0001   Epoch: 19   Global Step: 110860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:26:42,620-Speed 5529.28 samples/sec   Loss 0.8723   LearningRate 0.0001   Epoch: 19   Global Step: 110870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:26:44,444-Speed 5614.57 samples/sec   Loss 0.8563   LearningRate 0.0001   Epoch: 19   Global Step: 110880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:26:46,268-Speed 5615.89 samples/sec   Loss 0.9053   LearningRate 0.0001   Epoch: 19   Global Step: 110890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:26:48,098-Speed 5598.65 samples/sec   Loss 0.9774   LearningRate 0.0001   Epoch: 19   Global Step: 110900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:26:49,922-Speed 5616.26 samples/sec   Loss 0.9089   LearningRate 0.0001   Epoch: 19   Global Step: 110910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:51,776-Speed 5524.25 samples/sec   Loss 0.9271   LearningRate 0.0001   Epoch: 19   Global Step: 110920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:53,621-Speed 5552.98 samples/sec   Loss 0.9354   LearningRate 0.0001   Epoch: 19   Global Step: 110930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:55,493-Speed 5471.84 samples/sec   Loss 0.8515   LearningRate 0.0001   Epoch: 19   Global Step: 110940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:57,368-Speed 5463.68 samples/sec   Loss 0.9065   LearningRate 0.0001   Epoch: 19   Global Step: 110950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:26:59,213-Speed 5551.09 samples/sec   Loss 0.8591   LearningRate 0.0001   Epoch: 19   Global Step: 110960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:01,052-Speed 5568.29 samples/sec   Loss 0.9632   LearningRate 0.0001   Epoch: 19   Global Step: 110970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:02,924-Speed 5474.84 samples/sec   Loss 0.9051   LearningRate 0.0001   Epoch: 19   Global Step: 110980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:04,806-Speed 5440.66 samples/sec   Loss 0.8835   LearningRate 0.0001   Epoch: 19   Global Step: 110990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:06,643-Speed 5575.81 samples/sec   Loss 0.9247   LearningRate 0.0001   Epoch: 19   Global Step: 111000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:08,466-Speed 5621.72 samples/sec   Loss 0.9027   LearningRate 0.0001   Epoch: 19   Global Step: 111010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:10,301-Speed 5581.89 samples/sec   Loss 0.8784   LearningRate 0.0001   Epoch: 19   Global Step: 111020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:12,136-Speed 5583.66 samples/sec   Loss 0.8511   LearningRate 0.0001   Epoch: 19   Global Step: 111030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:13,968-Speed 5589.77 samples/sec   Loss 0.9330   LearningRate 0.0001   Epoch: 19   Global Step: 111040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:15,797-Speed 5601.77 samples/sec   Loss 0.8921   LearningRate 0.0001   Epoch: 19   Global Step: 111050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:17,626-Speed 5598.74 samples/sec   Loss 0.9150   LearningRate 0.0001   Epoch: 19   Global Step: 111060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:19,452-Speed 5610.11 samples/sec   Loss 0.9331   LearningRate 0.0001   Epoch: 19   Global Step: 111070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:21,295-Speed 5558.30 samples/sec   Loss 0.9796   LearningRate 0.0001   Epoch: 19   Global Step: 111080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:23,133-Speed 5575.10 samples/sec   Loss 0.9005   LearningRate 0.0001   Epoch: 19   Global Step: 111090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:24,963-Speed 5596.75 samples/sec   Loss 0.9393   LearningRate 0.0001   Epoch: 19   Global Step: 111100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:26,781-Speed 5633.73 samples/sec   Loss 0.8956   LearningRate 0.0001   Epoch: 19   Global Step: 111110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:28,606-Speed 5612.71 samples/sec   Loss 0.9044   LearningRate 0.0001   Epoch: 19   Global Step: 111120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:30,453-Speed 5546.03 samples/sec   Loss 0.8717   LearningRate 0.0001   Epoch: 19   Global Step: 111130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:32,276-Speed 5620.42 samples/sec   Loss 0.8219   LearningRate 0.0001   Epoch: 19   Global Step: 111140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:34,101-Speed 5613.17 samples/sec   Loss 0.8775   LearningRate 0.0001   Epoch: 19   Global Step: 111150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:35,936-Speed 5580.19 samples/sec   Loss 0.9228   LearningRate 0.0001   Epoch: 19   Global Step: 111160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:37,773-Speed 5577.86 samples/sec   Loss 0.8968   LearningRate 0.0001   Epoch: 19   Global Step: 111170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:39,618-Speed 5550.16 samples/sec   Loss 0.8865   LearningRate 0.0000   Epoch: 19   Global Step: 111180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:41,462-Speed 5555.59 samples/sec   Loss 0.9415   LearningRate 0.0000   Epoch: 19   Global Step: 111190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:43,298-Speed 5579.10 samples/sec   Loss 0.9041   LearningRate 0.0000   Epoch: 19   Global Step: 111200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:45,127-Speed 5602.53 samples/sec   Loss 0.9769   LearningRate 0.0000   Epoch: 19   Global Step: 111210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:46,945-Speed 5632.46 samples/sec   Loss 0.8960   LearningRate 0.0000   Epoch: 19   Global Step: 111220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:48,774-Speed 5600.87 samples/sec   Loss 0.8915   LearningRate 0.0000   Epoch: 19   Global Step: 111230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:50,615-Speed 5563.95 samples/sec   Loss 0.9478   LearningRate 0.0000   Epoch: 19   Global Step: 111240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:27:52,430-Speed 5644.11 samples/sec   Loss 0.8053   LearningRate 0.0000   Epoch: 19   Global Step: 111250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:27:54,263-Speed 5590.99 samples/sec   Loss 0.8750   LearningRate 0.0000   Epoch: 19   Global Step: 111260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:27:56,087-Speed 5614.34 samples/sec   Loss 0.8713   LearningRate 0.0000   Epoch: 19   Global Step: 111270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:27:57,913-Speed 5610.75 samples/sec   Loss 0.9309   LearningRate 0.0000   Epoch: 19   Global Step: 111280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:27:59,756-Speed 5557.92 samples/sec   Loss 0.8973   LearningRate 0.0000   Epoch: 19   Global Step: 111290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:28:01,599-Speed 5557.53 samples/sec   Loss 0.8735   LearningRate 0.0000   Epoch: 19   Global Step: 111300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:28:03,438-Speed 5568.45 samples/sec   Loss 0.9159   LearningRate 0.0000   Epoch: 19   Global Step: 111310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:28:05,267-Speed 5600.30 samples/sec   Loss 0.9010   LearningRate 0.0000   Epoch: 19   Global Step: 111320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:28:07,111-Speed 5557.74 samples/sec   Loss 0.8398   LearningRate 0.0000   Epoch: 19   Global Step: 111330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:28:08,953-Speed 5559.36 samples/sec   Loss 0.9759   LearningRate 0.0000   Epoch: 19   Global Step: 111340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:28:10,781-Speed 5602.71 samples/sec   Loss 0.9290   LearningRate 0.0000   Epoch: 19   Global Step: 111350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:12,610-Speed 5601.11 samples/sec   Loss 0.8718   LearningRate 0.0000   Epoch: 19   Global Step: 111360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:14,434-Speed 5617.83 samples/sec   Loss 0.8892   LearningRate 0.0000   Epoch: 19   Global Step: 111370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:16,288-Speed 5525.88 samples/sec   Loss 0.9132   LearningRate 0.0000   Epoch: 19   Global Step: 111380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:18,146-Speed 5511.72 samples/sec   Loss 0.9157   LearningRate 0.0000   Epoch: 19   Global Step: 111390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:19,997-Speed 5534.61 samples/sec   Loss 0.9428   LearningRate 0.0000   Epoch: 19   Global Step: 111400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:21,833-Speed 5579.44 samples/sec   Loss 0.8195   LearningRate 0.0000   Epoch: 19   Global Step: 111410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:23,689-Speed 5519.28 samples/sec   Loss 0.8908   LearningRate 0.0000   Epoch: 19   Global Step: 111420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:25,539-Speed 5539.94 samples/sec   Loss 0.9257   LearningRate 0.0000   Epoch: 19   Global Step: 111430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:27,398-Speed 5510.80 samples/sec   Loss 0.9160   LearningRate 0.0000   Epoch: 19   Global Step: 111440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:29,225-Speed 5605.57 samples/sec   Loss 0.9201   LearningRate 0.0000   Epoch: 19   Global Step: 111450   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:28:31,054-Speed 5599.41 samples/sec   Loss 0.8807   LearningRate 0.0000   Epoch: 19   Global Step: 111460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:32,881-Speed 5607.08 samples/sec   Loss 0.9434   LearningRate 0.0000   Epoch: 19   Global Step: 111470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:34,763-Speed 5443.47 samples/sec   Loss 0.9207   LearningRate 0.0000   Epoch: 19   Global Step: 111480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:36,594-Speed 5593.70 samples/sec   Loss 0.8088   LearningRate 0.0000   Epoch: 19   Global Step: 111490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:38,430-Speed 5582.79 samples/sec   Loss 0.8132   LearningRate 0.0000   Epoch: 19   Global Step: 111500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:40,254-Speed 5614.42 samples/sec   Loss 0.9575   LearningRate 0.0000   Epoch: 19   Global Step: 111510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:42,084-Speed 5598.64 samples/sec   Loss 0.9361   LearningRate 0.0000   Epoch: 19   Global Step: 111520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:43,908-Speed 5614.85 samples/sec   Loss 0.8810   LearningRate 0.0000   Epoch: 19   Global Step: 111530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:45,743-Speed 5581.33 samples/sec   Loss 0.8848   LearningRate 0.0000   Epoch: 19   Global Step: 111540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:47,588-Speed 5552.47 samples/sec   Loss 0.9151   LearningRate 0.0000   Epoch: 19   Global Step: 111550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:49,418-Speed 5597.92 samples/sec   Loss 0.9495   LearningRate 0.0000   Epoch: 19   Global Step: 111560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:28:51,279-Speed 5504.66 samples/sec   Loss 0.9116   LearningRate 0.0000   Epoch: 19   Global Step: 111570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:28:53,139-Speed 5506.96 samples/sec   Loss 0.8282   LearningRate 0.0000   Epoch: 19   Global Step: 111580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:28:54,969-Speed 5596.53 samples/sec   Loss 0.8986   LearningRate 0.0000   Epoch: 19   Global Step: 111590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:28:56,795-Speed 5611.78 samples/sec   Loss 0.9341   LearningRate 0.0000   Epoch: 19   Global Step: 111600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:28:58,631-Speed 5578.56 samples/sec   Loss 0.8484   LearningRate 0.0000   Epoch: 19   Global Step: 111610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:00,466-Speed 5583.24 samples/sec   Loss 0.8265   LearningRate 0.0000   Epoch: 19   Global Step: 111620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:02,298-Speed 5590.37 samples/sec   Loss 0.9095   LearningRate 0.0000   Epoch: 19   Global Step: 111630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:04,118-Speed 5627.17 samples/sec   Loss 0.8208   LearningRate 0.0000   Epoch: 19   Global Step: 111640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:05,941-Speed 5620.62 samples/sec   Loss 0.9348   LearningRate 0.0000   Epoch: 19   Global Step: 111650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:07,768-Speed 5608.02 samples/sec   Loss 0.8771   LearningRate 0.0000   Epoch: 19   Global Step: 111660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:09,589-Speed 5623.26 samples/sec   Loss 0.9045   LearningRate 0.0000   Epoch: 19   Global Step: 111670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:11,408-Speed 5631.85 samples/sec   Loss 0.8665   LearningRate 0.0000   Epoch: 19   Global Step: 111680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:13,249-Speed 5564.80 samples/sec   Loss 0.8781   LearningRate 0.0000   Epoch: 19   Global Step: 111690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:15,076-Speed 5604.49 samples/sec   Loss 0.9119   LearningRate 0.0000   Epoch: 19   Global Step: 111700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:16,898-Speed 5623.10 samples/sec   Loss 0.8675   LearningRate 0.0000   Epoch: 19   Global Step: 111710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:18,729-Speed 5593.90 samples/sec   Loss 0.8610   LearningRate 0.0000   Epoch: 19   Global Step: 111720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:20,562-Speed 5588.69 samples/sec   Loss 0.8887   LearningRate 0.0000   Epoch: 19   Global Step: 111730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:22,383-Speed 5627.65 samples/sec   Loss 0.9565   LearningRate 0.0000   Epoch: 19   Global Step: 111740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:24,206-Speed 5615.96 samples/sec   Loss 0.8903   LearningRate 0.0000   Epoch: 19   Global Step: 111750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:26,039-Speed 5590.82 samples/sec   Loss 0.9298   LearningRate 0.0000   Epoch: 19   Global Step: 111760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:27,855-Speed 5639.89 samples/sec   Loss 0.9452   LearningRate 0.0000   Epoch: 19   Global Step: 111770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:29,674-Speed 5631.47 samples/sec   Loss 0.8579   LearningRate 0.0000   Epoch: 19   Global Step: 111780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:31,492-Speed 5634.20 samples/sec   Loss 0.8802   LearningRate 0.0000   Epoch: 19   Global Step: 111790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:33,323-Speed 5594.90 samples/sec   Loss 0.9401   LearningRate 0.0000   Epoch: 19   Global Step: 111800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:35,140-Speed 5635.43 samples/sec   Loss 0.8375   LearningRate 0.0000   Epoch: 19   Global Step: 111810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:37,000-Speed 5507.07 samples/sec   Loss 0.8564   LearningRate 0.0000   Epoch: 19   Global Step: 111820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:38,834-Speed 5587.88 samples/sec   Loss 0.8263   LearningRate 0.0000   Epoch: 19   Global Step: 111830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:40,661-Speed 5607.31 samples/sec   Loss 1.0085   LearningRate 0.0000   Epoch: 19   Global Step: 111840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:42,483-Speed 5622.39 samples/sec   Loss 0.9175   LearningRate 0.0000   Epoch: 19   Global Step: 111850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:44,330-Speed 5543.59 samples/sec   Loss 0.8786   LearningRate 0.0000   Epoch: 19   Global Step: 111860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:46,156-Speed 5611.98 samples/sec   Loss 0.8909   LearningRate 0.0000   Epoch: 19   Global Step: 111870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:47,982-Speed 5609.77 samples/sec   Loss 0.8728   LearningRate 0.0000   Epoch: 19   Global Step: 111880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:49,806-Speed 5613.68 samples/sec   Loss 0.8616   LearningRate 0.0000   Epoch: 19   Global Step: 111890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:51,633-Speed 5607.20 samples/sec   Loss 0.9687   LearningRate 0.0000   Epoch: 19   Global Step: 111900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:29:53,474-Speed 5564.32 samples/sec   Loss 0.9218   LearningRate 0.0000   Epoch: 19   Global Step: 111910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:55,296-Speed 5622.60 samples/sec   Loss 0.8335   LearningRate 0.0000   Epoch: 19   Global Step: 111920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:57,137-Speed 5564.95 samples/sec   Loss 0.8661   LearningRate 0.0000   Epoch: 19   Global Step: 111930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:29:58,973-Speed 5578.85 samples/sec   Loss 0.8876   LearningRate 0.0000   Epoch: 19   Global Step: 111940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:30:00,824-Speed 5532.24 samples/sec   Loss 0.9738   LearningRate 0.0000   Epoch: 19   Global Step: 111950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:30:02,698-Speed 5465.74 samples/sec   Loss 0.9293   LearningRate 0.0000   Epoch: 19   Global Step: 111960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:30:04,558-Speed 5509.03 samples/sec   Loss 0.9188   LearningRate 0.0000   Epoch: 19   Global Step: 111970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:30:06,391-Speed 5588.16 samples/sec   Loss 0.8895   LearningRate 0.0000   Epoch: 19   Global Step: 111980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:30:08,221-Speed 5597.26 samples/sec   Loss 0.8981   LearningRate 0.0000   Epoch: 19   Global Step: 111990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:30:10,059-Speed 5573.15 samples/sec   Loss 0.9471   LearningRate 0.0000   Epoch: 19   Global Step: 112000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:30:36,366-[lfw][112000]XNorm: 21.959637
Training: 2022-04-27 08:30:36,367-[lfw][112000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-27 08:30:36,367-[lfw][112000]Accuracy-Highest: 0.99817
Training: 2022-04-27 08:31:06,928-[cfp_fp][112000]XNorm: 21.275295
Training: 2022-04-27 08:31:06,929-[cfp_fp][112000]Accuracy-Flip: 0.97943+-0.00590
Training: 2022-04-27 08:31:06,929-[cfp_fp][112000]Accuracy-Highest: 0.98100
Training: 2022-04-27 08:31:33,334-[agedb_30][112000]XNorm: 22.080918
Training: 2022-04-27 08:31:33,335-[agedb_30][112000]Accuracy-Flip: 0.98250+-0.00651
Training: 2022-04-27 08:31:33,335-[agedb_30][112000]Accuracy-Highest: 0.98250
Training: 2022-04-27 08:31:35,173-Speed 120.31 samples/sec   Loss 0.8574   LearningRate 0.0000   Epoch: 19   Global Step: 112010   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:31:36,976-Speed 5681.69 samples/sec   Loss 0.9182   LearningRate 0.0000   Epoch: 19   Global Step: 112020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:31:38,797-Speed 5627.07 samples/sec   Loss 0.8749   LearningRate 0.0000   Epoch: 19   Global Step: 112030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:31:40,628-Speed 5592.46 samples/sec   Loss 0.8665   LearningRate 0.0000   Epoch: 19   Global Step: 112040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:31:42,443-Speed 5645.09 samples/sec   Loss 0.8959   LearningRate 0.0000   Epoch: 19   Global Step: 112050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:31:44,255-Speed 5653.60 samples/sec   Loss 0.8637   LearningRate 0.0000   Epoch: 19   Global Step: 112060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:31:46,084-Speed 5600.43 samples/sec   Loss 0.8189   LearningRate 0.0000   Epoch: 19   Global Step: 112070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:31:47,911-Speed 5607.05 samples/sec   Loss 0.9052   LearningRate 0.0000   Epoch: 19   Global Step: 112080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:31:49,726-Speed 5644.47 samples/sec   Loss 0.8896   LearningRate 0.0000   Epoch: 19   Global Step: 112090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:31:51,541-Speed 5643.04 samples/sec   Loss 0.9044   LearningRate 0.0000   Epoch: 19   Global Step: 112100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:31:53,362-Speed 5623.31 samples/sec   Loss 0.9100   LearningRate 0.0000   Epoch: 19   Global Step: 112110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:31:55,168-Speed 5673.90 samples/sec   Loss 0.8337   LearningRate 0.0000   Epoch: 19   Global Step: 112120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:31:56,994-Speed 5607.76 samples/sec   Loss 0.8771   LearningRate 0.0000   Epoch: 19   Global Step: 112130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:31:58,830-Speed 5581.30 samples/sec   Loss 0.9087   LearningRate 0.0000   Epoch: 19   Global Step: 112140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:00,661-Speed 5593.87 samples/sec   Loss 0.9124   LearningRate 0.0000   Epoch: 19   Global Step: 112150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:02,492-Speed 5595.22 samples/sec   Loss 1.0071   LearningRate 0.0000   Epoch: 19   Global Step: 112160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:04,317-Speed 5610.85 samples/sec   Loss 0.9303   LearningRate 0.0000   Epoch: 19   Global Step: 112170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:06,182-Speed 5494.13 samples/sec   Loss 0.9165   LearningRate 0.0000   Epoch: 19   Global Step: 112180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:08,032-Speed 5536.49 samples/sec   Loss 0.8450   LearningRate 0.0000   Epoch: 19   Global Step: 112190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:09,861-Speed 5601.62 samples/sec   Loss 0.9034   LearningRate 0.0000   Epoch: 19   Global Step: 112200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:11,686-Speed 5613.40 samples/sec   Loss 0.9958   LearningRate 0.0000   Epoch: 19   Global Step: 112210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:13,508-Speed 5620.27 samples/sec   Loss 0.8481   LearningRate 0.0000   Epoch: 19   Global Step: 112220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:15,334-Speed 5610.21 samples/sec   Loss 0.9603   LearningRate 0.0000   Epoch: 19   Global Step: 112230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:17,162-Speed 5604.83 samples/sec   Loss 0.8694   LearningRate 0.0000   Epoch: 19   Global Step: 112240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:18,988-Speed 5610.12 samples/sec   Loss 0.9987   LearningRate 0.0000   Epoch: 19   Global Step: 112250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:20,830-Speed 5560.36 samples/sec   Loss 0.8499   LearningRate 0.0000   Epoch: 19   Global Step: 112260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:22,671-Speed 5564.79 samples/sec   Loss 0.8354   LearningRate 0.0000   Epoch: 19   Global Step: 112270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:24,501-Speed 5596.37 samples/sec   Loss 0.9289   LearningRate 0.0000   Epoch: 19   Global Step: 112280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:26,335-Speed 5587.10 samples/sec   Loss 0.9064   LearningRate 0.0000   Epoch: 19   Global Step: 112290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:28,159-Speed 5616.54 samples/sec   Loss 0.8722   LearningRate 0.0000   Epoch: 19   Global Step: 112300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:30,000-Speed 5562.92 samples/sec   Loss 0.9242   LearningRate 0.0000   Epoch: 19   Global Step: 112310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:31,841-Speed 5564.63 samples/sec   Loss 0.9547   LearningRate 0.0000   Epoch: 19   Global Step: 112320   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:32:33,682-Speed 5563.08 samples/sec   Loss 0.8941   LearningRate 0.0000   Epoch: 19   Global Step: 112330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:35,516-Speed 5584.96 samples/sec   Loss 0.9459   LearningRate 0.0000   Epoch: 19   Global Step: 112340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:37,348-Speed 5592.30 samples/sec   Loss 0.9163   LearningRate 0.0000   Epoch: 19   Global Step: 112350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:39,162-Speed 5646.50 samples/sec   Loss 0.9591   LearningRate 0.0000   Epoch: 19   Global Step: 112360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:41,009-Speed 5544.66 samples/sec   Loss 0.8865   LearningRate 0.0000   Epoch: 19   Global Step: 112370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:42,843-Speed 5587.39 samples/sec   Loss 0.9209   LearningRate 0.0000   Epoch: 19   Global Step: 112380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:44,688-Speed 5550.72 samples/sec   Loss 0.8776   LearningRate 0.0000   Epoch: 19   Global Step: 112390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:46,507-Speed 5632.22 samples/sec   Loss 0.8926   LearningRate 0.0000   Epoch: 19   Global Step: 112400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:48,334-Speed 5606.60 samples/sec   Loss 0.8581   LearningRate 0.0000   Epoch: 19   Global Step: 112410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:50,163-Speed 5602.53 samples/sec   Loss 0.9561   LearningRate 0.0000   Epoch: 19   Global Step: 112420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:51,984-Speed 5623.62 samples/sec   Loss 0.8445   LearningRate 0.0000   Epoch: 19   Global Step: 112430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:53,820-Speed 5579.24 samples/sec   Loss 0.8983   LearningRate 0.0000   Epoch: 19   Global Step: 112440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:55,653-Speed 5587.93 samples/sec   Loss 0.9231   LearningRate 0.0000   Epoch: 19   Global Step: 112450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:57,493-Speed 5567.84 samples/sec   Loss 0.8963   LearningRate 0.0000   Epoch: 19   Global Step: 112460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:32:59,357-Speed 5496.13 samples/sec   Loss 0.9008   LearningRate 0.0000   Epoch: 19   Global Step: 112470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:01,210-Speed 5528.84 samples/sec   Loss 0.9411   LearningRate 0.0000   Epoch: 19   Global Step: 112480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:03,038-Speed 5601.00 samples/sec   Loss 0.9205   LearningRate 0.0000   Epoch: 19   Global Step: 112490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:04,866-Speed 5603.40 samples/sec   Loss 0.8858   LearningRate 0.0000   Epoch: 19   Global Step: 112500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:33:06,695-Speed 5599.92 samples/sec   Loss 0.8697   LearningRate 0.0000   Epoch: 19   Global Step: 112510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:33:08,513-Speed 5637.62 samples/sec   Loss 0.9286   LearningRate 0.0000   Epoch: 19   Global Step: 112520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:33:10,337-Speed 5614.07 samples/sec   Loss 0.9149   LearningRate 0.0000   Epoch: 19   Global Step: 112530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:33:12,155-Speed 5635.82 samples/sec   Loss 0.9604   LearningRate 0.0000   Epoch: 19   Global Step: 112540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:33:13,994-Speed 5568.82 samples/sec   Loss 0.9078   LearningRate 0.0000   Epoch: 19   Global Step: 112550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:33:15,813-Speed 5633.83 samples/sec   Loss 0.9345   LearningRate 0.0000   Epoch: 19   Global Step: 112560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:33:17,629-Speed 5640.50 samples/sec   Loss 0.9100   LearningRate 0.0000   Epoch: 19   Global Step: 112570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:33:19,481-Speed 5531.14 samples/sec   Loss 0.8602   LearningRate 0.0000   Epoch: 19   Global Step: 112580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:33:21,306-Speed 5611.52 samples/sec   Loss 0.9131   LearningRate 0.0000   Epoch: 19   Global Step: 112590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 08:33:23,132-Speed 5610.28 samples/sec   Loss 0.9047   LearningRate 0.0000   Epoch: 19   Global Step: 112600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:24,985-Speed 5525.94 samples/sec   Loss 0.9241   LearningRate 0.0000   Epoch: 19   Global Step: 112610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:26,813-Speed 5605.76 samples/sec   Loss 0.8346   LearningRate 0.0000   Epoch: 19   Global Step: 112620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:28,630-Speed 5637.32 samples/sec   Loss 0.8468   LearningRate 0.0000   Epoch: 19   Global Step: 112630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:30,457-Speed 5608.16 samples/sec   Loss 0.9342   LearningRate 0.0000   Epoch: 19   Global Step: 112640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:32,289-Speed 5591.14 samples/sec   Loss 0.9711   LearningRate 0.0000   Epoch: 19   Global Step: 112650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:34,130-Speed 5563.23 samples/sec   Loss 0.9092   LearningRate 0.0000   Epoch: 19   Global Step: 112660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:35,946-Speed 5640.45 samples/sec   Loss 0.8250   LearningRate 0.0000   Epoch: 19   Global Step: 112670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:37,773-Speed 5605.42 samples/sec   Loss 0.9400   LearningRate 0.0000   Epoch: 19   Global Step: 112680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:39,620-Speed 5546.80 samples/sec   Loss 0.8785   LearningRate 0.0000   Epoch: 19   Global Step: 112690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:41,442-Speed 5622.26 samples/sec   Loss 0.9702   LearningRate 0.0000   Epoch: 19   Global Step: 112700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:43,263-Speed 5625.05 samples/sec   Loss 0.8297   LearningRate 0.0000   Epoch: 19   Global Step: 112710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:45,082-Speed 5632.25 samples/sec   Loss 0.9699   LearningRate 0.0000   Epoch: 19   Global Step: 112720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:46,902-Speed 5626.73 samples/sec   Loss 0.9377   LearningRate 0.0000   Epoch: 19   Global Step: 112730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:48,725-Speed 5621.16 samples/sec   Loss 0.8867   LearningRate 0.0000   Epoch: 19   Global Step: 112740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:50,542-Speed 5634.76 samples/sec   Loss 0.8425   LearningRate 0.0000   Epoch: 19   Global Step: 112750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:52,370-Speed 5603.81 samples/sec   Loss 0.8879   LearningRate 0.0000   Epoch: 19   Global Step: 112760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:54,199-Speed 5603.85 samples/sec   Loss 0.9413   LearningRate 0.0000   Epoch: 19   Global Step: 112770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:56,022-Speed 5617.92 samples/sec   Loss 0.8532   LearningRate 0.0000   Epoch: 19   Global Step: 112780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:57,841-Speed 5632.22 samples/sec   Loss 0.8907   LearningRate 0.0000   Epoch: 19   Global Step: 112790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:33:59,663-Speed 5621.81 samples/sec   Loss 0.8543   LearningRate 0.0000   Epoch: 19   Global Step: 112800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:01,482-Speed 5631.85 samples/sec   Loss 0.8526   LearningRate 0.0000   Epoch: 19   Global Step: 112810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:03,313-Speed 5592.08 samples/sec   Loss 0.9205   LearningRate 0.0000   Epoch: 19   Global Step: 112820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:05,148-Speed 5582.25 samples/sec   Loss 0.9352   LearningRate 0.0000   Epoch: 19   Global Step: 112830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:06,999-Speed 5533.82 samples/sec   Loss 0.9191   LearningRate 0.0000   Epoch: 19   Global Step: 112840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:08,877-Speed 5454.08 samples/sec   Loss 0.9139   LearningRate 0.0000   Epoch: 19   Global Step: 112850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:10,699-Speed 5624.27 samples/sec   Loss 0.9960   LearningRate 0.0000   Epoch: 19   Global Step: 112860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:12,560-Speed 5504.04 samples/sec   Loss 0.9525   LearningRate 0.0000   Epoch: 19   Global Step: 112870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:14,400-Speed 5567.46 samples/sec   Loss 0.9554   LearningRate 0.0000   Epoch: 19   Global Step: 112880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:16,254-Speed 5524.20 samples/sec   Loss 0.8784   LearningRate 0.0000   Epoch: 19   Global Step: 112890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:18,092-Speed 5575.73 samples/sec   Loss 0.8722   LearningRate 0.0000   Epoch: 19   Global Step: 112900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:19,915-Speed 5616.02 samples/sec   Loss 0.9104   LearningRate 0.0000   Epoch: 19   Global Step: 112910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:21,780-Speed 5493.77 samples/sec   Loss 0.8279   LearningRate 0.0000   Epoch: 19   Global Step: 112920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:23,613-Speed 5587.34 samples/sec   Loss 0.9151   LearningRate 0.0000   Epoch: 19   Global Step: 112930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:25,447-Speed 5585.73 samples/sec   Loss 0.9983   LearningRate 0.0000   Epoch: 19   Global Step: 112940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:27,283-Speed 5578.87 samples/sec   Loss 0.9006   LearningRate 0.0000   Epoch: 19   Global Step: 112950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:29,100-Speed 5637.49 samples/sec   Loss 0.8745   LearningRate 0.0000   Epoch: 19   Global Step: 112960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:30,942-Speed 5561.98 samples/sec   Loss 0.8909   LearningRate 0.0000   Epoch: 19   Global Step: 112970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:32,780-Speed 5574.46 samples/sec   Loss 0.8836   LearningRate 0.0000   Epoch: 19   Global Step: 112980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:34,622-Speed 5559.99 samples/sec   Loss 0.8782   LearningRate 0.0000   Epoch: 19   Global Step: 112990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:36,463-Speed 5563.31 samples/sec   Loss 0.9376   LearningRate 0.0000   Epoch: 19   Global Step: 113000   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:34:38,296-Speed 5589.78 samples/sec   Loss 0.9337   LearningRate 0.0000   Epoch: 19   Global Step: 113010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:40,123-Speed 5607.36 samples/sec   Loss 0.8973   LearningRate 0.0000   Epoch: 19   Global Step: 113020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:41,940-Speed 5635.78 samples/sec   Loss 0.9430   LearningRate 0.0000   Epoch: 19   Global Step: 113030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:43,758-Speed 5635.54 samples/sec   Loss 0.8619   LearningRate 0.0000   Epoch: 19   Global Step: 113040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:45,596-Speed 5572.80 samples/sec   Loss 0.9007   LearningRate 0.0000   Epoch: 19   Global Step: 113050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:47,425-Speed 5601.18 samples/sec   Loss 0.8765   LearningRate 0.0000   Epoch: 19   Global Step: 113060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:49,266-Speed 5563.20 samples/sec   Loss 0.9296   LearningRate 0.0000   Epoch: 19   Global Step: 113070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:51,106-Speed 5565.69 samples/sec   Loss 0.9234   LearningRate 0.0000   Epoch: 19   Global Step: 113080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:52,934-Speed 5605.66 samples/sec   Loss 0.8872   LearningRate 0.0000   Epoch: 19   Global Step: 113090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:54,781-Speed 5546.40 samples/sec   Loss 0.8565   LearningRate 0.0000   Epoch: 19   Global Step: 113100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:56,595-Speed 5644.90 samples/sec   Loss 0.8864   LearningRate 0.0000   Epoch: 19   Global Step: 113110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:34:58,426-Speed 5595.35 samples/sec   Loss 0.8888   LearningRate 0.0000   Epoch: 19   Global Step: 113120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:00,256-Speed 5597.70 samples/sec   Loss 0.9345   LearningRate 0.0000   Epoch: 19   Global Step: 113130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:02,081-Speed 5613.81 samples/sec   Loss 0.9091   LearningRate 0.0000   Epoch: 19   Global Step: 113140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:03,917-Speed 5577.68 samples/sec   Loss 0.8902   LearningRate 0.0000   Epoch: 19   Global Step: 113150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:05,757-Speed 5568.95 samples/sec   Loss 0.8491   LearningRate 0.0000   Epoch: 19   Global Step: 113160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:07,601-Speed 5554.28 samples/sec   Loss 0.8963   LearningRate 0.0000   Epoch: 19   Global Step: 113170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:09,434-Speed 5586.23 samples/sec   Loss 0.8957   LearningRate 0.0000   Epoch: 19   Global Step: 113180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:11,272-Speed 5573.01 samples/sec   Loss 0.8788   LearningRate 0.0000   Epoch: 19   Global Step: 113190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:13,100-Speed 5604.13 samples/sec   Loss 0.8342   LearningRate 0.0000   Epoch: 19   Global Step: 113200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:14,928-Speed 5605.84 samples/sec   Loss 0.9299   LearningRate 0.0000   Epoch: 19   Global Step: 113210   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:35:16,757-Speed 5599.95 samples/sec   Loss 0.8394   LearningRate 0.0000   Epoch: 19   Global Step: 113220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:18,577-Speed 5626.01 samples/sec   Loss 0.9374   LearningRate 0.0000   Epoch: 19   Global Step: 113230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:20,423-Speed 5549.75 samples/sec   Loss 0.9211   LearningRate 0.0000   Epoch: 19   Global Step: 113240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:22,275-Speed 5529.79 samples/sec   Loss 0.9662   LearningRate 0.0000   Epoch: 19   Global Step: 113250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:24,103-Speed 5606.57 samples/sec   Loss 0.9374   LearningRate 0.0000   Epoch: 19   Global Step: 113260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:25,946-Speed 5557.39 samples/sec   Loss 0.8569   LearningRate 0.0000   Epoch: 19   Global Step: 113270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:27,779-Speed 5589.87 samples/sec   Loss 0.9242   LearningRate 0.0000   Epoch: 19   Global Step: 113280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:29,615-Speed 5578.75 samples/sec   Loss 0.9543   LearningRate 0.0000   Epoch: 19   Global Step: 113290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:31,457-Speed 5560.38 samples/sec   Loss 0.8630   LearningRate 0.0000   Epoch: 19   Global Step: 113300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:33,298-Speed 5563.95 samples/sec   Loss 0.9224   LearningRate 0.0000   Epoch: 19   Global Step: 113310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:35,159-Speed 5505.35 samples/sec   Loss 0.8232   LearningRate 0.0000   Epoch: 19   Global Step: 113320   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 08:35:36,985-Speed 5608.41 samples/sec   Loss 0.9654   LearningRate 0.0000   Epoch: 19   Global Step: 113330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:38,817-Speed 5590.47 samples/sec   Loss 0.9052   LearningRate 0.0000   Epoch: 19   Global Step: 113340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:40,644-Speed 5607.38 samples/sec   Loss 0.9508   LearningRate 0.0000   Epoch: 19   Global Step: 113350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:42,472-Speed 5603.08 samples/sec   Loss 0.9453   LearningRate 0.0000   Epoch: 19   Global Step: 113360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:44,322-Speed 5539.36 samples/sec   Loss 0.8852   LearningRate 0.0000   Epoch: 19   Global Step: 113370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:46,154-Speed 5592.03 samples/sec   Loss 0.8481   LearningRate 0.0000   Epoch: 19   Global Step: 113380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:47,970-Speed 5641.02 samples/sec   Loss 0.9657   LearningRate 0.0000   Epoch: 19   Global Step: 113390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:49,818-Speed 5542.47 samples/sec   Loss 0.9751   LearningRate 0.0000   Epoch: 19   Global Step: 113400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:51,656-Speed 5570.80 samples/sec   Loss 0.9252   LearningRate 0.0000   Epoch: 19   Global Step: 113410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:53,522-Speed 5492.12 samples/sec   Loss 0.8894   LearningRate 0.0000   Epoch: 19   Global Step: 113420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:55,335-Speed 5648.27 samples/sec   Loss 0.9116   LearningRate 0.0000   Epoch: 19   Global Step: 113430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:57,170-Speed 5584.13 samples/sec   Loss 0.8354   LearningRate 0.0000   Epoch: 19   Global Step: 113440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:35:58,989-Speed 5629.87 samples/sec   Loss 0.9234   LearningRate 0.0000   Epoch: 19   Global Step: 113450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:00,841-Speed 5530.87 samples/sec   Loss 0.8781   LearningRate 0.0000   Epoch: 19   Global Step: 113460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:02,687-Speed 5548.18 samples/sec   Loss 0.9827   LearningRate 0.0000   Epoch: 19   Global Step: 113470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:04,521-Speed 5587.29 samples/sec   Loss 0.9157   LearningRate 0.0000   Epoch: 19   Global Step: 113480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:06,340-Speed 5630.56 samples/sec   Loss 0.8460   LearningRate 0.0000   Epoch: 19   Global Step: 113490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:08,160-Speed 5629.58 samples/sec   Loss 0.9083   LearningRate 0.0000   Epoch: 19   Global Step: 113500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:10,003-Speed 5558.32 samples/sec   Loss 0.8861   LearningRate 0.0000   Epoch: 19   Global Step: 113510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:11,824-Speed 5623.51 samples/sec   Loss 0.9146   LearningRate 0.0000   Epoch: 19   Global Step: 113520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:13,660-Speed 5578.85 samples/sec   Loss 0.8679   LearningRate 0.0000   Epoch: 19   Global Step: 113530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:15,482-Speed 5622.24 samples/sec   Loss 0.9071   LearningRate 0.0000   Epoch: 19   Global Step: 113540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:17,317-Speed 5583.14 samples/sec   Loss 0.8988   LearningRate 0.0000   Epoch: 19   Global Step: 113550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:19,164-Speed 5545.37 samples/sec   Loss 0.8795   LearningRate 0.0000   Epoch: 19   Global Step: 113560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:21,018-Speed 5526.02 samples/sec   Loss 0.8677   LearningRate 0.0000   Epoch: 19   Global Step: 113570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:22,850-Speed 5592.07 samples/sec   Loss 0.9298   LearningRate 0.0000   Epoch: 19   Global Step: 113580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:24,684-Speed 5585.04 samples/sec   Loss 0.9025   LearningRate 0.0000   Epoch: 19   Global Step: 113590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:26,512-Speed 5602.21 samples/sec   Loss 0.8337   LearningRate 0.0000   Epoch: 19   Global Step: 113600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:28,349-Speed 5574.91 samples/sec   Loss 0.9337   LearningRate 0.0000   Epoch: 19   Global Step: 113610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:30,169-Speed 5630.80 samples/sec   Loss 0.9140   LearningRate 0.0000   Epoch: 19   Global Step: 113620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:31,989-Speed 5629.82 samples/sec   Loss 0.8580   LearningRate 0.0000   Epoch: 19   Global Step: 113630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:33,823-Speed 5585.44 samples/sec   Loss 0.8593   LearningRate 0.0000   Epoch: 19   Global Step: 113640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:35,665-Speed 5559.58 samples/sec   Loss 0.9264   LearningRate 0.0000   Epoch: 19   Global Step: 113650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:37,513-Speed 5541.37 samples/sec   Loss 0.9049   LearningRate 0.0000   Epoch: 19   Global Step: 113660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:39,344-Speed 5596.70 samples/sec   Loss 0.9431   LearningRate 0.0000   Epoch: 19   Global Step: 113670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:41,178-Speed 5582.49 samples/sec   Loss 0.8512   LearningRate 0.0000   Epoch: 19   Global Step: 113680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:43,024-Speed 5549.93 samples/sec   Loss 0.8791   LearningRate 0.0000   Epoch: 19   Global Step: 113690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:44,857-Speed 5588.24 samples/sec   Loss 0.9119   LearningRate 0.0000   Epoch: 19   Global Step: 113700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:46,765-Speed 5368.49 samples/sec   Loss 0.8956   LearningRate 0.0000   Epoch: 19   Global Step: 113710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 08:36:48,596-Speed 5596.34 samples/sec   Loss 0.8276   LearningRate 0.0000   Epoch: 19   Global Step: 113720   Fp16 Grad Scale: 65536   Required: -0 hours