Training: 2022-04-26 12:13:48,919-rank_id: 0
Training: 2022-04-26 12:14:02,806-: margin_list              [1.0, 0.0, 0.4]
Training: 2022-04-26 12:14:02,808-: network                  r100
Training: 2022-04-26 12:14:02,808-: resume                   False
Training: 2022-04-26 12:14:02,808-: output                   work_dirs/wf4m_r100
Training: 2022-04-26 12:14:02,808-: embedding_size           512
Training: 2022-04-26 12:14:02,808-: sample_rate              1.0
Training: 2022-04-26 12:14:02,808-: interclass_filtering_threshold 0
Training: 2022-04-26 12:14:02,808-: fp16                     True
Training: 2022-04-26 12:14:02,808-: batch_size               128
Training: 2022-04-26 12:14:02,808-: optimizer                sgd
Training: 2022-04-26 12:14:02,808-: lr                       0.1
Training: 2022-04-26 12:14:02,808-: momentum                 0.9
Training: 2022-04-26 12:14:02,808-: weight_decay             0.0005
Training: 2022-04-26 12:14:02,809-: verbose                  2000
Training: 2022-04-26 12:14:02,809-: frequent                 10
Training: 2022-04-26 12:14:02,809-: dali                     False
Training: 2022-04-26 12:14:02,809-: rec                      /train_tmp/WebFace4M
Training: 2022-04-26 12:14:02,809-: num_classes              205990
Training: 2022-04-26 12:14:02,809-: num_image                4235242
Training: 2022-04-26 12:14:02,809-: num_epoch                20
Training: 2022-04-26 12:14:02,809-: warmup_epoch             0
Training: 2022-04-26 12:14:02,809-: val_targets              ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-26 12:14:02,809-: total_batch_size         1024
Training: 2022-04-26 12:14:02,809-: warmup_step              0
Training: 2022-04-26 12:14:02,810-: total_step               82700
Training: 2022-04-26 12:15:09,750-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-26 12:15:15,276-Speed 3366.51 samples/sec   Loss 42.2827   LearningRate 0.1000   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 8192   Required: 17 hours
Training: 2022-04-26 12:15:18,345-Speed 3337.14 samples/sec   Loss 43.4502   LearningRate 0.0999   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-26 12:15:21,405-Speed 3347.78 samples/sec   Loss 43.2877   LearningRate 0.0999   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-26 12:15:24,410-Speed 3408.39 samples/sec   Loss 42.7765   LearningRate 0.0999   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-26 12:15:27,443-Speed 3377.75 samples/sec   Loss 42.8709   LearningRate 0.0999   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-26 12:15:30,469-Speed 3384.78 samples/sec   Loss 42.8863   LearningRate 0.0998   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-26 12:15:33,469-Speed 3413.84 samples/sec   Loss 42.7049   LearningRate 0.0998   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-26 12:15:36,458-Speed 3426.85 samples/sec   Loss 42.8331   LearningRate 0.0998   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-26 12:15:39,448-Speed 3425.80 samples/sec   Loss 42.6035   LearningRate 0.0998   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-26 12:15:42,443-Speed 3420.05 samples/sec   Loss 42.4425   LearningRate 0.0997   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-26 12:15:45,437-Speed 3420.09 samples/sec   Loss 42.2829   LearningRate 0.0997   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-26 12:15:48,455-Speed 3394.66 samples/sec   Loss 42.3140   LearningRate 0.0997   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-26 12:15:51,487-Speed 3377.36 samples/sec   Loss 42.1229   LearningRate 0.0997   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-26 12:15:54,538-Speed 3357.15 samples/sec   Loss 42.0627   LearningRate 0.0996   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-26 12:15:57,551-Speed 3399.96 samples/sec   Loss 42.0204   LearningRate 0.0996   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-26 12:16:00,550-Speed 3415.02 samples/sec   Loss 41.7406   LearningRate 0.0996   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-26 12:16:03,548-Speed 3416.63 samples/sec   Loss 41.7301   LearningRate 0.0996   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-26 12:16:06,546-Speed 3416.59 samples/sec   Loss 41.4841   LearningRate 0.0995   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-26 12:16:09,550-Speed 3409.41 samples/sec   Loss 41.4125   LearningRate 0.0995   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-26 12:16:12,553-Speed 3410.07 samples/sec   Loss 41.4388   LearningRate 0.0995   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:16:15,559-Speed 3407.80 samples/sec   Loss 41.2686   LearningRate 0.0995   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:16:18,561-Speed 3412.14 samples/sec   Loss 41.0784   LearningRate 0.0994   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:16:21,563-Speed 3410.88 samples/sec   Loss 41.0651   LearningRate 0.0994   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:16:24,573-Speed 3404.00 samples/sec   Loss 40.9296   LearningRate 0.0994   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:16:27,577-Speed 3409.08 samples/sec   Loss 40.8982   LearningRate 0.0994   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:16:30,583-Speed 3407.55 samples/sec   Loss 40.7956   LearningRate 0.0993   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:16:33,583-Speed 3413.69 samples/sec   Loss 40.5724   LearningRate 0.0993   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:16:36,615-Speed 3378.79 samples/sec   Loss 40.4337   LearningRate 0.0993   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:16:39,621-Speed 3406.47 samples/sec   Loss 40.3194   LearningRate 0.0993   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:16:42,650-Speed 3381.19 samples/sec   Loss 40.2631   LearningRate 0.0993   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:16:45,716-Speed 3341.24 samples/sec   Loss 40.0456   LearningRate 0.0992   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:16:48,744-Speed 3382.81 samples/sec   Loss 40.0414   LearningRate 0.0992   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:16:51,769-Speed 3384.78 samples/sec   Loss 39.9460   LearningRate 0.0992   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:16:54,784-Speed 3398.02 samples/sec   Loss 39.8949   LearningRate 0.0992   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:16:57,791-Speed 3406.77 samples/sec   Loss 39.6884   LearningRate 0.0991   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:17:00,798-Speed 3406.99 samples/sec   Loss 39.6429   LearningRate 0.0991   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:17:03,808-Speed 3402.20 samples/sec   Loss 39.5055   LearningRate 0.0991   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:17:06,815-Speed 3406.43 samples/sec   Loss 39.4734   LearningRate 0.0991   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:17:09,824-Speed 3403.53 samples/sec   Loss 39.2475   LearningRate 0.0990   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:17:12,830-Speed 3407.44 samples/sec   Loss 39.1171   LearningRate 0.0990   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:17:15,835-Speed 3407.72 samples/sec   Loss 38.9276   LearningRate 0.0990   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:17:18,845-Speed 3404.00 samples/sec   Loss 38.9228   LearningRate 0.0990   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:17:21,854-Speed 3403.77 samples/sec   Loss 38.7256   LearningRate 0.0989   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:17:24,867-Speed 3399.37 samples/sec   Loss 38.6432   LearningRate 0.0989   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:17:27,888-Speed 3390.39 samples/sec   Loss 38.5437   LearningRate 0.0989   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:17:30,898-Speed 3402.91 samples/sec   Loss 38.5147   LearningRate 0.0989   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:17:33,945-Speed 3361.26 samples/sec   Loss 38.4066   LearningRate 0.0988   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:17:36,953-Speed 3405.29 samples/sec   Loss 38.2313   LearningRate 0.0988   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:17:39,959-Speed 3407.97 samples/sec   Loss 38.1755   LearningRate 0.0988   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:17:42,987-Speed 3382.57 samples/sec   Loss 37.9321   LearningRate 0.0988   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:17:46,047-Speed 3347.02 samples/sec   Loss 37.8409   LearningRate 0.0987   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:17:49,053-Speed 3407.03 samples/sec   Loss 37.8308   LearningRate 0.0987   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:17:52,063-Speed 3403.44 samples/sec   Loss 37.6902   LearningRate 0.0987   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:17:55,078-Speed 3397.20 samples/sec   Loss 37.5224   LearningRate 0.0987   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:17:58,091-Speed 3398.98 samples/sec   Loss 37.5446   LearningRate 0.0987   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:18:01,095-Speed 3409.45 samples/sec   Loss 37.4427   LearningRate 0.0986   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:18:04,105-Speed 3402.93 samples/sec   Loss 37.2560   LearningRate 0.0986   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:18:07,109-Speed 3409.70 samples/sec   Loss 37.1291   LearningRate 0.0986   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:18:10,113-Speed 3410.18 samples/sec   Loss 36.9833   LearningRate 0.0986   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:18:13,113-Speed 3413.64 samples/sec   Loss 36.8935   LearningRate 0.0985   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:18:16,117-Speed 3410.05 samples/sec   Loss 36.9686   LearningRate 0.0985   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:18:19,123-Speed 3406.76 samples/sec   Loss 36.7577   LearningRate 0.0985   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:18:22,136-Speed 3400.58 samples/sec   Loss 36.6989   LearningRate 0.0985   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:18:25,151-Speed 3396.35 samples/sec   Loss 36.4549   LearningRate 0.0984   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:18:28,156-Speed 3409.09 samples/sec   Loss 36.5036   LearningRate 0.0984   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:18:31,162-Speed 3406.26 samples/sec   Loss 36.3930   LearningRate 0.0984   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:18:34,165-Speed 3410.79 samples/sec   Loss 36.3177   LearningRate 0.0984   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:18:37,169-Speed 3409.82 samples/sec   Loss 36.0101   LearningRate 0.0983   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:18:40,174-Speed 3408.10 samples/sec   Loss 36.1398   LearningRate 0.0983   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:18:43,191-Speed 3394.47 samples/sec   Loss 35.9398   LearningRate 0.0983   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:18:46,202-Speed 3402.03 samples/sec   Loss 35.7569   LearningRate 0.0983   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:18:49,280-Speed 3327.87 samples/sec   Loss 35.6451   LearningRate 0.0982   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:18:52,339-Speed 3349.22 samples/sec   Loss 35.5468   LearningRate 0.0982   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:18:55,344-Speed 3408.23 samples/sec   Loss 35.4674   LearningRate 0.0982   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:18:58,381-Speed 3371.70 samples/sec   Loss 35.4605   LearningRate 0.0982   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:01,395-Speed 3397.99 samples/sec   Loss 35.4914   LearningRate 0.0981   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:04,407-Speed 3400.68 samples/sec   Loss 35.2344   LearningRate 0.0981   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:07,421-Speed 3398.69 samples/sec   Loss 35.2761   LearningRate 0.0981   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:10,428-Speed 3406.45 samples/sec   Loss 35.0778   LearningRate 0.0981   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:13,436-Speed 3404.28 samples/sec   Loss 35.0176   LearningRate 0.0981   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:16,451-Speed 3397.94 samples/sec   Loss 34.8997   LearningRate 0.0980   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:19,459-Speed 3405.22 samples/sec   Loss 34.6697   LearningRate 0.0980   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:22,466-Speed 3405.55 samples/sec   Loss 34.6715   LearningRate 0.0980   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:25,479-Speed 3399.88 samples/sec   Loss 34.4812   LearningRate 0.0980   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:28,491-Speed 3400.36 samples/sec   Loss 34.4792   LearningRate 0.0979   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:19:31,504-Speed 3399.37 samples/sec   Loss 34.4376   LearningRate 0.0979   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:19:34,521-Speed 3395.86 samples/sec   Loss 34.3876   LearningRate 0.0979   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:37,529-Speed 3404.64 samples/sec   Loss 34.1665   LearningRate 0.0979   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:40,547-Speed 3393.88 samples/sec   Loss 34.1142   LearningRate 0.0978   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:43,556-Speed 3403.89 samples/sec   Loss 34.0121   LearningRate 0.0978   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:46,565-Speed 3404.64 samples/sec   Loss 33.9119   LearningRate 0.0978   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:49,578-Speed 3398.86 samples/sec   Loss 33.6631   LearningRate 0.0978   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:52,607-Speed 3381.49 samples/sec   Loss 33.6736   LearningRate 0.0977   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:55,620-Speed 3399.71 samples/sec   Loss 33.5809   LearningRate 0.0977   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:19:58,632-Speed 3399.87 samples/sec   Loss 33.5833   LearningRate 0.0977   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:01,641-Speed 3404.56 samples/sec   Loss 33.3796   LearningRate 0.0977   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:04,635-Speed 3420.41 samples/sec   Loss 33.1931   LearningRate 0.0976   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:07,646-Speed 3402.14 samples/sec   Loss 33.1058   LearningRate 0.0976   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:10,666-Speed 3391.08 samples/sec   Loss 33.0799   LearningRate 0.0976   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:13,676-Speed 3403.12 samples/sec   Loss 33.0494   LearningRate 0.0976   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:16,685-Speed 3404.46 samples/sec   Loss 32.8746   LearningRate 0.0975   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:19,696-Speed 3401.80 samples/sec   Loss 32.7224   LearningRate 0.0975   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:22,708-Speed 3400.68 samples/sec   Loss 32.6824   LearningRate 0.0975   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:25,718-Speed 3402.56 samples/sec   Loss 32.6408   LearningRate 0.0975   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:28,727-Speed 3403.06 samples/sec   Loss 32.6445   LearningRate 0.0975   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:31,751-Speed 3387.51 samples/sec   Loss 32.2915   LearningRate 0.0974   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:34,755-Speed 3409.69 samples/sec   Loss 32.2545   LearningRate 0.0974   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:37,779-Speed 3386.38 samples/sec   Loss 32.1185   LearningRate 0.0974   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:40,796-Speed 3395.93 samples/sec   Loss 32.2366   LearningRate 0.0974   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:43,807-Speed 3400.68 samples/sec   Loss 31.9739   LearningRate 0.0973   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:46,825-Speed 3394.12 samples/sec   Loss 31.9725   LearningRate 0.0973   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:49,869-Speed 3365.17 samples/sec   Loss 31.7781   LearningRate 0.0973   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:52,902-Speed 3376.56 samples/sec   Loss 31.6935   LearningRate 0.0973   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:55,912-Speed 3403.48 samples/sec   Loss 31.6471   LearningRate 0.0972   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:20:58,926-Speed 3397.97 samples/sec   Loss 31.6190   LearningRate 0.0972   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:01,959-Speed 3376.27 samples/sec   Loss 31.3908   LearningRate 0.0972   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:04,994-Speed 3375.53 samples/sec   Loss 31.3512   LearningRate 0.0972   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:08,037-Speed 3366.41 samples/sec   Loss 31.3354   LearningRate 0.0971   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:11,049-Speed 3399.61 samples/sec   Loss 31.0653   LearningRate 0.0971   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:14,063-Speed 3398.36 samples/sec   Loss 30.9226   LearningRate 0.0971   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:17,077-Speed 3398.64 samples/sec   Loss 31.0489   LearningRate 0.0971   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:20,088-Speed 3401.29 samples/sec   Loss 30.8484   LearningRate 0.0970   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:23,114-Speed 3384.79 samples/sec   Loss 30.6441   LearningRate 0.0970   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:26,147-Speed 3377.13 samples/sec   Loss 30.6786   LearningRate 0.0970   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:29,175-Speed 3382.70 samples/sec   Loss 30.5269   LearningRate 0.0970   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:32,189-Speed 3398.10 samples/sec   Loss 30.4855   LearningRate 0.0970   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:35,201-Speed 3400.33 samples/sec   Loss 30.3768   LearningRate 0.0969   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 12:21:38,208-Speed 3406.79 samples/sec   Loss 30.4089   LearningRate 0.0969   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:41,234-Speed 3384.46 samples/sec   Loss 30.2671   LearningRate 0.0969   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:44,260-Speed 3384.85 samples/sec   Loss 30.1830   LearningRate 0.0969   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:47,296-Speed 3374.13 samples/sec   Loss 30.0835   LearningRate 0.0968   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:50,316-Speed 3391.24 samples/sec   Loss 30.0697   LearningRate 0.0968   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:53,405-Speed 3316.46 samples/sec   Loss 29.9324   LearningRate 0.0968   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:56,418-Speed 3399.31 samples/sec   Loss 29.6903   LearningRate 0.0968   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:21:59,433-Speed 3397.90 samples/sec   Loss 29.6768   LearningRate 0.0967   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:22:02,453-Speed 3391.25 samples/sec   Loss 29.5743   LearningRate 0.0967   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:22:05,492-Speed 3369.43 samples/sec   Loss 29.3147   LearningRate 0.0967   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:22:08,498-Speed 3407.71 samples/sec   Loss 29.5036   LearningRate 0.0967   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:22:11,514-Speed 3396.50 samples/sec   Loss 29.2657   LearningRate 0.0966   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:22:14,519-Speed 3408.31 samples/sec   Loss 29.1744   LearningRate 0.0966   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:22:17,534-Speed 3397.68 samples/sec   Loss 29.1135   LearningRate 0.0966   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:22:20,548-Speed 3398.04 samples/sec   Loss 28.9969   LearningRate 0.0966   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:22:23,566-Speed 3393.41 samples/sec   Loss 29.0000   LearningRate 0.0965   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:22:26,595-Speed 3381.53 samples/sec   Loss 28.7528   LearningRate 0.0965   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:22:29,610-Speed 3396.62 samples/sec   Loss 28.7747   LearningRate 0.0965   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:22:32,630-Speed 3392.10 samples/sec   Loss 28.6391   LearningRate 0.0965   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:22:35,659-Speed 3381.53 samples/sec   Loss 28.4322   LearningRate 0.0965   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:22:38,681-Speed 3389.52 samples/sec   Loss 28.4758   LearningRate 0.0964   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:22:41,721-Speed 3368.57 samples/sec   Loss 28.4643   LearningRate 0.0964   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:22:44,765-Speed 3365.56 samples/sec   Loss 28.2654   LearningRate 0.0964   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:22:47,818-Speed 3354.08 samples/sec   Loss 28.1631   LearningRate 0.0964   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:22:50,841-Speed 3388.70 samples/sec   Loss 28.0662   LearningRate 0.0963   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:22:53,868-Speed 3383.12 samples/sec   Loss 28.1282   LearningRate 0.0963   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:22:56,898-Speed 3381.16 samples/sec   Loss 27.8934   LearningRate 0.0963   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:22:59,917-Speed 3392.28 samples/sec   Loss 27.8067   LearningRate 0.0963   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:02,935-Speed 3394.58 samples/sec   Loss 27.7541   LearningRate 0.0962   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:05,963-Speed 3382.07 samples/sec   Loss 27.5696   LearningRate 0.0962   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:08,982-Speed 3392.97 samples/sec   Loss 27.4587   LearningRate 0.0962   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:12,000-Speed 3393.41 samples/sec   Loss 27.5781   LearningRate 0.0962   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:15,009-Speed 3403.96 samples/sec   Loss 27.4105   LearningRate 0.0961   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:18,038-Speed 3381.48 samples/sec   Loss 27.2848   LearningRate 0.0961   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:21,054-Speed 3395.53 samples/sec   Loss 27.1003   LearningRate 0.0961   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:24,077-Speed 3388.58 samples/sec   Loss 27.1489   LearningRate 0.0961   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:27,094-Speed 3394.57 samples/sec   Loss 27.0352   LearningRate 0.0960   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:30,110-Speed 3395.76 samples/sec   Loss 26.8297   LearningRate 0.0960   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:33,126-Speed 3397.12 samples/sec   Loss 26.7684   LearningRate 0.0960   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:36,142-Speed 3395.79 samples/sec   Loss 26.5770   LearningRate 0.0960   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:39,167-Speed 3385.59 samples/sec   Loss 26.6139   LearningRate 0.0960   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:42,186-Speed 3392.84 samples/sec   Loss 26.6082   LearningRate 0.0959   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:45,192-Speed 3406.75 samples/sec   Loss 26.6421   LearningRate 0.0959   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:48,218-Speed 3384.95 samples/sec   Loss 26.3377   LearningRate 0.0959   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:51,248-Speed 3380.33 samples/sec   Loss 26.3191   LearningRate 0.0959   Epoch: 0   Global Step: 1730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:54,267-Speed 3392.77 samples/sec   Loss 26.3810   LearningRate 0.0958   Epoch: 0   Global Step: 1740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:23:57,291-Speed 3387.34 samples/sec   Loss 26.2438   LearningRate 0.0958   Epoch: 0   Global Step: 1750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:24:00,317-Speed 3384.44 samples/sec   Loss 26.0685   LearningRate 0.0958   Epoch: 0   Global Step: 1760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:24:03,339-Speed 3389.22 samples/sec   Loss 26.1945   LearningRate 0.0958   Epoch: 0   Global Step: 1770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:24:06,363-Speed 3387.33 samples/sec   Loss 26.0149   LearningRate 0.0957   Epoch: 0   Global Step: 1780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:24:09,383-Speed 3391.33 samples/sec   Loss 25.9347   LearningRate 0.0957   Epoch: 0   Global Step: 1790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:24:12,400-Speed 3394.33 samples/sec   Loss 25.7501   LearningRate 0.0957   Epoch: 0   Global Step: 1800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:24:15,514-Speed 3289.18 samples/sec   Loss 25.8267   LearningRate 0.0957   Epoch: 0   Global Step: 1810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:24:18,544-Speed 3381.08 samples/sec   Loss 25.5801   LearningRate 0.0956   Epoch: 0   Global Step: 1820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:24:21,562-Speed 3393.76 samples/sec   Loss 25.5263   LearningRate 0.0956   Epoch: 0   Global Step: 1830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:24:24,586-Speed 3387.05 samples/sec   Loss 25.6699   LearningRate 0.0956   Epoch: 0   Global Step: 1840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:24:27,595-Speed 3403.85 samples/sec   Loss 25.3367   LearningRate 0.0956   Epoch: 0   Global Step: 1850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:24:30,615-Speed 3391.55 samples/sec   Loss 25.2756   LearningRate 0.0956   Epoch: 0   Global Step: 1860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:24:33,635-Speed 3392.31 samples/sec   Loss 25.3261   LearningRate 0.0955   Epoch: 0   Global Step: 1870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:24:36,673-Speed 3370.43 samples/sec   Loss 25.3227   LearningRate 0.0955   Epoch: 0   Global Step: 1880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:24:39,695-Speed 3389.95 samples/sec   Loss 25.2000   LearningRate 0.0955   Epoch: 0   Global Step: 1890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:24:42,713-Speed 3393.25 samples/sec   Loss 25.0936   LearningRate 0.0955   Epoch: 0   Global Step: 1900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:24:45,740-Speed 3384.13 samples/sec   Loss 25.0727   LearningRate 0.0954   Epoch: 0   Global Step: 1910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:24:48,778-Speed 3371.25 samples/sec   Loss 24.8530   LearningRate 0.0954   Epoch: 0   Global Step: 1920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:24:51,797-Speed 3391.63 samples/sec   Loss 24.8150   LearningRate 0.0954   Epoch: 0   Global Step: 1930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:24:54,819-Speed 3389.76 samples/sec   Loss 24.6635   LearningRate 0.0954   Epoch: 0   Global Step: 1940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:24:57,842-Speed 3388.18 samples/sec   Loss 24.7082   LearningRate 0.0953   Epoch: 0   Global Step: 1950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:25:00,869-Speed 3384.02 samples/sec   Loss 24.4736   LearningRate 0.0953   Epoch: 0   Global Step: 1960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:25:03,891-Speed 3389.28 samples/sec   Loss 24.6331   LearningRate 0.0953   Epoch: 0   Global Step: 1970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:25:06,915-Speed 3387.46 samples/sec   Loss 24.2703   LearningRate 0.0953   Epoch: 0   Global Step: 1980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:25:09,933-Speed 3393.63 samples/sec   Loss 24.3315   LearningRate 0.0952   Epoch: 0   Global Step: 1990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:25:12,958-Speed 3385.77 samples/sec   Loss 24.2481   LearningRate 0.0952   Epoch: 0   Global Step: 2000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:25:56,951-[lfw][2000]XNorm: 23.266422
Training: 2022-04-26 12:25:56,952-[lfw][2000]Accuracy-Flip: 0.97650+-0.00858
Training: 2022-04-26 12:25:56,952-[lfw][2000]Accuracy-Highest: 0.97650
Training: 2022-04-26 12:26:47,416-[cfp_fp][2000]XNorm: 20.272900
Training: 2022-04-26 12:26:47,417-[cfp_fp][2000]Accuracy-Flip: 0.83300+-0.01908
Training: 2022-04-26 12:26:47,417-[cfp_fp][2000]Accuracy-Highest: 0.83300
Training: 2022-04-26 12:27:31,351-[agedb_30][2000]XNorm: 22.526387
Training: 2022-04-26 12:27:31,352-[agedb_30][2000]Accuracy-Flip: 0.85000+-0.01765
Training: 2022-04-26 12:27:31,352-[agedb_30][2000]Accuracy-Highest: 0.85000
Training: 2022-04-26 12:27:34,375-Speed 72.41 samples/sec   Loss 24.2335   LearningRate 0.0952   Epoch: 0   Global Step: 2010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:27:37,397-Speed 3389.71 samples/sec   Loss 24.0198   LearningRate 0.0952   Epoch: 0   Global Step: 2020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:27:40,411-Speed 3398.30 samples/sec   Loss 24.0065   LearningRate 0.0952   Epoch: 0   Global Step: 2030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:27:43,427-Speed 3396.39 samples/sec   Loss 23.8562   LearningRate 0.0951   Epoch: 0   Global Step: 2040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:27:46,438-Speed 3401.08 samples/sec   Loss 23.9249   LearningRate 0.0951   Epoch: 0   Global Step: 2050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:27:49,458-Speed 3391.32 samples/sec   Loss 24.0500   LearningRate 0.0951   Epoch: 0   Global Step: 2060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:27:52,498-Speed 3369.55 samples/sec   Loss 23.6745   LearningRate 0.0951   Epoch: 0   Global Step: 2070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:27:55,526-Speed 3382.04 samples/sec   Loss 23.8468   LearningRate 0.0950   Epoch: 0   Global Step: 2080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:27:58,552-Speed 3385.03 samples/sec   Loss 23.5992   LearningRate 0.0950   Epoch: 0   Global Step: 2090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:01,586-Speed 3375.26 samples/sec   Loss 23.6484   LearningRate 0.0950   Epoch: 0   Global Step: 2100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:04,617-Speed 3379.92 samples/sec   Loss 23.2813   LearningRate 0.0950   Epoch: 0   Global Step: 2110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:07,647-Speed 3380.24 samples/sec   Loss 23.4191   LearningRate 0.0949   Epoch: 0   Global Step: 2120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:10,684-Speed 3372.46 samples/sec   Loss 23.2265   LearningRate 0.0949   Epoch: 0   Global Step: 2130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:13,715-Speed 3379.38 samples/sec   Loss 23.3235   LearningRate 0.0949   Epoch: 0   Global Step: 2140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:16,734-Speed 3392.42 samples/sec   Loss 23.1560   LearningRate 0.0949   Epoch: 0   Global Step: 2150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:19,769-Speed 3375.40 samples/sec   Loss 23.3017   LearningRate 0.0948   Epoch: 0   Global Step: 2160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:22,804-Speed 3374.40 samples/sec   Loss 23.0438   LearningRate 0.0948   Epoch: 0   Global Step: 2170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:25,841-Speed 3373.05 samples/sec   Loss 23.2211   LearningRate 0.0948   Epoch: 0   Global Step: 2180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:28,869-Speed 3382.44 samples/sec   Loss 23.0051   LearningRate 0.0948   Epoch: 0   Global Step: 2190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:31,945-Speed 3329.51 samples/sec   Loss 22.9173   LearningRate 0.0948   Epoch: 0   Global Step: 2200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:34,977-Speed 3377.39 samples/sec   Loss 22.9302   LearningRate 0.0947   Epoch: 0   Global Step: 2210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:38,004-Speed 3384.22 samples/sec   Loss 22.6696   LearningRate 0.0947   Epoch: 0   Global Step: 2220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:41,027-Speed 3387.85 samples/sec   Loss 22.5892   LearningRate 0.0947   Epoch: 0   Global Step: 2230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:44,048-Speed 3391.08 samples/sec   Loss 22.7418   LearningRate 0.0947   Epoch: 0   Global Step: 2240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:47,058-Speed 3402.97 samples/sec   Loss 22.5202   LearningRate 0.0946   Epoch: 0   Global Step: 2250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:50,099-Speed 3367.54 samples/sec   Loss 22.5432   LearningRate 0.0946   Epoch: 0   Global Step: 2260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:53,117-Speed 3393.33 samples/sec   Loss 22.4129   LearningRate 0.0946   Epoch: 0   Global Step: 2270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:56,132-Speed 3397.25 samples/sec   Loss 22.3963   LearningRate 0.0946   Epoch: 0   Global Step: 2280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:28:59,146-Speed 3399.10 samples/sec   Loss 22.1889   LearningRate 0.0945   Epoch: 0   Global Step: 2290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:29:02,163-Speed 3393.72 samples/sec   Loss 22.4095   LearningRate 0.0945   Epoch: 0   Global Step: 2300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:29:05,197-Speed 3375.95 samples/sec   Loss 22.1683   LearningRate 0.0945   Epoch: 0   Global Step: 2310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:29:08,223-Speed 3385.70 samples/sec   Loss 22.1599   LearningRate 0.0945   Epoch: 0   Global Step: 2320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:29:11,243-Speed 3391.59 samples/sec   Loss 22.2664   LearningRate 0.0944   Epoch: 0   Global Step: 2330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:29:14,271-Speed 3382.39 samples/sec   Loss 22.0413   LearningRate 0.0944   Epoch: 0   Global Step: 2340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:29:17,294-Speed 3388.57 samples/sec   Loss 22.0002   LearningRate 0.0944   Epoch: 0   Global Step: 2350   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-26 12:29:20,300-Speed 3406.66 samples/sec   Loss 21.8326   LearningRate 0.0944   Epoch: 0   Global Step: 2360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:29:23,297-Speed 3417.35 samples/sec   Loss 21.8101   LearningRate 0.0944   Epoch: 0   Global Step: 2370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:29:26,321-Speed 3387.12 samples/sec   Loss 21.7909   LearningRate 0.0943   Epoch: 0   Global Step: 2380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:29:29,408-Speed 3318.28 samples/sec   Loss 21.6566   LearningRate 0.0943   Epoch: 0   Global Step: 2390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:29:32,431-Speed 3387.69 samples/sec   Loss 21.7723   LearningRate 0.0943   Epoch: 0   Global Step: 2400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:29:35,455-Speed 3386.82 samples/sec   Loss 21.5789   LearningRate 0.0943   Epoch: 0   Global Step: 2410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:29:38,470-Speed 3397.84 samples/sec   Loss 21.4972   LearningRate 0.0942   Epoch: 0   Global Step: 2420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:29:41,493-Speed 3388.20 samples/sec   Loss 21.4423   LearningRate 0.0942   Epoch: 0   Global Step: 2430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:29:44,509-Speed 3396.13 samples/sec   Loss 21.4192   LearningRate 0.0942   Epoch: 0   Global Step: 2440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:29:47,534-Speed 3385.87 samples/sec   Loss 21.3278   LearningRate 0.0942   Epoch: 0   Global Step: 2450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:29:50,555-Speed 3390.13 samples/sec   Loss 21.3007   LearningRate 0.0941   Epoch: 0   Global Step: 2460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-26 12:29:53,575-Speed 3390.86 samples/sec   Loss 21.1625   LearningRate 0.0941   Epoch: 0   Global Step: 2470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:29:56,596-Speed 3390.74 samples/sec   Loss 21.2400   LearningRate 0.0941   Epoch: 0   Global Step: 2480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:29:59,625-Speed 3381.43 samples/sec   Loss 21.2809   LearningRate 0.0941   Epoch: 0   Global Step: 2490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:30:02,652-Speed 3383.88 samples/sec   Loss 21.0145   LearningRate 0.0940   Epoch: 0   Global Step: 2500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:30:05,681-Speed 3381.37 samples/sec   Loss 21.1431   LearningRate 0.0940   Epoch: 0   Global Step: 2510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:30:08,698-Speed 3394.94 samples/sec   Loss 20.8970   LearningRate 0.0940   Epoch: 0   Global Step: 2520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:30:11,716-Speed 3394.30 samples/sec   Loss 20.9268   LearningRate 0.0940   Epoch: 0   Global Step: 2530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:30:14,751-Speed 3374.30 samples/sec   Loss 20.8648   LearningRate 0.0940   Epoch: 0   Global Step: 2540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:30:17,768-Speed 3394.54 samples/sec   Loss 20.8253   LearningRate 0.0939   Epoch: 0   Global Step: 2550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:30:20,788-Speed 3391.76 samples/sec   Loss 20.8162   LearningRate 0.0939   Epoch: 0   Global Step: 2560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:30:23,805-Speed 3394.33 samples/sec   Loss 20.7420   LearningRate 0.0939   Epoch: 0   Global Step: 2570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:30:26,832-Speed 3383.81 samples/sec   Loss 20.6199   LearningRate 0.0939   Epoch: 0   Global Step: 2580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:30:29,852-Speed 3391.77 samples/sec   Loss 20.8163   LearningRate 0.0938   Epoch: 0   Global Step: 2590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:30:32,891-Speed 3370.03 samples/sec   Loss 20.5112   LearningRate 0.0938   Epoch: 0   Global Step: 2600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:30:35,921-Speed 3381.16 samples/sec   Loss 20.5399   LearningRate 0.0938   Epoch: 0   Global Step: 2610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:30:38,950-Speed 3381.55 samples/sec   Loss 20.3621   LearningRate 0.0938   Epoch: 0   Global Step: 2620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:30:41,971-Speed 3389.78 samples/sec   Loss 20.3267   LearningRate 0.0937   Epoch: 0   Global Step: 2630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:30:44,994-Speed 3388.89 samples/sec   Loss 20.3423   LearningRate 0.0937   Epoch: 0   Global Step: 2640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:30:48,026-Speed 3377.82 samples/sec   Loss 20.3826   LearningRate 0.0937   Epoch: 0   Global Step: 2650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:30:51,049-Speed 3388.35 samples/sec   Loss 20.2065   LearningRate 0.0937   Epoch: 0   Global Step: 2660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:30:54,063-Speed 3397.59 samples/sec   Loss 20.1751   LearningRate 0.0936   Epoch: 0   Global Step: 2670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:30:57,092-Speed 3381.84 samples/sec   Loss 20.2776   LearningRate 0.0936   Epoch: 0   Global Step: 2680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:00,111-Speed 3392.01 samples/sec   Loss 20.1438   LearningRate 0.0936   Epoch: 0   Global Step: 2690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:03,140-Speed 3381.36 samples/sec   Loss 20.0599   LearningRate 0.0936   Epoch: 0   Global Step: 2700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:06,175-Speed 3374.99 samples/sec   Loss 20.0879   LearningRate 0.0936   Epoch: 0   Global Step: 2710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:09,199-Speed 3387.20 samples/sec   Loss 20.0598   LearningRate 0.0935   Epoch: 0   Global Step: 2720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:12,226-Speed 3383.74 samples/sec   Loss 19.9744   LearningRate 0.0935   Epoch: 0   Global Step: 2730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:15,251-Speed 3386.75 samples/sec   Loss 20.0618   LearningRate 0.0935   Epoch: 0   Global Step: 2740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:18,269-Speed 3393.25 samples/sec   Loss 19.9417   LearningRate 0.0935   Epoch: 0   Global Step: 2750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:21,295-Speed 3384.69 samples/sec   Loss 19.8162   LearningRate 0.0934   Epoch: 0   Global Step: 2760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:24,304-Speed 3403.64 samples/sec   Loss 19.5521   LearningRate 0.0934   Epoch: 0   Global Step: 2770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:27,341-Speed 3373.24 samples/sec   Loss 19.7322   LearningRate 0.0934   Epoch: 0   Global Step: 2780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:30,365-Speed 3387.03 samples/sec   Loss 19.6504   LearningRate 0.0934   Epoch: 0   Global Step: 2790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:33,385-Speed 3391.07 samples/sec   Loss 19.5512   LearningRate 0.0933   Epoch: 0   Global Step: 2800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:36,413-Speed 3382.89 samples/sec   Loss 19.8663   LearningRate 0.0933   Epoch: 0   Global Step: 2810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:39,438-Speed 3386.32 samples/sec   Loss 19.5429   LearningRate 0.0933   Epoch: 0   Global Step: 2820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:42,461-Speed 3387.95 samples/sec   Loss 19.5052   LearningRate 0.0933   Epoch: 0   Global Step: 2830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:45,485-Speed 3387.67 samples/sec   Loss 19.4968   LearningRate 0.0932   Epoch: 0   Global Step: 2840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:48,510-Speed 3385.70 samples/sec   Loss 19.4685   LearningRate 0.0932   Epoch: 0   Global Step: 2850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:51,534-Speed 3387.01 samples/sec   Loss 19.5273   LearningRate 0.0932   Epoch: 0   Global Step: 2860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:54,569-Speed 3373.79 samples/sec   Loss 19.2813   LearningRate 0.0932   Epoch: 0   Global Step: 2870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:31:57,597-Speed 3383.19 samples/sec   Loss 19.3485   LearningRate 0.0932   Epoch: 0   Global Step: 2880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:00,623-Speed 3385.74 samples/sec   Loss 19.2811   LearningRate 0.0931   Epoch: 0   Global Step: 2890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:03,649-Speed 3384.03 samples/sec   Loss 19.0362   LearningRate 0.0931   Epoch: 0   Global Step: 2900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:06,675-Speed 3384.91 samples/sec   Loss 19.1359   LearningRate 0.0931   Epoch: 0   Global Step: 2910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:09,709-Speed 3376.39 samples/sec   Loss 19.2513   LearningRate 0.0931   Epoch: 0   Global Step: 2920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:12,739-Speed 3379.15 samples/sec   Loss 19.0015   LearningRate 0.0930   Epoch: 0   Global Step: 2930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:15,768-Speed 3381.73 samples/sec   Loss 18.9933   LearningRate 0.0930   Epoch: 0   Global Step: 2940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:18,794-Speed 3384.50 samples/sec   Loss 19.0784   LearningRate 0.0930   Epoch: 0   Global Step: 2950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:21,822-Speed 3383.55 samples/sec   Loss 18.8338   LearningRate 0.0930   Epoch: 0   Global Step: 2960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:24,856-Speed 3375.92 samples/sec   Loss 18.7547   LearningRate 0.0929   Epoch: 0   Global Step: 2970   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-26 12:32:27,869-Speed 3399.26 samples/sec   Loss 19.0169   LearningRate 0.0929   Epoch: 0   Global Step: 2980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:30,891-Speed 3389.04 samples/sec   Loss 19.0283   LearningRate 0.0929   Epoch: 0   Global Step: 2990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:33,911-Speed 3391.49 samples/sec   Loss 18.9940   LearningRate 0.0929   Epoch: 0   Global Step: 3000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:36,941-Speed 3379.95 samples/sec   Loss 18.7740   LearningRate 0.0929   Epoch: 0   Global Step: 3010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:40,017-Speed 3330.05 samples/sec   Loss 18.8626   LearningRate 0.0928   Epoch: 0   Global Step: 3020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:43,041-Speed 3387.48 samples/sec   Loss 18.4988   LearningRate 0.0928   Epoch: 0   Global Step: 3030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:46,072-Speed 3378.90 samples/sec   Loss 18.6801   LearningRate 0.0928   Epoch: 0   Global Step: 3040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:49,100-Speed 3382.96 samples/sec   Loss 18.8200   LearningRate 0.0928   Epoch: 0   Global Step: 3050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:52,130-Speed 3379.59 samples/sec   Loss 18.8756   LearningRate 0.0927   Epoch: 0   Global Step: 3060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:55,166-Speed 3373.87 samples/sec   Loss 18.4735   LearningRate 0.0927   Epoch: 0   Global Step: 3070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:32:58,180-Speed 3398.89 samples/sec   Loss 18.5709   LearningRate 0.0927   Epoch: 0   Global Step: 3080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:33:01,204-Speed 3386.33 samples/sec   Loss 18.4588   LearningRate 0.0927   Epoch: 0   Global Step: 3090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:33:04,231-Speed 3384.27 samples/sec   Loss 18.5831   LearningRate 0.0926   Epoch: 0   Global Step: 3100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:33:07,258-Speed 3383.70 samples/sec   Loss 18.3655   LearningRate 0.0926   Epoch: 0   Global Step: 3110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:33:10,286-Speed 3381.97 samples/sec   Loss 18.3580   LearningRate 0.0926   Epoch: 0   Global Step: 3120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:33:13,320-Speed 3376.00 samples/sec   Loss 18.3997   LearningRate 0.0926   Epoch: 0   Global Step: 3130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:33:16,363-Speed 3366.06 samples/sec   Loss 18.3992   LearningRate 0.0926   Epoch: 0   Global Step: 3140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:33:19,395-Speed 3377.88 samples/sec   Loss 18.5029   LearningRate 0.0925   Epoch: 0   Global Step: 3150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:33:22,426-Speed 3379.32 samples/sec   Loss 18.4687   LearningRate 0.0925   Epoch: 0   Global Step: 3160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:33:25,465-Speed 3370.73 samples/sec   Loss 18.3583   LearningRate 0.0925   Epoch: 0   Global Step: 3170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:33:28,489-Speed 3387.60 samples/sec   Loss 18.3026   LearningRate 0.0925   Epoch: 0   Global Step: 3180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:33:31,513-Speed 3387.15 samples/sec   Loss 18.1775   LearningRate 0.0924   Epoch: 0   Global Step: 3190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:33:34,548-Speed 3374.73 samples/sec   Loss 18.0440   LearningRate 0.0924   Epoch: 0   Global Step: 3200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:33:37,573-Speed 3385.50 samples/sec   Loss 18.0492   LearningRate 0.0924   Epoch: 0   Global Step: 3210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:33:40,601-Speed 3383.39 samples/sec   Loss 18.2113   LearningRate 0.0924   Epoch: 0   Global Step: 3220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:33:43,627-Speed 3384.74 samples/sec   Loss 17.8987   LearningRate 0.0923   Epoch: 0   Global Step: 3230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:33:46,671-Speed 3364.16 samples/sec   Loss 17.9165   LearningRate 0.0923   Epoch: 0   Global Step: 3240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:33:49,713-Speed 3367.09 samples/sec   Loss 18.2081   LearningRate 0.0923   Epoch: 0   Global Step: 3250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:33:52,741-Speed 3382.59 samples/sec   Loss 17.9169   LearningRate 0.0923   Epoch: 0   Global Step: 3260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:33:55,766-Speed 3386.26 samples/sec   Loss 17.9522   LearningRate 0.0922   Epoch: 0   Global Step: 3270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:33:58,798-Speed 3378.13 samples/sec   Loss 18.0289   LearningRate 0.0922   Epoch: 0   Global Step: 3280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:01,840-Speed 3367.56 samples/sec   Loss 17.8190   LearningRate 0.0922   Epoch: 0   Global Step: 3290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:04,871-Speed 3378.61 samples/sec   Loss 17.8371   LearningRate 0.0922   Epoch: 0   Global Step: 3300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:07,903-Speed 3378.86 samples/sec   Loss 17.7826   LearningRate 0.0922   Epoch: 0   Global Step: 3310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:10,929-Speed 3384.22 samples/sec   Loss 17.8777   LearningRate 0.0921   Epoch: 0   Global Step: 3320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:13,957-Speed 3382.20 samples/sec   Loss 17.7017   LearningRate 0.0921   Epoch: 0   Global Step: 3330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:17,013-Speed 3351.83 samples/sec   Loss 17.6672   LearningRate 0.0921   Epoch: 0   Global Step: 3340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:20,044-Speed 3379.46 samples/sec   Loss 17.7159   LearningRate 0.0921   Epoch: 0   Global Step: 3350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:23,073-Speed 3381.48 samples/sec   Loss 17.5266   LearningRate 0.0920   Epoch: 0   Global Step: 3360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:26,103-Speed 3380.22 samples/sec   Loss 17.6323   LearningRate 0.0920   Epoch: 0   Global Step: 3370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:29,151-Speed 3360.40 samples/sec   Loss 17.5909   LearningRate 0.0920   Epoch: 0   Global Step: 3380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:32,185-Speed 3376.48 samples/sec   Loss 17.7550   LearningRate 0.0920   Epoch: 0   Global Step: 3390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:35,212-Speed 3383.13 samples/sec   Loss 17.2533   LearningRate 0.0919   Epoch: 0   Global Step: 3400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:38,242-Speed 3380.63 samples/sec   Loss 17.3725   LearningRate 0.0919   Epoch: 0   Global Step: 3410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:41,296-Speed 3353.24 samples/sec   Loss 17.4044   LearningRate 0.0919   Epoch: 0   Global Step: 3420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:44,335-Speed 3370.83 samples/sec   Loss 17.4138   LearningRate 0.0919   Epoch: 0   Global Step: 3430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:47,363-Speed 3382.02 samples/sec   Loss 17.5884   LearningRate 0.0919   Epoch: 0   Global Step: 3440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:50,405-Speed 3367.76 samples/sec   Loss 17.2990   LearningRate 0.0918   Epoch: 0   Global Step: 3450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:53,478-Speed 3332.91 samples/sec   Loss 17.3236   LearningRate 0.0918   Epoch: 0   Global Step: 3460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:56,499-Speed 3389.95 samples/sec   Loss 17.3301   LearningRate 0.0918   Epoch: 0   Global Step: 3470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:34:59,539-Speed 3369.39 samples/sec   Loss 17.2738   LearningRate 0.0918   Epoch: 0   Global Step: 3480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:35:02,574-Speed 3374.41 samples/sec   Loss 17.1875   LearningRate 0.0917   Epoch: 0   Global Step: 3490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:35:05,606-Speed 3378.66 samples/sec   Loss 17.2073   LearningRate 0.0917   Epoch: 0   Global Step: 3500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:35:08,635-Speed 3381.07 samples/sec   Loss 17.1505   LearningRate 0.0917   Epoch: 0   Global Step: 3510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:35:11,666-Speed 3378.78 samples/sec   Loss 17.3714   LearningRate 0.0917   Epoch: 0   Global Step: 3520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:35:14,694-Speed 3383.35 samples/sec   Loss 17.3088   LearningRate 0.0916   Epoch: 0   Global Step: 3530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:35:17,726-Speed 3377.66 samples/sec   Loss 17.0785   LearningRate 0.0916   Epoch: 0   Global Step: 3540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:35:20,757-Speed 3378.73 samples/sec   Loss 17.1981   LearningRate 0.0916   Epoch: 0   Global Step: 3550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:35:23,790-Speed 3377.86 samples/sec   Loss 17.0714   LearningRate 0.0916   Epoch: 0   Global Step: 3560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:35:26,834-Speed 3364.48 samples/sec   Loss 16.9633   LearningRate 0.0916   Epoch: 0   Global Step: 3570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:35:29,888-Speed 3354.05 samples/sec   Loss 17.0695   LearningRate 0.0915   Epoch: 0   Global Step: 3580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:35:32,939-Speed 3357.19 samples/sec   Loss 17.0678   LearningRate 0.0915   Epoch: 0   Global Step: 3590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:35:35,975-Speed 3373.57 samples/sec   Loss 16.7247   LearningRate 0.0915   Epoch: 0   Global Step: 3600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:35:39,008-Speed 3377.93 samples/sec   Loss 17.0180   LearningRate 0.0915   Epoch: 0   Global Step: 3610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:35:42,035-Speed 3383.09 samples/sec   Loss 16.7901   LearningRate 0.0914   Epoch: 0   Global Step: 3620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:35:45,073-Speed 3371.46 samples/sec   Loss 16.9863   LearningRate 0.0914   Epoch: 0   Global Step: 3630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:35:48,111-Speed 3370.79 samples/sec   Loss 16.9139   LearningRate 0.0914   Epoch: 0   Global Step: 3640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:35:51,145-Speed 3377.04 samples/sec   Loss 16.9437   LearningRate 0.0914   Epoch: 0   Global Step: 3650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:35:54,171-Speed 3384.24 samples/sec   Loss 16.8348   LearningRate 0.0913   Epoch: 0   Global Step: 3660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:35:57,201-Speed 3380.74 samples/sec   Loss 16.8850   LearningRate 0.0913   Epoch: 0   Global Step: 3670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:36:00,229-Speed 3381.94 samples/sec   Loss 16.6619   LearningRate 0.0913   Epoch: 0   Global Step: 3680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:36:03,251-Speed 3389.72 samples/sec   Loss 16.7267   LearningRate 0.0913   Epoch: 0   Global Step: 3690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:36:06,288-Speed 3372.28 samples/sec   Loss 16.6742   LearningRate 0.0913   Epoch: 0   Global Step: 3700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:36:09,317-Speed 3382.05 samples/sec   Loss 16.8823   LearningRate 0.0912   Epoch: 0   Global Step: 3710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:36:12,358-Speed 3367.24 samples/sec   Loss 16.7265   LearningRate 0.0912   Epoch: 0   Global Step: 3720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:36:15,388-Speed 3380.58 samples/sec   Loss 16.5967   LearningRate 0.0912   Epoch: 0   Global Step: 3730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:36:18,419-Speed 3379.50 samples/sec   Loss 16.6984   LearningRate 0.0912   Epoch: 0   Global Step: 3740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:36:21,457-Speed 3371.45 samples/sec   Loss 16.5427   LearningRate 0.0911   Epoch: 0   Global Step: 3750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:36:24,496-Speed 3370.87 samples/sec   Loss 16.5452   LearningRate 0.0911   Epoch: 0   Global Step: 3760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:36:27,525-Speed 3380.89 samples/sec   Loss 16.7259   LearningRate 0.0911   Epoch: 0   Global Step: 3770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:36:30,559-Speed 3377.33 samples/sec   Loss 16.5296   LearningRate 0.0911   Epoch: 0   Global Step: 3780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 12:36:33,600-Speed 3367.53 samples/sec   Loss 16.6992   LearningRate 0.0910   Epoch: 0   Global Step: 3790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:36:36,630-Speed 3380.42 samples/sec   Loss 16.4849   LearningRate 0.0910   Epoch: 0   Global Step: 3800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:36:39,662-Speed 3377.25 samples/sec   Loss 16.4810   LearningRate 0.0910   Epoch: 0   Global Step: 3810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:36:42,709-Speed 3362.15 samples/sec   Loss 16.4987   LearningRate 0.0910   Epoch: 0   Global Step: 3820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:36:45,741-Speed 3378.16 samples/sec   Loss 16.5792   LearningRate 0.0910   Epoch: 0   Global Step: 3830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:36:48,773-Speed 3377.78 samples/sec   Loss 16.5223   LearningRate 0.0909   Epoch: 0   Global Step: 3840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:36:51,807-Speed 3376.66 samples/sec   Loss 16.4697   LearningRate 0.0909   Epoch: 0   Global Step: 3850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:36:54,835-Speed 3381.85 samples/sec   Loss 16.3508   LearningRate 0.0909   Epoch: 0   Global Step: 3860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:36:57,865-Speed 3380.89 samples/sec   Loss 16.2645   LearningRate 0.0909   Epoch: 0   Global Step: 3870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:37:00,902-Speed 3372.60 samples/sec   Loss 16.2906   LearningRate 0.0908   Epoch: 0   Global Step: 3880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:37:03,923-Speed 3390.07 samples/sec   Loss 16.3350   LearningRate 0.0908   Epoch: 0   Global Step: 3890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:37:06,954-Speed 3378.45 samples/sec   Loss 16.2298   LearningRate 0.0908   Epoch: 0   Global Step: 3900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:37:09,992-Speed 3372.53 samples/sec   Loss 16.2186   LearningRate 0.0908   Epoch: 0   Global Step: 3910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:37:13,023-Speed 3379.58 samples/sec   Loss 16.4339   LearningRate 0.0907   Epoch: 0   Global Step: 3920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:37:16,053-Speed 3380.46 samples/sec   Loss 16.2917   LearningRate 0.0907   Epoch: 0   Global Step: 3930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:37:19,087-Speed 3376.02 samples/sec   Loss 16.3287   LearningRate 0.0907   Epoch: 0   Global Step: 3940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:37:22,127-Speed 3368.83 samples/sec   Loss 16.3874   LearningRate 0.0907   Epoch: 0   Global Step: 3950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:37:25,174-Speed 3361.06 samples/sec   Loss 16.3066   LearningRate 0.0907   Epoch: 0   Global Step: 3960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:37:28,216-Speed 3367.70 samples/sec   Loss 16.1705   LearningRate 0.0906   Epoch: 0   Global Step: 3970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:37:31,250-Speed 3375.91 samples/sec   Loss 16.2324   LearningRate 0.0906   Epoch: 0   Global Step: 3980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:37:34,265-Speed 3396.90 samples/sec   Loss 16.0801   LearningRate 0.0906   Epoch: 0   Global Step: 3990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:37:37,304-Speed 3370.26 samples/sec   Loss 16.1344   LearningRate 0.0906   Epoch: 0   Global Step: 4000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:38:20,698-[lfw][4000]XNorm: 22.270311
Training: 2022-04-26 12:38:20,699-[lfw][4000]Accuracy-Flip: 0.99117+-0.00454
Training: 2022-04-26 12:38:20,699-[lfw][4000]Accuracy-Highest: 0.99117
Training: 2022-04-26 12:39:11,164-[cfp_fp][4000]XNorm: 20.656120
Training: 2022-04-26 12:39:11,165-[cfp_fp][4000]Accuracy-Flip: 0.93943+-0.00850
Training: 2022-04-26 12:39:11,165-[cfp_fp][4000]Accuracy-Highest: 0.93943
Training: 2022-04-26 12:39:54,620-[agedb_30][4000]XNorm: 21.935010
Training: 2022-04-26 12:39:54,621-[agedb_30][4000]Accuracy-Flip: 0.92000+-0.01635
Training: 2022-04-26 12:39:54,621-[agedb_30][4000]Accuracy-Highest: 0.92000
Training: 2022-04-26 12:39:57,647-Speed 72.96 samples/sec   Loss 16.0907   LearningRate 0.0905   Epoch: 0   Global Step: 4010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:00,904-Speed 3144.52 samples/sec   Loss 16.1259   LearningRate 0.0905   Epoch: 0   Global Step: 4020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:03,937-Speed 3377.28 samples/sec   Loss 16.1534   LearningRate 0.0905   Epoch: 0   Global Step: 4030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:06,961-Speed 3387.04 samples/sec   Loss 16.0516   LearningRate 0.0905   Epoch: 0   Global Step: 4040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:09,988-Speed 3383.33 samples/sec   Loss 16.0206   LearningRate 0.0904   Epoch: 0   Global Step: 4050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:13,018-Speed 3380.62 samples/sec   Loss 15.9576   LearningRate 0.0904   Epoch: 0   Global Step: 4060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:16,047-Speed 3381.52 samples/sec   Loss 15.9474   LearningRate 0.0904   Epoch: 0   Global Step: 4070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:19,074-Speed 3383.71 samples/sec   Loss 15.8574   LearningRate 0.0904   Epoch: 0   Global Step: 4080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:22,089-Speed 3397.59 samples/sec   Loss 15.8158   LearningRate 0.0904   Epoch: 0   Global Step: 4090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:25,118-Speed 3381.03 samples/sec   Loss 15.9080   LearningRate 0.0903   Epoch: 0   Global Step: 4100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:28,152-Speed 3375.86 samples/sec   Loss 15.8835   LearningRate 0.0903   Epoch: 0   Global Step: 4110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:31,187-Speed 3375.07 samples/sec   Loss 15.6935   LearningRate 0.0903   Epoch: 0   Global Step: 4120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:34,290-Speed 3300.60 samples/sec   Loss 15.9181   LearningRate 0.0903   Epoch: 0   Global Step: 4130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:46,523-Speed 837.14 samples/sec   Loss 14.6858   LearningRate 0.0902   Epoch: 1   Global Step: 4140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:49,567-Speed 3365.13 samples/sec   Loss 13.7631   LearningRate 0.0902   Epoch: 1   Global Step: 4150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:52,598-Speed 3379.36 samples/sec   Loss 13.5917   LearningRate 0.0902   Epoch: 1   Global Step: 4160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:55,632-Speed 3376.49 samples/sec   Loss 13.5659   LearningRate 0.0902   Epoch: 1   Global Step: 4170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:40:58,667-Speed 3374.24 samples/sec   Loss 13.5817   LearningRate 0.0901   Epoch: 1   Global Step: 4180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:01,689-Speed 3390.01 samples/sec   Loss 13.6426   LearningRate 0.0901   Epoch: 1   Global Step: 4190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:04,758-Speed 3336.51 samples/sec   Loss 13.5861   LearningRate 0.0901   Epoch: 1   Global Step: 4200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:07,795-Speed 3373.13 samples/sec   Loss 13.6295   LearningRate 0.0901   Epoch: 1   Global Step: 4210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:11,006-Speed 3189.84 samples/sec   Loss 13.6230   LearningRate 0.0901   Epoch: 1   Global Step: 4220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:14,560-Speed 2881.54 samples/sec   Loss 13.5548   LearningRate 0.0900   Epoch: 1   Global Step: 4230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:17,602-Speed 3366.58 samples/sec   Loss 13.5877   LearningRate 0.0900   Epoch: 1   Global Step: 4240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:20,638-Speed 3373.49 samples/sec   Loss 13.7041   LearningRate 0.0900   Epoch: 1   Global Step: 4250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:23,699-Speed 3346.80 samples/sec   Loss 13.5805   LearningRate 0.0900   Epoch: 1   Global Step: 4260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:26,728-Speed 3381.47 samples/sec   Loss 13.7309   LearningRate 0.0899   Epoch: 1   Global Step: 4270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:29,755-Speed 3383.05 samples/sec   Loss 13.7037   LearningRate 0.0899   Epoch: 1   Global Step: 4280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:32,783-Speed 3382.41 samples/sec   Loss 13.7450   LearningRate 0.0899   Epoch: 1   Global Step: 4290   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-26 12:41:35,795-Speed 3400.89 samples/sec   Loss 13.8632   LearningRate 0.0899   Epoch: 1   Global Step: 4300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:38,815-Speed 3391.12 samples/sec   Loss 13.7754   LearningRate 0.0898   Epoch: 1   Global Step: 4310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:41,838-Speed 3388.70 samples/sec   Loss 13.7240   LearningRate 0.0898   Epoch: 1   Global Step: 4320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:44,873-Speed 3375.27 samples/sec   Loss 13.6602   LearningRate 0.0898   Epoch: 1   Global Step: 4330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:47,901-Speed 3382.65 samples/sec   Loss 13.7699   LearningRate 0.0898   Epoch: 1   Global Step: 4340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:50,937-Speed 3373.16 samples/sec   Loss 13.7631   LearningRate 0.0898   Epoch: 1   Global Step: 4350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:53,958-Speed 3390.48 samples/sec   Loss 13.8691   LearningRate 0.0897   Epoch: 1   Global Step: 4360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:41:56,989-Speed 3380.05 samples/sec   Loss 13.8995   LearningRate 0.0897   Epoch: 1   Global Step: 4370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:00,008-Speed 3392.18 samples/sec   Loss 13.8082   LearningRate 0.0897   Epoch: 1   Global Step: 4380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:03,038-Speed 3380.44 samples/sec   Loss 13.8896   LearningRate 0.0897   Epoch: 1   Global Step: 4390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:06,048-Speed 3403.08 samples/sec   Loss 13.7876   LearningRate 0.0896   Epoch: 1   Global Step: 4400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:09,070-Speed 3389.05 samples/sec   Loss 13.6904   LearningRate 0.0896   Epoch: 1   Global Step: 4410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:12,090-Speed 3391.95 samples/sec   Loss 13.6220   LearningRate 0.0896   Epoch: 1   Global Step: 4420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:15,112-Speed 3389.19 samples/sec   Loss 13.7773   LearningRate 0.0896   Epoch: 1   Global Step: 4430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:18,132-Speed 3391.19 samples/sec   Loss 13.8106   LearningRate 0.0896   Epoch: 1   Global Step: 4440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:21,152-Speed 3391.77 samples/sec   Loss 13.9486   LearningRate 0.0895   Epoch: 1   Global Step: 4450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:24,177-Speed 3386.03 samples/sec   Loss 13.9504   LearningRate 0.0895   Epoch: 1   Global Step: 4460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:27,207-Speed 3379.17 samples/sec   Loss 13.8248   LearningRate 0.0895   Epoch: 1   Global Step: 4470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:30,228-Speed 3390.58 samples/sec   Loss 13.8242   LearningRate 0.0895   Epoch: 1   Global Step: 4480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:33,248-Speed 3392.41 samples/sec   Loss 13.9284   LearningRate 0.0894   Epoch: 1   Global Step: 4490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:36,283-Speed 3374.40 samples/sec   Loss 13.9798   LearningRate 0.0894   Epoch: 1   Global Step: 4500   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-26 12:42:39,310-Speed 3384.41 samples/sec   Loss 13.9484   LearningRate 0.0894   Epoch: 1   Global Step: 4510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:42,329-Speed 3393.08 samples/sec   Loss 13.9913   LearningRate 0.0894   Epoch: 1   Global Step: 4520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:45,347-Speed 3393.46 samples/sec   Loss 13.8545   LearningRate 0.0893   Epoch: 1   Global Step: 4530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:48,366-Speed 3392.38 samples/sec   Loss 13.9530   LearningRate 0.0893   Epoch: 1   Global Step: 4540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:51,394-Speed 3382.72 samples/sec   Loss 13.9066   LearningRate 0.0893   Epoch: 1   Global Step: 4550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:54,415-Speed 3390.67 samples/sec   Loss 14.0200   LearningRate 0.0893   Epoch: 1   Global Step: 4560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:42:57,443-Speed 3381.66 samples/sec   Loss 13.9890   LearningRate 0.0893   Epoch: 1   Global Step: 4570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:00,479-Speed 3373.39 samples/sec   Loss 14.0210   LearningRate 0.0892   Epoch: 1   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:03,503-Speed 3387.36 samples/sec   Loss 13.8107   LearningRate 0.0892   Epoch: 1   Global Step: 4590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:06,522-Speed 3392.94 samples/sec   Loss 14.1538   LearningRate 0.0892   Epoch: 1   Global Step: 4600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:09,535-Speed 3399.48 samples/sec   Loss 14.0822   LearningRate 0.0892   Epoch: 1   Global Step: 4610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:12,554-Speed 3392.74 samples/sec   Loss 13.8331   LearningRate 0.0891   Epoch: 1   Global Step: 4620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:15,577-Speed 3387.70 samples/sec   Loss 14.1410   LearningRate 0.0891   Epoch: 1   Global Step: 4630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:18,606-Speed 3381.86 samples/sec   Loss 13.8544   LearningRate 0.0891   Epoch: 1   Global Step: 4640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:21,642-Speed 3373.02 samples/sec   Loss 14.1168   LearningRate 0.0891   Epoch: 1   Global Step: 4650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:24,669-Speed 3384.06 samples/sec   Loss 13.9383   LearningRate 0.0890   Epoch: 1   Global Step: 4660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:27,704-Speed 3374.14 samples/sec   Loss 14.0640   LearningRate 0.0890   Epoch: 1   Global Step: 4670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:30,728-Speed 3387.91 samples/sec   Loss 14.1127   LearningRate 0.0890   Epoch: 1   Global Step: 4680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:33,746-Speed 3393.61 samples/sec   Loss 14.1422   LearningRate 0.0890   Epoch: 1   Global Step: 4690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:36,782-Speed 3373.66 samples/sec   Loss 13.9831   LearningRate 0.0890   Epoch: 1   Global Step: 4700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:39,792-Speed 3403.74 samples/sec   Loss 14.0534   LearningRate 0.0889   Epoch: 1   Global Step: 4710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:42,814-Speed 3388.99 samples/sec   Loss 13.8765   LearningRate 0.0889   Epoch: 1   Global Step: 4720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:45,837-Speed 3387.81 samples/sec   Loss 13.9953   LearningRate 0.0889   Epoch: 1   Global Step: 4730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:48,881-Speed 3364.58 samples/sec   Loss 13.9607   LearningRate 0.0889   Epoch: 1   Global Step: 4740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:51,907-Speed 3386.01 samples/sec   Loss 14.0225   LearningRate 0.0888   Epoch: 1   Global Step: 4750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:54,931-Speed 3386.21 samples/sec   Loss 13.9372   LearningRate 0.0888   Epoch: 1   Global Step: 4760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:43:57,966-Speed 3374.40 samples/sec   Loss 14.0838   LearningRate 0.0888   Epoch: 1   Global Step: 4770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:00,988-Speed 3389.54 samples/sec   Loss 14.1156   LearningRate 0.0888   Epoch: 1   Global Step: 4780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:04,025-Speed 3372.51 samples/sec   Loss 14.1294   LearningRate 0.0888   Epoch: 1   Global Step: 4790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:07,048-Speed 3388.25 samples/sec   Loss 13.9799   LearningRate 0.0887   Epoch: 1   Global Step: 4800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:10,060-Speed 3401.14 samples/sec   Loss 14.1104   LearningRate 0.0887   Epoch: 1   Global Step: 4810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:13,082-Speed 3389.50 samples/sec   Loss 14.1447   LearningRate 0.0887   Epoch: 1   Global Step: 4820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:16,101-Speed 3392.02 samples/sec   Loss 13.9756   LearningRate 0.0887   Epoch: 1   Global Step: 4830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:19,132-Speed 3379.36 samples/sec   Loss 13.9842   LearningRate 0.0886   Epoch: 1   Global Step: 4840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:22,148-Speed 3395.86 samples/sec   Loss 13.9896   LearningRate 0.0886   Epoch: 1   Global Step: 4850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:25,173-Speed 3385.58 samples/sec   Loss 14.0072   LearningRate 0.0886   Epoch: 1   Global Step: 4860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:28,195-Speed 3389.60 samples/sec   Loss 14.0344   LearningRate 0.0886   Epoch: 1   Global Step: 4870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:31,222-Speed 3384.20 samples/sec   Loss 14.0589   LearningRate 0.0885   Epoch: 1   Global Step: 4880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:34,241-Speed 3391.81 samples/sec   Loss 14.2001   LearningRate 0.0885   Epoch: 1   Global Step: 4890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:37,262-Speed 3391.54 samples/sec   Loss 14.2046   LearningRate 0.0885   Epoch: 1   Global Step: 4900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:40,284-Speed 3388.90 samples/sec   Loss 13.9911   LearningRate 0.0885   Epoch: 1   Global Step: 4910   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-26 12:44:43,294-Speed 3402.36 samples/sec   Loss 14.1515   LearningRate 0.0885   Epoch: 1   Global Step: 4920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:46,325-Speed 3379.38 samples/sec   Loss 14.0300   LearningRate 0.0884   Epoch: 1   Global Step: 4930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:49,343-Speed 3393.27 samples/sec   Loss 14.1487   LearningRate 0.0884   Epoch: 1   Global Step: 4940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:52,361-Speed 3393.55 samples/sec   Loss 14.0689   LearningRate 0.0884   Epoch: 1   Global Step: 4950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:55,381-Speed 3391.17 samples/sec   Loss 14.1769   LearningRate 0.0884   Epoch: 1   Global Step: 4960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:44:58,413-Speed 3378.47 samples/sec   Loss 14.1008   LearningRate 0.0883   Epoch: 1   Global Step: 4970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:01,443-Speed 3381.39 samples/sec   Loss 14.1073   LearningRate 0.0883   Epoch: 1   Global Step: 4980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:04,473-Speed 3379.72 samples/sec   Loss 14.0964   LearningRate 0.0883   Epoch: 1   Global Step: 4990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:07,498-Speed 3385.64 samples/sec   Loss 14.0756   LearningRate 0.0883   Epoch: 1   Global Step: 5000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:10,539-Speed 3367.93 samples/sec   Loss 14.0868   LearningRate 0.0883   Epoch: 1   Global Step: 5010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:13,546-Speed 3405.93 samples/sec   Loss 14.0260   LearningRate 0.0882   Epoch: 1   Global Step: 5020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:16,568-Speed 3390.09 samples/sec   Loss 14.2570   LearningRate 0.0882   Epoch: 1   Global Step: 5030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:19,591-Speed 3387.37 samples/sec   Loss 14.0377   LearningRate 0.0882   Epoch: 1   Global Step: 5040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:22,620-Speed 3381.67 samples/sec   Loss 14.0956   LearningRate 0.0882   Epoch: 1   Global Step: 5050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:25,650-Speed 3380.60 samples/sec   Loss 14.1185   LearningRate 0.0881   Epoch: 1   Global Step: 5060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:28,748-Speed 3305.87 samples/sec   Loss 14.0849   LearningRate 0.0881   Epoch: 1   Global Step: 5070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:31,770-Speed 3389.21 samples/sec   Loss 13.9792   LearningRate 0.0881   Epoch: 1   Global Step: 5080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:34,793-Speed 3388.20 samples/sec   Loss 14.1061   LearningRate 0.0881   Epoch: 1   Global Step: 5090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:37,815-Speed 3389.58 samples/sec   Loss 14.1558   LearningRate 0.0880   Epoch: 1   Global Step: 5100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:40,839-Speed 3386.98 samples/sec   Loss 14.1093   LearningRate 0.0880   Epoch: 1   Global Step: 5110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:43,855-Speed 3395.66 samples/sec   Loss 13.9854   LearningRate 0.0880   Epoch: 1   Global Step: 5120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:46,878-Speed 3388.48 samples/sec   Loss 14.2979   LearningRate 0.0880   Epoch: 1   Global Step: 5130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:49,906-Speed 3382.46 samples/sec   Loss 13.9414   LearningRate 0.0880   Epoch: 1   Global Step: 5140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:52,937-Speed 3379.14 samples/sec   Loss 14.2647   LearningRate 0.0879   Epoch: 1   Global Step: 5150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:55,961-Speed 3386.39 samples/sec   Loss 14.0185   LearningRate 0.0879   Epoch: 1   Global Step: 5160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:45:58,986-Speed 3387.02 samples/sec   Loss 13.9420   LearningRate 0.0879   Epoch: 1   Global Step: 5170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:02,011-Speed 3386.40 samples/sec   Loss 13.9423   LearningRate 0.0879   Epoch: 1   Global Step: 5180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:05,035-Speed 3386.77 samples/sec   Loss 13.9597   LearningRate 0.0878   Epoch: 1   Global Step: 5190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:08,056-Speed 3390.33 samples/sec   Loss 14.0221   LearningRate 0.0878   Epoch: 1   Global Step: 5200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:11,078-Speed 3389.45 samples/sec   Loss 13.9599   LearningRate 0.0878   Epoch: 1   Global Step: 5210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:14,106-Speed 3382.76 samples/sec   Loss 13.9598   LearningRate 0.0878   Epoch: 1   Global Step: 5220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:17,166-Speed 3346.66 samples/sec   Loss 13.9150   LearningRate 0.0878   Epoch: 1   Global Step: 5230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:20,205-Speed 3370.21 samples/sec   Loss 13.9406   LearningRate 0.0877   Epoch: 1   Global Step: 5240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:23,274-Speed 3337.00 samples/sec   Loss 13.8658   LearningRate 0.0877   Epoch: 1   Global Step: 5250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:26,302-Speed 3383.54 samples/sec   Loss 13.9762   LearningRate 0.0877   Epoch: 1   Global Step: 5260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:29,326-Speed 3386.56 samples/sec   Loss 13.8277   LearningRate 0.0877   Epoch: 1   Global Step: 5270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:32,365-Speed 3371.02 samples/sec   Loss 13.9845   LearningRate 0.0876   Epoch: 1   Global Step: 5280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:35,387-Speed 3388.29 samples/sec   Loss 14.0234   LearningRate 0.0876   Epoch: 1   Global Step: 5290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:38,410-Speed 3388.30 samples/sec   Loss 13.9289   LearningRate 0.0876   Epoch: 1   Global Step: 5300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:41,441-Speed 3379.75 samples/sec   Loss 13.8683   LearningRate 0.0876   Epoch: 1   Global Step: 5310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:44,463-Speed 3388.44 samples/sec   Loss 13.8414   LearningRate 0.0875   Epoch: 1   Global Step: 5320   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-26 12:46:47,478-Speed 3396.77 samples/sec   Loss 13.8067   LearningRate 0.0875   Epoch: 1   Global Step: 5330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:50,503-Speed 3386.40 samples/sec   Loss 13.8390   LearningRate 0.0875   Epoch: 1   Global Step: 5340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:53,558-Speed 3352.32 samples/sec   Loss 13.9820   LearningRate 0.0875   Epoch: 1   Global Step: 5350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:56,593-Speed 3374.96 samples/sec   Loss 13.9081   LearningRate 0.0875   Epoch: 1   Global Step: 5360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:46:59,619-Speed 3384.59 samples/sec   Loss 13.9975   LearningRate 0.0874   Epoch: 1   Global Step: 5370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:47:02,663-Speed 3365.06 samples/sec   Loss 13.9505   LearningRate 0.0874   Epoch: 1   Global Step: 5380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:47:05,717-Speed 3354.36 samples/sec   Loss 13.8605   LearningRate 0.0874   Epoch: 1   Global Step: 5390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:47:08,744-Speed 3383.93 samples/sec   Loss 13.8689   LearningRate 0.0874   Epoch: 1   Global Step: 5400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:47:11,771-Speed 3383.46 samples/sec   Loss 13.9164   LearningRate 0.0873   Epoch: 1   Global Step: 5410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:47:14,795-Speed 3386.83 samples/sec   Loss 13.8821   LearningRate 0.0873   Epoch: 1   Global Step: 5420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:47:17,816-Speed 3389.42 samples/sec   Loss 13.9847   LearningRate 0.0873   Epoch: 1   Global Step: 5430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:47:20,830-Speed 3398.41 samples/sec   Loss 14.0633   LearningRate 0.0873   Epoch: 1   Global Step: 5440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:47:23,944-Speed 3289.43 samples/sec   Loss 14.0607   LearningRate 0.0873   Epoch: 1   Global Step: 5450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:47:26,981-Speed 3372.67 samples/sec   Loss 13.9994   LearningRate 0.0872   Epoch: 1   Global Step: 5460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:47:30,012-Speed 3379.33 samples/sec   Loss 13.9446   LearningRate 0.0872   Epoch: 1   Global Step: 5470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:47:33,037-Speed 3385.71 samples/sec   Loss 13.8328   LearningRate 0.0872   Epoch: 1   Global Step: 5480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:47:36,067-Speed 3380.14 samples/sec   Loss 13.7980   LearningRate 0.0872   Epoch: 1   Global Step: 5490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:47:39,096-Speed 3381.44 samples/sec   Loss 13.8111   LearningRate 0.0871   Epoch: 1   Global Step: 5500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:47:42,129-Speed 3377.41 samples/sec   Loss 13.7066   LearningRate 0.0871   Epoch: 1   Global Step: 5510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:47:45,160-Speed 3379.13 samples/sec   Loss 13.8658   LearningRate 0.0871   Epoch: 1   Global Step: 5520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:47:48,197-Speed 3372.51 samples/sec   Loss 13.7927   LearningRate 0.0871   Epoch: 1   Global Step: 5530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:47:51,227-Speed 3379.94 samples/sec   Loss 13.7636   LearningRate 0.0871   Epoch: 1   Global Step: 5540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:47:54,256-Speed 3381.18 samples/sec   Loss 13.8159   LearningRate 0.0870   Epoch: 1   Global Step: 5550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:47:57,317-Speed 3346.58 samples/sec   Loss 13.7400   LearningRate 0.0870   Epoch: 1   Global Step: 5560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:00,357-Speed 3369.21 samples/sec   Loss 13.6663   LearningRate 0.0870   Epoch: 1   Global Step: 5570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:03,386-Speed 3381.28 samples/sec   Loss 13.8571   LearningRate 0.0870   Epoch: 1   Global Step: 5580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:06,418-Speed 3377.99 samples/sec   Loss 13.8536   LearningRate 0.0869   Epoch: 1   Global Step: 5590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:09,449-Speed 3378.99 samples/sec   Loss 13.9487   LearningRate 0.0869   Epoch: 1   Global Step: 5600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:12,486-Speed 3373.40 samples/sec   Loss 13.6193   LearningRate 0.0869   Epoch: 1   Global Step: 5610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:15,518-Speed 3377.12 samples/sec   Loss 13.9033   LearningRate 0.0869   Epoch: 1   Global Step: 5620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:18,551-Speed 3376.96 samples/sec   Loss 13.8228   LearningRate 0.0868   Epoch: 1   Global Step: 5630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:21,588-Speed 3372.89 samples/sec   Loss 13.6847   LearningRate 0.0868   Epoch: 1   Global Step: 5640   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-26 12:48:24,623-Speed 3374.77 samples/sec   Loss 13.8036   LearningRate 0.0868   Epoch: 1   Global Step: 5650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:27,670-Speed 3361.96 samples/sec   Loss 13.6980   LearningRate 0.0868   Epoch: 1   Global Step: 5660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:30,704-Speed 3376.05 samples/sec   Loss 13.8385   LearningRate 0.0868   Epoch: 1   Global Step: 5670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:33,736-Speed 3377.41 samples/sec   Loss 13.7613   LearningRate 0.0867   Epoch: 1   Global Step: 5680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:36,765-Speed 3381.25 samples/sec   Loss 13.7327   LearningRate 0.0867   Epoch: 1   Global Step: 5690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:39,797-Speed 3377.81 samples/sec   Loss 13.9431   LearningRate 0.0867   Epoch: 1   Global Step: 5700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:42,831-Speed 3376.07 samples/sec   Loss 13.8463   LearningRate 0.0867   Epoch: 1   Global Step: 5710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:45,867-Speed 3373.46 samples/sec   Loss 13.7231   LearningRate 0.0866   Epoch: 1   Global Step: 5720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:48,901-Speed 3376.48 samples/sec   Loss 13.5518   LearningRate 0.0866   Epoch: 1   Global Step: 5730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:51,933-Speed 3378.20 samples/sec   Loss 13.6944   LearningRate 0.0866   Epoch: 1   Global Step: 5740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:54,956-Speed 3387.19 samples/sec   Loss 13.5322   LearningRate 0.0866   Epoch: 1   Global Step: 5750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:48:57,983-Speed 3384.19 samples/sec   Loss 13.8486   LearningRate 0.0866   Epoch: 1   Global Step: 5760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:01,009-Speed 3385.12 samples/sec   Loss 13.6332   LearningRate 0.0865   Epoch: 1   Global Step: 5770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:04,037-Speed 3382.47 samples/sec   Loss 13.7862   LearningRate 0.0865   Epoch: 1   Global Step: 5780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:07,066-Speed 3381.37 samples/sec   Loss 13.7414   LearningRate 0.0865   Epoch: 1   Global Step: 5790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:10,092-Speed 3383.87 samples/sec   Loss 13.7085   LearningRate 0.0865   Epoch: 1   Global Step: 5800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:13,122-Speed 3380.58 samples/sec   Loss 13.7492   LearningRate 0.0864   Epoch: 1   Global Step: 5810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:16,152-Speed 3380.38 samples/sec   Loss 13.5869   LearningRate 0.0864   Epoch: 1   Global Step: 5820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:19,181-Speed 3382.03 samples/sec   Loss 13.5854   LearningRate 0.0864   Epoch: 1   Global Step: 5830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:22,209-Speed 3382.86 samples/sec   Loss 13.6126   LearningRate 0.0864   Epoch: 1   Global Step: 5840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:25,228-Speed 3391.96 samples/sec   Loss 13.5656   LearningRate 0.0864   Epoch: 1   Global Step: 5850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:28,303-Speed 3331.57 samples/sec   Loss 13.7084   LearningRate 0.0863   Epoch: 1   Global Step: 5860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:31,333-Speed 3380.04 samples/sec   Loss 13.8249   LearningRate 0.0863   Epoch: 1   Global Step: 5870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:34,360-Speed 3384.02 samples/sec   Loss 13.6351   LearningRate 0.0863   Epoch: 1   Global Step: 5880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:37,390-Speed 3379.39 samples/sec   Loss 13.5711   LearningRate 0.0863   Epoch: 1   Global Step: 5890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:40,434-Speed 3365.04 samples/sec   Loss 13.4294   LearningRate 0.0862   Epoch: 1   Global Step: 5900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:43,466-Speed 3378.20 samples/sec   Loss 13.5320   LearningRate 0.0862   Epoch: 1   Global Step: 5910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:46,502-Speed 3372.94 samples/sec   Loss 13.5402   LearningRate 0.0862   Epoch: 1   Global Step: 5920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:49,531-Speed 3381.72 samples/sec   Loss 13.7207   LearningRate 0.0862   Epoch: 1   Global Step: 5930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:52,561-Speed 3380.92 samples/sec   Loss 13.4587   LearningRate 0.0862   Epoch: 1   Global Step: 5940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:49:55,591-Speed 3379.56 samples/sec   Loss 13.5633   LearningRate 0.0861   Epoch: 1   Global Step: 5950   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-26 12:49:58,612-Speed 3390.60 samples/sec   Loss 13.3671   LearningRate 0.0861   Epoch: 1   Global Step: 5960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:50:01,644-Speed 3379.13 samples/sec   Loss 13.5700   LearningRate 0.0861   Epoch: 1   Global Step: 5970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:50:04,673-Speed 3381.25 samples/sec   Loss 13.6084   LearningRate 0.0861   Epoch: 1   Global Step: 5980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:50:07,700-Speed 3383.13 samples/sec   Loss 13.5340   LearningRate 0.0860   Epoch: 1   Global Step: 5990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:50:10,731-Speed 3379.27 samples/sec   Loss 13.5538   LearningRate 0.0860   Epoch: 1   Global Step: 6000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 12:50:54,080-[lfw][6000]XNorm: 21.976314
Training: 2022-04-26 12:50:54,081-[lfw][6000]Accuracy-Flip: 0.99383+-0.00373
Training: 2022-04-26 12:50:54,081-[lfw][6000]Accuracy-Highest: 0.99383
Training: 2022-04-26 12:51:44,535-[cfp_fp][6000]XNorm: 20.315681
Training: 2022-04-26 12:51:44,536-[cfp_fp][6000]Accuracy-Flip: 0.94657+-0.01120
Training: 2022-04-26 12:51:44,536-[cfp_fp][6000]Accuracy-Highest: 0.94657
Training: 2022-04-26 12:52:28,178-[agedb_30][6000]XNorm: 21.829312
Training: 2022-04-26 12:52:28,178-[agedb_30][6000]Accuracy-Flip: 0.94100+-0.00892
Training: 2022-04-26 12:52:28,179-[agedb_30][6000]Accuracy-Highest: 0.94100
Training: 2022-04-26 12:52:31,214-Speed 72.89 samples/sec   Loss 13.6190   LearningRate 0.0860   Epoch: 1   Global Step: 6010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:52:34,245-Speed 3379.18 samples/sec   Loss 13.6031   LearningRate 0.0860   Epoch: 1   Global Step: 6020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:52:37,277-Speed 3377.62 samples/sec   Loss 13.6204   LearningRate 0.0859   Epoch: 1   Global Step: 6030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:52:40,306-Speed 3381.97 samples/sec   Loss 13.6042   LearningRate 0.0859   Epoch: 1   Global Step: 6040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:52:43,342-Speed 3373.97 samples/sec   Loss 13.5205   LearningRate 0.0859   Epoch: 1   Global Step: 6050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:52:46,375-Speed 3376.57 samples/sec   Loss 13.4283   LearningRate 0.0859   Epoch: 1   Global Step: 6060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:52:49,414-Speed 3369.95 samples/sec   Loss 13.3766   LearningRate 0.0859   Epoch: 1   Global Step: 6070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:52:52,461-Speed 3362.02 samples/sec   Loss 13.5656   LearningRate 0.0858   Epoch: 1   Global Step: 6080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:52:55,506-Speed 3363.38 samples/sec   Loss 13.4418   LearningRate 0.0858   Epoch: 1   Global Step: 6090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:52:58,549-Speed 3366.00 samples/sec   Loss 13.5677   LearningRate 0.0858   Epoch: 1   Global Step: 6100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:53:01,596-Speed 3361.65 samples/sec   Loss 13.5504   LearningRate 0.0858   Epoch: 1   Global Step: 6110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:53:04,651-Speed 3352.14 samples/sec   Loss 13.4097   LearningRate 0.0857   Epoch: 1   Global Step: 6120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:53:07,709-Speed 3349.03 samples/sec   Loss 13.4451   LearningRate 0.0857   Epoch: 1   Global Step: 6130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:53:10,741-Speed 3378.59 samples/sec   Loss 13.4071   LearningRate 0.0857   Epoch: 1   Global Step: 6140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:53:13,801-Speed 3347.18 samples/sec   Loss 13.5547   LearningRate 0.0857   Epoch: 1   Global Step: 6150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:53:16,846-Speed 3363.62 samples/sec   Loss 13.5209   LearningRate 0.0857   Epoch: 1   Global Step: 6160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:53:19,889-Speed 3365.76 samples/sec   Loss 13.5813   LearningRate 0.0856   Epoch: 1   Global Step: 6170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:53:22,939-Speed 3358.61 samples/sec   Loss 13.3664   LearningRate 0.0856   Epoch: 1   Global Step: 6180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:53:26,005-Speed 3339.86 samples/sec   Loss 13.4231   LearningRate 0.0856   Epoch: 1   Global Step: 6190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:53:29,067-Speed 3345.08 samples/sec   Loss 13.4517   LearningRate 0.0856   Epoch: 1   Global Step: 6200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:53:32,111-Speed 3364.45 samples/sec   Loss 13.3166   LearningRate 0.0855   Epoch: 1   Global Step: 6210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:53:35,166-Speed 3353.51 samples/sec   Loss 13.3456   LearningRate 0.0855   Epoch: 1   Global Step: 6220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:53:38,232-Speed 3340.39 samples/sec   Loss 13.3879   LearningRate 0.0855   Epoch: 1   Global Step: 6230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:53:41,290-Speed 3348.62 samples/sec   Loss 13.3659   LearningRate 0.0855   Epoch: 1   Global Step: 6240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:53:44,348-Speed 3349.93 samples/sec   Loss 13.3999   LearningRate 0.0855   Epoch: 1   Global Step: 6250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:53:47,403-Speed 3352.35 samples/sec   Loss 13.4071   LearningRate 0.0854   Epoch: 1   Global Step: 6260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:53:50,458-Speed 3353.12 samples/sec   Loss 13.2980   LearningRate 0.0854   Epoch: 1   Global Step: 6270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:53:53,507-Speed 3358.81 samples/sec   Loss 13.4056   LearningRate 0.0854   Epoch: 1   Global Step: 6280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:53:56,551-Speed 3365.00 samples/sec   Loss 13.5222   LearningRate 0.0854   Epoch: 1   Global Step: 6290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:53:59,599-Speed 3360.89 samples/sec   Loss 13.3536   LearningRate 0.0853   Epoch: 1   Global Step: 6300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:54:02,642-Speed 3364.80 samples/sec   Loss 13.2913   LearningRate 0.0853   Epoch: 1   Global Step: 6310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:54:05,689-Speed 3362.26 samples/sec   Loss 13.3976   LearningRate 0.0853   Epoch: 1   Global Step: 6320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:54:08,733-Speed 3364.33 samples/sec   Loss 13.4070   LearningRate 0.0853   Epoch: 1   Global Step: 6330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:54:11,772-Speed 3370.26 samples/sec   Loss 13.2746   LearningRate 0.0853   Epoch: 1   Global Step: 6340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:54:14,809-Speed 3372.80 samples/sec   Loss 13.1939   LearningRate 0.0852   Epoch: 1   Global Step: 6350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:54:17,845-Speed 3373.79 samples/sec   Loss 13.3328   LearningRate 0.0852   Epoch: 1   Global Step: 6360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:54:20,878-Speed 3376.86 samples/sec   Loss 13.2100   LearningRate 0.0852   Epoch: 1   Global Step: 6370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:54:23,913-Speed 3374.16 samples/sec   Loss 13.4073   LearningRate 0.0852   Epoch: 1   Global Step: 6380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:54:26,947-Speed 3376.25 samples/sec   Loss 13.3002   LearningRate 0.0851   Epoch: 1   Global Step: 6390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:54:29,981-Speed 3375.62 samples/sec   Loss 13.2360   LearningRate 0.0851   Epoch: 1   Global Step: 6400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:54:33,009-Speed 3382.65 samples/sec   Loss 13.2052   LearningRate 0.0851   Epoch: 1   Global Step: 6410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:54:36,047-Speed 3370.96 samples/sec   Loss 13.2965   LearningRate 0.0851   Epoch: 1   Global Step: 6420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:54:39,091-Speed 3365.61 samples/sec   Loss 13.3097   LearningRate 0.0851   Epoch: 1   Global Step: 6430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:54:42,127-Speed 3372.71 samples/sec   Loss 13.1351   LearningRate 0.0850   Epoch: 1   Global Step: 6440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:54:45,150-Speed 3388.00 samples/sec   Loss 13.1818   LearningRate 0.0850   Epoch: 1   Global Step: 6450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:54:48,179-Speed 3381.68 samples/sec   Loss 13.4330   LearningRate 0.0850   Epoch: 1   Global Step: 6460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:54:51,204-Speed 3386.15 samples/sec   Loss 13.2482   LearningRate 0.0850   Epoch: 1   Global Step: 6470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:54:54,244-Speed 3369.28 samples/sec   Loss 13.2318   LearningRate 0.0849   Epoch: 1   Global Step: 6480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:54:57,292-Speed 3360.49 samples/sec   Loss 13.2518   LearningRate 0.0849   Epoch: 1   Global Step: 6490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:55:00,314-Speed 3389.52 samples/sec   Loss 13.1447   LearningRate 0.0849   Epoch: 1   Global Step: 6500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:55:03,339-Speed 3386.19 samples/sec   Loss 13.2137   LearningRate 0.0849   Epoch: 1   Global Step: 6510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:55:06,360-Speed 3389.89 samples/sec   Loss 13.2117   LearningRate 0.0849   Epoch: 1   Global Step: 6520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:55:09,373-Speed 3398.59 samples/sec   Loss 13.2274   LearningRate 0.0848   Epoch: 1   Global Step: 6530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:55:12,405-Speed 3378.03 samples/sec   Loss 13.0391   LearningRate 0.0848   Epoch: 1   Global Step: 6540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:55:15,458-Speed 3355.10 samples/sec   Loss 13.2067   LearningRate 0.0848   Epoch: 1   Global Step: 6550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:55:18,490-Speed 3378.52 samples/sec   Loss 13.2220   LearningRate 0.0848   Epoch: 1   Global Step: 6560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:55:21,516-Speed 3384.91 samples/sec   Loss 13.1584   LearningRate 0.0847   Epoch: 1   Global Step: 6570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:55:24,543-Speed 3383.77 samples/sec   Loss 13.1299   LearningRate 0.0847   Epoch: 1   Global Step: 6580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:55:27,624-Speed 3324.10 samples/sec   Loss 13.1807   LearningRate 0.0847   Epoch: 1   Global Step: 6590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:55:30,707-Speed 3322.32 samples/sec   Loss 13.1556   LearningRate 0.0847   Epoch: 1   Global Step: 6600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:55:33,738-Speed 3379.36 samples/sec   Loss 13.0751   LearningRate 0.0847   Epoch: 1   Global Step: 6610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:55:36,778-Speed 3368.23 samples/sec   Loss 13.2020   LearningRate 0.0846   Epoch: 1   Global Step: 6620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:55:39,814-Speed 3373.88 samples/sec   Loss 13.0997   LearningRate 0.0846   Epoch: 1   Global Step: 6630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:55:42,843-Speed 3381.01 samples/sec   Loss 13.0918   LearningRate 0.0846   Epoch: 1   Global Step: 6640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:55:45,882-Speed 3371.37 samples/sec   Loss 13.1033   LearningRate 0.0846   Epoch: 1   Global Step: 6650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:55:48,913-Speed 3378.84 samples/sec   Loss 13.0174   LearningRate 0.0845   Epoch: 1   Global Step: 6660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:55:51,965-Speed 3356.36 samples/sec   Loss 13.1363   LearningRate 0.0845   Epoch: 1   Global Step: 6670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:55:55,002-Speed 3371.90 samples/sec   Loss 13.0528   LearningRate 0.0845   Epoch: 1   Global Step: 6680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:55:58,033-Speed 3380.39 samples/sec   Loss 13.0787   LearningRate 0.0845   Epoch: 1   Global Step: 6690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:01,063-Speed 3379.32 samples/sec   Loss 13.0215   LearningRate 0.0845   Epoch: 1   Global Step: 6700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:04,093-Speed 3381.16 samples/sec   Loss 13.0090   LearningRate 0.0844   Epoch: 1   Global Step: 6710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:07,130-Speed 3372.26 samples/sec   Loss 12.9471   LearningRate 0.0844   Epoch: 1   Global Step: 6720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:10,160-Speed 3379.44 samples/sec   Loss 13.0689   LearningRate 0.0844   Epoch: 1   Global Step: 6730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:13,188-Speed 3382.66 samples/sec   Loss 12.9564   LearningRate 0.0844   Epoch: 1   Global Step: 6740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:16,228-Speed 3369.99 samples/sec   Loss 12.9427   LearningRate 0.0843   Epoch: 1   Global Step: 6750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:19,269-Speed 3367.80 samples/sec   Loss 12.9968   LearningRate 0.0843   Epoch: 1   Global Step: 6760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:22,304-Speed 3375.15 samples/sec   Loss 13.0423   LearningRate 0.0843   Epoch: 1   Global Step: 6770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:25,332-Speed 3382.16 samples/sec   Loss 12.9160   LearningRate 0.0843   Epoch: 1   Global Step: 6780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:28,363-Speed 3378.92 samples/sec   Loss 12.9316   LearningRate 0.0843   Epoch: 1   Global Step: 6790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:31,393-Speed 3381.12 samples/sec   Loss 13.0771   LearningRate 0.0842   Epoch: 1   Global Step: 6800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:34,423-Speed 3379.80 samples/sec   Loss 13.1145   LearningRate 0.0842   Epoch: 1   Global Step: 6810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:37,462-Speed 3370.30 samples/sec   Loss 13.0667   LearningRate 0.0842   Epoch: 1   Global Step: 6820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:40,491-Speed 3380.77 samples/sec   Loss 13.0421   LearningRate 0.0842   Epoch: 1   Global Step: 6830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:43,523-Speed 3378.78 samples/sec   Loss 12.8828   LearningRate 0.0841   Epoch: 1   Global Step: 6840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:46,554-Speed 3379.55 samples/sec   Loss 12.8507   LearningRate 0.0841   Epoch: 1   Global Step: 6850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:49,583-Speed 3381.61 samples/sec   Loss 12.9342   LearningRate 0.0841   Epoch: 1   Global Step: 6860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:52,620-Speed 3371.88 samples/sec   Loss 12.8631   LearningRate 0.0841   Epoch: 1   Global Step: 6870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:55,655-Speed 3374.61 samples/sec   Loss 12.8289   LearningRate 0.0841   Epoch: 1   Global Step: 6880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:56:58,681-Speed 3385.56 samples/sec   Loss 12.9975   LearningRate 0.0840   Epoch: 1   Global Step: 6890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:01,708-Speed 3382.67 samples/sec   Loss 12.9266   LearningRate 0.0840   Epoch: 1   Global Step: 6900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:04,744-Speed 3374.66 samples/sec   Loss 13.0318   LearningRate 0.0840   Epoch: 1   Global Step: 6910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:07,775-Speed 3378.66 samples/sec   Loss 12.7985   LearningRate 0.0840   Epoch: 1   Global Step: 6920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:10,812-Speed 3371.82 samples/sec   Loss 12.9299   LearningRate 0.0839   Epoch: 1   Global Step: 6930   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-26 12:57:13,830-Speed 3393.97 samples/sec   Loss 12.8600   LearningRate 0.0839   Epoch: 1   Global Step: 6940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:16,858-Speed 3383.04 samples/sec   Loss 12.8057   LearningRate 0.0839   Epoch: 1   Global Step: 6950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:19,885-Speed 3384.57 samples/sec   Loss 12.9355   LearningRate 0.0839   Epoch: 1   Global Step: 6960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:22,919-Speed 3374.72 samples/sec   Loss 12.8403   LearningRate 0.0839   Epoch: 1   Global Step: 6970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:25,955-Speed 3374.42 samples/sec   Loss 12.8843   LearningRate 0.0838   Epoch: 1   Global Step: 6980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:29,036-Speed 3324.37 samples/sec   Loss 12.9865   LearningRate 0.0838   Epoch: 1   Global Step: 6990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:32,067-Speed 3378.62 samples/sec   Loss 12.8528   LearningRate 0.0838   Epoch: 1   Global Step: 7000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:35,097-Speed 3380.46 samples/sec   Loss 13.0202   LearningRate 0.0838   Epoch: 1   Global Step: 7010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:38,131-Speed 3376.11 samples/sec   Loss 12.9088   LearningRate 0.0837   Epoch: 1   Global Step: 7020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:41,154-Speed 3387.66 samples/sec   Loss 12.7726   LearningRate 0.0837   Epoch: 1   Global Step: 7030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:44,177-Speed 3388.03 samples/sec   Loss 12.7834   LearningRate 0.0837   Epoch: 1   Global Step: 7040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:47,220-Speed 3366.70 samples/sec   Loss 13.0775   LearningRate 0.0837   Epoch: 1   Global Step: 7050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:50,248-Speed 3382.40 samples/sec   Loss 12.9288   LearningRate 0.0837   Epoch: 1   Global Step: 7060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:53,275-Speed 3383.17 samples/sec   Loss 12.8598   LearningRate 0.0836   Epoch: 1   Global Step: 7070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:56,305-Speed 3380.12 samples/sec   Loss 12.8392   LearningRate 0.0836   Epoch: 1   Global Step: 7080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:57:59,343-Speed 3371.70 samples/sec   Loss 12.8853   LearningRate 0.0836   Epoch: 1   Global Step: 7090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:58:02,371-Speed 3382.97 samples/sec   Loss 12.6396   LearningRate 0.0836   Epoch: 1   Global Step: 7100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:58:05,399-Speed 3381.75 samples/sec   Loss 12.8098   LearningRate 0.0835   Epoch: 1   Global Step: 7110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:58:08,439-Speed 3368.91 samples/sec   Loss 12.9450   LearningRate 0.0835   Epoch: 1   Global Step: 7120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:58:11,482-Speed 3366.34 samples/sec   Loss 12.8153   LearningRate 0.0835   Epoch: 1   Global Step: 7130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:58:14,522-Speed 3368.92 samples/sec   Loss 12.8526   LearningRate 0.0835   Epoch: 1   Global Step: 7140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:58:17,553-Speed 3379.91 samples/sec   Loss 12.8762   LearningRate 0.0835   Epoch: 1   Global Step: 7150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:58:20,582-Speed 3381.06 samples/sec   Loss 12.7645   LearningRate 0.0834   Epoch: 1   Global Step: 7160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:58:23,621-Speed 3370.35 samples/sec   Loss 12.7657   LearningRate 0.0834   Epoch: 1   Global Step: 7170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:58:26,661-Speed 3369.53 samples/sec   Loss 12.6751   LearningRate 0.0834   Epoch: 1   Global Step: 7180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:58:29,688-Speed 3383.00 samples/sec   Loss 12.8973   LearningRate 0.0834   Epoch: 1   Global Step: 7190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:58:32,731-Speed 3366.60 samples/sec   Loss 12.9011   LearningRate 0.0833   Epoch: 1   Global Step: 7200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:58:35,756-Speed 3384.82 samples/sec   Loss 12.7724   LearningRate 0.0833   Epoch: 1   Global Step: 7210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:58:38,813-Speed 3350.38 samples/sec   Loss 12.8571   LearningRate 0.0833   Epoch: 1   Global Step: 7220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:58:41,842-Speed 3381.97 samples/sec   Loss 12.8714   LearningRate 0.0833   Epoch: 1   Global Step: 7230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:58:44,879-Speed 3372.29 samples/sec   Loss 12.8696   LearningRate 0.0833   Epoch: 1   Global Step: 7240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:58:47,910-Speed 3379.46 samples/sec   Loss 12.7628   LearningRate 0.0832   Epoch: 1   Global Step: 7250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:58:50,941-Speed 3379.86 samples/sec   Loss 12.7811   LearningRate 0.0832   Epoch: 1   Global Step: 7260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:58:53,970-Speed 3380.65 samples/sec   Loss 12.7829   LearningRate 0.0832   Epoch: 1   Global Step: 7270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:58:57,005-Speed 3375.39 samples/sec   Loss 12.8279   LearningRate 0.0832   Epoch: 1   Global Step: 7280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:59:00,041-Speed 3373.29 samples/sec   Loss 12.7130   LearningRate 0.0831   Epoch: 1   Global Step: 7290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:59:03,093-Speed 3356.25 samples/sec   Loss 12.6692   LearningRate 0.0831   Epoch: 1   Global Step: 7300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 12:59:06,127-Speed 3375.87 samples/sec   Loss 12.6781   LearningRate 0.0831   Epoch: 1   Global Step: 7310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:09,165-Speed 3371.41 samples/sec   Loss 12.7615   LearningRate 0.0831   Epoch: 1   Global Step: 7320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:12,199-Speed 3374.94 samples/sec   Loss 12.7982   LearningRate 0.0831   Epoch: 1   Global Step: 7330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:15,243-Speed 3365.62 samples/sec   Loss 12.7049   LearningRate 0.0830   Epoch: 1   Global Step: 7340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:18,278-Speed 3374.49 samples/sec   Loss 12.5581   LearningRate 0.0830   Epoch: 1   Global Step: 7350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:21,309-Speed 3379.17 samples/sec   Loss 12.6024   LearningRate 0.0830   Epoch: 1   Global Step: 7360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:24,342-Speed 3377.20 samples/sec   Loss 12.4712   LearningRate 0.0830   Epoch: 1   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:27,372-Speed 3380.47 samples/sec   Loss 12.5022   LearningRate 0.0829   Epoch: 1   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:30,400-Speed 3381.84 samples/sec   Loss 12.5797   LearningRate 0.0829   Epoch: 1   Global Step: 7390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:33,428-Speed 3382.48 samples/sec   Loss 12.6613   LearningRate 0.0829   Epoch: 1   Global Step: 7400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:36,461-Speed 3376.99 samples/sec   Loss 12.7450   LearningRate 0.0829   Epoch: 1   Global Step: 7410   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-26 12:59:39,476-Speed 3396.94 samples/sec   Loss 12.6389   LearningRate 0.0829   Epoch: 1   Global Step: 7420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:42,526-Speed 3358.05 samples/sec   Loss 12.5288   LearningRate 0.0828   Epoch: 1   Global Step: 7430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:45,565-Speed 3370.44 samples/sec   Loss 12.6072   LearningRate 0.0828   Epoch: 1   Global Step: 7440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:48,602-Speed 3372.72 samples/sec   Loss 12.4775   LearningRate 0.0828   Epoch: 1   Global Step: 7450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:51,641-Speed 3370.42 samples/sec   Loss 12.6329   LearningRate 0.0828   Epoch: 1   Global Step: 7460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:54,677-Speed 3373.34 samples/sec   Loss 12.6828   LearningRate 0.0828   Epoch: 1   Global Step: 7470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 12:59:57,709-Speed 3377.97 samples/sec   Loss 12.6261   LearningRate 0.0827   Epoch: 1   Global Step: 7480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:00:00,748-Speed 3371.18 samples/sec   Loss 12.4271   LearningRate 0.0827   Epoch: 1   Global Step: 7490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:00:03,788-Speed 3368.43 samples/sec   Loss 12.5362   LearningRate 0.0827   Epoch: 1   Global Step: 7500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:00:06,820-Speed 3378.85 samples/sec   Loss 12.6917   LearningRate 0.0827   Epoch: 1   Global Step: 7510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:00:09,852-Speed 3377.03 samples/sec   Loss 12.4360   LearningRate 0.0826   Epoch: 1   Global Step: 7520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:00:12,903-Speed 3357.36 samples/sec   Loss 12.4817   LearningRate 0.0826   Epoch: 1   Global Step: 7530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:00:15,939-Speed 3373.98 samples/sec   Loss 12.6777   LearningRate 0.0826   Epoch: 1   Global Step: 7540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:00:18,972-Speed 3376.54 samples/sec   Loss 12.6952   LearningRate 0.0826   Epoch: 1   Global Step: 7550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:00:22,001-Speed 3381.55 samples/sec   Loss 12.5781   LearningRate 0.0826   Epoch: 1   Global Step: 7560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:00:25,037-Speed 3374.07 samples/sec   Loss 12.5650   LearningRate 0.0825   Epoch: 1   Global Step: 7570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:00:28,071-Speed 3375.50 samples/sec   Loss 12.5490   LearningRate 0.0825   Epoch: 1   Global Step: 7580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:00:31,099-Speed 3382.18 samples/sec   Loss 12.4645   LearningRate 0.0825   Epoch: 1   Global Step: 7590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:00:34,129-Speed 3380.99 samples/sec   Loss 12.5535   LearningRate 0.0825   Epoch: 1   Global Step: 7600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:00:37,160-Speed 3378.76 samples/sec   Loss 12.5433   LearningRate 0.0824   Epoch: 1   Global Step: 7610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:00:40,178-Speed 3393.33 samples/sec   Loss 12.3675   LearningRate 0.0824   Epoch: 1   Global Step: 7620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:00:43,207-Speed 3382.05 samples/sec   Loss 12.4706   LearningRate 0.0824   Epoch: 1   Global Step: 7630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:00:46,242-Speed 3375.09 samples/sec   Loss 12.3383   LearningRate 0.0824   Epoch: 1   Global Step: 7640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:00:49,308-Speed 3340.05 samples/sec   Loss 12.5532   LearningRate 0.0824   Epoch: 1   Global Step: 7650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:00:52,464-Speed 3245.91 samples/sec   Loss 12.5462   LearningRate 0.0823   Epoch: 1   Global Step: 7660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:00:55,503-Speed 3370.28 samples/sec   Loss 12.6861   LearningRate 0.0823   Epoch: 1   Global Step: 7670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:00:58,536-Speed 3376.88 samples/sec   Loss 12.2883   LearningRate 0.0823   Epoch: 1   Global Step: 7680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:01,570-Speed 3374.92 samples/sec   Loss 12.5902   LearningRate 0.0823   Epoch: 1   Global Step: 7690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:04,607-Speed 3372.96 samples/sec   Loss 12.3789   LearningRate 0.0822   Epoch: 1   Global Step: 7700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:07,641-Speed 3375.76 samples/sec   Loss 12.6980   LearningRate 0.0822   Epoch: 1   Global Step: 7710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:10,659-Speed 3394.03 samples/sec   Loss 12.3882   LearningRate 0.0822   Epoch: 1   Global Step: 7720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:13,700-Speed 3368.14 samples/sec   Loss 12.3676   LearningRate 0.0822   Epoch: 1   Global Step: 7730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:16,732-Speed 3378.18 samples/sec   Loss 12.3058   LearningRate 0.0822   Epoch: 1   Global Step: 7740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:19,762-Speed 3380.46 samples/sec   Loss 12.5252   LearningRate 0.0821   Epoch: 1   Global Step: 7750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:22,822-Speed 3347.08 samples/sec   Loss 12.2930   LearningRate 0.0821   Epoch: 1   Global Step: 7760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:25,859-Speed 3372.80 samples/sec   Loss 12.4670   LearningRate 0.0821   Epoch: 1   Global Step: 7770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:28,908-Speed 3358.48 samples/sec   Loss 12.3782   LearningRate 0.0821   Epoch: 1   Global Step: 7780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:31,935-Speed 3383.59 samples/sec   Loss 12.4116   LearningRate 0.0820   Epoch: 1   Global Step: 7790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:34,977-Speed 3367.10 samples/sec   Loss 12.3073   LearningRate 0.0820   Epoch: 1   Global Step: 7800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:38,021-Speed 3364.78 samples/sec   Loss 12.4931   LearningRate 0.0820   Epoch: 1   Global Step: 7810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:41,045-Speed 3386.80 samples/sec   Loss 12.3458   LearningRate 0.0820   Epoch: 1   Global Step: 7820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:44,079-Speed 3376.85 samples/sec   Loss 12.4519   LearningRate 0.0820   Epoch: 1   Global Step: 7830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:47,109-Speed 3379.67 samples/sec   Loss 12.5452   LearningRate 0.0819   Epoch: 1   Global Step: 7840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:50,151-Speed 3366.93 samples/sec   Loss 12.2739   LearningRate 0.0819   Epoch: 1   Global Step: 7850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:53,202-Speed 3357.14 samples/sec   Loss 12.3231   LearningRate 0.0819   Epoch: 1   Global Step: 7860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:56,239-Speed 3373.21 samples/sec   Loss 12.4183   LearningRate 0.0819   Epoch: 1   Global Step: 7870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:01:59,274-Speed 3373.79 samples/sec   Loss 12.3709   LearningRate 0.0819   Epoch: 1   Global Step: 7880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:02:02,299-Speed 3386.65 samples/sec   Loss 12.2883   LearningRate 0.0818   Epoch: 1   Global Step: 7890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:02:05,354-Speed 3351.97 samples/sec   Loss 12.2775   LearningRate 0.0818   Epoch: 1   Global Step: 7900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:02:08,392-Speed 3371.17 samples/sec   Loss 12.3129   LearningRate 0.0818   Epoch: 1   Global Step: 7910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:02:11,428-Speed 3374.55 samples/sec   Loss 12.3857   LearningRate 0.0818   Epoch: 1   Global Step: 7920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:02:14,483-Speed 3352.88 samples/sec   Loss 12.2941   LearningRate 0.0817   Epoch: 1   Global Step: 7930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:02:17,525-Speed 3367.02 samples/sec   Loss 12.1905   LearningRate 0.0817   Epoch: 1   Global Step: 7940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:02:20,560-Speed 3373.82 samples/sec   Loss 12.2637   LearningRate 0.0817   Epoch: 1   Global Step: 7950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:02:23,591-Speed 3379.35 samples/sec   Loss 12.3883   LearningRate 0.0817   Epoch: 1   Global Step: 7960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:02:26,622-Speed 3379.32 samples/sec   Loss 12.1573   LearningRate 0.0817   Epoch: 1   Global Step: 7970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:02:29,653-Speed 3379.00 samples/sec   Loss 12.0458   LearningRate 0.0816   Epoch: 1   Global Step: 7980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:02:32,683-Speed 3380.78 samples/sec   Loss 12.3719   LearningRate 0.0816   Epoch: 1   Global Step: 7990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:02:35,726-Speed 3365.36 samples/sec   Loss 12.3211   LearningRate 0.0816   Epoch: 1   Global Step: 8000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:03:19,202-[lfw][8000]XNorm: 23.443890
Training: 2022-04-26 13:03:19,203-[lfw][8000]Accuracy-Flip: 0.99583+-0.00261
Training: 2022-04-26 13:03:19,203-[lfw][8000]Accuracy-Highest: 0.99583
Training: 2022-04-26 13:04:09,656-[cfp_fp][8000]XNorm: 21.457723
Training: 2022-04-26 13:04:09,656-[cfp_fp][8000]Accuracy-Flip: 0.96486+-0.00806
Training: 2022-04-26 13:04:09,657-[cfp_fp][8000]Accuracy-Highest: 0.96486
Training: 2022-04-26 13:04:53,018-[agedb_30][8000]XNorm: 23.139969
Training: 2022-04-26 13:04:53,018-[agedb_30][8000]Accuracy-Flip: 0.94700+-0.01024
Training: 2022-04-26 13:04:53,019-[agedb_30][8000]Accuracy-Highest: 0.94700
Training: 2022-04-26 13:04:56,050-Speed 72.97 samples/sec   Loss 12.2026   LearningRate 0.0816   Epoch: 1   Global Step: 8010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:04:59,081-Speed 3379.65 samples/sec   Loss 12.3251   LearningRate 0.0815   Epoch: 1   Global Step: 8020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:02,122-Speed 3367.73 samples/sec   Loss 12.2858   LearningRate 0.0815   Epoch: 1   Global Step: 8030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:05,161-Speed 3370.28 samples/sec   Loss 12.2768   LearningRate 0.0815   Epoch: 1   Global Step: 8040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:08,193-Speed 3377.64 samples/sec   Loss 12.3160   LearningRate 0.0815   Epoch: 1   Global Step: 8050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:11,230-Speed 3373.35 samples/sec   Loss 12.2696   LearningRate 0.0815   Epoch: 1   Global Step: 8060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:14,270-Speed 3368.67 samples/sec   Loss 12.2340   LearningRate 0.0814   Epoch: 1   Global Step: 8070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:17,319-Speed 3358.99 samples/sec   Loss 12.1417   LearningRate 0.0814   Epoch: 1   Global Step: 8080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:20,356-Speed 3372.41 samples/sec   Loss 12.1553   LearningRate 0.0814   Epoch: 1   Global Step: 8090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:23,415-Speed 3349.33 samples/sec   Loss 12.2517   LearningRate 0.0814   Epoch: 1   Global Step: 8100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:26,452-Speed 3372.06 samples/sec   Loss 12.2136   LearningRate 0.0813   Epoch: 1   Global Step: 8110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:29,493-Speed 3367.72 samples/sec   Loss 12.1890   LearningRate 0.0813   Epoch: 1   Global Step: 8120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:32,529-Speed 3373.63 samples/sec   Loss 12.2110   LearningRate 0.0813   Epoch: 1   Global Step: 8130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:35,564-Speed 3375.88 samples/sec   Loss 12.0840   LearningRate 0.0813   Epoch: 1   Global Step: 8140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:38,598-Speed 3375.40 samples/sec   Loss 12.2089   LearningRate 0.0813   Epoch: 1   Global Step: 8150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:41,631-Speed 3376.19 samples/sec   Loss 12.2774   LearningRate 0.0812   Epoch: 1   Global Step: 8160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:44,662-Speed 3379.96 samples/sec   Loss 12.3149   LearningRate 0.0812   Epoch: 1   Global Step: 8170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:47,688-Speed 3384.09 samples/sec   Loss 12.2998   LearningRate 0.0812   Epoch: 1   Global Step: 8180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:50,713-Speed 3386.51 samples/sec   Loss 12.1716   LearningRate 0.0812   Epoch: 1   Global Step: 8190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:53,759-Speed 3362.52 samples/sec   Loss 12.2259   LearningRate 0.0812   Epoch: 1   Global Step: 8200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:56,790-Speed 3378.89 samples/sec   Loss 12.1992   LearningRate 0.0811   Epoch: 1   Global Step: 8210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:05:59,829-Speed 3369.86 samples/sec   Loss 12.2270   LearningRate 0.0811   Epoch: 1   Global Step: 8220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:02,875-Speed 3362.92 samples/sec   Loss 12.0249   LearningRate 0.0811   Epoch: 1   Global Step: 8230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:05,903-Speed 3383.22 samples/sec   Loss 12.2160   LearningRate 0.0811   Epoch: 1   Global Step: 8240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:08,932-Speed 3381.60 samples/sec   Loss 12.1470   LearningRate 0.0810   Epoch: 1   Global Step: 8250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:12,021-Speed 3315.09 samples/sec   Loss 12.2451   LearningRate 0.0810   Epoch: 1   Global Step: 8260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:15,052-Speed 3379.70 samples/sec   Loss 12.0702   LearningRate 0.0810   Epoch: 1   Global Step: 8270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:27,268-Speed 838.28 samples/sec   Loss 10.1839   LearningRate 0.0810   Epoch: 2   Global Step: 8280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:30,286-Speed 3394.26 samples/sec   Loss 10.1101   LearningRate 0.0810   Epoch: 2   Global Step: 8290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:33,325-Speed 3370.05 samples/sec   Loss 10.2044   LearningRate 0.0809   Epoch: 2   Global Step: 8300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:36,351-Speed 3384.32 samples/sec   Loss 10.1983   LearningRate 0.0809   Epoch: 2   Global Step: 8310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:39,371-Speed 3391.73 samples/sec   Loss 10.1698   LearningRate 0.0809   Epoch: 2   Global Step: 8320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:42,396-Speed 3386.57 samples/sec   Loss 10.2550   LearningRate 0.0809   Epoch: 2   Global Step: 8330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:45,419-Speed 3387.75 samples/sec   Loss 10.2145   LearningRate 0.0808   Epoch: 2   Global Step: 8340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:48,444-Speed 3386.23 samples/sec   Loss 10.0905   LearningRate 0.0808   Epoch: 2   Global Step: 8350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:51,468-Speed 3387.01 samples/sec   Loss 10.1838   LearningRate 0.0808   Epoch: 2   Global Step: 8360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:54,493-Speed 3385.86 samples/sec   Loss 10.3224   LearningRate 0.0808   Epoch: 2   Global Step: 8370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:06:57,525-Speed 3378.21 samples/sec   Loss 10.3336   LearningRate 0.0808   Epoch: 2   Global Step: 8380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:00,553-Speed 3382.01 samples/sec   Loss 10.2115   LearningRate 0.0807   Epoch: 2   Global Step: 8390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:03,597-Speed 3364.45 samples/sec   Loss 10.2334   LearningRate 0.0807   Epoch: 2   Global Step: 8400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:06,643-Speed 3362.75 samples/sec   Loss 10.1984   LearningRate 0.0807   Epoch: 2   Global Step: 8410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:09,679-Speed 3373.33 samples/sec   Loss 10.3156   LearningRate 0.0807   Epoch: 2   Global Step: 8420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:12,718-Speed 3370.41 samples/sec   Loss 10.3888   LearningRate 0.0807   Epoch: 2   Global Step: 8430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:15,766-Speed 3360.21 samples/sec   Loss 10.2508   LearningRate 0.0806   Epoch: 2   Global Step: 8440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:18,805-Speed 3370.75 samples/sec   Loss 10.3240   LearningRate 0.0806   Epoch: 2   Global Step: 8450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:21,843-Speed 3371.62 samples/sec   Loss 10.3743   LearningRate 0.0806   Epoch: 2   Global Step: 8460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:24,891-Speed 3360.48 samples/sec   Loss 10.4642   LearningRate 0.0806   Epoch: 2   Global Step: 8470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:27,958-Speed 3338.99 samples/sec   Loss 10.4893   LearningRate 0.0805   Epoch: 2   Global Step: 8480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:30,993-Speed 3375.01 samples/sec   Loss 10.4551   LearningRate 0.0805   Epoch: 2   Global Step: 8490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:34,043-Speed 3358.41 samples/sec   Loss 10.4057   LearningRate 0.0805   Epoch: 2   Global Step: 8500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:37,098-Speed 3352.66 samples/sec   Loss 10.4798   LearningRate 0.0805   Epoch: 2   Global Step: 8510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:40,143-Speed 3363.19 samples/sec   Loss 10.4731   LearningRate 0.0805   Epoch: 2   Global Step: 8520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:43,187-Speed 3364.91 samples/sec   Loss 10.4337   LearningRate 0.0804   Epoch: 2   Global Step: 8530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:46,243-Speed 3351.88 samples/sec   Loss 10.4300   LearningRate 0.0804   Epoch: 2   Global Step: 8540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:49,290-Speed 3361.42 samples/sec   Loss 10.3359   LearningRate 0.0804   Epoch: 2   Global Step: 8550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:52,340-Speed 3358.77 samples/sec   Loss 10.5295   LearningRate 0.0804   Epoch: 2   Global Step: 8560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:55,383-Speed 3365.29 samples/sec   Loss 10.6487   LearningRate 0.0803   Epoch: 2   Global Step: 8570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:07:58,423-Speed 3368.84 samples/sec   Loss 10.6133   LearningRate 0.0803   Epoch: 2   Global Step: 8580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:01,451-Speed 3382.70 samples/sec   Loss 10.6255   LearningRate 0.0803   Epoch: 2   Global Step: 8590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:04,521-Speed 3336.72 samples/sec   Loss 10.6142   LearningRate 0.0803   Epoch: 2   Global Step: 8600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:07,564-Speed 3365.91 samples/sec   Loss 10.7189   LearningRate 0.0803   Epoch: 2   Global Step: 8610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:10,603-Speed 3370.26 samples/sec   Loss 10.7793   LearningRate 0.0802   Epoch: 2   Global Step: 8620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:13,634-Speed 3379.57 samples/sec   Loss 10.5492   LearningRate 0.0802   Epoch: 2   Global Step: 8630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:16,660-Speed 3383.90 samples/sec   Loss 10.6656   LearningRate 0.0802   Epoch: 2   Global Step: 8640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:19,693-Speed 3377.13 samples/sec   Loss 10.5744   LearningRate 0.0802   Epoch: 2   Global Step: 8650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:22,727-Speed 3375.85 samples/sec   Loss 10.5032   LearningRate 0.0802   Epoch: 2   Global Step: 8660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:25,760-Speed 3377.21 samples/sec   Loss 10.6392   LearningRate 0.0801   Epoch: 2   Global Step: 8670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:28,807-Speed 3360.88 samples/sec   Loss 10.7077   LearningRate 0.0801   Epoch: 2   Global Step: 8680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:31,833-Speed 3384.91 samples/sec   Loss 10.6323   LearningRate 0.0801   Epoch: 2   Global Step: 8690   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-26 13:08:34,846-Speed 3399.68 samples/sec   Loss 10.6443   LearningRate 0.0801   Epoch: 2   Global Step: 8700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:37,887-Speed 3368.28 samples/sec   Loss 10.6802   LearningRate 0.0800   Epoch: 2   Global Step: 8710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:40,909-Speed 3389.26 samples/sec   Loss 10.8063   LearningRate 0.0800   Epoch: 2   Global Step: 8720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:43,934-Speed 3385.98 samples/sec   Loss 10.6580   LearningRate 0.0800   Epoch: 2   Global Step: 8730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:46,955-Speed 3390.33 samples/sec   Loss 10.8262   LearningRate 0.0800   Epoch: 2   Global Step: 8740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:49,983-Speed 3382.36 samples/sec   Loss 10.8078   LearningRate 0.0800   Epoch: 2   Global Step: 8750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:53,010-Speed 3383.87 samples/sec   Loss 10.8879   LearningRate 0.0799   Epoch: 2   Global Step: 8760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:56,032-Speed 3390.01 samples/sec   Loss 10.6428   LearningRate 0.0799   Epoch: 2   Global Step: 8770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:08:59,061-Speed 3380.78 samples/sec   Loss 10.7654   LearningRate 0.0799   Epoch: 2   Global Step: 8780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:09:02,075-Speed 3398.72 samples/sec   Loss 10.8698   LearningRate 0.0799   Epoch: 2   Global Step: 8790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:09:05,094-Speed 3392.61 samples/sec   Loss 10.9735   LearningRate 0.0799   Epoch: 2   Global Step: 8800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:09:08,114-Speed 3390.83 samples/sec   Loss 10.8988   LearningRate 0.0798   Epoch: 2   Global Step: 8810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:09:11,136-Speed 3390.36 samples/sec   Loss 10.7847   LearningRate 0.0798   Epoch: 2   Global Step: 8820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:09:14,156-Speed 3390.39 samples/sec   Loss 10.8386   LearningRate 0.0798   Epoch: 2   Global Step: 8830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:09:17,201-Speed 3364.32 samples/sec   Loss 10.8056   LearningRate 0.0798   Epoch: 2   Global Step: 8840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:09:20,221-Speed 3391.84 samples/sec   Loss 10.8837   LearningRate 0.0797   Epoch: 2   Global Step: 8850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:09:23,245-Speed 3387.40 samples/sec   Loss 10.8434   LearningRate 0.0797   Epoch: 2   Global Step: 8860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:09:26,292-Speed 3360.99 samples/sec   Loss 10.9597   LearningRate 0.0797   Epoch: 2   Global Step: 8870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:09:29,328-Speed 3373.00 samples/sec   Loss 10.9234   LearningRate 0.0797   Epoch: 2   Global Step: 8880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:09:32,349-Speed 3390.83 samples/sec   Loss 11.0374   LearningRate 0.0797   Epoch: 2   Global Step: 8890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:09:35,368-Speed 3393.02 samples/sec   Loss 11.0354   LearningRate 0.0796   Epoch: 2   Global Step: 8900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:09:38,391-Speed 3387.36 samples/sec   Loss 10.9568   LearningRate 0.0796   Epoch: 2   Global Step: 8910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:09:41,409-Speed 3393.85 samples/sec   Loss 10.9878   LearningRate 0.0796   Epoch: 2   Global Step: 8920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:09:44,432-Speed 3388.46 samples/sec   Loss 10.8927   LearningRate 0.0796   Epoch: 2   Global Step: 8930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:09:47,478-Speed 3362.72 samples/sec   Loss 10.9144   LearningRate 0.0795   Epoch: 2   Global Step: 8940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:09:50,507-Speed 3381.07 samples/sec   Loss 10.9892   LearningRate 0.0795   Epoch: 2   Global Step: 8950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:09:53,538-Speed 3379.33 samples/sec   Loss 10.9655   LearningRate 0.0795   Epoch: 2   Global Step: 8960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:09:56,570-Speed 3378.11 samples/sec   Loss 11.1116   LearningRate 0.0795   Epoch: 2   Global Step: 8970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:09:59,622-Speed 3355.76 samples/sec   Loss 11.0789   LearningRate 0.0795   Epoch: 2   Global Step: 8980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:10:02,644-Speed 3389.70 samples/sec   Loss 11.0917   LearningRate 0.0794   Epoch: 2   Global Step: 8990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:10:05,672-Speed 3382.40 samples/sec   Loss 11.1585   LearningRate 0.0794   Epoch: 2   Global Step: 9000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:10:08,704-Speed 3378.38 samples/sec   Loss 11.1383   LearningRate 0.0794   Epoch: 2   Global Step: 9010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:10:11,724-Speed 3391.69 samples/sec   Loss 11.0786   LearningRate 0.0794   Epoch: 2   Global Step: 9020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:10:14,749-Speed 3385.03 samples/sec   Loss 11.0780   LearningRate 0.0794   Epoch: 2   Global Step: 9030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:10:17,776-Speed 3384.34 samples/sec   Loss 11.1212   LearningRate 0.0793   Epoch: 2   Global Step: 9040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:10:20,800-Speed 3387.20 samples/sec   Loss 11.0856   LearningRate 0.0793   Epoch: 2   Global Step: 9050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:10:23,825-Speed 3385.48 samples/sec   Loss 11.1087   LearningRate 0.0793   Epoch: 2   Global Step: 9060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:10:26,851-Speed 3385.12 samples/sec   Loss 11.0586   LearningRate 0.0793   Epoch: 2   Global Step: 9070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-26 13:10:29,884-Speed 3375.90 samples/sec   Loss 11.0803   LearningRate 0.0792   Epoch: 2   Global Step: 9080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:10:32,906-Speed 3390.07 samples/sec   Loss 11.0763   LearningRate 0.0792   Epoch: 2   Global Step: 9090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:10:35,939-Speed 3377.21 samples/sec   Loss 11.1559   LearningRate 0.0792   Epoch: 2   Global Step: 9100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:10:38,964-Speed 3385.30 samples/sec   Loss 11.0461   LearningRate 0.0792   Epoch: 2   Global Step: 9110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:10:41,987-Speed 3388.05 samples/sec   Loss 11.0097   LearningRate 0.0792   Epoch: 2   Global Step: 9120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:10:45,013-Speed 3385.45 samples/sec   Loss 11.3026   LearningRate 0.0791   Epoch: 2   Global Step: 9130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:10:48,068-Speed 3352.50 samples/sec   Loss 11.0449   LearningRate 0.0791   Epoch: 2   Global Step: 9140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:10:51,106-Speed 3370.75 samples/sec   Loss 11.0898   LearningRate 0.0791   Epoch: 2   Global Step: 9150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:10:54,154-Speed 3360.88 samples/sec   Loss 11.2513   LearningRate 0.0791   Epoch: 2   Global Step: 9160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:10:57,184-Speed 3379.77 samples/sec   Loss 11.0445   LearningRate 0.0791   Epoch: 2   Global Step: 9170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:00,208-Speed 3387.33 samples/sec   Loss 11.1161   LearningRate 0.0790   Epoch: 2   Global Step: 9180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:03,243-Speed 3375.33 samples/sec   Loss 11.2079   LearningRate 0.0790   Epoch: 2   Global Step: 9190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:06,270-Speed 3383.84 samples/sec   Loss 11.0542   LearningRate 0.0790   Epoch: 2   Global Step: 9200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:09,297-Speed 3384.32 samples/sec   Loss 11.2625   LearningRate 0.0790   Epoch: 2   Global Step: 9210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:12,354-Speed 3349.98 samples/sec   Loss 11.1159   LearningRate 0.0789   Epoch: 2   Global Step: 9220   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 13:11:15,375-Speed 3391.45 samples/sec   Loss 11.1186   LearningRate 0.0789   Epoch: 2   Global Step: 9230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:18,419-Speed 3364.09 samples/sec   Loss 11.2096   LearningRate 0.0789   Epoch: 2   Global Step: 9240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:21,446-Speed 3384.65 samples/sec   Loss 11.1878   LearningRate 0.0789   Epoch: 2   Global Step: 9250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:24,472-Speed 3384.52 samples/sec   Loss 11.2669   LearningRate 0.0789   Epoch: 2   Global Step: 9260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:27,495-Speed 3388.43 samples/sec   Loss 11.1055   LearningRate 0.0788   Epoch: 2   Global Step: 9270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:30,528-Speed 3376.40 samples/sec   Loss 11.1991   LearningRate 0.0788   Epoch: 2   Global Step: 9280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:33,556-Speed 3382.34 samples/sec   Loss 11.2595   LearningRate 0.0788   Epoch: 2   Global Step: 9290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:36,580-Speed 3387.91 samples/sec   Loss 11.0885   LearningRate 0.0788   Epoch: 2   Global Step: 9300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:39,603-Speed 3387.76 samples/sec   Loss 11.1885   LearningRate 0.0788   Epoch: 2   Global Step: 9310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:42,623-Speed 3391.06 samples/sec   Loss 11.2576   LearningRate 0.0787   Epoch: 2   Global Step: 9320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:45,639-Speed 3396.68 samples/sec   Loss 11.2494   LearningRate 0.0787   Epoch: 2   Global Step: 9330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:48,664-Speed 3385.66 samples/sec   Loss 11.1772   LearningRate 0.0787   Epoch: 2   Global Step: 9340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:51,692-Speed 3382.03 samples/sec   Loss 11.1658   LearningRate 0.0787   Epoch: 2   Global Step: 9350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:54,732-Speed 3369.53 samples/sec   Loss 11.1180   LearningRate 0.0786   Epoch: 2   Global Step: 9360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:11:57,760-Speed 3382.88 samples/sec   Loss 11.0160   LearningRate 0.0786   Epoch: 2   Global Step: 9370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:00,792-Speed 3378.08 samples/sec   Loss 11.1594   LearningRate 0.0786   Epoch: 2   Global Step: 9380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:03,851-Speed 3348.74 samples/sec   Loss 11.2111   LearningRate 0.0786   Epoch: 2   Global Step: 9390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:06,874-Speed 3387.36 samples/sec   Loss 11.1954   LearningRate 0.0786   Epoch: 2   Global Step: 9400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:09,901-Speed 3383.39 samples/sec   Loss 11.0784   LearningRate 0.0785   Epoch: 2   Global Step: 9410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:12,927-Speed 3385.20 samples/sec   Loss 11.1910   LearningRate 0.0785   Epoch: 2   Global Step: 9420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:15,940-Speed 3399.69 samples/sec   Loss 11.1268   LearningRate 0.0785   Epoch: 2   Global Step: 9430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:18,979-Speed 3369.70 samples/sec   Loss 11.0946   LearningRate 0.0785   Epoch: 2   Global Step: 9440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:22,003-Speed 3387.01 samples/sec   Loss 11.2406   LearningRate 0.0785   Epoch: 2   Global Step: 9450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:25,065-Speed 3344.76 samples/sec   Loss 11.2573   LearningRate 0.0784   Epoch: 2   Global Step: 9460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:28,097-Speed 3379.09 samples/sec   Loss 11.2429   LearningRate 0.0784   Epoch: 2   Global Step: 9470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:31,122-Speed 3385.64 samples/sec   Loss 11.3129   LearningRate 0.0784   Epoch: 2   Global Step: 9480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:34,146-Speed 3386.64 samples/sec   Loss 11.2896   LearningRate 0.0784   Epoch: 2   Global Step: 9490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:37,175-Speed 3381.98 samples/sec   Loss 11.1390   LearningRate 0.0783   Epoch: 2   Global Step: 9500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:40,210-Speed 3373.99 samples/sec   Loss 11.1695   LearningRate 0.0783   Epoch: 2   Global Step: 9510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:43,239-Speed 3381.45 samples/sec   Loss 11.2411   LearningRate 0.0783   Epoch: 2   Global Step: 9520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:46,254-Speed 3396.88 samples/sec   Loss 11.2395   LearningRate 0.0783   Epoch: 2   Global Step: 9530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:49,295-Speed 3368.78 samples/sec   Loss 11.2144   LearningRate 0.0783   Epoch: 2   Global Step: 9540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:12:52,347-Speed 3355.64 samples/sec   Loss 11.1681   LearningRate 0.0782   Epoch: 2   Global Step: 9550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:12:55,382-Speed 3374.16 samples/sec   Loss 11.1964   LearningRate 0.0782   Epoch: 2   Global Step: 9560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:12:58,408-Speed 3385.51 samples/sec   Loss 11.3053   LearningRate 0.0782   Epoch: 2   Global Step: 9570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:13:01,440-Speed 3377.95 samples/sec   Loss 11.1954   LearningRate 0.0782   Epoch: 2   Global Step: 9580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:13:04,461-Speed 3390.08 samples/sec   Loss 11.3484   LearningRate 0.0782   Epoch: 2   Global Step: 9590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:13:07,496-Speed 3375.07 samples/sec   Loss 11.4029   LearningRate 0.0781   Epoch: 2   Global Step: 9600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:13:10,520-Speed 3386.74 samples/sec   Loss 11.0634   LearningRate 0.0781   Epoch: 2   Global Step: 9610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:13:13,548-Speed 3382.07 samples/sec   Loss 11.2173   LearningRate 0.0781   Epoch: 2   Global Step: 9620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:13:16,589-Speed 3368.52 samples/sec   Loss 11.2662   LearningRate 0.0781   Epoch: 2   Global Step: 9630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:13:19,611-Speed 3388.99 samples/sec   Loss 11.2336   LearningRate 0.0780   Epoch: 2   Global Step: 9640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:13:22,635-Speed 3387.47 samples/sec   Loss 11.1729   LearningRate 0.0780   Epoch: 2   Global Step: 9650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:13:25,661-Speed 3384.43 samples/sec   Loss 11.1934   LearningRate 0.0780   Epoch: 2   Global Step: 9660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:13:28,703-Speed 3367.04 samples/sec   Loss 11.2001   LearningRate 0.0780   Epoch: 2   Global Step: 9670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:13:31,727-Speed 3387.22 samples/sec   Loss 11.1749   LearningRate 0.0780   Epoch: 2   Global Step: 9680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:13:34,758-Speed 3379.11 samples/sec   Loss 11.3279   LearningRate 0.0779   Epoch: 2   Global Step: 9690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:13:37,785-Speed 3384.06 samples/sec   Loss 11.2306   LearningRate 0.0779   Epoch: 2   Global Step: 9700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:13:40,814-Speed 3381.44 samples/sec   Loss 11.2082   LearningRate 0.0779   Epoch: 2   Global Step: 9710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:13:43,847-Speed 3376.72 samples/sec   Loss 11.2561   LearningRate 0.0779   Epoch: 2   Global Step: 9720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:13:46,874-Speed 3384.04 samples/sec   Loss 11.2025   LearningRate 0.0779   Epoch: 2   Global Step: 9730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:13:49,902-Speed 3382.46 samples/sec   Loss 11.1963   LearningRate 0.0778   Epoch: 2   Global Step: 9740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:13:52,926-Speed 3387.02 samples/sec   Loss 11.2425   LearningRate 0.0778   Epoch: 2   Global Step: 9750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:13:55,956-Speed 3379.57 samples/sec   Loss 11.2328   LearningRate 0.0778   Epoch: 2   Global Step: 9760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:13:58,985-Speed 3382.15 samples/sec   Loss 11.3117   LearningRate 0.0778   Epoch: 2   Global Step: 9770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:14:02,015-Speed 3379.94 samples/sec   Loss 11.0892   LearningRate 0.0777   Epoch: 2   Global Step: 9780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:14:05,045-Speed 3380.40 samples/sec   Loss 11.2491   LearningRate 0.0777   Epoch: 2   Global Step: 9790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:08,071-Speed 3384.73 samples/sec   Loss 11.2263   LearningRate 0.0777   Epoch: 2   Global Step: 9800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:11,098-Speed 3383.55 samples/sec   Loss 10.9890   LearningRate 0.0777   Epoch: 2   Global Step: 9810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:14,124-Speed 3384.74 samples/sec   Loss 11.2293   LearningRate 0.0777   Epoch: 2   Global Step: 9820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:17,179-Speed 3353.42 samples/sec   Loss 11.0291   LearningRate 0.0776   Epoch: 2   Global Step: 9830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:20,208-Speed 3381.01 samples/sec   Loss 11.2762   LearningRate 0.0776   Epoch: 2   Global Step: 9840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:23,231-Speed 3388.03 samples/sec   Loss 11.1631   LearningRate 0.0776   Epoch: 2   Global Step: 9850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:26,262-Speed 3379.41 samples/sec   Loss 11.0898   LearningRate 0.0776   Epoch: 2   Global Step: 9860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:29,288-Speed 3385.34 samples/sec   Loss 11.2998   LearningRate 0.0776   Epoch: 2   Global Step: 9870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:32,311-Speed 3387.08 samples/sec   Loss 11.0959   LearningRate 0.0775   Epoch: 2   Global Step: 9880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:35,340-Speed 3381.65 samples/sec   Loss 11.0481   LearningRate 0.0775   Epoch: 2   Global Step: 9890   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 13:14:38,366-Speed 3385.57 samples/sec   Loss 11.1768   LearningRate 0.0775   Epoch: 2   Global Step: 9900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:41,398-Speed 3377.74 samples/sec   Loss 11.0855   LearningRate 0.0775   Epoch: 2   Global Step: 9910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:44,432-Speed 3376.03 samples/sec   Loss 11.2564   LearningRate 0.0774   Epoch: 2   Global Step: 9920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:47,457-Speed 3385.84 samples/sec   Loss 11.2844   LearningRate 0.0774   Epoch: 2   Global Step: 9930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:50,493-Speed 3372.80 samples/sec   Loss 11.1667   LearningRate 0.0774   Epoch: 2   Global Step: 9940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:53,532-Speed 3370.33 samples/sec   Loss 11.1834   LearningRate 0.0774   Epoch: 2   Global Step: 9950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:56,561-Speed 3381.36 samples/sec   Loss 11.2176   LearningRate 0.0774   Epoch: 2   Global Step: 9960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:14:59,589-Speed 3383.15 samples/sec   Loss 11.2416   LearningRate 0.0773   Epoch: 2   Global Step: 9970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:15:02,629-Speed 3369.32 samples/sec   Loss 11.1710   LearningRate 0.0773   Epoch: 2   Global Step: 9980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:15:05,668-Speed 3370.09 samples/sec   Loss 11.1833   LearningRate 0.0773   Epoch: 2   Global Step: 9990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:15:08,682-Speed 3398.65 samples/sec   Loss 11.2704   LearningRate 0.0773   Epoch: 2   Global Step: 10000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:15:52,471-[lfw][10000]XNorm: 22.183566
Training: 2022-04-26 13:15:52,471-[lfw][10000]Accuracy-Flip: 0.99650+-0.00320
Training: 2022-04-26 13:15:52,472-[lfw][10000]Accuracy-Highest: 0.99650
Training: 2022-04-26 13:16:43,049-[cfp_fp][10000]XNorm: 20.394854
Training: 2022-04-26 13:16:43,050-[cfp_fp][10000]Accuracy-Flip: 0.96686+-0.00903
Training: 2022-04-26 13:16:43,050-[cfp_fp][10000]Accuracy-Highest: 0.96686
Training: 2022-04-26 13:17:26,715-[agedb_30][10000]XNorm: 22.022943
Training: 2022-04-26 13:17:26,716-[agedb_30][10000]Accuracy-Flip: 0.95500+-0.00548
Training: 2022-04-26 13:17:26,716-[agedb_30][10000]Accuracy-Highest: 0.95500
Training: 2022-04-26 13:17:29,743-Speed 72.59 samples/sec   Loss 11.0792   LearningRate 0.0773   Epoch: 2   Global Step: 10010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:17:32,761-Speed 3393.51 samples/sec   Loss 11.2478   LearningRate 0.0772   Epoch: 2   Global Step: 10020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:17:35,788-Speed 3384.42 samples/sec   Loss 11.2129   LearningRate 0.0772   Epoch: 2   Global Step: 10030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:17:38,834-Speed 3362.13 samples/sec   Loss 11.1773   LearningRate 0.0772   Epoch: 2   Global Step: 10040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:17:41,861-Speed 3383.35 samples/sec   Loss 11.0858   LearningRate 0.0772   Epoch: 2   Global Step: 10050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:17:44,887-Speed 3384.61 samples/sec   Loss 11.1438   LearningRate 0.0772   Epoch: 2   Global Step: 10060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:17:47,919-Speed 3379.31 samples/sec   Loss 11.2491   LearningRate 0.0771   Epoch: 2   Global Step: 10070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:17:50,948-Speed 3381.29 samples/sec   Loss 11.1971   LearningRate 0.0771   Epoch: 2   Global Step: 10080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:17:53,977-Speed 3380.86 samples/sec   Loss 11.2327   LearningRate 0.0771   Epoch: 2   Global Step: 10090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:17:56,995-Speed 3393.77 samples/sec   Loss 11.1493   LearningRate 0.0771   Epoch: 2   Global Step: 10100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:00,028-Speed 3376.45 samples/sec   Loss 11.2858   LearningRate 0.0770   Epoch: 2   Global Step: 10110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:03,066-Speed 3372.04 samples/sec   Loss 11.2532   LearningRate 0.0770   Epoch: 2   Global Step: 10120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:06,111-Speed 3363.04 samples/sec   Loss 11.3662   LearningRate 0.0770   Epoch: 2   Global Step: 10130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:09,146-Speed 3375.41 samples/sec   Loss 11.2310   LearningRate 0.0770   Epoch: 2   Global Step: 10140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:12,173-Speed 3383.28 samples/sec   Loss 11.1127   LearningRate 0.0770   Epoch: 2   Global Step: 10150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:15,204-Speed 3379.02 samples/sec   Loss 11.2268   LearningRate 0.0769   Epoch: 2   Global Step: 10160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:18,256-Speed 3356.11 samples/sec   Loss 11.2667   LearningRate 0.0769   Epoch: 2   Global Step: 10170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:21,287-Speed 3379.95 samples/sec   Loss 11.1610   LearningRate 0.0769   Epoch: 2   Global Step: 10180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:24,316-Speed 3381.46 samples/sec   Loss 11.2025   LearningRate 0.0769   Epoch: 2   Global Step: 10190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:27,335-Speed 3391.83 samples/sec   Loss 11.1571   LearningRate 0.0769   Epoch: 2   Global Step: 10200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:30,371-Speed 3373.88 samples/sec   Loss 11.0976   LearningRate 0.0768   Epoch: 2   Global Step: 10210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:33,397-Speed 3384.90 samples/sec   Loss 11.1579   LearningRate 0.0768   Epoch: 2   Global Step: 10220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:36,426-Speed 3381.18 samples/sec   Loss 11.3014   LearningRate 0.0768   Epoch: 2   Global Step: 10230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:39,450-Speed 3386.71 samples/sec   Loss 11.0848   LearningRate 0.0768   Epoch: 2   Global Step: 10240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:42,483-Speed 3378.10 samples/sec   Loss 11.1192   LearningRate 0.0767   Epoch: 2   Global Step: 10250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:45,505-Speed 3389.18 samples/sec   Loss 10.9752   LearningRate 0.0767   Epoch: 2   Global Step: 10260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-26 13:18:48,527-Speed 3388.85 samples/sec   Loss 11.1239   LearningRate 0.0767   Epoch: 2   Global Step: 10270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:18:51,552-Speed 3385.93 samples/sec   Loss 11.3012   LearningRate 0.0767   Epoch: 2   Global Step: 10280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:18:54,585-Speed 3376.43 samples/sec   Loss 11.1418   LearningRate 0.0767   Epoch: 2   Global Step: 10290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:18:57,597-Speed 3400.78 samples/sec   Loss 11.3391   LearningRate 0.0766   Epoch: 2   Global Step: 10300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:19:00,626-Speed 3381.74 samples/sec   Loss 11.1182   LearningRate 0.0766   Epoch: 2   Global Step: 10310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:19:03,638-Speed 3399.95 samples/sec   Loss 11.1505   LearningRate 0.0766   Epoch: 2   Global Step: 10320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:19:06,658-Speed 3391.50 samples/sec   Loss 11.1994   LearningRate 0.0766   Epoch: 2   Global Step: 10330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:19:09,675-Speed 3395.40 samples/sec   Loss 11.1244   LearningRate 0.0766   Epoch: 2   Global Step: 10340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:19:12,694-Speed 3392.28 samples/sec   Loss 11.0732   LearningRate 0.0765   Epoch: 2   Global Step: 10350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:19:15,714-Speed 3391.50 samples/sec   Loss 11.2009   LearningRate 0.0765   Epoch: 2   Global Step: 10360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:19:18,733-Speed 3392.47 samples/sec   Loss 11.2602   LearningRate 0.0765   Epoch: 2   Global Step: 10370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:19:21,752-Speed 3393.40 samples/sec   Loss 11.0166   LearningRate 0.0765   Epoch: 2   Global Step: 10380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:19:24,772-Speed 3390.85 samples/sec   Loss 10.9952   LearningRate 0.0765   Epoch: 2   Global Step: 10390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:19:27,801-Speed 3382.13 samples/sec   Loss 11.1480   LearningRate 0.0764   Epoch: 2   Global Step: 10400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:19:30,821-Speed 3390.93 samples/sec   Loss 11.1930   LearningRate 0.0764   Epoch: 2   Global Step: 10410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:19:33,840-Speed 3392.74 samples/sec   Loss 11.1373   LearningRate 0.0764   Epoch: 2   Global Step: 10420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:19:36,946-Speed 3297.29 samples/sec   Loss 11.2940   LearningRate 0.0764   Epoch: 2   Global Step: 10430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:19:39,953-Speed 3406.22 samples/sec   Loss 11.2069   LearningRate 0.0763   Epoch: 2   Global Step: 10440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:19:42,974-Speed 3391.17 samples/sec   Loss 10.9394   LearningRate 0.0763   Epoch: 2   Global Step: 10450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:19:45,986-Speed 3399.87 samples/sec   Loss 11.0856   LearningRate 0.0763   Epoch: 2   Global Step: 10460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:19:49,006-Speed 3391.97 samples/sec   Loss 11.1382   LearningRate 0.0763   Epoch: 2   Global Step: 10470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:19:52,046-Speed 3369.15 samples/sec   Loss 11.0960   LearningRate 0.0763   Epoch: 2   Global Step: 10480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:19:55,071-Speed 3385.41 samples/sec   Loss 11.2630   LearningRate 0.0762   Epoch: 2   Global Step: 10490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:19:58,094-Speed 3388.35 samples/sec   Loss 10.9621   LearningRate 0.0762   Epoch: 2   Global Step: 10500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:20:01,116-Speed 3388.67 samples/sec   Loss 11.1698   LearningRate 0.0762   Epoch: 2   Global Step: 10510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:20:04,141-Speed 3386.56 samples/sec   Loss 11.0920   LearningRate 0.0762   Epoch: 2   Global Step: 10520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:20:07,165-Speed 3386.69 samples/sec   Loss 11.1739   LearningRate 0.0762   Epoch: 2   Global Step: 10530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-26 13:20:10,184-Speed 3393.06 samples/sec   Loss 11.0009   LearningRate 0.0761   Epoch: 2   Global Step: 10540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:13,208-Speed 3387.61 samples/sec   Loss 11.0861   LearningRate 0.0761   Epoch: 2   Global Step: 10550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:16,228-Speed 3391.01 samples/sec   Loss 11.0440   LearningRate 0.0761   Epoch: 2   Global Step: 10560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:19,266-Speed 3371.64 samples/sec   Loss 11.1409   LearningRate 0.0761   Epoch: 2   Global Step: 10570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:22,284-Speed 3394.07 samples/sec   Loss 11.1602   LearningRate 0.0761   Epoch: 2   Global Step: 10580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:25,304-Speed 3390.94 samples/sec   Loss 11.0627   LearningRate 0.0760   Epoch: 2   Global Step: 10590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:28,323-Speed 3392.08 samples/sec   Loss 11.1262   LearningRate 0.0760   Epoch: 2   Global Step: 10600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:31,355-Speed 3378.70 samples/sec   Loss 11.2059   LearningRate 0.0760   Epoch: 2   Global Step: 10610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:34,373-Speed 3393.70 samples/sec   Loss 11.1616   LearningRate 0.0760   Epoch: 2   Global Step: 10620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:37,398-Speed 3385.26 samples/sec   Loss 11.1862   LearningRate 0.0759   Epoch: 2   Global Step: 10630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:40,423-Speed 3386.33 samples/sec   Loss 11.2533   LearningRate 0.0759   Epoch: 2   Global Step: 10640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:20:43,474-Speed 3357.10 samples/sec   Loss 10.9939   LearningRate 0.0759   Epoch: 2   Global Step: 10650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:46,491-Speed 3394.85 samples/sec   Loss 10.9625   LearningRate 0.0759   Epoch: 2   Global Step: 10660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:49,512-Speed 3390.57 samples/sec   Loss 11.0918   LearningRate 0.0759   Epoch: 2   Global Step: 10670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:52,532-Speed 3391.17 samples/sec   Loss 11.1399   LearningRate 0.0758   Epoch: 2   Global Step: 10680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:55,558-Speed 3384.65 samples/sec   Loss 11.1445   LearningRate 0.0758   Epoch: 2   Global Step: 10690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:20:58,585-Speed 3383.32 samples/sec   Loss 10.9344   LearningRate 0.0758   Epoch: 2   Global Step: 10700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:01,621-Speed 3374.25 samples/sec   Loss 11.0683   LearningRate 0.0758   Epoch: 2   Global Step: 10710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:04,696-Speed 3330.63 samples/sec   Loss 11.1045   LearningRate 0.0758   Epoch: 2   Global Step: 10720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:07,729-Speed 3376.89 samples/sec   Loss 10.9183   LearningRate 0.0757   Epoch: 2   Global Step: 10730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:10,799-Speed 3336.64 samples/sec   Loss 10.9932   LearningRate 0.0757   Epoch: 2   Global Step: 10740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:13,821-Speed 3388.93 samples/sec   Loss 11.1324   LearningRate 0.0757   Epoch: 2   Global Step: 10750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:21:16,850-Speed 3381.50 samples/sec   Loss 11.0139   LearningRate 0.0757   Epoch: 2   Global Step: 10760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:21:19,876-Speed 3384.70 samples/sec   Loss 10.9497   LearningRate 0.0757   Epoch: 2   Global Step: 10770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:21:22,899-Speed 3388.28 samples/sec   Loss 10.9734   LearningRate 0.0756   Epoch: 2   Global Step: 10780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:21:25,912-Speed 3399.11 samples/sec   Loss 11.0795   LearningRate 0.0756   Epoch: 2   Global Step: 10790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:28,939-Speed 3382.91 samples/sec   Loss 10.9195   LearningRate 0.0756   Epoch: 2   Global Step: 10800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:31,976-Speed 3372.95 samples/sec   Loss 11.0837   LearningRate 0.0756   Epoch: 2   Global Step: 10810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:35,008-Speed 3378.63 samples/sec   Loss 10.9327   LearningRate 0.0755   Epoch: 2   Global Step: 10820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:38,036-Speed 3381.97 samples/sec   Loss 11.1853   LearningRate 0.0755   Epoch: 2   Global Step: 10830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:41,060-Speed 3387.95 samples/sec   Loss 11.0884   LearningRate 0.0755   Epoch: 2   Global Step: 10840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:44,089-Speed 3380.55 samples/sec   Loss 10.8366   LearningRate 0.0755   Epoch: 2   Global Step: 10850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:47,113-Speed 3387.05 samples/sec   Loss 11.0584   LearningRate 0.0755   Epoch: 2   Global Step: 10860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:50,150-Speed 3372.61 samples/sec   Loss 10.9427   LearningRate 0.0754   Epoch: 2   Global Step: 10870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:53,181-Speed 3378.58 samples/sec   Loss 11.1273   LearningRate 0.0754   Epoch: 2   Global Step: 10880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:21:56,219-Speed 3371.60 samples/sec   Loss 10.9273   LearningRate 0.0754   Epoch: 2   Global Step: 10890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:21:59,257-Speed 3371.60 samples/sec   Loss 10.9489   LearningRate 0.0754   Epoch: 2   Global Step: 10900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:02,288-Speed 3379.57 samples/sec   Loss 11.0027   LearningRate 0.0754   Epoch: 2   Global Step: 10910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:05,342-Speed 3353.56 samples/sec   Loss 10.8797   LearningRate 0.0753   Epoch: 2   Global Step: 10920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:08,378-Speed 3374.22 samples/sec   Loss 11.1078   LearningRate 0.0753   Epoch: 2   Global Step: 10930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:11,404-Speed 3384.34 samples/sec   Loss 10.9847   LearningRate 0.0753   Epoch: 2   Global Step: 10940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:14,438-Speed 3375.74 samples/sec   Loss 10.9906   LearningRate 0.0753   Epoch: 2   Global Step: 10950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:17,467-Speed 3382.20 samples/sec   Loss 10.9752   LearningRate 0.0753   Epoch: 2   Global Step: 10960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:20,508-Speed 3367.50 samples/sec   Loss 11.0490   LearningRate 0.0752   Epoch: 2   Global Step: 10970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:23,553-Speed 3363.89 samples/sec   Loss 10.9155   LearningRate 0.0752   Epoch: 2   Global Step: 10980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:26,587-Speed 3375.50 samples/sec   Loss 10.8964   LearningRate 0.0752   Epoch: 2   Global Step: 10990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:29,631-Speed 3364.94 samples/sec   Loss 10.9211   LearningRate 0.0752   Epoch: 2   Global Step: 11000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:32,656-Speed 3385.88 samples/sec   Loss 10.9854   LearningRate 0.0751   Epoch: 2   Global Step: 11010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:35,684-Speed 3382.62 samples/sec   Loss 11.0659   LearningRate 0.0751   Epoch: 2   Global Step: 11020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:38,722-Speed 3371.45 samples/sec   Loss 10.9508   LearningRate 0.0751   Epoch: 2   Global Step: 11030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:41,754-Speed 3377.62 samples/sec   Loss 10.9912   LearningRate 0.0751   Epoch: 2   Global Step: 11040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:44,786-Speed 3379.00 samples/sec   Loss 11.0533   LearningRate 0.0751   Epoch: 2   Global Step: 11050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:47,814-Speed 3381.56 samples/sec   Loss 10.9882   LearningRate 0.0750   Epoch: 2   Global Step: 11060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:50,843-Speed 3382.18 samples/sec   Loss 10.9396   LearningRate 0.0750   Epoch: 2   Global Step: 11070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:53,881-Speed 3370.74 samples/sec   Loss 11.0249   LearningRate 0.0750   Epoch: 2   Global Step: 11080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:56,899-Speed 3393.69 samples/sec   Loss 11.0963   LearningRate 0.0750   Epoch: 2   Global Step: 11090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:22:59,940-Speed 3367.72 samples/sec   Loss 10.8919   LearningRate 0.0750   Epoch: 2   Global Step: 11100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:03,017-Speed 3329.33 samples/sec   Loss 10.8448   LearningRate 0.0749   Epoch: 2   Global Step: 11110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:06,047-Speed 3380.76 samples/sec   Loss 10.9265   LearningRate 0.0749   Epoch: 2   Global Step: 11120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:09,078-Speed 3379.18 samples/sec   Loss 11.0145   LearningRate 0.0749   Epoch: 2   Global Step: 11130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:12,106-Speed 3382.46 samples/sec   Loss 11.0331   LearningRate 0.0749   Epoch: 2   Global Step: 11140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:15,141-Speed 3374.61 samples/sec   Loss 10.9589   LearningRate 0.0749   Epoch: 2   Global Step: 11150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:18,171-Speed 3380.13 samples/sec   Loss 10.9356   LearningRate 0.0748   Epoch: 2   Global Step: 11160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:21,201-Speed 3380.21 samples/sec   Loss 11.0125   LearningRate 0.0748   Epoch: 2   Global Step: 11170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:24,234-Speed 3376.61 samples/sec   Loss 10.8832   LearningRate 0.0748   Epoch: 2   Global Step: 11180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:27,276-Speed 3367.91 samples/sec   Loss 10.7516   LearningRate 0.0748   Epoch: 2   Global Step: 11190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:30,332-Speed 3351.24 samples/sec   Loss 10.8001   LearningRate 0.0747   Epoch: 2   Global Step: 11200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:33,362-Speed 3380.05 samples/sec   Loss 10.9426   LearningRate 0.0747   Epoch: 2   Global Step: 11210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:36,388-Speed 3384.46 samples/sec   Loss 10.8889   LearningRate 0.0747   Epoch: 2   Global Step: 11220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:39,415-Speed 3384.60 samples/sec   Loss 10.9822   LearningRate 0.0747   Epoch: 2   Global Step: 11230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:42,441-Speed 3384.08 samples/sec   Loss 10.8914   LearningRate 0.0747   Epoch: 2   Global Step: 11240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:45,466-Speed 3386.31 samples/sec   Loss 10.8828   LearningRate 0.0746   Epoch: 2   Global Step: 11250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:48,491-Speed 3385.40 samples/sec   Loss 10.9649   LearningRate 0.0746   Epoch: 2   Global Step: 11260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:51,523-Speed 3377.77 samples/sec   Loss 10.8318   LearningRate 0.0746   Epoch: 2   Global Step: 11270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:54,547-Speed 3387.63 samples/sec   Loss 10.8207   LearningRate 0.0746   Epoch: 2   Global Step: 11280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:23:57,573-Speed 3384.35 samples/sec   Loss 10.7730   LearningRate 0.0746   Epoch: 2   Global Step: 11290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:00,609-Speed 3373.55 samples/sec   Loss 10.8967   LearningRate 0.0745   Epoch: 2   Global Step: 11300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:03,640-Speed 3380.11 samples/sec   Loss 10.8378   LearningRate 0.0745   Epoch: 2   Global Step: 11310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:06,664-Speed 3386.53 samples/sec   Loss 10.8217   LearningRate 0.0745   Epoch: 2   Global Step: 11320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:09,691-Speed 3383.93 samples/sec   Loss 10.8066   LearningRate 0.0745   Epoch: 2   Global Step: 11330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:12,717-Speed 3385.13 samples/sec   Loss 10.8224   LearningRate 0.0745   Epoch: 2   Global Step: 11340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:15,747-Speed 3380.36 samples/sec   Loss 10.6292   LearningRate 0.0744   Epoch: 2   Global Step: 11350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:18,781-Speed 3374.79 samples/sec   Loss 10.8184   LearningRate 0.0744   Epoch: 2   Global Step: 11360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:21,813-Speed 3378.06 samples/sec   Loss 10.8336   LearningRate 0.0744   Epoch: 2   Global Step: 11370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:24,846-Speed 3377.81 samples/sec   Loss 10.8741   LearningRate 0.0744   Epoch: 2   Global Step: 11380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:27,875-Speed 3381.57 samples/sec   Loss 11.0284   LearningRate 0.0744   Epoch: 2   Global Step: 11390   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 13:24:30,890-Speed 3397.00 samples/sec   Loss 11.0255   LearningRate 0.0743   Epoch: 2   Global Step: 11400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:33,929-Speed 3370.17 samples/sec   Loss 10.8135   LearningRate 0.0743   Epoch: 2   Global Step: 11410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:36,972-Speed 3365.46 samples/sec   Loss 10.8300   LearningRate 0.0743   Epoch: 2   Global Step: 11420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:40,006-Speed 3376.72 samples/sec   Loss 10.8214   LearningRate 0.0743   Epoch: 2   Global Step: 11430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:43,037-Speed 3378.37 samples/sec   Loss 10.8018   LearningRate 0.0742   Epoch: 2   Global Step: 11440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:46,070-Speed 3377.27 samples/sec   Loss 10.8031   LearningRate 0.0742   Epoch: 2   Global Step: 11450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:49,107-Speed 3372.30 samples/sec   Loss 10.8180   LearningRate 0.0742   Epoch: 2   Global Step: 11460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:52,205-Speed 3306.31 samples/sec   Loss 10.8570   LearningRate 0.0742   Epoch: 2   Global Step: 11470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:55,240-Speed 3374.83 samples/sec   Loss 10.7871   LearningRate 0.0742   Epoch: 2   Global Step: 11480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:24:58,272-Speed 3378.13 samples/sec   Loss 10.6714   LearningRate 0.0741   Epoch: 2   Global Step: 11490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:25:01,301-Speed 3381.17 samples/sec   Loss 10.7393   LearningRate 0.0741   Epoch: 2   Global Step: 11500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:25:04,337-Speed 3374.23 samples/sec   Loss 10.8118   LearningRate 0.0741   Epoch: 2   Global Step: 11510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:25:07,370-Speed 3376.07 samples/sec   Loss 10.7851   LearningRate 0.0741   Epoch: 2   Global Step: 11520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:25:10,401-Speed 3379.92 samples/sec   Loss 10.6973   LearningRate 0.0741   Epoch: 2   Global Step: 11530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:25:13,429-Speed 3382.77 samples/sec   Loss 10.8142   LearningRate 0.0740   Epoch: 2   Global Step: 11540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:25:16,463-Speed 3375.62 samples/sec   Loss 10.6591   LearningRate 0.0740   Epoch: 2   Global Step: 11550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:25:19,494-Speed 3379.22 samples/sec   Loss 10.7161   LearningRate 0.0740   Epoch: 2   Global Step: 11560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:25:22,523-Speed 3381.01 samples/sec   Loss 10.6784   LearningRate 0.0740   Epoch: 2   Global Step: 11570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:25:25,589-Speed 3340.35 samples/sec   Loss 10.7844   LearningRate 0.0740   Epoch: 2   Global Step: 11580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:25:28,622-Speed 3377.27 samples/sec   Loss 10.6950   LearningRate 0.0739   Epoch: 2   Global Step: 11590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:25:31,645-Speed 3388.88 samples/sec   Loss 10.8567   LearningRate 0.0739   Epoch: 2   Global Step: 11600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:25:34,677-Speed 3377.48 samples/sec   Loss 10.6941   LearningRate 0.0739   Epoch: 2   Global Step: 11610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:25:37,707-Speed 3380.58 samples/sec   Loss 10.7122   LearningRate 0.0739   Epoch: 2   Global Step: 11620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:25:40,736-Speed 3380.82 samples/sec   Loss 10.9347   LearningRate 0.0739   Epoch: 2   Global Step: 11630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:25:43,766-Speed 3380.56 samples/sec   Loss 10.9281   LearningRate 0.0738   Epoch: 2   Global Step: 11640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:25:46,800-Speed 3375.39 samples/sec   Loss 10.6932   LearningRate 0.0738   Epoch: 2   Global Step: 11650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:25:49,843-Speed 3366.76 samples/sec   Loss 10.8438   LearningRate 0.0738   Epoch: 2   Global Step: 11660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:25:52,871-Speed 3381.58 samples/sec   Loss 10.7578   LearningRate 0.0738   Epoch: 2   Global Step: 11670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:25:55,905-Speed 3376.54 samples/sec   Loss 10.6948   LearningRate 0.0737   Epoch: 2   Global Step: 11680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:25:58,939-Speed 3375.07 samples/sec   Loss 10.6600   LearningRate 0.0737   Epoch: 2   Global Step: 11690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:26:01,972-Speed 3378.18 samples/sec   Loss 10.7090   LearningRate 0.0737   Epoch: 2   Global Step: 11700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:05,004-Speed 3377.97 samples/sec   Loss 10.7590   LearningRate 0.0737   Epoch: 2   Global Step: 11710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:08,035-Speed 3378.19 samples/sec   Loss 10.8203   LearningRate 0.0737   Epoch: 2   Global Step: 11720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:11,067-Speed 3378.43 samples/sec   Loss 10.7871   LearningRate 0.0736   Epoch: 2   Global Step: 11730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:14,101-Speed 3376.04 samples/sec   Loss 10.6758   LearningRate 0.0736   Epoch: 2   Global Step: 11740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:17,171-Speed 3336.50 samples/sec   Loss 10.7151   LearningRate 0.0736   Epoch: 2   Global Step: 11750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:20,201-Speed 3379.73 samples/sec   Loss 10.8002   LearningRate 0.0736   Epoch: 2   Global Step: 11760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:23,232-Speed 3379.10 samples/sec   Loss 10.7157   LearningRate 0.0736   Epoch: 2   Global Step: 11770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:26,277-Speed 3363.89 samples/sec   Loss 10.8084   LearningRate 0.0735   Epoch: 2   Global Step: 11780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:29,316-Speed 3370.69 samples/sec   Loss 10.9313   LearningRate 0.0735   Epoch: 2   Global Step: 11790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:32,341-Speed 3385.95 samples/sec   Loss 10.7081   LearningRate 0.0735   Epoch: 2   Global Step: 11800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:35,374-Speed 3376.74 samples/sec   Loss 10.7126   LearningRate 0.0735   Epoch: 2   Global Step: 11810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:38,435-Speed 3346.68 samples/sec   Loss 10.7342   LearningRate 0.0735   Epoch: 2   Global Step: 11820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:41,468-Speed 3375.96 samples/sec   Loss 10.6919   LearningRate 0.0734   Epoch: 2   Global Step: 11830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:44,501-Speed 3377.64 samples/sec   Loss 10.6116   LearningRate 0.0734   Epoch: 2   Global Step: 11840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:47,576-Speed 3330.25 samples/sec   Loss 10.7857   LearningRate 0.0734   Epoch: 2   Global Step: 11850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:50,608-Speed 3378.45 samples/sec   Loss 10.6714   LearningRate 0.0734   Epoch: 2   Global Step: 11860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:53,645-Speed 3372.70 samples/sec   Loss 10.7040   LearningRate 0.0734   Epoch: 2   Global Step: 11870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:56,711-Speed 3340.04 samples/sec   Loss 10.6977   LearningRate 0.0733   Epoch: 2   Global Step: 11880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:26:59,747-Speed 3373.62 samples/sec   Loss 10.6321   LearningRate 0.0733   Epoch: 2   Global Step: 11890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:27:02,778-Speed 3380.42 samples/sec   Loss 10.6520   LearningRate 0.0733   Epoch: 2   Global Step: 11900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:27:05,804-Speed 3384.60 samples/sec   Loss 10.7741   LearningRate 0.0733   Epoch: 2   Global Step: 11910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:27:08,848-Speed 3363.89 samples/sec   Loss 10.6359   LearningRate 0.0733   Epoch: 2   Global Step: 11920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:27:11,880-Speed 3378.61 samples/sec   Loss 10.7210   LearningRate 0.0732   Epoch: 2   Global Step: 11930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:27:14,911-Speed 3379.64 samples/sec   Loss 10.6258   LearningRate 0.0732   Epoch: 2   Global Step: 11940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:27:17,941-Speed 3379.91 samples/sec   Loss 10.6800   LearningRate 0.0732   Epoch: 2   Global Step: 11950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:27:20,970-Speed 3381.55 samples/sec   Loss 10.7682   LearningRate 0.0732   Epoch: 2   Global Step: 11960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:27:24,004-Speed 3376.07 samples/sec   Loss 10.7517   LearningRate 0.0731   Epoch: 2   Global Step: 11970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:27:27,037-Speed 3376.49 samples/sec   Loss 10.4884   LearningRate 0.0731   Epoch: 2   Global Step: 11980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:27:30,066-Speed 3382.55 samples/sec   Loss 10.7147   LearningRate 0.0731   Epoch: 2   Global Step: 11990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:27:33,095-Speed 3381.00 samples/sec   Loss 10.5223   LearningRate 0.0731   Epoch: 2   Global Step: 12000   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 13:28:16,713-[lfw][12000]XNorm: 25.883696
Training: 2022-04-26 13:28:16,714-[lfw][12000]Accuracy-Flip: 0.99550+-0.00211
Training: 2022-04-26 13:28:16,714-[lfw][12000]Accuracy-Highest: 0.99650
Training: 2022-04-26 13:29:07,755-[cfp_fp][12000]XNorm: 24.232434
Training: 2022-04-26 13:29:07,756-[cfp_fp][12000]Accuracy-Flip: 0.97271+-0.00611
Training: 2022-04-26 13:29:07,756-[cfp_fp][12000]Accuracy-Highest: 0.97271
Training: 2022-04-26 13:29:51,261-[agedb_30][12000]XNorm: 26.004011
Training: 2022-04-26 13:29:51,262-[agedb_30][12000]Accuracy-Flip: 0.95517+-0.01012
Training: 2022-04-26 13:29:51,262-[agedb_30][12000]Accuracy-Highest: 0.95517
Training: 2022-04-26 13:29:54,285-Speed 72.53 samples/sec   Loss 10.7288   LearningRate 0.0731   Epoch: 2   Global Step: 12010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:29:57,314-Speed 3381.51 samples/sec   Loss 10.5935   LearningRate 0.0730   Epoch: 2   Global Step: 12020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:00,335-Speed 3390.02 samples/sec   Loss 10.4933   LearningRate 0.0730   Epoch: 2   Global Step: 12030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:03,358-Speed 3388.04 samples/sec   Loss 10.5115   LearningRate 0.0730   Epoch: 2   Global Step: 12040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:06,380-Speed 3389.76 samples/sec   Loss 10.5839   LearningRate 0.0730   Epoch: 2   Global Step: 12050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:09,404-Speed 3386.28 samples/sec   Loss 10.6321   LearningRate 0.0730   Epoch: 2   Global Step: 12060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:12,427-Speed 3388.48 samples/sec   Loss 10.6591   LearningRate 0.0729   Epoch: 2   Global Step: 12070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:15,455-Speed 3382.72 samples/sec   Loss 10.4745   LearningRate 0.0729   Epoch: 2   Global Step: 12080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:18,484-Speed 3381.18 samples/sec   Loss 10.5435   LearningRate 0.0729   Epoch: 2   Global Step: 12090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:21,514-Speed 3381.44 samples/sec   Loss 10.6722   LearningRate 0.0729   Epoch: 2   Global Step: 12100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:24,539-Speed 3385.47 samples/sec   Loss 10.4674   LearningRate 0.0729   Epoch: 2   Global Step: 12110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:27,574-Speed 3374.79 samples/sec   Loss 10.5005   LearningRate 0.0728   Epoch: 2   Global Step: 12120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:30,620-Speed 3361.79 samples/sec   Loss 10.6348   LearningRate 0.0728   Epoch: 2   Global Step: 12130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:33,662-Speed 3367.09 samples/sec   Loss 10.5349   LearningRate 0.0728   Epoch: 2   Global Step: 12140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:36,704-Speed 3366.85 samples/sec   Loss 10.5904   LearningRate 0.0728   Epoch: 2   Global Step: 12150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:39,745-Speed 3368.81 samples/sec   Loss 10.5096   LearningRate 0.0728   Epoch: 2   Global Step: 12160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:42,796-Speed 3356.41 samples/sec   Loss 10.5747   LearningRate 0.0727   Epoch: 2   Global Step: 12170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:45,840-Speed 3365.71 samples/sec   Loss 10.3887   LearningRate 0.0727   Epoch: 2   Global Step: 12180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:48,884-Speed 3364.67 samples/sec   Loss 10.7392   LearningRate 0.0727   Epoch: 2   Global Step: 12190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:51,928-Speed 3364.30 samples/sec   Loss 10.5055   LearningRate 0.0727   Epoch: 2   Global Step: 12200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:54,968-Speed 3368.89 samples/sec   Loss 10.5913   LearningRate 0.0727   Epoch: 2   Global Step: 12210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:30:58,025-Speed 3350.88 samples/sec   Loss 10.5632   LearningRate 0.0726   Epoch: 2   Global Step: 12220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:01,077-Speed 3355.52 samples/sec   Loss 10.5907   LearningRate 0.0726   Epoch: 2   Global Step: 12230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:04,132-Speed 3352.62 samples/sec   Loss 10.5490   LearningRate 0.0726   Epoch: 2   Global Step: 12240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:07,171-Speed 3369.87 samples/sec   Loss 10.5792   LearningRate 0.0726   Epoch: 2   Global Step: 12250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:10,207-Speed 3373.86 samples/sec   Loss 10.7781   LearningRate 0.0725   Epoch: 2   Global Step: 12260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:13,245-Speed 3372.22 samples/sec   Loss 10.6140   LearningRate 0.0725   Epoch: 2   Global Step: 12270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:16,287-Speed 3367.36 samples/sec   Loss 10.4466   LearningRate 0.0725   Epoch: 2   Global Step: 12280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:19,316-Speed 3380.92 samples/sec   Loss 10.5165   LearningRate 0.0725   Epoch: 2   Global Step: 12290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:22,344-Speed 3382.29 samples/sec   Loss 10.4572   LearningRate 0.0725   Epoch: 2   Global Step: 12300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:25,379-Speed 3374.46 samples/sec   Loss 10.7801   LearningRate 0.0724   Epoch: 2   Global Step: 12310   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 13:31:28,398-Speed 3392.71 samples/sec   Loss 10.5189   LearningRate 0.0724   Epoch: 2   Global Step: 12320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:31,427-Speed 3381.10 samples/sec   Loss 10.5845   LearningRate 0.0724   Epoch: 2   Global Step: 12330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:34,456-Speed 3381.75 samples/sec   Loss 10.5315   LearningRate 0.0724   Epoch: 2   Global Step: 12340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:37,487-Speed 3379.13 samples/sec   Loss 10.5475   LearningRate 0.0724   Epoch: 2   Global Step: 12350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:40,510-Speed 3388.34 samples/sec   Loss 10.6493   LearningRate 0.0723   Epoch: 2   Global Step: 12360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:43,536-Speed 3384.46 samples/sec   Loss 10.5129   LearningRate 0.0723   Epoch: 2   Global Step: 12370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:46,560-Speed 3387.78 samples/sec   Loss 10.3060   LearningRate 0.0723   Epoch: 2   Global Step: 12380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:49,625-Speed 3342.47 samples/sec   Loss 10.5354   LearningRate 0.0723   Epoch: 2   Global Step: 12390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:31:52,701-Speed 3329.50 samples/sec   Loss 10.4084   LearningRate 0.0723   Epoch: 2   Global Step: 12400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:05,625-Speed 792.37 samples/sec   Loss 9.5110   LearningRate 0.0722   Epoch: 3   Global Step: 12410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:08,810-Speed 3216.73 samples/sec   Loss 8.5619   LearningRate 0.0722   Epoch: 3   Global Step: 12420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:11,841-Speed 3378.70 samples/sec   Loss 8.7161   LearningRate 0.0722   Epoch: 3   Global Step: 12430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:14,856-Speed 3398.10 samples/sec   Loss 8.6148   LearningRate 0.0722   Epoch: 3   Global Step: 12440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:17,870-Speed 3397.64 samples/sec   Loss 8.5481   LearningRate 0.0722   Epoch: 3   Global Step: 12450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:20,886-Speed 3396.74 samples/sec   Loss 8.6392   LearningRate 0.0721   Epoch: 3   Global Step: 12460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:23,921-Speed 3374.20 samples/sec   Loss 8.6182   LearningRate 0.0721   Epoch: 3   Global Step: 12470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:26,943-Speed 3389.09 samples/sec   Loss 8.6495   LearningRate 0.0721   Epoch: 3   Global Step: 12480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:29,969-Speed 3385.38 samples/sec   Loss 8.6205   LearningRate 0.0721   Epoch: 3   Global Step: 12490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:32,980-Speed 3401.80 samples/sec   Loss 8.6196   LearningRate 0.0721   Epoch: 3   Global Step: 12500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:36,011-Speed 3379.16 samples/sec   Loss 8.6122   LearningRate 0.0720   Epoch: 3   Global Step: 12510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:39,025-Speed 3397.63 samples/sec   Loss 8.6652   LearningRate 0.0720   Epoch: 3   Global Step: 12520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:42,070-Speed 3363.70 samples/sec   Loss 8.6758   LearningRate 0.0720   Epoch: 3   Global Step: 12530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:45,155-Speed 3320.14 samples/sec   Loss 8.8396   LearningRate 0.0720   Epoch: 3   Global Step: 12540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:48,182-Speed 3383.73 samples/sec   Loss 8.6819   LearningRate 0.0720   Epoch: 3   Global Step: 12550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:51,202-Speed 3392.09 samples/sec   Loss 8.8352   LearningRate 0.0719   Epoch: 3   Global Step: 12560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:54,245-Speed 3365.40 samples/sec   Loss 8.6674   LearningRate 0.0719   Epoch: 3   Global Step: 12570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:32:57,268-Speed 3387.56 samples/sec   Loss 8.8303   LearningRate 0.0719   Epoch: 3   Global Step: 12580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:00,281-Speed 3399.33 samples/sec   Loss 8.7658   LearningRate 0.0719   Epoch: 3   Global Step: 12590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:03,300-Speed 3393.34 samples/sec   Loss 8.9672   LearningRate 0.0718   Epoch: 3   Global Step: 12600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:06,339-Speed 3370.19 samples/sec   Loss 8.7982   LearningRate 0.0718   Epoch: 3   Global Step: 12610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:09,357-Speed 3392.91 samples/sec   Loss 8.7839   LearningRate 0.0718   Epoch: 3   Global Step: 12620   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 13:33:12,393-Speed 3373.96 samples/sec   Loss 8.8436   LearningRate 0.0718   Epoch: 3   Global Step: 12630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:15,421-Speed 3383.68 samples/sec   Loss 8.9863   LearningRate 0.0718   Epoch: 3   Global Step: 12640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:18,437-Speed 3395.24 samples/sec   Loss 8.7883   LearningRate 0.0717   Epoch: 3   Global Step: 12650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:21,449-Speed 3400.69 samples/sec   Loss 8.8357   LearningRate 0.0717   Epoch: 3   Global Step: 12660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:24,464-Speed 3397.67 samples/sec   Loss 8.9413   LearningRate 0.0717   Epoch: 3   Global Step: 12670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:27,479-Speed 3397.15 samples/sec   Loss 8.9263   LearningRate 0.0717   Epoch: 3   Global Step: 12680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:30,491-Speed 3400.05 samples/sec   Loss 9.0467   LearningRate 0.0717   Epoch: 3   Global Step: 12690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:33,523-Speed 3377.79 samples/sec   Loss 9.0044   LearningRate 0.0716   Epoch: 3   Global Step: 12700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:36,538-Speed 3397.35 samples/sec   Loss 9.0289   LearningRate 0.0716   Epoch: 3   Global Step: 12710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:39,554-Speed 3395.93 samples/sec   Loss 9.0118   LearningRate 0.0716   Epoch: 3   Global Step: 12720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:42,553-Speed 3415.70 samples/sec   Loss 8.9570   LearningRate 0.0716   Epoch: 3   Global Step: 12730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:45,569-Speed 3396.42 samples/sec   Loss 9.0138   LearningRate 0.0716   Epoch: 3   Global Step: 12740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:48,584-Speed 3396.63 samples/sec   Loss 9.0317   LearningRate 0.0715   Epoch: 3   Global Step: 12750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:51,592-Speed 3406.07 samples/sec   Loss 9.1071   LearningRate 0.0715   Epoch: 3   Global Step: 12760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:54,605-Speed 3398.55 samples/sec   Loss 9.1157   LearningRate 0.0715   Epoch: 3   Global Step: 12770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:33:57,615-Speed 3403.24 samples/sec   Loss 9.0508   LearningRate 0.0715   Epoch: 3   Global Step: 12780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:34:00,625-Speed 3402.12 samples/sec   Loss 9.2083   LearningRate 0.0715   Epoch: 3   Global Step: 12790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:34:03,633-Speed 3404.89 samples/sec   Loss 9.2684   LearningRate 0.0714   Epoch: 3   Global Step: 12800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:34:06,660-Speed 3384.46 samples/sec   Loss 9.0977   LearningRate 0.0714   Epoch: 3   Global Step: 12810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:34:09,667-Speed 3405.95 samples/sec   Loss 9.1058   LearningRate 0.0714   Epoch: 3   Global Step: 12820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:34:12,667-Speed 3414.05 samples/sec   Loss 9.1552   LearningRate 0.0714   Epoch: 3   Global Step: 12830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:34:15,717-Speed 3358.60 samples/sec   Loss 9.1814   LearningRate 0.0714   Epoch: 3   Global Step: 12840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:34:18,739-Speed 3388.82 samples/sec   Loss 9.2304   LearningRate 0.0713   Epoch: 3   Global Step: 12850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:34:21,740-Speed 3413.30 samples/sec   Loss 9.1339   LearningRate 0.0713   Epoch: 3   Global Step: 12860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:34:24,758-Speed 3393.26 samples/sec   Loss 9.0948   LearningRate 0.0713   Epoch: 3   Global Step: 12870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:34:27,771-Speed 3400.00 samples/sec   Loss 9.1794   LearningRate 0.0713   Epoch: 3   Global Step: 12880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:34:30,775-Speed 3409.38 samples/sec   Loss 9.3330   LearningRate 0.0713   Epoch: 3   Global Step: 12890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:34:33,771-Speed 3418.59 samples/sec   Loss 9.2994   LearningRate 0.0712   Epoch: 3   Global Step: 12900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:34:36,778-Speed 3406.16 samples/sec   Loss 9.2722   LearningRate 0.0712   Epoch: 3   Global Step: 12910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:34:39,825-Speed 3361.91 samples/sec   Loss 9.3139   LearningRate 0.0712   Epoch: 3   Global Step: 12920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:34:42,834-Speed 3402.87 samples/sec   Loss 9.2975   LearningRate 0.0712   Epoch: 3   Global Step: 12930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:34:45,844-Speed 3403.18 samples/sec   Loss 9.3664   LearningRate 0.0712   Epoch: 3   Global Step: 12940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:34:48,850-Speed 3407.35 samples/sec   Loss 9.3811   LearningRate 0.0711   Epoch: 3   Global Step: 12950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:34:51,860-Speed 3402.38 samples/sec   Loss 9.3857   LearningRate 0.0711   Epoch: 3   Global Step: 12960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:34:54,879-Speed 3393.17 samples/sec   Loss 9.3339   LearningRate 0.0711   Epoch: 3   Global Step: 12970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:34:57,904-Speed 3386.05 samples/sec   Loss 9.2608   LearningRate 0.0711   Epoch: 3   Global Step: 12980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:35:00,921-Speed 3394.15 samples/sec   Loss 9.4265   LearningRate 0.0711   Epoch: 3   Global Step: 12990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:35:03,980-Speed 3348.57 samples/sec   Loss 9.3616   LearningRate 0.0710   Epoch: 3   Global Step: 13000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:06,993-Speed 3399.25 samples/sec   Loss 9.4580   LearningRate 0.0710   Epoch: 3   Global Step: 13010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:09,999-Speed 3407.64 samples/sec   Loss 9.3346   LearningRate 0.0710   Epoch: 3   Global Step: 13020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:13,007-Speed 3405.44 samples/sec   Loss 9.3912   LearningRate 0.0710   Epoch: 3   Global Step: 13030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:16,018-Speed 3400.91 samples/sec   Loss 9.4692   LearningRate 0.0710   Epoch: 3   Global Step: 13040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:19,028-Speed 3402.91 samples/sec   Loss 9.5029   LearningRate 0.0709   Epoch: 3   Global Step: 13050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:22,034-Speed 3407.05 samples/sec   Loss 9.5709   LearningRate 0.0709   Epoch: 3   Global Step: 13060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:25,094-Speed 3347.47 samples/sec   Loss 9.4850   LearningRate 0.0709   Epoch: 3   Global Step: 13070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:28,109-Speed 3397.12 samples/sec   Loss 9.5379   LearningRate 0.0709   Epoch: 3   Global Step: 13080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:31,132-Speed 3387.86 samples/sec   Loss 9.4786   LearningRate 0.0708   Epoch: 3   Global Step: 13090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:34,149-Speed 3396.07 samples/sec   Loss 9.4915   LearningRate 0.0708   Epoch: 3   Global Step: 13100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:37,161-Speed 3400.54 samples/sec   Loss 9.4984   LearningRate 0.0708   Epoch: 3   Global Step: 13110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:40,169-Speed 3403.78 samples/sec   Loss 9.4326   LearningRate 0.0708   Epoch: 3   Global Step: 13120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:43,176-Speed 3406.65 samples/sec   Loss 9.4788   LearningRate 0.0708   Epoch: 3   Global Step: 13130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:46,185-Speed 3403.74 samples/sec   Loss 9.5119   LearningRate 0.0707   Epoch: 3   Global Step: 13140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:49,197-Speed 3400.15 samples/sec   Loss 9.3659   LearningRate 0.0707   Epoch: 3   Global Step: 13150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:52,213-Speed 3395.96 samples/sec   Loss 9.6224   LearningRate 0.0707   Epoch: 3   Global Step: 13160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:55,231-Speed 3393.87 samples/sec   Loss 9.6067   LearningRate 0.0707   Epoch: 3   Global Step: 13170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:35:58,255-Speed 3387.99 samples/sec   Loss 9.6086   LearningRate 0.0707   Epoch: 3   Global Step: 13180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:01,264-Speed 3403.34 samples/sec   Loss 9.5971   LearningRate 0.0706   Epoch: 3   Global Step: 13190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:04,269-Speed 3408.50 samples/sec   Loss 9.6285   LearningRate 0.0706   Epoch: 3   Global Step: 13200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:07,280-Speed 3401.65 samples/sec   Loss 9.6701   LearningRate 0.0706   Epoch: 3   Global Step: 13210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:10,288-Speed 3404.88 samples/sec   Loss 9.5364   LearningRate 0.0706   Epoch: 3   Global Step: 13220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:13,305-Speed 3394.59 samples/sec   Loss 9.4616   LearningRate 0.0706   Epoch: 3   Global Step: 13230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:16,320-Speed 3397.29 samples/sec   Loss 9.5787   LearningRate 0.0705   Epoch: 3   Global Step: 13240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:19,331-Speed 3401.97 samples/sec   Loss 9.6208   LearningRate 0.0705   Epoch: 3   Global Step: 13250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:22,352-Speed 3390.67 samples/sec   Loss 9.6426   LearningRate 0.0705   Epoch: 3   Global Step: 13260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:25,371-Speed 3392.28 samples/sec   Loss 9.5297   LearningRate 0.0705   Epoch: 3   Global Step: 13270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:28,391-Speed 3391.29 samples/sec   Loss 9.5564   LearningRate 0.0705   Epoch: 3   Global Step: 13280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:31,406-Speed 3397.29 samples/sec   Loss 9.6584   LearningRate 0.0704   Epoch: 3   Global Step: 13290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:34,409-Speed 3410.61 samples/sec   Loss 9.5221   LearningRate 0.0704   Epoch: 3   Global Step: 13300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:37,429-Speed 3391.46 samples/sec   Loss 9.7093   LearningRate 0.0704   Epoch: 3   Global Step: 13310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:40,452-Speed 3388.75 samples/sec   Loss 9.5986   LearningRate 0.0704   Epoch: 3   Global Step: 13320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:43,470-Speed 3393.64 samples/sec   Loss 9.6363   LearningRate 0.0704   Epoch: 3   Global Step: 13330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:46,482-Speed 3399.99 samples/sec   Loss 9.4652   LearningRate 0.0703   Epoch: 3   Global Step: 13340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:49,492-Speed 3402.91 samples/sec   Loss 9.6660   LearningRate 0.0703   Epoch: 3   Global Step: 13350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:52,518-Speed 3384.73 samples/sec   Loss 9.6681   LearningRate 0.0703   Epoch: 3   Global Step: 13360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:55,533-Speed 3397.64 samples/sec   Loss 9.7873   LearningRate 0.0703   Epoch: 3   Global Step: 13370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:36:58,548-Speed 3397.24 samples/sec   Loss 9.6026   LearningRate 0.0703   Epoch: 3   Global Step: 13380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:37:01,564-Speed 3396.26 samples/sec   Loss 9.6915   LearningRate 0.0702   Epoch: 3   Global Step: 13390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:37:04,571-Speed 3405.63 samples/sec   Loss 9.7446   LearningRate 0.0702   Epoch: 3   Global Step: 13400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:37:07,578-Speed 3406.23 samples/sec   Loss 9.6449   LearningRate 0.0702   Epoch: 3   Global Step: 13410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:10,588-Speed 3403.02 samples/sec   Loss 9.7566   LearningRate 0.0702   Epoch: 3   Global Step: 13420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:13,621-Speed 3376.78 samples/sec   Loss 9.8091   LearningRate 0.0702   Epoch: 3   Global Step: 13430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:16,633-Speed 3400.52 samples/sec   Loss 9.6599   LearningRate 0.0701   Epoch: 3   Global Step: 13440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:19,658-Speed 3385.22 samples/sec   Loss 9.7278   LearningRate 0.0701   Epoch: 3   Global Step: 13450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:22,670-Speed 3400.18 samples/sec   Loss 9.9145   LearningRate 0.0701   Epoch: 3   Global Step: 13460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:25,683-Speed 3399.72 samples/sec   Loss 9.6234   LearningRate 0.0701   Epoch: 3   Global Step: 13470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:28,705-Speed 3390.19 samples/sec   Loss 9.6719   LearningRate 0.0701   Epoch: 3   Global Step: 13480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:31,718-Speed 3398.93 samples/sec   Loss 9.7099   LearningRate 0.0700   Epoch: 3   Global Step: 13490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:34,731-Speed 3399.84 samples/sec   Loss 9.7130   LearningRate 0.0700   Epoch: 3   Global Step: 13500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:37,744-Speed 3399.01 samples/sec   Loss 9.7430   LearningRate 0.0700   Epoch: 3   Global Step: 13510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:37:40,758-Speed 3398.05 samples/sec   Loss 9.6841   LearningRate 0.0700   Epoch: 3   Global Step: 13520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:37:43,762-Speed 3409.20 samples/sec   Loss 9.6742   LearningRate 0.0700   Epoch: 3   Global Step: 13530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:46,779-Speed 3394.77 samples/sec   Loss 9.7547   LearningRate 0.0699   Epoch: 3   Global Step: 13540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:49,798-Speed 3392.39 samples/sec   Loss 9.6887   LearningRate 0.0699   Epoch: 3   Global Step: 13550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:52,821-Speed 3388.73 samples/sec   Loss 9.8210   LearningRate 0.0699   Epoch: 3   Global Step: 13560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:55,836-Speed 3397.45 samples/sec   Loss 9.7950   LearningRate 0.0699   Epoch: 3   Global Step: 13570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:37:58,862-Speed 3384.50 samples/sec   Loss 9.8441   LearningRate 0.0699   Epoch: 3   Global Step: 13580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:38:01,883-Speed 3390.80 samples/sec   Loss 9.7730   LearningRate 0.0698   Epoch: 3   Global Step: 13590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:38:04,900-Speed 3394.72 samples/sec   Loss 9.8139   LearningRate 0.0698   Epoch: 3   Global Step: 13600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:38:07,914-Speed 3398.17 samples/sec   Loss 9.6906   LearningRate 0.0698   Epoch: 3   Global Step: 13610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:38:10,928-Speed 3397.82 samples/sec   Loss 9.8175   LearningRate 0.0698   Epoch: 3   Global Step: 13620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:38:13,978-Speed 3358.43 samples/sec   Loss 9.6516   LearningRate 0.0698   Epoch: 3   Global Step: 13630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:17,003-Speed 3385.75 samples/sec   Loss 9.8625   LearningRate 0.0697   Epoch: 3   Global Step: 13640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:20,020-Speed 3395.05 samples/sec   Loss 9.9136   LearningRate 0.0697   Epoch: 3   Global Step: 13650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:23,041-Speed 3390.52 samples/sec   Loss 9.7395   LearningRate 0.0697   Epoch: 3   Global Step: 13660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:26,065-Speed 3387.42 samples/sec   Loss 9.8863   LearningRate 0.0697   Epoch: 3   Global Step: 13670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:29,086-Speed 3389.93 samples/sec   Loss 9.7511   LearningRate 0.0697   Epoch: 3   Global Step: 13680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:32,103-Speed 3395.33 samples/sec   Loss 9.8632   LearningRate 0.0696   Epoch: 3   Global Step: 13690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:35,128-Speed 3385.77 samples/sec   Loss 9.6979   LearningRate 0.0696   Epoch: 3   Global Step: 13700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:38,140-Speed 3400.51 samples/sec   Loss 9.7828   LearningRate 0.0696   Epoch: 3   Global Step: 13710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:41,155-Speed 3396.44 samples/sec   Loss 9.7028   LearningRate 0.0696   Epoch: 3   Global Step: 13720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:44,164-Speed 3403.88 samples/sec   Loss 9.6986   LearningRate 0.0696   Epoch: 3   Global Step: 13730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:47,197-Speed 3377.21 samples/sec   Loss 9.7479   LearningRate 0.0695   Epoch: 3   Global Step: 13740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:50,221-Speed 3387.56 samples/sec   Loss 9.7235   LearningRate 0.0695   Epoch: 3   Global Step: 13750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:53,246-Speed 3384.88 samples/sec   Loss 9.8933   LearningRate 0.0695   Epoch: 3   Global Step: 13760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:56,270-Speed 3387.45 samples/sec   Loss 9.9412   LearningRate 0.0695   Epoch: 3   Global Step: 13770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:38:59,290-Speed 3391.44 samples/sec   Loss 9.7257   LearningRate 0.0695   Epoch: 3   Global Step: 13780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:39:02,312-Speed 3390.12 samples/sec   Loss 9.7387   LearningRate 0.0694   Epoch: 3   Global Step: 13790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:39:05,336-Speed 3386.42 samples/sec   Loss 9.8023   LearningRate 0.0694   Epoch: 3   Global Step: 13800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:39:08,370-Speed 3376.24 samples/sec   Loss 9.9232   LearningRate 0.0694   Epoch: 3   Global Step: 13810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:39:11,390-Speed 3391.37 samples/sec   Loss 9.9139   LearningRate 0.0694   Epoch: 3   Global Step: 13820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:39:14,405-Speed 3396.54 samples/sec   Loss 9.8397   LearningRate 0.0694   Epoch: 3   Global Step: 13830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:39:17,411-Speed 3407.21 samples/sec   Loss 9.8778   LearningRate 0.0693   Epoch: 3   Global Step: 13840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:39:20,427-Speed 3396.35 samples/sec   Loss 9.8052   LearningRate 0.0693   Epoch: 3   Global Step: 13850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:39:23,454-Speed 3383.65 samples/sec   Loss 10.0055   LearningRate 0.0693   Epoch: 3   Global Step: 13860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:39:26,496-Speed 3367.02 samples/sec   Loss 9.7629   LearningRate 0.0693   Epoch: 3   Global Step: 13870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:39:29,527-Speed 3380.01 samples/sec   Loss 9.8834   LearningRate 0.0692   Epoch: 3   Global Step: 13880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:39:32,547-Speed 3390.52 samples/sec   Loss 9.7788   LearningRate 0.0692   Epoch: 3   Global Step: 13890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:39:35,566-Speed 3392.83 samples/sec   Loss 9.8402   LearningRate 0.0692   Epoch: 3   Global Step: 13900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:39:38,585-Speed 3393.09 samples/sec   Loss 9.8211   LearningRate 0.0692   Epoch: 3   Global Step: 13910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:39:41,602-Speed 3394.51 samples/sec   Loss 9.8150   LearningRate 0.0692   Epoch: 3   Global Step: 13920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:39:44,623-Speed 3390.20 samples/sec   Loss 9.8504   LearningRate 0.0691   Epoch: 3   Global Step: 13930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:39:47,660-Speed 3372.28 samples/sec   Loss 9.8185   LearningRate 0.0691   Epoch: 3   Global Step: 13940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:39:50,706-Speed 3362.22 samples/sec   Loss 9.8572   LearningRate 0.0691   Epoch: 3   Global Step: 13950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:39:53,787-Speed 3324.44 samples/sec   Loss 9.9088   LearningRate 0.0691   Epoch: 3   Global Step: 13960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:39:56,804-Speed 3394.97 samples/sec   Loss 9.7204   LearningRate 0.0691   Epoch: 3   Global Step: 13970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:39:59,822-Speed 3393.82 samples/sec   Loss 9.8287   LearningRate 0.0690   Epoch: 3   Global Step: 13980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:40:02,852-Speed 3381.31 samples/sec   Loss 9.7189   LearningRate 0.0690   Epoch: 3   Global Step: 13990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:40:05,869-Speed 3394.18 samples/sec   Loss 9.8940   LearningRate 0.0690   Epoch: 3   Global Step: 14000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:40:49,421-[lfw][14000]XNorm: 23.042746
Training: 2022-04-26 13:40:49,421-[lfw][14000]Accuracy-Flip: 0.99483+-0.00361
Training: 2022-04-26 13:40:49,422-[lfw][14000]Accuracy-Highest: 0.99650
Training: 2022-04-26 13:41:40,271-[cfp_fp][14000]XNorm: 21.683929
Training: 2022-04-26 13:41:40,272-[cfp_fp][14000]Accuracy-Flip: 0.97643+-0.00674
Training: 2022-04-26 13:41:40,272-[cfp_fp][14000]Accuracy-Highest: 0.97643
Training: 2022-04-26 13:42:23,982-[agedb_30][14000]XNorm: 23.173297
Training: 2022-04-26 13:42:23,983-[agedb_30][14000]Accuracy-Flip: 0.96050+-0.00882
Training: 2022-04-26 13:42:23,983-[agedb_30][14000]Accuracy-Highest: 0.96050
Training: 2022-04-26 13:42:27,016-Speed 72.55 samples/sec   Loss 10.0276   LearningRate 0.0690   Epoch: 3   Global Step: 14010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:42:30,024-Speed 3405.11 samples/sec   Loss 9.8891   LearningRate 0.0690   Epoch: 3   Global Step: 14020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:42:33,041-Speed 3394.42 samples/sec   Loss 9.8351   LearningRate 0.0689   Epoch: 3   Global Step: 14030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:42:36,045-Speed 3410.17 samples/sec   Loss 9.8520   LearningRate 0.0689   Epoch: 3   Global Step: 14040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:42:39,061-Speed 3396.07 samples/sec   Loss 9.8053   LearningRate 0.0689   Epoch: 3   Global Step: 14050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:42:42,080-Speed 3392.98 samples/sec   Loss 9.7468   LearningRate 0.0689   Epoch: 3   Global Step: 14060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:42:45,094-Speed 3398.30 samples/sec   Loss 9.8270   LearningRate 0.0689   Epoch: 3   Global Step: 14070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:42:48,119-Speed 3385.92 samples/sec   Loss 9.7930   LearningRate 0.0688   Epoch: 3   Global Step: 14080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:42:51,138-Speed 3392.00 samples/sec   Loss 9.9710   LearningRate 0.0688   Epoch: 3   Global Step: 14090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:42:54,156-Speed 3394.22 samples/sec   Loss 9.7946   LearningRate 0.0688   Epoch: 3   Global Step: 14100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:42:57,180-Speed 3386.67 samples/sec   Loss 9.8610   LearningRate 0.0688   Epoch: 3   Global Step: 14110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:43:00,212-Speed 3378.48 samples/sec   Loss 9.8134   LearningRate 0.0688   Epoch: 3   Global Step: 14120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:43:03,233-Speed 3389.20 samples/sec   Loss 10.0371   LearningRate 0.0687   Epoch: 3   Global Step: 14130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:43:06,252-Speed 3393.51 samples/sec   Loss 9.9370   LearningRate 0.0687   Epoch: 3   Global Step: 14140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:43:09,273-Speed 3391.03 samples/sec   Loss 9.9156   LearningRate 0.0687   Epoch: 3   Global Step: 14150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:43:12,293-Speed 3390.73 samples/sec   Loss 9.8674   LearningRate 0.0687   Epoch: 3   Global Step: 14160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:15,314-Speed 3390.69 samples/sec   Loss 9.9540   LearningRate 0.0687   Epoch: 3   Global Step: 14170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:18,341-Speed 3383.13 samples/sec   Loss 10.0559   LearningRate 0.0686   Epoch: 3   Global Step: 14180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:21,365-Speed 3387.73 samples/sec   Loss 9.6873   LearningRate 0.0686   Epoch: 3   Global Step: 14190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:24,401-Speed 3373.21 samples/sec   Loss 9.7371   LearningRate 0.0686   Epoch: 3   Global Step: 14200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:27,432-Speed 3378.60 samples/sec   Loss 9.8719   LearningRate 0.0686   Epoch: 3   Global Step: 14210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:30,459-Speed 3383.98 samples/sec   Loss 9.9381   LearningRate 0.0686   Epoch: 3   Global Step: 14220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:33,483-Speed 3387.42 samples/sec   Loss 9.7564   LearningRate 0.0685   Epoch: 3   Global Step: 14230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:36,514-Speed 3379.46 samples/sec   Loss 9.8180   LearningRate 0.0685   Epoch: 3   Global Step: 14240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:39,538-Speed 3386.84 samples/sec   Loss 9.8060   LearningRate 0.0685   Epoch: 3   Global Step: 14250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:42,560-Speed 3389.39 samples/sec   Loss 9.8265   LearningRate 0.0685   Epoch: 3   Global Step: 14260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:45,584-Speed 3386.15 samples/sec   Loss 9.8977   LearningRate 0.0685   Epoch: 3   Global Step: 14270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:48,609-Speed 3386.54 samples/sec   Loss 9.8304   LearningRate 0.0684   Epoch: 3   Global Step: 14280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:51,629-Speed 3391.51 samples/sec   Loss 10.0060   LearningRate 0.0684   Epoch: 3   Global Step: 14290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:54,654-Speed 3385.72 samples/sec   Loss 9.8847   LearningRate 0.0684   Epoch: 3   Global Step: 14300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:43:57,672-Speed 3393.07 samples/sec   Loss 9.7997   LearningRate 0.0684   Epoch: 3   Global Step: 14310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:44:00,700-Speed 3382.74 samples/sec   Loss 9.7791   LearningRate 0.0684   Epoch: 3   Global Step: 14320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:44:03,716-Speed 3396.31 samples/sec   Loss 9.8876   LearningRate 0.0683   Epoch: 3   Global Step: 14330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:44:06,740-Speed 3387.68 samples/sec   Loss 9.8934   LearningRate 0.0683   Epoch: 3   Global Step: 14340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:44:09,759-Speed 3391.84 samples/sec   Loss 9.8056   LearningRate 0.0683   Epoch: 3   Global Step: 14350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:44:12,783-Speed 3387.13 samples/sec   Loss 9.9144   LearningRate 0.0683   Epoch: 3   Global Step: 14360   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 13:44:15,814-Speed 3379.42 samples/sec   Loss 9.9466   LearningRate 0.0683   Epoch: 3   Global Step: 14370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:44:18,832-Speed 3394.23 samples/sec   Loss 9.8759   LearningRate 0.0682   Epoch: 3   Global Step: 14380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:44:21,836-Speed 3408.68 samples/sec   Loss 9.7915   LearningRate 0.0682   Epoch: 3   Global Step: 14390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:44:24,865-Speed 3381.91 samples/sec   Loss 9.8170   LearningRate 0.0682   Epoch: 3   Global Step: 14400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:44:27,898-Speed 3376.90 samples/sec   Loss 9.7279   LearningRate 0.0682   Epoch: 3   Global Step: 14410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:44:30,920-Speed 3388.70 samples/sec   Loss 9.7244   LearningRate 0.0682   Epoch: 3   Global Step: 14420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:44:33,942-Speed 3389.37 samples/sec   Loss 9.7697   LearningRate 0.0681   Epoch: 3   Global Step: 14430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:44:36,960-Speed 3394.55 samples/sec   Loss 9.7663   LearningRate 0.0681   Epoch: 3   Global Step: 14440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:44:39,978-Speed 3393.34 samples/sec   Loss 9.8094   LearningRate 0.0681   Epoch: 3   Global Step: 14450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:44:42,993-Speed 3396.58 samples/sec   Loss 9.7048   LearningRate 0.0681   Epoch: 3   Global Step: 14460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:44:46,018-Speed 3385.81 samples/sec   Loss 9.8232   LearningRate 0.0681   Epoch: 3   Global Step: 14470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:44:49,056-Speed 3372.27 samples/sec   Loss 9.8355   LearningRate 0.0680   Epoch: 3   Global Step: 14480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:44:52,080-Speed 3386.52 samples/sec   Loss 9.7642   LearningRate 0.0680   Epoch: 3   Global Step: 14490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:44:55,152-Speed 3333.88 samples/sec   Loss 9.8987   LearningRate 0.0680   Epoch: 3   Global Step: 14500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:44:58,183-Speed 3379.04 samples/sec   Loss 9.7240   LearningRate 0.0680   Epoch: 3   Global Step: 14510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:45:01,203-Speed 3391.43 samples/sec   Loss 9.7475   LearningRate 0.0680   Epoch: 3   Global Step: 14520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:45:04,225-Speed 3389.60 samples/sec   Loss 9.9347   LearningRate 0.0679   Epoch: 3   Global Step: 14530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:45:07,247-Speed 3390.23 samples/sec   Loss 9.8671   LearningRate 0.0679   Epoch: 3   Global Step: 14540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:45:10,265-Speed 3393.65 samples/sec   Loss 9.9418   LearningRate 0.0679   Epoch: 3   Global Step: 14550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:45:13,281-Speed 3394.99 samples/sec   Loss 9.7324   LearningRate 0.0679   Epoch: 3   Global Step: 14560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:45:16,304-Speed 3389.08 samples/sec   Loss 9.7572   LearningRate 0.0679   Epoch: 3   Global Step: 14570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:45:19,326-Speed 3389.11 samples/sec   Loss 9.7475   LearningRate 0.0678   Epoch: 3   Global Step: 14580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:45:22,339-Speed 3398.66 samples/sec   Loss 9.8131   LearningRate 0.0678   Epoch: 3   Global Step: 14590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:45:25,362-Speed 3387.92 samples/sec   Loss 9.9068   LearningRate 0.0678   Epoch: 3   Global Step: 14600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:45:28,395-Speed 3377.51 samples/sec   Loss 9.7306   LearningRate 0.0678   Epoch: 3   Global Step: 14610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:45:31,423-Speed 3382.72 samples/sec   Loss 9.8392   LearningRate 0.0678   Epoch: 3   Global Step: 14620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:45:34,452-Speed 3381.09 samples/sec   Loss 9.9088   LearningRate 0.0677   Epoch: 3   Global Step: 14630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:45:37,478-Speed 3385.72 samples/sec   Loss 9.9240   LearningRate 0.0677   Epoch: 3   Global Step: 14640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:45:40,497-Speed 3392.14 samples/sec   Loss 9.7333   LearningRate 0.0677   Epoch: 3   Global Step: 14650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:45:43,519-Speed 3388.74 samples/sec   Loss 9.8504   LearningRate 0.0677   Epoch: 3   Global Step: 14660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:45:46,538-Speed 3392.90 samples/sec   Loss 9.6642   LearningRate 0.0677   Epoch: 3   Global Step: 14670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:45:49,579-Speed 3367.66 samples/sec   Loss 9.8537   LearningRate 0.0676   Epoch: 3   Global Step: 14680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:45:52,605-Speed 3385.30 samples/sec   Loss 9.7089   LearningRate 0.0676   Epoch: 3   Global Step: 14690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:45:55,645-Speed 3368.39 samples/sec   Loss 9.8539   LearningRate 0.0676   Epoch: 3   Global Step: 14700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:45:58,668-Speed 3388.21 samples/sec   Loss 9.7812   LearningRate 0.0676   Epoch: 3   Global Step: 14710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:46:01,693-Speed 3386.29 samples/sec   Loss 9.7078   LearningRate 0.0676   Epoch: 3   Global Step: 14720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:46:04,719-Speed 3385.30 samples/sec   Loss 9.8145   LearningRate 0.0675   Epoch: 3   Global Step: 14730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:46:07,745-Speed 3385.45 samples/sec   Loss 9.7497   LearningRate 0.0675   Epoch: 3   Global Step: 14740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:46:10,776-Speed 3378.38 samples/sec   Loss 9.7417   LearningRate 0.0675   Epoch: 3   Global Step: 14750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:46:13,800-Speed 3387.41 samples/sec   Loss 9.9709   LearningRate 0.0675   Epoch: 3   Global Step: 14760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:46:16,824-Speed 3386.49 samples/sec   Loss 9.7774   LearningRate 0.0675   Epoch: 3   Global Step: 14770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:46:19,852-Speed 3383.22 samples/sec   Loss 9.9053   LearningRate 0.0675   Epoch: 3   Global Step: 14780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:46:22,864-Speed 3399.60 samples/sec   Loss 9.8939   LearningRate 0.0674   Epoch: 3   Global Step: 14790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:46:25,894-Speed 3380.91 samples/sec   Loss 9.6935   LearningRate 0.0674   Epoch: 3   Global Step: 14800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:46:28,926-Speed 3378.05 samples/sec   Loss 9.7845   LearningRate 0.0674   Epoch: 3   Global Step: 14810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:46:31,943-Speed 3394.06 samples/sec   Loss 9.6825   LearningRate 0.0674   Epoch: 3   Global Step: 14820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:46:34,966-Speed 3389.00 samples/sec   Loss 9.7252   LearningRate 0.0674   Epoch: 3   Global Step: 14830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:46:38,009-Speed 3366.27 samples/sec   Loss 9.8250   LearningRate 0.0673   Epoch: 3   Global Step: 14840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:46:41,058-Speed 3358.53 samples/sec   Loss 9.7107   LearningRate 0.0673   Epoch: 3   Global Step: 14850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:46:44,088-Speed 3381.18 samples/sec   Loss 9.6817   LearningRate 0.0673   Epoch: 3   Global Step: 14860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:46:47,111-Speed 3387.95 samples/sec   Loss 9.8005   LearningRate 0.0673   Epoch: 3   Global Step: 14870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:46:50,140-Speed 3380.32 samples/sec   Loss 9.9261   LearningRate 0.0673   Epoch: 3   Global Step: 14880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:46:53,184-Speed 3365.49 samples/sec   Loss 9.6932   LearningRate 0.0672   Epoch: 3   Global Step: 14890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:46:56,207-Speed 3388.95 samples/sec   Loss 9.7739   LearningRate 0.0672   Epoch: 3   Global Step: 14900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:46:59,234-Speed 3383.30 samples/sec   Loss 9.7713   LearningRate 0.0672   Epoch: 3   Global Step: 14910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:47:02,267-Speed 3377.16 samples/sec   Loss 9.7850   LearningRate 0.0672   Epoch: 3   Global Step: 14920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:05,305-Speed 3371.13 samples/sec   Loss 9.6571   LearningRate 0.0672   Epoch: 3   Global Step: 14930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:08,331-Speed 3384.55 samples/sec   Loss 9.8739   LearningRate 0.0671   Epoch: 3   Global Step: 14940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:11,369-Speed 3371.77 samples/sec   Loss 9.8576   LearningRate 0.0671   Epoch: 3   Global Step: 14950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:14,402-Speed 3377.22 samples/sec   Loss 9.7311   LearningRate 0.0671   Epoch: 3   Global Step: 14960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:17,445-Speed 3365.85 samples/sec   Loss 9.8875   LearningRate 0.0671   Epoch: 3   Global Step: 14970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:20,473-Speed 3382.44 samples/sec   Loss 9.8144   LearningRate 0.0671   Epoch: 3   Global Step: 14980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:23,501-Speed 3382.37 samples/sec   Loss 9.9235   LearningRate 0.0670   Epoch: 3   Global Step: 14990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:26,529-Speed 3383.18 samples/sec   Loss 9.7734   LearningRate 0.0670   Epoch: 3   Global Step: 15000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:29,557-Speed 3381.93 samples/sec   Loss 9.8298   LearningRate 0.0670   Epoch: 3   Global Step: 15010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:32,580-Speed 3388.37 samples/sec   Loss 9.9387   LearningRate 0.0670   Epoch: 3   Global Step: 15020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:35,607-Speed 3384.06 samples/sec   Loss 9.7718   LearningRate 0.0670   Epoch: 3   Global Step: 15030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:38,640-Speed 3376.08 samples/sec   Loss 9.8973   LearningRate 0.0669   Epoch: 3   Global Step: 15040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:41,668-Speed 3383.26 samples/sec   Loss 9.7014   LearningRate 0.0669   Epoch: 3   Global Step: 15050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:44,692-Speed 3386.77 samples/sec   Loss 9.8001   LearningRate 0.0669   Epoch: 3   Global Step: 15060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:47,718-Speed 3384.99 samples/sec   Loss 9.7761   LearningRate 0.0669   Epoch: 3   Global Step: 15070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:50,750-Speed 3378.18 samples/sec   Loss 9.7579   LearningRate 0.0669   Epoch: 3   Global Step: 15080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:53,781-Speed 3378.65 samples/sec   Loss 9.7493   LearningRate 0.0668   Epoch: 3   Global Step: 15090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:56,817-Speed 3373.52 samples/sec   Loss 9.7406   LearningRate 0.0668   Epoch: 3   Global Step: 15100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:47:59,842-Speed 3386.33 samples/sec   Loss 9.7544   LearningRate 0.0668   Epoch: 3   Global Step: 15110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:48:02,863-Speed 3390.35 samples/sec   Loss 9.6671   LearningRate 0.0668   Epoch: 3   Global Step: 15120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:48:05,888-Speed 3385.75 samples/sec   Loss 9.8165   LearningRate 0.0668   Epoch: 3   Global Step: 15130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:48:08,915-Speed 3383.83 samples/sec   Loss 9.8728   LearningRate 0.0667   Epoch: 3   Global Step: 15140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:48:11,942-Speed 3383.66 samples/sec   Loss 9.7784   LearningRate 0.0667   Epoch: 3   Global Step: 15150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:48:14,966-Speed 3387.12 samples/sec   Loss 9.7572   LearningRate 0.0667   Epoch: 3   Global Step: 15160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:48:17,990-Speed 3386.59 samples/sec   Loss 9.7861   LearningRate 0.0667   Epoch: 3   Global Step: 15170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:48:21,024-Speed 3376.02 samples/sec   Loss 9.7902   LearningRate 0.0667   Epoch: 3   Global Step: 15180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:48:24,079-Speed 3352.76 samples/sec   Loss 9.7957   LearningRate 0.0666   Epoch: 3   Global Step: 15190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:48:27,102-Speed 3388.29 samples/sec   Loss 9.7014   LearningRate 0.0666   Epoch: 3   Global Step: 15200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:48:30,141-Speed 3370.31 samples/sec   Loss 9.7875   LearningRate 0.0666   Epoch: 3   Global Step: 15210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:48:33,151-Speed 3402.78 samples/sec   Loss 9.7703   LearningRate 0.0666   Epoch: 3   Global Step: 15220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:48:36,178-Speed 3383.61 samples/sec   Loss 9.7897   LearningRate 0.0666   Epoch: 3   Global Step: 15230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:48:39,198-Speed 3391.41 samples/sec   Loss 9.7395   LearningRate 0.0665   Epoch: 3   Global Step: 15240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:48:42,260-Speed 3344.91 samples/sec   Loss 9.7888   LearningRate 0.0665   Epoch: 3   Global Step: 15250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:48:45,281-Speed 3390.38 samples/sec   Loss 9.7362   LearningRate 0.0665   Epoch: 3   Global Step: 15260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:48:48,313-Speed 3378.56 samples/sec   Loss 9.7339   LearningRate 0.0665   Epoch: 3   Global Step: 15270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:48:51,337-Speed 3386.40 samples/sec   Loss 9.7842   LearningRate 0.0665   Epoch: 3   Global Step: 15280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:48:54,369-Speed 3377.93 samples/sec   Loss 9.6893   LearningRate 0.0664   Epoch: 3   Global Step: 15290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:48:57,396-Speed 3384.40 samples/sec   Loss 9.8380   LearningRate 0.0664   Epoch: 3   Global Step: 15300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:49:00,429-Speed 3376.31 samples/sec   Loss 9.7375   LearningRate 0.0664   Epoch: 3   Global Step: 15310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:49:03,455-Speed 3384.70 samples/sec   Loss 9.6730   LearningRate 0.0664   Epoch: 3   Global Step: 15320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:49:06,479-Speed 3387.20 samples/sec   Loss 9.6833   LearningRate 0.0664   Epoch: 3   Global Step: 15330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:49:09,503-Speed 3386.70 samples/sec   Loss 9.7121   LearningRate 0.0663   Epoch: 3   Global Step: 15340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:49:12,531-Speed 3383.74 samples/sec   Loss 9.8724   LearningRate 0.0663   Epoch: 3   Global Step: 15350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:49:15,566-Speed 3374.43 samples/sec   Loss 9.7238   LearningRate 0.0663   Epoch: 3   Global Step: 15360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:49:18,592-Speed 3384.44 samples/sec   Loss 9.6767   LearningRate 0.0663   Epoch: 3   Global Step: 15370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:49:21,620-Speed 3382.64 samples/sec   Loss 9.7336   LearningRate 0.0663   Epoch: 3   Global Step: 15380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:49:24,645-Speed 3386.21 samples/sec   Loss 9.6402   LearningRate 0.0662   Epoch: 3   Global Step: 15390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:49:27,674-Speed 3381.02 samples/sec   Loss 9.6604   LearningRate 0.0662   Epoch: 3   Global Step: 15400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:49:30,693-Speed 3392.33 samples/sec   Loss 9.6466   LearningRate 0.0662   Epoch: 3   Global Step: 15410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:49:33,728-Speed 3375.08 samples/sec   Loss 9.6902   LearningRate 0.0662   Epoch: 3   Global Step: 15420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:49:36,770-Speed 3366.68 samples/sec   Loss 9.7391   LearningRate 0.0662   Epoch: 3   Global Step: 15430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:49:39,803-Speed 3377.06 samples/sec   Loss 9.6770   LearningRate 0.0661   Epoch: 3   Global Step: 15440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:49:42,832-Speed 3381.43 samples/sec   Loss 9.6504   LearningRate 0.0661   Epoch: 3   Global Step: 15450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:49:45,861-Speed 3382.05 samples/sec   Loss 9.6776   LearningRate 0.0661   Epoch: 3   Global Step: 15460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:49:48,890-Speed 3380.47 samples/sec   Loss 9.7278   LearningRate 0.0661   Epoch: 3   Global Step: 15470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:49:51,927-Speed 3372.62 samples/sec   Loss 9.7928   LearningRate 0.0661   Epoch: 3   Global Step: 15480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:49:54,961-Speed 3376.48 samples/sec   Loss 9.8252   LearningRate 0.0660   Epoch: 3   Global Step: 15490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:49:57,992-Speed 3378.78 samples/sec   Loss 9.7369   LearningRate 0.0660   Epoch: 3   Global Step: 15500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:50:01,031-Speed 3370.04 samples/sec   Loss 9.6542   LearningRate 0.0660   Epoch: 3   Global Step: 15510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:04,073-Speed 3367.24 samples/sec   Loss 9.7445   LearningRate 0.0660   Epoch: 3   Global Step: 15520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:07,109-Speed 3373.47 samples/sec   Loss 9.6377   LearningRate 0.0660   Epoch: 3   Global Step: 15530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:10,132-Speed 3387.97 samples/sec   Loss 9.7112   LearningRate 0.0659   Epoch: 3   Global Step: 15540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:13,171-Speed 3370.59 samples/sec   Loss 9.7351   LearningRate 0.0659   Epoch: 3   Global Step: 15550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:16,240-Speed 3337.22 samples/sec   Loss 9.6859   LearningRate 0.0659   Epoch: 3   Global Step: 15560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:19,275-Speed 3374.98 samples/sec   Loss 9.5854   LearningRate 0.0659   Epoch: 3   Global Step: 15570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:22,302-Speed 3383.19 samples/sec   Loss 9.6962   LearningRate 0.0659   Epoch: 3   Global Step: 15580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:25,331-Speed 3382.05 samples/sec   Loss 9.5576   LearningRate 0.0659   Epoch: 3   Global Step: 15590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:28,355-Speed 3386.19 samples/sec   Loss 9.5994   LearningRate 0.0658   Epoch: 3   Global Step: 15600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:31,369-Speed 3398.77 samples/sec   Loss 9.6082   LearningRate 0.0658   Epoch: 3   Global Step: 15610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:34,397-Speed 3382.34 samples/sec   Loss 9.6405   LearningRate 0.0658   Epoch: 3   Global Step: 15620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:37,494-Speed 3306.47 samples/sec   Loss 9.6976   LearningRate 0.0658   Epoch: 3   Global Step: 15630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:40,525-Speed 3379.58 samples/sec   Loss 9.5205   LearningRate 0.0658   Epoch: 3   Global Step: 15640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:43,590-Speed 3342.77 samples/sec   Loss 9.6811   LearningRate 0.0657   Epoch: 3   Global Step: 15650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:46,630-Speed 3368.38 samples/sec   Loss 9.7131   LearningRate 0.0657   Epoch: 3   Global Step: 15660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:49,656-Speed 3384.73 samples/sec   Loss 9.5390   LearningRate 0.0657   Epoch: 3   Global Step: 15670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:52,679-Speed 3388.15 samples/sec   Loss 9.5651   LearningRate 0.0657   Epoch: 3   Global Step: 15680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:55,704-Speed 3386.42 samples/sec   Loss 9.4914   LearningRate 0.0657   Epoch: 3   Global Step: 15690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:50:58,730-Speed 3383.84 samples/sec   Loss 9.4647   LearningRate 0.0656   Epoch: 3   Global Step: 15700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:01,761-Speed 3379.65 samples/sec   Loss 9.6937   LearningRate 0.0656   Epoch: 3   Global Step: 15710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:04,809-Speed 3360.37 samples/sec   Loss 9.6085   LearningRate 0.0656   Epoch: 3   Global Step: 15720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:07,841-Speed 3377.37 samples/sec   Loss 9.6180   LearningRate 0.0656   Epoch: 3   Global Step: 15730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:10,876-Speed 3376.02 samples/sec   Loss 9.5554   LearningRate 0.0656   Epoch: 3   Global Step: 15740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:13,903-Speed 3383.17 samples/sec   Loss 9.5612   LearningRate 0.0655   Epoch: 3   Global Step: 15750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:16,929-Speed 3384.71 samples/sec   Loss 9.5222   LearningRate 0.0655   Epoch: 3   Global Step: 15760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:19,961-Speed 3378.41 samples/sec   Loss 9.5250   LearningRate 0.0655   Epoch: 3   Global Step: 15770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:22,999-Speed 3371.90 samples/sec   Loss 9.6692   LearningRate 0.0655   Epoch: 3   Global Step: 15780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:26,039-Speed 3368.45 samples/sec   Loss 9.5992   LearningRate 0.0655   Epoch: 3   Global Step: 15790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:29,072-Speed 3377.52 samples/sec   Loss 9.5745   LearningRate 0.0654   Epoch: 3   Global Step: 15800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:32,107-Speed 3374.56 samples/sec   Loss 9.5531   LearningRate 0.0654   Epoch: 3   Global Step: 15810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:35,137-Speed 3380.39 samples/sec   Loss 9.5941   LearningRate 0.0654   Epoch: 3   Global Step: 15820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:38,169-Speed 3377.95 samples/sec   Loss 9.5095   LearningRate 0.0654   Epoch: 3   Global Step: 15830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:41,203-Speed 3376.40 samples/sec   Loss 9.5762   LearningRate 0.0654   Epoch: 3   Global Step: 15840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:44,232-Speed 3381.34 samples/sec   Loss 9.5474   LearningRate 0.0653   Epoch: 3   Global Step: 15850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:47,260-Speed 3382.46 samples/sec   Loss 9.7683   LearningRate 0.0653   Epoch: 3   Global Step: 15860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:50,308-Speed 3359.78 samples/sec   Loss 9.6130   LearningRate 0.0653   Epoch: 3   Global Step: 15870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:53,333-Speed 3386.22 samples/sec   Loss 9.6143   LearningRate 0.0653   Epoch: 3   Global Step: 15880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:51:56,353-Speed 3391.35 samples/sec   Loss 9.5130   LearningRate 0.0653   Epoch: 3   Global Step: 15890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:51:59,383-Speed 3380.78 samples/sec   Loss 9.6114   LearningRate 0.0652   Epoch: 3   Global Step: 15900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:52:02,414-Speed 3378.67 samples/sec   Loss 9.5381   LearningRate 0.0652   Epoch: 3   Global Step: 15910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:52:05,453-Speed 3370.45 samples/sec   Loss 9.7130   LearningRate 0.0652   Epoch: 3   Global Step: 15920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:52:08,481-Speed 3382.92 samples/sec   Loss 9.6205   LearningRate 0.0652   Epoch: 3   Global Step: 15930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:52:11,511-Speed 3380.16 samples/sec   Loss 9.4226   LearningRate 0.0652   Epoch: 3   Global Step: 15940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:52:14,574-Speed 3343.71 samples/sec   Loss 9.4717   LearningRate 0.0651   Epoch: 3   Global Step: 15950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:52:17,604-Speed 3380.21 samples/sec   Loss 9.5620   LearningRate 0.0651   Epoch: 3   Global Step: 15960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:52:20,634-Speed 3380.89 samples/sec   Loss 9.5391   LearningRate 0.0651   Epoch: 3   Global Step: 15970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:52:23,670-Speed 3372.95 samples/sec   Loss 9.4729   LearningRate 0.0651   Epoch: 3   Global Step: 15980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:52:26,699-Speed 3381.61 samples/sec   Loss 9.6011   LearningRate 0.0651   Epoch: 3   Global Step: 15990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:52:29,733-Speed 3375.09 samples/sec   Loss 9.6282   LearningRate 0.0650   Epoch: 3   Global Step: 16000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:53:13,150-[lfw][16000]XNorm: 23.627792
Training: 2022-04-26 13:53:13,151-[lfw][16000]Accuracy-Flip: 0.99650+-0.00273
Training: 2022-04-26 13:53:13,151-[lfw][16000]Accuracy-Highest: 0.99650
Training: 2022-04-26 13:54:03,813-[cfp_fp][16000]XNorm: 21.488418
Training: 2022-04-26 13:54:03,814-[cfp_fp][16000]Accuracy-Flip: 0.97657+-0.00590
Training: 2022-04-26 13:54:03,814-[cfp_fp][16000]Accuracy-Highest: 0.97657
Training: 2022-04-26 13:54:47,161-[agedb_30][16000]XNorm: 23.146002
Training: 2022-04-26 13:54:47,161-[agedb_30][16000]Accuracy-Flip: 0.96067+-0.00998
Training: 2022-04-26 13:54:47,162-[agedb_30][16000]Accuracy-Highest: 0.96067
Training: 2022-04-26 13:54:50,187-Speed 72.91 samples/sec   Loss 9.5282   LearningRate 0.0650   Epoch: 3   Global Step: 16010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:54:53,210-Speed 3388.16 samples/sec   Loss 9.7321   LearningRate 0.0650   Epoch: 3   Global Step: 16020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:54:56,236-Speed 3384.62 samples/sec   Loss 9.4309   LearningRate 0.0650   Epoch: 3   Global Step: 16030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:54:59,270-Speed 3376.26 samples/sec   Loss 9.6495   LearningRate 0.0650   Epoch: 3   Global Step: 16040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:55:02,300-Speed 3380.39 samples/sec   Loss 9.5399   LearningRate 0.0650   Epoch: 3   Global Step: 16050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:55:05,336-Speed 3373.57 samples/sec   Loss 9.5787   LearningRate 0.0649   Epoch: 3   Global Step: 16060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:55:08,369-Speed 3377.71 samples/sec   Loss 9.5063   LearningRate 0.0649   Epoch: 3   Global Step: 16070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:55:11,400-Speed 3378.52 samples/sec   Loss 9.4590   LearningRate 0.0649   Epoch: 3   Global Step: 16080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:55:14,422-Speed 3389.48 samples/sec   Loss 9.6523   LearningRate 0.0649   Epoch: 3   Global Step: 16090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:55:17,451-Speed 3380.92 samples/sec   Loss 9.5815   LearningRate 0.0649   Epoch: 3   Global Step: 16100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:55:20,483-Speed 3378.44 samples/sec   Loss 9.4244   LearningRate 0.0648   Epoch: 3   Global Step: 16110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:55:23,515-Speed 3377.70 samples/sec   Loss 9.2414   LearningRate 0.0648   Epoch: 3   Global Step: 16120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:55:26,542-Speed 3383.83 samples/sec   Loss 9.6249   LearningRate 0.0648   Epoch: 3   Global Step: 16130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:55:29,603-Speed 3346.88 samples/sec   Loss 9.4465   LearningRate 0.0648   Epoch: 3   Global Step: 16140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:55:32,634-Speed 3379.29 samples/sec   Loss 9.4803   LearningRate 0.0648   Epoch: 3   Global Step: 16150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:55:35,665-Speed 3379.02 samples/sec   Loss 9.4130   LearningRate 0.0647   Epoch: 3   Global Step: 16160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:55:38,701-Speed 3373.53 samples/sec   Loss 9.5869   LearningRate 0.0647   Epoch: 3   Global Step: 16170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:55:41,741-Speed 3370.04 samples/sec   Loss 9.5548   LearningRate 0.0647   Epoch: 3   Global Step: 16180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:55:44,764-Speed 3387.76 samples/sec   Loss 9.4830   LearningRate 0.0647   Epoch: 3   Global Step: 16190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:55:47,798-Speed 3375.48 samples/sec   Loss 9.4936   LearningRate 0.0647   Epoch: 3   Global Step: 16200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:55:50,898-Speed 3304.32 samples/sec   Loss 9.5777   LearningRate 0.0646   Epoch: 3   Global Step: 16210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:55:53,967-Speed 3336.58 samples/sec   Loss 9.5224   LearningRate 0.0646   Epoch: 3   Global Step: 16220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:55:56,997-Speed 3381.06 samples/sec   Loss 9.5812   LearningRate 0.0646   Epoch: 3   Global Step: 16230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:56:00,030-Speed 3376.30 samples/sec   Loss 9.5460   LearningRate 0.0646   Epoch: 3   Global Step: 16240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:56:03,071-Speed 3368.39 samples/sec   Loss 9.5939   LearningRate 0.0646   Epoch: 3   Global Step: 16250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:56:06,107-Speed 3374.49 samples/sec   Loss 9.3856   LearningRate 0.0645   Epoch: 3   Global Step: 16260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:56:09,129-Speed 3388.48 samples/sec   Loss 9.4272   LearningRate 0.0645   Epoch: 3   Global Step: 16270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:56:12,169-Speed 3369.62 samples/sec   Loss 9.4716   LearningRate 0.0645   Epoch: 3   Global Step: 16280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:56:15,210-Speed 3367.79 samples/sec   Loss 9.4957   LearningRate 0.0645   Epoch: 3   Global Step: 16290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:56:18,247-Speed 3372.84 samples/sec   Loss 9.4756   LearningRate 0.0645   Epoch: 3   Global Step: 16300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:56:21,278-Speed 3378.14 samples/sec   Loss 9.5076   LearningRate 0.0644   Epoch: 3   Global Step: 16310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:56:24,348-Speed 3337.02 samples/sec   Loss 9.4626   LearningRate 0.0644   Epoch: 3   Global Step: 16320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:56:27,370-Speed 3389.33 samples/sec   Loss 9.4870   LearningRate 0.0644   Epoch: 3   Global Step: 16330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:56:30,390-Speed 3391.50 samples/sec   Loss 9.5434   LearningRate 0.0644   Epoch: 3   Global Step: 16340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:56:33,417-Speed 3383.85 samples/sec   Loss 9.5014   LearningRate 0.0644   Epoch: 3   Global Step: 16350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:56:36,442-Speed 3385.65 samples/sec   Loss 9.4683   LearningRate 0.0643   Epoch: 3   Global Step: 16360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:56:39,472-Speed 3380.14 samples/sec   Loss 9.5988   LearningRate 0.0643   Epoch: 3   Global Step: 16370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:56:42,497-Speed 3385.71 samples/sec   Loss 9.4784   LearningRate 0.0643   Epoch: 3   Global Step: 16380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:56:45,524-Speed 3383.60 samples/sec   Loss 9.5447   LearningRate 0.0643   Epoch: 3   Global Step: 16390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:56:48,559-Speed 3375.10 samples/sec   Loss 9.5393   LearningRate 0.0643   Epoch: 3   Global Step: 16400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:56:51,589-Speed 3379.70 samples/sec   Loss 9.5740   LearningRate 0.0643   Epoch: 3   Global Step: 16410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:56:54,618-Speed 3382.16 samples/sec   Loss 9.5617   LearningRate 0.0642   Epoch: 3   Global Step: 16420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:56:57,640-Speed 3388.42 samples/sec   Loss 9.4173   LearningRate 0.0642   Epoch: 3   Global Step: 16430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:57:00,666-Speed 3385.09 samples/sec   Loss 9.3636   LearningRate 0.0642   Epoch: 3   Global Step: 16440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:57:03,706-Speed 3369.80 samples/sec   Loss 9.4798   LearningRate 0.0642   Epoch: 3   Global Step: 16450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:57:06,729-Speed 3388.27 samples/sec   Loss 9.4256   LearningRate 0.0642   Epoch: 3   Global Step: 16460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:57:09,767-Speed 3371.43 samples/sec   Loss 9.4326   LearningRate 0.0641   Epoch: 3   Global Step: 16470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:57:12,796-Speed 3381.14 samples/sec   Loss 9.4417   LearningRate 0.0641   Epoch: 3   Global Step: 16480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:57:15,849-Speed 3354.99 samples/sec   Loss 9.4875   LearningRate 0.0641   Epoch: 3   Global Step: 16490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:57:18,871-Speed 3388.87 samples/sec   Loss 9.4088   LearningRate 0.0641   Epoch: 3   Global Step: 16500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:57:21,884-Speed 3399.06 samples/sec   Loss 9.4920   LearningRate 0.0641   Epoch: 3   Global Step: 16510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:57:24,913-Speed 3381.03 samples/sec   Loss 9.5379   LearningRate 0.0640   Epoch: 3   Global Step: 16520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:57:28,060-Speed 3255.11 samples/sec   Loss 9.5086   LearningRate 0.0640   Epoch: 3   Global Step: 16530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:57:31,077-Speed 3394.83 samples/sec   Loss 9.3398   LearningRate 0.0640   Epoch: 3   Global Step: 16540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:57:43,711-Speed 810.60 samples/sec   Loss 7.6126   LearningRate 0.0640   Epoch: 4   Global Step: 16550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:57:46,748-Speed 3372.05 samples/sec   Loss 7.6647   LearningRate 0.0640   Epoch: 4   Global Step: 16560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:57:49,762-Speed 3398.48 samples/sec   Loss 7.6147   LearningRate 0.0639   Epoch: 4   Global Step: 16570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:57:52,782-Speed 3392.02 samples/sec   Loss 7.7450   LearningRate 0.0639   Epoch: 4   Global Step: 16580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:57:55,816-Speed 3375.55 samples/sec   Loss 7.7014   LearningRate 0.0639   Epoch: 4   Global Step: 16590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:57:58,844-Speed 3382.82 samples/sec   Loss 7.5750   LearningRate 0.0639   Epoch: 4   Global Step: 16600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:58:01,869-Speed 3386.96 samples/sec   Loss 7.6860   LearningRate 0.0639   Epoch: 4   Global Step: 16610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:04,893-Speed 3386.89 samples/sec   Loss 7.5969   LearningRate 0.0638   Epoch: 4   Global Step: 16620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:07,915-Speed 3389.64 samples/sec   Loss 7.6540   LearningRate 0.0638   Epoch: 4   Global Step: 16630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:10,949-Speed 3375.53 samples/sec   Loss 7.6592   LearningRate 0.0638   Epoch: 4   Global Step: 16640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:13,993-Speed 3365.26 samples/sec   Loss 7.6300   LearningRate 0.0638   Epoch: 4   Global Step: 16650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:17,018-Speed 3386.50 samples/sec   Loss 7.6390   LearningRate 0.0638   Epoch: 4   Global Step: 16660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:20,063-Speed 3364.18 samples/sec   Loss 7.6200   LearningRate 0.0637   Epoch: 4   Global Step: 16670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:23,085-Speed 3389.15 samples/sec   Loss 7.7740   LearningRate 0.0637   Epoch: 4   Global Step: 16680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:26,131-Speed 3362.97 samples/sec   Loss 7.7177   LearningRate 0.0637   Epoch: 4   Global Step: 16690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:29,152-Speed 3390.52 samples/sec   Loss 7.8253   LearningRate 0.0637   Epoch: 4   Global Step: 16700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:32,164-Speed 3401.68 samples/sec   Loss 7.7353   LearningRate 0.0637   Epoch: 4   Global Step: 16710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:35,208-Speed 3365.68 samples/sec   Loss 7.7211   LearningRate 0.0637   Epoch: 4   Global Step: 16720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:38,227-Speed 3392.58 samples/sec   Loss 7.6603   LearningRate 0.0636   Epoch: 4   Global Step: 16730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:41,279-Speed 3355.66 samples/sec   Loss 7.6928   LearningRate 0.0636   Epoch: 4   Global Step: 16740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:44,315-Speed 3373.96 samples/sec   Loss 7.8565   LearningRate 0.0636   Epoch: 4   Global Step: 16750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:47,342-Speed 3383.98 samples/sec   Loss 7.8996   LearningRate 0.0636   Epoch: 4   Global Step: 16760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:50,368-Speed 3384.04 samples/sec   Loss 7.8711   LearningRate 0.0636   Epoch: 4   Global Step: 16770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:53,394-Speed 3384.64 samples/sec   Loss 7.9029   LearningRate 0.0635   Epoch: 4   Global Step: 16780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:56,423-Speed 3382.62 samples/sec   Loss 7.9289   LearningRate 0.0635   Epoch: 4   Global Step: 16790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:58:59,471-Speed 3360.17 samples/sec   Loss 7.8794   LearningRate 0.0635   Epoch: 4   Global Step: 16800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:59:02,501-Speed 3380.95 samples/sec   Loss 7.9066   LearningRate 0.0635   Epoch: 4   Global Step: 16810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:59:05,544-Speed 3366.20 samples/sec   Loss 8.0359   LearningRate 0.0635   Epoch: 4   Global Step: 16820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:59:08,573-Speed 3381.48 samples/sec   Loss 8.1272   LearningRate 0.0634   Epoch: 4   Global Step: 16830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:59:11,601-Speed 3383.06 samples/sec   Loss 8.0612   LearningRate 0.0634   Epoch: 4   Global Step: 16840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:59:14,633-Speed 3377.78 samples/sec   Loss 8.0714   LearningRate 0.0634   Epoch: 4   Global Step: 16850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:59:17,656-Speed 3388.57 samples/sec   Loss 8.0441   LearningRate 0.0634   Epoch: 4   Global Step: 16860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:59:20,675-Speed 3393.62 samples/sec   Loss 8.1265   LearningRate 0.0634   Epoch: 4   Global Step: 16870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:59:23,729-Speed 3353.79 samples/sec   Loss 8.1095   LearningRate 0.0633   Epoch: 4   Global Step: 16880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:59:26,776-Speed 3360.55 samples/sec   Loss 8.1807   LearningRate 0.0633   Epoch: 4   Global Step: 16890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:59:29,838-Speed 3344.97 samples/sec   Loss 7.9945   LearningRate 0.0633   Epoch: 4   Global Step: 16900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:59:32,870-Speed 3378.71 samples/sec   Loss 8.1217   LearningRate 0.0633   Epoch: 4   Global Step: 16910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:59:35,911-Speed 3368.16 samples/sec   Loss 8.1691   LearningRate 0.0633   Epoch: 4   Global Step: 16920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:59:39,001-Speed 3315.09 samples/sec   Loss 8.1278   LearningRate 0.0632   Epoch: 4   Global Step: 16930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:59:42,038-Speed 3372.15 samples/sec   Loss 8.2000   LearningRate 0.0632   Epoch: 4   Global Step: 16940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:59:45,068-Speed 3380.76 samples/sec   Loss 8.0405   LearningRate 0.0632   Epoch: 4   Global Step: 16950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:59:48,109-Speed 3368.11 samples/sec   Loss 8.0936   LearningRate 0.0632   Epoch: 4   Global Step: 16960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 13:59:51,150-Speed 3368.02 samples/sec   Loss 8.1383   LearningRate 0.0632   Epoch: 4   Global Step: 16970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:59:54,274-Speed 3279.28 samples/sec   Loss 8.0590   LearningRate 0.0632   Epoch: 4   Global Step: 16980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 13:59:57,343-Speed 3337.53 samples/sec   Loss 8.0978   LearningRate 0.0631   Epoch: 4   Global Step: 16990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:00,391-Speed 3360.07 samples/sec   Loss 8.1054   LearningRate 0.0631   Epoch: 4   Global Step: 17000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:03,437-Speed 3362.69 samples/sec   Loss 8.0847   LearningRate 0.0631   Epoch: 4   Global Step: 17010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:06,476-Speed 3370.87 samples/sec   Loss 8.0869   LearningRate 0.0631   Epoch: 4   Global Step: 17020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:09,508-Speed 3377.81 samples/sec   Loss 8.2343   LearningRate 0.0631   Epoch: 4   Global Step: 17030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:12,536-Speed 3382.17 samples/sec   Loss 8.3515   LearningRate 0.0630   Epoch: 4   Global Step: 17040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:15,591-Speed 3352.68 samples/sec   Loss 8.2304   LearningRate 0.0630   Epoch: 4   Global Step: 17050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:18,633-Speed 3367.01 samples/sec   Loss 8.3326   LearningRate 0.0630   Epoch: 4   Global Step: 17060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:21,663-Speed 3381.02 samples/sec   Loss 8.3819   LearningRate 0.0630   Epoch: 4   Global Step: 17070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:24,701-Speed 3370.84 samples/sec   Loss 8.3109   LearningRate 0.0630   Epoch: 4   Global Step: 17080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:27,732-Speed 3379.64 samples/sec   Loss 8.2601   LearningRate 0.0629   Epoch: 4   Global Step: 17090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:30,771-Speed 3371.23 samples/sec   Loss 8.3982   LearningRate 0.0629   Epoch: 4   Global Step: 17100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:33,797-Speed 3384.69 samples/sec   Loss 8.2549   LearningRate 0.0629   Epoch: 4   Global Step: 17110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:36,820-Speed 3387.23 samples/sec   Loss 8.1882   LearningRate 0.0629   Epoch: 4   Global Step: 17120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:39,854-Speed 3376.00 samples/sec   Loss 8.3272   LearningRate 0.0629   Epoch: 4   Global Step: 17130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:42,886-Speed 3378.42 samples/sec   Loss 8.3164   LearningRate 0.0628   Epoch: 4   Global Step: 17140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:45,914-Speed 3382.39 samples/sec   Loss 8.2899   LearningRate 0.0628   Epoch: 4   Global Step: 17150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:48,943-Speed 3381.93 samples/sec   Loss 8.3609   LearningRate 0.0628   Epoch: 4   Global Step: 17160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:51,959-Speed 3396.54 samples/sec   Loss 8.3978   LearningRate 0.0628   Epoch: 4   Global Step: 17170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:54,987-Speed 3383.10 samples/sec   Loss 8.3889   LearningRate 0.0628   Epoch: 4   Global Step: 17180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:00:58,030-Speed 3365.61 samples/sec   Loss 8.4204   LearningRate 0.0627   Epoch: 4   Global Step: 17190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:01,061-Speed 3379.51 samples/sec   Loss 8.3119   LearningRate 0.0627   Epoch: 4   Global Step: 17200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:04,098-Speed 3372.59 samples/sec   Loss 8.3923   LearningRate 0.0627   Epoch: 4   Global Step: 17210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:07,125-Speed 3384.21 samples/sec   Loss 8.4074   LearningRate 0.0627   Epoch: 4   Global Step: 17220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:10,155-Speed 3379.60 samples/sec   Loss 8.4874   LearningRate 0.0627   Epoch: 4   Global Step: 17230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:13,197-Speed 3367.13 samples/sec   Loss 8.3027   LearningRate 0.0627   Epoch: 4   Global Step: 17240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:16,227-Speed 3381.04 samples/sec   Loss 8.3948   LearningRate 0.0626   Epoch: 4   Global Step: 17250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:19,293-Speed 3340.09 samples/sec   Loss 8.5278   LearningRate 0.0626   Epoch: 4   Global Step: 17260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:22,342-Speed 3359.42 samples/sec   Loss 8.4673   LearningRate 0.0626   Epoch: 4   Global Step: 17270   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-26 14:01:25,367-Speed 3385.78 samples/sec   Loss 8.5061   LearningRate 0.0626   Epoch: 4   Global Step: 17280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:28,407-Speed 3369.79 samples/sec   Loss 8.2808   LearningRate 0.0626   Epoch: 4   Global Step: 17290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:31,452-Speed 3363.54 samples/sec   Loss 8.5185   LearningRate 0.0625   Epoch: 4   Global Step: 17300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:34,492-Speed 3369.08 samples/sec   Loss 8.5397   LearningRate 0.0625   Epoch: 4   Global Step: 17310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:37,518-Speed 3385.37 samples/sec   Loss 8.5781   LearningRate 0.0625   Epoch: 4   Global Step: 17320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:40,549-Speed 3379.24 samples/sec   Loss 8.3326   LearningRate 0.0625   Epoch: 4   Global Step: 17330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:43,584-Speed 3375.18 samples/sec   Loss 8.4487   LearningRate 0.0625   Epoch: 4   Global Step: 17340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:46,609-Speed 3385.50 samples/sec   Loss 8.5861   LearningRate 0.0624   Epoch: 4   Global Step: 17350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:49,633-Speed 3386.56 samples/sec   Loss 8.3841   LearningRate 0.0624   Epoch: 4   Global Step: 17360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:52,676-Speed 3366.66 samples/sec   Loss 8.3804   LearningRate 0.0624   Epoch: 4   Global Step: 17370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:55,692-Speed 3396.10 samples/sec   Loss 8.4565   LearningRate 0.0624   Epoch: 4   Global Step: 17380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:01:58,720-Speed 3382.13 samples/sec   Loss 8.6211   LearningRate 0.0624   Epoch: 4   Global Step: 17390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:01,763-Speed 3365.67 samples/sec   Loss 8.4869   LearningRate 0.0623   Epoch: 4   Global Step: 17400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:04,789-Speed 3385.22 samples/sec   Loss 8.4863   LearningRate 0.0623   Epoch: 4   Global Step: 17410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:07,816-Speed 3384.48 samples/sec   Loss 8.4982   LearningRate 0.0623   Epoch: 4   Global Step: 17420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:10,844-Speed 3382.02 samples/sec   Loss 8.6059   LearningRate 0.0623   Epoch: 4   Global Step: 17430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:13,876-Speed 3378.41 samples/sec   Loss 8.6075   LearningRate 0.0623   Epoch: 4   Global Step: 17440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:16,909-Speed 3376.39 samples/sec   Loss 8.5575   LearningRate 0.0623   Epoch: 4   Global Step: 17450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:19,939-Speed 3380.50 samples/sec   Loss 8.4754   LearningRate 0.0622   Epoch: 4   Global Step: 17460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:22,977-Speed 3371.14 samples/sec   Loss 8.5890   LearningRate 0.0622   Epoch: 4   Global Step: 17470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:26,004-Speed 3384.77 samples/sec   Loss 8.5383   LearningRate 0.0622   Epoch: 4   Global Step: 17480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:29,033-Speed 3381.32 samples/sec   Loss 8.6882   LearningRate 0.0622   Epoch: 4   Global Step: 17490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:32,067-Speed 3375.48 samples/sec   Loss 8.6474   LearningRate 0.0622   Epoch: 4   Global Step: 17500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:35,100-Speed 3377.93 samples/sec   Loss 8.5449   LearningRate 0.0621   Epoch: 4   Global Step: 17510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:38,134-Speed 3376.00 samples/sec   Loss 8.5598   LearningRate 0.0621   Epoch: 4   Global Step: 17520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:41,172-Speed 3370.87 samples/sec   Loss 8.5418   LearningRate 0.0621   Epoch: 4   Global Step: 17530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:44,224-Speed 3356.01 samples/sec   Loss 8.6188   LearningRate 0.0621   Epoch: 4   Global Step: 17540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:47,253-Speed 3381.51 samples/sec   Loss 8.6252   LearningRate 0.0621   Epoch: 4   Global Step: 17550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:50,284-Speed 3379.96 samples/sec   Loss 8.5027   LearningRate 0.0620   Epoch: 4   Global Step: 17560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:53,318-Speed 3375.82 samples/sec   Loss 8.5360   LearningRate 0.0620   Epoch: 4   Global Step: 17570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:56,605-Speed 3116.07 samples/sec   Loss 8.6590   LearningRate 0.0620   Epoch: 4   Global Step: 17580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:02:59,637-Speed 3377.69 samples/sec   Loss 8.7207   LearningRate 0.0620   Epoch: 4   Global Step: 17590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:02,682-Speed 3363.64 samples/sec   Loss 8.7545   LearningRate 0.0620   Epoch: 4   Global Step: 17600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:05,743-Speed 3346.13 samples/sec   Loss 8.6106   LearningRate 0.0619   Epoch: 4   Global Step: 17610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:08,774-Speed 3379.93 samples/sec   Loss 8.6431   LearningRate 0.0619   Epoch: 4   Global Step: 17620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:11,805-Speed 3379.34 samples/sec   Loss 8.7249   LearningRate 0.0619   Epoch: 4   Global Step: 17630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:14,833-Speed 3381.71 samples/sec   Loss 8.7581   LearningRate 0.0619   Epoch: 4   Global Step: 17640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:17,872-Speed 3370.36 samples/sec   Loss 8.7438   LearningRate 0.0619   Epoch: 4   Global Step: 17650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:20,909-Speed 3373.46 samples/sec   Loss 8.7017   LearningRate 0.0619   Epoch: 4   Global Step: 17660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:23,969-Speed 3346.86 samples/sec   Loss 8.7553   LearningRate 0.0618   Epoch: 4   Global Step: 17670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:27,004-Speed 3374.72 samples/sec   Loss 8.7925   LearningRate 0.0618   Epoch: 4   Global Step: 17680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:30,052-Speed 3360.15 samples/sec   Loss 8.8156   LearningRate 0.0618   Epoch: 4   Global Step: 17690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:33,094-Speed 3367.59 samples/sec   Loss 8.7572   LearningRate 0.0618   Epoch: 4   Global Step: 17700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:36,136-Speed 3367.74 samples/sec   Loss 8.7656   LearningRate 0.0618   Epoch: 4   Global Step: 17710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:39,188-Speed 3355.76 samples/sec   Loss 8.7691   LearningRate 0.0617   Epoch: 4   Global Step: 17720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:42,231-Speed 3365.84 samples/sec   Loss 8.6725   LearningRate 0.0617   Epoch: 4   Global Step: 17730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:45,272-Speed 3368.38 samples/sec   Loss 8.6092   LearningRate 0.0617   Epoch: 4   Global Step: 17740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:48,315-Speed 3365.38 samples/sec   Loss 8.6428   LearningRate 0.0617   Epoch: 4   Global Step: 17750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:51,359-Speed 3365.05 samples/sec   Loss 8.6594   LearningRate 0.0617   Epoch: 4   Global Step: 17760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:54,395-Speed 3373.61 samples/sec   Loss 8.6916   LearningRate 0.0616   Epoch: 4   Global Step: 17770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:03:57,428-Speed 3377.42 samples/sec   Loss 8.6999   LearningRate 0.0616   Epoch: 4   Global Step: 17780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:04:00,463-Speed 3374.94 samples/sec   Loss 8.6125   LearningRate 0.0616   Epoch: 4   Global Step: 17790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:04:03,501-Speed 3371.01 samples/sec   Loss 8.6572   LearningRate 0.0616   Epoch: 4   Global Step: 17800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:04:06,536-Speed 3375.42 samples/sec   Loss 8.7061   LearningRate 0.0616   Epoch: 4   Global Step: 17810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:04:09,574-Speed 3370.73 samples/sec   Loss 8.8310   LearningRate 0.0615   Epoch: 4   Global Step: 17820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:04:12,610-Speed 3373.77 samples/sec   Loss 8.6135   LearningRate 0.0615   Epoch: 4   Global Step: 17830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:04:15,662-Speed 3356.49 samples/sec   Loss 8.6365   LearningRate 0.0615   Epoch: 4   Global Step: 17840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:04:18,705-Speed 3366.18 samples/sec   Loss 8.6165   LearningRate 0.0615   Epoch: 4   Global Step: 17850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:04:21,743-Speed 3370.87 samples/sec   Loss 8.7286   LearningRate 0.0615   Epoch: 4   Global Step: 17860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:04:24,788-Speed 3363.38 samples/sec   Loss 8.8694   LearningRate 0.0615   Epoch: 4   Global Step: 17870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:04:27,825-Speed 3372.20 samples/sec   Loss 8.8584   LearningRate 0.0614   Epoch: 4   Global Step: 17880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:04:30,869-Speed 3365.18 samples/sec   Loss 8.7997   LearningRate 0.0614   Epoch: 4   Global Step: 17890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:04:33,907-Speed 3372.06 samples/sec   Loss 8.7277   LearningRate 0.0614   Epoch: 4   Global Step: 17900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:04:36,941-Speed 3375.12 samples/sec   Loss 8.5748   LearningRate 0.0614   Epoch: 4   Global Step: 17910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:04:40,000-Speed 3348.94 samples/sec   Loss 8.7197   LearningRate 0.0614   Epoch: 4   Global Step: 17920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:04:43,045-Speed 3363.18 samples/sec   Loss 8.7239   LearningRate 0.0613   Epoch: 4   Global Step: 17930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:04:46,092-Speed 3360.76 samples/sec   Loss 8.8321   LearningRate 0.0613   Epoch: 4   Global Step: 17940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:04:49,181-Speed 3316.67 samples/sec   Loss 8.7148   LearningRate 0.0613   Epoch: 4   Global Step: 17950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:04:52,234-Speed 3355.45 samples/sec   Loss 8.8614   LearningRate 0.0613   Epoch: 4   Global Step: 17960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:04:55,274-Speed 3368.75 samples/sec   Loss 8.7759   LearningRate 0.0613   Epoch: 4   Global Step: 17970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:04:58,324-Speed 3358.46 samples/sec   Loss 8.7080   LearningRate 0.0612   Epoch: 4   Global Step: 17980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:05:01,363-Speed 3370.50 samples/sec   Loss 8.7993   LearningRate 0.0612   Epoch: 4   Global Step: 17990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:05:04,398-Speed 3374.87 samples/sec   Loss 8.8559   LearningRate 0.0612   Epoch: 4   Global Step: 18000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:05:48,039-[lfw][18000]XNorm: 22.333061
Training: 2022-04-26 14:05:48,039-[lfw][18000]Accuracy-Flip: 0.99567+-0.00291
Training: 2022-04-26 14:05:48,040-[lfw][18000]Accuracy-Highest: 0.99650
Training: 2022-04-26 14:06:38,807-[cfp_fp][18000]XNorm: 21.237589
Training: 2022-04-26 14:06:38,807-[cfp_fp][18000]Accuracy-Flip: 0.97914+-0.00709
Training: 2022-04-26 14:06:38,807-[cfp_fp][18000]Accuracy-Highest: 0.97914
Training: 2022-04-26 14:07:22,508-[agedb_30][18000]XNorm: 22.455522
Training: 2022-04-26 14:07:22,508-[agedb_30][18000]Accuracy-Flip: 0.96633+-0.00963
Training: 2022-04-26 14:07:22,509-[agedb_30][18000]Accuracy-Highest: 0.96633
Training: 2022-04-26 14:07:25,557-Speed 72.54 samples/sec   Loss 8.8625   LearningRate 0.0612   Epoch: 4   Global Step: 18010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:07:28,602-Speed 3363.85 samples/sec   Loss 8.7561   LearningRate 0.0612   Epoch: 4   Global Step: 18020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:07:31,658-Speed 3350.95 samples/sec   Loss 8.8404   LearningRate 0.0611   Epoch: 4   Global Step: 18030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:07:34,682-Speed 3387.54 samples/sec   Loss 8.8118   LearningRate 0.0611   Epoch: 4   Global Step: 18040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:07:37,731-Speed 3358.72 samples/sec   Loss 8.8197   LearningRate 0.0611   Epoch: 4   Global Step: 18050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:07:40,788-Speed 3350.79 samples/sec   Loss 8.8055   LearningRate 0.0611   Epoch: 4   Global Step: 18060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:07:43,841-Speed 3354.71 samples/sec   Loss 8.8156   LearningRate 0.0611   Epoch: 4   Global Step: 18070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:07:46,890-Speed 3359.15 samples/sec   Loss 8.8254   LearningRate 0.0611   Epoch: 4   Global Step: 18080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:07:49,942-Speed 3356.24 samples/sec   Loss 8.7903   LearningRate 0.0610   Epoch: 4   Global Step: 18090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:07:52,994-Speed 3355.47 samples/sec   Loss 8.7474   LearningRate 0.0610   Epoch: 4   Global Step: 18100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:07:56,041-Speed 3361.99 samples/sec   Loss 8.8232   LearningRate 0.0610   Epoch: 4   Global Step: 18110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:07:59,081-Speed 3369.84 samples/sec   Loss 8.6893   LearningRate 0.0610   Epoch: 4   Global Step: 18120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:08:02,135-Speed 3353.27 samples/sec   Loss 8.8953   LearningRate 0.0610   Epoch: 4   Global Step: 18130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:08:05,179-Speed 3365.09 samples/sec   Loss 8.6735   LearningRate 0.0609   Epoch: 4   Global Step: 18140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:08:08,224-Speed 3362.81 samples/sec   Loss 8.7380   LearningRate 0.0609   Epoch: 4   Global Step: 18150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:08:11,278-Speed 3354.31 samples/sec   Loss 8.7879   LearningRate 0.0609   Epoch: 4   Global Step: 18160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:08:14,350-Speed 3334.37 samples/sec   Loss 8.8544   LearningRate 0.0609   Epoch: 4   Global Step: 18170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:08:17,390-Speed 3368.62 samples/sec   Loss 8.7732   LearningRate 0.0609   Epoch: 4   Global Step: 18180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:08:20,428-Speed 3371.37 samples/sec   Loss 8.7263   LearningRate 0.0608   Epoch: 4   Global Step: 18190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:08:23,478-Speed 3358.48 samples/sec   Loss 8.8207   LearningRate 0.0608   Epoch: 4   Global Step: 18200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:08:26,515-Speed 3373.26 samples/sec   Loss 8.8140   LearningRate 0.0608   Epoch: 4   Global Step: 18210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:08:29,547-Speed 3377.99 samples/sec   Loss 8.7988   LearningRate 0.0608   Epoch: 4   Global Step: 18220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:08:32,580-Speed 3376.46 samples/sec   Loss 8.8033   LearningRate 0.0608   Epoch: 4   Global Step: 18230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:08:35,610-Speed 3380.55 samples/sec   Loss 8.8065   LearningRate 0.0608   Epoch: 4   Global Step: 18240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:08:38,638-Speed 3382.23 samples/sec   Loss 8.8530   LearningRate 0.0607   Epoch: 4   Global Step: 18250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:08:41,714-Speed 3329.42 samples/sec   Loss 8.8800   LearningRate 0.0607   Epoch: 4   Global Step: 18260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:08:44,739-Speed 3386.34 samples/sec   Loss 8.6533   LearningRate 0.0607   Epoch: 4   Global Step: 18270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:08:47,789-Speed 3358.02 samples/sec   Loss 8.7280   LearningRate 0.0607   Epoch: 4   Global Step: 18280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:08:50,811-Speed 3389.45 samples/sec   Loss 8.9005   LearningRate 0.0607   Epoch: 4   Global Step: 18290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:08:53,847-Speed 3374.48 samples/sec   Loss 8.8272   LearningRate 0.0606   Epoch: 4   Global Step: 18300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:08:56,892-Speed 3363.26 samples/sec   Loss 8.9011   LearningRate 0.0606   Epoch: 4   Global Step: 18310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:08:59,955-Speed 3343.98 samples/sec   Loss 8.7524   LearningRate 0.0606   Epoch: 4   Global Step: 18320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:09:03,006-Speed 3357.10 samples/sec   Loss 8.9544   LearningRate 0.0606   Epoch: 4   Global Step: 18330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:09:06,051-Speed 3363.61 samples/sec   Loss 8.9124   LearningRate 0.0606   Epoch: 4   Global Step: 18340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:09:09,096-Speed 3363.96 samples/sec   Loss 8.9160   LearningRate 0.0605   Epoch: 4   Global Step: 18350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:09:12,156-Speed 3346.89 samples/sec   Loss 8.9330   LearningRate 0.0605   Epoch: 4   Global Step: 18360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:09:15,186-Speed 3380.47 samples/sec   Loss 8.8171   LearningRate 0.0605   Epoch: 4   Global Step: 18370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:09:18,243-Speed 3351.40 samples/sec   Loss 8.9213   LearningRate 0.0605   Epoch: 4   Global Step: 18380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:09:21,280-Speed 3372.30 samples/sec   Loss 8.8528   LearningRate 0.0605   Epoch: 4   Global Step: 18390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:09:24,312-Speed 3377.43 samples/sec   Loss 8.8483   LearningRate 0.0605   Epoch: 4   Global Step: 18400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:09:27,353-Speed 3369.29 samples/sec   Loss 8.9332   LearningRate 0.0604   Epoch: 4   Global Step: 18410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:09:30,382-Speed 3380.73 samples/sec   Loss 8.9277   LearningRate 0.0604   Epoch: 4   Global Step: 18420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:09:33,430-Speed 3361.13 samples/sec   Loss 8.8959   LearningRate 0.0604   Epoch: 4   Global Step: 18430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:09:36,494-Speed 3341.93 samples/sec   Loss 8.7941   LearningRate 0.0604   Epoch: 4   Global Step: 18440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:09:39,506-Speed 3401.41 samples/sec   Loss 8.9324   LearningRate 0.0604   Epoch: 4   Global Step: 18450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:09:42,533-Speed 3383.28 samples/sec   Loss 8.7755   LearningRate 0.0603   Epoch: 4   Global Step: 18460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:09:45,573-Speed 3369.60 samples/sec   Loss 8.8290   LearningRate 0.0603   Epoch: 4   Global Step: 18470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:09:48,624-Speed 3356.35 samples/sec   Loss 8.8547   LearningRate 0.0603   Epoch: 4   Global Step: 18480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:09:51,675-Speed 3357.21 samples/sec   Loss 9.0217   LearningRate 0.0603   Epoch: 4   Global Step: 18490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:09:54,723-Speed 3360.52 samples/sec   Loss 8.9883   LearningRate 0.0603   Epoch: 4   Global Step: 18500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:09:57,769-Speed 3362.84 samples/sec   Loss 8.8699   LearningRate 0.0602   Epoch: 4   Global Step: 18510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:10:00,825-Speed 3351.81 samples/sec   Loss 8.8796   LearningRate 0.0602   Epoch: 4   Global Step: 18520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:10:03,862-Speed 3371.97 samples/sec   Loss 8.9364   LearningRate 0.0602   Epoch: 4   Global Step: 18530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:10:06,885-Speed 3387.98 samples/sec   Loss 8.7665   LearningRate 0.0602   Epoch: 4   Global Step: 18540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:10:09,911-Speed 3385.70 samples/sec   Loss 8.8841   LearningRate 0.0602   Epoch: 4   Global Step: 18550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:10:12,950-Speed 3370.00 samples/sec   Loss 8.9422   LearningRate 0.0602   Epoch: 4   Global Step: 18560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:10:15,993-Speed 3365.49 samples/sec   Loss 8.9035   LearningRate 0.0601   Epoch: 4   Global Step: 18570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:10:19,035-Speed 3366.82 samples/sec   Loss 8.8952   LearningRate 0.0601   Epoch: 4   Global Step: 18580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:10:22,086-Speed 3357.10 samples/sec   Loss 8.9793   LearningRate 0.0601   Epoch: 4   Global Step: 18590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:10:25,131-Speed 3363.79 samples/sec   Loss 8.7668   LearningRate 0.0601   Epoch: 4   Global Step: 18600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:10:28,162-Speed 3379.28 samples/sec   Loss 8.7859   LearningRate 0.0601   Epoch: 4   Global Step: 18610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:10:31,188-Speed 3385.02 samples/sec   Loss 8.8205   LearningRate 0.0600   Epoch: 4   Global Step: 18620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:10:34,243-Speed 3353.41 samples/sec   Loss 8.8107   LearningRate 0.0600   Epoch: 4   Global Step: 18630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:10:37,273-Speed 3379.82 samples/sec   Loss 8.8296   LearningRate 0.0600   Epoch: 4   Global Step: 18640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:10:40,306-Speed 3377.37 samples/sec   Loss 8.8346   LearningRate 0.0600   Epoch: 4   Global Step: 18650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:10:43,333-Speed 3383.63 samples/sec   Loss 8.7947   LearningRate 0.0600   Epoch: 4   Global Step: 18660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:10:46,359-Speed 3384.47 samples/sec   Loss 8.8293   LearningRate 0.0599   Epoch: 4   Global Step: 18670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:10:49,378-Speed 3393.07 samples/sec   Loss 8.8163   LearningRate 0.0599   Epoch: 4   Global Step: 18680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:10:52,486-Speed 3295.36 samples/sec   Loss 8.9018   LearningRate 0.0599   Epoch: 4   Global Step: 18690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:10:55,550-Speed 3343.57 samples/sec   Loss 8.8935   LearningRate 0.0599   Epoch: 4   Global Step: 18700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:10:58,594-Speed 3364.80 samples/sec   Loss 8.8670   LearningRate 0.0599   Epoch: 4   Global Step: 18710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:11:01,630-Speed 3374.09 samples/sec   Loss 8.6457   LearningRate 0.0599   Epoch: 4   Global Step: 18720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:11:04,700-Speed 3335.35 samples/sec   Loss 8.8425   LearningRate 0.0598   Epoch: 4   Global Step: 18730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:11:07,765-Speed 3341.73 samples/sec   Loss 8.8249   LearningRate 0.0598   Epoch: 4   Global Step: 18740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:11:10,810-Speed 3364.04 samples/sec   Loss 8.9132   LearningRate 0.0598   Epoch: 4   Global Step: 18750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:11:13,846-Speed 3374.18 samples/sec   Loss 8.9575   LearningRate 0.0598   Epoch: 4   Global Step: 18760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:11:16,884-Speed 3370.93 samples/sec   Loss 8.7856   LearningRate 0.0598   Epoch: 4   Global Step: 18770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:11:19,924-Speed 3369.27 samples/sec   Loss 8.9551   LearningRate 0.0597   Epoch: 4   Global Step: 18780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:11:22,956-Speed 3378.75 samples/sec   Loss 8.7013   LearningRate 0.0597   Epoch: 4   Global Step: 18790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:11:26,011-Speed 3352.61 samples/sec   Loss 8.8326   LearningRate 0.0597   Epoch: 4   Global Step: 18800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:11:29,054-Speed 3365.78 samples/sec   Loss 8.6049   LearningRate 0.0597   Epoch: 4   Global Step: 18810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:11:32,132-Speed 3327.19 samples/sec   Loss 8.7956   LearningRate 0.0597   Epoch: 4   Global Step: 18820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:11:35,168-Speed 3374.39 samples/sec   Loss 8.8527   LearningRate 0.0596   Epoch: 4   Global Step: 18830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:11:38,196-Speed 3382.78 samples/sec   Loss 8.6575   LearningRate 0.0596   Epoch: 4   Global Step: 18840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:11:41,232-Speed 3373.79 samples/sec   Loss 8.8722   LearningRate 0.0596   Epoch: 4   Global Step: 18850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:11:44,285-Speed 3354.21 samples/sec   Loss 8.9097   LearningRate 0.0596   Epoch: 4   Global Step: 18860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:11:47,318-Speed 3376.61 samples/sec   Loss 8.8485   LearningRate 0.0596   Epoch: 4   Global Step: 18870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:11:50,365-Speed 3361.39 samples/sec   Loss 8.7969   LearningRate 0.0596   Epoch: 4   Global Step: 18880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:11:53,408-Speed 3366.20 samples/sec   Loss 8.8658   LearningRate 0.0595   Epoch: 4   Global Step: 18890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:11:56,451-Speed 3366.68 samples/sec   Loss 8.9518   LearningRate 0.0595   Epoch: 4   Global Step: 18900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:11:59,485-Speed 3375.79 samples/sec   Loss 8.8739   LearningRate 0.0595   Epoch: 4   Global Step: 18910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:02,524-Speed 3370.04 samples/sec   Loss 8.6826   LearningRate 0.0595   Epoch: 4   Global Step: 18920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:05,591-Speed 3340.06 samples/sec   Loss 8.9360   LearningRate 0.0595   Epoch: 4   Global Step: 18930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:08,642-Speed 3356.85 samples/sec   Loss 8.9255   LearningRate 0.0594   Epoch: 4   Global Step: 18940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:11,695-Speed 3354.84 samples/sec   Loss 8.8605   LearningRate 0.0594   Epoch: 4   Global Step: 18950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:14,745-Speed 3357.94 samples/sec   Loss 8.8588   LearningRate 0.0594   Epoch: 4   Global Step: 18960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:17,792-Speed 3361.04 samples/sec   Loss 8.7329   LearningRate 0.0594   Epoch: 4   Global Step: 18970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:20,824-Speed 3379.08 samples/sec   Loss 8.7286   LearningRate 0.0594   Epoch: 4   Global Step: 18980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:23,848-Speed 3387.05 samples/sec   Loss 8.8021   LearningRate 0.0593   Epoch: 4   Global Step: 18990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:26,916-Speed 3338.48 samples/sec   Loss 8.9685   LearningRate 0.0593   Epoch: 4   Global Step: 19000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:29,969-Speed 3354.65 samples/sec   Loss 8.8514   LearningRate 0.0593   Epoch: 4   Global Step: 19010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:33,023-Speed 3353.97 samples/sec   Loss 8.9035   LearningRate 0.0593   Epoch: 4   Global Step: 19020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:36,105-Speed 3323.16 samples/sec   Loss 8.9443   LearningRate 0.0593   Epoch: 4   Global Step: 19030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:39,156-Speed 3357.17 samples/sec   Loss 8.9142   LearningRate 0.0593   Epoch: 4   Global Step: 19040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:42,207-Speed 3357.44 samples/sec   Loss 8.9986   LearningRate 0.0592   Epoch: 4   Global Step: 19050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:45,248-Speed 3368.26 samples/sec   Loss 8.6953   LearningRate 0.0592   Epoch: 4   Global Step: 19060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:48,307-Speed 3347.86 samples/sec   Loss 8.8379   LearningRate 0.0592   Epoch: 4   Global Step: 19070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:51,346-Speed 3370.81 samples/sec   Loss 8.8116   LearningRate 0.0592   Epoch: 4   Global Step: 19080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:54,379-Speed 3376.18 samples/sec   Loss 8.7431   LearningRate 0.0592   Epoch: 4   Global Step: 19090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:12:57,461-Speed 3323.91 samples/sec   Loss 8.7371   LearningRate 0.0591   Epoch: 4   Global Step: 19100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:13:00,518-Speed 3350.47 samples/sec   Loss 8.7563   LearningRate 0.0591   Epoch: 4   Global Step: 19110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:13:03,563-Speed 3363.31 samples/sec   Loss 8.6741   LearningRate 0.0591   Epoch: 4   Global Step: 19120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:13:06,606-Speed 3365.86 samples/sec   Loss 8.8619   LearningRate 0.0591   Epoch: 4   Global Step: 19130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:13:09,645-Speed 3371.07 samples/sec   Loss 8.8702   LearningRate 0.0591   Epoch: 4   Global Step: 19140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:13:12,687-Speed 3366.44 samples/sec   Loss 8.8408   LearningRate 0.0591   Epoch: 4   Global Step: 19150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:13:15,763-Speed 3329.96 samples/sec   Loss 8.9654   LearningRate 0.0590   Epoch: 4   Global Step: 19160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:13:18,832-Speed 3337.01 samples/sec   Loss 8.8339   LearningRate 0.0590   Epoch: 4   Global Step: 19170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:13:21,884-Speed 3356.70 samples/sec   Loss 8.8838   LearningRate 0.0590   Epoch: 4   Global Step: 19180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:13:24,929-Speed 3363.60 samples/sec   Loss 8.8616   LearningRate 0.0590   Epoch: 4   Global Step: 19190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:13:27,976-Speed 3362.25 samples/sec   Loss 8.8019   LearningRate 0.0590   Epoch: 4   Global Step: 19200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:13:31,026-Speed 3357.51 samples/sec   Loss 8.8453   LearningRate 0.0589   Epoch: 4   Global Step: 19210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:13:34,096-Speed 3336.17 samples/sec   Loss 8.8195   LearningRate 0.0589   Epoch: 4   Global Step: 19220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:13:37,128-Speed 3377.92 samples/sec   Loss 8.5778   LearningRate 0.0589   Epoch: 4   Global Step: 19230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:13:40,171-Speed 3366.41 samples/sec   Loss 8.7838   LearningRate 0.0589   Epoch: 4   Global Step: 19240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:13:43,242-Speed 3335.14 samples/sec   Loss 8.8594   LearningRate 0.0589   Epoch: 4   Global Step: 19250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:13:46,325-Speed 3321.88 samples/sec   Loss 8.8985   LearningRate 0.0588   Epoch: 4   Global Step: 19260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:13:49,379-Speed 3353.77 samples/sec   Loss 8.8723   LearningRate 0.0588   Epoch: 4   Global Step: 19270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:13:52,441-Speed 3345.55 samples/sec   Loss 8.7302   LearningRate 0.0588   Epoch: 4   Global Step: 19280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:13:55,509-Speed 3338.39 samples/sec   Loss 8.8555   LearningRate 0.0588   Epoch: 4   Global Step: 19290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:13:58,569-Speed 3347.86 samples/sec   Loss 8.7771   LearningRate 0.0588   Epoch: 4   Global Step: 19300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:14:01,591-Speed 3388.68 samples/sec   Loss 8.7418   LearningRate 0.0588   Epoch: 4   Global Step: 19310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:14:04,662-Speed 3335.44 samples/sec   Loss 8.8067   LearningRate 0.0587   Epoch: 4   Global Step: 19320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:14:07,722-Speed 3347.64 samples/sec   Loss 8.9007   LearningRate 0.0587   Epoch: 4   Global Step: 19330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:14:10,791-Speed 3336.92 samples/sec   Loss 8.8324   LearningRate 0.0587   Epoch: 4   Global Step: 19340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:14:13,861-Speed 3336.91 samples/sec   Loss 8.7620   LearningRate 0.0587   Epoch: 4   Global Step: 19350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:14:16,940-Speed 3326.90 samples/sec   Loss 8.7714   LearningRate 0.0587   Epoch: 4   Global Step: 19360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:14:20,011-Speed 3334.55 samples/sec   Loss 8.9485   LearningRate 0.0586   Epoch: 4   Global Step: 19370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:14:23,079-Speed 3338.89 samples/sec   Loss 8.7709   LearningRate 0.0586   Epoch: 4   Global Step: 19380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:14:26,134-Speed 3352.59 samples/sec   Loss 8.6925   LearningRate 0.0586   Epoch: 4   Global Step: 19390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:14:29,171-Speed 3372.26 samples/sec   Loss 8.7997   LearningRate 0.0586   Epoch: 4   Global Step: 19400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:14:32,198-Speed 3384.10 samples/sec   Loss 8.8261   LearningRate 0.0586   Epoch: 4   Global Step: 19410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:14:35,286-Speed 3317.01 samples/sec   Loss 8.8163   LearningRate 0.0585   Epoch: 4   Global Step: 19420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:14:38,349-Speed 3344.02 samples/sec   Loss 8.8855   LearningRate 0.0585   Epoch: 4   Global Step: 19430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:14:41,466-Speed 3286.26 samples/sec   Loss 8.7726   LearningRate 0.0585   Epoch: 4   Global Step: 19440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:14:44,525-Speed 3348.16 samples/sec   Loss 8.8109   LearningRate 0.0585   Epoch: 4   Global Step: 19450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:14:47,570-Speed 3363.97 samples/sec   Loss 8.7674   LearningRate 0.0585   Epoch: 4   Global Step: 19460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:14:50,629-Speed 3348.60 samples/sec   Loss 8.7500   LearningRate 0.0585   Epoch: 4   Global Step: 19470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:14:53,672-Speed 3365.83 samples/sec   Loss 8.8037   LearningRate 0.0584   Epoch: 4   Global Step: 19480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:14:56,727-Speed 3352.46 samples/sec   Loss 8.7814   LearningRate 0.0584   Epoch: 4   Global Step: 19490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:14:59,790-Speed 3343.55 samples/sec   Loss 8.8101   LearningRate 0.0584   Epoch: 4   Global Step: 19500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:15:02,856-Speed 3340.91 samples/sec   Loss 8.8963   LearningRate 0.0584   Epoch: 4   Global Step: 19510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:05,941-Speed 3320.57 samples/sec   Loss 8.8723   LearningRate 0.0584   Epoch: 4   Global Step: 19520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:09,013-Speed 3334.27 samples/sec   Loss 8.8398   LearningRate 0.0583   Epoch: 4   Global Step: 19530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:12,066-Speed 3354.50 samples/sec   Loss 8.7608   LearningRate 0.0583   Epoch: 4   Global Step: 19540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:15,110-Speed 3364.95 samples/sec   Loss 8.7937   LearningRate 0.0583   Epoch: 4   Global Step: 19550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:18,158-Speed 3359.99 samples/sec   Loss 8.8466   LearningRate 0.0583   Epoch: 4   Global Step: 19560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:21,217-Speed 3348.76 samples/sec   Loss 8.6559   LearningRate 0.0583   Epoch: 4   Global Step: 19570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:24,295-Speed 3327.98 samples/sec   Loss 8.8221   LearningRate 0.0583   Epoch: 4   Global Step: 19580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:27,365-Speed 3336.39 samples/sec   Loss 8.8020   LearningRate 0.0582   Epoch: 4   Global Step: 19590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:30,468-Speed 3300.73 samples/sec   Loss 8.7998   LearningRate 0.0582   Epoch: 4   Global Step: 19600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:33,530-Speed 3345.19 samples/sec   Loss 8.7897   LearningRate 0.0582   Epoch: 4   Global Step: 19610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:36,564-Speed 3375.88 samples/sec   Loss 8.6014   LearningRate 0.0582   Epoch: 4   Global Step: 19620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:39,618-Speed 3353.30 samples/sec   Loss 8.7384   LearningRate 0.0582   Epoch: 4   Global Step: 19630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:42,692-Speed 3331.85 samples/sec   Loss 8.7494   LearningRate 0.0581   Epoch: 4   Global Step: 19640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:45,758-Speed 3341.36 samples/sec   Loss 8.6913   LearningRate 0.0581   Epoch: 4   Global Step: 19650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:48,821-Speed 3343.26 samples/sec   Loss 8.7549   LearningRate 0.0581   Epoch: 4   Global Step: 19660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:51,876-Speed 3353.64 samples/sec   Loss 8.6380   LearningRate 0.0581   Epoch: 4   Global Step: 19670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:54,983-Speed 3295.72 samples/sec   Loss 8.7960   LearningRate 0.0581   Epoch: 4   Global Step: 19680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:15:58,054-Speed 3335.47 samples/sec   Loss 8.8855   LearningRate 0.0581   Epoch: 4   Global Step: 19690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:01,113-Speed 3347.75 samples/sec   Loss 8.6670   LearningRate 0.0580   Epoch: 4   Global Step: 19700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:04,173-Speed 3348.19 samples/sec   Loss 8.6268   LearningRate 0.0580   Epoch: 4   Global Step: 19710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:07,224-Speed 3357.55 samples/sec   Loss 8.6048   LearningRate 0.0580   Epoch: 4   Global Step: 19720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:10,289-Speed 3341.00 samples/sec   Loss 8.7082   LearningRate 0.0580   Epoch: 4   Global Step: 19730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:13,353-Speed 3342.50 samples/sec   Loss 8.6880   LearningRate 0.0580   Epoch: 4   Global Step: 19740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:16,429-Speed 3330.45 samples/sec   Loss 8.6522   LearningRate 0.0579   Epoch: 4   Global Step: 19750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:19,497-Speed 3338.65 samples/sec   Loss 8.8052   LearningRate 0.0579   Epoch: 4   Global Step: 19760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:22,558-Speed 3345.95 samples/sec   Loss 8.7675   LearningRate 0.0579   Epoch: 4   Global Step: 19770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:25,612-Speed 3353.26 samples/sec   Loss 8.6418   LearningRate 0.0579   Epoch: 4   Global Step: 19780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:28,711-Speed 3305.29 samples/sec   Loss 8.6576   LearningRate 0.0579   Epoch: 4   Global Step: 19790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:31,794-Speed 3322.12 samples/sec   Loss 8.7344   LearningRate 0.0578   Epoch: 4   Global Step: 19800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:34,868-Speed 3332.73 samples/sec   Loss 8.7724   LearningRate 0.0578   Epoch: 4   Global Step: 19810   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-26 14:16:37,922-Speed 3353.19 samples/sec   Loss 8.7469   LearningRate 0.0578   Epoch: 4   Global Step: 19820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:41,024-Speed 3302.34 samples/sec   Loss 8.6314   LearningRate 0.0578   Epoch: 4   Global Step: 19830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:44,076-Speed 3355.90 samples/sec   Loss 8.8275   LearningRate 0.0578   Epoch: 4   Global Step: 19840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:47,141-Speed 3341.51 samples/sec   Loss 8.7822   LearningRate 0.0578   Epoch: 4   Global Step: 19850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:50,209-Speed 3338.53 samples/sec   Loss 8.7476   LearningRate 0.0577   Epoch: 4   Global Step: 19860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:53,260-Speed 3357.60 samples/sec   Loss 8.7648   LearningRate 0.0577   Epoch: 4   Global Step: 19870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:56,343-Speed 3321.80 samples/sec   Loss 8.5626   LearningRate 0.0577   Epoch: 4   Global Step: 19880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:16:59,410-Speed 3339.37 samples/sec   Loss 8.8333   LearningRate 0.0577   Epoch: 4   Global Step: 19890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:17:02,483-Speed 3333.04 samples/sec   Loss 8.5815   LearningRate 0.0577   Epoch: 4   Global Step: 19900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:17:05,547-Speed 3343.20 samples/sec   Loss 8.6258   LearningRate 0.0576   Epoch: 4   Global Step: 19910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:17:08,562-Speed 3397.67 samples/sec   Loss 8.7920   LearningRate 0.0576   Epoch: 4   Global Step: 19920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:17:11,634-Speed 3333.81 samples/sec   Loss 8.7110   LearningRate 0.0576   Epoch: 4   Global Step: 19930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:17:14,704-Speed 3335.68 samples/sec   Loss 8.7395   LearningRate 0.0576   Epoch: 4   Global Step: 19940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:17:17,782-Speed 3327.76 samples/sec   Loss 8.7837   LearningRate 0.0576   Epoch: 4   Global Step: 19950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:17:20,845-Speed 3343.73 samples/sec   Loss 8.7560   LearningRate 0.0576   Epoch: 4   Global Step: 19960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:17:23,921-Speed 3330.44 samples/sec   Loss 8.6348   LearningRate 0.0575   Epoch: 4   Global Step: 19970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:17:26,986-Speed 3341.45 samples/sec   Loss 8.7357   LearningRate 0.0575   Epoch: 4   Global Step: 19980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:17:30,062-Speed 3328.95 samples/sec   Loss 8.4846   LearningRate 0.0575   Epoch: 4   Global Step: 19990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:17:33,127-Speed 3341.87 samples/sec   Loss 8.7525   LearningRate 0.0575   Epoch: 4   Global Step: 20000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:18:16,589-[lfw][20000]XNorm: 23.047786
Training: 2022-04-26 14:18:16,590-[lfw][20000]Accuracy-Flip: 0.99517+-0.00311
Training: 2022-04-26 14:18:16,590-[lfw][20000]Accuracy-Highest: 0.99650
Training: 2022-04-26 14:19:07,156-[cfp_fp][20000]XNorm: 21.128342
Training: 2022-04-26 14:19:07,157-[cfp_fp][20000]Accuracy-Flip: 0.97657+-0.00703
Training: 2022-04-26 14:19:07,158-[cfp_fp][20000]Accuracy-Highest: 0.97914
Training: 2022-04-26 14:19:50,490-[agedb_30][20000]XNorm: 23.148665
Training: 2022-04-26 14:19:50,491-[agedb_30][20000]Accuracy-Flip: 0.96200+-0.00627
Training: 2022-04-26 14:19:50,491-[agedb_30][20000]Accuracy-Highest: 0.96633
Training: 2022-04-26 14:19:53,555-Speed 72.92 samples/sec   Loss 8.6895   LearningRate 0.0575   Epoch: 4   Global Step: 20010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-26 14:19:56,597-Speed 3367.15 samples/sec   Loss 8.6136   LearningRate 0.0574   Epoch: 4   Global Step: 20020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:19:59,635-Speed 3371.93 samples/sec   Loss 8.6189   LearningRate 0.0574   Epoch: 4   Global Step: 20030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:20:02,756-Speed 3281.06 samples/sec   Loss 8.7706   LearningRate 0.0574   Epoch: 4   Global Step: 20040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:20:05,840-Speed 3321.34 samples/sec   Loss 8.6723   LearningRate 0.0574   Epoch: 4   Global Step: 20050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:20:08,922-Speed 3323.05 samples/sec   Loss 8.7603   LearningRate 0.0574   Epoch: 4   Global Step: 20060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:20:11,985-Speed 3344.04 samples/sec   Loss 8.5574   LearningRate 0.0574   Epoch: 4   Global Step: 20070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:20:15,080-Speed 3309.44 samples/sec   Loss 8.7027   LearningRate 0.0573   Epoch: 4   Global Step: 20080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:20:18,160-Speed 3325.65 samples/sec   Loss 8.7682   LearningRate 0.0573   Epoch: 4   Global Step: 20090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:20:21,236-Speed 3329.72 samples/sec   Loss 8.6947   LearningRate 0.0573   Epoch: 4   Global Step: 20100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:20:24,304-Speed 3338.97 samples/sec   Loss 8.8001   LearningRate 0.0573   Epoch: 4   Global Step: 20110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:20:27,344-Speed 3369.24 samples/sec   Loss 8.7285   LearningRate 0.0573   Epoch: 4   Global Step: 20120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:20:30,425-Speed 3324.44 samples/sec   Loss 8.7596   LearningRate 0.0572   Epoch: 4   Global Step: 20130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:20:33,518-Speed 3310.80 samples/sec   Loss 8.7105   LearningRate 0.0572   Epoch: 4   Global Step: 20140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-26 14:20:36,596-Speed 3328.58 samples/sec   Loss 8.5496   LearningRate 0.0572   Epoch: 4   Global Step: 20150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:20:39,677-Speed 3323.62 samples/sec   Loss 8.6451   LearningRate 0.0572   Epoch: 4   Global Step: 20160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:20:42,829-Speed 3250.13 samples/sec   Loss 8.5412   LearningRate 0.0572   Epoch: 4   Global Step: 20170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:20:45,888-Speed 3348.48 samples/sec   Loss 8.7179   LearningRate 0.0572   Epoch: 4   Global Step: 20180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:20:48,964-Speed 3329.54 samples/sec   Loss 8.6430   LearningRate 0.0571   Epoch: 4   Global Step: 20190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:20:52,022-Speed 3349.48 samples/sec   Loss 8.4866   LearningRate 0.0571   Epoch: 4   Global Step: 20200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:20:55,119-Speed 3306.33 samples/sec   Loss 8.7554   LearningRate 0.0571   Epoch: 4   Global Step: 20210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:20:58,194-Speed 3331.17 samples/sec   Loss 8.6469   LearningRate 0.0571   Epoch: 4   Global Step: 20220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:01,265-Speed 3335.25 samples/sec   Loss 8.8055   LearningRate 0.0571   Epoch: 4   Global Step: 20230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:04,329-Speed 3343.30 samples/sec   Loss 8.6584   LearningRate 0.0570   Epoch: 4   Global Step: 20240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:07,376-Speed 3361.15 samples/sec   Loss 8.6287   LearningRate 0.0570   Epoch: 4   Global Step: 20250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:10,435-Speed 3347.75 samples/sec   Loss 8.6495   LearningRate 0.0570   Epoch: 4   Global Step: 20260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:13,506-Speed 3335.73 samples/sec   Loss 8.6606   LearningRate 0.0570   Epoch: 4   Global Step: 20270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:16,593-Speed 3318.25 samples/sec   Loss 8.7593   LearningRate 0.0570   Epoch: 4   Global Step: 20280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:21:19,639-Speed 3362.57 samples/sec   Loss 8.6943   LearningRate 0.0570   Epoch: 4   Global Step: 20290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:21:22,707-Speed 3338.71 samples/sec   Loss 8.6196   LearningRate 0.0569   Epoch: 4   Global Step: 20300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:21:25,757-Speed 3357.40 samples/sec   Loss 8.5867   LearningRate 0.0569   Epoch: 4   Global Step: 20310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:28,826-Speed 3337.97 samples/sec   Loss 8.6284   LearningRate 0.0569   Epoch: 4   Global Step: 20320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:31,887-Speed 3346.02 samples/sec   Loss 8.5967   LearningRate 0.0569   Epoch: 4   Global Step: 20330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:34,954-Speed 3339.63 samples/sec   Loss 8.6538   LearningRate 0.0569   Epoch: 4   Global Step: 20340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:38,022-Speed 3337.79 samples/sec   Loss 8.7046   LearningRate 0.0568   Epoch: 4   Global Step: 20350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:41,072-Speed 3358.15 samples/sec   Loss 8.7671   LearningRate 0.0568   Epoch: 4   Global Step: 20360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:44,123-Speed 3357.77 samples/sec   Loss 8.5723   LearningRate 0.0568   Epoch: 4   Global Step: 20370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:47,177-Speed 3353.50 samples/sec   Loss 8.6245   LearningRate 0.0568   Epoch: 4   Global Step: 20380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:50,241-Speed 3343.28 samples/sec   Loss 8.6768   LearningRate 0.0568   Epoch: 4   Global Step: 20390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:53,384-Speed 3258.92 samples/sec   Loss 8.6486   LearningRate 0.0567   Epoch: 4   Global Step: 20400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:21:56,415-Speed 3379.23 samples/sec   Loss 8.6928   LearningRate 0.0567   Epoch: 4   Global Step: 20410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:21:59,450-Speed 3373.78 samples/sec   Loss 8.6703   LearningRate 0.0567   Epoch: 4   Global Step: 20420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:22:02,535-Speed 3320.56 samples/sec   Loss 8.7460   LearningRate 0.0567   Epoch: 4   Global Step: 20430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:22:05,584-Speed 3359.80 samples/sec   Loss 8.6347   LearningRate 0.0567   Epoch: 4   Global Step: 20440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:22:08,624-Speed 3368.79 samples/sec   Loss 8.6167   LearningRate 0.0567   Epoch: 4   Global Step: 20450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:22:11,670-Speed 3362.79 samples/sec   Loss 8.6532   LearningRate 0.0566   Epoch: 4   Global Step: 20460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:22:14,754-Speed 3320.99 samples/sec   Loss 8.6190   LearningRate 0.0566   Epoch: 4   Global Step: 20470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:22:17,830-Speed 3329.90 samples/sec   Loss 8.6520   LearningRate 0.0566   Epoch: 4   Global Step: 20480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:22:20,871-Speed 3367.87 samples/sec   Loss 8.6152   LearningRate 0.0566   Epoch: 4   Global Step: 20490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:22:23,919-Speed 3360.42 samples/sec   Loss 8.5814   LearningRate 0.0566   Epoch: 4   Global Step: 20500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:22:26,984-Speed 3342.47 samples/sec   Loss 8.5305   LearningRate 0.0565   Epoch: 4   Global Step: 20510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:22:30,064-Speed 3324.58 samples/sec   Loss 8.6883   LearningRate 0.0565   Epoch: 4   Global Step: 20520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:22:33,130-Speed 3341.08 samples/sec   Loss 8.6090   LearningRate 0.0565   Epoch: 4   Global Step: 20530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:22:36,174-Speed 3364.86 samples/sec   Loss 8.5825   LearningRate 0.0565   Epoch: 4   Global Step: 20540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:22:39,253-Speed 3325.99 samples/sec   Loss 8.5565   LearningRate 0.0565   Epoch: 4   Global Step: 20550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:22:42,333-Speed 3326.38 samples/sec   Loss 8.6737   LearningRate 0.0565   Epoch: 4   Global Step: 20560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:22:45,385-Speed 3355.93 samples/sec   Loss 8.7215   LearningRate 0.0564   Epoch: 4   Global Step: 20570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:22:48,450-Speed 3341.61 samples/sec   Loss 8.6519   LearningRate 0.0564   Epoch: 4   Global Step: 20580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:22:51,518-Speed 3338.07 samples/sec   Loss 8.6656   LearningRate 0.0564   Epoch: 4   Global Step: 20590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:22:54,585-Speed 3339.13 samples/sec   Loss 8.5387   LearningRate 0.0564   Epoch: 4   Global Step: 20600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:22:57,658-Speed 3333.27 samples/sec   Loss 8.6960   LearningRate 0.0564   Epoch: 4   Global Step: 20610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:23:00,713-Speed 3352.72 samples/sec   Loss 8.6139   LearningRate 0.0563   Epoch: 4   Global Step: 20620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:23:03,758-Speed 3363.53 samples/sec   Loss 8.6179   LearningRate 0.0563   Epoch: 4   Global Step: 20630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:23:06,899-Speed 3261.48 samples/sec   Loss 8.6026   LearningRate 0.0563   Epoch: 4   Global Step: 20640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:23:09,957-Speed 3349.19 samples/sec   Loss 8.6078   LearningRate 0.0563   Epoch: 4   Global Step: 20650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:23:13,019-Speed 3344.40 samples/sec   Loss 8.5779   LearningRate 0.0563   Epoch: 4   Global Step: 20660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:23:16,137-Speed 3285.50 samples/sec   Loss 8.5268   LearningRate 0.0563   Epoch: 4   Global Step: 20670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:23:29,436-Speed 770.02 samples/sec   Loss 7.5766   LearningRate 0.0562   Epoch: 5   Global Step: 20680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:23:32,477-Speed 3369.66 samples/sec   Loss 6.7871   LearningRate 0.0562   Epoch: 5   Global Step: 20690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:23:35,509-Speed 3378.21 samples/sec   Loss 6.7812   LearningRate 0.0562   Epoch: 5   Global Step: 20700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:23:38,578-Speed 3337.20 samples/sec   Loss 6.7544   LearningRate 0.0562   Epoch: 5   Global Step: 20710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:23:41,651-Speed 3333.69 samples/sec   Loss 6.7417   LearningRate 0.0562   Epoch: 5   Global Step: 20720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:23:44,697-Speed 3363.23 samples/sec   Loss 6.7578   LearningRate 0.0562   Epoch: 5   Global Step: 20730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:23:47,731-Speed 3377.40 samples/sec   Loss 6.7528   LearningRate 0.0561   Epoch: 5   Global Step: 20740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:23:50,767-Speed 3374.10 samples/sec   Loss 6.7495   LearningRate 0.0561   Epoch: 5   Global Step: 20750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:23:53,853-Speed 3319.07 samples/sec   Loss 6.8061   LearningRate 0.0561   Epoch: 5   Global Step: 20760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:23:56,912-Speed 3347.88 samples/sec   Loss 6.7783   LearningRate 0.0561   Epoch: 5   Global Step: 20770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:23:59,963-Speed 3357.75 samples/sec   Loss 6.7573   LearningRate 0.0561   Epoch: 5   Global Step: 20780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:24:03,036-Speed 3333.23 samples/sec   Loss 6.8640   LearningRate 0.0560   Epoch: 5   Global Step: 20790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:24:06,154-Speed 3285.79 samples/sec   Loss 6.8362   LearningRate 0.0560   Epoch: 5   Global Step: 20800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:24:09,179-Speed 3385.75 samples/sec   Loss 6.8634   LearningRate 0.0560   Epoch: 5   Global Step: 20810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:24:12,214-Speed 3375.74 samples/sec   Loss 6.8940   LearningRate 0.0560   Epoch: 5   Global Step: 20820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:24:15,299-Speed 3320.45 samples/sec   Loss 6.8726   LearningRate 0.0560   Epoch: 5   Global Step: 20830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:24:18,412-Speed 3290.87 samples/sec   Loss 6.9248   LearningRate 0.0560   Epoch: 5   Global Step: 20840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:24:21,460-Speed 3361.03 samples/sec   Loss 6.9850   LearningRate 0.0559   Epoch: 5   Global Step: 20850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:24:24,512-Speed 3356.71 samples/sec   Loss 6.9501   LearningRate 0.0559   Epoch: 5   Global Step: 20860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:24:27,581-Speed 3336.92 samples/sec   Loss 6.9325   LearningRate 0.0559   Epoch: 5   Global Step: 20870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:24:30,655-Speed 3333.22 samples/sec   Loss 7.0555   LearningRate 0.0559   Epoch: 5   Global Step: 20880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:24:33,717-Speed 3344.86 samples/sec   Loss 6.8668   LearningRate 0.0559   Epoch: 5   Global Step: 20890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:24:36,779-Speed 3346.40 samples/sec   Loss 6.9402   LearningRate 0.0558   Epoch: 5   Global Step: 20900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:24:40,623-Speed 2664.00 samples/sec   Loss 7.1676   LearningRate 0.0558   Epoch: 5   Global Step: 20910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:24:44,101-Speed 2945.65 samples/sec   Loss 6.9586   LearningRate 0.0558   Epoch: 5   Global Step: 20920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:24:47,160-Speed 3349.26 samples/sec   Loss 7.0368   LearningRate 0.0558   Epoch: 5   Global Step: 20930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:24:50,209-Speed 3359.40 samples/sec   Loss 7.0552   LearningRate 0.0558   Epoch: 5   Global Step: 20940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:24:53,287-Speed 3327.87 samples/sec   Loss 7.1233   LearningRate 0.0558   Epoch: 5   Global Step: 20950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:24:56,332-Speed 3364.72 samples/sec   Loss 6.9579   LearningRate 0.0557   Epoch: 5   Global Step: 20960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:24:59,437-Speed 3299.56 samples/sec   Loss 7.2086   LearningRate 0.0557   Epoch: 5   Global Step: 20970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:25:02,562-Speed 3277.54 samples/sec   Loss 7.0777   LearningRate 0.0557   Epoch: 5   Global Step: 20980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:25:05,681-Speed 3284.21 samples/sec   Loss 7.2366   LearningRate 0.0557   Epoch: 5   Global Step: 20990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:25:08,762-Speed 3325.27 samples/sec   Loss 7.1754   LearningRate 0.0557   Epoch: 5   Global Step: 21000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:25:11,834-Speed 3334.37 samples/sec   Loss 7.2214   LearningRate 0.0556   Epoch: 5   Global Step: 21010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:25:14,900-Speed 3340.45 samples/sec   Loss 7.3054   LearningRate 0.0556   Epoch: 5   Global Step: 21020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:25:17,971-Speed 3335.67 samples/sec   Loss 7.1288   LearningRate 0.0556   Epoch: 5   Global Step: 21030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:25:21,034-Speed 3343.68 samples/sec   Loss 7.2531   LearningRate 0.0556   Epoch: 5   Global Step: 21040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:25:24,098-Speed 3343.59 samples/sec   Loss 7.2376   LearningRate 0.0556   Epoch: 5   Global Step: 21050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:25:27,187-Speed 3316.69 samples/sec   Loss 7.3054   LearningRate 0.0556   Epoch: 5   Global Step: 21060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:25:30,248-Speed 3346.17 samples/sec   Loss 7.1926   LearningRate 0.0555   Epoch: 5   Global Step: 21070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:25:33,312-Speed 3343.98 samples/sec   Loss 7.1913   LearningRate 0.0555   Epoch: 5   Global Step: 21080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:25:36,419-Speed 3296.08 samples/sec   Loss 7.2589   LearningRate 0.0555   Epoch: 5   Global Step: 21090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:25:39,499-Speed 3325.74 samples/sec   Loss 7.3496   LearningRate 0.0555   Epoch: 5   Global Step: 21100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:25:42,571-Speed 3334.64 samples/sec   Loss 7.2091   LearningRate 0.0555   Epoch: 5   Global Step: 21110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:25:45,641-Speed 3337.33 samples/sec   Loss 7.2205   LearningRate 0.0554   Epoch: 5   Global Step: 21120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:25:48,727-Speed 3318.86 samples/sec   Loss 7.3482   LearningRate 0.0554   Epoch: 5   Global Step: 21130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:25:51,798-Speed 3335.70 samples/sec   Loss 7.4049   LearningRate 0.0554   Epoch: 5   Global Step: 21140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:25:54,889-Speed 3313.32 samples/sec   Loss 7.3488   LearningRate 0.0554   Epoch: 5   Global Step: 21150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:25:57,973-Speed 3320.98 samples/sec   Loss 7.1826   LearningRate 0.0554   Epoch: 5   Global Step: 21160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:01,033-Speed 3349.14 samples/sec   Loss 7.3424   LearningRate 0.0554   Epoch: 5   Global Step: 21170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:04,094-Speed 3345.12 samples/sec   Loss 7.4236   LearningRate 0.0553   Epoch: 5   Global Step: 21180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:07,162-Speed 3339.61 samples/sec   Loss 7.3497   LearningRate 0.0553   Epoch: 5   Global Step: 21190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:10,237-Speed 3331.23 samples/sec   Loss 7.3672   LearningRate 0.0553   Epoch: 5   Global Step: 21200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:13,298-Speed 3345.91 samples/sec   Loss 7.4491   LearningRate 0.0553   Epoch: 5   Global Step: 21210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:16,417-Speed 3284.49 samples/sec   Loss 7.3233   LearningRate 0.0553   Epoch: 5   Global Step: 21220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:19,494-Speed 3328.91 samples/sec   Loss 7.4865   LearningRate 0.0552   Epoch: 5   Global Step: 21230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:22,594-Speed 3303.74 samples/sec   Loss 7.4635   LearningRate 0.0552   Epoch: 5   Global Step: 21240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:25,661-Speed 3340.19 samples/sec   Loss 7.5846   LearningRate 0.0552   Epoch: 5   Global Step: 21250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:28,774-Speed 3290.66 samples/sec   Loss 7.5084   LearningRate 0.0552   Epoch: 5   Global Step: 21260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:31,849-Speed 3331.70 samples/sec   Loss 7.4927   LearningRate 0.0552   Epoch: 5   Global Step: 21270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:34,959-Speed 3293.80 samples/sec   Loss 7.4802   LearningRate 0.0552   Epoch: 5   Global Step: 21280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:38,103-Speed 3258.12 samples/sec   Loss 7.4089   LearningRate 0.0551   Epoch: 5   Global Step: 21290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:41,225-Speed 3280.99 samples/sec   Loss 7.4538   LearningRate 0.0551   Epoch: 5   Global Step: 21300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:44,318-Speed 3311.43 samples/sec   Loss 7.6057   LearningRate 0.0551   Epoch: 5   Global Step: 21310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:26:47,394-Speed 3330.22 samples/sec   Loss 7.5080   LearningRate 0.0551   Epoch: 5   Global Step: 21320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:26:50,478-Speed 3322.21 samples/sec   Loss 7.5964   LearningRate 0.0551   Epoch: 5   Global Step: 21330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:26:53,576-Speed 3306.36 samples/sec   Loss 7.4223   LearningRate 0.0551   Epoch: 5   Global Step: 21340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:26:56,666-Speed 3315.07 samples/sec   Loss 7.4561   LearningRate 0.0550   Epoch: 5   Global Step: 21350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:26:59,750-Speed 3321.48 samples/sec   Loss 7.5093   LearningRate 0.0550   Epoch: 5   Global Step: 21360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:27:02,849-Speed 3305.12 samples/sec   Loss 7.5003   LearningRate 0.0550   Epoch: 5   Global Step: 21370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:27:05,931-Speed 3324.08 samples/sec   Loss 7.5341   LearningRate 0.0550   Epoch: 5   Global Step: 21380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:27:08,991-Speed 3347.15 samples/sec   Loss 7.5879   LearningRate 0.0550   Epoch: 5   Global Step: 21390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:27:12,082-Speed 3313.01 samples/sec   Loss 7.5243   LearningRate 0.0549   Epoch: 5   Global Step: 21400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:27:15,142-Speed 3347.79 samples/sec   Loss 7.6006   LearningRate 0.0549   Epoch: 5   Global Step: 21410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:27:18,253-Speed 3293.26 samples/sec   Loss 7.5369   LearningRate 0.0549   Epoch: 5   Global Step: 21420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:27:21,339-Speed 3319.57 samples/sec   Loss 7.6221   LearningRate 0.0549   Epoch: 5   Global Step: 21430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:27:24,450-Speed 3292.83 samples/sec   Loss 7.5891   LearningRate 0.0549   Epoch: 5   Global Step: 21440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:27:27,542-Speed 3312.18 samples/sec   Loss 7.6216   LearningRate 0.0549   Epoch: 5   Global Step: 21450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:27:30,620-Speed 3328.86 samples/sec   Loss 7.3834   LearningRate 0.0548   Epoch: 5   Global Step: 21460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:27:33,742-Speed 3281.13 samples/sec   Loss 7.6863   LearningRate 0.0548   Epoch: 5   Global Step: 21470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:27:36,827-Speed 3319.48 samples/sec   Loss 7.6438   LearningRate 0.0548   Epoch: 5   Global Step: 21480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:27:39,943-Speed 3288.41 samples/sec   Loss 7.6699   LearningRate 0.0548   Epoch: 5   Global Step: 21490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:27:43,029-Speed 3319.25 samples/sec   Loss 7.5819   LearningRate 0.0548   Epoch: 5   Global Step: 21500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:27:46,123-Speed 3311.08 samples/sec   Loss 7.6467   LearningRate 0.0547   Epoch: 5   Global Step: 21510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:27:49,209-Speed 3318.88 samples/sec   Loss 7.6860   LearningRate 0.0547   Epoch: 5   Global Step: 21520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:27:52,412-Speed 3197.61 samples/sec   Loss 7.7523   LearningRate 0.0547   Epoch: 5   Global Step: 21530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:27:55,510-Speed 3306.75 samples/sec   Loss 7.7012   LearningRate 0.0547   Epoch: 5   Global Step: 21540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:27:58,626-Speed 3288.07 samples/sec   Loss 7.4709   LearningRate 0.0547   Epoch: 5   Global Step: 21550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:28:01,703-Speed 3329.16 samples/sec   Loss 7.6796   LearningRate 0.0547   Epoch: 5   Global Step: 21560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:28:04,834-Speed 3270.76 samples/sec   Loss 7.6040   LearningRate 0.0546   Epoch: 5   Global Step: 21570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:28:07,934-Speed 3305.02 samples/sec   Loss 7.7185   LearningRate 0.0546   Epoch: 5   Global Step: 21580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:28:11,040-Speed 3297.06 samples/sec   Loss 7.6243   LearningRate 0.0546   Epoch: 5   Global Step: 21590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:28:14,155-Speed 3289.02 samples/sec   Loss 7.7733   LearningRate 0.0546   Epoch: 5   Global Step: 21600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:28:17,245-Speed 3314.73 samples/sec   Loss 7.7780   LearningRate 0.0546   Epoch: 5   Global Step: 21610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:28:20,340-Speed 3310.39 samples/sec   Loss 7.7718   LearningRate 0.0545   Epoch: 5   Global Step: 21620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:28:23,423-Speed 3322.04 samples/sec   Loss 7.6403   LearningRate 0.0545   Epoch: 5   Global Step: 21630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:28:26,522-Speed 3305.07 samples/sec   Loss 7.7066   LearningRate 0.0545   Epoch: 5   Global Step: 21640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:28:29,601-Speed 3327.53 samples/sec   Loss 7.7776   LearningRate 0.0545   Epoch: 5   Global Step: 21650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:28:32,662-Speed 3346.33 samples/sec   Loss 7.7835   LearningRate 0.0545   Epoch: 5   Global Step: 21660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:28:35,783-Speed 3282.50 samples/sec   Loss 7.8545   LearningRate 0.0545   Epoch: 5   Global Step: 21670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:28:38,906-Speed 3280.03 samples/sec   Loss 7.8267   LearningRate 0.0544   Epoch: 5   Global Step: 21680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:28:42,015-Speed 3294.98 samples/sec   Loss 7.8459   LearningRate 0.0544   Epoch: 5   Global Step: 21690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:28:45,116-Speed 3303.52 samples/sec   Loss 7.6932   LearningRate 0.0544   Epoch: 5   Global Step: 21700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:28:48,252-Speed 3265.86 samples/sec   Loss 7.7686   LearningRate 0.0544   Epoch: 5   Global Step: 21710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:28:51,347-Speed 3309.46 samples/sec   Loss 7.8195   LearningRate 0.0544   Epoch: 5   Global Step: 21720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:28:54,467-Speed 3284.29 samples/sec   Loss 7.7654   LearningRate 0.0544   Epoch: 5   Global Step: 21730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:28:57,639-Speed 3229.43 samples/sec   Loss 7.8089   LearningRate 0.0543   Epoch: 5   Global Step: 21740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:29:00,770-Speed 3272.12 samples/sec   Loss 7.7640   LearningRate 0.0543   Epoch: 5   Global Step: 21750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:29:03,969-Speed 3201.59 samples/sec   Loss 7.7100   LearningRate 0.0543   Epoch: 5   Global Step: 21760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:29:07,097-Speed 3275.36 samples/sec   Loss 7.7699   LearningRate 0.0543   Epoch: 5   Global Step: 21770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:29:10,191-Speed 3310.16 samples/sec   Loss 7.7779   LearningRate 0.0543   Epoch: 5   Global Step: 21780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:29:13,328-Speed 3266.28 samples/sec   Loss 7.8303   LearningRate 0.0542   Epoch: 5   Global Step: 21790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:29:16,428-Speed 3303.41 samples/sec   Loss 7.7807   LearningRate 0.0542   Epoch: 5   Global Step: 21800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:29:19,551-Speed 3280.06 samples/sec   Loss 7.7720   LearningRate 0.0542   Epoch: 5   Global Step: 21810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:29:22,648-Speed 3307.12 samples/sec   Loss 7.8908   LearningRate 0.0542   Epoch: 5   Global Step: 21820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:29:25,764-Speed 3287.90 samples/sec   Loss 7.8742   LearningRate 0.0542   Epoch: 5   Global Step: 21830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:29:28,964-Speed 3201.32 samples/sec   Loss 7.8300   LearningRate 0.0542   Epoch: 5   Global Step: 21840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:29:32,068-Speed 3299.61 samples/sec   Loss 7.9156   LearningRate 0.0541   Epoch: 5   Global Step: 21850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:29:35,166-Speed 3306.39 samples/sec   Loss 7.8797   LearningRate 0.0541   Epoch: 5   Global Step: 21860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:29:38,278-Speed 3291.69 samples/sec   Loss 7.8530   LearningRate 0.0541   Epoch: 5   Global Step: 21870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:29:41,361-Speed 3323.08 samples/sec   Loss 7.7706   LearningRate 0.0541   Epoch: 5   Global Step: 21880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:29:44,460-Speed 3304.92 samples/sec   Loss 7.9473   LearningRate 0.0541   Epoch: 5   Global Step: 21890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:29:47,613-Speed 3248.68 samples/sec   Loss 7.9184   LearningRate 0.0541   Epoch: 5   Global Step: 21900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:29:50,715-Speed 3302.81 samples/sec   Loss 7.9590   LearningRate 0.0540   Epoch: 5   Global Step: 21910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:29:53,860-Speed 3257.50 samples/sec   Loss 7.9143   LearningRate 0.0540   Epoch: 5   Global Step: 21920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:29:56,956-Speed 3307.88 samples/sec   Loss 7.9360   LearningRate 0.0540   Epoch: 5   Global Step: 21930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:30:00,153-Speed 3204.62 samples/sec   Loss 7.7599   LearningRate 0.0540   Epoch: 5   Global Step: 21940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:30:03,253-Speed 3304.25 samples/sec   Loss 7.9194   LearningRate 0.0540   Epoch: 5   Global Step: 21950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:30:06,352-Speed 3305.74 samples/sec   Loss 7.8870   LearningRate 0.0539   Epoch: 5   Global Step: 21960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:30:09,428-Speed 3329.46 samples/sec   Loss 7.9164   LearningRate 0.0539   Epoch: 5   Global Step: 21970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:30:12,523-Speed 3309.64 samples/sec   Loss 7.9231   LearningRate 0.0539   Epoch: 5   Global Step: 21980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:30:15,647-Speed 3279.21 samples/sec   Loss 7.9291   LearningRate 0.0539   Epoch: 5   Global Step: 21990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:30:18,735-Speed 3317.54 samples/sec   Loss 7.9644   LearningRate 0.0539   Epoch: 5   Global Step: 22000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:31:02,672-[lfw][22000]XNorm: 22.109012
Training: 2022-04-26 14:31:02,673-[lfw][22000]Accuracy-Flip: 0.99617+-0.00334
Training: 2022-04-26 14:31:02,674-[lfw][22000]Accuracy-Highest: 0.99650
Training: 2022-04-26 14:31:53,384-[cfp_fp][22000]XNorm: 20.490758
Training: 2022-04-26 14:31:53,385-[cfp_fp][22000]Accuracy-Flip: 0.97971+-0.00833
Training: 2022-04-26 14:31:53,385-[cfp_fp][22000]Accuracy-Highest: 0.97971
Training: 2022-04-26 14:32:36,988-[agedb_30][22000]XNorm: 21.845812
Training: 2022-04-26 14:32:36,988-[agedb_30][22000]Accuracy-Flip: 0.96833+-0.00745
Training: 2022-04-26 14:32:36,989-[agedb_30][22000]Accuracy-Highest: 0.96833
Training: 2022-04-26 14:32:40,074-Speed 72.45 samples/sec   Loss 7.8237   LearningRate 0.0539   Epoch: 5   Global Step: 22010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:32:43,207-Speed 3268.84 samples/sec   Loss 7.8840   LearningRate 0.0538   Epoch: 5   Global Step: 22020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:32:46,287-Speed 3326.40 samples/sec   Loss 7.9063   LearningRate 0.0538   Epoch: 5   Global Step: 22030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:32:49,375-Speed 3317.42 samples/sec   Loss 7.9420   LearningRate 0.0538   Epoch: 5   Global Step: 22040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:32:52,500-Speed 3277.22 samples/sec   Loss 7.8811   LearningRate 0.0538   Epoch: 5   Global Step: 22050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:32:55,583-Speed 3322.56 samples/sec   Loss 7.9350   LearningRate 0.0538   Epoch: 5   Global Step: 22060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:32:58,683-Speed 3303.59 samples/sec   Loss 7.8669   LearningRate 0.0537   Epoch: 5   Global Step: 22070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:33:01,812-Speed 3273.88 samples/sec   Loss 7.9947   LearningRate 0.0537   Epoch: 5   Global Step: 22080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:33:04,908-Speed 3308.11 samples/sec   Loss 7.8866   LearningRate 0.0537   Epoch: 5   Global Step: 22090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:33:07,999-Speed 3313.22 samples/sec   Loss 7.9431   LearningRate 0.0537   Epoch: 5   Global Step: 22100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:33:11,096-Speed 3308.14 samples/sec   Loss 7.8782   LearningRate 0.0537   Epoch: 5   Global Step: 22110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:33:14,202-Speed 3297.00 samples/sec   Loss 7.9407   LearningRate 0.0537   Epoch: 5   Global Step: 22120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:33:17,303-Speed 3303.47 samples/sec   Loss 7.8527   LearningRate 0.0536   Epoch: 5   Global Step: 22130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:33:20,406-Speed 3300.93 samples/sec   Loss 7.9453   LearningRate 0.0536   Epoch: 5   Global Step: 22140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:33:23,509-Speed 3301.25 samples/sec   Loss 7.9940   LearningRate 0.0536   Epoch: 5   Global Step: 22150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:33:26,604-Speed 3308.98 samples/sec   Loss 7.8817   LearningRate 0.0536   Epoch: 5   Global Step: 22160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:33:29,709-Speed 3299.07 samples/sec   Loss 7.8970   LearningRate 0.0536   Epoch: 5   Global Step: 22170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:33:32,806-Speed 3307.27 samples/sec   Loss 7.7391   LearningRate 0.0536   Epoch: 5   Global Step: 22180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:33:35,923-Speed 3285.82 samples/sec   Loss 7.9803   LearningRate 0.0535   Epoch: 5   Global Step: 22190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:33:39,034-Speed 3292.34 samples/sec   Loss 7.9837   LearningRate 0.0535   Epoch: 5   Global Step: 22200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:33:42,142-Speed 3295.79 samples/sec   Loss 7.9220   LearningRate 0.0535   Epoch: 5   Global Step: 22210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:33:45,243-Speed 3303.89 samples/sec   Loss 7.9690   LearningRate 0.0535   Epoch: 5   Global Step: 22220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:33:48,349-Speed 3297.77 samples/sec   Loss 7.8911   LearningRate 0.0535   Epoch: 5   Global Step: 22230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:33:51,446-Speed 3307.96 samples/sec   Loss 7.9178   LearningRate 0.0534   Epoch: 5   Global Step: 22240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:33:54,545-Speed 3304.69 samples/sec   Loss 8.0718   LearningRate 0.0534   Epoch: 5   Global Step: 22250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:33:57,636-Speed 3313.67 samples/sec   Loss 8.0705   LearningRate 0.0534   Epoch: 5   Global Step: 22260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:34:00,718-Speed 3324.11 samples/sec   Loss 7.9336   LearningRate 0.0534   Epoch: 5   Global Step: 22270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:03,798-Speed 3325.10 samples/sec   Loss 7.9046   LearningRate 0.0534   Epoch: 5   Global Step: 22280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:06,896-Speed 3306.35 samples/sec   Loss 7.8045   LearningRate 0.0534   Epoch: 5   Global Step: 22290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:09,988-Speed 3313.49 samples/sec   Loss 7.8937   LearningRate 0.0533   Epoch: 5   Global Step: 22300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:13,092-Speed 3299.63 samples/sec   Loss 7.9314   LearningRate 0.0533   Epoch: 5   Global Step: 22310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:16,204-Speed 3292.64 samples/sec   Loss 7.8811   LearningRate 0.0533   Epoch: 5   Global Step: 22320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:19,316-Speed 3291.00 samples/sec   Loss 8.0699   LearningRate 0.0533   Epoch: 5   Global Step: 22330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:22,407-Speed 3314.77 samples/sec   Loss 8.0289   LearningRate 0.0533   Epoch: 5   Global Step: 22340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:25,505-Speed 3306.25 samples/sec   Loss 7.9127   LearningRate 0.0533   Epoch: 5   Global Step: 22350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:28,602-Speed 3307.46 samples/sec   Loss 7.9454   LearningRate 0.0532   Epoch: 5   Global Step: 22360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:31,692-Speed 3314.75 samples/sec   Loss 7.9402   LearningRate 0.0532   Epoch: 5   Global Step: 22370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:34:34,785-Speed 3312.58 samples/sec   Loss 8.1257   LearningRate 0.0532   Epoch: 5   Global Step: 22380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:34:37,900-Speed 3288.44 samples/sec   Loss 7.9210   LearningRate 0.0532   Epoch: 5   Global Step: 22390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:40,994-Speed 3310.51 samples/sec   Loss 7.8434   LearningRate 0.0532   Epoch: 5   Global Step: 22400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:44,117-Speed 3280.64 samples/sec   Loss 7.9332   LearningRate 0.0531   Epoch: 5   Global Step: 22410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:47,227-Speed 3293.15 samples/sec   Loss 8.0887   LearningRate 0.0531   Epoch: 5   Global Step: 22420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:50,360-Speed 3269.89 samples/sec   Loss 8.0248   LearningRate 0.0531   Epoch: 5   Global Step: 22430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:53,526-Speed 3235.78 samples/sec   Loss 8.0598   LearningRate 0.0531   Epoch: 5   Global Step: 22440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:56,641-Speed 3288.25 samples/sec   Loss 7.8647   LearningRate 0.0531   Epoch: 5   Global Step: 22450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:34:59,737-Speed 3309.28 samples/sec   Loss 8.0883   LearningRate 0.0531   Epoch: 5   Global Step: 22460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:35:02,870-Speed 3269.75 samples/sec   Loss 7.9635   LearningRate 0.0530   Epoch: 5   Global Step: 22470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:35:05,973-Speed 3300.66 samples/sec   Loss 8.0747   LearningRate 0.0530   Epoch: 5   Global Step: 22480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:35:09,083-Speed 3293.98 samples/sec   Loss 8.1416   LearningRate 0.0530   Epoch: 5   Global Step: 22490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:35:12,162-Speed 3326.25 samples/sec   Loss 8.0244   LearningRate 0.0530   Epoch: 5   Global Step: 22500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:35:15,289-Speed 3275.92 samples/sec   Loss 7.9677   LearningRate 0.0530   Epoch: 5   Global Step: 22510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:35:18,387-Speed 3306.73 samples/sec   Loss 8.0790   LearningRate 0.0530   Epoch: 5   Global Step: 22520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:35:21,535-Speed 3254.16 samples/sec   Loss 7.9886   LearningRate 0.0529   Epoch: 5   Global Step: 22530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:35:24,695-Speed 3242.22 samples/sec   Loss 8.0125   LearningRate 0.0529   Epoch: 5   Global Step: 22540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:35:27,821-Speed 3276.63 samples/sec   Loss 8.0165   LearningRate 0.0529   Epoch: 5   Global Step: 22550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:35:30,950-Speed 3273.83 samples/sec   Loss 8.1101   LearningRate 0.0529   Epoch: 5   Global Step: 22560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:35:34,067-Speed 3286.34 samples/sec   Loss 8.0416   LearningRate 0.0529   Epoch: 5   Global Step: 22570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:35:37,196-Speed 3273.75 samples/sec   Loss 7.9441   LearningRate 0.0528   Epoch: 5   Global Step: 22580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:35:40,270-Speed 3331.58 samples/sec   Loss 8.0346   LearningRate 0.0528   Epoch: 5   Global Step: 22590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:35:43,369-Speed 3305.87 samples/sec   Loss 8.1480   LearningRate 0.0528   Epoch: 5   Global Step: 22600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:35:46,501-Speed 3270.19 samples/sec   Loss 7.8910   LearningRate 0.0528   Epoch: 5   Global Step: 22610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:35:49,638-Speed 3265.61 samples/sec   Loss 7.9440   LearningRate 0.0528   Epoch: 5   Global Step: 22620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:35:52,941-Speed 3101.39 samples/sec   Loss 8.0509   LearningRate 0.0528   Epoch: 5   Global Step: 22630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:35:56,095-Speed 3247.88 samples/sec   Loss 8.0692   LearningRate 0.0527   Epoch: 5   Global Step: 22640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:35:59,213-Speed 3285.87 samples/sec   Loss 8.0201   LearningRate 0.0527   Epoch: 5   Global Step: 22650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:02,313-Speed 3304.07 samples/sec   Loss 8.0613   LearningRate 0.0527   Epoch: 5   Global Step: 22660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:05,440-Speed 3275.35 samples/sec   Loss 8.0642   LearningRate 0.0527   Epoch: 5   Global Step: 22670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:08,528-Speed 3318.03 samples/sec   Loss 7.9766   LearningRate 0.0527   Epoch: 5   Global Step: 22680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:11,652-Speed 3278.20 samples/sec   Loss 7.9768   LearningRate 0.0527   Epoch: 5   Global Step: 22690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:14,798-Speed 3256.45 samples/sec   Loss 8.0461   LearningRate 0.0526   Epoch: 5   Global Step: 22700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:17,901-Speed 3301.32 samples/sec   Loss 7.9545   LearningRate 0.0526   Epoch: 5   Global Step: 22710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:36:20,991-Speed 3315.42 samples/sec   Loss 7.9796   LearningRate 0.0526   Epoch: 5   Global Step: 22720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:36:24,070-Speed 3326.98 samples/sec   Loss 8.1030   LearningRate 0.0526   Epoch: 5   Global Step: 22730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:27,176-Speed 3297.22 samples/sec   Loss 8.0870   LearningRate 0.0526   Epoch: 5   Global Step: 22740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:30,269-Speed 3311.70 samples/sec   Loss 7.9022   LearningRate 0.0525   Epoch: 5   Global Step: 22750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:33,436-Speed 3234.33 samples/sec   Loss 8.0908   LearningRate 0.0525   Epoch: 5   Global Step: 22760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:36,524-Speed 3317.38 samples/sec   Loss 7.9702   LearningRate 0.0525   Epoch: 5   Global Step: 22770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:39,650-Speed 3276.38 samples/sec   Loss 8.0252   LearningRate 0.0525   Epoch: 5   Global Step: 22780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:42,762-Speed 3292.51 samples/sec   Loss 8.0099   LearningRate 0.0525   Epoch: 5   Global Step: 22790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:45,897-Speed 3267.60 samples/sec   Loss 8.0238   LearningRate 0.0525   Epoch: 5   Global Step: 22800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:48,987-Speed 3315.58 samples/sec   Loss 8.0691   LearningRate 0.0524   Epoch: 5   Global Step: 22810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:52,082-Speed 3309.21 samples/sec   Loss 7.9743   LearningRate 0.0524   Epoch: 5   Global Step: 22820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:36:55,187-Speed 3299.35 samples/sec   Loss 8.0506   LearningRate 0.0524   Epoch: 5   Global Step: 22830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:36:58,281-Speed 3310.10 samples/sec   Loss 8.0183   LearningRate 0.0524   Epoch: 5   Global Step: 22840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:37:01,431-Speed 3252.30 samples/sec   Loss 8.0376   LearningRate 0.0524   Epoch: 5   Global Step: 22850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:37:04,518-Speed 3318.95 samples/sec   Loss 8.0934   LearningRate 0.0524   Epoch: 5   Global Step: 22860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:37:07,622-Speed 3299.68 samples/sec   Loss 7.9275   LearningRate 0.0523   Epoch: 5   Global Step: 22870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:37:10,720-Speed 3306.15 samples/sec   Loss 8.0924   LearningRate 0.0523   Epoch: 5   Global Step: 22880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:37:13,831-Speed 3292.50 samples/sec   Loss 8.0541   LearningRate 0.0523   Epoch: 5   Global Step: 22890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:37:16,945-Speed 3290.20 samples/sec   Loss 8.0096   LearningRate 0.0523   Epoch: 5   Global Step: 22900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:37:20,062-Speed 3286.27 samples/sec   Loss 8.0760   LearningRate 0.0523   Epoch: 5   Global Step: 22910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:37:23,185-Speed 3280.35 samples/sec   Loss 8.0345   LearningRate 0.0523   Epoch: 5   Global Step: 22920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:37:26,282-Speed 3307.56 samples/sec   Loss 8.1037   LearningRate 0.0522   Epoch: 5   Global Step: 22930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:37:29,378-Speed 3308.98 samples/sec   Loss 8.0257   LearningRate 0.0522   Epoch: 5   Global Step: 22940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:37:32,501-Speed 3279.60 samples/sec   Loss 8.0585   LearningRate 0.0522   Epoch: 5   Global Step: 22950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:37:35,595-Speed 3310.56 samples/sec   Loss 8.0453   LearningRate 0.0522   Epoch: 5   Global Step: 22960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:37:38,738-Speed 3259.46 samples/sec   Loss 7.9609   LearningRate 0.0522   Epoch: 5   Global Step: 22970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:37:41,921-Speed 3218.62 samples/sec   Loss 8.1068   LearningRate 0.0521   Epoch: 5   Global Step: 22980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:37:45,013-Speed 3312.76 samples/sec   Loss 8.0213   LearningRate 0.0521   Epoch: 5   Global Step: 22990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:37:48,134-Speed 3281.73 samples/sec   Loss 7.9527   LearningRate 0.0521   Epoch: 5   Global Step: 23000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:37:51,253-Speed 3284.86 samples/sec   Loss 7.9855   LearningRate 0.0521   Epoch: 5   Global Step: 23010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:37:54,346-Speed 3311.99 samples/sec   Loss 7.9183   LearningRate 0.0521   Epoch: 5   Global Step: 23020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 14:37:57,459-Speed 3290.90 samples/sec   Loss 8.1110   LearningRate 0.0521   Epoch: 5   Global Step: 23030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:38:00,560-Speed 3302.58 samples/sec   Loss 7.9687   LearningRate 0.0520   Epoch: 5   Global Step: 23040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:38:03,656-Speed 3308.97 samples/sec   Loss 7.9280   LearningRate 0.0520   Epoch: 5   Global Step: 23050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:38:06,787-Speed 3271.38 samples/sec   Loss 8.1794   LearningRate 0.0520   Epoch: 5   Global Step: 23060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:38:09,906-Speed 3284.60 samples/sec   Loss 7.9404   LearningRate 0.0520   Epoch: 5   Global Step: 23070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:38:13,018-Speed 3291.76 samples/sec   Loss 8.0563   LearningRate 0.0520   Epoch: 5   Global Step: 23080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:38:16,175-Speed 3244.90 samples/sec   Loss 7.9052   LearningRate 0.0520   Epoch: 5   Global Step: 23090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:38:19,314-Speed 3263.22 samples/sec   Loss 7.9711   LearningRate 0.0519   Epoch: 5   Global Step: 23100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:38:22,429-Speed 3288.42 samples/sec   Loss 8.0304   LearningRate 0.0519   Epoch: 5   Global Step: 23110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:38:25,523-Speed 3310.41 samples/sec   Loss 8.1083   LearningRate 0.0519   Epoch: 5   Global Step: 23120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:38:28,637-Speed 3289.25 samples/sec   Loss 8.1142   LearningRate 0.0519   Epoch: 5   Global Step: 23130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:38:31,740-Speed 3301.26 samples/sec   Loss 8.0367   LearningRate 0.0519   Epoch: 5   Global Step: 23140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:38:34,862-Speed 3281.74 samples/sec   Loss 8.0890   LearningRate 0.0519   Epoch: 5   Global Step: 23150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:38:37,956-Speed 3310.37 samples/sec   Loss 8.0175   LearningRate 0.0518   Epoch: 5   Global Step: 23160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:38:41,047-Speed 3314.08 samples/sec   Loss 8.0575   LearningRate 0.0518   Epoch: 5   Global Step: 23170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:38:44,154-Speed 3297.24 samples/sec   Loss 8.0099   LearningRate 0.0518   Epoch: 5   Global Step: 23180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:38:47,265-Speed 3292.71 samples/sec   Loss 7.9844   LearningRate 0.0518   Epoch: 5   Global Step: 23190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:38:50,363-Speed 3307.12 samples/sec   Loss 8.0290   LearningRate 0.0518   Epoch: 5   Global Step: 23200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:38:53,528-Speed 3235.73 samples/sec   Loss 8.0353   LearningRate 0.0517   Epoch: 5   Global Step: 23210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:38:56,628-Speed 3304.78 samples/sec   Loss 7.8882   LearningRate 0.0517   Epoch: 5   Global Step: 23220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:38:59,746-Speed 3285.39 samples/sec   Loss 7.9897   LearningRate 0.0517   Epoch: 5   Global Step: 23230   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-26 14:39:02,870-Speed 3279.44 samples/sec   Loss 8.0483   LearningRate 0.0517   Epoch: 5   Global Step: 23240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:06,008-Speed 3264.48 samples/sec   Loss 8.1264   LearningRate 0.0517   Epoch: 5   Global Step: 23250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:09,124-Speed 3286.43 samples/sec   Loss 8.0952   LearningRate 0.0517   Epoch: 5   Global Step: 23260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:12,320-Speed 3205.30 samples/sec   Loss 7.9721   LearningRate 0.0516   Epoch: 5   Global Step: 23270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:15,464-Speed 3257.69 samples/sec   Loss 7.9676   LearningRate 0.0516   Epoch: 5   Global Step: 23280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:18,568-Speed 3300.44 samples/sec   Loss 7.9626   LearningRate 0.0516   Epoch: 5   Global Step: 23290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:21,675-Speed 3296.65 samples/sec   Loss 8.1783   LearningRate 0.0516   Epoch: 5   Global Step: 23300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:24,855-Speed 3222.05 samples/sec   Loss 8.0002   LearningRate 0.0516   Epoch: 5   Global Step: 23310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:28,019-Speed 3236.80 samples/sec   Loss 7.9737   LearningRate 0.0516   Epoch: 5   Global Step: 23320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:31,153-Speed 3268.73 samples/sec   Loss 7.8255   LearningRate 0.0515   Epoch: 5   Global Step: 23330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:34,230-Speed 3328.42 samples/sec   Loss 7.9994   LearningRate 0.0515   Epoch: 5   Global Step: 23340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:37,357-Speed 3276.77 samples/sec   Loss 7.9784   LearningRate 0.0515   Epoch: 5   Global Step: 23350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:40,462-Speed 3298.45 samples/sec   Loss 7.9527   LearningRate 0.0515   Epoch: 5   Global Step: 23360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:43,567-Speed 3298.92 samples/sec   Loss 7.9799   LearningRate 0.0515   Epoch: 5   Global Step: 23370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:46,669-Speed 3303.90 samples/sec   Loss 8.0921   LearningRate 0.0515   Epoch: 5   Global Step: 23380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:49,813-Speed 3257.50 samples/sec   Loss 8.0828   LearningRate 0.0514   Epoch: 5   Global Step: 23390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:52,922-Speed 3295.50 samples/sec   Loss 7.8789   LearningRate 0.0514   Epoch: 5   Global Step: 23400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:56,080-Speed 3243.20 samples/sec   Loss 8.0075   LearningRate 0.0514   Epoch: 5   Global Step: 23410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:39:59,201-Speed 3282.51 samples/sec   Loss 7.9462   LearningRate 0.0514   Epoch: 5   Global Step: 23420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:40:02,316-Speed 3287.88 samples/sec   Loss 7.9638   LearningRate 0.0514   Epoch: 5   Global Step: 23430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:40:05,432-Speed 3287.62 samples/sec   Loss 8.0622   LearningRate 0.0513   Epoch: 5   Global Step: 23440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:40:08,528-Speed 3308.29 samples/sec   Loss 7.8897   LearningRate 0.0513   Epoch: 5   Global Step: 23450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:40:11,642-Speed 3289.32 samples/sec   Loss 8.0418   LearningRate 0.0513   Epoch: 5   Global Step: 23460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:40:14,749-Speed 3297.07 samples/sec   Loss 8.0380   LearningRate 0.0513   Epoch: 5   Global Step: 23470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:40:17,831-Speed 3324.15 samples/sec   Loss 7.9812   LearningRate 0.0513   Epoch: 5   Global Step: 23480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:40:20,948-Speed 3286.37 samples/sec   Loss 8.0261   LearningRate 0.0513   Epoch: 5   Global Step: 23490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:40:24,051-Speed 3301.67 samples/sec   Loss 8.0162   LearningRate 0.0512   Epoch: 5   Global Step: 23500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:40:27,187-Speed 3266.26 samples/sec   Loss 7.9833   LearningRate 0.0512   Epoch: 5   Global Step: 23510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:40:30,313-Speed 3276.53 samples/sec   Loss 7.9050   LearningRate 0.0512   Epoch: 5   Global Step: 23520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:40:33,416-Speed 3301.67 samples/sec   Loss 7.9216   LearningRate 0.0512   Epoch: 5   Global Step: 23530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:40:36,516-Speed 3303.78 samples/sec   Loss 8.0113   LearningRate 0.0512   Epoch: 5   Global Step: 23540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:40:39,614-Speed 3306.98 samples/sec   Loss 8.0902   LearningRate 0.0512   Epoch: 5   Global Step: 23550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:40:42,727-Speed 3291.25 samples/sec   Loss 8.0419   LearningRate 0.0511   Epoch: 5   Global Step: 23560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:40:45,826-Speed 3305.19 samples/sec   Loss 7.9644   LearningRate 0.0511   Epoch: 5   Global Step: 23570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:40:48,952-Speed 3277.29 samples/sec   Loss 7.9548   LearningRate 0.0511   Epoch: 5   Global Step: 23580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:40:52,069-Speed 3286.01 samples/sec   Loss 8.0481   LearningRate 0.0511   Epoch: 5   Global Step: 23590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:40:55,248-Speed 3222.61 samples/sec   Loss 7.9850   LearningRate 0.0511   Epoch: 5   Global Step: 23600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:40:58,349-Speed 3302.43 samples/sec   Loss 7.9070   LearningRate 0.0511   Epoch: 5   Global Step: 23610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:41:01,502-Speed 3249.72 samples/sec   Loss 8.0334   LearningRate 0.0510   Epoch: 5   Global Step: 23620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:41:04,670-Speed 3233.06 samples/sec   Loss 7.9557   LearningRate 0.0510   Epoch: 5   Global Step: 23630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:41:07,811-Speed 3261.90 samples/sec   Loss 8.0711   LearningRate 0.0510   Epoch: 5   Global Step: 23640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:41:10,912-Speed 3303.22 samples/sec   Loss 8.0176   LearningRate 0.0510   Epoch: 5   Global Step: 23650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:41:14,061-Speed 3252.53 samples/sec   Loss 7.9514   LearningRate 0.0510   Epoch: 5   Global Step: 23660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:41:17,181-Speed 3283.31 samples/sec   Loss 7.9486   LearningRate 0.0509   Epoch: 5   Global Step: 23670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:41:20,325-Speed 3257.98 samples/sec   Loss 7.9254   LearningRate 0.0509   Epoch: 5   Global Step: 23680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:41:23,440-Speed 3288.47 samples/sec   Loss 7.9457   LearningRate 0.0509   Epoch: 5   Global Step: 23690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:41:26,564-Speed 3279.76 samples/sec   Loss 7.9704   LearningRate 0.0509   Epoch: 5   Global Step: 23700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:41:29,670-Speed 3297.71 samples/sec   Loss 7.9782   LearningRate 0.0509   Epoch: 5   Global Step: 23710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:41:32,818-Speed 3253.58 samples/sec   Loss 8.0652   LearningRate 0.0509   Epoch: 5   Global Step: 23720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:41:35,927-Speed 3295.00 samples/sec   Loss 8.0096   LearningRate 0.0508   Epoch: 5   Global Step: 23730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:41:39,038-Speed 3292.95 samples/sec   Loss 8.0160   LearningRate 0.0508   Epoch: 5   Global Step: 23740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:41:42,145-Speed 3296.52 samples/sec   Loss 8.0139   LearningRate 0.0508   Epoch: 5   Global Step: 23750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:41:45,233-Speed 3316.87 samples/sec   Loss 7.8352   LearningRate 0.0508   Epoch: 5   Global Step: 23760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:41:48,339-Speed 3298.21 samples/sec   Loss 7.9130   LearningRate 0.0508   Epoch: 5   Global Step: 23770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:41:51,484-Speed 3257.15 samples/sec   Loss 7.9618   LearningRate 0.0508   Epoch: 5   Global Step: 23780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:41:54,589-Speed 3298.24 samples/sec   Loss 7.9575   LearningRate 0.0507   Epoch: 5   Global Step: 23790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:41:57,689-Speed 3304.47 samples/sec   Loss 8.0227   LearningRate 0.0507   Epoch: 5   Global Step: 23800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:42:00,839-Speed 3252.12 samples/sec   Loss 7.9531   LearningRate 0.0507   Epoch: 5   Global Step: 23810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:42:03,958-Speed 3284.41 samples/sec   Loss 7.9919   LearningRate 0.0507   Epoch: 5   Global Step: 23820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:42:07,091-Speed 3269.57 samples/sec   Loss 8.0512   LearningRate 0.0507   Epoch: 5   Global Step: 23830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:42:10,225-Speed 3268.78 samples/sec   Loss 8.0529   LearningRate 0.0507   Epoch: 5   Global Step: 23840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:42:13,349-Speed 3279.20 samples/sec   Loss 8.0207   LearningRate 0.0506   Epoch: 5   Global Step: 23850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:42:16,483-Speed 3268.17 samples/sec   Loss 7.9431   LearningRate 0.0506   Epoch: 5   Global Step: 23860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:42:19,630-Speed 3255.39 samples/sec   Loss 7.9207   LearningRate 0.0506   Epoch: 5   Global Step: 23870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:42:22,748-Speed 3285.61 samples/sec   Loss 7.7526   LearningRate 0.0506   Epoch: 5   Global Step: 23880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:42:25,881-Speed 3269.00 samples/sec   Loss 7.9732   LearningRate 0.0506   Epoch: 5   Global Step: 23890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:42:28,990-Speed 3294.85 samples/sec   Loss 7.7944   LearningRate 0.0506   Epoch: 5   Global Step: 23900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:42:32,100-Speed 3293.64 samples/sec   Loss 7.9463   LearningRate 0.0505   Epoch: 5   Global Step: 23910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:42:35,286-Speed 3215.83 samples/sec   Loss 7.9594   LearningRate 0.0505   Epoch: 5   Global Step: 23920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:42:38,453-Speed 3234.30 samples/sec   Loss 7.9021   LearningRate 0.0505   Epoch: 5   Global Step: 23930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:42:41,676-Speed 3177.70 samples/sec   Loss 8.0045   LearningRate 0.0505   Epoch: 5   Global Step: 23940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:42:44,794-Speed 3285.69 samples/sec   Loss 8.0547   LearningRate 0.0505   Epoch: 5   Global Step: 23950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:42:47,970-Speed 3225.58 samples/sec   Loss 7.9076   LearningRate 0.0504   Epoch: 5   Global Step: 23960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:42:51,080-Speed 3293.56 samples/sec   Loss 7.9967   LearningRate 0.0504   Epoch: 5   Global Step: 23970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:42:54,204-Speed 3279.29 samples/sec   Loss 7.8972   LearningRate 0.0504   Epoch: 5   Global Step: 23980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:42:57,322-Speed 3284.89 samples/sec   Loss 7.8498   LearningRate 0.0504   Epoch: 5   Global Step: 23990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:43:00,437-Speed 3288.92 samples/sec   Loss 7.9137   LearningRate 0.0504   Epoch: 5   Global Step: 24000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:43:44,545-[lfw][24000]XNorm: 23.279770
Training: 2022-04-26 14:43:44,546-[lfw][24000]Accuracy-Flip: 0.99700+-0.00267
Training: 2022-04-26 14:43:44,546-[lfw][24000]Accuracy-Highest: 0.99700
Training: 2022-04-26 14:44:35,154-[cfp_fp][24000]XNorm: 21.535113
Training: 2022-04-26 14:44:35,156-[cfp_fp][24000]Accuracy-Flip: 0.98071+-0.00683
Training: 2022-04-26 14:44:35,156-[cfp_fp][24000]Accuracy-Highest: 0.98071
Training: 2022-04-26 14:45:18,709-[agedb_30][24000]XNorm: 23.012329
Training: 2022-04-26 14:45:18,710-[agedb_30][24000]Accuracy-Flip: 0.97200+-0.00670
Training: 2022-04-26 14:45:18,710-[agedb_30][24000]Accuracy-Highest: 0.97200
Training: 2022-04-26 14:45:21,810-Speed 72.43 samples/sec   Loss 7.7947   LearningRate 0.0504   Epoch: 5   Global Step: 24010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:45:24,912-Speed 3302.13 samples/sec   Loss 7.9161   LearningRate 0.0503   Epoch: 5   Global Step: 24020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:45:27,998-Speed 3319.85 samples/sec   Loss 7.9285   LearningRate 0.0503   Epoch: 5   Global Step: 24030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:45:31,080-Speed 3322.84 samples/sec   Loss 7.8576   LearningRate 0.0503   Epoch: 5   Global Step: 24040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:45:34,152-Speed 3333.64 samples/sec   Loss 7.9939   LearningRate 0.0503   Epoch: 5   Global Step: 24050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:45:37,225-Speed 3333.27 samples/sec   Loss 7.8634   LearningRate 0.0503   Epoch: 5   Global Step: 24060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:45:40,300-Speed 3331.17 samples/sec   Loss 7.8934   LearningRate 0.0503   Epoch: 5   Global Step: 24070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:45:43,378-Speed 3328.05 samples/sec   Loss 7.8829   LearningRate 0.0502   Epoch: 5   Global Step: 24080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:45:46,457-Speed 3325.47 samples/sec   Loss 7.9275   LearningRate 0.0502   Epoch: 5   Global Step: 24090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:45:49,535-Speed 3328.58 samples/sec   Loss 8.1173   LearningRate 0.0502   Epoch: 5   Global Step: 24100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:45:52,621-Speed 3318.86 samples/sec   Loss 7.9546   LearningRate 0.0502   Epoch: 5   Global Step: 24110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:45:55,695-Speed 3331.46 samples/sec   Loss 8.0722   LearningRate 0.0502   Epoch: 5   Global Step: 24120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:45:58,782-Speed 3317.73 samples/sec   Loss 7.8021   LearningRate 0.0502   Epoch: 5   Global Step: 24130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:46:01,880-Speed 3307.19 samples/sec   Loss 7.8959   LearningRate 0.0501   Epoch: 5   Global Step: 24140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:46:04,965-Speed 3320.03 samples/sec   Loss 7.8549   LearningRate 0.0501   Epoch: 5   Global Step: 24150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:46:08,053-Speed 3316.39 samples/sec   Loss 7.7952   LearningRate 0.0501   Epoch: 5   Global Step: 24160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:46:11,166-Speed 3289.82 samples/sec   Loss 7.9181   LearningRate 0.0501   Epoch: 5   Global Step: 24170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:46:14,332-Speed 3236.03 samples/sec   Loss 7.8497   LearningRate 0.0501   Epoch: 5   Global Step: 24180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:46:17,421-Speed 3316.03 samples/sec   Loss 7.9330   LearningRate 0.0501   Epoch: 5   Global Step: 24190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:46:20,506-Speed 3319.65 samples/sec   Loss 7.9201   LearningRate 0.0500   Epoch: 5   Global Step: 24200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:46:23,596-Speed 3314.11 samples/sec   Loss 7.8767   LearningRate 0.0500   Epoch: 5   Global Step: 24210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:46:26,685-Speed 3316.09 samples/sec   Loss 7.8929   LearningRate 0.0500   Epoch: 5   Global Step: 24220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:46:29,772-Speed 3318.51 samples/sec   Loss 7.8757   LearningRate 0.0500   Epoch: 5   Global Step: 24230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:46:32,859-Speed 3317.81 samples/sec   Loss 8.0757   LearningRate 0.0500   Epoch: 5   Global Step: 24240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:46:35,946-Speed 3319.14 samples/sec   Loss 7.7270   LearningRate 0.0500   Epoch: 5   Global Step: 24250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:46:39,048-Speed 3301.94 samples/sec   Loss 7.9393   LearningRate 0.0499   Epoch: 5   Global Step: 24260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:46:42,163-Speed 3287.34 samples/sec   Loss 7.8918   LearningRate 0.0499   Epoch: 5   Global Step: 24270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:46:45,245-Speed 3323.98 samples/sec   Loss 7.7960   LearningRate 0.0499   Epoch: 5   Global Step: 24280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:46:48,321-Speed 3330.53 samples/sec   Loss 7.9573   LearningRate 0.0499   Epoch: 5   Global Step: 24290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:46:51,422-Speed 3302.05 samples/sec   Loss 7.8760   LearningRate 0.0499   Epoch: 5   Global Step: 24300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:46:54,503-Speed 3324.17 samples/sec   Loss 7.8649   LearningRate 0.0499   Epoch: 5   Global Step: 24310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:46:57,591-Speed 3317.05 samples/sec   Loss 7.8767   LearningRate 0.0498   Epoch: 5   Global Step: 24320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:47:00,668-Speed 3329.52 samples/sec   Loss 7.8731   LearningRate 0.0498   Epoch: 5   Global Step: 24330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:47:03,745-Speed 3328.33 samples/sec   Loss 7.9299   LearningRate 0.0498   Epoch: 5   Global Step: 24340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:47:06,844-Speed 3304.69 samples/sec   Loss 7.7552   LearningRate 0.0498   Epoch: 5   Global Step: 24350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:47:09,917-Speed 3333.45 samples/sec   Loss 8.0151   LearningRate 0.0498   Epoch: 5   Global Step: 24360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:47:12,988-Speed 3335.90 samples/sec   Loss 7.8615   LearningRate 0.0497   Epoch: 5   Global Step: 24370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:47:16,059-Speed 3335.03 samples/sec   Loss 7.8715   LearningRate 0.0497   Epoch: 5   Global Step: 24380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:47:19,138-Speed 3326.21 samples/sec   Loss 7.7668   LearningRate 0.0497   Epoch: 5   Global Step: 24390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:47:22,207-Speed 3337.68 samples/sec   Loss 7.8579   LearningRate 0.0497   Epoch: 5   Global Step: 24400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:47:25,295-Speed 3316.49 samples/sec   Loss 7.8619   LearningRate 0.0497   Epoch: 5   Global Step: 24410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:47:28,371-Speed 3329.54 samples/sec   Loss 7.8470   LearningRate 0.0497   Epoch: 5   Global Step: 24420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:47:31,466-Speed 3309.91 samples/sec   Loss 7.9257   LearningRate 0.0496   Epoch: 5   Global Step: 24430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:47:34,537-Speed 3335.28 samples/sec   Loss 7.8746   LearningRate 0.0496   Epoch: 5   Global Step: 24440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:47:37,609-Speed 3333.88 samples/sec   Loss 7.9012   LearningRate 0.0496   Epoch: 5   Global Step: 24450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:47:40,683-Speed 3332.62 samples/sec   Loss 7.8514   LearningRate 0.0496   Epoch: 5   Global Step: 24460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:47:43,745-Speed 3345.17 samples/sec   Loss 7.8260   LearningRate 0.0496   Epoch: 5   Global Step: 24470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:47:46,811-Speed 3339.92 samples/sec   Loss 7.8006   LearningRate 0.0496   Epoch: 5   Global Step: 24480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:47:49,867-Speed 3352.00 samples/sec   Loss 7.7375   LearningRate 0.0495   Epoch: 5   Global Step: 24490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:47:52,939-Speed 3334.59 samples/sec   Loss 7.9375   LearningRate 0.0495   Epoch: 5   Global Step: 24500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:47:56,017-Speed 3327.33 samples/sec   Loss 7.8571   LearningRate 0.0495   Epoch: 5   Global Step: 24510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:47:59,092-Speed 3330.82 samples/sec   Loss 7.8403   LearningRate 0.0495   Epoch: 5   Global Step: 24520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:02,164-Speed 3333.80 samples/sec   Loss 7.8225   LearningRate 0.0495   Epoch: 5   Global Step: 24530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:05,233-Speed 3338.06 samples/sec   Loss 7.9326   LearningRate 0.0495   Epoch: 5   Global Step: 24540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:08,311-Speed 3326.86 samples/sec   Loss 7.8984   LearningRate 0.0494   Epoch: 5   Global Step: 24550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:11,381-Speed 3336.27 samples/sec   Loss 7.9379   LearningRate 0.0494   Epoch: 5   Global Step: 24560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:14,452-Speed 3335.62 samples/sec   Loss 7.9095   LearningRate 0.0494   Epoch: 5   Global Step: 24570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:17,533-Speed 3324.44 samples/sec   Loss 7.7491   LearningRate 0.0494   Epoch: 5   Global Step: 24580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:20,601-Speed 3337.89 samples/sec   Loss 7.7592   LearningRate 0.0494   Epoch: 5   Global Step: 24590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:23,674-Speed 3333.56 samples/sec   Loss 7.8617   LearningRate 0.0494   Epoch: 5   Global Step: 24600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:26,767-Speed 3311.57 samples/sec   Loss 7.9327   LearningRate 0.0493   Epoch: 5   Global Step: 24610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:29,840-Speed 3333.71 samples/sec   Loss 7.7649   LearningRate 0.0493   Epoch: 5   Global Step: 24620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:32,913-Speed 3332.73 samples/sec   Loss 7.7887   LearningRate 0.0493   Epoch: 5   Global Step: 24630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:35,993-Speed 3325.71 samples/sec   Loss 7.8158   LearningRate 0.0493   Epoch: 5   Global Step: 24640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:39,076-Speed 3322.61 samples/sec   Loss 7.8125   LearningRate 0.0493   Epoch: 5   Global Step: 24650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:42,149-Speed 3332.75 samples/sec   Loss 7.7567   LearningRate 0.0493   Epoch: 5   Global Step: 24660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:45,222-Speed 3332.26 samples/sec   Loss 7.9305   LearningRate 0.0492   Epoch: 5   Global Step: 24670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:48,306-Speed 3322.04 samples/sec   Loss 7.7040   LearningRate 0.0492   Epoch: 5   Global Step: 24680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:48:51,374-Speed 3338.63 samples/sec   Loss 7.8527   LearningRate 0.0492   Epoch: 5   Global Step: 24690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:48:54,448-Speed 3331.42 samples/sec   Loss 7.8317   LearningRate 0.0492   Epoch: 5   Global Step: 24700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:48:57,517-Speed 3337.28 samples/sec   Loss 7.9205   LearningRate 0.0492   Epoch: 5   Global Step: 24710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:49:00,586-Speed 3337.28 samples/sec   Loss 7.8400   LearningRate 0.0492   Epoch: 5   Global Step: 24720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:49:03,701-Speed 3287.90 samples/sec   Loss 7.8326   LearningRate 0.0491   Epoch: 5   Global Step: 24730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:49:06,825-Speed 3279.80 samples/sec   Loss 7.7975   LearningRate 0.0491   Epoch: 5   Global Step: 24740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:49:09,879-Speed 3352.97 samples/sec   Loss 7.6075   LearningRate 0.0491   Epoch: 5   Global Step: 24750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:49:12,949-Speed 3336.91 samples/sec   Loss 7.7715   LearningRate 0.0491   Epoch: 5   Global Step: 24760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:49:16,023-Speed 3331.43 samples/sec   Loss 7.6545   LearningRate 0.0491   Epoch: 5   Global Step: 24770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:49:19,115-Speed 3312.22 samples/sec   Loss 7.7987   LearningRate 0.0491   Epoch: 5   Global Step: 24780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:49:22,190-Speed 3331.61 samples/sec   Loss 7.8375   LearningRate 0.0490   Epoch: 5   Global Step: 24790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:49:25,402-Speed 3188.66 samples/sec   Loss 7.7669   LearningRate 0.0490   Epoch: 5   Global Step: 24800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:49:28,472-Speed 3336.51 samples/sec   Loss 7.7880   LearningRate 0.0490   Epoch: 5   Global Step: 24810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:49:41,011-Speed 816.74 samples/sec   Loss 6.0942   LearningRate 0.0490   Epoch: 6   Global Step: 24820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:49:44,100-Speed 3316.64 samples/sec   Loss 6.2165   LearningRate 0.0490   Epoch: 6   Global Step: 24830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:49:47,181-Speed 3324.85 samples/sec   Loss 6.0897   LearningRate 0.0489   Epoch: 6   Global Step: 24840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:49:50,281-Speed 3303.35 samples/sec   Loss 6.1350   LearningRate 0.0489   Epoch: 6   Global Step: 24850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:49:53,424-Speed 3259.91 samples/sec   Loss 6.0255   LearningRate 0.0489   Epoch: 6   Global Step: 24860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:49:56,552-Speed 3275.47 samples/sec   Loss 5.9905   LearningRate 0.0489   Epoch: 6   Global Step: 24870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:49:59,635-Speed 3321.97 samples/sec   Loss 6.0717   LearningRate 0.0489   Epoch: 6   Global Step: 24880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:50:02,758-Speed 3280.05 samples/sec   Loss 6.1986   LearningRate 0.0489   Epoch: 6   Global Step: 24890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:50:05,879-Speed 3282.70 samples/sec   Loss 6.1661   LearningRate 0.0488   Epoch: 6   Global Step: 24900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:50:08,967-Speed 3317.47 samples/sec   Loss 6.1152   LearningRate 0.0488   Epoch: 6   Global Step: 24910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:50:12,104-Speed 3265.14 samples/sec   Loss 6.2821   LearningRate 0.0488   Epoch: 6   Global Step: 24920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:50:15,220-Speed 3287.94 samples/sec   Loss 6.2731   LearningRate 0.0488   Epoch: 6   Global Step: 24930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:50:18,371-Speed 3250.70 samples/sec   Loss 6.2918   LearningRate 0.0488   Epoch: 6   Global Step: 24940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:50:21,461-Speed 3314.92 samples/sec   Loss 6.1509   LearningRate 0.0488   Epoch: 6   Global Step: 24950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:50:24,597-Speed 3267.31 samples/sec   Loss 6.1244   LearningRate 0.0487   Epoch: 6   Global Step: 24960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:50:27,704-Speed 3295.58 samples/sec   Loss 6.3453   LearningRate 0.0487   Epoch: 6   Global Step: 24970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:50:30,797-Speed 3311.98 samples/sec   Loss 6.2288   LearningRate 0.0487   Epoch: 6   Global Step: 24980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:50:33,904-Speed 3297.29 samples/sec   Loss 6.3047   LearningRate 0.0487   Epoch: 6   Global Step: 24990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:50:37,057-Speed 3249.35 samples/sec   Loss 6.2169   LearningRate 0.0487   Epoch: 6   Global Step: 25000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:50:40,168-Speed 3292.87 samples/sec   Loss 6.2943   LearningRate 0.0487   Epoch: 6   Global Step: 25010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:50:43,289-Speed 3281.80 samples/sec   Loss 6.3765   LearningRate 0.0486   Epoch: 6   Global Step: 25020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:50:46,420-Speed 3271.48 samples/sec   Loss 6.4286   LearningRate 0.0486   Epoch: 6   Global Step: 25030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:50:49,539-Speed 3284.47 samples/sec   Loss 6.4681   LearningRate 0.0486   Epoch: 6   Global Step: 25040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:50:52,704-Speed 3236.48 samples/sec   Loss 6.3273   LearningRate 0.0486   Epoch: 6   Global Step: 25050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:50:55,800-Speed 3308.94 samples/sec   Loss 6.2521   LearningRate 0.0486   Epoch: 6   Global Step: 25060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:50:58,928-Speed 3274.11 samples/sec   Loss 6.3430   LearningRate 0.0486   Epoch: 6   Global Step: 25070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:02,047-Speed 3284.49 samples/sec   Loss 6.3419   LearningRate 0.0485   Epoch: 6   Global Step: 25080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:05,171-Speed 3278.68 samples/sec   Loss 6.4184   LearningRate 0.0485   Epoch: 6   Global Step: 25090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:08,338-Speed 3234.59 samples/sec   Loss 6.4199   LearningRate 0.0485   Epoch: 6   Global Step: 25100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:11,454-Speed 3287.23 samples/sec   Loss 6.4347   LearningRate 0.0485   Epoch: 6   Global Step: 25110   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-26 14:51:14,585-Speed 3272.08 samples/sec   Loss 6.4898   LearningRate 0.0485   Epoch: 6   Global Step: 25120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:17,685-Speed 3304.49 samples/sec   Loss 6.3830   LearningRate 0.0485   Epoch: 6   Global Step: 25130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:20,784-Speed 3304.62 samples/sec   Loss 6.3857   LearningRate 0.0484   Epoch: 6   Global Step: 25140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:23,880-Speed 3309.10 samples/sec   Loss 6.4286   LearningRate 0.0484   Epoch: 6   Global Step: 25150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:27,053-Speed 3228.92 samples/sec   Loss 6.4311   LearningRate 0.0484   Epoch: 6   Global Step: 25160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:30,179-Speed 3275.82 samples/sec   Loss 6.4293   LearningRate 0.0484   Epoch: 6   Global Step: 25170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:33,295-Speed 3287.73 samples/sec   Loss 6.5358   LearningRate 0.0484   Epoch: 6   Global Step: 25180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:36,387-Speed 3313.07 samples/sec   Loss 6.4696   LearningRate 0.0484   Epoch: 6   Global Step: 25190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:39,471-Speed 3320.77 samples/sec   Loss 6.4339   LearningRate 0.0483   Epoch: 6   Global Step: 25200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:42,574-Speed 3300.83 samples/sec   Loss 6.5653   LearningRate 0.0483   Epoch: 6   Global Step: 25210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:45,684-Speed 3293.66 samples/sec   Loss 6.4150   LearningRate 0.0483   Epoch: 6   Global Step: 25220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:51:48,757-Speed 3333.67 samples/sec   Loss 6.5626   LearningRate 0.0483   Epoch: 6   Global Step: 25230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:51:51,847-Speed 3315.35 samples/sec   Loss 6.6307   LearningRate 0.0483   Epoch: 6   Global Step: 25240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:51:54,951-Speed 3299.52 samples/sec   Loss 6.5838   LearningRate 0.0483   Epoch: 6   Global Step: 25250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:51:58,063-Speed 3292.35 samples/sec   Loss 6.6401   LearningRate 0.0482   Epoch: 6   Global Step: 25260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:01,151-Speed 3317.27 samples/sec   Loss 6.6179   LearningRate 0.0482   Epoch: 6   Global Step: 25270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:04,273-Speed 3280.89 samples/sec   Loss 6.6343   LearningRate 0.0482   Epoch: 6   Global Step: 25280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:07,390-Speed 3286.29 samples/sec   Loss 6.5360   LearningRate 0.0482   Epoch: 6   Global Step: 25290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:10,477-Speed 3318.95 samples/sec   Loss 6.5911   LearningRate 0.0482   Epoch: 6   Global Step: 25300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:13,604-Speed 3275.42 samples/sec   Loss 6.6203   LearningRate 0.0482   Epoch: 6   Global Step: 25310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:16,720-Speed 3288.13 samples/sec   Loss 6.6837   LearningRate 0.0481   Epoch: 6   Global Step: 25320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:19,814-Speed 3310.60 samples/sec   Loss 6.6879   LearningRate 0.0481   Epoch: 6   Global Step: 25330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:52:22,923-Speed 3296.81 samples/sec   Loss 6.6652   LearningRate 0.0481   Epoch: 6   Global Step: 25340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:52:26,005-Speed 3324.03 samples/sec   Loss 6.6980   LearningRate 0.0481   Epoch: 6   Global Step: 25350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:29,119-Speed 3289.60 samples/sec   Loss 6.6596   LearningRate 0.0481   Epoch: 6   Global Step: 25360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:32,199-Speed 3325.76 samples/sec   Loss 6.7692   LearningRate 0.0481   Epoch: 6   Global Step: 25370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:35,319-Speed 3282.99 samples/sec   Loss 6.5304   LearningRate 0.0480   Epoch: 6   Global Step: 25380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:38,425-Speed 3297.82 samples/sec   Loss 6.6497   LearningRate 0.0480   Epoch: 6   Global Step: 25390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:41,640-Speed 3185.89 samples/sec   Loss 6.7277   LearningRate 0.0480   Epoch: 6   Global Step: 25400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:44,742-Speed 3302.57 samples/sec   Loss 6.7370   LearningRate 0.0480   Epoch: 6   Global Step: 25410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:47,846-Speed 3299.31 samples/sec   Loss 6.6426   LearningRate 0.0480   Epoch: 6   Global Step: 25420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:50,994-Speed 3255.07 samples/sec   Loss 6.6969   LearningRate 0.0480   Epoch: 6   Global Step: 25430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:54,083-Speed 3315.16 samples/sec   Loss 6.6720   LearningRate 0.0479   Epoch: 6   Global Step: 25440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:52:57,181-Speed 3306.60 samples/sec   Loss 6.7963   LearningRate 0.0479   Epoch: 6   Global Step: 25450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:00,271-Speed 3314.95 samples/sec   Loss 6.6389   LearningRate 0.0479   Epoch: 6   Global Step: 25460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:03,399-Speed 3275.00 samples/sec   Loss 6.7945   LearningRate 0.0479   Epoch: 6   Global Step: 25470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:06,506-Speed 3297.08 samples/sec   Loss 6.6691   LearningRate 0.0479   Epoch: 6   Global Step: 25480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:09,635-Speed 3273.26 samples/sec   Loss 6.7365   LearningRate 0.0479   Epoch: 6   Global Step: 25490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:12,748-Speed 3290.11 samples/sec   Loss 6.7572   LearningRate 0.0478   Epoch: 6   Global Step: 25500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:15,863-Speed 3288.92 samples/sec   Loss 6.7657   LearningRate 0.0478   Epoch: 6   Global Step: 25510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:18,962-Speed 3305.19 samples/sec   Loss 6.7092   LearningRate 0.0478   Epoch: 6   Global Step: 25520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:22,078-Speed 3287.04 samples/sec   Loss 6.7704   LearningRate 0.0478   Epoch: 6   Global Step: 25530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:25,188-Speed 3293.96 samples/sec   Loss 6.8187   LearningRate 0.0478   Epoch: 6   Global Step: 25540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:28,272-Speed 3321.31 samples/sec   Loss 6.8099   LearningRate 0.0478   Epoch: 6   Global Step: 25550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:31,362-Speed 3314.49 samples/sec   Loss 6.7941   LearningRate 0.0477   Epoch: 6   Global Step: 25560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:34,471-Speed 3294.52 samples/sec   Loss 6.8809   LearningRate 0.0477   Epoch: 6   Global Step: 25570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:37,595-Speed 3279.16 samples/sec   Loss 6.8981   LearningRate 0.0477   Epoch: 6   Global Step: 25580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:40,723-Speed 3275.01 samples/sec   Loss 6.8623   LearningRate 0.0477   Epoch: 6   Global Step: 25590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:43,840-Speed 3286.20 samples/sec   Loss 6.8013   LearningRate 0.0477   Epoch: 6   Global Step: 25600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:46,954-Speed 3290.16 samples/sec   Loss 6.8731   LearningRate 0.0477   Epoch: 6   Global Step: 25610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:50,057-Speed 3301.17 samples/sec   Loss 6.9094   LearningRate 0.0476   Epoch: 6   Global Step: 25620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:53,164-Speed 3296.53 samples/sec   Loss 7.0143   LearningRate 0.0476   Epoch: 6   Global Step: 25630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:56,252-Speed 3317.05 samples/sec   Loss 6.9394   LearningRate 0.0476   Epoch: 6   Global Step: 25640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:53:59,329-Speed 3329.54 samples/sec   Loss 6.8890   LearningRate 0.0476   Epoch: 6   Global Step: 25650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:54:02,487-Speed 3243.86 samples/sec   Loss 6.8303   LearningRate 0.0476   Epoch: 6   Global Step: 25660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:54:05,588-Speed 3303.49 samples/sec   Loss 6.8696   LearningRate 0.0476   Epoch: 6   Global Step: 25670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:54:08,695-Speed 3297.39 samples/sec   Loss 7.0469   LearningRate 0.0475   Epoch: 6   Global Step: 25680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:54:11,791-Speed 3308.19 samples/sec   Loss 6.7935   LearningRate 0.0475   Epoch: 6   Global Step: 25690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:54:15,003-Speed 3190.14 samples/sec   Loss 6.9708   LearningRate 0.0475   Epoch: 6   Global Step: 25700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:54:18,145-Speed 3259.80 samples/sec   Loss 6.7940   LearningRate 0.0475   Epoch: 6   Global Step: 25710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:54:21,323-Speed 3223.56 samples/sec   Loss 6.9916   LearningRate 0.0475   Epoch: 6   Global Step: 25720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:54:24,411-Speed 3316.74 samples/sec   Loss 6.9612   LearningRate 0.0475   Epoch: 6   Global Step: 25730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:54:27,535-Speed 3278.71 samples/sec   Loss 6.9051   LearningRate 0.0474   Epoch: 6   Global Step: 25740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:54:30,663-Speed 3275.08 samples/sec   Loss 6.9828   LearningRate 0.0474   Epoch: 6   Global Step: 25750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:54:33,748-Speed 3320.64 samples/sec   Loss 6.9394   LearningRate 0.0474   Epoch: 6   Global Step: 25760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:54:36,879-Speed 3271.75 samples/sec   Loss 6.8742   LearningRate 0.0474   Epoch: 6   Global Step: 25770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:54:39,976-Speed 3306.68 samples/sec   Loss 6.9691   LearningRate 0.0474   Epoch: 6   Global Step: 25780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:54:43,109-Speed 3270.53 samples/sec   Loss 6.9527   LearningRate 0.0474   Epoch: 6   Global Step: 25790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:54:46,206-Speed 3307.08 samples/sec   Loss 6.9294   LearningRate 0.0473   Epoch: 6   Global Step: 25800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:54:49,318-Speed 3291.59 samples/sec   Loss 6.9411   LearningRate 0.0473   Epoch: 6   Global Step: 25810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:54:52,467-Speed 3253.24 samples/sec   Loss 6.9376   LearningRate 0.0473   Epoch: 6   Global Step: 25820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:54:55,583-Speed 3287.42 samples/sec   Loss 7.0918   LearningRate 0.0473   Epoch: 6   Global Step: 25830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:54:58,699-Speed 3286.94 samples/sec   Loss 6.8991   LearningRate 0.0473   Epoch: 6   Global Step: 25840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:55:01,873-Speed 3227.40 samples/sec   Loss 7.0326   LearningRate 0.0473   Epoch: 6   Global Step: 25850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:55:05,091-Speed 3182.92 samples/sec   Loss 7.0209   LearningRate 0.0472   Epoch: 6   Global Step: 25860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:55:08,230-Speed 3263.24 samples/sec   Loss 6.9808   LearningRate 0.0472   Epoch: 6   Global Step: 25870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:55:11,335-Speed 3299.24 samples/sec   Loss 6.8889   LearningRate 0.0472   Epoch: 6   Global Step: 25880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:55:14,448-Speed 3290.75 samples/sec   Loss 7.0104   LearningRate 0.0472   Epoch: 6   Global Step: 25890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:55:17,577-Speed 3274.14 samples/sec   Loss 6.9584   LearningRate 0.0472   Epoch: 6   Global Step: 25900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:55:20,680-Speed 3300.87 samples/sec   Loss 7.0007   LearningRate 0.0472   Epoch: 6   Global Step: 25910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:55:23,785-Speed 3299.42 samples/sec   Loss 7.0439   LearningRate 0.0471   Epoch: 6   Global Step: 25920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:55:27,023-Speed 3163.29 samples/sec   Loss 7.0837   LearningRate 0.0471   Epoch: 6   Global Step: 25930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:55:30,175-Speed 3250.31 samples/sec   Loss 7.1205   LearningRate 0.0471   Epoch: 6   Global Step: 25940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:55:33,266-Speed 3313.45 samples/sec   Loss 7.0138   LearningRate 0.0471   Epoch: 6   Global Step: 25950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:55:36,381-Speed 3289.01 samples/sec   Loss 7.0447   LearningRate 0.0471   Epoch: 6   Global Step: 25960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:55:39,480-Speed 3304.79 samples/sec   Loss 6.9064   LearningRate 0.0471   Epoch: 6   Global Step: 25970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:55:42,616-Speed 3267.25 samples/sec   Loss 7.0739   LearningRate 0.0470   Epoch: 6   Global Step: 25980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:55:45,712-Speed 3308.64 samples/sec   Loss 7.0423   LearningRate 0.0470   Epoch: 6   Global Step: 25990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:55:48,826-Speed 3289.83 samples/sec   Loss 7.0814   LearningRate 0.0470   Epoch: 6   Global Step: 26000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:56:33,279-[lfw][26000]XNorm: 22.316052
Training: 2022-04-26 14:56:33,280-[lfw][26000]Accuracy-Flip: 0.99717+-0.00279
Training: 2022-04-26 14:56:33,281-[lfw][26000]Accuracy-Highest: 0.99717
Training: 2022-04-26 14:57:24,074-[cfp_fp][26000]XNorm: 21.369194
Training: 2022-04-26 14:57:24,075-[cfp_fp][26000]Accuracy-Flip: 0.97886+-0.00663
Training: 2022-04-26 14:57:24,076-[cfp_fp][26000]Accuracy-Highest: 0.98071
Training: 2022-04-26 14:58:07,884-[agedb_30][26000]XNorm: 22.521947
Training: 2022-04-26 14:58:07,885-[agedb_30][26000]Accuracy-Flip: 0.97050+-0.00860
Training: 2022-04-26 14:58:07,885-[agedb_30][26000]Accuracy-Highest: 0.97200
Training: 2022-04-26 14:58:10,971-Speed 72.04 samples/sec   Loss 6.9904   LearningRate 0.0470   Epoch: 6   Global Step: 26010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:58:14,029-Speed 3348.70 samples/sec   Loss 7.0581   LearningRate 0.0470   Epoch: 6   Global Step: 26020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:58:17,114-Speed 3320.99 samples/sec   Loss 7.1094   LearningRate 0.0470   Epoch: 6   Global Step: 26030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:58:20,197-Speed 3321.45 samples/sec   Loss 7.0124   LearningRate 0.0469   Epoch: 6   Global Step: 26040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:58:23,270-Speed 3333.89 samples/sec   Loss 7.0066   LearningRate 0.0469   Epoch: 6   Global Step: 26050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:58:26,360-Speed 3314.28 samples/sec   Loss 7.0185   LearningRate 0.0469   Epoch: 6   Global Step: 26060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:58:29,448-Speed 3316.63 samples/sec   Loss 7.0820   LearningRate 0.0469   Epoch: 6   Global Step: 26070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:58:32,526-Speed 3328.51 samples/sec   Loss 7.0770   LearningRate 0.0469   Epoch: 6   Global Step: 26080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:58:35,606-Speed 3325.22 samples/sec   Loss 6.9180   LearningRate 0.0469   Epoch: 6   Global Step: 26090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:58:38,692-Speed 3318.29 samples/sec   Loss 7.1873   LearningRate 0.0468   Epoch: 6   Global Step: 26100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:58:41,786-Speed 3310.93 samples/sec   Loss 7.1076   LearningRate 0.0468   Epoch: 6   Global Step: 26110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:58:44,871-Speed 3320.31 samples/sec   Loss 7.2546   LearningRate 0.0468   Epoch: 6   Global Step: 26120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:58:47,981-Speed 3292.53 samples/sec   Loss 6.8973   LearningRate 0.0468   Epoch: 6   Global Step: 26130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:58:51,070-Speed 3315.77 samples/sec   Loss 7.0465   LearningRate 0.0468   Epoch: 6   Global Step: 26140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:58:54,143-Speed 3333.77 samples/sec   Loss 7.0344   LearningRate 0.0468   Epoch: 6   Global Step: 26150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:58:57,236-Speed 3311.52 samples/sec   Loss 7.0078   LearningRate 0.0467   Epoch: 6   Global Step: 26160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:00,349-Speed 3290.17 samples/sec   Loss 7.2154   LearningRate 0.0467   Epoch: 6   Global Step: 26170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:03,466-Speed 3286.13 samples/sec   Loss 7.1555   LearningRate 0.0467   Epoch: 6   Global Step: 26180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:06,597-Speed 3271.68 samples/sec   Loss 7.3316   LearningRate 0.0467   Epoch: 6   Global Step: 26190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:09,687-Speed 3314.46 samples/sec   Loss 7.1988   LearningRate 0.0467   Epoch: 6   Global Step: 26200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:12,780-Speed 3311.15 samples/sec   Loss 7.2203   LearningRate 0.0467   Epoch: 6   Global Step: 26210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:15,864-Speed 3322.13 samples/sec   Loss 7.0928   LearningRate 0.0466   Epoch: 6   Global Step: 26220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:18,948-Speed 3320.98 samples/sec   Loss 7.1954   LearningRate 0.0466   Epoch: 6   Global Step: 26230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:22,026-Speed 3327.44 samples/sec   Loss 7.0255   LearningRate 0.0466   Epoch: 6   Global Step: 26240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:25,111-Speed 3319.80 samples/sec   Loss 7.1291   LearningRate 0.0466   Epoch: 6   Global Step: 26250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:59:28,201-Speed 3315.42 samples/sec   Loss 7.2325   LearningRate 0.0466   Epoch: 6   Global Step: 26260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:59:31,281-Speed 3324.87 samples/sec   Loss 7.1484   LearningRate 0.0466   Epoch: 6   Global Step: 26270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:59:34,360-Speed 3327.44 samples/sec   Loss 7.1764   LearningRate 0.0465   Epoch: 6   Global Step: 26280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 14:59:37,420-Speed 3347.25 samples/sec   Loss 7.2267   LearningRate 0.0465   Epoch: 6   Global Step: 26290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:40,501-Speed 3324.91 samples/sec   Loss 7.0516   LearningRate 0.0465   Epoch: 6   Global Step: 26300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:43,576-Speed 3330.87 samples/sec   Loss 7.1041   LearningRate 0.0465   Epoch: 6   Global Step: 26310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:46,650-Speed 3331.97 samples/sec   Loss 7.2321   LearningRate 0.0465   Epoch: 6   Global Step: 26320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:49,722-Speed 3333.87 samples/sec   Loss 6.9986   LearningRate 0.0465   Epoch: 6   Global Step: 26330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:52,805-Speed 3323.32 samples/sec   Loss 7.0945   LearningRate 0.0464   Epoch: 6   Global Step: 26340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:55,877-Speed 3333.87 samples/sec   Loss 7.0987   LearningRate 0.0464   Epoch: 6   Global Step: 26350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 14:59:58,946-Speed 3337.18 samples/sec   Loss 7.1141   LearningRate 0.0464   Epoch: 6   Global Step: 26360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:00:02,019-Speed 3332.07 samples/sec   Loss 7.3362   LearningRate 0.0464   Epoch: 6   Global Step: 26370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:00:05,104-Speed 3320.70 samples/sec   Loss 7.1941   LearningRate 0.0464   Epoch: 6   Global Step: 26380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:00:08,174-Speed 3336.49 samples/sec   Loss 7.2112   LearningRate 0.0464   Epoch: 6   Global Step: 26390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:00:11,242-Speed 3338.27 samples/sec   Loss 7.2211   LearningRate 0.0463   Epoch: 6   Global Step: 26400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:00:14,315-Speed 3333.42 samples/sec   Loss 7.2812   LearningRate 0.0463   Epoch: 6   Global Step: 26410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:00:17,389-Speed 3332.52 samples/sec   Loss 7.1361   LearningRate 0.0463   Epoch: 6   Global Step: 26420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:00:20,463-Speed 3331.53 samples/sec   Loss 7.1587   LearningRate 0.0463   Epoch: 6   Global Step: 26430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:00:23,517-Speed 3353.85 samples/sec   Loss 7.0651   LearningRate 0.0463   Epoch: 6   Global Step: 26440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:00:26,588-Speed 3335.70 samples/sec   Loss 7.1468   LearningRate 0.0463   Epoch: 6   Global Step: 26450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:00:29,663-Speed 3331.01 samples/sec   Loss 7.2175   LearningRate 0.0462   Epoch: 6   Global Step: 26460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:00:32,733-Speed 3336.23 samples/sec   Loss 7.2007   LearningRate 0.0462   Epoch: 6   Global Step: 26470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:00:35,809-Speed 3329.95 samples/sec   Loss 7.3052   LearningRate 0.0462   Epoch: 6   Global Step: 26480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:00:38,882-Speed 3332.59 samples/sec   Loss 7.2560   LearningRate 0.0462   Epoch: 6   Global Step: 26490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:00:41,959-Speed 3329.32 samples/sec   Loss 7.2214   LearningRate 0.0462   Epoch: 6   Global Step: 26500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:00:45,029-Speed 3336.18 samples/sec   Loss 7.0740   LearningRate 0.0462   Epoch: 6   Global Step: 26510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:00:48,096-Speed 3338.85 samples/sec   Loss 7.1843   LearningRate 0.0461   Epoch: 6   Global Step: 26520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:00:51,169-Speed 3333.58 samples/sec   Loss 7.3042   LearningRate 0.0461   Epoch: 6   Global Step: 26530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:00:54,253-Speed 3320.93 samples/sec   Loss 7.2250   LearningRate 0.0461   Epoch: 6   Global Step: 26540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:00:57,305-Speed 3356.46 samples/sec   Loss 7.2661   LearningRate 0.0461   Epoch: 6   Global Step: 26550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:00,376-Speed 3335.73 samples/sec   Loss 7.3405   LearningRate 0.0461   Epoch: 6   Global Step: 26560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:03,455-Speed 3326.30 samples/sec   Loss 7.1679   LearningRate 0.0461   Epoch: 6   Global Step: 26570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:06,538-Speed 3322.19 samples/sec   Loss 7.1967   LearningRate 0.0460   Epoch: 6   Global Step: 26580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:09,604-Speed 3339.82 samples/sec   Loss 7.2227   LearningRate 0.0460   Epoch: 6   Global Step: 26590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:12,679-Speed 3331.45 samples/sec   Loss 7.2405   LearningRate 0.0460   Epoch: 6   Global Step: 26600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:15,754-Speed 3331.40 samples/sec   Loss 7.1779   LearningRate 0.0460   Epoch: 6   Global Step: 26610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:18,823-Speed 3336.90 samples/sec   Loss 7.1639   LearningRate 0.0460   Epoch: 6   Global Step: 26620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:21,911-Speed 3317.22 samples/sec   Loss 7.2342   LearningRate 0.0460   Epoch: 6   Global Step: 26630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:24,980-Speed 3336.50 samples/sec   Loss 7.0919   LearningRate 0.0460   Epoch: 6   Global Step: 26640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:28,123-Speed 3259.43 samples/sec   Loss 7.1571   LearningRate 0.0459   Epoch: 6   Global Step: 26650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:01:31,224-Speed 3303.14 samples/sec   Loss 7.1192   LearningRate 0.0459   Epoch: 6   Global Step: 26660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:01:34,321-Speed 3306.82 samples/sec   Loss 7.2365   LearningRate 0.0459   Epoch: 6   Global Step: 26670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:01:37,402-Speed 3324.01 samples/sec   Loss 7.3832   LearningRate 0.0459   Epoch: 6   Global Step: 26680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:01:40,498-Speed 3309.03 samples/sec   Loss 7.2373   LearningRate 0.0459   Epoch: 6   Global Step: 26690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:43,579-Speed 3328.66 samples/sec   Loss 7.0599   LearningRate 0.0459   Epoch: 6   Global Step: 26700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:46,654-Speed 3331.79 samples/sec   Loss 7.2393   LearningRate 0.0458   Epoch: 6   Global Step: 26710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:49,730-Speed 3329.74 samples/sec   Loss 7.2479   LearningRate 0.0458   Epoch: 6   Global Step: 26720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:52,812-Speed 3322.95 samples/sec   Loss 7.2042   LearningRate 0.0458   Epoch: 6   Global Step: 26730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:55,907-Speed 3308.85 samples/sec   Loss 7.2188   LearningRate 0.0458   Epoch: 6   Global Step: 26740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:01:58,982-Speed 3331.43 samples/sec   Loss 7.1723   LearningRate 0.0458   Epoch: 6   Global Step: 26750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:02:02,056-Speed 3331.51 samples/sec   Loss 7.2509   LearningRate 0.0458   Epoch: 6   Global Step: 26760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:02:05,133-Speed 3329.71 samples/sec   Loss 7.2868   LearningRate 0.0457   Epoch: 6   Global Step: 26770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:02:08,213-Speed 3325.26 samples/sec   Loss 7.4304   LearningRate 0.0457   Epoch: 6   Global Step: 26780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:02:11,324-Speed 3291.79 samples/sec   Loss 7.2743   LearningRate 0.0457   Epoch: 6   Global Step: 26790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:14,428-Speed 3299.78 samples/sec   Loss 7.2463   LearningRate 0.0457   Epoch: 6   Global Step: 26800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:17,512-Speed 3321.27 samples/sec   Loss 7.2910   LearningRate 0.0457   Epoch: 6   Global Step: 26810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:20,599-Speed 3318.40 samples/sec   Loss 7.3226   LearningRate 0.0457   Epoch: 6   Global Step: 26820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:23,678-Speed 3325.86 samples/sec   Loss 7.1821   LearningRate 0.0456   Epoch: 6   Global Step: 26830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:26,768-Speed 3314.73 samples/sec   Loss 7.2864   LearningRate 0.0456   Epoch: 6   Global Step: 26840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:29,856-Speed 3317.85 samples/sec   Loss 7.2394   LearningRate 0.0456   Epoch: 6   Global Step: 26850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:32,932-Speed 3329.15 samples/sec   Loss 7.2151   LearningRate 0.0456   Epoch: 6   Global Step: 26860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:36,020-Speed 3316.99 samples/sec   Loss 7.2627   LearningRate 0.0456   Epoch: 6   Global Step: 26870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:39,102-Speed 3323.98 samples/sec   Loss 7.2379   LearningRate 0.0456   Epoch: 6   Global Step: 26880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:42,325-Speed 3177.87 samples/sec   Loss 7.2802   LearningRate 0.0455   Epoch: 6   Global Step: 26890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:45,403-Speed 3327.52 samples/sec   Loss 7.4368   LearningRate 0.0455   Epoch: 6   Global Step: 26900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:48,489-Speed 3318.53 samples/sec   Loss 7.3821   LearningRate 0.0455   Epoch: 6   Global Step: 26910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:51,567-Speed 3328.50 samples/sec   Loss 7.2988   LearningRate 0.0455   Epoch: 6   Global Step: 26920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:54,644-Speed 3328.71 samples/sec   Loss 7.3023   LearningRate 0.0455   Epoch: 6   Global Step: 26930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:02:57,735-Speed 3313.78 samples/sec   Loss 7.3266   LearningRate 0.0455   Epoch: 6   Global Step: 26940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:03:00,823-Speed 3316.68 samples/sec   Loss 7.2078   LearningRate 0.0454   Epoch: 6   Global Step: 26950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:03:03,906-Speed 3322.36 samples/sec   Loss 7.3126   LearningRate 0.0454   Epoch: 6   Global Step: 26960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:03:06,980-Speed 3331.24 samples/sec   Loss 7.2302   LearningRate 0.0454   Epoch: 6   Global Step: 26970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:03:10,059-Speed 3326.69 samples/sec   Loss 7.3258   LearningRate 0.0454   Epoch: 6   Global Step: 26980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:03:13,136-Speed 3328.72 samples/sec   Loss 7.2035   LearningRate 0.0454   Epoch: 6   Global Step: 26990   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-26 15:03:16,274-Speed 3263.99 samples/sec   Loss 7.3404   LearningRate 0.0454   Epoch: 6   Global Step: 27000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:03:19,350-Speed 3329.84 samples/sec   Loss 7.2727   LearningRate 0.0453   Epoch: 6   Global Step: 27010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:03:22,442-Speed 3312.61 samples/sec   Loss 7.2634   LearningRate 0.0453   Epoch: 6   Global Step: 27020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:03:25,516-Speed 3332.47 samples/sec   Loss 7.2589   LearningRate 0.0453   Epoch: 6   Global Step: 27030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:03:28,620-Speed 3299.97 samples/sec   Loss 7.2688   LearningRate 0.0453   Epoch: 6   Global Step: 27040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:03:31,704-Speed 3320.24 samples/sec   Loss 7.4195   LearningRate 0.0453   Epoch: 6   Global Step: 27050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:03:34,783-Speed 3326.71 samples/sec   Loss 7.1890   LearningRate 0.0453   Epoch: 6   Global Step: 27060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:03:37,867-Speed 3321.48 samples/sec   Loss 7.2534   LearningRate 0.0452   Epoch: 6   Global Step: 27070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:03:40,942-Speed 3331.18 samples/sec   Loss 7.3548   LearningRate 0.0452   Epoch: 6   Global Step: 27080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:03:43,998-Speed 3351.13 samples/sec   Loss 7.3187   LearningRate 0.0452   Epoch: 6   Global Step: 27090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:03:47,075-Speed 3329.09 samples/sec   Loss 7.2980   LearningRate 0.0452   Epoch: 6   Global Step: 27100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:03:50,198-Speed 3279.81 samples/sec   Loss 7.2000   LearningRate 0.0452   Epoch: 6   Global Step: 27110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:03:53,369-Speed 3229.85 samples/sec   Loss 7.2379   LearningRate 0.0452   Epoch: 6   Global Step: 27120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:03:56,472-Speed 3300.01 samples/sec   Loss 7.1859   LearningRate 0.0452   Epoch: 6   Global Step: 27130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:03:59,554-Speed 3324.20 samples/sec   Loss 7.3797   LearningRate 0.0451   Epoch: 6   Global Step: 27140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:04:02,634-Speed 3324.86 samples/sec   Loss 7.0842   LearningRate 0.0451   Epoch: 6   Global Step: 27150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:04:05,710-Speed 3329.84 samples/sec   Loss 7.3813   LearningRate 0.0451   Epoch: 6   Global Step: 27160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:04:08,796-Speed 3319.47 samples/sec   Loss 7.2453   LearningRate 0.0451   Epoch: 6   Global Step: 27170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:04:11,876-Speed 3324.92 samples/sec   Loss 7.2485   LearningRate 0.0451   Epoch: 6   Global Step: 27180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:04:14,951-Speed 3331.48 samples/sec   Loss 7.2519   LearningRate 0.0451   Epoch: 6   Global Step: 27190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:04:18,029-Speed 3327.71 samples/sec   Loss 7.3679   LearningRate 0.0450   Epoch: 6   Global Step: 27200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:04:21,110-Speed 3323.62 samples/sec   Loss 7.1952   LearningRate 0.0450   Epoch: 6   Global Step: 27210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:04:24,186-Speed 3330.56 samples/sec   Loss 7.2770   LearningRate 0.0450   Epoch: 6   Global Step: 27220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:04:27,279-Speed 3311.63 samples/sec   Loss 7.2482   LearningRate 0.0450   Epoch: 6   Global Step: 27230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:04:30,358-Speed 3326.11 samples/sec   Loss 7.4913   LearningRate 0.0450   Epoch: 6   Global Step: 27240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:04:33,443-Speed 3320.49 samples/sec   Loss 7.2520   LearningRate 0.0450   Epoch: 6   Global Step: 27250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:04:36,524-Speed 3324.34 samples/sec   Loss 7.2004   LearningRate 0.0449   Epoch: 6   Global Step: 27260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:04:39,606-Speed 3322.93 samples/sec   Loss 7.3505   LearningRate 0.0449   Epoch: 6   Global Step: 27270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:04:42,692-Speed 3320.08 samples/sec   Loss 7.4182   LearningRate 0.0449   Epoch: 6   Global Step: 27280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:04:45,772-Speed 3325.29 samples/sec   Loss 7.3374   LearningRate 0.0449   Epoch: 6   Global Step: 27290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:04:48,848-Speed 3329.15 samples/sec   Loss 7.3103   LearningRate 0.0449   Epoch: 6   Global Step: 27300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:04:51,930-Speed 3323.38 samples/sec   Loss 7.2615   LearningRate 0.0449   Epoch: 6   Global Step: 27310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:04:55,000-Speed 3336.65 samples/sec   Loss 7.2272   LearningRate 0.0448   Epoch: 6   Global Step: 27320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:04:58,093-Speed 3312.00 samples/sec   Loss 7.2404   LearningRate 0.0448   Epoch: 6   Global Step: 27330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:05:01,207-Speed 3289.54 samples/sec   Loss 7.2233   LearningRate 0.0448   Epoch: 6   Global Step: 27340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:05:04,320-Speed 3290.59 samples/sec   Loss 7.2685   LearningRate 0.0448   Epoch: 6   Global Step: 27350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:05:07,410-Speed 3314.45 samples/sec   Loss 7.3956   LearningRate 0.0448   Epoch: 6   Global Step: 27360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:05:10,499-Speed 3316.03 samples/sec   Loss 7.3288   LearningRate 0.0448   Epoch: 6   Global Step: 27370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:05:13,578-Speed 3326.65 samples/sec   Loss 7.2793   LearningRate 0.0447   Epoch: 6   Global Step: 27380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:05:16,672-Speed 3309.72 samples/sec   Loss 7.3244   LearningRate 0.0447   Epoch: 6   Global Step: 27390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:05:19,749-Speed 3329.14 samples/sec   Loss 7.3559   LearningRate 0.0447   Epoch: 6   Global Step: 27400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:05:22,832-Speed 3322.04 samples/sec   Loss 7.3995   LearningRate 0.0447   Epoch: 6   Global Step: 27410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:05:25,913-Speed 3325.24 samples/sec   Loss 7.3503   LearningRate 0.0447   Epoch: 6   Global Step: 27420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:05:28,994-Speed 3323.99 samples/sec   Loss 7.3128   LearningRate 0.0447   Epoch: 6   Global Step: 27430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:05:32,073-Speed 3326.27 samples/sec   Loss 7.2281   LearningRate 0.0446   Epoch: 6   Global Step: 27440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:05:35,180-Speed 3297.06 samples/sec   Loss 7.1825   LearningRate 0.0446   Epoch: 6   Global Step: 27450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:05:38,262-Speed 3323.42 samples/sec   Loss 7.3597   LearningRate 0.0446   Epoch: 6   Global Step: 27460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:05:41,352-Speed 3314.84 samples/sec   Loss 7.1739   LearningRate 0.0446   Epoch: 6   Global Step: 27470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:05:44,445-Speed 3311.63 samples/sec   Loss 7.3105   LearningRate 0.0446   Epoch: 6   Global Step: 27480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:05:47,524-Speed 3325.98 samples/sec   Loss 7.3591   LearningRate 0.0446   Epoch: 6   Global Step: 27490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:05:50,615-Speed 3314.50 samples/sec   Loss 7.2239   LearningRate 0.0446   Epoch: 6   Global Step: 27500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:05:53,705-Speed 3314.14 samples/sec   Loss 7.3080   LearningRate 0.0445   Epoch: 6   Global Step: 27510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:05:56,780-Speed 3330.62 samples/sec   Loss 7.3372   LearningRate 0.0445   Epoch: 6   Global Step: 27520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:05:59,868-Speed 3317.76 samples/sec   Loss 7.2934   LearningRate 0.0445   Epoch: 6   Global Step: 27530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:02,952-Speed 3320.89 samples/sec   Loss 7.2678   LearningRate 0.0445   Epoch: 6   Global Step: 27540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:06,041-Speed 3315.98 samples/sec   Loss 7.2841   LearningRate 0.0445   Epoch: 6   Global Step: 27550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:09,132-Speed 3314.13 samples/sec   Loss 7.2372   LearningRate 0.0445   Epoch: 6   Global Step: 27560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:12,215-Speed 3322.17 samples/sec   Loss 7.3967   LearningRate 0.0444   Epoch: 6   Global Step: 27570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:15,350-Speed 3266.47 samples/sec   Loss 7.2633   LearningRate 0.0444   Epoch: 6   Global Step: 27580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:18,442-Speed 3312.81 samples/sec   Loss 7.1862   LearningRate 0.0444   Epoch: 6   Global Step: 27590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:21,525-Speed 3322.87 samples/sec   Loss 7.2284   LearningRate 0.0444   Epoch: 6   Global Step: 27600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:24,629-Speed 3299.34 samples/sec   Loss 7.2478   LearningRate 0.0444   Epoch: 6   Global Step: 27610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:27,693-Speed 3343.18 samples/sec   Loss 7.2515   LearningRate 0.0444   Epoch: 6   Global Step: 27620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:30,771-Speed 3327.37 samples/sec   Loss 7.4058   LearningRate 0.0443   Epoch: 6   Global Step: 27630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:33,851-Speed 3325.92 samples/sec   Loss 7.2033   LearningRate 0.0443   Epoch: 6   Global Step: 27640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:36,933-Speed 3323.18 samples/sec   Loss 7.2917   LearningRate 0.0443   Epoch: 6   Global Step: 27650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:40,011-Speed 3327.67 samples/sec   Loss 7.2244   LearningRate 0.0443   Epoch: 6   Global Step: 27660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:43,097-Speed 3318.91 samples/sec   Loss 7.2649   LearningRate 0.0443   Epoch: 6   Global Step: 27670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:46,202-Speed 3299.04 samples/sec   Loss 7.2873   LearningRate 0.0443   Epoch: 6   Global Step: 27680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:49,295-Speed 3311.17 samples/sec   Loss 7.1845   LearningRate 0.0442   Epoch: 6   Global Step: 27690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:06:52,386-Speed 3313.40 samples/sec   Loss 7.2554   LearningRate 0.0442   Epoch: 6   Global Step: 27700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:06:55,473-Speed 3318.75 samples/sec   Loss 7.3307   LearningRate 0.0442   Epoch: 6   Global Step: 27710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:06:58,557-Speed 3320.72 samples/sec   Loss 7.2829   LearningRate 0.0442   Epoch: 6   Global Step: 27720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:01,638-Speed 3324.25 samples/sec   Loss 7.2151   LearningRate 0.0442   Epoch: 6   Global Step: 27730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:04,811-Speed 3228.14 samples/sec   Loss 7.2987   LearningRate 0.0442   Epoch: 6   Global Step: 27740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:07,951-Speed 3262.00 samples/sec   Loss 7.1768   LearningRate 0.0441   Epoch: 6   Global Step: 27750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:11,073-Speed 3280.81 samples/sec   Loss 7.1946   LearningRate 0.0441   Epoch: 6   Global Step: 27760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:14,166-Speed 3310.60 samples/sec   Loss 7.2333   LearningRate 0.0441   Epoch: 6   Global Step: 27770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:17,256-Speed 3315.46 samples/sec   Loss 7.3622   LearningRate 0.0441   Epoch: 6   Global Step: 27780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:20,337-Speed 3324.92 samples/sec   Loss 7.1866   LearningRate 0.0441   Epoch: 6   Global Step: 27790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:23,418-Speed 3323.49 samples/sec   Loss 7.1407   LearningRate 0.0441   Epoch: 6   Global Step: 27800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:07:26,593-Speed 3225.91 samples/sec   Loss 7.2505   LearningRate 0.0441   Epoch: 6   Global Step: 27810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:07:29,721-Speed 3275.18 samples/sec   Loss 7.2100   LearningRate 0.0440   Epoch: 6   Global Step: 27820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:32,805-Speed 3321.22 samples/sec   Loss 7.2417   LearningRate 0.0440   Epoch: 6   Global Step: 27830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:35,887-Speed 3322.91 samples/sec   Loss 7.2766   LearningRate 0.0440   Epoch: 6   Global Step: 27840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:38,980-Speed 3311.62 samples/sec   Loss 7.3143   LearningRate 0.0440   Epoch: 6   Global Step: 27850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:42,059-Speed 3326.41 samples/sec   Loss 7.3411   LearningRate 0.0440   Epoch: 6   Global Step: 27860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:45,152-Speed 3312.15 samples/sec   Loss 7.2539   LearningRate 0.0440   Epoch: 6   Global Step: 27870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:48,249-Speed 3306.77 samples/sec   Loss 7.2340   LearningRate 0.0439   Epoch: 6   Global Step: 27880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:51,332-Speed 3321.60 samples/sec   Loss 7.2947   LearningRate 0.0439   Epoch: 6   Global Step: 27890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:54,416-Speed 3321.74 samples/sec   Loss 7.2508   LearningRate 0.0439   Epoch: 6   Global Step: 27900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:07:57,501-Speed 3320.40 samples/sec   Loss 7.3306   LearningRate 0.0439   Epoch: 6   Global Step: 27910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:08:00,596-Speed 3308.83 samples/sec   Loss 7.1799   LearningRate 0.0439   Epoch: 6   Global Step: 27920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:08:03,676-Speed 3325.92 samples/sec   Loss 7.2792   LearningRate 0.0439   Epoch: 6   Global Step: 27930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:08:06,765-Speed 3316.61 samples/sec   Loss 7.3188   LearningRate 0.0438   Epoch: 6   Global Step: 27940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:08:09,847-Speed 3322.87 samples/sec   Loss 7.2363   LearningRate 0.0438   Epoch: 6   Global Step: 27950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:08:12,928-Speed 3324.14 samples/sec   Loss 7.3131   LearningRate 0.0438   Epoch: 6   Global Step: 27960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:08:16,007-Speed 3327.07 samples/sec   Loss 7.1649   LearningRate 0.0438   Epoch: 6   Global Step: 27970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:08:19,092-Speed 3319.48 samples/sec   Loss 7.2665   LearningRate 0.0438   Epoch: 6   Global Step: 27980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:08:22,179-Speed 3318.60 samples/sec   Loss 7.3478   LearningRate 0.0438   Epoch: 6   Global Step: 27990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:08:25,263-Speed 3320.89 samples/sec   Loss 7.2204   LearningRate 0.0437   Epoch: 6   Global Step: 28000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:09:08,797-[lfw][28000]XNorm: 22.443326
Training: 2022-04-26 15:09:08,797-[lfw][28000]Accuracy-Flip: 0.99633+-0.00306
Training: 2022-04-26 15:09:08,798-[lfw][28000]Accuracy-Highest: 0.99717
Training: 2022-04-26 15:09:59,328-[cfp_fp][28000]XNorm: 21.010852
Training: 2022-04-26 15:09:59,328-[cfp_fp][28000]Accuracy-Flip: 0.98071+-0.00689
Training: 2022-04-26 15:09:59,329-[cfp_fp][28000]Accuracy-Highest: 0.98071
Training: 2022-04-26 15:10:43,044-[agedb_30][28000]XNorm: 22.418203
Training: 2022-04-26 15:10:43,044-[agedb_30][28000]Accuracy-Flip: 0.96683+-0.00783
Training: 2022-04-26 15:10:43,045-[agedb_30][28000]Accuracy-Highest: 0.97200
Training: 2022-04-26 15:10:46,117-Speed 72.70 samples/sec   Loss 7.2152   LearningRate 0.0437   Epoch: 6   Global Step: 28010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:10:49,185-Speed 3338.71 samples/sec   Loss 7.2954   LearningRate 0.0437   Epoch: 6   Global Step: 28020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:10:52,269-Speed 3321.64 samples/sec   Loss 7.2044   LearningRate 0.0437   Epoch: 6   Global Step: 28030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:10:55,340-Speed 3335.77 samples/sec   Loss 7.2164   LearningRate 0.0437   Epoch: 6   Global Step: 28040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:10:58,438-Speed 3306.05 samples/sec   Loss 7.1660   LearningRate 0.0437   Epoch: 6   Global Step: 28050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:11:01,520-Speed 3323.07 samples/sec   Loss 7.2814   LearningRate 0.0437   Epoch: 6   Global Step: 28060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:11:04,613-Speed 3311.51 samples/sec   Loss 7.1603   LearningRate 0.0436   Epoch: 6   Global Step: 28070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:11:07,703-Speed 3315.56 samples/sec   Loss 7.3573   LearningRate 0.0436   Epoch: 6   Global Step: 28080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:11:10,762-Speed 3347.76 samples/sec   Loss 7.1732   LearningRate 0.0436   Epoch: 6   Global Step: 28090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:11:13,834-Speed 3334.45 samples/sec   Loss 7.2570   LearningRate 0.0436   Epoch: 6   Global Step: 28100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:11:16,913-Speed 3326.82 samples/sec   Loss 7.2941   LearningRate 0.0436   Epoch: 6   Global Step: 28110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:11:19,993-Speed 3325.93 samples/sec   Loss 7.2518   LearningRate 0.0436   Epoch: 6   Global Step: 28120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:11:23,073-Speed 3325.18 samples/sec   Loss 7.2324   LearningRate 0.0435   Epoch: 6   Global Step: 28130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:11:26,161-Speed 3317.28 samples/sec   Loss 7.2882   LearningRate 0.0435   Epoch: 6   Global Step: 28140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:11:29,242-Speed 3323.51 samples/sec   Loss 7.3620   LearningRate 0.0435   Epoch: 6   Global Step: 28150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:11:32,333-Speed 3314.24 samples/sec   Loss 7.2316   LearningRate 0.0435   Epoch: 6   Global Step: 28160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:11:35,424-Speed 3313.75 samples/sec   Loss 7.1313   LearningRate 0.0435   Epoch: 6   Global Step: 28170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:11:38,510-Speed 3318.81 samples/sec   Loss 7.2347   LearningRate 0.0435   Epoch: 6   Global Step: 28180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:11:41,632-Speed 3280.47 samples/sec   Loss 7.2729   LearningRate 0.0434   Epoch: 6   Global Step: 28190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-26 15:11:44,726-Speed 3310.78 samples/sec   Loss 7.2469   LearningRate 0.0434   Epoch: 6   Global Step: 28200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:11:47,834-Speed 3295.15 samples/sec   Loss 7.2566   LearningRate 0.0434   Epoch: 6   Global Step: 28210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:11:50,999-Speed 3236.90 samples/sec   Loss 7.2699   LearningRate 0.0434   Epoch: 6   Global Step: 28220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:11:54,138-Speed 3262.42 samples/sec   Loss 7.2822   LearningRate 0.0434   Epoch: 6   Global Step: 28230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:11:57,230-Speed 3312.22 samples/sec   Loss 7.2884   LearningRate 0.0434   Epoch: 6   Global Step: 28240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:12:00,318-Speed 3316.87 samples/sec   Loss 7.2221   LearningRate 0.0433   Epoch: 6   Global Step: 28250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:12:03,422-Speed 3299.94 samples/sec   Loss 7.3000   LearningRate 0.0433   Epoch: 6   Global Step: 28260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:12:06,508-Speed 3319.44 samples/sec   Loss 7.1209   LearningRate 0.0433   Epoch: 6   Global Step: 28270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:12:09,598-Speed 3314.37 samples/sec   Loss 7.1186   LearningRate 0.0433   Epoch: 6   Global Step: 28280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:12:12,687-Speed 3315.52 samples/sec   Loss 7.2430   LearningRate 0.0433   Epoch: 6   Global Step: 28290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:12:15,789-Speed 3301.49 samples/sec   Loss 7.2653   LearningRate 0.0433   Epoch: 6   Global Step: 28300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:12:18,870-Speed 3325.02 samples/sec   Loss 7.2179   LearningRate 0.0433   Epoch: 6   Global Step: 28310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:12:21,948-Speed 3327.49 samples/sec   Loss 7.1952   LearningRate 0.0432   Epoch: 6   Global Step: 28320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:12:25,029-Speed 3324.53 samples/sec   Loss 7.2049   LearningRate 0.0432   Epoch: 6   Global Step: 28330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:12:28,121-Speed 3313.01 samples/sec   Loss 7.1031   LearningRate 0.0432   Epoch: 6   Global Step: 28340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:12:31,201-Speed 3325.65 samples/sec   Loss 7.1015   LearningRate 0.0432   Epoch: 6   Global Step: 28350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:12:34,260-Speed 3347.78 samples/sec   Loss 7.2561   LearningRate 0.0432   Epoch: 6   Global Step: 28360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:12:37,354-Speed 3310.91 samples/sec   Loss 7.3367   LearningRate 0.0432   Epoch: 6   Global Step: 28370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:12:40,431-Speed 3328.44 samples/sec   Loss 7.1547   LearningRate 0.0431   Epoch: 6   Global Step: 28380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:12:43,507-Speed 3330.57 samples/sec   Loss 7.2781   LearningRate 0.0431   Epoch: 6   Global Step: 28390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:12:46,579-Speed 3333.75 samples/sec   Loss 7.1287   LearningRate 0.0431   Epoch: 6   Global Step: 28400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:12:49,649-Speed 3336.62 samples/sec   Loss 7.1047   LearningRate 0.0431   Epoch: 6   Global Step: 28410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:12:52,728-Speed 3325.96 samples/sec   Loss 7.2596   LearningRate 0.0431   Epoch: 6   Global Step: 28420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:12:55,803-Speed 3331.53 samples/sec   Loss 7.1852   LearningRate 0.0431   Epoch: 6   Global Step: 28430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:12:58,889-Speed 3319.34 samples/sec   Loss 7.1840   LearningRate 0.0430   Epoch: 6   Global Step: 28440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:13:01,964-Speed 3330.62 samples/sec   Loss 7.1657   LearningRate 0.0430   Epoch: 6   Global Step: 28450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:13:05,042-Speed 3327.95 samples/sec   Loss 7.0774   LearningRate 0.0430   Epoch: 6   Global Step: 28460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:13:08,130-Speed 3317.11 samples/sec   Loss 7.1354   LearningRate 0.0430   Epoch: 6   Global Step: 28470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:13:11,211-Speed 3324.66 samples/sec   Loss 7.2222   LearningRate 0.0430   Epoch: 6   Global Step: 28480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:13:14,280-Speed 3336.35 samples/sec   Loss 7.0947   LearningRate 0.0430   Epoch: 6   Global Step: 28490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:13:17,354-Speed 3332.52 samples/sec   Loss 7.3935   LearningRate 0.0430   Epoch: 6   Global Step: 28500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:13:20,424-Speed 3336.53 samples/sec   Loss 7.0863   LearningRate 0.0429   Epoch: 6   Global Step: 28510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:13:23,497-Speed 3332.97 samples/sec   Loss 7.2547   LearningRate 0.0429   Epoch: 6   Global Step: 28520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:13:26,575-Speed 3327.56 samples/sec   Loss 7.2602   LearningRate 0.0429   Epoch: 6   Global Step: 28530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:13:29,650-Speed 3331.62 samples/sec   Loss 7.2245   LearningRate 0.0429   Epoch: 6   Global Step: 28540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:13:32,725-Speed 3330.09 samples/sec   Loss 7.2095   LearningRate 0.0429   Epoch: 6   Global Step: 28550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:13:35,795-Speed 3336.37 samples/sec   Loss 7.1847   LearningRate 0.0429   Epoch: 6   Global Step: 28560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:13:38,865-Speed 3336.64 samples/sec   Loss 7.1787   LearningRate 0.0428   Epoch: 6   Global Step: 28570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:13:41,938-Speed 3333.46 samples/sec   Loss 7.1221   LearningRate 0.0428   Epoch: 6   Global Step: 28580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:13:45,005-Speed 3339.66 samples/sec   Loss 7.2297   LearningRate 0.0428   Epoch: 6   Global Step: 28590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:13:48,081-Speed 3329.83 samples/sec   Loss 7.1000   LearningRate 0.0428   Epoch: 6   Global Step: 28600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:13:51,150-Speed 3337.33 samples/sec   Loss 7.2245   LearningRate 0.0428   Epoch: 6   Global Step: 28610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:13:54,224-Speed 3332.59 samples/sec   Loss 7.2462   LearningRate 0.0428   Epoch: 6   Global Step: 28620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:13:57,295-Speed 3334.98 samples/sec   Loss 7.0106   LearningRate 0.0427   Epoch: 6   Global Step: 28630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:14:00,368-Speed 3332.51 samples/sec   Loss 7.2232   LearningRate 0.0427   Epoch: 6   Global Step: 28640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:14:03,440-Speed 3334.29 samples/sec   Loss 7.1050   LearningRate 0.0427   Epoch: 6   Global Step: 28650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:14:06,515-Speed 3330.98 samples/sec   Loss 7.2578   LearningRate 0.0427   Epoch: 6   Global Step: 28660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:14:09,590-Speed 3330.53 samples/sec   Loss 7.1077   LearningRate 0.0427   Epoch: 6   Global Step: 28670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:14:12,661-Speed 3335.11 samples/sec   Loss 7.0929   LearningRate 0.0427   Epoch: 6   Global Step: 28680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:14:15,736-Speed 3331.11 samples/sec   Loss 7.2304   LearningRate 0.0427   Epoch: 6   Global Step: 28690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:14:18,819-Speed 3322.05 samples/sec   Loss 7.1385   LearningRate 0.0426   Epoch: 6   Global Step: 28700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:14:21,894-Speed 3331.48 samples/sec   Loss 7.2647   LearningRate 0.0426   Epoch: 6   Global Step: 28710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:14:24,976-Speed 3322.85 samples/sec   Loss 7.1747   LearningRate 0.0426   Epoch: 6   Global Step: 28720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:14:28,034-Speed 3350.10 samples/sec   Loss 7.2228   LearningRate 0.0426   Epoch: 6   Global Step: 28730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:14:31,110-Speed 3329.77 samples/sec   Loss 7.1411   LearningRate 0.0426   Epoch: 6   Global Step: 28740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:14:34,183-Speed 3332.46 samples/sec   Loss 7.1455   LearningRate 0.0426   Epoch: 6   Global Step: 28750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:14:37,269-Speed 3319.03 samples/sec   Loss 7.1807   LearningRate 0.0425   Epoch: 6   Global Step: 28760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:14:40,364-Speed 3310.32 samples/sec   Loss 7.2548   LearningRate 0.0425   Epoch: 6   Global Step: 28770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:14:43,444-Speed 3324.65 samples/sec   Loss 7.0437   LearningRate 0.0425   Epoch: 6   Global Step: 28780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:14:46,531-Speed 3318.17 samples/sec   Loss 7.1994   LearningRate 0.0425   Epoch: 6   Global Step: 28790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:14:49,611-Speed 3325.61 samples/sec   Loss 7.1343   LearningRate 0.0425   Epoch: 6   Global Step: 28800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:14:52,685-Speed 3332.46 samples/sec   Loss 7.1110   LearningRate 0.0425   Epoch: 6   Global Step: 28810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:14:55,762-Speed 3327.99 samples/sec   Loss 7.2244   LearningRate 0.0424   Epoch: 6   Global Step: 28820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:14:58,835-Speed 3333.73 samples/sec   Loss 7.1273   LearningRate 0.0424   Epoch: 6   Global Step: 28830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:15:01,965-Speed 3272.18 samples/sec   Loss 7.1402   LearningRate 0.0424   Epoch: 6   Global Step: 28840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:15:05,210-Speed 3156.39 samples/sec   Loss 7.0438   LearningRate 0.0424   Epoch: 6   Global Step: 28850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:15:08,283-Speed 3333.55 samples/sec   Loss 7.2506   LearningRate 0.0424   Epoch: 6   Global Step: 28860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:15:11,357-Speed 3331.16 samples/sec   Loss 7.1455   LearningRate 0.0424   Epoch: 6   Global Step: 28870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:15:14,437-Speed 3326.08 samples/sec   Loss 7.1068   LearningRate 0.0424   Epoch: 6   Global Step: 28880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:15:17,505-Speed 3338.27 samples/sec   Loss 7.0985   LearningRate 0.0423   Epoch: 6   Global Step: 28890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:15:20,594-Speed 3315.62 samples/sec   Loss 7.1387   LearningRate 0.0423   Epoch: 6   Global Step: 28900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:15:23,682-Speed 3317.12 samples/sec   Loss 7.1403   LearningRate 0.0423   Epoch: 6   Global Step: 28910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:15:26,762-Speed 3325.76 samples/sec   Loss 7.0973   LearningRate 0.0423   Epoch: 6   Global Step: 28920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:15:29,842-Speed 3325.62 samples/sec   Loss 7.0506   LearningRate 0.0423   Epoch: 6   Global Step: 28930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:15:33,050-Speed 3192.41 samples/sec   Loss 7.2004   LearningRate 0.0423   Epoch: 6   Global Step: 28940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:15:46,057-Speed 787.36 samples/sec   Loss 6.2068   LearningRate 0.0422   Epoch: 7   Global Step: 28950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:15:49,221-Speed 3238.10 samples/sec   Loss 5.5127   LearningRate 0.0422   Epoch: 7   Global Step: 28960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:15:52,296-Speed 3331.15 samples/sec   Loss 5.3687   LearningRate 0.0422   Epoch: 7   Global Step: 28970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:15:55,398-Speed 3302.17 samples/sec   Loss 5.5166   LearningRate 0.0422   Epoch: 7   Global Step: 28980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:15:58,484-Speed 3319.92 samples/sec   Loss 5.5434   LearningRate 0.0422   Epoch: 7   Global Step: 28990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:16:01,559-Speed 3330.43 samples/sec   Loss 5.4245   LearningRate 0.0422   Epoch: 7   Global Step: 29000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:16:04,632-Speed 3333.21 samples/sec   Loss 5.5943   LearningRate 0.0421   Epoch: 7   Global Step: 29010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:16:07,710-Speed 3328.05 samples/sec   Loss 5.4422   LearningRate 0.0421   Epoch: 7   Global Step: 29020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:16:10,791-Speed 3323.99 samples/sec   Loss 5.6317   LearningRate 0.0421   Epoch: 7   Global Step: 29030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:16:13,905-Speed 3290.05 samples/sec   Loss 5.5595   LearningRate 0.0421   Epoch: 7   Global Step: 29040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:16:17,106-Speed 3199.79 samples/sec   Loss 5.5739   LearningRate 0.0421   Epoch: 7   Global Step: 29050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:16:20,184-Speed 3327.74 samples/sec   Loss 5.5884   LearningRate 0.0421   Epoch: 7   Global Step: 29060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:16:23,255-Speed 3334.50 samples/sec   Loss 5.6166   LearningRate 0.0421   Epoch: 7   Global Step: 29070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:16:26,334-Speed 3327.38 samples/sec   Loss 5.6007   LearningRate 0.0420   Epoch: 7   Global Step: 29080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:16:29,408-Speed 3331.13 samples/sec   Loss 5.5975   LearningRate 0.0420   Epoch: 7   Global Step: 29090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:16:32,484-Speed 3330.52 samples/sec   Loss 5.6120   LearningRate 0.0420   Epoch: 7   Global Step: 29100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:16:35,547-Speed 3343.87 samples/sec   Loss 5.5287   LearningRate 0.0420   Epoch: 7   Global Step: 29110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:16:38,649-Speed 3302.41 samples/sec   Loss 5.5731   LearningRate 0.0420   Epoch: 7   Global Step: 29120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:16:41,728-Speed 3326.13 samples/sec   Loss 5.6480   LearningRate 0.0420   Epoch: 7   Global Step: 29130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:16:44,825-Speed 3307.94 samples/sec   Loss 5.6139   LearningRate 0.0419   Epoch: 7   Global Step: 29140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:16:47,929-Speed 3299.57 samples/sec   Loss 5.6046   LearningRate 0.0419   Epoch: 7   Global Step: 29150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:16:51,069-Speed 3262.24 samples/sec   Loss 5.4859   LearningRate 0.0419   Epoch: 7   Global Step: 29160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:16:54,274-Speed 3195.93 samples/sec   Loss 5.7367   LearningRate 0.0419   Epoch: 7   Global Step: 29170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:16:57,374-Speed 3303.71 samples/sec   Loss 5.6884   LearningRate 0.0419   Epoch: 7   Global Step: 29180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:00,501-Speed 3276.22 samples/sec   Loss 5.7492   LearningRate 0.0419   Epoch: 7   Global Step: 29190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:03,602-Speed 3302.62 samples/sec   Loss 5.8595   LearningRate 0.0419   Epoch: 7   Global Step: 29200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:06,663-Speed 3346.36 samples/sec   Loss 5.7671   LearningRate 0.0418   Epoch: 7   Global Step: 29210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:09,765-Speed 3301.81 samples/sec   Loss 5.7675   LearningRate 0.0418   Epoch: 7   Global Step: 29220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:12,860-Speed 3309.24 samples/sec   Loss 5.7979   LearningRate 0.0418   Epoch: 7   Global Step: 29230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:15,958-Speed 3306.63 samples/sec   Loss 5.6648   LearningRate 0.0418   Epoch: 7   Global Step: 29240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:19,046-Speed 3316.58 samples/sec   Loss 5.8398   LearningRate 0.0418   Epoch: 7   Global Step: 29250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:22,126-Speed 3324.97 samples/sec   Loss 5.8094   LearningRate 0.0418   Epoch: 7   Global Step: 29260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:25,211-Speed 3320.63 samples/sec   Loss 5.8618   LearningRate 0.0417   Epoch: 7   Global Step: 29270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:28,295-Speed 3320.29 samples/sec   Loss 5.7572   LearningRate 0.0417   Epoch: 7   Global Step: 29280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:31,384-Speed 3316.50 samples/sec   Loss 5.8413   LearningRate 0.0417   Epoch: 7   Global Step: 29290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:34,482-Speed 3306.65 samples/sec   Loss 5.8997   LearningRate 0.0417   Epoch: 7   Global Step: 29300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:37,573-Speed 3313.43 samples/sec   Loss 5.8022   LearningRate 0.0417   Epoch: 7   Global Step: 29310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:17:40,659-Speed 3318.74 samples/sec   Loss 5.8526   LearningRate 0.0417   Epoch: 7   Global Step: 29320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:17:43,730-Speed 3335.06 samples/sec   Loss 5.9034   LearningRate 0.0416   Epoch: 7   Global Step: 29330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:46,823-Speed 3311.80 samples/sec   Loss 5.9386   LearningRate 0.0416   Epoch: 7   Global Step: 29340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:49,913-Speed 3315.07 samples/sec   Loss 5.8461   LearningRate 0.0416   Epoch: 7   Global Step: 29350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:52,999-Speed 3318.62 samples/sec   Loss 5.8036   LearningRate 0.0416   Epoch: 7   Global Step: 29360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:56,090-Speed 3314.05 samples/sec   Loss 5.7783   LearningRate 0.0416   Epoch: 7   Global Step: 29370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:17:59,177-Speed 3317.76 samples/sec   Loss 5.8461   LearningRate 0.0416   Epoch: 7   Global Step: 29380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:18:02,262-Speed 3320.49 samples/sec   Loss 5.9034   LearningRate 0.0416   Epoch: 7   Global Step: 29390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:18:05,349-Speed 3317.29 samples/sec   Loss 5.9130   LearningRate 0.0415   Epoch: 7   Global Step: 29400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:18:08,435-Speed 3319.45 samples/sec   Loss 5.9603   LearningRate 0.0415   Epoch: 7   Global Step: 29410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:18:11,518-Speed 3321.65 samples/sec   Loss 5.9883   LearningRate 0.0415   Epoch: 7   Global Step: 29420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:18:14,608-Speed 3314.66 samples/sec   Loss 5.8249   LearningRate 0.0415   Epoch: 7   Global Step: 29430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:18:17,693-Speed 3319.75 samples/sec   Loss 5.9951   LearningRate 0.0415   Epoch: 7   Global Step: 29440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:18:20,787-Speed 3310.85 samples/sec   Loss 5.9304   LearningRate 0.0415   Epoch: 7   Global Step: 29450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:18:23,905-Speed 3284.87 samples/sec   Loss 5.9095   LearningRate 0.0414   Epoch: 7   Global Step: 29460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:18:26,997-Speed 3313.29 samples/sec   Loss 6.0644   LearningRate 0.0414   Epoch: 7   Global Step: 29470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:18:30,066-Speed 3337.38 samples/sec   Loss 6.0103   LearningRate 0.0414   Epoch: 7   Global Step: 29480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:18:33,147-Speed 3324.31 samples/sec   Loss 5.9555   LearningRate 0.0414   Epoch: 7   Global Step: 29490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:18:36,238-Speed 3313.26 samples/sec   Loss 5.9812   LearningRate 0.0414   Epoch: 7   Global Step: 29500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:18:39,322-Speed 3321.15 samples/sec   Loss 5.9126   LearningRate 0.0414   Epoch: 7   Global Step: 29510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:18:42,408-Speed 3318.61 samples/sec   Loss 6.0848   LearningRate 0.0414   Epoch: 7   Global Step: 29520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:18:45,499-Speed 3314.60 samples/sec   Loss 6.0167   LearningRate 0.0413   Epoch: 7   Global Step: 29530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:18:48,587-Speed 3316.99 samples/sec   Loss 6.0767   LearningRate 0.0413   Epoch: 7   Global Step: 29540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:18:51,670-Speed 3321.35 samples/sec   Loss 6.0148   LearningRate 0.0413   Epoch: 7   Global Step: 29550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:18:54,755-Speed 3320.79 samples/sec   Loss 6.0083   LearningRate 0.0413   Epoch: 7   Global Step: 29560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:18:57,839-Speed 3321.54 samples/sec   Loss 6.0459   LearningRate 0.0413   Epoch: 7   Global Step: 29570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:19:00,922-Speed 3321.94 samples/sec   Loss 6.1075   LearningRate 0.0413   Epoch: 7   Global Step: 29580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:19:04,022-Speed 3304.60 samples/sec   Loss 6.1330   LearningRate 0.0412   Epoch: 7   Global Step: 29590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:19:07,141-Speed 3283.91 samples/sec   Loss 6.1306   LearningRate 0.0412   Epoch: 7   Global Step: 29600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:19:10,239-Speed 3306.14 samples/sec   Loss 6.1735   LearningRate 0.0412   Epoch: 7   Global Step: 29610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:19:13,334-Speed 3308.61 samples/sec   Loss 5.9815   LearningRate 0.0412   Epoch: 7   Global Step: 29620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:19:16,419-Speed 3320.65 samples/sec   Loss 6.1506   LearningRate 0.0412   Epoch: 7   Global Step: 29630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:19:19,502-Speed 3322.08 samples/sec   Loss 5.9982   LearningRate 0.0412   Epoch: 7   Global Step: 29640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:19:22,592-Speed 3314.58 samples/sec   Loss 6.0717   LearningRate 0.0411   Epoch: 7   Global Step: 29650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:19:25,678-Speed 3318.83 samples/sec   Loss 6.1177   LearningRate 0.0411   Epoch: 7   Global Step: 29660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:19:28,770-Speed 3313.35 samples/sec   Loss 6.1139   LearningRate 0.0411   Epoch: 7   Global Step: 29670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:19:31,867-Speed 3307.78 samples/sec   Loss 6.1506   LearningRate 0.0411   Epoch: 7   Global Step: 29680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:19:34,953-Speed 3318.98 samples/sec   Loss 6.0516   LearningRate 0.0411   Epoch: 7   Global Step: 29690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:19:38,065-Speed 3290.53 samples/sec   Loss 6.1152   LearningRate 0.0411   Epoch: 7   Global Step: 29700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:19:41,156-Speed 3313.80 samples/sec   Loss 6.1977   LearningRate 0.0411   Epoch: 7   Global Step: 29710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:19:44,248-Speed 3312.90 samples/sec   Loss 6.1381   LearningRate 0.0410   Epoch: 7   Global Step: 29720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:19:47,339-Speed 3313.74 samples/sec   Loss 6.0054   LearningRate 0.0410   Epoch: 7   Global Step: 29730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:19:50,425-Speed 3318.94 samples/sec   Loss 6.1719   LearningRate 0.0410   Epoch: 7   Global Step: 29740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:19:53,528-Speed 3300.26 samples/sec   Loss 6.1992   LearningRate 0.0410   Epoch: 7   Global Step: 29750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:19:56,624-Speed 3308.98 samples/sec   Loss 6.1165   LearningRate 0.0410   Epoch: 7   Global Step: 29760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:19:59,722-Speed 3305.82 samples/sec   Loss 6.2426   LearningRate 0.0410   Epoch: 7   Global Step: 29770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:20:02,809-Speed 3317.90 samples/sec   Loss 6.1250   LearningRate 0.0409   Epoch: 7   Global Step: 29780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:20:05,911-Speed 3302.51 samples/sec   Loss 6.2483   LearningRate 0.0409   Epoch: 7   Global Step: 29790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:20:09,011-Speed 3303.34 samples/sec   Loss 6.2177   LearningRate 0.0409   Epoch: 7   Global Step: 29800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:20:12,112-Speed 3303.12 samples/sec   Loss 6.2650   LearningRate 0.0409   Epoch: 7   Global Step: 29810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:20:15,208-Speed 3308.52 samples/sec   Loss 6.1318   LearningRate 0.0409   Epoch: 7   Global Step: 29820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:20:18,305-Speed 3307.09 samples/sec   Loss 6.2630   LearningRate 0.0409   Epoch: 7   Global Step: 29830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:20:21,393-Speed 3317.25 samples/sec   Loss 6.1443   LearningRate 0.0409   Epoch: 7   Global Step: 29840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:20:24,489-Speed 3308.39 samples/sec   Loss 6.1859   LearningRate 0.0408   Epoch: 7   Global Step: 29850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:20:27,578-Speed 3316.14 samples/sec   Loss 6.3143   LearningRate 0.0408   Epoch: 7   Global Step: 29860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:20:30,666-Speed 3315.94 samples/sec   Loss 6.2281   LearningRate 0.0408   Epoch: 7   Global Step: 29870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:20:33,753-Speed 3318.52 samples/sec   Loss 6.2221   LearningRate 0.0408   Epoch: 7   Global Step: 29880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:20:36,836-Speed 3322.60 samples/sec   Loss 6.2571   LearningRate 0.0408   Epoch: 7   Global Step: 29890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:20:39,958-Speed 3280.85 samples/sec   Loss 6.2179   LearningRate 0.0408   Epoch: 7   Global Step: 29900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:20:43,069-Speed 3292.39 samples/sec   Loss 6.3040   LearningRate 0.0407   Epoch: 7   Global Step: 29910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:20:46,157-Speed 3317.47 samples/sec   Loss 6.2836   LearningRate 0.0407   Epoch: 7   Global Step: 29920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:20:49,256-Speed 3304.69 samples/sec   Loss 6.1341   LearningRate 0.0407   Epoch: 7   Global Step: 29930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:20:52,360-Speed 3300.13 samples/sec   Loss 6.2193   LearningRate 0.0407   Epoch: 7   Global Step: 29940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:20:55,469-Speed 3294.80 samples/sec   Loss 6.3151   LearningRate 0.0407   Epoch: 7   Global Step: 29950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:20:58,581-Speed 3290.41 samples/sec   Loss 6.2613   LearningRate 0.0407   Epoch: 7   Global Step: 29960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:21:01,671-Speed 3314.87 samples/sec   Loss 6.3203   LearningRate 0.0407   Epoch: 7   Global Step: 29970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:21:04,773-Speed 3302.15 samples/sec   Loss 6.2802   LearningRate 0.0406   Epoch: 7   Global Step: 29980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:21:07,865-Speed 3312.75 samples/sec   Loss 6.2799   LearningRate 0.0406   Epoch: 7   Global Step: 29990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:21:10,960-Speed 3308.97 samples/sec   Loss 6.2934   LearningRate 0.0406   Epoch: 7   Global Step: 30000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:21:54,802-[lfw][30000]XNorm: 23.307369
Training: 2022-04-26 15:21:54,802-[lfw][30000]Accuracy-Flip: 0.99767+-0.00327
Training: 2022-04-26 15:21:54,803-[lfw][30000]Accuracy-Highest: 0.99767
Training: 2022-04-26 15:22:45,423-[cfp_fp][30000]XNorm: 22.277873
Training: 2022-04-26 15:22:45,423-[cfp_fp][30000]Accuracy-Flip: 0.98343+-0.00633
Training: 2022-04-26 15:22:45,424-[cfp_fp][30000]Accuracy-Highest: 0.98343
Training: 2022-04-26 15:23:29,097-[agedb_30][30000]XNorm: 23.454061
Training: 2022-04-26 15:23:29,098-[agedb_30][30000]Accuracy-Flip: 0.96700+-0.00662
Training: 2022-04-26 15:23:29,098-[agedb_30][30000]Accuracy-Highest: 0.97200
Training: 2022-04-26 15:23:32,201-Speed 72.50 samples/sec   Loss 6.4502   LearningRate 0.0406   Epoch: 7   Global Step: 30010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:23:35,267-Speed 3340.25 samples/sec   Loss 6.3797   LearningRate 0.0406   Epoch: 7   Global Step: 30020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:23:38,344-Speed 3330.05 samples/sec   Loss 6.2821   LearningRate 0.0406   Epoch: 7   Global Step: 30030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:23:41,421-Speed 3328.70 samples/sec   Loss 6.3521   LearningRate 0.0405   Epoch: 7   Global Step: 30040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:23:44,512-Speed 3312.67 samples/sec   Loss 6.4407   LearningRate 0.0405   Epoch: 7   Global Step: 30050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:23:47,597-Speed 3320.77 samples/sec   Loss 6.4579   LearningRate 0.0405   Epoch: 7   Global Step: 30060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:23:50,680-Speed 3322.61 samples/sec   Loss 6.4230   LearningRate 0.0405   Epoch: 7   Global Step: 30070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:23:53,781-Speed 3303.21 samples/sec   Loss 6.4730   LearningRate 0.0405   Epoch: 7   Global Step: 30080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:23:56,867-Speed 3318.32 samples/sec   Loss 6.3003   LearningRate 0.0405   Epoch: 7   Global Step: 30090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:23:59,967-Speed 3304.55 samples/sec   Loss 6.3424   LearningRate 0.0405   Epoch: 7   Global Step: 30100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:24:03,059-Speed 3312.15 samples/sec   Loss 6.4115   LearningRate 0.0404   Epoch: 7   Global Step: 30110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:24:06,157-Speed 3306.14 samples/sec   Loss 6.4294   LearningRate 0.0404   Epoch: 7   Global Step: 30120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-26 15:24:09,227-Speed 3336.59 samples/sec   Loss 6.2862   LearningRate 0.0404   Epoch: 7   Global Step: 30130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-26 15:24:12,319-Speed 3313.38 samples/sec   Loss 6.3975   LearningRate 0.0404   Epoch: 7   Global Step: 30140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:24:15,424-Speed 3297.90 samples/sec   Loss 6.3713   LearningRate 0.0404   Epoch: 7   Global Step: 30150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:24:18,515-Speed 3313.42 samples/sec   Loss 6.3249   LearningRate 0.0404   Epoch: 7   Global Step: 30160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:24:21,616-Speed 3303.46 samples/sec   Loss 6.3976   LearningRate 0.0403   Epoch: 7   Global Step: 30170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:24:24,714-Speed 3305.94 samples/sec   Loss 6.3042   LearningRate 0.0403   Epoch: 7   Global Step: 30180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:24:27,816-Speed 3301.45 samples/sec   Loss 6.3269   LearningRate 0.0403   Epoch: 7   Global Step: 30190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:24:30,911-Speed 3309.50 samples/sec   Loss 6.3809   LearningRate 0.0403   Epoch: 7   Global Step: 30200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:24:34,009-Speed 3306.01 samples/sec   Loss 6.4672   LearningRate 0.0403   Epoch: 7   Global Step: 30210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:24:37,119-Speed 3293.75 samples/sec   Loss 6.3406   LearningRate 0.0403   Epoch: 7   Global Step: 30220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:24:40,264-Speed 3257.04 samples/sec   Loss 6.4377   LearningRate 0.0403   Epoch: 7   Global Step: 30230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:24:43,360-Speed 3308.57 samples/sec   Loss 6.3345   LearningRate 0.0402   Epoch: 7   Global Step: 30240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:24:46,455-Speed 3309.30 samples/sec   Loss 6.4835   LearningRate 0.0402   Epoch: 7   Global Step: 30250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:24:49,579-Speed 3277.56 samples/sec   Loss 6.4589   LearningRate 0.0402   Epoch: 7   Global Step: 30260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:24:52,673-Speed 3310.51 samples/sec   Loss 6.4579   LearningRate 0.0402   Epoch: 7   Global Step: 30270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:24:55,777-Speed 3300.09 samples/sec   Loss 6.4913   LearningRate 0.0402   Epoch: 7   Global Step: 30280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:24:58,847-Speed 3336.67 samples/sec   Loss 6.4401   LearningRate 0.0402   Epoch: 7   Global Step: 30290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:25:01,937-Speed 3314.80 samples/sec   Loss 6.4250   LearningRate 0.0401   Epoch: 7   Global Step: 30300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:25:05,084-Speed 3254.44 samples/sec   Loss 6.4376   LearningRate 0.0401   Epoch: 7   Global Step: 30310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:25:08,175-Speed 3314.73 samples/sec   Loss 6.3368   LearningRate 0.0401   Epoch: 7   Global Step: 30320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:25:11,267-Speed 3312.55 samples/sec   Loss 6.4287   LearningRate 0.0401   Epoch: 7   Global Step: 30330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:25:14,354-Speed 3317.54 samples/sec   Loss 6.5117   LearningRate 0.0401   Epoch: 7   Global Step: 30340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:25:17,434-Speed 3325.70 samples/sec   Loss 6.4411   LearningRate 0.0401   Epoch: 7   Global Step: 30350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:25:20,519-Speed 3320.35 samples/sec   Loss 6.4533   LearningRate 0.0401   Epoch: 7   Global Step: 30360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:25:23,608-Speed 3315.72 samples/sec   Loss 6.4228   LearningRate 0.0400   Epoch: 7   Global Step: 30370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:25:26,702-Speed 3310.54 samples/sec   Loss 6.4191   LearningRate 0.0400   Epoch: 7   Global Step: 30380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:25:29,817-Speed 3287.66 samples/sec   Loss 6.5698   LearningRate 0.0400   Epoch: 7   Global Step: 30390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:25:32,890-Speed 3333.70 samples/sec   Loss 6.4500   LearningRate 0.0400   Epoch: 7   Global Step: 30400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:25:35,961-Speed 3335.40 samples/sec   Loss 6.5704   LearningRate 0.0400   Epoch: 7   Global Step: 30410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:25:39,043-Speed 3322.64 samples/sec   Loss 6.4035   LearningRate 0.0400   Epoch: 7   Global Step: 30420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:25:42,114-Speed 3335.67 samples/sec   Loss 6.3593   LearningRate 0.0399   Epoch: 7   Global Step: 30430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:25:45,196-Speed 3323.14 samples/sec   Loss 6.5668   LearningRate 0.0399   Epoch: 7   Global Step: 30440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:25:48,265-Speed 3337.88 samples/sec   Loss 6.3977   LearningRate 0.0399   Epoch: 7   Global Step: 30450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:25:51,365-Speed 3303.37 samples/sec   Loss 6.5442   LearningRate 0.0399   Epoch: 7   Global Step: 30460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:25:54,437-Speed 3334.77 samples/sec   Loss 6.5207   LearningRate 0.0399   Epoch: 7   Global Step: 30470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:25:57,514-Speed 3328.55 samples/sec   Loss 6.4627   LearningRate 0.0399   Epoch: 7   Global Step: 30480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:26:00,586-Speed 3334.73 samples/sec   Loss 6.4663   LearningRate 0.0399   Epoch: 7   Global Step: 30490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:26:03,658-Speed 3334.15 samples/sec   Loss 6.4027   LearningRate 0.0398   Epoch: 7   Global Step: 30500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:26:06,715-Speed 3349.79 samples/sec   Loss 6.4988   LearningRate 0.0398   Epoch: 7   Global Step: 30510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:26:09,785-Speed 3336.87 samples/sec   Loss 6.4015   LearningRate 0.0398   Epoch: 7   Global Step: 30520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:26:12,861-Speed 3329.87 samples/sec   Loss 6.5288   LearningRate 0.0398   Epoch: 7   Global Step: 30530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:26:15,934-Speed 3332.63 samples/sec   Loss 6.4695   LearningRate 0.0398   Epoch: 7   Global Step: 30540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:26:19,034-Speed 3304.03 samples/sec   Loss 6.5058   LearningRate 0.0398   Epoch: 7   Global Step: 30550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:26:22,106-Speed 3334.45 samples/sec   Loss 6.3231   LearningRate 0.0397   Epoch: 7   Global Step: 30560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:26:25,181-Speed 3331.44 samples/sec   Loss 6.6491   LearningRate 0.0397   Epoch: 7   Global Step: 30570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:26:28,258-Speed 3327.66 samples/sec   Loss 6.3899   LearningRate 0.0397   Epoch: 7   Global Step: 30580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:26:31,330-Speed 3333.93 samples/sec   Loss 6.4500   LearningRate 0.0397   Epoch: 7   Global Step: 30590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:26:34,414-Speed 3321.54 samples/sec   Loss 6.4015   LearningRate 0.0397   Epoch: 7   Global Step: 30600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:26:37,502-Speed 3316.65 samples/sec   Loss 6.4361   LearningRate 0.0397   Epoch: 7   Global Step: 30610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:26:40,575-Speed 3333.40 samples/sec   Loss 6.5343   LearningRate 0.0397   Epoch: 7   Global Step: 30620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:26:43,650-Speed 3330.99 samples/sec   Loss 6.4574   LearningRate 0.0396   Epoch: 7   Global Step: 30630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:26:46,722-Speed 3334.57 samples/sec   Loss 6.5955   LearningRate 0.0396   Epoch: 7   Global Step: 30640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:26:49,784-Speed 3343.95 samples/sec   Loss 6.5080   LearningRate 0.0396   Epoch: 7   Global Step: 30650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:26:52,860-Speed 3329.88 samples/sec   Loss 6.4092   LearningRate 0.0396   Epoch: 7   Global Step: 30660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:26:55,939-Speed 3326.73 samples/sec   Loss 6.6247   LearningRate 0.0396   Epoch: 7   Global Step: 30670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:26:59,013-Speed 3332.11 samples/sec   Loss 6.5647   LearningRate 0.0396   Epoch: 7   Global Step: 30680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:27:02,067-Speed 3353.67 samples/sec   Loss 6.6652   LearningRate 0.0396   Epoch: 7   Global Step: 30690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:27:05,142-Speed 3329.83 samples/sec   Loss 6.5045   LearningRate 0.0395   Epoch: 7   Global Step: 30700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:27:08,252-Speed 3293.65 samples/sec   Loss 6.4716   LearningRate 0.0395   Epoch: 7   Global Step: 30710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:27:11,340-Speed 3317.04 samples/sec   Loss 6.5425   LearningRate 0.0395   Epoch: 7   Global Step: 30720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:27:14,415-Speed 3331.89 samples/sec   Loss 6.7059   LearningRate 0.0395   Epoch: 7   Global Step: 30730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:27:17,489-Speed 3331.49 samples/sec   Loss 6.6185   LearningRate 0.0395   Epoch: 7   Global Step: 30740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:27:20,566-Speed 3328.33 samples/sec   Loss 6.5323   LearningRate 0.0395   Epoch: 7   Global Step: 30750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:27:23,643-Speed 3328.57 samples/sec   Loss 6.4923   LearningRate 0.0394   Epoch: 7   Global Step: 30760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:27:26,726-Speed 3323.37 samples/sec   Loss 6.5224   LearningRate 0.0394   Epoch: 7   Global Step: 30770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:27:29,826-Speed 3303.61 samples/sec   Loss 6.5124   LearningRate 0.0394   Epoch: 7   Global Step: 30780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:27:32,899-Speed 3332.52 samples/sec   Loss 6.5640   LearningRate 0.0394   Epoch: 7   Global Step: 30790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:27:35,979-Speed 3326.16 samples/sec   Loss 6.5193   LearningRate 0.0394   Epoch: 7   Global Step: 30800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:27:39,065-Speed 3319.16 samples/sec   Loss 6.5663   LearningRate 0.0394   Epoch: 7   Global Step: 30810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:27:42,142-Speed 3328.32 samples/sec   Loss 6.6497   LearningRate 0.0394   Epoch: 7   Global Step: 30820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:27:45,232-Speed 3314.14 samples/sec   Loss 6.4445   LearningRate 0.0393   Epoch: 7   Global Step: 30830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:27:48,323-Speed 3314.60 samples/sec   Loss 6.5112   LearningRate 0.0393   Epoch: 7   Global Step: 30840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:27:51,406-Speed 3322.05 samples/sec   Loss 6.5552   LearningRate 0.0393   Epoch: 7   Global Step: 30850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:27:54,528-Speed 3280.39 samples/sec   Loss 6.6120   LearningRate 0.0393   Epoch: 7   Global Step: 30860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:27:57,622-Speed 3310.37 samples/sec   Loss 6.5620   LearningRate 0.0393   Epoch: 7   Global Step: 30870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:28:00,711-Speed 3316.53 samples/sec   Loss 6.5342   LearningRate 0.0393   Epoch: 7   Global Step: 30880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:28:03,789-Speed 3326.82 samples/sec   Loss 6.4709   LearningRate 0.0392   Epoch: 7   Global Step: 30890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:28:06,850-Speed 3346.62 samples/sec   Loss 6.5794   LearningRate 0.0392   Epoch: 7   Global Step: 30900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:28:09,932-Speed 3323.47 samples/sec   Loss 6.4940   LearningRate 0.0392   Epoch: 7   Global Step: 30910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:28:13,024-Speed 3312.08 samples/sec   Loss 6.6152   LearningRate 0.0392   Epoch: 7   Global Step: 30920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:28:16,116-Speed 3312.59 samples/sec   Loss 6.5491   LearningRate 0.0392   Epoch: 7   Global Step: 30930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:28:19,201-Speed 3320.17 samples/sec   Loss 6.6009   LearningRate 0.0392   Epoch: 7   Global Step: 30940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:28:22,313-Speed 3291.05 samples/sec   Loss 6.5054   LearningRate 0.0392   Epoch: 7   Global Step: 30950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:28:25,416-Speed 3300.72 samples/sec   Loss 6.6076   LearningRate 0.0391   Epoch: 7   Global Step: 30960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:28:28,522-Speed 3298.40 samples/sec   Loss 6.6177   LearningRate 0.0391   Epoch: 7   Global Step: 30970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:28:31,605-Speed 3323.61 samples/sec   Loss 6.6594   LearningRate 0.0391   Epoch: 7   Global Step: 30980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:28:34,701-Speed 3307.54 samples/sec   Loss 6.5466   LearningRate 0.0391   Epoch: 7   Global Step: 30990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:28:37,774-Speed 3333.88 samples/sec   Loss 6.5240   LearningRate 0.0391   Epoch: 7   Global Step: 31000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:28:40,899-Speed 3277.71 samples/sec   Loss 6.5916   LearningRate 0.0391   Epoch: 7   Global Step: 31010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:28:43,982-Speed 3322.33 samples/sec   Loss 6.6223   LearningRate 0.0391   Epoch: 7   Global Step: 31020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:28:47,063-Speed 3325.13 samples/sec   Loss 6.5113   LearningRate 0.0390   Epoch: 7   Global Step: 31030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:28:50,156-Speed 3310.77 samples/sec   Loss 6.5502   LearningRate 0.0390   Epoch: 7   Global Step: 31040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:28:53,251-Speed 3309.92 samples/sec   Loss 6.6451   LearningRate 0.0390   Epoch: 7   Global Step: 31050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:28:56,332-Speed 3324.06 samples/sec   Loss 6.5821   LearningRate 0.0390   Epoch: 7   Global Step: 31060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:28:59,447-Speed 3287.64 samples/sec   Loss 6.5655   LearningRate 0.0390   Epoch: 7   Global Step: 31070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:29:02,524-Speed 3329.56 samples/sec   Loss 6.6628   LearningRate 0.0390   Epoch: 7   Global Step: 31080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:29:05,615-Speed 3313.45 samples/sec   Loss 6.6163   LearningRate 0.0389   Epoch: 7   Global Step: 31090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:29:08,692-Speed 3328.20 samples/sec   Loss 6.6195   LearningRate 0.0389   Epoch: 7   Global Step: 31100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:11,772-Speed 3325.31 samples/sec   Loss 6.5488   LearningRate 0.0389   Epoch: 7   Global Step: 31110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:14,853-Speed 3325.28 samples/sec   Loss 6.7032   LearningRate 0.0389   Epoch: 7   Global Step: 31120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:17,937-Speed 3321.09 samples/sec   Loss 6.6553   LearningRate 0.0389   Epoch: 7   Global Step: 31130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:21,019-Speed 3322.54 samples/sec   Loss 6.6141   LearningRate 0.0389   Epoch: 7   Global Step: 31140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:24,112-Speed 3311.61 samples/sec   Loss 6.5927   LearningRate 0.0389   Epoch: 7   Global Step: 31150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:27,195-Speed 3322.03 samples/sec   Loss 6.6581   LearningRate 0.0388   Epoch: 7   Global Step: 31160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:30,284-Speed 3315.95 samples/sec   Loss 6.5834   LearningRate 0.0388   Epoch: 7   Global Step: 31170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:33,385-Speed 3302.96 samples/sec   Loss 6.4710   LearningRate 0.0388   Epoch: 7   Global Step: 31180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:36,474-Speed 3315.33 samples/sec   Loss 6.5388   LearningRate 0.0388   Epoch: 7   Global Step: 31190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:39,562-Speed 3316.87 samples/sec   Loss 6.6136   LearningRate 0.0388   Epoch: 7   Global Step: 31200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:29:42,645-Speed 3322.20 samples/sec   Loss 6.5126   LearningRate 0.0388   Epoch: 7   Global Step: 31210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:45,724-Speed 3326.95 samples/sec   Loss 6.6077   LearningRate 0.0387   Epoch: 7   Global Step: 31220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:48,819-Speed 3309.97 samples/sec   Loss 6.5762   LearningRate 0.0387   Epoch: 7   Global Step: 31230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:51,906-Speed 3317.06 samples/sec   Loss 6.5680   LearningRate 0.0387   Epoch: 7   Global Step: 31240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:55,011-Speed 3298.96 samples/sec   Loss 6.6373   LearningRate 0.0387   Epoch: 7   Global Step: 31250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:29:58,104-Speed 3311.42 samples/sec   Loss 6.4608   LearningRate 0.0387   Epoch: 7   Global Step: 31260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:30:01,212-Speed 3296.34 samples/sec   Loss 6.7078   LearningRate 0.0387   Epoch: 7   Global Step: 31270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:30:04,298-Speed 3318.75 samples/sec   Loss 6.5350   LearningRate 0.0387   Epoch: 7   Global Step: 31280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:30:07,378-Speed 3325.10 samples/sec   Loss 6.5305   LearningRate 0.0386   Epoch: 7   Global Step: 31290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:30:10,470-Speed 3312.77 samples/sec   Loss 6.6698   LearningRate 0.0386   Epoch: 7   Global Step: 31300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:30:13,565-Speed 3309.80 samples/sec   Loss 6.6092   LearningRate 0.0386   Epoch: 7   Global Step: 31310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:30:16,648-Speed 3322.22 samples/sec   Loss 6.5264   LearningRate 0.0386   Epoch: 7   Global Step: 31320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:30:19,734-Speed 3318.74 samples/sec   Loss 6.6331   LearningRate 0.0386   Epoch: 7   Global Step: 31330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:30:22,801-Speed 3340.14 samples/sec   Loss 6.6287   LearningRate 0.0386   Epoch: 7   Global Step: 31340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:30:25,876-Speed 3330.25 samples/sec   Loss 6.5990   LearningRate 0.0386   Epoch: 7   Global Step: 31350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:30:28,993-Speed 3285.94 samples/sec   Loss 6.6082   LearningRate 0.0385   Epoch: 7   Global Step: 31360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:30:32,085-Speed 3313.00 samples/sec   Loss 6.6349   LearningRate 0.0385   Epoch: 7   Global Step: 31370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:30:35,170-Speed 3319.63 samples/sec   Loss 6.7143   LearningRate 0.0385   Epoch: 7   Global Step: 31380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:30:38,265-Speed 3309.02 samples/sec   Loss 6.6227   LearningRate 0.0385   Epoch: 7   Global Step: 31390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:30:41,364-Speed 3305.73 samples/sec   Loss 6.6868   LearningRate 0.0385   Epoch: 7   Global Step: 31400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:30:44,449-Speed 3320.21 samples/sec   Loss 6.5597   LearningRate 0.0385   Epoch: 7   Global Step: 31410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:30:47,534-Speed 3320.25 samples/sec   Loss 6.6318   LearningRate 0.0384   Epoch: 7   Global Step: 31420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:30:50,618-Speed 3321.30 samples/sec   Loss 6.5235   LearningRate 0.0384   Epoch: 7   Global Step: 31430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:30:53,714-Speed 3308.76 samples/sec   Loss 6.5756   LearningRate 0.0384   Epoch: 7   Global Step: 31440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:30:56,802-Speed 3316.33 samples/sec   Loss 6.6682   LearningRate 0.0384   Epoch: 7   Global Step: 31450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:30:59,903-Speed 3303.63 samples/sec   Loss 6.5764   LearningRate 0.0384   Epoch: 7   Global Step: 31460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:03,017-Speed 3288.83 samples/sec   Loss 6.7334   LearningRate 0.0384   Epoch: 7   Global Step: 31470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:06,100-Speed 3322.42 samples/sec   Loss 6.6648   LearningRate 0.0384   Epoch: 7   Global Step: 31480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:09,185-Speed 3319.65 samples/sec   Loss 6.6192   LearningRate 0.0383   Epoch: 7   Global Step: 31490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:12,272-Speed 3317.94 samples/sec   Loss 6.6141   LearningRate 0.0383   Epoch: 7   Global Step: 31500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:15,355-Speed 3322.96 samples/sec   Loss 6.6431   LearningRate 0.0383   Epoch: 7   Global Step: 31510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:18,435-Speed 3325.52 samples/sec   Loss 6.7707   LearningRate 0.0383   Epoch: 7   Global Step: 31520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:21,527-Speed 3312.70 samples/sec   Loss 6.5732   LearningRate 0.0383   Epoch: 7   Global Step: 31530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:24,634-Speed 3296.66 samples/sec   Loss 6.5919   LearningRate 0.0383   Epoch: 7   Global Step: 31540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:27,705-Speed 3335.21 samples/sec   Loss 6.7248   LearningRate 0.0383   Epoch: 7   Global Step: 31550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:30,789-Speed 3321.06 samples/sec   Loss 6.6839   LearningRate 0.0382   Epoch: 7   Global Step: 31560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:33,874-Speed 3320.19 samples/sec   Loss 6.5469   LearningRate 0.0382   Epoch: 7   Global Step: 31570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:36,956-Speed 3323.18 samples/sec   Loss 6.6415   LearningRate 0.0382   Epoch: 7   Global Step: 31580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:40,041-Speed 3320.05 samples/sec   Loss 6.6123   LearningRate 0.0382   Epoch: 7   Global Step: 31590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:43,126-Speed 3319.62 samples/sec   Loss 6.6359   LearningRate 0.0382   Epoch: 7   Global Step: 31600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:46,215-Speed 3316.70 samples/sec   Loss 6.6338   LearningRate 0.0382   Epoch: 7   Global Step: 31610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:49,302-Speed 3316.94 samples/sec   Loss 6.6070   LearningRate 0.0381   Epoch: 7   Global Step: 31620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:52,409-Speed 3297.10 samples/sec   Loss 6.5309   LearningRate 0.0381   Epoch: 7   Global Step: 31630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:55,492-Speed 3322.37 samples/sec   Loss 6.5084   LearningRate 0.0381   Epoch: 7   Global Step: 31640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:31:58,576-Speed 3320.69 samples/sec   Loss 6.6238   LearningRate 0.0381   Epoch: 7   Global Step: 31650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:32:01,653-Speed 3329.10 samples/sec   Loss 6.5798   LearningRate 0.0381   Epoch: 7   Global Step: 31660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:04,739-Speed 3318.45 samples/sec   Loss 6.5553   LearningRate 0.0381   Epoch: 7   Global Step: 31670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:07,826-Speed 3317.88 samples/sec   Loss 6.6430   LearningRate 0.0381   Epoch: 7   Global Step: 31680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:10,914-Speed 3317.43 samples/sec   Loss 6.5343   LearningRate 0.0380   Epoch: 7   Global Step: 31690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:14,005-Speed 3313.48 samples/sec   Loss 6.6527   LearningRate 0.0380   Epoch: 7   Global Step: 31700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:17,100-Speed 3308.51 samples/sec   Loss 6.6048   LearningRate 0.0380   Epoch: 7   Global Step: 31710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:20,196-Speed 3308.04 samples/sec   Loss 6.6310   LearningRate 0.0380   Epoch: 7   Global Step: 31720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:23,292-Speed 3310.45 samples/sec   Loss 6.7011   LearningRate 0.0380   Epoch: 7   Global Step: 31730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:26,389-Speed 3306.71 samples/sec   Loss 6.7360   LearningRate 0.0380   Epoch: 7   Global Step: 31740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:29,480-Speed 3314.15 samples/sec   Loss 6.6028   LearningRate 0.0380   Epoch: 7   Global Step: 31750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:32,551-Speed 3334.90 samples/sec   Loss 6.6814   LearningRate 0.0379   Epoch: 7   Global Step: 31760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:35,663-Speed 3291.34 samples/sec   Loss 6.5145   LearningRate 0.0379   Epoch: 7   Global Step: 31770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:38,766-Speed 3300.18 samples/sec   Loss 6.5971   LearningRate 0.0379   Epoch: 7   Global Step: 31780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:41,869-Speed 3300.33 samples/sec   Loss 6.6302   LearningRate 0.0379   Epoch: 7   Global Step: 31790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:44,965-Speed 3309.02 samples/sec   Loss 6.5622   LearningRate 0.0379   Epoch: 7   Global Step: 31800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:48,052-Speed 3317.62 samples/sec   Loss 6.4612   LearningRate 0.0379   Epoch: 7   Global Step: 31810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:51,141-Speed 3316.02 samples/sec   Loss 6.6025   LearningRate 0.0379   Epoch: 7   Global Step: 31820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:54,234-Speed 3312.17 samples/sec   Loss 6.5858   LearningRate 0.0378   Epoch: 7   Global Step: 31830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:32:57,340-Speed 3297.52 samples/sec   Loss 6.6038   LearningRate 0.0378   Epoch: 7   Global Step: 31840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:33:00,445-Speed 3297.72 samples/sec   Loss 6.7116   LearningRate 0.0378   Epoch: 7   Global Step: 31850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:33:03,549-Speed 3300.27 samples/sec   Loss 6.6245   LearningRate 0.0378   Epoch: 7   Global Step: 31860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:33:06,650-Speed 3302.94 samples/sec   Loss 6.5826   LearningRate 0.0378   Epoch: 7   Global Step: 31870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:33:09,720-Speed 3336.07 samples/sec   Loss 6.5816   LearningRate 0.0378   Epoch: 7   Global Step: 31880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:33:12,814-Speed 3310.35 samples/sec   Loss 6.6536   LearningRate 0.0377   Epoch: 7   Global Step: 31890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:33:15,913-Speed 3305.36 samples/sec   Loss 6.5855   LearningRate 0.0377   Epoch: 7   Global Step: 31900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:33:19,009-Speed 3308.12 samples/sec   Loss 6.6181   LearningRate 0.0377   Epoch: 7   Global Step: 31910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:33:22,105-Speed 3307.86 samples/sec   Loss 6.6105   LearningRate 0.0377   Epoch: 7   Global Step: 31920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:33:25,206-Speed 3302.72 samples/sec   Loss 6.6707   LearningRate 0.0377   Epoch: 7   Global Step: 31930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:33:28,311-Speed 3299.54 samples/sec   Loss 6.6340   LearningRate 0.0377   Epoch: 7   Global Step: 31940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:33:31,401-Speed 3315.23 samples/sec   Loss 6.6822   LearningRate 0.0377   Epoch: 7   Global Step: 31950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:33:34,498-Speed 3306.67 samples/sec   Loss 6.6565   LearningRate 0.0376   Epoch: 7   Global Step: 31960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:33:37,587-Speed 3315.31 samples/sec   Loss 6.6112   LearningRate 0.0376   Epoch: 7   Global Step: 31970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:33:40,692-Speed 3298.91 samples/sec   Loss 6.5952   LearningRate 0.0376   Epoch: 7   Global Step: 31980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:33:43,779-Speed 3317.84 samples/sec   Loss 6.6176   LearningRate 0.0376   Epoch: 7   Global Step: 31990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:33:46,880-Speed 3302.28 samples/sec   Loss 6.5323   LearningRate 0.0376   Epoch: 7   Global Step: 32000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:34:30,462-[lfw][32000]XNorm: 23.566229
Training: 2022-04-26 15:34:30,462-[lfw][32000]Accuracy-Flip: 0.99767+-0.00249
Training: 2022-04-26 15:34:30,463-[lfw][32000]Accuracy-Highest: 0.99767
Training: 2022-04-26 15:35:21,343-[cfp_fp][32000]XNorm: 22.102636
Training: 2022-04-26 15:35:21,344-[cfp_fp][32000]Accuracy-Flip: 0.98143+-0.00857
Training: 2022-04-26 15:35:21,344-[cfp_fp][32000]Accuracy-Highest: 0.98343
Training: 2022-04-26 15:36:05,001-[agedb_30][32000]XNorm: 23.248505
Training: 2022-04-26 15:36:05,002-[agedb_30][32000]Accuracy-Flip: 0.96917+-0.01049
Training: 2022-04-26 15:36:05,002-[agedb_30][32000]Accuracy-Highest: 0.97200
Training: 2022-04-26 15:36:08,096-Speed 72.51 samples/sec   Loss 6.6639   LearningRate 0.0376   Epoch: 7   Global Step: 32010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:36:11,174-Speed 3327.32 samples/sec   Loss 6.6146   LearningRate 0.0376   Epoch: 7   Global Step: 32020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:36:14,241-Speed 3340.01 samples/sec   Loss 6.7075   LearningRate 0.0375   Epoch: 7   Global Step: 32030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:36:17,335-Speed 3310.38 samples/sec   Loss 6.6688   LearningRate 0.0375   Epoch: 7   Global Step: 32040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:36:20,420-Speed 3319.65 samples/sec   Loss 6.6125   LearningRate 0.0375   Epoch: 7   Global Step: 32050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:36:23,514-Speed 3310.37 samples/sec   Loss 6.6688   LearningRate 0.0375   Epoch: 7   Global Step: 32060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:36:26,607-Speed 3311.40 samples/sec   Loss 6.5304   LearningRate 0.0375   Epoch: 7   Global Step: 32070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:36:29,697-Speed 3314.45 samples/sec   Loss 6.5308   LearningRate 0.0375   Epoch: 7   Global Step: 32080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:36:32,784-Speed 3318.32 samples/sec   Loss 6.6634   LearningRate 0.0375   Epoch: 7   Global Step: 32090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:36:35,870-Speed 3319.10 samples/sec   Loss 6.5035   LearningRate 0.0374   Epoch: 7   Global Step: 32100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:36:38,955-Speed 3319.13 samples/sec   Loss 6.5788   LearningRate 0.0374   Epoch: 7   Global Step: 32110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:36:42,042-Speed 3318.30 samples/sec   Loss 6.7026   LearningRate 0.0374   Epoch: 7   Global Step: 32120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:36:45,134-Speed 3312.67 samples/sec   Loss 6.5840   LearningRate 0.0374   Epoch: 7   Global Step: 32130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:36:48,231-Speed 3307.58 samples/sec   Loss 6.6176   LearningRate 0.0374   Epoch: 7   Global Step: 32140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:36:51,326-Speed 3308.94 samples/sec   Loss 6.5176   LearningRate 0.0374   Epoch: 7   Global Step: 32150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:36:54,437-Speed 3292.96 samples/sec   Loss 6.5686   LearningRate 0.0373   Epoch: 7   Global Step: 32160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:36:57,542-Speed 3298.32 samples/sec   Loss 6.6317   LearningRate 0.0373   Epoch: 7   Global Step: 32170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:37:00,688-Speed 3255.88 samples/sec   Loss 6.6515   LearningRate 0.0373   Epoch: 7   Global Step: 32180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:03,790-Speed 3301.75 samples/sec   Loss 6.6346   LearningRate 0.0373   Epoch: 7   Global Step: 32190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:06,925-Speed 3266.66 samples/sec   Loss 6.5815   LearningRate 0.0373   Epoch: 7   Global Step: 32200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:10,033-Speed 3295.87 samples/sec   Loss 6.6502   LearningRate 0.0373   Epoch: 7   Global Step: 32210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:13,131-Speed 3305.59 samples/sec   Loss 6.5459   LearningRate 0.0373   Epoch: 7   Global Step: 32220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:16,244-Speed 3290.00 samples/sec   Loss 6.5801   LearningRate 0.0372   Epoch: 7   Global Step: 32230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:19,347-Speed 3301.89 samples/sec   Loss 6.5180   LearningRate 0.0372   Epoch: 7   Global Step: 32240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:22,443-Speed 3308.30 samples/sec   Loss 6.6835   LearningRate 0.0372   Epoch: 7   Global Step: 32250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:25,552-Speed 3294.52 samples/sec   Loss 6.6529   LearningRate 0.0372   Epoch: 7   Global Step: 32260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:28,661-Speed 3293.94 samples/sec   Loss 6.5131   LearningRate 0.0372   Epoch: 7   Global Step: 32270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:31,754-Speed 3311.29 samples/sec   Loss 6.6588   LearningRate 0.0372   Epoch: 7   Global Step: 32280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:37:34,848-Speed 3309.82 samples/sec   Loss 6.5437   LearningRate 0.0372   Epoch: 7   Global Step: 32290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:37:37,923-Speed 3331.64 samples/sec   Loss 6.6077   LearningRate 0.0371   Epoch: 7   Global Step: 32300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:41,025-Speed 3301.41 samples/sec   Loss 6.6071   LearningRate 0.0371   Epoch: 7   Global Step: 32310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:44,116-Speed 3313.55 samples/sec   Loss 6.5467   LearningRate 0.0371   Epoch: 7   Global Step: 32320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:47,208-Speed 3312.70 samples/sec   Loss 6.5314   LearningRate 0.0371   Epoch: 7   Global Step: 32330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:50,307-Speed 3305.70 samples/sec   Loss 6.4764   LearningRate 0.0371   Epoch: 7   Global Step: 32340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:53,395-Speed 3316.50 samples/sec   Loss 6.4686   LearningRate 0.0371   Epoch: 7   Global Step: 32350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:56,485-Speed 3314.64 samples/sec   Loss 6.6845   LearningRate 0.0371   Epoch: 7   Global Step: 32360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:37:59,573-Speed 3316.41 samples/sec   Loss 6.5662   LearningRate 0.0370   Epoch: 7   Global Step: 32370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:38:02,684-Speed 3292.34 samples/sec   Loss 6.5247   LearningRate 0.0370   Epoch: 7   Global Step: 32380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:38:05,921-Speed 3164.13 samples/sec   Loss 6.5849   LearningRate 0.0370   Epoch: 7   Global Step: 32390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:38:08,986-Speed 3341.46 samples/sec   Loss 6.6145   LearningRate 0.0370   Epoch: 7   Global Step: 32400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:38:12,056-Speed 3336.46 samples/sec   Loss 6.6257   LearningRate 0.0370   Epoch: 7   Global Step: 32410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:38:15,141-Speed 3320.66 samples/sec   Loss 6.6624   LearningRate 0.0370   Epoch: 7   Global Step: 32420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:38:18,224-Speed 3322.49 samples/sec   Loss 6.6543   LearningRate 0.0369   Epoch: 7   Global Step: 32430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:38:21,303-Speed 3326.22 samples/sec   Loss 6.5932   LearningRate 0.0369   Epoch: 7   Global Step: 32440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:38:24,397-Speed 3310.26 samples/sec   Loss 6.5123   LearningRate 0.0369   Epoch: 7   Global Step: 32450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:38:27,487-Speed 3315.47 samples/sec   Loss 6.5651   LearningRate 0.0369   Epoch: 7   Global Step: 32460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:38:30,571-Speed 3320.85 samples/sec   Loss 6.6064   LearningRate 0.0369   Epoch: 7   Global Step: 32470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:38:33,665-Speed 3310.46 samples/sec   Loss 6.4173   LearningRate 0.0369   Epoch: 7   Global Step: 32480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:38:36,749-Speed 3320.70 samples/sec   Loss 6.5298   LearningRate 0.0369   Epoch: 7   Global Step: 32490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:38:39,835-Speed 3318.86 samples/sec   Loss 6.8213   LearningRate 0.0368   Epoch: 7   Global Step: 32500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:38:42,929-Speed 3311.05 samples/sec   Loss 6.6113   LearningRate 0.0368   Epoch: 7   Global Step: 32510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:38:46,014-Speed 3320.25 samples/sec   Loss 6.6260   LearningRate 0.0368   Epoch: 7   Global Step: 32520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:38:49,096-Speed 3323.65 samples/sec   Loss 6.4920   LearningRate 0.0368   Epoch: 7   Global Step: 32530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:38:52,194-Speed 3305.76 samples/sec   Loss 6.5375   LearningRate 0.0368   Epoch: 7   Global Step: 32540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:38:55,280-Speed 3318.76 samples/sec   Loss 6.5780   LearningRate 0.0368   Epoch: 7   Global Step: 32550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:38:58,358-Speed 3327.47 samples/sec   Loss 6.5464   LearningRate 0.0368   Epoch: 7   Global Step: 32560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:39:01,441-Speed 3322.39 samples/sec   Loss 6.6577   LearningRate 0.0367   Epoch: 7   Global Step: 32570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:39:04,555-Speed 3289.61 samples/sec   Loss 6.5615   LearningRate 0.0367   Epoch: 7   Global Step: 32580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:39:07,634-Speed 3325.82 samples/sec   Loss 6.5670   LearningRate 0.0367   Epoch: 7   Global Step: 32590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:39:10,744-Speed 3294.32 samples/sec   Loss 6.5154   LearningRate 0.0367   Epoch: 7   Global Step: 32600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:39:13,831-Speed 3318.08 samples/sec   Loss 6.4631   LearningRate 0.0367   Epoch: 7   Global Step: 32610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:39:16,912-Speed 3323.75 samples/sec   Loss 6.5332   LearningRate 0.0367   Epoch: 7   Global Step: 32620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:39:20,030-Speed 3285.22 samples/sec   Loss 6.5961   LearningRate 0.0367   Epoch: 7   Global Step: 32630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:39:23,118-Speed 3316.17 samples/sec   Loss 6.5129   LearningRate 0.0366   Epoch: 7   Global Step: 32640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:39:26,499-Speed 3029.85 samples/sec   Loss 6.5760   LearningRate 0.0366   Epoch: 7   Global Step: 32650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:39:29,588-Speed 3315.50 samples/sec   Loss 6.5369   LearningRate 0.0366   Epoch: 7   Global Step: 32660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:39:32,686-Speed 3305.95 samples/sec   Loss 6.5298   LearningRate 0.0366   Epoch: 7   Global Step: 32670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:39:35,815-Speed 3274.42 samples/sec   Loss 6.5052   LearningRate 0.0366   Epoch: 7   Global Step: 32680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:39:38,933-Speed 3284.29 samples/sec   Loss 6.5697   LearningRate 0.0366   Epoch: 7   Global Step: 32690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:39:42,069-Speed 3266.47 samples/sec   Loss 6.6095   LearningRate 0.0366   Epoch: 7   Global Step: 32700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:39:45,127-Speed 3350.06 samples/sec   Loss 6.5597   LearningRate 0.0365   Epoch: 7   Global Step: 32710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:39:48,239-Speed 3291.11 samples/sec   Loss 6.6150   LearningRate 0.0365   Epoch: 7   Global Step: 32720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:39:51,329-Speed 3315.04 samples/sec   Loss 6.5256   LearningRate 0.0365   Epoch: 7   Global Step: 32730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:39:54,418-Speed 3315.98 samples/sec   Loss 6.5362   LearningRate 0.0365   Epoch: 7   Global Step: 32740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:39:57,508-Speed 3314.30 samples/sec   Loss 6.6427   LearningRate 0.0365   Epoch: 7   Global Step: 32750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:40:00,599-Speed 3313.31 samples/sec   Loss 6.6051   LearningRate 0.0365   Epoch: 7   Global Step: 32760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:40:03,688-Speed 3315.93 samples/sec   Loss 6.5137   LearningRate 0.0365   Epoch: 7   Global Step: 32770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:40:06,777-Speed 3315.67 samples/sec   Loss 6.5575   LearningRate 0.0364   Epoch: 7   Global Step: 32780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:40:09,871-Speed 3310.63 samples/sec   Loss 6.6717   LearningRate 0.0364   Epoch: 7   Global Step: 32790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:40:12,977-Speed 3298.19 samples/sec   Loss 6.4788   LearningRate 0.0364   Epoch: 7   Global Step: 32800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:40:16,068-Speed 3313.20 samples/sec   Loss 6.6321   LearningRate 0.0364   Epoch: 7   Global Step: 32810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:40:19,106-Speed 3370.80 samples/sec   Loss 6.6412   LearningRate 0.0364   Epoch: 7   Global Step: 32820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-26 15:40:22,193-Speed 3318.63 samples/sec   Loss 6.4356   LearningRate 0.0364   Epoch: 7   Global Step: 32830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-26 15:40:25,293-Speed 3303.17 samples/sec   Loss 6.6158   LearningRate 0.0363   Epoch: 7   Global Step: 32840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-26 15:40:28,383-Speed 3314.65 samples/sec   Loss 6.4429   LearningRate 0.0363   Epoch: 7   Global Step: 32850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-26 15:40:31,471-Speed 3317.79 samples/sec   Loss 6.5418   LearningRate 0.0363   Epoch: 7   Global Step: 32860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-26 15:40:34,560-Speed 3315.01 samples/sec   Loss 6.5389   LearningRate 0.0363   Epoch: 7   Global Step: 32870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-26 15:40:37,676-Speed 3287.84 samples/sec   Loss 6.6472   LearningRate 0.0363   Epoch: 7   Global Step: 32880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-26 15:40:40,767-Speed 3313.73 samples/sec   Loss 6.6000   LearningRate 0.0363   Epoch: 7   Global Step: 32890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-26 15:40:43,862-Speed 3308.62 samples/sec   Loss 6.5017   LearningRate 0.0363   Epoch: 7   Global Step: 32900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-26 15:40:46,977-Speed 3288.65 samples/sec   Loss 6.6851   LearningRate 0.0362   Epoch: 7   Global Step: 32910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-26 15:40:50,072-Speed 3309.53 samples/sec   Loss 6.5102   LearningRate 0.0362   Epoch: 7   Global Step: 32920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:40:53,171-Speed 3305.36 samples/sec   Loss 6.5499   LearningRate 0.0362   Epoch: 7   Global Step: 32930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:40:56,263-Speed 3311.76 samples/sec   Loss 6.4929   LearningRate 0.0362   Epoch: 7   Global Step: 32940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:40:59,351-Speed 3316.74 samples/sec   Loss 6.5499   LearningRate 0.0362   Epoch: 7   Global Step: 32950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:41:02,443-Speed 3313.12 samples/sec   Loss 6.6929   LearningRate 0.0362   Epoch: 7   Global Step: 32960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:41:05,590-Speed 3254.77 samples/sec   Loss 6.5678   LearningRate 0.0362   Epoch: 7   Global Step: 32970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:41:08,679-Speed 3315.88 samples/sec   Loss 6.6023   LearningRate 0.0361   Epoch: 7   Global Step: 32980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:41:11,771-Speed 3313.62 samples/sec   Loss 6.5981   LearningRate 0.0361   Epoch: 7   Global Step: 32990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:41:14,874-Speed 3299.90 samples/sec   Loss 6.5247   LearningRate 0.0361   Epoch: 7   Global Step: 33000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:41:17,972-Speed 3306.52 samples/sec   Loss 6.5377   LearningRate 0.0361   Epoch: 7   Global Step: 33010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:41:21,058-Speed 3319.46 samples/sec   Loss 6.4553   LearningRate 0.0361   Epoch: 7   Global Step: 33020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:41:24,141-Speed 3321.71 samples/sec   Loss 6.5454   LearningRate 0.0361   Epoch: 7   Global Step: 33030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:41:27,211-Speed 3336.66 samples/sec   Loss 6.6635   LearningRate 0.0361   Epoch: 7   Global Step: 33040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:41:30,304-Speed 3311.80 samples/sec   Loss 6.4127   LearningRate 0.0360   Epoch: 7   Global Step: 33050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:41:33,400-Speed 3308.05 samples/sec   Loss 6.4226   LearningRate 0.0360   Epoch: 7   Global Step: 33060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:41:36,553-Speed 3248.01 samples/sec   Loss 6.5261   LearningRate 0.0360   Epoch: 7   Global Step: 33070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:41:39,648-Speed 3309.91 samples/sec   Loss 6.4687   LearningRate 0.0360   Epoch: 7   Global Step: 33080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:41:53,484-Speed 740.15 samples/sec   Loss 4.9499   LearningRate 0.0360   Epoch: 8   Global Step: 33090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:41:56,598-Speed 3289.20 samples/sec   Loss 4.8935   LearningRate 0.0360   Epoch: 8   Global Step: 33100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:41:59,679-Speed 3325.08 samples/sec   Loss 4.8964   LearningRate 0.0360   Epoch: 8   Global Step: 33110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:42:02,756-Speed 3328.00 samples/sec   Loss 4.9603   LearningRate 0.0359   Epoch: 8   Global Step: 33120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:42:05,839-Speed 3322.74 samples/sec   Loss 4.8404   LearningRate 0.0359   Epoch: 8   Global Step: 33130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:42:08,921-Speed 3322.70 samples/sec   Loss 4.9894   LearningRate 0.0359   Epoch: 8   Global Step: 33140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:42:12,021-Speed 3304.63 samples/sec   Loss 5.0916   LearningRate 0.0359   Epoch: 8   Global Step: 33150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:42:15,112-Speed 3313.03 samples/sec   Loss 4.9243   LearningRate 0.0359   Epoch: 8   Global Step: 33160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:42:18,199-Speed 3318.01 samples/sec   Loss 4.8968   LearningRate 0.0359   Epoch: 8   Global Step: 33170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:42:21,285-Speed 3319.22 samples/sec   Loss 4.9362   LearningRate 0.0359   Epoch: 8   Global Step: 33180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:42:24,389-Speed 3299.66 samples/sec   Loss 4.8742   LearningRate 0.0358   Epoch: 8   Global Step: 33190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:42:27,476-Speed 3318.49 samples/sec   Loss 4.9136   LearningRate 0.0358   Epoch: 8   Global Step: 33200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:42:30,565-Speed 3315.63 samples/sec   Loss 4.9950   LearningRate 0.0358   Epoch: 8   Global Step: 33210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:42:33,654-Speed 3314.81 samples/sec   Loss 5.0605   LearningRate 0.0358   Epoch: 8   Global Step: 33220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:42:36,749-Speed 3310.30 samples/sec   Loss 4.9918   LearningRate 0.0358   Epoch: 8   Global Step: 33230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:42:39,837-Speed 3316.43 samples/sec   Loss 5.0170   LearningRate 0.0358   Epoch: 8   Global Step: 33240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:42:42,925-Speed 3317.52 samples/sec   Loss 5.1310   LearningRate 0.0358   Epoch: 8   Global Step: 33250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:42:46,013-Speed 3317.06 samples/sec   Loss 5.0513   LearningRate 0.0357   Epoch: 8   Global Step: 33260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:42:49,106-Speed 3311.88 samples/sec   Loss 5.0139   LearningRate 0.0357   Epoch: 8   Global Step: 33270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:42:52,195-Speed 3315.76 samples/sec   Loss 5.0167   LearningRate 0.0357   Epoch: 8   Global Step: 33280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:42:55,299-Speed 3299.12 samples/sec   Loss 5.1120   LearningRate 0.0357   Epoch: 8   Global Step: 33290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:42:58,383-Speed 3321.75 samples/sec   Loss 4.9873   LearningRate 0.0357   Epoch: 8   Global Step: 33300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:01,483-Speed 3304.68 samples/sec   Loss 5.0292   LearningRate 0.0357   Epoch: 8   Global Step: 33310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:04,578-Speed 3309.20 samples/sec   Loss 5.1941   LearningRate 0.0357   Epoch: 8   Global Step: 33320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:07,669-Speed 3313.62 samples/sec   Loss 5.0844   LearningRate 0.0356   Epoch: 8   Global Step: 33330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:10,773-Speed 3299.86 samples/sec   Loss 5.0950   LearningRate 0.0356   Epoch: 8   Global Step: 33340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:13,877-Speed 3299.67 samples/sec   Loss 5.2056   LearningRate 0.0356   Epoch: 8   Global Step: 33350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:16,972-Speed 3312.19 samples/sec   Loss 5.0821   LearningRate 0.0356   Epoch: 8   Global Step: 33360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:20,067-Speed 3309.05 samples/sec   Loss 5.2065   LearningRate 0.0356   Epoch: 8   Global Step: 33370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:23,173-Speed 3298.16 samples/sec   Loss 5.2229   LearningRate 0.0356   Epoch: 8   Global Step: 33380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:26,272-Speed 3304.55 samples/sec   Loss 5.1303   LearningRate 0.0356   Epoch: 8   Global Step: 33390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:29,367-Speed 3309.61 samples/sec   Loss 5.0655   LearningRate 0.0355   Epoch: 8   Global Step: 33400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:43:32,441-Speed 3331.76 samples/sec   Loss 5.2235   LearningRate 0.0355   Epoch: 8   Global Step: 33410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:35,552-Speed 3292.58 samples/sec   Loss 5.1810   LearningRate 0.0355   Epoch: 8   Global Step: 33420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:38,728-Speed 3224.88 samples/sec   Loss 5.2971   LearningRate 0.0355   Epoch: 8   Global Step: 33430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:41,819-Speed 3313.17 samples/sec   Loss 5.1882   LearningRate 0.0355   Epoch: 8   Global Step: 33440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:44,914-Speed 3309.26 samples/sec   Loss 5.1671   LearningRate 0.0355   Epoch: 8   Global Step: 33450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:48,007-Speed 3316.77 samples/sec   Loss 5.2011   LearningRate 0.0355   Epoch: 8   Global Step: 33460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:51,104-Speed 3307.44 samples/sec   Loss 5.1331   LearningRate 0.0354   Epoch: 8   Global Step: 33470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:54,203-Speed 3313.32 samples/sec   Loss 5.0294   LearningRate 0.0354   Epoch: 8   Global Step: 33480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:43:57,310-Speed 3295.70 samples/sec   Loss 5.1745   LearningRate 0.0354   Epoch: 8   Global Step: 33490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:00,401-Speed 3314.41 samples/sec   Loss 5.2221   LearningRate 0.0354   Epoch: 8   Global Step: 33500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:03,482-Speed 3323.68 samples/sec   Loss 5.2218   LearningRate 0.0354   Epoch: 8   Global Step: 33510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:06,571-Speed 3316.77 samples/sec   Loss 5.3346   LearningRate 0.0354   Epoch: 8   Global Step: 33520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:09,658-Speed 3317.39 samples/sec   Loss 5.2588   LearningRate 0.0353   Epoch: 8   Global Step: 33530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:12,748-Speed 3314.96 samples/sec   Loss 5.3258   LearningRate 0.0353   Epoch: 8   Global Step: 33540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:15,872-Speed 3277.96 samples/sec   Loss 5.2998   LearningRate 0.0353   Epoch: 8   Global Step: 33550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:18,955-Speed 3322.20 samples/sec   Loss 5.2912   LearningRate 0.0353   Epoch: 8   Global Step: 33560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:22,043-Speed 3317.44 samples/sec   Loss 5.2592   LearningRate 0.0353   Epoch: 8   Global Step: 33570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:25,140-Speed 3306.81 samples/sec   Loss 5.2578   LearningRate 0.0353   Epoch: 8   Global Step: 33580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:28,231-Speed 3313.65 samples/sec   Loss 5.3524   LearningRate 0.0353   Epoch: 8   Global Step: 33590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:31,339-Speed 3295.80 samples/sec   Loss 5.3681   LearningRate 0.0352   Epoch: 8   Global Step: 33600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:34,427-Speed 3316.22 samples/sec   Loss 5.2748   LearningRate 0.0352   Epoch: 8   Global Step: 33610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:44:37,511-Speed 3321.39 samples/sec   Loss 5.3468   LearningRate 0.0352   Epoch: 8   Global Step: 33620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:44:40,582-Speed 3335.57 samples/sec   Loss 5.3649   LearningRate 0.0352   Epoch: 8   Global Step: 33630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:43,670-Speed 3316.87 samples/sec   Loss 5.4209   LearningRate 0.0352   Epoch: 8   Global Step: 33640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:46,753-Speed 3321.76 samples/sec   Loss 5.4294   LearningRate 0.0352   Epoch: 8   Global Step: 33650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:49,844-Speed 3313.90 samples/sec   Loss 5.3768   LearningRate 0.0352   Epoch: 8   Global Step: 33660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:52,929-Speed 3320.40 samples/sec   Loss 5.5010   LearningRate 0.0351   Epoch: 8   Global Step: 33670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:56,013-Speed 3320.68 samples/sec   Loss 5.2300   LearningRate 0.0351   Epoch: 8   Global Step: 33680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:44:59,110-Speed 3307.38 samples/sec   Loss 5.3504   LearningRate 0.0351   Epoch: 8   Global Step: 33690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:45:02,204-Speed 3310.40 samples/sec   Loss 5.4393   LearningRate 0.0351   Epoch: 8   Global Step: 33700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:45:05,298-Speed 3309.80 samples/sec   Loss 5.3423   LearningRate 0.0351   Epoch: 8   Global Step: 33710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:45:08,386-Speed 3316.98 samples/sec   Loss 5.4886   LearningRate 0.0351   Epoch: 8   Global Step: 33720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:45:13,216-Speed 3312.38 samples/sec   Loss 5.5181   LearningRate 0.0351   Epoch: 8   Global Step: 33730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:45:21,249-Speed 3321.36 samples/sec   Loss 5.3325   LearningRate 0.0350   Epoch: 8   Global Step: 33740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:45:24,331-Speed 3323.93 samples/sec   Loss 5.4024   LearningRate 0.0350   Epoch: 8   Global Step: 33750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:45:27,472-Speed 3261.14 samples/sec   Loss 5.4301   LearningRate 0.0350   Epoch: 8   Global Step: 33760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:45:30,568-Speed 3308.37 samples/sec   Loss 5.3926   LearningRate 0.0350   Epoch: 8   Global Step: 33770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:45:33,668-Speed 3303.64 samples/sec   Loss 5.5180   LearningRate 0.0350   Epoch: 8   Global Step: 33780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:45:36,752-Speed 3320.61 samples/sec   Loss 5.4426   LearningRate 0.0350   Epoch: 8   Global Step: 33790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:45:39,833-Speed 3324.22 samples/sec   Loss 5.5242   LearningRate 0.0350   Epoch: 8   Global Step: 33800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:45:42,915-Speed 3324.37 samples/sec   Loss 5.5346   LearningRate 0.0349   Epoch: 8   Global Step: 33810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:45:45,996-Speed 3323.53 samples/sec   Loss 5.5459   LearningRate 0.0349   Epoch: 8   Global Step: 33820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:45:49,078-Speed 3322.84 samples/sec   Loss 5.5041   LearningRate 0.0349   Epoch: 8   Global Step: 33830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:45:52,191-Speed 3290.53 samples/sec   Loss 5.4983   LearningRate 0.0349   Epoch: 8   Global Step: 33840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:46:03,440-Speed 3335.47 samples/sec   Loss 5.5721   LearningRate 0.0349   Epoch: 8   Global Step: 33850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:46:18,786-Speed 3319.16 samples/sec   Loss 5.3884   LearningRate 0.0349   Epoch: 8   Global Step: 33860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:46:28,939-Speed 3316.16 samples/sec   Loss 5.5002   LearningRate 0.0349   Epoch: 8   Global Step: 33870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:46:40,054-Speed 3337.38 samples/sec   Loss 5.5266   LearningRate 0.0348   Epoch: 8   Global Step: 33880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:46:49,668-Speed 3336.50 samples/sec   Loss 5.6116   LearningRate 0.0348   Epoch: 8   Global Step: 33890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:46:52,742-Speed 3332.11 samples/sec   Loss 5.6461   LearningRate 0.0348   Epoch: 8   Global Step: 33900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:46:55,824-Speed 3323.58 samples/sec   Loss 5.6119   LearningRate 0.0348   Epoch: 8   Global Step: 33910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:46:58,923-Speed 3305.09 samples/sec   Loss 5.4627   LearningRate 0.0348   Epoch: 8   Global Step: 33920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:47:02,000-Speed 3328.97 samples/sec   Loss 5.5541   LearningRate 0.0348   Epoch: 8   Global Step: 33930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:47:05,177-Speed 3224.43 samples/sec   Loss 5.4903   LearningRate 0.0348   Epoch: 8   Global Step: 33940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:47:08,237-Speed 3346.20 samples/sec   Loss 5.5761   LearningRate 0.0347   Epoch: 8   Global Step: 33950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:47:14,179-Speed 3323.99 samples/sec   Loss 5.6579   LearningRate 0.0347   Epoch: 8   Global Step: 33960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:47:18,492-Speed 3282.22 samples/sec   Loss 5.5921   LearningRate 0.0347   Epoch: 8   Global Step: 33970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:47:21,574-Speed 3323.11 samples/sec   Loss 5.5751   LearningRate 0.0347   Epoch: 8   Global Step: 33980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:47:24,657-Speed 3323.28 samples/sec   Loss 5.6158   LearningRate 0.0347   Epoch: 8   Global Step: 33990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:47:27,744-Speed 3316.86 samples/sec   Loss 5.5901   LearningRate 0.0347   Epoch: 8   Global Step: 34000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:48:11,678-[lfw][34000]XNorm: 24.178498
Training: 2022-04-26 15:48:11,679-[lfw][34000]Accuracy-Flip: 0.99783+-0.00308
Training: 2022-04-26 15:48:11,679-[lfw][34000]Accuracy-Highest: 0.99783
Training: 2022-04-26 15:49:03,135-[cfp_fp][34000]XNorm: 22.775205
Training: 2022-04-26 15:49:03,135-[cfp_fp][34000]Accuracy-Flip: 0.98371+-0.00613
Training: 2022-04-26 15:49:03,136-[cfp_fp][34000]Accuracy-Highest: 0.98371
Training: 2022-04-26 15:49:46,948-[agedb_30][34000]XNorm: 24.158174
Training: 2022-04-26 15:50:02,384-[agedb_30][34000]Accuracy-Flip: 0.97067+-0.00867
Training: 2022-04-26 15:50:02,400-[agedb_30][34000]Accuracy-Highest: 0.97200
Training: 2022-04-26 15:50:05,485-Speed 64.92 samples/sec   Loss 5.6724   LearningRate 0.0347   Epoch: 8   Global Step: 34010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:50:08,554-Speed 3337.04 samples/sec   Loss 5.7524   LearningRate 0.0346   Epoch: 8   Global Step: 34020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:50:11,635-Speed 3338.74 samples/sec   Loss 5.7311   LearningRate 0.0346   Epoch: 8   Global Step: 34030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:50:14,706-Speed 3335.12 samples/sec   Loss 5.4884   LearningRate 0.0346   Epoch: 8   Global Step: 34040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:50:21,142-Speed 3335.19 samples/sec   Loss 5.6031   LearningRate 0.0346   Epoch: 8   Global Step: 34050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:50:34,622-Speed 3337.10 samples/sec   Loss 5.7206   LearningRate 0.0346   Epoch: 8   Global Step: 34060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:50:41,161-Speed 3341.50 samples/sec   Loss 5.6578   LearningRate 0.0346   Epoch: 8   Global Step: 34070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:50:51,688-Speed 3323.47 samples/sec   Loss 5.5551   LearningRate 0.0346   Epoch: 8   Global Step: 34080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:51:00,960-Speed 3334.82 samples/sec   Loss 5.6011   LearningRate 0.0345   Epoch: 8   Global Step: 34090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:51:04,028-Speed 3338.69 samples/sec   Loss 5.6740   LearningRate 0.0345   Epoch: 8   Global Step: 34100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:51:07,131-Speed 3301.15 samples/sec   Loss 5.6896   LearningRate 0.0345   Epoch: 8   Global Step: 34110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:51:10,190-Speed 3347.67 samples/sec   Loss 5.6867   LearningRate 0.0345   Epoch: 8   Global Step: 34120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:51:13,255-Speed 3341.47 samples/sec   Loss 5.5998   LearningRate 0.0345   Epoch: 8   Global Step: 34130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:51:16,315-Speed 3347.61 samples/sec   Loss 5.7039   LearningRate 0.0345   Epoch: 8   Global Step: 34140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:51:42,975-Speed 3333.56 samples/sec   Loss 5.7697   LearningRate 0.0345   Epoch: 8   Global Step: 34150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:51:53,346-Speed 3340.35 samples/sec   Loss 5.6361   LearningRate 0.0344   Epoch: 8   Global Step: 34160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:52:04,144-Speed 3362.61 samples/sec   Loss 5.7377   LearningRate 0.0344   Epoch: 8   Global Step: 34170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:52:09,265-Speed 3345.69 samples/sec   Loss 5.6857   LearningRate 0.0344   Epoch: 8   Global Step: 34180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:52:20,508-Speed 3343.49 samples/sec   Loss 5.7360   LearningRate 0.0344   Epoch: 8   Global Step: 34190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:52:31,653-Speed 3336.42 samples/sec   Loss 5.6233   LearningRate 0.0344   Epoch: 8   Global Step: 34200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:52:46,266-Speed 3350.26 samples/sec   Loss 5.7250   LearningRate 0.0344   Epoch: 8   Global Step: 34210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:52:56,389-Speed 3342.22 samples/sec   Loss 5.6047   LearningRate 0.0344   Epoch: 8   Global Step: 34220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:52:59,465-Speed 3330.10 samples/sec   Loss 5.6997   LearningRate 0.0344   Epoch: 8   Global Step: 34230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:53:02,541-Speed 3329.10 samples/sec   Loss 5.8403   LearningRate 0.0343   Epoch: 8   Global Step: 34240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:53:05,613-Speed 3334.24 samples/sec   Loss 5.7636   LearningRate 0.0343   Epoch: 8   Global Step: 34250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:53:18,110-Speed 3342.48 samples/sec   Loss 5.7198   LearningRate 0.0343   Epoch: 8   Global Step: 34260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:53:25,656-Speed 3339.39 samples/sec   Loss 5.6707   LearningRate 0.0343   Epoch: 8   Global Step: 34270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:53:28,783-Speed 3275.33 samples/sec   Loss 5.7182   LearningRate 0.0343   Epoch: 8   Global Step: 34280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:53:31,864-Speed 3323.65 samples/sec   Loss 5.6449   LearningRate 0.0343   Epoch: 8   Global Step: 34290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:53:34,951-Speed 3318.03 samples/sec   Loss 5.5672   LearningRate 0.0343   Epoch: 8   Global Step: 34300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:53:38,019-Speed 3338.39 samples/sec   Loss 5.6402   LearningRate 0.0342   Epoch: 8   Global Step: 34310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:53:41,109-Speed 3314.01 samples/sec   Loss 5.6591   LearningRate 0.0342   Epoch: 8   Global Step: 34320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:53:44,178-Speed 3337.59 samples/sec   Loss 5.7206   LearningRate 0.0342   Epoch: 8   Global Step: 34330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:53:56,615-Speed 3322.77 samples/sec   Loss 5.6566   LearningRate 0.0342   Epoch: 8   Global Step: 34340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:54:06,694-Speed 3336.16 samples/sec   Loss 5.6586   LearningRate 0.0342   Epoch: 8   Global Step: 34350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:54:09,765-Speed 3335.85 samples/sec   Loss 5.7546   LearningRate 0.0342   Epoch: 8   Global Step: 34360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:54:13,959-Speed 3346.84 samples/sec   Loss 5.7551   LearningRate 0.0342   Epoch: 8   Global Step: 34370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:54:24,760-Speed 3321.71 samples/sec   Loss 5.7471   LearningRate 0.0341   Epoch: 8   Global Step: 34380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:54:31,523-Speed 3343.61 samples/sec   Loss 5.7232   LearningRate 0.0341   Epoch: 8   Global Step: 34390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:54:37,878-Speed 3356.62 samples/sec   Loss 5.8079   LearningRate 0.0341   Epoch: 8   Global Step: 34400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:54:40,936-Speed 3348.82 samples/sec   Loss 5.8915   LearningRate 0.0341   Epoch: 8   Global Step: 34410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:54:43,998-Speed 3346.11 samples/sec   Loss 5.7388   LearningRate 0.0341   Epoch: 8   Global Step: 34420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:54:47,045-Speed 3361.08 samples/sec   Loss 5.7840   LearningRate 0.0341   Epoch: 8   Global Step: 34430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:54:52,120-Speed 3348.24 samples/sec   Loss 5.7775   LearningRate 0.0341   Epoch: 8   Global Step: 34440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:54:55,189-Speed 3337.97 samples/sec   Loss 5.7956   LearningRate 0.0340   Epoch: 8   Global Step: 34450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:54:58,261-Speed 3334.25 samples/sec   Loss 5.6747   LearningRate 0.0340   Epoch: 8   Global Step: 34460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:55:01,321-Speed 3346.60 samples/sec   Loss 5.7588   LearningRate 0.0340   Epoch: 8   Global Step: 34470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:55:04,392-Speed 3335.21 samples/sec   Loss 5.8711   LearningRate 0.0340   Epoch: 8   Global Step: 34480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:55:07,462-Speed 3336.41 samples/sec   Loss 5.8543   LearningRate 0.0340   Epoch: 8   Global Step: 34490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:55:10,535-Speed 3332.66 samples/sec   Loss 5.8783   LearningRate 0.0340   Epoch: 8   Global Step: 34500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:55:13,607-Speed 3334.53 samples/sec   Loss 5.8629   LearningRate 0.0340   Epoch: 8   Global Step: 34510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:55:16,680-Speed 3333.08 samples/sec   Loss 5.7381   LearningRate 0.0339   Epoch: 8   Global Step: 34520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:55:19,757-Speed 3328.56 samples/sec   Loss 5.6953   LearningRate 0.0339   Epoch: 8   Global Step: 34530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:55:22,835-Speed 3327.37 samples/sec   Loss 5.8473   LearningRate 0.0339   Epoch: 8   Global Step: 34540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:55:25,917-Speed 3323.09 samples/sec   Loss 5.7385   LearningRate 0.0339   Epoch: 8   Global Step: 34550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:55:28,993-Speed 3330.12 samples/sec   Loss 5.8432   LearningRate 0.0339   Epoch: 8   Global Step: 34560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:55:32,069-Speed 3330.06 samples/sec   Loss 5.7613   LearningRate 0.0339   Epoch: 8   Global Step: 34570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:55:35,142-Speed 3333.41 samples/sec   Loss 5.7963   LearningRate 0.0339   Epoch: 8   Global Step: 34580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:55:38,220-Speed 3327.08 samples/sec   Loss 5.8878   LearningRate 0.0338   Epoch: 8   Global Step: 34590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:55:41,295-Speed 3330.42 samples/sec   Loss 5.8561   LearningRate 0.0338   Epoch: 8   Global Step: 34600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:55:44,376-Speed 3324.47 samples/sec   Loss 5.8279   LearningRate 0.0338   Epoch: 8   Global Step: 34610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:55:47,450-Speed 3331.99 samples/sec   Loss 5.9260   LearningRate 0.0338   Epoch: 8   Global Step: 34620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:55:50,512-Speed 3345.69 samples/sec   Loss 5.8419   LearningRate 0.0338   Epoch: 8   Global Step: 34630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:55:53,603-Speed 3313.39 samples/sec   Loss 5.9132   LearningRate 0.0338   Epoch: 8   Global Step: 34640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:55:56,685-Speed 3323.87 samples/sec   Loss 5.8308   LearningRate 0.0338   Epoch: 8   Global Step: 34650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:55:59,766-Speed 3323.40 samples/sec   Loss 5.8299   LearningRate 0.0337   Epoch: 8   Global Step: 34660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:56:02,850-Speed 3321.48 samples/sec   Loss 5.7825   LearningRate 0.0337   Epoch: 8   Global Step: 34670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:56:05,914-Speed 3343.04 samples/sec   Loss 5.9681   LearningRate 0.0337   Epoch: 8   Global Step: 34680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:56:08,993-Speed 3326.92 samples/sec   Loss 5.7921   LearningRate 0.0337   Epoch: 8   Global Step: 34690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:56:12,081-Speed 3316.34 samples/sec   Loss 5.8188   LearningRate 0.0337   Epoch: 8   Global Step: 34700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:56:15,171-Speed 3315.02 samples/sec   Loss 5.8735   LearningRate 0.0337   Epoch: 8   Global Step: 34710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:56:18,262-Speed 3313.49 samples/sec   Loss 5.8489   LearningRate 0.0337   Epoch: 8   Global Step: 34720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:56:21,339-Speed 3328.60 samples/sec   Loss 5.8789   LearningRate 0.0336   Epoch: 8   Global Step: 34730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:56:24,423-Speed 3321.69 samples/sec   Loss 5.8594   LearningRate 0.0336   Epoch: 8   Global Step: 34740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:56:27,505-Speed 3323.65 samples/sec   Loss 5.9514   LearningRate 0.0336   Epoch: 8   Global Step: 34750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:56:30,581-Speed 3328.86 samples/sec   Loss 5.8724   LearningRate 0.0336   Epoch: 8   Global Step: 34760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:56:33,658-Speed 3328.76 samples/sec   Loss 5.7824   LearningRate 0.0336   Epoch: 8   Global Step: 34770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 15:56:36,735-Speed 3328.86 samples/sec   Loss 5.7753   LearningRate 0.0336   Epoch: 8   Global Step: 34780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:56:39,820-Speed 3320.01 samples/sec   Loss 5.7688   LearningRate 0.0336   Epoch: 8   Global Step: 34790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:56:43,019-Speed 3201.77 samples/sec   Loss 5.9263   LearningRate 0.0335   Epoch: 8   Global Step: 34800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:56:46,105-Speed 3318.85 samples/sec   Loss 5.9153   LearningRate 0.0335   Epoch: 8   Global Step: 34810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:56:49,179-Speed 3331.81 samples/sec   Loss 5.9475   LearningRate 0.0335   Epoch: 8   Global Step: 34820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:56:52,252-Speed 3332.75 samples/sec   Loss 5.8719   LearningRate 0.0335   Epoch: 8   Global Step: 34830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:56:55,328-Speed 3330.84 samples/sec   Loss 6.0386   LearningRate 0.0335   Epoch: 8   Global Step: 34840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:56:58,409-Speed 3324.15 samples/sec   Loss 5.8161   LearningRate 0.0335   Epoch: 8   Global Step: 34850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:01,488-Speed 3326.14 samples/sec   Loss 5.8861   LearningRate 0.0335   Epoch: 8   Global Step: 34860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:04,575-Speed 3317.60 samples/sec   Loss 5.8703   LearningRate 0.0334   Epoch: 8   Global Step: 34870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:07,653-Speed 3328.12 samples/sec   Loss 5.9511   LearningRate 0.0334   Epoch: 8   Global Step: 34880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:57:10,727-Speed 3331.35 samples/sec   Loss 5.8034   LearningRate 0.0334   Epoch: 8   Global Step: 34890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:57:13,796-Speed 3337.27 samples/sec   Loss 5.8521   LearningRate 0.0334   Epoch: 8   Global Step: 34900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:16,884-Speed 3316.96 samples/sec   Loss 5.8960   LearningRate 0.0334   Epoch: 8   Global Step: 34910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:19,961-Speed 3329.21 samples/sec   Loss 5.8724   LearningRate 0.0334   Epoch: 8   Global Step: 34920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:23,052-Speed 3313.82 samples/sec   Loss 5.8859   LearningRate 0.0334   Epoch: 8   Global Step: 34930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:26,156-Speed 3299.62 samples/sec   Loss 5.9049   LearningRate 0.0334   Epoch: 8   Global Step: 34940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:29,245-Speed 3315.41 samples/sec   Loss 5.9876   LearningRate 0.0333   Epoch: 8   Global Step: 34950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:32,329-Speed 3321.50 samples/sec   Loss 5.9262   LearningRate 0.0333   Epoch: 8   Global Step: 34960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:35,414-Speed 3320.44 samples/sec   Loss 5.9505   LearningRate 0.0333   Epoch: 8   Global Step: 34970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:38,491-Speed 3328.91 samples/sec   Loss 5.9140   LearningRate 0.0333   Epoch: 8   Global Step: 34980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:41,571-Speed 3325.49 samples/sec   Loss 5.9748   LearningRate 0.0333   Epoch: 8   Global Step: 34990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:44,648-Speed 3328.48 samples/sec   Loss 5.8851   LearningRate 0.0333   Epoch: 8   Global Step: 35000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:57:47,731-Speed 3322.12 samples/sec   Loss 5.8642   LearningRate 0.0333   Epoch: 8   Global Step: 35010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:57:50,802-Speed 3335.20 samples/sec   Loss 5.9520   LearningRate 0.0332   Epoch: 8   Global Step: 35020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:53,890-Speed 3316.59 samples/sec   Loss 5.8281   LearningRate 0.0332   Epoch: 8   Global Step: 35030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:57:56,985-Speed 3310.05 samples/sec   Loss 5.7726   LearningRate 0.0332   Epoch: 8   Global Step: 35040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:00,066-Speed 3324.95 samples/sec   Loss 6.0227   LearningRate 0.0332   Epoch: 8   Global Step: 35050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:03,147-Speed 3323.66 samples/sec   Loss 5.8843   LearningRate 0.0332   Epoch: 8   Global Step: 35060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:06,232-Speed 3320.26 samples/sec   Loss 6.0192   LearningRate 0.0332   Epoch: 8   Global Step: 35070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:09,311-Speed 3326.85 samples/sec   Loss 6.0820   LearningRate 0.0332   Epoch: 8   Global Step: 35080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:12,392-Speed 3324.90 samples/sec   Loss 5.9972   LearningRate 0.0331   Epoch: 8   Global Step: 35090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:15,474-Speed 3322.53 samples/sec   Loss 5.9465   LearningRate 0.0331   Epoch: 8   Global Step: 35100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:18,557-Speed 3322.53 samples/sec   Loss 6.0409   LearningRate 0.0331   Epoch: 8   Global Step: 35110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:21,636-Speed 3327.06 samples/sec   Loss 5.9361   LearningRate 0.0331   Epoch: 8   Global Step: 35120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:58:24,733-Speed 3306.35 samples/sec   Loss 5.9823   LearningRate 0.0331   Epoch: 8   Global Step: 35130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:27,889-Speed 3245.34 samples/sec   Loss 5.9617   LearningRate 0.0331   Epoch: 8   Global Step: 35140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:30,967-Speed 3327.82 samples/sec   Loss 5.8471   LearningRate 0.0331   Epoch: 8   Global Step: 35150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:34,053-Speed 3318.74 samples/sec   Loss 5.9559   LearningRate 0.0330   Epoch: 8   Global Step: 35160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:37,135-Speed 3323.82 samples/sec   Loss 5.9773   LearningRate 0.0330   Epoch: 8   Global Step: 35170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:40,217-Speed 3323.33 samples/sec   Loss 5.9523   LearningRate 0.0330   Epoch: 8   Global Step: 35180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:43,301-Speed 3321.64 samples/sec   Loss 6.0491   LearningRate 0.0330   Epoch: 8   Global Step: 35190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:46,387-Speed 3318.15 samples/sec   Loss 5.9227   LearningRate 0.0330   Epoch: 8   Global Step: 35200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:49,486-Speed 3305.86 samples/sec   Loss 5.9220   LearningRate 0.0330   Epoch: 8   Global Step: 35210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:52,565-Speed 3325.45 samples/sec   Loss 5.9556   LearningRate 0.0330   Epoch: 8   Global Step: 35220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:58:55,648-Speed 3323.08 samples/sec   Loss 6.0419   LearningRate 0.0329   Epoch: 8   Global Step: 35230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:58:58,746-Speed 3306.23 samples/sec   Loss 5.9000   LearningRate 0.0329   Epoch: 8   Global Step: 35240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:59:01,830-Speed 3320.92 samples/sec   Loss 5.9811   LearningRate 0.0329   Epoch: 8   Global Step: 35250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:59:04,909-Speed 3326.67 samples/sec   Loss 5.9935   LearningRate 0.0329   Epoch: 8   Global Step: 35260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:59:07,991-Speed 3323.20 samples/sec   Loss 6.0885   LearningRate 0.0329   Epoch: 8   Global Step: 35270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:59:11,054-Speed 3344.69 samples/sec   Loss 6.0863   LearningRate 0.0329   Epoch: 8   Global Step: 35280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:59:14,133-Speed 3325.84 samples/sec   Loss 5.9137   LearningRate 0.0329   Epoch: 8   Global Step: 35290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:59:17,212-Speed 3327.04 samples/sec   Loss 5.8919   LearningRate 0.0329   Epoch: 8   Global Step: 35300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:59:20,287-Speed 3330.88 samples/sec   Loss 5.9170   LearningRate 0.0328   Epoch: 8   Global Step: 35310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:59:23,366-Speed 3326.15 samples/sec   Loss 6.0949   LearningRate 0.0328   Epoch: 8   Global Step: 35320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:59:26,469-Speed 3300.99 samples/sec   Loss 6.0060   LearningRate 0.0328   Epoch: 8   Global Step: 35330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:59:29,549-Speed 3325.51 samples/sec   Loss 6.0133   LearningRate 0.0328   Epoch: 8   Global Step: 35340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:59:32,623-Speed 3332.31 samples/sec   Loss 5.9193   LearningRate 0.0328   Epoch: 8   Global Step: 35350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:59:35,710-Speed 3317.98 samples/sec   Loss 5.8571   LearningRate 0.0328   Epoch: 8   Global Step: 35360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:59:38,791-Speed 3324.46 samples/sec   Loss 5.9781   LearningRate 0.0328   Epoch: 8   Global Step: 35370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:59:41,877-Speed 3319.01 samples/sec   Loss 5.9546   LearningRate 0.0327   Epoch: 8   Global Step: 35380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 15:59:44,942-Speed 3341.62 samples/sec   Loss 6.0022   LearningRate 0.0327   Epoch: 8   Global Step: 35390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:59:48,033-Speed 3313.58 samples/sec   Loss 6.0810   LearningRate 0.0327   Epoch: 8   Global Step: 35400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:59:51,130-Speed 3307.35 samples/sec   Loss 5.9784   LearningRate 0.0327   Epoch: 8   Global Step: 35410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:59:54,210-Speed 3326.21 samples/sec   Loss 5.9346   LearningRate 0.0327   Epoch: 8   Global Step: 35420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 15:59:57,289-Speed 3326.45 samples/sec   Loss 5.9201   LearningRate 0.0327   Epoch: 8   Global Step: 35430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:00,368-Speed 3325.76 samples/sec   Loss 5.8585   LearningRate 0.0327   Epoch: 8   Global Step: 35440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:03,451-Speed 3322.43 samples/sec   Loss 6.0510   LearningRate 0.0326   Epoch: 8   Global Step: 35450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:06,537-Speed 3318.98 samples/sec   Loss 6.0519   LearningRate 0.0326   Epoch: 8   Global Step: 35460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:09,618-Speed 3323.81 samples/sec   Loss 6.0040   LearningRate 0.0326   Epoch: 8   Global Step: 35470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:12,707-Speed 3315.91 samples/sec   Loss 6.0116   LearningRate 0.0326   Epoch: 8   Global Step: 35480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:15,789-Speed 3323.44 samples/sec   Loss 5.9019   LearningRate 0.0326   Epoch: 8   Global Step: 35490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:00:18,853-Speed 3343.41 samples/sec   Loss 6.0117   LearningRate 0.0326   Epoch: 8   Global Step: 35500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:21,961-Speed 3295.19 samples/sec   Loss 5.9964   LearningRate 0.0326   Epoch: 8   Global Step: 35510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:25,043-Speed 3323.27 samples/sec   Loss 5.9065   LearningRate 0.0325   Epoch: 8   Global Step: 35520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:28,122-Speed 3326.33 samples/sec   Loss 5.8831   LearningRate 0.0325   Epoch: 8   Global Step: 35530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:31,206-Speed 3320.76 samples/sec   Loss 6.0596   LearningRate 0.0325   Epoch: 8   Global Step: 35540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:34,285-Speed 3327.24 samples/sec   Loss 5.9368   LearningRate 0.0325   Epoch: 8   Global Step: 35550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:37,367-Speed 3323.17 samples/sec   Loss 6.0078   LearningRate 0.0325   Epoch: 8   Global Step: 35560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:40,453-Speed 3319.28 samples/sec   Loss 5.9807   LearningRate 0.0325   Epoch: 8   Global Step: 35570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:43,539-Speed 3318.50 samples/sec   Loss 5.9731   LearningRate 0.0325   Epoch: 8   Global Step: 35580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:46,621-Speed 3323.00 samples/sec   Loss 5.9185   LearningRate 0.0325   Epoch: 8   Global Step: 35590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:49,690-Speed 3337.88 samples/sec   Loss 6.0478   LearningRate 0.0324   Epoch: 8   Global Step: 35600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:52,810-Speed 3282.81 samples/sec   Loss 5.9333   LearningRate 0.0324   Epoch: 8   Global Step: 35610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:55,893-Speed 3321.51 samples/sec   Loss 6.0511   LearningRate 0.0324   Epoch: 8   Global Step: 35620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:00:58,979-Speed 3318.76 samples/sec   Loss 5.9288   LearningRate 0.0324   Epoch: 8   Global Step: 35630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:02,070-Speed 3314.36 samples/sec   Loss 6.0272   LearningRate 0.0324   Epoch: 8   Global Step: 35640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:05,152-Speed 3323.15 samples/sec   Loss 5.9824   LearningRate 0.0324   Epoch: 8   Global Step: 35650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:08,244-Speed 3312.40 samples/sec   Loss 5.9935   LearningRate 0.0324   Epoch: 8   Global Step: 35660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:11,325-Speed 3324.26 samples/sec   Loss 5.9041   LearningRate 0.0323   Epoch: 8   Global Step: 35670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:14,413-Speed 3316.81 samples/sec   Loss 6.0021   LearningRate 0.0323   Epoch: 8   Global Step: 35680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:17,499-Speed 3319.53 samples/sec   Loss 5.9121   LearningRate 0.0323   Epoch: 8   Global Step: 35690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:20,580-Speed 3324.29 samples/sec   Loss 6.0824   LearningRate 0.0323   Epoch: 8   Global Step: 35700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:01:23,644-Speed 3342.52 samples/sec   Loss 6.0493   LearningRate 0.0323   Epoch: 8   Global Step: 35710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:26,722-Speed 3327.86 samples/sec   Loss 6.0279   LearningRate 0.0323   Epoch: 8   Global Step: 35720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:29,812-Speed 3314.64 samples/sec   Loss 5.8698   LearningRate 0.0323   Epoch: 8   Global Step: 35730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:32,891-Speed 3326.35 samples/sec   Loss 5.9402   LearningRate 0.0322   Epoch: 8   Global Step: 35740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:35,974-Speed 3322.50 samples/sec   Loss 6.0120   LearningRate 0.0322   Epoch: 8   Global Step: 35750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:39,065-Speed 3313.60 samples/sec   Loss 5.9443   LearningRate 0.0322   Epoch: 8   Global Step: 35760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:42,149-Speed 3321.44 samples/sec   Loss 5.9061   LearningRate 0.0322   Epoch: 8   Global Step: 35770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:45,234-Speed 3319.53 samples/sec   Loss 6.0904   LearningRate 0.0322   Epoch: 8   Global Step: 35780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:48,320-Speed 3320.02 samples/sec   Loss 5.9719   LearningRate 0.0322   Epoch: 8   Global Step: 35790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:51,469-Speed 3251.54 samples/sec   Loss 6.0136   LearningRate 0.0322   Epoch: 8   Global Step: 35800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:01:54,618-Speed 3253.02 samples/sec   Loss 5.8902   LearningRate 0.0321   Epoch: 8   Global Step: 35810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:01:57,699-Speed 3324.42 samples/sec   Loss 5.9720   LearningRate 0.0321   Epoch: 8   Global Step: 35820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:02:00,786-Speed 3318.40 samples/sec   Loss 5.9896   LearningRate 0.0321   Epoch: 8   Global Step: 35830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:02:03,857-Speed 3335.37 samples/sec   Loss 5.9803   LearningRate 0.0321   Epoch: 8   Global Step: 35840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:06,941-Speed 3320.63 samples/sec   Loss 5.9309   LearningRate 0.0321   Epoch: 8   Global Step: 35850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:10,024-Speed 3322.24 samples/sec   Loss 6.0347   LearningRate 0.0321   Epoch: 8   Global Step: 35860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:13,120-Speed 3308.16 samples/sec   Loss 5.9432   LearningRate 0.0321   Epoch: 8   Global Step: 35870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:16,205-Speed 3320.38 samples/sec   Loss 5.9706   LearningRate 0.0321   Epoch: 8   Global Step: 35880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:19,289-Speed 3321.45 samples/sec   Loss 6.0620   LearningRate 0.0320   Epoch: 8   Global Step: 35890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:22,372-Speed 3322.81 samples/sec   Loss 5.8164   LearningRate 0.0320   Epoch: 8   Global Step: 35900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:25,452-Speed 3325.21 samples/sec   Loss 6.0171   LearningRate 0.0320   Epoch: 8   Global Step: 35910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:28,569-Speed 3285.84 samples/sec   Loss 6.1327   LearningRate 0.0320   Epoch: 8   Global Step: 35920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:31,652-Speed 3322.47 samples/sec   Loss 5.9033   LearningRate 0.0320   Epoch: 8   Global Step: 35930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:34,715-Speed 3343.80 samples/sec   Loss 5.8974   LearningRate 0.0320   Epoch: 8   Global Step: 35940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:37,851-Speed 3265.99 samples/sec   Loss 6.0446   LearningRate 0.0320   Epoch: 8   Global Step: 35950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:41,013-Speed 3240.20 samples/sec   Loss 6.1223   LearningRate 0.0319   Epoch: 8   Global Step: 35960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:44,107-Speed 3310.50 samples/sec   Loss 5.8248   LearningRate 0.0319   Epoch: 8   Global Step: 35970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:47,195-Speed 3316.76 samples/sec   Loss 5.9485   LearningRate 0.0319   Epoch: 8   Global Step: 35980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:50,278-Speed 3322.63 samples/sec   Loss 6.0330   LearningRate 0.0319   Epoch: 8   Global Step: 35990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:02:53,377-Speed 3304.23 samples/sec   Loss 6.0149   LearningRate 0.0319   Epoch: 8   Global Step: 36000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:03:37,014-[lfw][36000]XNorm: 23.121268
Training: 2022-04-26 16:03:37,015-[lfw][36000]Accuracy-Flip: 0.99733+-0.00281
Training: 2022-04-26 16:03:37,015-[lfw][36000]Accuracy-Highest: 0.99783
Training: 2022-04-26 16:04:27,773-[cfp_fp][36000]XNorm: 21.942977
Training: 2022-04-26 16:04:27,774-[cfp_fp][36000]Accuracy-Flip: 0.98543+-0.00588
Training: 2022-04-26 16:04:27,774-[cfp_fp][36000]Accuracy-Highest: 0.98543
Training: 2022-04-26 16:05:11,538-[agedb_30][36000]XNorm: 22.965291
Training: 2022-04-26 16:05:11,539-[agedb_30][36000]Accuracy-Flip: 0.97350+-0.00883
Training: 2022-04-26 16:05:11,539-[agedb_30][36000]Accuracy-Highest: 0.97350
Training: 2022-04-26 16:05:14,632-Speed 72.49 samples/sec   Loss 6.0614   LearningRate 0.0319   Epoch: 8   Global Step: 36010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:05:17,719-Speed 3318.03 samples/sec   Loss 6.0984   LearningRate 0.0319   Epoch: 8   Global Step: 36020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:05:20,814-Speed 3310.02 samples/sec   Loss 6.1009   LearningRate 0.0318   Epoch: 8   Global Step: 36030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:05:23,896-Speed 3322.52 samples/sec   Loss 5.9998   LearningRate 0.0318   Epoch: 8   Global Step: 36040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:05:26,984-Speed 3316.89 samples/sec   Loss 5.9987   LearningRate 0.0318   Epoch: 8   Global Step: 36050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:05:30,061-Speed 3329.23 samples/sec   Loss 5.9752   LearningRate 0.0318   Epoch: 8   Global Step: 36060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:05:33,121-Speed 3347.18 samples/sec   Loss 6.0704   LearningRate 0.0318   Epoch: 8   Global Step: 36070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:05:36,200-Speed 3325.82 samples/sec   Loss 6.0135   LearningRate 0.0318   Epoch: 8   Global Step: 36080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:05:39,287-Speed 3317.94 samples/sec   Loss 5.8872   LearningRate 0.0318   Epoch: 8   Global Step: 36090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:05:42,378-Speed 3313.84 samples/sec   Loss 6.1022   LearningRate 0.0318   Epoch: 8   Global Step: 36100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:05:45,456-Speed 3327.51 samples/sec   Loss 6.0417   LearningRate 0.0317   Epoch: 8   Global Step: 36110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:05:48,539-Speed 3322.95 samples/sec   Loss 6.0201   LearningRate 0.0317   Epoch: 8   Global Step: 36120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:05:51,628-Speed 3316.18 samples/sec   Loss 5.9826   LearningRate 0.0317   Epoch: 8   Global Step: 36130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:05:54,708-Speed 3325.20 samples/sec   Loss 5.9204   LearningRate 0.0317   Epoch: 8   Global Step: 36140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:05:57,797-Speed 3316.09 samples/sec   Loss 6.0073   LearningRate 0.0317   Epoch: 8   Global Step: 36150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:06:00,881-Speed 3321.07 samples/sec   Loss 5.8688   LearningRate 0.0317   Epoch: 8   Global Step: 36160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:06:03,949-Speed 3338.30 samples/sec   Loss 6.0300   LearningRate 0.0317   Epoch: 8   Global Step: 36170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:06:07,033-Speed 3320.58 samples/sec   Loss 5.9963   LearningRate 0.0316   Epoch: 8   Global Step: 36180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:06:10,140-Speed 3296.49 samples/sec   Loss 5.9542   LearningRate 0.0316   Epoch: 8   Global Step: 36190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:06:13,240-Speed 3304.79 samples/sec   Loss 6.0843   LearningRate 0.0316   Epoch: 8   Global Step: 36200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:06:16,331-Speed 3313.54 samples/sec   Loss 6.1378   LearningRate 0.0316   Epoch: 8   Global Step: 36210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:06:19,419-Speed 3316.43 samples/sec   Loss 6.1019   LearningRate 0.0316   Epoch: 8   Global Step: 36220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:06:22,510-Speed 3314.34 samples/sec   Loss 6.0324   LearningRate 0.0316   Epoch: 8   Global Step: 36230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:06:25,589-Speed 3325.98 samples/sec   Loss 5.9759   LearningRate 0.0316   Epoch: 8   Global Step: 36240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:06:28,680-Speed 3314.10 samples/sec   Loss 5.8869   LearningRate 0.0315   Epoch: 8   Global Step: 36250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:06:31,769-Speed 3315.41 samples/sec   Loss 5.9975   LearningRate 0.0315   Epoch: 8   Global Step: 36260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:06:34,855-Speed 3318.93 samples/sec   Loss 6.0189   LearningRate 0.0315   Epoch: 8   Global Step: 36270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:06:37,944-Speed 3315.24 samples/sec   Loss 6.0108   LearningRate 0.0315   Epoch: 8   Global Step: 36280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:06:41,043-Speed 3305.41 samples/sec   Loss 5.8828   LearningRate 0.0315   Epoch: 8   Global Step: 36290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:06:44,127-Speed 3321.22 samples/sec   Loss 5.8744   LearningRate 0.0315   Epoch: 8   Global Step: 36300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:06:47,215-Speed 3316.60 samples/sec   Loss 6.1051   LearningRate 0.0315   Epoch: 8   Global Step: 36310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:06:50,297-Speed 3323.60 samples/sec   Loss 5.8486   LearningRate 0.0315   Epoch: 8   Global Step: 36320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:06:53,387-Speed 3314.48 samples/sec   Loss 6.0438   LearningRate 0.0314   Epoch: 8   Global Step: 36330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:06:56,451-Speed 3342.94 samples/sec   Loss 6.0395   LearningRate 0.0314   Epoch: 8   Global Step: 36340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:06:59,531-Speed 3326.02 samples/sec   Loss 5.9313   LearningRate 0.0314   Epoch: 8   Global Step: 36350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:07:02,624-Speed 3311.04 samples/sec   Loss 6.0841   LearningRate 0.0314   Epoch: 8   Global Step: 36360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:07:05,702-Speed 3327.18 samples/sec   Loss 5.9480   LearningRate 0.0314   Epoch: 8   Global Step: 36370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:07:08,797-Speed 3310.14 samples/sec   Loss 5.9336   LearningRate 0.0314   Epoch: 8   Global Step: 36380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:07:11,880-Speed 3322.03 samples/sec   Loss 5.9561   LearningRate 0.0314   Epoch: 8   Global Step: 36390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:07:14,967-Speed 3318.24 samples/sec   Loss 6.0236   LearningRate 0.0313   Epoch: 8   Global Step: 36400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:07:18,045-Speed 3328.58 samples/sec   Loss 5.9769   LearningRate 0.0313   Epoch: 8   Global Step: 36410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:07:21,121-Speed 3329.57 samples/sec   Loss 5.9213   LearningRate 0.0313   Epoch: 8   Global Step: 36420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:07:24,200-Speed 3326.26 samples/sec   Loss 6.0393   LearningRate 0.0313   Epoch: 8   Global Step: 36430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:07:27,288-Speed 3317.70 samples/sec   Loss 6.1077   LearningRate 0.0313   Epoch: 8   Global Step: 36440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:07:30,382-Speed 3309.80 samples/sec   Loss 6.1179   LearningRate 0.0313   Epoch: 8   Global Step: 36450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:07:33,464-Speed 3323.18 samples/sec   Loss 6.0644   LearningRate 0.0313   Epoch: 8   Global Step: 36460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:07:36,559-Speed 3309.10 samples/sec   Loss 6.1004   LearningRate 0.0312   Epoch: 8   Global Step: 36470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:07:39,647-Speed 3317.35 samples/sec   Loss 6.0433   LearningRate 0.0312   Epoch: 8   Global Step: 36480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:07:42,725-Speed 3327.28 samples/sec   Loss 5.9439   LearningRate 0.0312   Epoch: 8   Global Step: 36490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:07:45,804-Speed 3326.59 samples/sec   Loss 5.9263   LearningRate 0.0312   Epoch: 8   Global Step: 36500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:07:48,925-Speed 3281.95 samples/sec   Loss 5.9421   LearningRate 0.0312   Epoch: 8   Global Step: 36510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:07:52,144-Speed 3182.25 samples/sec   Loss 6.0487   LearningRate 0.0312   Epoch: 8   Global Step: 36520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:07:55,320-Speed 3225.04 samples/sec   Loss 5.8972   LearningRate 0.0312   Epoch: 8   Global Step: 36530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:07:58,384-Speed 3342.19 samples/sec   Loss 5.9358   LearningRate 0.0312   Epoch: 8   Global Step: 36540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:01,468-Speed 3322.04 samples/sec   Loss 5.9792   LearningRate 0.0311   Epoch: 8   Global Step: 36550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:04,549-Speed 3323.98 samples/sec   Loss 5.9849   LearningRate 0.0311   Epoch: 8   Global Step: 36560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:07,627-Speed 3327.19 samples/sec   Loss 6.0590   LearningRate 0.0311   Epoch: 8   Global Step: 36570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:10,777-Speed 3251.90 samples/sec   Loss 6.0373   LearningRate 0.0311   Epoch: 8   Global Step: 36580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:13,920-Speed 3257.98 samples/sec   Loss 6.0408   LearningRate 0.0311   Epoch: 8   Global Step: 36590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:17,046-Speed 3277.09 samples/sec   Loss 5.9396   LearningRate 0.0311   Epoch: 8   Global Step: 36600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:20,125-Speed 3326.40 samples/sec   Loss 6.0756   LearningRate 0.0311   Epoch: 8   Global Step: 36610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:23,211-Speed 3318.65 samples/sec   Loss 5.8316   LearningRate 0.0310   Epoch: 8   Global Step: 36620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:26,293-Speed 3323.71 samples/sec   Loss 5.9374   LearningRate 0.0310   Epoch: 8   Global Step: 36630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:29,373-Speed 3325.64 samples/sec   Loss 5.9725   LearningRate 0.0310   Epoch: 8   Global Step: 36640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:08:32,441-Speed 3338.66 samples/sec   Loss 5.9054   LearningRate 0.0310   Epoch: 8   Global Step: 36650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:35,526-Speed 3320.18 samples/sec   Loss 6.0201   LearningRate 0.0310   Epoch: 8   Global Step: 36660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:38,620-Speed 3310.09 samples/sec   Loss 5.8585   LearningRate 0.0310   Epoch: 8   Global Step: 36670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:41,709-Speed 3315.59 samples/sec   Loss 6.0653   LearningRate 0.0310   Epoch: 8   Global Step: 36680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:44,793-Speed 3321.73 samples/sec   Loss 6.0267   LearningRate 0.0310   Epoch: 8   Global Step: 36690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:47,881-Speed 3316.52 samples/sec   Loss 6.0279   LearningRate 0.0309   Epoch: 8   Global Step: 36700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:50,965-Speed 3321.20 samples/sec   Loss 5.9698   LearningRate 0.0309   Epoch: 8   Global Step: 36710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:54,048-Speed 3322.48 samples/sec   Loss 5.9685   LearningRate 0.0309   Epoch: 8   Global Step: 36720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:08:57,128-Speed 3325.71 samples/sec   Loss 5.9596   LearningRate 0.0309   Epoch: 8   Global Step: 36730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:00,221-Speed 3310.62 samples/sec   Loss 5.9478   LearningRate 0.0309   Epoch: 8   Global Step: 36740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:03,319-Speed 3306.87 samples/sec   Loss 5.9854   LearningRate 0.0309   Epoch: 8   Global Step: 36750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:09:06,385-Speed 3339.57 samples/sec   Loss 6.0483   LearningRate 0.0309   Epoch: 8   Global Step: 36760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:09,474-Speed 3316.72 samples/sec   Loss 5.9930   LearningRate 0.0308   Epoch: 8   Global Step: 36770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:12,558-Speed 3321.57 samples/sec   Loss 5.8729   LearningRate 0.0308   Epoch: 8   Global Step: 36780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:15,641-Speed 3321.58 samples/sec   Loss 5.9431   LearningRate 0.0308   Epoch: 8   Global Step: 36790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:18,723-Speed 3324.04 samples/sec   Loss 5.9468   LearningRate 0.0308   Epoch: 8   Global Step: 36800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:21,802-Speed 3325.62 samples/sec   Loss 5.9581   LearningRate 0.0308   Epoch: 8   Global Step: 36810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:24,898-Speed 3308.63 samples/sec   Loss 5.9142   LearningRate 0.0308   Epoch: 8   Global Step: 36820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:27,980-Speed 3323.42 samples/sec   Loss 6.0322   LearningRate 0.0308   Epoch: 8   Global Step: 36830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:31,065-Speed 3320.36 samples/sec   Loss 5.9554   LearningRate 0.0308   Epoch: 8   Global Step: 36840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:34,160-Speed 3309.02 samples/sec   Loss 5.9046   LearningRate 0.0307   Epoch: 8   Global Step: 36850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:37,247-Speed 3318.35 samples/sec   Loss 6.0122   LearningRate 0.0307   Epoch: 8   Global Step: 36860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:09:40,310-Speed 3343.36 samples/sec   Loss 5.8953   LearningRate 0.0307   Epoch: 8   Global Step: 36870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:43,393-Speed 3321.97 samples/sec   Loss 5.9727   LearningRate 0.0307   Epoch: 8   Global Step: 36880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:46,476-Speed 3321.89 samples/sec   Loss 5.9586   LearningRate 0.0307   Epoch: 8   Global Step: 36890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:49,557-Speed 3324.93 samples/sec   Loss 5.8019   LearningRate 0.0307   Epoch: 8   Global Step: 36900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:52,650-Speed 3311.56 samples/sec   Loss 5.9911   LearningRate 0.0307   Epoch: 8   Global Step: 36910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:55,740-Speed 3314.98 samples/sec   Loss 5.8675   LearningRate 0.0306   Epoch: 8   Global Step: 36920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:09:58,820-Speed 3324.79 samples/sec   Loss 5.9435   LearningRate 0.0306   Epoch: 8   Global Step: 36930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:01,907-Speed 3318.31 samples/sec   Loss 5.9562   LearningRate 0.0306   Epoch: 8   Global Step: 36940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:04,993-Speed 3318.77 samples/sec   Loss 5.9012   LearningRate 0.0306   Epoch: 8   Global Step: 36950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:08,081-Speed 3316.71 samples/sec   Loss 5.9034   LearningRate 0.0306   Epoch: 8   Global Step: 36960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:11,154-Speed 3333.03 samples/sec   Loss 5.9198   LearningRate 0.0306   Epoch: 8   Global Step: 36970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:14,245-Speed 3313.72 samples/sec   Loss 5.9550   LearningRate 0.0306   Epoch: 8   Global Step: 36980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:17,332-Speed 3317.59 samples/sec   Loss 5.9925   LearningRate 0.0306   Epoch: 8   Global Step: 36990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:20,417-Speed 3320.96 samples/sec   Loss 5.9902   LearningRate 0.0305   Epoch: 8   Global Step: 37000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:23,511-Speed 3310.55 samples/sec   Loss 5.9849   LearningRate 0.0305   Epoch: 8   Global Step: 37010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:26,594-Speed 3322.36 samples/sec   Loss 5.9282   LearningRate 0.0305   Epoch: 8   Global Step: 37020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:29,684-Speed 3314.95 samples/sec   Loss 6.0568   LearningRate 0.0305   Epoch: 8   Global Step: 37030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:32,783-Speed 3304.68 samples/sec   Loss 5.9664   LearningRate 0.0305   Epoch: 8   Global Step: 37040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:35,873-Speed 3314.86 samples/sec   Loss 6.0501   LearningRate 0.0305   Epoch: 8   Global Step: 37050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:38,963-Speed 3314.41 samples/sec   Loss 5.9678   LearningRate 0.0305   Epoch: 8   Global Step: 37060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:42,086-Speed 3279.05 samples/sec   Loss 6.0113   LearningRate 0.0304   Epoch: 8   Global Step: 37070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:45,176-Speed 3315.17 samples/sec   Loss 5.9455   LearningRate 0.0304   Epoch: 8   Global Step: 37080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:48,272-Speed 3308.80 samples/sec   Loss 5.8583   LearningRate 0.0304   Epoch: 8   Global Step: 37090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:51,364-Speed 3312.38 samples/sec   Loss 5.9100   LearningRate 0.0304   Epoch: 8   Global Step: 37100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:54,547-Speed 3217.48 samples/sec   Loss 6.0563   LearningRate 0.0304   Epoch: 8   Global Step: 37110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:10:57,632-Speed 3319.60 samples/sec   Loss 5.8130   LearningRate 0.0304   Epoch: 8   Global Step: 37120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:11:00,725-Speed 3312.07 samples/sec   Loss 5.9040   LearningRate 0.0304   Epoch: 8   Global Step: 37130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:11:03,826-Speed 3302.68 samples/sec   Loss 5.8350   LearningRate 0.0303   Epoch: 8   Global Step: 37140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:11:06,964-Speed 3264.05 samples/sec   Loss 5.9281   LearningRate 0.0303   Epoch: 8   Global Step: 37150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:11:10,059-Speed 3308.36 samples/sec   Loss 5.8988   LearningRate 0.0303   Epoch: 8   Global Step: 37160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:11:13,147-Speed 3318.20 samples/sec   Loss 5.8231   LearningRate 0.0303   Epoch: 8   Global Step: 37170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:11:16,259-Speed 3291.39 samples/sec   Loss 5.9870   LearningRate 0.0303   Epoch: 8   Global Step: 37180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:11:19,346-Speed 3317.64 samples/sec   Loss 5.9727   LearningRate 0.0303   Epoch: 8   Global Step: 37190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:11:22,441-Speed 3308.40 samples/sec   Loss 5.8929   LearningRate 0.0303   Epoch: 8   Global Step: 37200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:11:25,574-Speed 3269.99 samples/sec   Loss 5.8753   LearningRate 0.0303   Epoch: 8   Global Step: 37210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:11:37,865-Speed 833.14 samples/sec   Loss 5.1877   LearningRate 0.0302   Epoch: 9   Global Step: 37220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:11:41,162-Speed 3107.68 samples/sec   Loss 4.3750   LearningRate 0.0302   Epoch: 9   Global Step: 37230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:11:44,450-Speed 3114.49 samples/sec   Loss 4.3723   LearningRate 0.0302   Epoch: 9   Global Step: 37240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:11:47,833-Speed 3027.42 samples/sec   Loss 4.3633   LearningRate 0.0302   Epoch: 9   Global Step: 37250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:11:51,108-Speed 3127.66 samples/sec   Loss 4.3962   LearningRate 0.0302   Epoch: 9   Global Step: 37260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:11:54,207-Speed 3305.39 samples/sec   Loss 4.3343   LearningRate 0.0302   Epoch: 9   Global Step: 37270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:11:57,287-Speed 3325.97 samples/sec   Loss 4.3678   LearningRate 0.0302   Epoch: 9   Global Step: 37280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:12:00,368-Speed 3324.63 samples/sec   Loss 4.3495   LearningRate 0.0302   Epoch: 9   Global Step: 37290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:12:03,448-Speed 3324.48 samples/sec   Loss 4.3364   LearningRate 0.0301   Epoch: 9   Global Step: 37300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:12:06,527-Speed 3326.49 samples/sec   Loss 4.3861   LearningRate 0.0301   Epoch: 9   Global Step: 37310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:12:09,614-Speed 3318.46 samples/sec   Loss 4.4444   LearningRate 0.0301   Epoch: 9   Global Step: 37320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:12:12,706-Speed 3312.50 samples/sec   Loss 4.4638   LearningRate 0.0301   Epoch: 9   Global Step: 37330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:12:15,786-Speed 3325.24 samples/sec   Loss 4.4790   LearningRate 0.0301   Epoch: 9   Global Step: 37340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:12:18,874-Speed 3316.57 samples/sec   Loss 4.3528   LearningRate 0.0301   Epoch: 9   Global Step: 37350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:12:21,938-Speed 3342.82 samples/sec   Loss 4.4990   LearningRate 0.0301   Epoch: 9   Global Step: 37360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:12:25,040-Speed 3302.70 samples/sec   Loss 4.4554   LearningRate 0.0300   Epoch: 9   Global Step: 37370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:12:28,127-Speed 3317.30 samples/sec   Loss 4.4258   LearningRate 0.0300   Epoch: 9   Global Step: 37380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:12:31,215-Speed 3317.05 samples/sec   Loss 4.5482   LearningRate 0.0300   Epoch: 9   Global Step: 37390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:12:34,304-Speed 3316.12 samples/sec   Loss 4.4570   LearningRate 0.0300   Epoch: 9   Global Step: 37400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:12:37,398-Speed 3310.24 samples/sec   Loss 4.5064   LearningRate 0.0300   Epoch: 9   Global Step: 37410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:12:40,483-Speed 3319.77 samples/sec   Loss 4.4786   LearningRate 0.0300   Epoch: 9   Global Step: 37420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:12:43,570-Speed 3318.17 samples/sec   Loss 4.5607   LearningRate 0.0300   Epoch: 9   Global Step: 37430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:12:46,657-Speed 3317.16 samples/sec   Loss 4.5921   LearningRate 0.0300   Epoch: 9   Global Step: 37440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:12:49,749-Speed 3312.74 samples/sec   Loss 4.5514   LearningRate 0.0299   Epoch: 9   Global Step: 37450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:12:52,836-Speed 3317.99 samples/sec   Loss 4.6437   LearningRate 0.0299   Epoch: 9   Global Step: 37460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:12:55,930-Speed 3310.63 samples/sec   Loss 4.5643   LearningRate 0.0299   Epoch: 9   Global Step: 37470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:12:59,023-Speed 3311.13 samples/sec   Loss 4.4846   LearningRate 0.0299   Epoch: 9   Global Step: 37480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:02,116-Speed 3312.33 samples/sec   Loss 4.5719   LearningRate 0.0299   Epoch: 9   Global Step: 37490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:05,209-Speed 3311.11 samples/sec   Loss 4.6470   LearningRate 0.0299   Epoch: 9   Global Step: 37500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:08,299-Speed 3314.79 samples/sec   Loss 4.5566   LearningRate 0.0299   Epoch: 9   Global Step: 37510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:11,397-Speed 3305.50 samples/sec   Loss 4.6411   LearningRate 0.0298   Epoch: 9   Global Step: 37520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:14,494-Speed 3307.32 samples/sec   Loss 4.5524   LearningRate 0.0298   Epoch: 9   Global Step: 37530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:17,590-Speed 3308.49 samples/sec   Loss 4.6819   LearningRate 0.0298   Epoch: 9   Global Step: 37540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:20,681-Speed 3313.29 samples/sec   Loss 4.6296   LearningRate 0.0298   Epoch: 9   Global Step: 37550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:23,775-Speed 3309.97 samples/sec   Loss 4.6526   LearningRate 0.0298   Epoch: 9   Global Step: 37560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:13:26,872-Speed 3307.50 samples/sec   Loss 4.6230   LearningRate 0.0298   Epoch: 9   Global Step: 37570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:13:30,354-Speed 2941.91 samples/sec   Loss 4.6029   LearningRate 0.0298   Epoch: 9   Global Step: 37580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:33,456-Speed 3301.90 samples/sec   Loss 4.6449   LearningRate 0.0298   Epoch: 9   Global Step: 37590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:36,552-Speed 3308.45 samples/sec   Loss 4.7074   LearningRate 0.0297   Epoch: 9   Global Step: 37600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:39,644-Speed 3312.31 samples/sec   Loss 4.6725   LearningRate 0.0297   Epoch: 9   Global Step: 37610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:42,744-Speed 3303.90 samples/sec   Loss 4.6257   LearningRate 0.0297   Epoch: 9   Global Step: 37620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:45,835-Speed 3313.05 samples/sec   Loss 4.6441   LearningRate 0.0297   Epoch: 9   Global Step: 37630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:48,930-Speed 3310.08 samples/sec   Loss 4.5769   LearningRate 0.0297   Epoch: 9   Global Step: 37640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:52,020-Speed 3314.71 samples/sec   Loss 4.6926   LearningRate 0.0297   Epoch: 9   Global Step: 37650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:55,110-Speed 3314.72 samples/sec   Loss 4.6605   LearningRate 0.0297   Epoch: 9   Global Step: 37660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:13:58,212-Speed 3301.80 samples/sec   Loss 4.7382   LearningRate 0.0296   Epoch: 9   Global Step: 37670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:01,285-Speed 3332.36 samples/sec   Loss 4.7716   LearningRate 0.0296   Epoch: 9   Global Step: 37680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:04,376-Speed 3313.56 samples/sec   Loss 4.7975   LearningRate 0.0296   Epoch: 9   Global Step: 37690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:07,465-Speed 3316.53 samples/sec   Loss 4.6816   LearningRate 0.0296   Epoch: 9   Global Step: 37700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:10,556-Speed 3313.40 samples/sec   Loss 4.7190   LearningRate 0.0296   Epoch: 9   Global Step: 37710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:13,640-Speed 3321.18 samples/sec   Loss 4.6721   LearningRate 0.0296   Epoch: 9   Global Step: 37720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:16,732-Speed 3312.16 samples/sec   Loss 4.8367   LearningRate 0.0296   Epoch: 9   Global Step: 37730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:19,828-Speed 3308.39 samples/sec   Loss 4.6401   LearningRate 0.0296   Epoch: 9   Global Step: 37740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:22,918-Speed 3315.38 samples/sec   Loss 4.8113   LearningRate 0.0295   Epoch: 9   Global Step: 37750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:26,043-Speed 3276.50 samples/sec   Loss 4.7386   LearningRate 0.0295   Epoch: 9   Global Step: 37760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:29,132-Speed 3316.17 samples/sec   Loss 4.8287   LearningRate 0.0295   Epoch: 9   Global Step: 37770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:32,216-Speed 3321.07 samples/sec   Loss 4.8496   LearningRate 0.0295   Epoch: 9   Global Step: 37780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:14:35,304-Speed 3316.68 samples/sec   Loss 4.7318   LearningRate 0.0295   Epoch: 9   Global Step: 37790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:14:38,374-Speed 3336.83 samples/sec   Loss 4.8325   LearningRate 0.0295   Epoch: 9   Global Step: 37800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:41,469-Speed 3309.21 samples/sec   Loss 4.8018   LearningRate 0.0295   Epoch: 9   Global Step: 37810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:44,559-Speed 3314.31 samples/sec   Loss 4.8882   LearningRate 0.0295   Epoch: 9   Global Step: 37820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:47,648-Speed 3315.74 samples/sec   Loss 4.7755   LearningRate 0.0294   Epoch: 9   Global Step: 37830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:50,811-Speed 3238.46 samples/sec   Loss 4.7867   LearningRate 0.0294   Epoch: 9   Global Step: 37840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:53,940-Speed 3273.45 samples/sec   Loss 4.8588   LearningRate 0.0294   Epoch: 9   Global Step: 37850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:14:57,036-Speed 3308.23 samples/sec   Loss 4.7611   LearningRate 0.0294   Epoch: 9   Global Step: 37860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:15:00,134-Speed 3306.53 samples/sec   Loss 4.7909   LearningRate 0.0294   Epoch: 9   Global Step: 37870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:15:03,227-Speed 3310.95 samples/sec   Loss 4.8174   LearningRate 0.0294   Epoch: 9   Global Step: 37880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:15:06,317-Speed 3314.61 samples/sec   Loss 4.8945   LearningRate 0.0294   Epoch: 9   Global Step: 37890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:15:09,394-Speed 3329.06 samples/sec   Loss 4.8808   LearningRate 0.0293   Epoch: 9   Global Step: 37900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:15:12,483-Speed 3314.98 samples/sec   Loss 4.8544   LearningRate 0.0293   Epoch: 9   Global Step: 37910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:15:15,579-Speed 3308.25 samples/sec   Loss 4.9473   LearningRate 0.0293   Epoch: 9   Global Step: 37920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:15:18,665-Speed 3318.89 samples/sec   Loss 4.7462   LearningRate 0.0293   Epoch: 9   Global Step: 37930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:15:21,752-Speed 3318.57 samples/sec   Loss 4.8838   LearningRate 0.0293   Epoch: 9   Global Step: 37940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:15:24,841-Speed 3316.07 samples/sec   Loss 4.8565   LearningRate 0.0293   Epoch: 9   Global Step: 37950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:15:27,934-Speed 3310.98 samples/sec   Loss 4.8609   LearningRate 0.0293   Epoch: 9   Global Step: 37960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:15:31,026-Speed 3313.00 samples/sec   Loss 4.8209   LearningRate 0.0293   Epoch: 9   Global Step: 37970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:15:34,114-Speed 3316.92 samples/sec   Loss 4.9321   LearningRate 0.0292   Epoch: 9   Global Step: 37980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:15:37,206-Speed 3312.08 samples/sec   Loss 4.9870   LearningRate 0.0292   Epoch: 9   Global Step: 37990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:15:40,279-Speed 3333.64 samples/sec   Loss 4.9338   LearningRate 0.0292   Epoch: 9   Global Step: 38000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:16:23,939-[lfw][38000]XNorm: 21.440585
Training: 2022-04-26 16:16:23,940-[lfw][38000]Accuracy-Flip: 0.99733+-0.00318
Training: 2022-04-26 16:16:23,940-[lfw][38000]Accuracy-Highest: 0.99783
Training: 2022-04-26 16:17:14,797-[cfp_fp][38000]XNorm: 20.358986
Training: 2022-04-26 16:17:14,798-[cfp_fp][38000]Accuracy-Flip: 0.98571+-0.00609
Training: 2022-04-26 16:17:14,798-[cfp_fp][38000]Accuracy-Highest: 0.98571
Training: 2022-04-26 16:17:58,387-[agedb_30][38000]XNorm: 21.279343
Training: 2022-04-26 16:17:58,388-[agedb_30][38000]Accuracy-Flip: 0.97550+-0.00820
Training: 2022-04-26 16:17:58,388-[agedb_30][38000]Accuracy-Highest: 0.97550
Training: 2022-04-26 16:18:01,467-Speed 72.53 samples/sec   Loss 4.9433   LearningRate 0.0292   Epoch: 9   Global Step: 38010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:04,542-Speed 3331.07 samples/sec   Loss 4.9092   LearningRate 0.0292   Epoch: 9   Global Step: 38020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:07,618-Speed 3329.50 samples/sec   Loss 5.0530   LearningRate 0.0292   Epoch: 9   Global Step: 38030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:10,697-Speed 3327.02 samples/sec   Loss 4.9120   LearningRate 0.0292   Epoch: 9   Global Step: 38040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:13,776-Speed 3326.01 samples/sec   Loss 4.9513   LearningRate 0.0291   Epoch: 9   Global Step: 38050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:16,864-Speed 3317.07 samples/sec   Loss 4.8408   LearningRate 0.0291   Epoch: 9   Global Step: 38060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:19,945-Speed 3324.63 samples/sec   Loss 4.9749   LearningRate 0.0291   Epoch: 9   Global Step: 38070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:23,033-Speed 3317.05 samples/sec   Loss 4.9492   LearningRate 0.0291   Epoch: 9   Global Step: 38080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:26,128-Speed 3308.45 samples/sec   Loss 4.9915   LearningRate 0.0291   Epoch: 9   Global Step: 38090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:29,202-Speed 3332.60 samples/sec   Loss 4.9917   LearningRate 0.0291   Epoch: 9   Global Step: 38100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:32,291-Speed 3316.28 samples/sec   Loss 5.0093   LearningRate 0.0291   Epoch: 9   Global Step: 38110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:35,381-Speed 3314.50 samples/sec   Loss 5.0147   LearningRate 0.0291   Epoch: 9   Global Step: 38120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:38,474-Speed 3312.08 samples/sec   Loss 5.0136   LearningRate 0.0290   Epoch: 9   Global Step: 38130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:41,573-Speed 3304.97 samples/sec   Loss 4.9009   LearningRate 0.0290   Epoch: 9   Global Step: 38140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:44,686-Speed 3289.69 samples/sec   Loss 4.9166   LearningRate 0.0290   Epoch: 9   Global Step: 38150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:47,789-Speed 3300.77 samples/sec   Loss 5.1174   LearningRate 0.0290   Epoch: 9   Global Step: 38160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:50,902-Speed 3290.37 samples/sec   Loss 4.9520   LearningRate 0.0290   Epoch: 9   Global Step: 38170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:54,057-Speed 3247.42 samples/sec   Loss 5.0346   LearningRate 0.0290   Epoch: 9   Global Step: 38180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:18:57,153-Speed 3308.03 samples/sec   Loss 4.7954   LearningRate 0.0290   Epoch: 9   Global Step: 38190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:00,250-Speed 3306.87 samples/sec   Loss 4.9408   LearningRate 0.0290   Epoch: 9   Global Step: 38200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:19:03,334-Speed 3321.58 samples/sec   Loss 5.0260   LearningRate 0.0289   Epoch: 9   Global Step: 38210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:06,444-Speed 3293.59 samples/sec   Loss 4.9481   LearningRate 0.0289   Epoch: 9   Global Step: 38220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:09,545-Speed 3303.05 samples/sec   Loss 5.0458   LearningRate 0.0289   Epoch: 9   Global Step: 38230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:12,649-Speed 3299.37 samples/sec   Loss 5.0439   LearningRate 0.0289   Epoch: 9   Global Step: 38240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:15,752-Speed 3301.05 samples/sec   Loss 5.1015   LearningRate 0.0289   Epoch: 9   Global Step: 38250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:18,849-Speed 3306.80 samples/sec   Loss 4.9737   LearningRate 0.0289   Epoch: 9   Global Step: 38260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:21,961-Speed 3290.58 samples/sec   Loss 5.0431   LearningRate 0.0289   Epoch: 9   Global Step: 38270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:25,056-Speed 3310.09 samples/sec   Loss 4.9933   LearningRate 0.0289   Epoch: 9   Global Step: 38280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:28,160-Speed 3299.69 samples/sec   Loss 5.0081   LearningRate 0.0288   Epoch: 9   Global Step: 38290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:31,258-Speed 3306.14 samples/sec   Loss 5.1653   LearningRate 0.0288   Epoch: 9   Global Step: 38300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:34,352-Speed 3310.27 samples/sec   Loss 5.0455   LearningRate 0.0288   Epoch: 9   Global Step: 38310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:19:37,424-Speed 3333.86 samples/sec   Loss 5.0333   LearningRate 0.0288   Epoch: 9   Global Step: 38320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:40,522-Speed 3306.67 samples/sec   Loss 5.0352   LearningRate 0.0288   Epoch: 9   Global Step: 38330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:43,609-Speed 3317.67 samples/sec   Loss 5.1138   LearningRate 0.0288   Epoch: 9   Global Step: 38340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:46,734-Speed 3277.38 samples/sec   Loss 4.9905   LearningRate 0.0288   Epoch: 9   Global Step: 38350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:49,934-Speed 3200.31 samples/sec   Loss 5.1828   LearningRate 0.0287   Epoch: 9   Global Step: 38360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:53,088-Speed 3247.44 samples/sec   Loss 4.9844   LearningRate 0.0287   Epoch: 9   Global Step: 38370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:56,170-Speed 3324.40 samples/sec   Loss 5.0017   LearningRate 0.0287   Epoch: 9   Global Step: 38380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:19:59,248-Speed 3327.27 samples/sec   Loss 5.0455   LearningRate 0.0287   Epoch: 9   Global Step: 38390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:20:02,329-Speed 3324.09 samples/sec   Loss 5.0493   LearningRate 0.0287   Epoch: 9   Global Step: 38400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:20:05,407-Speed 3327.92 samples/sec   Loss 5.0663   LearningRate 0.0287   Epoch: 9   Global Step: 38410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:20:08,479-Speed 3333.49 samples/sec   Loss 5.0868   LearningRate 0.0287   Epoch: 9   Global Step: 38420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:20:11,540-Speed 3346.61 samples/sec   Loss 4.9999   LearningRate 0.0287   Epoch: 9   Global Step: 38430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:20:14,618-Speed 3327.41 samples/sec   Loss 5.0871   LearningRate 0.0286   Epoch: 9   Global Step: 38440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:20:17,697-Speed 3326.29 samples/sec   Loss 5.1278   LearningRate 0.0286   Epoch: 9   Global Step: 38450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:20:20,766-Speed 3337.15 samples/sec   Loss 5.1023   LearningRate 0.0286   Epoch: 9   Global Step: 38460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:20:23,853-Speed 3319.00 samples/sec   Loss 5.0651   LearningRate 0.0286   Epoch: 9   Global Step: 38470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:20:26,963-Speed 3292.83 samples/sec   Loss 5.1641   LearningRate 0.0286   Epoch: 9   Global Step: 38480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:20:30,033-Speed 3336.55 samples/sec   Loss 5.1523   LearningRate 0.0286   Epoch: 9   Global Step: 38490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:20:33,086-Speed 3354.64 samples/sec   Loss 5.1841   LearningRate 0.0286   Epoch: 9   Global Step: 38500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:20:36,158-Speed 3333.61 samples/sec   Loss 5.1526   LearningRate 0.0286   Epoch: 9   Global Step: 38510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:20:39,229-Speed 3335.13 samples/sec   Loss 5.1777   LearningRate 0.0285   Epoch: 9   Global Step: 38520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:20:42,305-Speed 3330.29 samples/sec   Loss 5.1048   LearningRate 0.0285   Epoch: 9   Global Step: 38530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:20:45,374-Speed 3337.69 samples/sec   Loss 5.1620   LearningRate 0.0285   Epoch: 9   Global Step: 38540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:20:48,444-Speed 3335.32 samples/sec   Loss 5.1275   LearningRate 0.0285   Epoch: 9   Global Step: 38550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:20:51,529-Speed 3320.24 samples/sec   Loss 5.1353   LearningRate 0.0285   Epoch: 9   Global Step: 38560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:20:54,600-Speed 3335.54 samples/sec   Loss 5.2328   LearningRate 0.0285   Epoch: 9   Global Step: 38570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:20:57,673-Speed 3333.52 samples/sec   Loss 5.1758   LearningRate 0.0285   Epoch: 9   Global Step: 38580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:21:00,792-Speed 3283.06 samples/sec   Loss 5.1598   LearningRate 0.0284   Epoch: 9   Global Step: 38590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:21:03,870-Speed 3327.84 samples/sec   Loss 5.1746   LearningRate 0.0284   Epoch: 9   Global Step: 38600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:06,948-Speed 3327.35 samples/sec   Loss 5.1502   LearningRate 0.0284   Epoch: 9   Global Step: 38610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:10,028-Speed 3325.34 samples/sec   Loss 5.1444   LearningRate 0.0284   Epoch: 9   Global Step: 38620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:13,097-Speed 3338.18 samples/sec   Loss 5.0924   LearningRate 0.0284   Epoch: 9   Global Step: 38630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:16,222-Speed 3276.82 samples/sec   Loss 5.1126   LearningRate 0.0284   Epoch: 9   Global Step: 38640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:19,336-Speed 3289.39 samples/sec   Loss 5.1501   LearningRate 0.0284   Epoch: 9   Global Step: 38650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:22,411-Speed 3331.10 samples/sec   Loss 5.2206   LearningRate 0.0284   Epoch: 9   Global Step: 38660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:25,605-Speed 3207.70 samples/sec   Loss 5.1697   LearningRate 0.0283   Epoch: 9   Global Step: 38670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:28,772-Speed 3234.06 samples/sec   Loss 5.2012   LearningRate 0.0283   Epoch: 9   Global Step: 38680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:31,843-Speed 3334.78 samples/sec   Loss 5.1504   LearningRate 0.0283   Epoch: 9   Global Step: 38690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:34,912-Speed 3338.01 samples/sec   Loss 5.2063   LearningRate 0.0283   Epoch: 9   Global Step: 38700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:38,012-Speed 3303.19 samples/sec   Loss 5.1489   LearningRate 0.0283   Epoch: 9   Global Step: 38710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:41,089-Speed 3328.95 samples/sec   Loss 5.1758   LearningRate 0.0283   Epoch: 9   Global Step: 38720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:44,165-Speed 3330.46 samples/sec   Loss 5.1774   LearningRate 0.0283   Epoch: 9   Global Step: 38730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:47,240-Speed 3329.74 samples/sec   Loss 5.2536   LearningRate 0.0283   Epoch: 9   Global Step: 38740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:50,322-Speed 3323.97 samples/sec   Loss 5.1767   LearningRate 0.0282   Epoch: 9   Global Step: 38750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:53,394-Speed 3334.05 samples/sec   Loss 5.2739   LearningRate 0.0282   Epoch: 9   Global Step: 38760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:56,466-Speed 3334.35 samples/sec   Loss 5.1972   LearningRate 0.0282   Epoch: 9   Global Step: 38770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:21:59,543-Speed 3328.82 samples/sec   Loss 5.1483   LearningRate 0.0282   Epoch: 9   Global Step: 38780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:02,618-Speed 3330.06 samples/sec   Loss 5.2319   LearningRate 0.0282   Epoch: 9   Global Step: 38790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:05,699-Speed 3324.55 samples/sec   Loss 5.2688   LearningRate 0.0282   Epoch: 9   Global Step: 38800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:22:08,756-Speed 3350.40 samples/sec   Loss 5.2435   LearningRate 0.0282   Epoch: 9   Global Step: 38810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:11,834-Speed 3328.71 samples/sec   Loss 5.3289   LearningRate 0.0282   Epoch: 9   Global Step: 38820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:14,912-Speed 3327.60 samples/sec   Loss 5.2252   LearningRate 0.0281   Epoch: 9   Global Step: 38830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:18,003-Speed 3313.94 samples/sec   Loss 5.1696   LearningRate 0.0281   Epoch: 9   Global Step: 38840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:21,092-Speed 3315.07 samples/sec   Loss 5.1881   LearningRate 0.0281   Epoch: 9   Global Step: 38850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:24,167-Speed 3330.97 samples/sec   Loss 5.2212   LearningRate 0.0281   Epoch: 9   Global Step: 38860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:27,246-Speed 3326.35 samples/sec   Loss 5.2204   LearningRate 0.0281   Epoch: 9   Global Step: 38870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:30,320-Speed 3332.38 samples/sec   Loss 5.1722   LearningRate 0.0281   Epoch: 9   Global Step: 38880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:33,400-Speed 3325.26 samples/sec   Loss 5.2361   LearningRate 0.0281   Epoch: 9   Global Step: 38890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:36,480-Speed 3325.79 samples/sec   Loss 5.2593   LearningRate 0.0281   Epoch: 9   Global Step: 38900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:39,566-Speed 3318.85 samples/sec   Loss 5.2927   LearningRate 0.0280   Epoch: 9   Global Step: 38910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:22:42,653-Speed 3317.79 samples/sec   Loss 5.1772   LearningRate 0.0280   Epoch: 9   Global Step: 38920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:45,731-Speed 3327.90 samples/sec   Loss 5.2372   LearningRate 0.0280   Epoch: 9   Global Step: 38930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:48,806-Speed 3330.45 samples/sec   Loss 5.3532   LearningRate 0.0280   Epoch: 9   Global Step: 38940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:51,899-Speed 3312.28 samples/sec   Loss 5.2396   LearningRate 0.0280   Epoch: 9   Global Step: 38950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:54,974-Speed 3331.12 samples/sec   Loss 5.1935   LearningRate 0.0280   Epoch: 9   Global Step: 38960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:22:58,050-Speed 3329.01 samples/sec   Loss 5.4250   LearningRate 0.0280   Epoch: 9   Global Step: 38970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:01,126-Speed 3330.39 samples/sec   Loss 5.3414   LearningRate 0.0279   Epoch: 9   Global Step: 38980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:04,210-Speed 3321.42 samples/sec   Loss 5.2380   LearningRate 0.0279   Epoch: 9   Global Step: 38990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:07,287-Speed 3328.07 samples/sec   Loss 5.2673   LearningRate 0.0279   Epoch: 9   Global Step: 39000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:10,361-Speed 3331.76 samples/sec   Loss 5.2567   LearningRate 0.0279   Epoch: 9   Global Step: 39010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:13,422-Speed 3346.43 samples/sec   Loss 5.1938   LearningRate 0.0279   Epoch: 9   Global Step: 39020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:16,513-Speed 3313.61 samples/sec   Loss 5.2535   LearningRate 0.0279   Epoch: 9   Global Step: 39030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:19,595-Speed 3323.34 samples/sec   Loss 5.2877   LearningRate 0.0279   Epoch: 9   Global Step: 39040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:22,684-Speed 3316.25 samples/sec   Loss 5.2543   LearningRate 0.0279   Epoch: 9   Global Step: 39050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:25,764-Speed 3325.40 samples/sec   Loss 5.3064   LearningRate 0.0278   Epoch: 9   Global Step: 39060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:28,840-Speed 3329.33 samples/sec   Loss 5.3081   LearningRate 0.0278   Epoch: 9   Global Step: 39070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:31,926-Speed 3318.41 samples/sec   Loss 5.2183   LearningRate 0.0278   Epoch: 9   Global Step: 39080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:35,006-Speed 3325.40 samples/sec   Loss 5.3350   LearningRate 0.0278   Epoch: 9   Global Step: 39090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:38,088-Speed 3323.57 samples/sec   Loss 5.2268   LearningRate 0.0278   Epoch: 9   Global Step: 39100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:41,164-Speed 3329.90 samples/sec   Loss 5.2925   LearningRate 0.0278   Epoch: 9   Global Step: 39110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:44,245-Speed 3324.34 samples/sec   Loss 5.1885   LearningRate 0.0278   Epoch: 9   Global Step: 39120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:23:47,304-Speed 3348.04 samples/sec   Loss 5.2184   LearningRate 0.0278   Epoch: 9   Global Step: 39130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:50,381-Speed 3329.59 samples/sec   Loss 5.3080   LearningRate 0.0277   Epoch: 9   Global Step: 39140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:53,459-Speed 3327.81 samples/sec   Loss 5.3678   LearningRate 0.0277   Epoch: 9   Global Step: 39150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:56,535-Speed 3329.50 samples/sec   Loss 5.3304   LearningRate 0.0277   Epoch: 9   Global Step: 39160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:23:59,613-Speed 3327.31 samples/sec   Loss 5.3352   LearningRate 0.0277   Epoch: 9   Global Step: 39170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:24:02,697-Speed 3321.81 samples/sec   Loss 5.3236   LearningRate 0.0277   Epoch: 9   Global Step: 39180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:24:05,777-Speed 3325.58 samples/sec   Loss 5.3125   LearningRate 0.0277   Epoch: 9   Global Step: 39190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:24:08,853-Speed 3328.99 samples/sec   Loss 5.2820   LearningRate 0.0277   Epoch: 9   Global Step: 39200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:24:11,929-Speed 3329.74 samples/sec   Loss 5.2208   LearningRate 0.0277   Epoch: 9   Global Step: 39210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:24:15,008-Speed 3327.60 samples/sec   Loss 5.3394   LearningRate 0.0276   Epoch: 9   Global Step: 39220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:24:18,095-Speed 3317.15 samples/sec   Loss 5.3964   LearningRate 0.0276   Epoch: 9   Global Step: 39230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:24:21,157-Speed 3344.88 samples/sec   Loss 5.3712   LearningRate 0.0276   Epoch: 9   Global Step: 39240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:24:24,240-Speed 3322.18 samples/sec   Loss 5.2974   LearningRate 0.0276   Epoch: 9   Global Step: 39250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:24:27,323-Speed 3322.16 samples/sec   Loss 5.3635   LearningRate 0.0276   Epoch: 9   Global Step: 39260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:24:30,406-Speed 3322.94 samples/sec   Loss 5.2778   LearningRate 0.0276   Epoch: 9   Global Step: 39270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:24:33,487-Speed 3324.11 samples/sec   Loss 5.2592   LearningRate 0.0276   Epoch: 9   Global Step: 39280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:24:36,572-Speed 3320.36 samples/sec   Loss 5.4175   LearningRate 0.0276   Epoch: 9   Global Step: 39290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:24:39,651-Speed 3325.90 samples/sec   Loss 5.3302   LearningRate 0.0275   Epoch: 9   Global Step: 39300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:24:42,718-Speed 3339.46 samples/sec   Loss 5.3587   LearningRate 0.0275   Epoch: 9   Global Step: 39310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:24:45,798-Speed 3325.53 samples/sec   Loss 5.3550   LearningRate 0.0275   Epoch: 9   Global Step: 39320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:24:48,880-Speed 3323.77 samples/sec   Loss 5.3318   LearningRate 0.0275   Epoch: 9   Global Step: 39330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:24:51,961-Speed 3324.66 samples/sec   Loss 5.4066   LearningRate 0.0275   Epoch: 9   Global Step: 39340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:24:55,039-Speed 3326.45 samples/sec   Loss 5.3581   LearningRate 0.0275   Epoch: 9   Global Step: 39350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:24:58,125-Speed 3319.48 samples/sec   Loss 5.2338   LearningRate 0.0275   Epoch: 9   Global Step: 39360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:25:01,207-Speed 3323.51 samples/sec   Loss 5.3350   LearningRate 0.0275   Epoch: 9   Global Step: 39370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:25:04,293-Speed 3318.63 samples/sec   Loss 5.3652   LearningRate 0.0274   Epoch: 9   Global Step: 39380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:25:07,376-Speed 3322.56 samples/sec   Loss 5.3007   LearningRate 0.0274   Epoch: 9   Global Step: 39390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:25:10,458-Speed 3322.71 samples/sec   Loss 5.3871   LearningRate 0.0274   Epoch: 9   Global Step: 39400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:25:13,553-Speed 3309.40 samples/sec   Loss 5.3893   LearningRate 0.0274   Epoch: 9   Global Step: 39410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:25:16,648-Speed 3324.30 samples/sec   Loss 5.3802   LearningRate 0.0274   Epoch: 9   Global Step: 39420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:25:19,731-Speed 3322.16 samples/sec   Loss 5.2879   LearningRate 0.0274   Epoch: 9   Global Step: 39430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:25:22,833-Speed 3306.51 samples/sec   Loss 5.3234   LearningRate 0.0274   Epoch: 9   Global Step: 39440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:25:25,920-Speed 3318.94 samples/sec   Loss 5.2675   LearningRate 0.0274   Epoch: 9   Global Step: 39450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:25:28,999-Speed 3326.27 samples/sec   Loss 5.3291   LearningRate 0.0273   Epoch: 9   Global Step: 39460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:25:32,086-Speed 3318.07 samples/sec   Loss 5.2383   LearningRate 0.0273   Epoch: 9   Global Step: 39470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:25:35,173-Speed 3321.96 samples/sec   Loss 5.3597   LearningRate 0.0273   Epoch: 9   Global Step: 39480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:25:38,266-Speed 3311.75 samples/sec   Loss 5.2294   LearningRate 0.0273   Epoch: 9   Global Step: 39490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:25:41,349-Speed 3322.79 samples/sec   Loss 5.3715   LearningRate 0.0273   Epoch: 9   Global Step: 39500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:25:44,431-Speed 3322.65 samples/sec   Loss 5.2793   LearningRate 0.0273   Epoch: 9   Global Step: 39510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:25:47,505-Speed 3342.81 samples/sec   Loss 5.3604   LearningRate 0.0273   Epoch: 9   Global Step: 39520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:25:50,590-Speed 3320.00 samples/sec   Loss 5.4448   LearningRate 0.0272   Epoch: 9   Global Step: 39530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:25:53,679-Speed 3326.57 samples/sec   Loss 5.3293   LearningRate 0.0272   Epoch: 9   Global Step: 39540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:25:56,756-Speed 3328.73 samples/sec   Loss 5.2922   LearningRate 0.0272   Epoch: 9   Global Step: 39550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:25:59,834-Speed 3327.51 samples/sec   Loss 5.3548   LearningRate 0.0272   Epoch: 9   Global Step: 39560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:26:02,911-Speed 3328.55 samples/sec   Loss 5.3607   LearningRate 0.0272   Epoch: 9   Global Step: 39570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:26:06,009-Speed 3317.24 samples/sec   Loss 5.3275   LearningRate 0.0272   Epoch: 9   Global Step: 39580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:26:09,087-Speed 3327.46 samples/sec   Loss 5.2406   LearningRate 0.0272   Epoch: 9   Global Step: 39590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:26:12,171-Speed 3320.92 samples/sec   Loss 5.2988   LearningRate 0.0272   Epoch: 9   Global Step: 39600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:26:15,268-Speed 3315.25 samples/sec   Loss 5.4324   LearningRate 0.0271   Epoch: 9   Global Step: 39610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:26:18,319-Speed 3356.63 samples/sec   Loss 5.3129   LearningRate 0.0271   Epoch: 9   Global Step: 39620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:26:21,399-Speed 3330.25 samples/sec   Loss 5.4203   LearningRate 0.0271   Epoch: 9   Global Step: 39630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:26:24,476-Speed 3328.15 samples/sec   Loss 5.3609   LearningRate 0.0271   Epoch: 9   Global Step: 39640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:26:27,556-Speed 3325.56 samples/sec   Loss 5.4540   LearningRate 0.0271   Epoch: 9   Global Step: 39650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:26:30,639-Speed 3321.49 samples/sec   Loss 5.2871   LearningRate 0.0271   Epoch: 9   Global Step: 39660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:26:33,722-Speed 3327.55 samples/sec   Loss 5.3664   LearningRate 0.0271   Epoch: 9   Global Step: 39670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:26:36,807-Speed 3319.54 samples/sec   Loss 5.3463   LearningRate 0.0271   Epoch: 9   Global Step: 39680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:26:39,902-Speed 3308.89 samples/sec   Loss 5.3150   LearningRate 0.0270   Epoch: 9   Global Step: 39690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:26:42,984-Speed 3323.31 samples/sec   Loss 5.3799   LearningRate 0.0270   Epoch: 9   Global Step: 39700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:26:46,071-Speed 3328.09 samples/sec   Loss 5.4186   LearningRate 0.0270   Epoch: 9   Global Step: 39710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:26:49,154-Speed 3322.28 samples/sec   Loss 5.3483   LearningRate 0.0270   Epoch: 9   Global Step: 39720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:26:52,252-Speed 3319.58 samples/sec   Loss 5.3663   LearningRate 0.0270   Epoch: 9   Global Step: 39730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:26:55,389-Speed 3264.37 samples/sec   Loss 5.4219   LearningRate 0.0270   Epoch: 9   Global Step: 39740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:26:58,483-Speed 3318.11 samples/sec   Loss 5.3676   LearningRate 0.0270   Epoch: 9   Global Step: 39750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:01,564-Speed 3323.69 samples/sec   Loss 5.3806   LearningRate 0.0270   Epoch: 9   Global Step: 39760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:04,649-Speed 3320.01 samples/sec   Loss 5.3157   LearningRate 0.0269   Epoch: 9   Global Step: 39770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:07,733-Speed 3321.38 samples/sec   Loss 5.3676   LearningRate 0.0269   Epoch: 9   Global Step: 39780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:10,830-Speed 3314.40 samples/sec   Loss 5.3560   LearningRate 0.0269   Epoch: 9   Global Step: 39790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:13,921-Speed 3313.60 samples/sec   Loss 5.4139   LearningRate 0.0269   Epoch: 9   Global Step: 39800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:17,020-Speed 3305.18 samples/sec   Loss 5.3348   LearningRate 0.0269   Epoch: 9   Global Step: 39810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:20,093-Speed 3333.34 samples/sec   Loss 5.3413   LearningRate 0.0269   Epoch: 9   Global Step: 39820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:23,174-Speed 3324.33 samples/sec   Loss 5.3029   LearningRate 0.0269   Epoch: 9   Global Step: 39830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:26,274-Speed 3304.06 samples/sec   Loss 5.4173   LearningRate 0.0269   Epoch: 9   Global Step: 39840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:29,358-Speed 3320.61 samples/sec   Loss 5.3967   LearningRate 0.0268   Epoch: 9   Global Step: 39850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:32,444-Speed 3319.80 samples/sec   Loss 5.3451   LearningRate 0.0268   Epoch: 9   Global Step: 39860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:35,528-Speed 3321.06 samples/sec   Loss 5.3282   LearningRate 0.0268   Epoch: 9   Global Step: 39870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:38,610-Speed 3322.53 samples/sec   Loss 5.3524   LearningRate 0.0268   Epoch: 9   Global Step: 39880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:41,698-Speed 3317.11 samples/sec   Loss 5.3256   LearningRate 0.0268   Epoch: 9   Global Step: 39890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:44,782-Speed 3320.92 samples/sec   Loss 5.4043   LearningRate 0.0268   Epoch: 9   Global Step: 39900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:27:47,896-Speed 3289.82 samples/sec   Loss 5.3406   LearningRate 0.0268   Epoch: 9   Global Step: 39910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:27:51,053-Speed 3244.09 samples/sec   Loss 5.3172   LearningRate 0.0268   Epoch: 9   Global Step: 39920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:27:54,157-Speed 3299.85 samples/sec   Loss 5.4159   LearningRate 0.0267   Epoch: 9   Global Step: 39930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:27:57,243-Speed 3319.37 samples/sec   Loss 5.3799   LearningRate 0.0267   Epoch: 9   Global Step: 39940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:28:00,328-Speed 3319.55 samples/sec   Loss 5.4506   LearningRate 0.0267   Epoch: 9   Global Step: 39950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:28:03,464-Speed 3266.43 samples/sec   Loss 5.5639   LearningRate 0.0267   Epoch: 9   Global Step: 39960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:28:06,547-Speed 3322.02 samples/sec   Loss 5.4729   LearningRate 0.0267   Epoch: 9   Global Step: 39970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:28:09,633-Speed 3318.85 samples/sec   Loss 5.3678   LearningRate 0.0267   Epoch: 9   Global Step: 39980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:28:12,811-Speed 3223.09 samples/sec   Loss 5.4010   LearningRate 0.0267   Epoch: 9   Global Step: 39990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:28:15,930-Speed 3284.07 samples/sec   Loss 5.3497   LearningRate 0.0267   Epoch: 9   Global Step: 40000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-26 16:28:59,497-[lfw][40000]XNorm: 22.320295
Training: 2022-04-26 16:28:59,498-[lfw][40000]Accuracy-Flip: 0.99783+-0.00308
Training: 2022-04-26 16:28:59,498-[lfw][40000]Accuracy-Highest: 0.99783
Training: 2022-04-26 16:29:50,010-[cfp_fp][40000]XNorm: 21.546572
Training: 2022-04-26 16:29:50,011-[cfp_fp][40000]Accuracy-Flip: 0.98843+-0.00467
Training: 2022-04-26 16:29:50,012-[cfp_fp][40000]Accuracy-Highest: 0.98843
Training: 2022-04-26 16:30:33,555-[agedb_30][40000]XNorm: 22.644222
Training: 2022-04-26 16:30:33,556-[agedb_30][40000]Accuracy-Flip: 0.97233+-0.00821
Training: 2022-04-26 16:30:33,556-[agedb_30][40000]Accuracy-Highest: 0.97550
Training: 2022-04-26 16:30:36,639-Speed 72.77 samples/sec   Loss 5.3787   LearningRate 0.0266   Epoch: 9   Global Step: 40010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:30:39,716-Speed 3327.94 samples/sec   Loss 5.3968   LearningRate 0.0266   Epoch: 9   Global Step: 40020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:30:42,798-Speed 3323.67 samples/sec   Loss 5.4427   LearningRate 0.0266   Epoch: 9   Global Step: 40030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:30:45,877-Speed 3326.93 samples/sec   Loss 5.5214   LearningRate 0.0266   Epoch: 9   Global Step: 40040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:30:48,959-Speed 3322.76 samples/sec   Loss 5.4347   LearningRate 0.0266   Epoch: 9   Global Step: 40050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:30:52,043-Speed 3321.60 samples/sec   Loss 5.2573   LearningRate 0.0266   Epoch: 9   Global Step: 40060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:30:55,129-Speed 3319.26 samples/sec   Loss 5.4225   LearningRate 0.0266   Epoch: 9   Global Step: 40070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:30:58,214-Speed 3319.55 samples/sec   Loss 5.4640   LearningRate 0.0266   Epoch: 9   Global Step: 40080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:01,303-Speed 3316.48 samples/sec   Loss 5.3083   LearningRate 0.0265   Epoch: 9   Global Step: 40090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:04,389-Speed 3318.38 samples/sec   Loss 5.4435   LearningRate 0.0265   Epoch: 9   Global Step: 40100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:07,483-Speed 3309.84 samples/sec   Loss 5.3973   LearningRate 0.0265   Epoch: 9   Global Step: 40110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:31:10,559-Speed 3330.17 samples/sec   Loss 5.3503   LearningRate 0.0265   Epoch: 9   Global Step: 40120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:13,651-Speed 3312.66 samples/sec   Loss 5.3332   LearningRate 0.0265   Epoch: 9   Global Step: 40130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:16,736-Speed 3320.16 samples/sec   Loss 5.3166   LearningRate 0.0265   Epoch: 9   Global Step: 40140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:19,822-Speed 3318.51 samples/sec   Loss 5.4443   LearningRate 0.0265   Epoch: 9   Global Step: 40150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:22,940-Speed 3285.37 samples/sec   Loss 5.4077   LearningRate 0.0265   Epoch: 9   Global Step: 40160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:26,053-Speed 3290.53 samples/sec   Loss 5.3447   LearningRate 0.0264   Epoch: 9   Global Step: 40170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:29,163-Speed 3292.97 samples/sec   Loss 5.4267   LearningRate 0.0264   Epoch: 9   Global Step: 40180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:32,247-Speed 3321.21 samples/sec   Loss 5.4221   LearningRate 0.0264   Epoch: 9   Global Step: 40190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:35,354-Speed 3297.10 samples/sec   Loss 5.3201   LearningRate 0.0264   Epoch: 9   Global Step: 40200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:38,446-Speed 3312.23 samples/sec   Loss 5.3918   LearningRate 0.0264   Epoch: 9   Global Step: 40210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:41,541-Speed 3309.16 samples/sec   Loss 5.3830   LearningRate 0.0264   Epoch: 9   Global Step: 40220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:44,634-Speed 3311.93 samples/sec   Loss 5.3662   LearningRate 0.0264   Epoch: 9   Global Step: 40230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:47,728-Speed 3310.98 samples/sec   Loss 5.2901   LearningRate 0.0264   Epoch: 9   Global Step: 40240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:50,850-Speed 3280.10 samples/sec   Loss 5.3533   LearningRate 0.0263   Epoch: 9   Global Step: 40250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:53,942-Speed 3313.05 samples/sec   Loss 5.3892   LearningRate 0.0263   Epoch: 9   Global Step: 40260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:31:57,027-Speed 3319.39 samples/sec   Loss 5.2538   LearningRate 0.0263   Epoch: 9   Global Step: 40270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:32:00,121-Speed 3310.97 samples/sec   Loss 5.3638   LearningRate 0.0263   Epoch: 9   Global Step: 40280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:32:03,207-Speed 3318.36 samples/sec   Loss 5.4576   LearningRate 0.0263   Epoch: 9   Global Step: 40290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:32:06,294-Speed 3318.46 samples/sec   Loss 5.4270   LearningRate 0.0263   Epoch: 9   Global Step: 40300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:32:09,377-Speed 3321.46 samples/sec   Loss 5.4353   LearningRate 0.0263   Epoch: 9   Global Step: 40310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:32:12,469-Speed 3313.13 samples/sec   Loss 5.3353   LearningRate 0.0263   Epoch: 9   Global Step: 40320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-26 16:32:15,541-Speed 3333.90 samples/sec   Loss 5.3147   LearningRate 0.0262   Epoch: 9   Global Step: 40330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:32:18,625-Speed 3320.85 samples/sec   Loss 5.3550   LearningRate 0.0262   Epoch: 9   Global Step: 40340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:32:21,714-Speed 3315.85 samples/sec   Loss 5.3538   LearningRate 0.0262   Epoch: 9   Global Step: 40350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:32:24,804-Speed 3315.18 samples/sec   Loss 5.4166   LearningRate 0.0262   Epoch: 9   Global Step: 40360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-26 16:32:27,892-Speed 3316.17 samples/sec   Loss 5.3729   LearningRate 0.0262   Epoch: 9   Global Step: 40370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:32:30,987-Speed 3309.36 samples/sec   Loss 5.4741   LearningRate 0.0262   Epoch: 9   Global Step: 40380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:32:34,070-Speed 3322.61 samples/sec   Loss 5.4849   LearningRate 0.0262   Epoch: 9   Global Step: 40390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:32:37,173-Speed 3301.00 samples/sec   Loss 5.3425   LearningRate 0.0262   Epoch: 9   Global Step: 40400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:32:40,259-Speed 3318.74 samples/sec   Loss 5.4472   LearningRate 0.0261   Epoch: 9   Global Step: 40410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:32:43,348-Speed 3316.16 samples/sec   Loss 5.3798   LearningRate 0.0261   Epoch: 9   Global Step: 40420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:32:46,433-Speed 3319.10 samples/sec   Loss 5.3879   LearningRate 0.0261   Epoch: 9   Global Step: 40430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 16:32:49,497-Speed 3343.03 samples/sec   Loss 5.4497   LearningRate 0.0261   Epoch: 9   Global Step: 40440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:32:52,591-Speed 3310.41 samples/sec   Loss 5.5256   LearningRate 0.0261   Epoch: 9   Global Step: 40450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:32:55,675-Speed 3321.20 samples/sec   Loss 5.3641   LearningRate 0.0261   Epoch: 9   Global Step: 40460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:32:58,749-Speed 3331.45 samples/sec   Loss 5.4214   LearningRate 0.0261   Epoch: 9   Global Step: 40470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:33:01,891-Speed 3260.19 samples/sec   Loss 5.3714   LearningRate 0.0261   Epoch: 9   Global Step: 40480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:33:04,984-Speed 3311.91 samples/sec   Loss 5.2477   LearningRate 0.0261   Epoch: 9   Global Step: 40490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:33:08,059-Speed 3330.52 samples/sec   Loss 5.2727   LearningRate 0.0260   Epoch: 9   Global Step: 40500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:33:11,141-Speed 3323.32 samples/sec   Loss 5.4702   LearningRate 0.0260   Epoch: 9   Global Step: 40510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:33:14,216-Speed 3331.08 samples/sec   Loss 5.4514   LearningRate 0.0260   Epoch: 9   Global Step: 40520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:33:17,295-Speed 3326.36 samples/sec   Loss 5.3597   LearningRate 0.0260   Epoch: 9   Global Step: 40530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:33:20,376-Speed 3324.15 samples/sec   Loss 5.3135   LearningRate 0.0260   Epoch: 9   Global Step: 40540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:33:23,464-Speed 3316.07 samples/sec   Loss 5.3882   LearningRate 0.0260   Epoch: 9   Global Step: 40550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:33:26,549-Speed 3320.74 samples/sec   Loss 5.3570   LearningRate 0.0260   Epoch: 9   Global Step: 40560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:33:29,628-Speed 3326.66 samples/sec   Loss 5.3595   LearningRate 0.0260   Epoch: 9   Global Step: 40570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:33:32,712-Speed 3320.48 samples/sec   Loss 5.3704   LearningRate 0.0259   Epoch: 9   Global Step: 40580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:33:35,789-Speed 3328.77 samples/sec   Loss 5.3224   LearningRate 0.0259   Epoch: 9   Global Step: 40590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:33:38,869-Speed 3326.08 samples/sec   Loss 5.3745   LearningRate 0.0259   Epoch: 9   Global Step: 40600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:33:41,953-Speed 3321.37 samples/sec   Loss 5.3682   LearningRate 0.0259   Epoch: 9   Global Step: 40610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:33:45,031-Speed 3327.53 samples/sec   Loss 5.4593   LearningRate 0.0259   Epoch: 9   Global Step: 40620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:33:48,112-Speed 3324.36 samples/sec   Loss 5.3713   LearningRate 0.0259   Epoch: 9   Global Step: 40630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:33:51,203-Speed 3312.77 samples/sec   Loss 5.4307   LearningRate 0.0259   Epoch: 9   Global Step: 40640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:33:54,285-Speed 3323.43 samples/sec   Loss 5.5057   LearningRate 0.0259   Epoch: 9   Global Step: 40650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:33:57,366-Speed 3324.19 samples/sec   Loss 5.3982   LearningRate 0.0258   Epoch: 9   Global Step: 40660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:34:00,411-Speed 3363.95 samples/sec   Loss 5.4283   LearningRate 0.0258   Epoch: 9   Global Step: 40670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:03,494-Speed 3322.00 samples/sec   Loss 5.3692   LearningRate 0.0258   Epoch: 9   Global Step: 40680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:06,576-Speed 3323.31 samples/sec   Loss 5.3499   LearningRate 0.0258   Epoch: 9   Global Step: 40690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:09,659-Speed 3322.14 samples/sec   Loss 5.3898   LearningRate 0.0258   Epoch: 9   Global Step: 40700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:12,740-Speed 3324.93 samples/sec   Loss 5.4525   LearningRate 0.0258   Epoch: 9   Global Step: 40710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:15,826-Speed 3318.80 samples/sec   Loss 5.4244   LearningRate 0.0258   Epoch: 9   Global Step: 40720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:18,912-Speed 3319.94 samples/sec   Loss 5.3175   LearningRate 0.0258   Epoch: 9   Global Step: 40730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:21,994-Speed 3322.46 samples/sec   Loss 5.4638   LearningRate 0.0257   Epoch: 9   Global Step: 40740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:25,076-Speed 3323.40 samples/sec   Loss 5.3380   LearningRate 0.0257   Epoch: 9   Global Step: 40750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:28,182-Speed 3297.74 samples/sec   Loss 5.4916   LearningRate 0.0257   Epoch: 9   Global Step: 40760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:31,263-Speed 3324.84 samples/sec   Loss 5.4273   LearningRate 0.0257   Epoch: 9   Global Step: 40770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:34:34,368-Speed 3297.72 samples/sec   Loss 5.3111   LearningRate 0.0257   Epoch: 9   Global Step: 40780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:34:37,662-Speed 3109.64 samples/sec   Loss 5.3736   LearningRate 0.0257   Epoch: 9   Global Step: 40790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:34:40,775-Speed 3295.21 samples/sec   Loss 5.3491   LearningRate 0.0257   Epoch: 9   Global Step: 40800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:34:43,840-Speed 3341.11 samples/sec   Loss 5.3952   LearningRate 0.0257   Epoch: 9   Global Step: 40810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:46,919-Speed 3327.48 samples/sec   Loss 5.4172   LearningRate 0.0256   Epoch: 9   Global Step: 40820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:50,005-Speed 3318.30 samples/sec   Loss 5.4044   LearningRate 0.0256   Epoch: 9   Global Step: 40830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:53,186-Speed 3219.83 samples/sec   Loss 5.4380   LearningRate 0.0256   Epoch: 9   Global Step: 40840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:56,271-Speed 3319.95 samples/sec   Loss 5.4018   LearningRate 0.0256   Epoch: 9   Global Step: 40850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:34:59,363-Speed 3312.97 samples/sec   Loss 5.3546   LearningRate 0.0256   Epoch: 9   Global Step: 40860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:35:02,448-Speed 3319.74 samples/sec   Loss 5.3559   LearningRate 0.0256   Epoch: 9   Global Step: 40870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:35:05,553-Speed 3299.36 samples/sec   Loss 5.3558   LearningRate 0.0256   Epoch: 9   Global Step: 40880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:35:08,651-Speed 3305.88 samples/sec   Loss 5.2722   LearningRate 0.0256   Epoch: 9   Global Step: 40890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:35:11,737-Speed 3319.16 samples/sec   Loss 5.4052   LearningRate 0.0255   Epoch: 9   Global Step: 40900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:35:14,820-Speed 3322.31 samples/sec   Loss 5.4444   LearningRate 0.0255   Epoch: 9   Global Step: 40910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:35:17,919-Speed 3305.33 samples/sec   Loss 5.3162   LearningRate 0.0255   Epoch: 9   Global Step: 40920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:35:21,014-Speed 3309.29 samples/sec   Loss 5.2672   LearningRate 0.0255   Epoch: 9   Global Step: 40930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:35:24,109-Speed 3308.38 samples/sec   Loss 5.5072   LearningRate 0.0255   Epoch: 9   Global Step: 40940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:35:27,193-Speed 3321.20 samples/sec   Loss 5.3892   LearningRate 0.0255   Epoch: 9   Global Step: 40950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:35:30,275-Speed 3323.66 samples/sec   Loss 5.2947   LearningRate 0.0255   Epoch: 9   Global Step: 40960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:35:33,361-Speed 3319.17 samples/sec   Loss 5.3769   LearningRate 0.0255   Epoch: 9   Global Step: 40970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:35:36,453-Speed 3312.33 samples/sec   Loss 5.3993   LearningRate 0.0254   Epoch: 9   Global Step: 40980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:35:39,547-Speed 3310.73 samples/sec   Loss 5.4013   LearningRate 0.0254   Epoch: 9   Global Step: 40990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:35:42,661-Speed 3320.19 samples/sec   Loss 5.3207   LearningRate 0.0254   Epoch: 9   Global Step: 41000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:35:45,746-Speed 3319.98 samples/sec   Loss 5.4741   LearningRate 0.0254   Epoch: 9   Global Step: 41010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:35:48,831-Speed 3321.24 samples/sec   Loss 5.2901   LearningRate 0.0254   Epoch: 9   Global Step: 41020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:35:51,941-Speed 3292.40 samples/sec   Loss 5.3181   LearningRate 0.0254   Epoch: 9   Global Step: 41030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:35:55,035-Speed 3311.08 samples/sec   Loss 5.3850   LearningRate 0.0254   Epoch: 9   Global Step: 41040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:35:58,122-Speed 3318.29 samples/sec   Loss 5.3813   LearningRate 0.0254   Epoch: 9   Global Step: 41050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:36:01,211-Speed 3315.22 samples/sec   Loss 5.3549   LearningRate 0.0254   Epoch: 9   Global Step: 41060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:36:04,300-Speed 3316.00 samples/sec   Loss 5.3617   LearningRate 0.0253   Epoch: 9   Global Step: 41070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:36:07,393-Speed 3311.42 samples/sec   Loss 5.2571   LearningRate 0.0253   Epoch: 9   Global Step: 41080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:36:10,486-Speed 3311.39 samples/sec   Loss 5.3652   LearningRate 0.0253   Epoch: 9   Global Step: 41090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:36:13,583-Speed 3307.53 samples/sec   Loss 5.4229   LearningRate 0.0253   Epoch: 9   Global Step: 41100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:36:16,662-Speed 3326.51 samples/sec   Loss 5.3072   LearningRate 0.0253   Epoch: 9   Global Step: 41110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:36:19,754-Speed 3313.10 samples/sec   Loss 5.3773   LearningRate 0.0253   Epoch: 9   Global Step: 41120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:36:22,839-Speed 3320.04 samples/sec   Loss 5.3613   LearningRate 0.0253   Epoch: 9   Global Step: 41130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:36:25,928-Speed 3315.72 samples/sec   Loss 5.3726   LearningRate 0.0253   Epoch: 9   Global Step: 41140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:36:29,048-Speed 3282.91 samples/sec   Loss 5.3927   LearningRate 0.0252   Epoch: 9   Global Step: 41150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:36:32,135-Speed 3317.58 samples/sec   Loss 5.4805   LearningRate 0.0252   Epoch: 9   Global Step: 41160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:36:35,219-Speed 3320.24 samples/sec   Loss 5.3364   LearningRate 0.0252   Epoch: 9   Global Step: 41170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:36:38,311-Speed 3312.83 samples/sec   Loss 5.3739   LearningRate 0.0252   Epoch: 9   Global Step: 41180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:36:41,408-Speed 3307.49 samples/sec   Loss 5.2285   LearningRate 0.0252   Epoch: 9   Global Step: 41190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:36:44,502-Speed 3310.23 samples/sec   Loss 5.2909   LearningRate 0.0252   Epoch: 9   Global Step: 41200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:36:47,593-Speed 3314.49 samples/sec   Loss 5.3517   LearningRate 0.0252   Epoch: 9   Global Step: 41210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:36:50,702-Speed 3293.44 samples/sec   Loss 5.3602   LearningRate 0.0252   Epoch: 9   Global Step: 41220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:36:53,982-Speed 3122.78 samples/sec   Loss 5.2757   LearningRate 0.0251   Epoch: 9   Global Step: 41230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:36:57,073-Speed 3314.07 samples/sec   Loss 5.2996   LearningRate 0.0251   Epoch: 9   Global Step: 41240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:37:00,200-Speed 3280.25 samples/sec   Loss 5.4617   LearningRate 0.0251   Epoch: 9   Global Step: 41250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:37:03,290-Speed 3314.85 samples/sec   Loss 5.4193   LearningRate 0.0251   Epoch: 9   Global Step: 41260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:37:06,403-Speed 3290.11 samples/sec   Loss 5.2563   LearningRate 0.0251   Epoch: 9   Global Step: 41270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:37:09,487-Speed 3321.46 samples/sec   Loss 5.4142   LearningRate 0.0251   Epoch: 9   Global Step: 41280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:37:12,572-Speed 3319.77 samples/sec   Loss 5.3067   LearningRate 0.0251   Epoch: 9   Global Step: 41290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:37:15,643-Speed 3335.40 samples/sec   Loss 5.3548   LearningRate 0.0251   Epoch: 9   Global Step: 41300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:37:18,727-Speed 3321.55 samples/sec   Loss 5.3022   LearningRate 0.0250   Epoch: 9   Global Step: 41310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:37:21,824-Speed 3306.95 samples/sec   Loss 5.3112   LearningRate 0.0250   Epoch: 9   Global Step: 41320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:37:24,915-Speed 3312.92 samples/sec   Loss 5.3298   LearningRate 0.0250   Epoch: 9   Global Step: 41330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:37:28,062-Speed 3254.93 samples/sec   Loss 5.3848   LearningRate 0.0250   Epoch: 9   Global Step: 41340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:37:31,144-Speed 3323.45 samples/sec   Loss 5.3692   LearningRate 0.0250   Epoch: 9   Global Step: 41350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:37:43,480-Speed 830.17 samples/sec   Loss 3.8559   LearningRate 0.0250   Epoch: 10   Global Step: 41360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:37:46,563-Speed 3322.47 samples/sec   Loss 3.6560   LearningRate 0.0250   Epoch: 10   Global Step: 41370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:37:49,752-Speed 3211.24 samples/sec   Loss 3.8120   LearningRate 0.0250   Epoch: 10   Global Step: 41380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:37:52,854-Speed 3302.12 samples/sec   Loss 3.8680   LearningRate 0.0250   Epoch: 10   Global Step: 41390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:37:55,941-Speed 3317.28 samples/sec   Loss 3.9557   LearningRate 0.0249   Epoch: 10   Global Step: 41400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:37:59,032-Speed 3313.64 samples/sec   Loss 3.8024   LearningRate 0.0249   Epoch: 10   Global Step: 41410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:02,126-Speed 3310.70 samples/sec   Loss 3.8707   LearningRate 0.0249   Epoch: 10   Global Step: 41420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:05,231-Speed 3298.73 samples/sec   Loss 3.8430   LearningRate 0.0249   Epoch: 10   Global Step: 41430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:08,314-Speed 3322.01 samples/sec   Loss 3.8579   LearningRate 0.0249   Epoch: 10   Global Step: 41440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:11,403-Speed 3315.95 samples/sec   Loss 3.9023   LearningRate 0.0249   Epoch: 10   Global Step: 41450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:14,495-Speed 3312.23 samples/sec   Loss 3.9258   LearningRate 0.0249   Epoch: 10   Global Step: 41460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:17,587-Speed 3312.97 samples/sec   Loss 3.9248   LearningRate 0.0249   Epoch: 10   Global Step: 41470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:20,671-Speed 3321.59 samples/sec   Loss 3.8875   LearningRate 0.0248   Epoch: 10   Global Step: 41480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:23,753-Speed 3323.14 samples/sec   Loss 3.9416   LearningRate 0.0248   Epoch: 10   Global Step: 41490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:26,836-Speed 3321.74 samples/sec   Loss 3.9176   LearningRate 0.0248   Epoch: 10   Global Step: 41500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:29,939-Speed 3301.50 samples/sec   Loss 3.9160   LearningRate 0.0248   Epoch: 10   Global Step: 41510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:33,024-Speed 3319.14 samples/sec   Loss 3.9252   LearningRate 0.0248   Epoch: 10   Global Step: 41520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:36,141-Speed 3285.76 samples/sec   Loss 3.9252   LearningRate 0.0248   Epoch: 10   Global Step: 41530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:39,226-Speed 3320.05 samples/sec   Loss 3.9001   LearningRate 0.0248   Epoch: 10   Global Step: 41540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:42,316-Speed 3315.88 samples/sec   Loss 3.8657   LearningRate 0.0248   Epoch: 10   Global Step: 41550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:45,404-Speed 3315.90 samples/sec   Loss 4.0293   LearningRate 0.0247   Epoch: 10   Global Step: 41560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:48,513-Speed 3295.39 samples/sec   Loss 4.0223   LearningRate 0.0247   Epoch: 10   Global Step: 41570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:51,667-Speed 3247.06 samples/sec   Loss 4.0640   LearningRate 0.0247   Epoch: 10   Global Step: 41580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:54,844-Speed 3223.21 samples/sec   Loss 4.0521   LearningRate 0.0247   Epoch: 10   Global Step: 41590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:38:57,948-Speed 3300.14 samples/sec   Loss 4.0389   LearningRate 0.0247   Epoch: 10   Global Step: 41600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 16:39:01,032-Speed 3321.40 samples/sec   Loss 4.0251   LearningRate 0.0247   Epoch: 10   Global Step: 41610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:04,131-Speed 3304.31 samples/sec   Loss 4.0892   LearningRate 0.0247   Epoch: 10   Global Step: 41620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:07,223-Speed 3313.02 samples/sec   Loss 3.9233   LearningRate 0.0247   Epoch: 10   Global Step: 41630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:10,324-Speed 3302.67 samples/sec   Loss 4.0462   LearningRate 0.0247   Epoch: 10   Global Step: 41640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:13,416-Speed 3313.18 samples/sec   Loss 4.0598   LearningRate 0.0246   Epoch: 10   Global Step: 41650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:16,508-Speed 3312.55 samples/sec   Loss 4.1123   LearningRate 0.0246   Epoch: 10   Global Step: 41660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:19,599-Speed 3312.94 samples/sec   Loss 4.0402   LearningRate 0.0246   Epoch: 10   Global Step: 41670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:22,692-Speed 3311.57 samples/sec   Loss 4.0244   LearningRate 0.0246   Epoch: 10   Global Step: 41680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:25,805-Speed 3289.84 samples/sec   Loss 4.0855   LearningRate 0.0246   Epoch: 10   Global Step: 41690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:28,926-Speed 3282.28 samples/sec   Loss 4.0816   LearningRate 0.0246   Epoch: 10   Global Step: 41700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:32,012-Speed 3318.58 samples/sec   Loss 4.0475   LearningRate 0.0246   Epoch: 10   Global Step: 41710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 16:39:35,094-Speed 3323.37 samples/sec   Loss 4.0683   LearningRate 0.0246   Epoch: 10   Global Step: 41720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:38,185-Speed 3313.14 samples/sec   Loss 4.2715   LearningRate 0.0245   Epoch: 10   Global Step: 41730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:41,276-Speed 3314.28 samples/sec   Loss 4.1903   LearningRate 0.0245   Epoch: 10   Global Step: 41740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:44,365-Speed 3315.90 samples/sec   Loss 4.1038   LearningRate 0.0245   Epoch: 10   Global Step: 41750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:47,456-Speed 3313.15 samples/sec   Loss 4.1298   LearningRate 0.0245   Epoch: 10   Global Step: 41760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:50,541-Speed 3319.97 samples/sec   Loss 4.0376   LearningRate 0.0245   Epoch: 10   Global Step: 41770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:53,651-Speed 3294.14 samples/sec   Loss 4.1417   LearningRate 0.0245   Epoch: 10   Global Step: 41780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:56,738-Speed 3316.84 samples/sec   Loss 4.0827   LearningRate 0.0245   Epoch: 10   Global Step: 41790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:39:59,825-Speed 3318.88 samples/sec   Loss 4.1953   LearningRate 0.0245   Epoch: 10   Global Step: 41800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:02,913-Speed 3316.65 samples/sec   Loss 4.1439   LearningRate 0.0244   Epoch: 10   Global Step: 41810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:05,987-Speed 3331.89 samples/sec   Loss 4.0956   LearningRate 0.0244   Epoch: 10   Global Step: 41820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:09,080-Speed 3310.65 samples/sec   Loss 4.1725   LearningRate 0.0244   Epoch: 10   Global Step: 41830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:12,180-Speed 3303.69 samples/sec   Loss 4.1894   LearningRate 0.0244   Epoch: 10   Global Step: 41840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:15,289-Speed 3295.25 samples/sec   Loss 3.9750   LearningRate 0.0244   Epoch: 10   Global Step: 41850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:18,372-Speed 3321.85 samples/sec   Loss 4.1215   LearningRate 0.0244   Epoch: 10   Global Step: 41860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:21,458-Speed 3318.92 samples/sec   Loss 4.2923   LearningRate 0.0244   Epoch: 10   Global Step: 41870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:24,547-Speed 3316.41 samples/sec   Loss 4.1790   LearningRate 0.0244   Epoch: 10   Global Step: 41880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:27,633-Speed 3318.62 samples/sec   Loss 4.1552   LearningRate 0.0244   Epoch: 10   Global Step: 41890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:30,722-Speed 3315.82 samples/sec   Loss 4.1183   LearningRate 0.0243   Epoch: 10   Global Step: 41900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:33,818-Speed 3307.46 samples/sec   Loss 4.1933   LearningRate 0.0243   Epoch: 10   Global Step: 41910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:36,899-Speed 3324.30 samples/sec   Loss 4.2511   LearningRate 0.0243   Epoch: 10   Global Step: 41920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:40,047-Speed 3253.35 samples/sec   Loss 4.2003   LearningRate 0.0243   Epoch: 10   Global Step: 41930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:43,142-Speed 3310.17 samples/sec   Loss 4.2655   LearningRate 0.0243   Epoch: 10   Global Step: 41940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:40:46,215-Speed 3333.22 samples/sec   Loss 4.1540   LearningRate 0.0243   Epoch: 10   Global Step: 41950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:40:49,306-Speed 3315.12 samples/sec   Loss 4.2721   LearningRate 0.0243   Epoch: 10   Global Step: 41960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:40:52,485-Speed 3221.43 samples/sec   Loss 4.2190   LearningRate 0.0243   Epoch: 10   Global Step: 41970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:40:55,666-Speed 3220.00 samples/sec   Loss 4.1936   LearningRate 0.0242   Epoch: 10   Global Step: 41980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:40:58,754-Speed 3316.21 samples/sec   Loss 4.1430   LearningRate 0.0242   Epoch: 10   Global Step: 41990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:41:01,846-Speed 3312.38 samples/sec   Loss 4.3627   LearningRate 0.0242   Epoch: 10   Global Step: 42000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:41:45,711-[lfw][42000]XNorm: 22.834842
Training: 2022-04-26 16:41:45,712-[lfw][42000]Accuracy-Flip: 0.99683+-0.00263
Training: 2022-04-26 16:41:45,713-[lfw][42000]Accuracy-Highest: 0.99783
Training: 2022-04-26 16:42:36,560-[cfp_fp][42000]XNorm: 22.245010
Training: 2022-04-26 16:42:36,562-[cfp_fp][42000]Accuracy-Flip: 0.98629+-0.00593
Training: 2022-04-26 16:42:36,563-[cfp_fp][42000]Accuracy-Highest: 0.98843
Training: 2022-04-26 16:43:20,378-[agedb_30][42000]XNorm: 22.941134
Training: 2022-04-26 16:43:20,380-[agedb_30][42000]Accuracy-Flip: 0.97283+-0.00789
Training: 2022-04-26 16:43:20,380-[agedb_30][42000]Accuracy-Highest: 0.97550
Training: 2022-04-26 16:43:23,463-Speed 72.31 samples/sec   Loss 4.3333   LearningRate 0.0242   Epoch: 10   Global Step: 42010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:43:26,555-Speed 3312.31 samples/sec   Loss 4.2744   LearningRate 0.0242   Epoch: 10   Global Step: 42020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:43:29,640-Speed 3322.39 samples/sec   Loss 4.2628   LearningRate 0.0242   Epoch: 10   Global Step: 42030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:43:32,721-Speed 3324.21 samples/sec   Loss 4.3232   LearningRate 0.0242   Epoch: 10   Global Step: 42040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:43:35,800-Speed 3325.90 samples/sec   Loss 4.2369   LearningRate 0.0242   Epoch: 10   Global Step: 42050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:43:38,886-Speed 3318.94 samples/sec   Loss 4.2725   LearningRate 0.0241   Epoch: 10   Global Step: 42060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:43:41,974-Speed 3317.58 samples/sec   Loss 4.3001   LearningRate 0.0241   Epoch: 10   Global Step: 42070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:43:45,056-Speed 3323.74 samples/sec   Loss 4.2191   LearningRate 0.0241   Epoch: 10   Global Step: 42080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:43:48,141-Speed 3319.10 samples/sec   Loss 4.2610   LearningRate 0.0241   Epoch: 10   Global Step: 42090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:43:51,231-Speed 3315.06 samples/sec   Loss 4.3665   LearningRate 0.0241   Epoch: 10   Global Step: 42100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:43:54,315-Speed 3320.83 samples/sec   Loss 4.3081   LearningRate 0.0241   Epoch: 10   Global Step: 42110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:43:57,399-Speed 3322.52 samples/sec   Loss 4.3137   LearningRate 0.0241   Epoch: 10   Global Step: 42120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:00,483-Speed 3320.88 samples/sec   Loss 4.2668   LearningRate 0.0241   Epoch: 10   Global Step: 42130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:03,568-Speed 3320.83 samples/sec   Loss 4.2935   LearningRate 0.0241   Epoch: 10   Global Step: 42140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:06,648-Speed 3324.81 samples/sec   Loss 4.4038   LearningRate 0.0240   Epoch: 10   Global Step: 42150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 16:44:09,717-Speed 3337.72 samples/sec   Loss 4.3684   LearningRate 0.0240   Epoch: 10   Global Step: 42160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:12,801-Speed 3320.11 samples/sec   Loss 4.3836   LearningRate 0.0240   Epoch: 10   Global Step: 42170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:15,889-Speed 3319.77 samples/sec   Loss 4.3854   LearningRate 0.0240   Epoch: 10   Global Step: 42180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:18,977-Speed 3316.40 samples/sec   Loss 4.2733   LearningRate 0.0240   Epoch: 10   Global Step: 42190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:22,059-Speed 3323.24 samples/sec   Loss 4.4050   LearningRate 0.0240   Epoch: 10   Global Step: 42200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:25,151-Speed 3312.96 samples/sec   Loss 4.3573   LearningRate 0.0240   Epoch: 10   Global Step: 42210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:28,236-Speed 3321.29 samples/sec   Loss 4.3561   LearningRate 0.0240   Epoch: 10   Global Step: 42220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:31,324-Speed 3317.38 samples/sec   Loss 4.4208   LearningRate 0.0239   Epoch: 10   Global Step: 42230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:34,420-Speed 3309.94 samples/sec   Loss 4.4392   LearningRate 0.0239   Epoch: 10   Global Step: 42240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:37,534-Speed 3288.96 samples/sec   Loss 4.4217   LearningRate 0.0239   Epoch: 10   Global Step: 42250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:40,602-Speed 3339.17 samples/sec   Loss 4.4523   LearningRate 0.0239   Epoch: 10   Global Step: 42260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:43,688-Speed 3318.65 samples/sec   Loss 4.4370   LearningRate 0.0239   Epoch: 10   Global Step: 42270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:46,776-Speed 3320.67 samples/sec   Loss 4.4156   LearningRate 0.0239   Epoch: 10   Global Step: 42280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:49,874-Speed 3305.75 samples/sec   Loss 4.3999   LearningRate 0.0239   Epoch: 10   Global Step: 42290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:52,988-Speed 3289.86 samples/sec   Loss 4.4133   LearningRate 0.0239   Epoch: 10   Global Step: 42300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:56,076-Speed 3316.82 samples/sec   Loss 4.4306   LearningRate 0.0239   Epoch: 10   Global Step: 42310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:44:59,161-Speed 3321.06 samples/sec   Loss 4.4013   LearningRate 0.0238   Epoch: 10   Global Step: 42320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:02,244-Speed 3321.92 samples/sec   Loss 4.4480   LearningRate 0.0238   Epoch: 10   Global Step: 42330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:05,399-Speed 3250.36 samples/sec   Loss 4.3155   LearningRate 0.0238   Epoch: 10   Global Step: 42340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:08,477-Speed 3327.12 samples/sec   Loss 4.3766   LearningRate 0.0238   Epoch: 10   Global Step: 42350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:11,560-Speed 3321.69 samples/sec   Loss 4.3697   LearningRate 0.0238   Epoch: 10   Global Step: 42360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:14,643-Speed 3322.36 samples/sec   Loss 4.3937   LearningRate 0.0238   Epoch: 10   Global Step: 42370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:17,724-Speed 3324.91 samples/sec   Loss 4.4358   LearningRate 0.0238   Epoch: 10   Global Step: 42380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:20,804-Speed 3324.78 samples/sec   Loss 4.4330   LearningRate 0.0238   Epoch: 10   Global Step: 42390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:23,908-Speed 3300.49 samples/sec   Loss 4.4216   LearningRate 0.0237   Epoch: 10   Global Step: 42400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:27,003-Speed 3317.59 samples/sec   Loss 4.4818   LearningRate 0.0237   Epoch: 10   Global Step: 42410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:30,085-Speed 3323.02 samples/sec   Loss 4.4644   LearningRate 0.0237   Epoch: 10   Global Step: 42420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:33,170-Speed 3324.69 samples/sec   Loss 4.3754   LearningRate 0.0237   Epoch: 10   Global Step: 42430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:36,251-Speed 3323.58 samples/sec   Loss 4.4612   LearningRate 0.0237   Epoch: 10   Global Step: 42440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:39,344-Speed 3311.46 samples/sec   Loss 4.4182   LearningRate 0.0237   Epoch: 10   Global Step: 42450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:42,409-Speed 3341.01 samples/sec   Loss 4.4040   LearningRate 0.0237   Epoch: 10   Global Step: 42460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:45,496-Speed 3319.89 samples/sec   Loss 4.4865   LearningRate 0.0237   Epoch: 10   Global Step: 42470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:48,587-Speed 3312.96 samples/sec   Loss 4.4267   LearningRate 0.0237   Epoch: 10   Global Step: 42480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:51,704-Speed 3285.61 samples/sec   Loss 4.5069   LearningRate 0.0236   Epoch: 10   Global Step: 42490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:54,790-Speed 3319.68 samples/sec   Loss 4.5202   LearningRate 0.0236   Epoch: 10   Global Step: 42500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:45:57,870-Speed 3328.27 samples/sec   Loss 4.3945   LearningRate 0.0236   Epoch: 10   Global Step: 42510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:00,993-Speed 3279.80 samples/sec   Loss 4.3777   LearningRate 0.0236   Epoch: 10   Global Step: 42520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:04,125-Speed 3271.64 samples/sec   Loss 4.5353   LearningRate 0.0236   Epoch: 10   Global Step: 42530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:07,204-Speed 3327.11 samples/sec   Loss 4.5147   LearningRate 0.0236   Epoch: 10   Global Step: 42540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:10,285-Speed 3324.06 samples/sec   Loss 4.4109   LearningRate 0.0236   Epoch: 10   Global Step: 42550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:13,441-Speed 3245.50 samples/sec   Loss 4.4432   LearningRate 0.0236   Epoch: 10   Global Step: 42560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 16:46:16,512-Speed 3339.39 samples/sec   Loss 4.5031   LearningRate 0.0235   Epoch: 10   Global Step: 42570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:19,599-Speed 3317.34 samples/sec   Loss 4.5602   LearningRate 0.0235   Epoch: 10   Global Step: 42580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:22,724-Speed 3277.34 samples/sec   Loss 4.4670   LearningRate 0.0235   Epoch: 10   Global Step: 42590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:25,803-Speed 3326.18 samples/sec   Loss 4.5005   LearningRate 0.0235   Epoch: 10   Global Step: 42600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:28,907-Speed 3322.84 samples/sec   Loss 4.4184   LearningRate 0.0235   Epoch: 10   Global Step: 42610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:31,987-Speed 3325.65 samples/sec   Loss 4.5715   LearningRate 0.0235   Epoch: 10   Global Step: 42620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:35,124-Speed 3315.24 samples/sec   Loss 4.5315   LearningRate 0.0235   Epoch: 10   Global Step: 42630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:38,246-Speed 3280.30 samples/sec   Loss 4.4799   LearningRate 0.0235   Epoch: 10   Global Step: 42640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:41,400-Speed 3247.84 samples/sec   Loss 4.5519   LearningRate 0.0235   Epoch: 10   Global Step: 42650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:44,481-Speed 3324.31 samples/sec   Loss 4.5595   LearningRate 0.0234   Epoch: 10   Global Step: 42660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:47,604-Speed 3323.38 samples/sec   Loss 4.5008   LearningRate 0.0234   Epoch: 10   Global Step: 42670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 16:46:50,675-Speed 3334.69 samples/sec   Loss 4.5028   LearningRate 0.0234   Epoch: 10   Global Step: 42680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:53,774-Speed 3304.96 samples/sec   Loss 4.5208   LearningRate 0.0234   Epoch: 10   Global Step: 42690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:56,867-Speed 3321.35 samples/sec   Loss 4.5043   LearningRate 0.0234   Epoch: 10   Global Step: 42700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:46:59,957-Speed 3323.52 samples/sec   Loss 4.4770   LearningRate 0.0234   Epoch: 10   Global Step: 42710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:47:03,152-Speed 3205.31 samples/sec   Loss 4.3273   LearningRate 0.0234   Epoch: 10   Global Step: 42720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:47:06,547-Speed 3328.49 samples/sec   Loss 4.6345   LearningRate 0.0234   Epoch: 10   Global Step: 42730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:47:09,625-Speed 3327.54 samples/sec   Loss 4.5307   LearningRate 0.0233   Epoch: 10   Global Step: 42740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:47:12,706-Speed 3324.05 samples/sec   Loss 4.6050   LearningRate 0.0233   Epoch: 10   Global Step: 42750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:47:15,790-Speed 3320.99 samples/sec   Loss 4.5700   LearningRate 0.0233   Epoch: 10   Global Step: 42760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:47:18,892-Speed 3310.42 samples/sec   Loss 4.5328   LearningRate 0.0233   Epoch: 10   Global Step: 42770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:47:21,977-Speed 3319.45 samples/sec   Loss 4.5824   LearningRate 0.0233   Epoch: 10   Global Step: 42780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 16:47:25,045-Speed 3338.65 samples/sec   Loss 4.6753   LearningRate 0.0233   Epoch: 10   Global Step: 42790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:47:49,207-Speed 3326.09 samples/sec   Loss 4.6284   LearningRate 0.0233   Epoch: 10   Global Step: 42800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:47:52,290-Speed 3331.97 samples/sec   Loss 4.6634   LearningRate 0.0233   Epoch: 10   Global Step: 42810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:47:55,371-Speed 3324.56 samples/sec   Loss 4.5193   LearningRate 0.0233   Epoch: 10   Global Step: 42820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:47:59,162-Speed 3320.37 samples/sec   Loss 4.5489   LearningRate 0.0232   Epoch: 10   Global Step: 42830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:02,240-Speed 3327.48 samples/sec   Loss 4.6346   LearningRate 0.0232   Epoch: 10   Global Step: 42840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:05,326-Speed 3332.56 samples/sec   Loss 4.5835   LearningRate 0.0232   Epoch: 10   Global Step: 42850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:08,395-Speed 3337.69 samples/sec   Loss 4.5055   LearningRate 0.0232   Epoch: 10   Global Step: 42860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:11,481-Speed 3318.64 samples/sec   Loss 4.6940   LearningRate 0.0232   Epoch: 10   Global Step: 42870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:14,580-Speed 3305.01 samples/sec   Loss 4.6046   LearningRate 0.0232   Epoch: 10   Global Step: 42880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:17,691-Speed 3330.69 samples/sec   Loss 4.6245   LearningRate 0.0232   Epoch: 10   Global Step: 42890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 16:48:20,755-Speed 3342.06 samples/sec   Loss 4.6393   LearningRate 0.0232   Epoch: 10   Global Step: 42900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:23,836-Speed 3324.71 samples/sec   Loss 4.5745   LearningRate 0.0231   Epoch: 10   Global Step: 42910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:26,908-Speed 3333.18 samples/sec   Loss 4.5852   LearningRate 0.0231   Epoch: 10   Global Step: 42920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:30,007-Speed 3330.77 samples/sec   Loss 4.5903   LearningRate 0.0231   Epoch: 10   Global Step: 42930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:33,083-Speed 3330.03 samples/sec   Loss 4.5327   LearningRate 0.0231   Epoch: 10   Global Step: 42940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:36,158-Speed 3331.34 samples/sec   Loss 4.5713   LearningRate 0.0231   Epoch: 10   Global Step: 42950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:39,253-Speed 3309.23 samples/sec   Loss 4.6107   LearningRate 0.0231   Epoch: 10   Global Step: 42960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:42,389-Speed 3265.66 samples/sec   Loss 4.6119   LearningRate 0.0231   Epoch: 10   Global Step: 42970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:45,471-Speed 3323.87 samples/sec   Loss 4.5950   LearningRate 0.0231   Epoch: 10   Global Step: 42980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:48,568-Speed 3314.77 samples/sec   Loss 4.6245   LearningRate 0.0231   Epoch: 10   Global Step: 42990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:48:51,617-Speed 3358.39 samples/sec   Loss 4.6329   LearningRate 0.0230   Epoch: 10   Global Step: 43000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:48:54,701-Speed 3321.21 samples/sec   Loss 4.7454   LearningRate 0.0230   Epoch: 10   Global Step: 43010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:48:57,800-Speed 3323.10 samples/sec   Loss 4.5937   LearningRate 0.0230   Epoch: 10   Global Step: 43020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:49:00,899-Speed 3304.30 samples/sec   Loss 4.6170   LearningRate 0.0230   Epoch: 10   Global Step: 43030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:49:04,006-Speed 3297.80 samples/sec   Loss 4.5418   LearningRate 0.0230   Epoch: 10   Global Step: 43040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:49:07,105-Speed 3305.12 samples/sec   Loss 4.6307   LearningRate 0.0230   Epoch: 10   Global Step: 43050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:49:10,190-Speed 3319.82 samples/sec   Loss 4.5975   LearningRate 0.0230   Epoch: 10   Global Step: 43060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:49:13,282-Speed 3313.09 samples/sec   Loss 4.5818   LearningRate 0.0230   Epoch: 10   Global Step: 43070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:49:16,410-Speed 3317.41 samples/sec   Loss 4.6216   LearningRate 0.0230   Epoch: 10   Global Step: 43080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:49:19,499-Speed 3316.15 samples/sec   Loss 4.6184   LearningRate 0.0229   Epoch: 10   Global Step: 43090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:49:22,586-Speed 3317.71 samples/sec   Loss 4.7056   LearningRate 0.0229   Epoch: 10   Global Step: 43100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:49:25,695-Speed 3294.28 samples/sec   Loss 4.6417   LearningRate 0.0229   Epoch: 10   Global Step: 43110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:49:28,784-Speed 3321.81 samples/sec   Loss 4.7063   LearningRate 0.0229   Epoch: 10   Global Step: 43120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:49:31,877-Speed 3310.86 samples/sec   Loss 4.5234   LearningRate 0.0229   Epoch: 10   Global Step: 43130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:49:43,130-Speed 3296.30 samples/sec   Loss 4.6214   LearningRate 0.0229   Epoch: 10   Global Step: 43140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:49:46,216-Speed 3319.09 samples/sec   Loss 4.6287   LearningRate 0.0229   Epoch: 10   Global Step: 43150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:49:49,308-Speed 3326.01 samples/sec   Loss 4.6437   LearningRate 0.0229   Epoch: 10   Global Step: 43160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:49:52,387-Speed 3326.05 samples/sec   Loss 4.7161   LearningRate 0.0228   Epoch: 10   Global Step: 43170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:49:55,471-Speed 3321.97 samples/sec   Loss 4.6298   LearningRate 0.0228   Epoch: 10   Global Step: 43180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:01,974-Speed 3313.11 samples/sec   Loss 4.6229   LearningRate 0.0228   Epoch: 10   Global Step: 43190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:05,064-Speed 3317.69 samples/sec   Loss 4.5425   LearningRate 0.0228   Epoch: 10   Global Step: 43200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 16:50:08,132-Speed 3338.51 samples/sec   Loss 4.7668   LearningRate 0.0228   Epoch: 10   Global Step: 43210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:11,225-Speed 3313.73 samples/sec   Loss 4.7031   LearningRate 0.0228   Epoch: 10   Global Step: 43220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:14,328-Speed 3300.55 samples/sec   Loss 4.7309   LearningRate 0.0228   Epoch: 10   Global Step: 43230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:17,474-Speed 3255.08 samples/sec   Loss 4.6830   LearningRate 0.0228   Epoch: 10   Global Step: 43240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:20,565-Speed 3313.65 samples/sec   Loss 4.7362   LearningRate 0.0228   Epoch: 10   Global Step: 43250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:23,646-Speed 3326.94 samples/sec   Loss 4.6525   LearningRate 0.0227   Epoch: 10   Global Step: 43260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:26,730-Speed 3328.93 samples/sec   Loss 4.6233   LearningRate 0.0227   Epoch: 10   Global Step: 43270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:29,811-Speed 3324.57 samples/sec   Loss 4.6645   LearningRate 0.0227   Epoch: 10   Global Step: 43280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:32,893-Speed 3323.24 samples/sec   Loss 4.5816   LearningRate 0.0227   Epoch: 10   Global Step: 43290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:39,245-Speed 3296.17 samples/sec   Loss 4.7425   LearningRate 0.0227   Epoch: 10   Global Step: 43300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:42,311-Speed 3340.40 samples/sec   Loss 4.5390   LearningRate 0.0227   Epoch: 10   Global Step: 43310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:45,395-Speed 3320.80 samples/sec   Loss 4.7391   LearningRate 0.0227   Epoch: 10   Global Step: 43320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:48,478-Speed 3322.98 samples/sec   Loss 4.7303   LearningRate 0.0227   Epoch: 10   Global Step: 43330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:51,574-Speed 3307.51 samples/sec   Loss 4.6600   LearningRate 0.0227   Epoch: 10   Global Step: 43340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:54,705-Speed 3271.54 samples/sec   Loss 4.7940   LearningRate 0.0226   Epoch: 10   Global Step: 43350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:50:57,797-Speed 3311.93 samples/sec   Loss 4.6969   LearningRate 0.0226   Epoch: 10   Global Step: 43360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:51:00,954-Speed 3244.50 samples/sec   Loss 4.7485   LearningRate 0.0226   Epoch: 10   Global Step: 43370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:51:04,047-Speed 3311.98 samples/sec   Loss 4.7261   LearningRate 0.0226   Epoch: 10   Global Step: 43380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:51:11,263-Speed 3306.46 samples/sec   Loss 4.6459   LearningRate 0.0226   Epoch: 10   Global Step: 43390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:51:14,353-Speed 3314.74 samples/sec   Loss 4.6676   LearningRate 0.0226   Epoch: 10   Global Step: 43400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:51:17,454-Speed 3302.84 samples/sec   Loss 4.5565   LearningRate 0.0226   Epoch: 10   Global Step: 43410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:51:20,561-Speed 3296.68 samples/sec   Loss 4.7297   LearningRate 0.0226   Epoch: 10   Global Step: 43420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:51:23,635-Speed 3331.24 samples/sec   Loss 4.6095   LearningRate 0.0225   Epoch: 10   Global Step: 43430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:51:26,724-Speed 3315.92 samples/sec   Loss 4.6768   LearningRate 0.0225   Epoch: 10   Global Step: 43440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:51:29,823-Speed 3305.02 samples/sec   Loss 4.6574   LearningRate 0.0225   Epoch: 10   Global Step: 43450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:51:32,915-Speed 3312.81 samples/sec   Loss 4.6987   LearningRate 0.0225   Epoch: 10   Global Step: 43460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:51:40,556-Speed 3313.42 samples/sec   Loss 4.7295   LearningRate 0.0225   Epoch: 10   Global Step: 43470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:51:43,666-Speed 3293.02 samples/sec   Loss 4.7475   LearningRate 0.0225   Epoch: 10   Global Step: 43480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:51:46,777-Speed 3292.22 samples/sec   Loss 4.7249   LearningRate 0.0225   Epoch: 10   Global Step: 43490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:51:49,878-Speed 3302.91 samples/sec   Loss 4.7158   LearningRate 0.0225   Epoch: 10   Global Step: 43500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:51:52,972-Speed 3309.99 samples/sec   Loss 4.7434   LearningRate 0.0225   Epoch: 10   Global Step: 43510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:51:56,074-Speed 3301.71 samples/sec   Loss 4.7025   LearningRate 0.0224   Epoch: 10   Global Step: 43520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:52:08,185-Speed 3310.24 samples/sec   Loss 4.6288   LearningRate 0.0224   Epoch: 10   Global Step: 43530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:52:11,268-Speed 3322.25 samples/sec   Loss 4.7827   LearningRate 0.0224   Epoch: 10   Global Step: 43540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:52:14,350-Speed 3322.73 samples/sec   Loss 4.7259   LearningRate 0.0224   Epoch: 10   Global Step: 43550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:52:17,436-Speed 3319.56 samples/sec   Loss 4.7701   LearningRate 0.0224   Epoch: 10   Global Step: 43560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:52:20,517-Speed 3323.94 samples/sec   Loss 4.7873   LearningRate 0.0224   Epoch: 10   Global Step: 43570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:52:23,605-Speed 3317.43 samples/sec   Loss 4.8053   LearningRate 0.0224   Epoch: 10   Global Step: 43580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:52:26,695-Speed 3314.54 samples/sec   Loss 4.6501   LearningRate 0.0224   Epoch: 10   Global Step: 43590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:52:29,898-Speed 3317.46 samples/sec   Loss 4.7342   LearningRate 0.0224   Epoch: 10   Global Step: 43600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:52:34,941-Speed 3322.08 samples/sec   Loss 4.7168   LearningRate 0.0223   Epoch: 10   Global Step: 43610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:52:38,052-Speed 3292.79 samples/sec   Loss 4.7214   LearningRate 0.0223   Epoch: 10   Global Step: 43620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:52:41,159-Speed 3296.84 samples/sec   Loss 4.7149   LearningRate 0.0223   Epoch: 10   Global Step: 43630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:52:44,249-Speed 3313.77 samples/sec   Loss 4.7689   LearningRate 0.0223   Epoch: 10   Global Step: 43640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:52:47,332-Speed 3322.05 samples/sec   Loss 4.7028   LearningRate 0.0223   Epoch: 10   Global Step: 43650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:52:50,419-Speed 3318.47 samples/sec   Loss 4.7574   LearningRate 0.0223   Epoch: 10   Global Step: 43660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:53:02,870-Speed 3341.20 samples/sec   Loss 4.7801   LearningRate 0.0223   Epoch: 10   Global Step: 43670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:53:05,962-Speed 3314.09 samples/sec   Loss 4.7321   LearningRate 0.0223   Epoch: 10   Global Step: 43680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:53:09,049-Speed 3317.56 samples/sec   Loss 4.7279   LearningRate 0.0223   Epoch: 10   Global Step: 43690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:53:12,128-Speed 3326.60 samples/sec   Loss 4.6895   LearningRate 0.0222   Epoch: 10   Global Step: 43700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:53:15,204-Speed 3328.84 samples/sec   Loss 4.7290   LearningRate 0.0222   Epoch: 10   Global Step: 43710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:53:18,288-Speed 3321.56 samples/sec   Loss 4.8523   LearningRate 0.0222   Epoch: 10   Global Step: 43720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:53:21,372-Speed 3320.45 samples/sec   Loss 4.6270   LearningRate 0.0222   Epoch: 10   Global Step: 43730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:53:34,203-Speed 3313.75 samples/sec   Loss 4.6861   LearningRate 0.0222   Epoch: 10   Global Step: 43740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:53:37,281-Speed 3327.90 samples/sec   Loss 4.7517   LearningRate 0.0222   Epoch: 10   Global Step: 43750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:53:40,364-Speed 3322.27 samples/sec   Loss 4.7040   LearningRate 0.0222   Epoch: 10   Global Step: 43760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 16:53:43,455-Speed 3313.01 samples/sec   Loss 4.7470   LearningRate 0.0222   Epoch: 10   Global Step: 43770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:53:46,555-Speed 3303.61 samples/sec   Loss 4.6868   LearningRate 0.0221   Epoch: 10   Global Step: 43780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:53:49,634-Speed 3326.97 samples/sec   Loss 4.6386   LearningRate 0.0221   Epoch: 10   Global Step: 43790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:53:54,035-Speed 3252.79 samples/sec   Loss 4.7822   LearningRate 0.0221   Epoch: 10   Global Step: 43800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:08,729-Speed 3266.19 samples/sec   Loss 4.7112   LearningRate 0.0221   Epoch: 10   Global Step: 43810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:11,828-Speed 3305.51 samples/sec   Loss 4.7819   LearningRate 0.0221   Epoch: 10   Global Step: 43820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:14,908-Speed 3325.95 samples/sec   Loss 4.7000   LearningRate 0.0221   Epoch: 10   Global Step: 43830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:18,001-Speed 3311.60 samples/sec   Loss 4.7473   LearningRate 0.0221   Epoch: 10   Global Step: 43840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:21,079-Speed 3326.95 samples/sec   Loss 4.6785   LearningRate 0.0221   Epoch: 10   Global Step: 43850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:24,193-Speed 3289.09 samples/sec   Loss 4.7477   LearningRate 0.0221   Epoch: 10   Global Step: 43860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:27,284-Speed 3313.23 samples/sec   Loss 4.7813   LearningRate 0.0220   Epoch: 10   Global Step: 43870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:30,381-Speed 3307.54 samples/sec   Loss 4.8487   LearningRate 0.0220   Epoch: 10   Global Step: 43880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:33,473-Speed 3312.83 samples/sec   Loss 4.8063   LearningRate 0.0220   Epoch: 10   Global Step: 43890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:36,562-Speed 3315.09 samples/sec   Loss 4.7810   LearningRate 0.0220   Epoch: 10   Global Step: 43900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:39,673-Speed 3292.64 samples/sec   Loss 4.7713   LearningRate 0.0220   Epoch: 10   Global Step: 43910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:42,770-Speed 3306.91 samples/sec   Loss 4.7678   LearningRate 0.0220   Epoch: 10   Global Step: 43920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:45,862-Speed 3312.92 samples/sec   Loss 4.8019   LearningRate 0.0220   Epoch: 10   Global Step: 43930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:48,970-Speed 3295.66 samples/sec   Loss 4.8105   LearningRate 0.0220   Epoch: 10   Global Step: 43940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:52,061-Speed 3312.98 samples/sec   Loss 4.7726   LearningRate 0.0220   Epoch: 10   Global Step: 43950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:55,215-Speed 3247.61 samples/sec   Loss 4.6332   LearningRate 0.0219   Epoch: 10   Global Step: 43960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:54:58,296-Speed 3324.25 samples/sec   Loss 4.7203   LearningRate 0.0219   Epoch: 10   Global Step: 43970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:55:01,392-Speed 3308.51 samples/sec   Loss 4.7531   LearningRate 0.0219   Epoch: 10   Global Step: 43980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:55:04,485-Speed 3311.62 samples/sec   Loss 4.7495   LearningRate 0.0219   Epoch: 10   Global Step: 43990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:55:07,585-Speed 3303.23 samples/sec   Loss 4.7393   LearningRate 0.0219   Epoch: 10   Global Step: 44000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:55:51,275-[lfw][44000]XNorm: 23.253167
Training: 2022-04-26 16:55:51,276-[lfw][44000]Accuracy-Flip: 0.99733+-0.00281
Training: 2022-04-26 16:55:51,276-[lfw][44000]Accuracy-Highest: 0.99783
Training: 2022-04-26 16:56:42,248-[cfp_fp][44000]XNorm: 22.426774
Training: 2022-04-26 16:56:42,249-[cfp_fp][44000]Accuracy-Flip: 0.98700+-0.00591
Training: 2022-04-26 16:56:42,249-[cfp_fp][44000]Accuracy-Highest: 0.98843
Training: 2022-04-26 16:57:26,094-[agedb_30][44000]XNorm: 23.411240
Training: 2022-04-26 16:57:26,095-[agedb_30][44000]Accuracy-Flip: 0.97533+-0.00748
Training: 2022-04-26 16:57:26,095-[agedb_30][44000]Accuracy-Highest: 0.97550
Training: 2022-04-26 16:57:29,166-Speed 72.33 samples/sec   Loss 4.7477   LearningRate 0.0219   Epoch: 10   Global Step: 44010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:57:32,227-Speed 3346.33 samples/sec   Loss 4.6839   LearningRate 0.0219   Epoch: 10   Global Step: 44020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:57:35,291-Speed 3342.38 samples/sec   Loss 4.7285   LearningRate 0.0219   Epoch: 10   Global Step: 44030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:57:38,360-Speed 3337.73 samples/sec   Loss 4.7994   LearningRate 0.0219   Epoch: 10   Global Step: 44040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:57:41,446-Speed 3318.68 samples/sec   Loss 4.7170   LearningRate 0.0218   Epoch: 10   Global Step: 44050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:57:44,515-Speed 3337.25 samples/sec   Loss 4.7590   LearningRate 0.0218   Epoch: 10   Global Step: 44060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:57:47,606-Speed 3313.34 samples/sec   Loss 4.8465   LearningRate 0.0218   Epoch: 10   Global Step: 44070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 16:57:50,667-Speed 3345.56 samples/sec   Loss 4.6836   LearningRate 0.0218   Epoch: 10   Global Step: 44080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:57:53,748-Speed 3324.64 samples/sec   Loss 4.7257   LearningRate 0.0218   Epoch: 10   Global Step: 44090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:57:56,838-Speed 3315.06 samples/sec   Loss 4.8067   LearningRate 0.0218   Epoch: 10   Global Step: 44100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:57:59,922-Speed 3321.14 samples/sec   Loss 4.8046   LearningRate 0.0218   Epoch: 10   Global Step: 44110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:03,011-Speed 3315.29 samples/sec   Loss 4.7723   LearningRate 0.0218   Epoch: 10   Global Step: 44120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:06,090-Speed 3326.90 samples/sec   Loss 4.7863   LearningRate 0.0218   Epoch: 10   Global Step: 44130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:09,171-Speed 3324.67 samples/sec   Loss 4.8672   LearningRate 0.0217   Epoch: 10   Global Step: 44140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:12,266-Speed 3309.39 samples/sec   Loss 4.7747   LearningRate 0.0217   Epoch: 10   Global Step: 44150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:15,349-Speed 3321.61 samples/sec   Loss 4.7227   LearningRate 0.0217   Epoch: 10   Global Step: 44160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:18,435-Speed 3318.43 samples/sec   Loss 4.8172   LearningRate 0.0217   Epoch: 10   Global Step: 44170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:21,510-Speed 3331.27 samples/sec   Loss 4.7331   LearningRate 0.0217   Epoch: 10   Global Step: 44180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:24,597-Speed 3318.49 samples/sec   Loss 4.6783   LearningRate 0.0217   Epoch: 10   Global Step: 44190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:27,686-Speed 3315.85 samples/sec   Loss 4.8156   LearningRate 0.0217   Epoch: 10   Global Step: 44200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:30,774-Speed 3316.99 samples/sec   Loss 4.7424   LearningRate 0.0217   Epoch: 10   Global Step: 44210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:33,865-Speed 3313.39 samples/sec   Loss 4.7722   LearningRate 0.0217   Epoch: 10   Global Step: 44220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:36,955-Speed 3314.43 samples/sec   Loss 4.8030   LearningRate 0.0216   Epoch: 10   Global Step: 44230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:40,054-Speed 3305.34 samples/sec   Loss 4.7686   LearningRate 0.0216   Epoch: 10   Global Step: 44240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:43,146-Speed 3311.78 samples/sec   Loss 4.8027   LearningRate 0.0216   Epoch: 10   Global Step: 44250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:46,233-Speed 3317.99 samples/sec   Loss 4.8629   LearningRate 0.0216   Epoch: 10   Global Step: 44260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:49,334-Speed 3303.14 samples/sec   Loss 4.8567   LearningRate 0.0216   Epoch: 10   Global Step: 44270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:52,404-Speed 3335.98 samples/sec   Loss 4.8786   LearningRate 0.0216   Epoch: 10   Global Step: 44280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:55,570-Speed 3235.71 samples/sec   Loss 4.8752   LearningRate 0.0216   Epoch: 10   Global Step: 44290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:58:58,657-Speed 3317.89 samples/sec   Loss 4.8605   LearningRate 0.0216   Epoch: 10   Global Step: 44300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:01,739-Speed 3323.17 samples/sec   Loss 4.8220   LearningRate 0.0215   Epoch: 10   Global Step: 44310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:04,845-Speed 3296.91 samples/sec   Loss 4.7978   LearningRate 0.0215   Epoch: 10   Global Step: 44320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:07,930-Speed 3320.85 samples/sec   Loss 4.8153   LearningRate 0.0215   Epoch: 10   Global Step: 44330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:11,006-Speed 3328.96 samples/sec   Loss 4.8094   LearningRate 0.0215   Epoch: 10   Global Step: 44340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:14,112-Speed 3297.81 samples/sec   Loss 4.8181   LearningRate 0.0215   Epoch: 10   Global Step: 44350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:17,189-Speed 3328.65 samples/sec   Loss 4.7570   LearningRate 0.0215   Epoch: 10   Global Step: 44360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:20,264-Speed 3330.89 samples/sec   Loss 4.8223   LearningRate 0.0215   Epoch: 10   Global Step: 44370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:23,319-Speed 3352.99 samples/sec   Loss 4.7459   LearningRate 0.0215   Epoch: 10   Global Step: 44380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:26,391-Speed 3334.78 samples/sec   Loss 4.7230   LearningRate 0.0215   Epoch: 10   Global Step: 44390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:29,466-Speed 3330.25 samples/sec   Loss 4.6959   LearningRate 0.0214   Epoch: 10   Global Step: 44400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:32,533-Speed 3338.83 samples/sec   Loss 4.8502   LearningRate 0.0214   Epoch: 10   Global Step: 44410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:35,625-Speed 3312.87 samples/sec   Loss 4.9205   LearningRate 0.0214   Epoch: 10   Global Step: 44420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:38,710-Speed 3320.11 samples/sec   Loss 4.7744   LearningRate 0.0214   Epoch: 10   Global Step: 44430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:41,782-Speed 3333.67 samples/sec   Loss 4.6913   LearningRate 0.0214   Epoch: 10   Global Step: 44440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:44,852-Speed 3336.79 samples/sec   Loss 4.7312   LearningRate 0.0214   Epoch: 10   Global Step: 44450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:47,929-Speed 3328.99 samples/sec   Loss 4.7485   LearningRate 0.0214   Epoch: 10   Global Step: 44460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:50,992-Speed 3343.03 samples/sec   Loss 4.7791   LearningRate 0.0214   Epoch: 10   Global Step: 44470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:54,047-Speed 3353.08 samples/sec   Loss 4.7611   LearningRate 0.0214   Epoch: 10   Global Step: 44480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 16:59:57,119-Speed 3334.49 samples/sec   Loss 4.7010   LearningRate 0.0213   Epoch: 10   Global Step: 44490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:00,187-Speed 3338.95 samples/sec   Loss 4.7955   LearningRate 0.0213   Epoch: 10   Global Step: 44500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:03,256-Speed 3336.94 samples/sec   Loss 4.8174   LearningRate 0.0213   Epoch: 10   Global Step: 44510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:06,337-Speed 3324.72 samples/sec   Loss 4.7062   LearningRate 0.0213   Epoch: 10   Global Step: 44520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:09,408-Speed 3334.43 samples/sec   Loss 4.8284   LearningRate 0.0213   Epoch: 10   Global Step: 44530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:12,509-Speed 3303.85 samples/sec   Loss 4.8334   LearningRate 0.0213   Epoch: 10   Global Step: 44540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:15,601-Speed 3311.51 samples/sec   Loss 4.8261   LearningRate 0.0213   Epoch: 10   Global Step: 44550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:18,672-Speed 3335.82 samples/sec   Loss 4.8244   LearningRate 0.0213   Epoch: 10   Global Step: 44560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:21,741-Speed 3337.56 samples/sec   Loss 4.7477   LearningRate 0.0213   Epoch: 10   Global Step: 44570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:24,794-Speed 3354.78 samples/sec   Loss 4.8096   LearningRate 0.0212   Epoch: 10   Global Step: 44580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:27,865-Speed 3334.89 samples/sec   Loss 4.8459   LearningRate 0.0212   Epoch: 10   Global Step: 44590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:30,935-Speed 3336.39 samples/sec   Loss 4.7339   LearningRate 0.0212   Epoch: 10   Global Step: 44600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:34,009-Speed 3332.38 samples/sec   Loss 4.7548   LearningRate 0.0212   Epoch: 10   Global Step: 44610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:37,075-Speed 3339.79 samples/sec   Loss 4.7988   LearningRate 0.0212   Epoch: 10   Global Step: 44620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:40,147-Speed 3334.23 samples/sec   Loss 4.7196   LearningRate 0.0212   Epoch: 10   Global Step: 44630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:43,235-Speed 3317.38 samples/sec   Loss 4.8050   LearningRate 0.0212   Epoch: 10   Global Step: 44640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:46,302-Speed 3339.80 samples/sec   Loss 4.8013   LearningRate 0.0212   Epoch: 10   Global Step: 44650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:49,373-Speed 3335.30 samples/sec   Loss 4.7885   LearningRate 0.0212   Epoch: 10   Global Step: 44660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:52,465-Speed 3312.41 samples/sec   Loss 4.7442   LearningRate 0.0211   Epoch: 10   Global Step: 44670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:00:55,562-Speed 3306.71 samples/sec   Loss 4.8006   LearningRate 0.0211   Epoch: 10   Global Step: 44680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 17:00:58,626-Speed 3342.85 samples/sec   Loss 4.8562   LearningRate 0.0211   Epoch: 10   Global Step: 44690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:01,706-Speed 3325.90 samples/sec   Loss 4.7552   LearningRate 0.0211   Epoch: 10   Global Step: 44700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:04,777-Speed 3335.11 samples/sec   Loss 4.7468   LearningRate 0.0211   Epoch: 10   Global Step: 44710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:07,885-Speed 3295.65 samples/sec   Loss 4.8233   LearningRate 0.0211   Epoch: 10   Global Step: 44720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:10,961-Speed 3328.83 samples/sec   Loss 4.7082   LearningRate 0.0211   Epoch: 10   Global Step: 44730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:14,036-Speed 3331.99 samples/sec   Loss 4.7064   LearningRate 0.0211   Epoch: 10   Global Step: 44740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:17,114-Speed 3327.06 samples/sec   Loss 4.7854   LearningRate 0.0211   Epoch: 10   Global Step: 44750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:20,195-Speed 3324.32 samples/sec   Loss 4.8250   LearningRate 0.0210   Epoch: 10   Global Step: 44760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:23,275-Speed 3325.73 samples/sec   Loss 4.8128   LearningRate 0.0210   Epoch: 10   Global Step: 44770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:26,350-Speed 3331.34 samples/sec   Loss 4.7505   LearningRate 0.0210   Epoch: 10   Global Step: 44780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:29,411-Speed 3345.83 samples/sec   Loss 4.7967   LearningRate 0.0210   Epoch: 10   Global Step: 44790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:32,488-Speed 3328.91 samples/sec   Loss 4.8543   LearningRate 0.0210   Epoch: 10   Global Step: 44800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:35,562-Speed 3331.60 samples/sec   Loss 4.7292   LearningRate 0.0210   Epoch: 10   Global Step: 44810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:38,634-Speed 3334.16 samples/sec   Loss 4.6950   LearningRate 0.0210   Epoch: 10   Global Step: 44820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:41,720-Speed 3317.94 samples/sec   Loss 4.7397   LearningRate 0.0210   Epoch: 10   Global Step: 44830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:44,805-Speed 3320.52 samples/sec   Loss 4.8098   LearningRate 0.0210   Epoch: 10   Global Step: 44840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:47,888-Speed 3322.94 samples/sec   Loss 4.7774   LearningRate 0.0209   Epoch: 10   Global Step: 44850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:51,008-Speed 3282.78 samples/sec   Loss 4.7237   LearningRate 0.0209   Epoch: 10   Global Step: 44860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:54,178-Speed 3230.85 samples/sec   Loss 4.7816   LearningRate 0.0209   Epoch: 10   Global Step: 44870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:01:57,238-Speed 3346.93 samples/sec   Loss 4.8138   LearningRate 0.0209   Epoch: 10   Global Step: 44880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:02:00,318-Speed 3326.09 samples/sec   Loss 4.7174   LearningRate 0.0209   Epoch: 10   Global Step: 44890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:02:03,395-Speed 3328.07 samples/sec   Loss 4.8055   LearningRate 0.0209   Epoch: 10   Global Step: 44900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:02:06,479-Speed 3320.66 samples/sec   Loss 4.9384   LearningRate 0.0209   Epoch: 10   Global Step: 44910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:02:09,559-Speed 3325.65 samples/sec   Loss 4.7693   LearningRate 0.0209   Epoch: 10   Global Step: 44920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:02:12,644-Speed 3320.41 samples/sec   Loss 4.8093   LearningRate 0.0209   Epoch: 10   Global Step: 44930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:02:15,722-Speed 3328.27 samples/sec   Loss 4.8369   LearningRate 0.0208   Epoch: 10   Global Step: 44940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:02:18,799-Speed 3328.20 samples/sec   Loss 4.7530   LearningRate 0.0208   Epoch: 10   Global Step: 44950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:02:21,884-Speed 3319.91 samples/sec   Loss 4.7431   LearningRate 0.0208   Epoch: 10   Global Step: 44960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:02:24,960-Speed 3330.20 samples/sec   Loss 4.8069   LearningRate 0.0208   Epoch: 10   Global Step: 44970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:02:28,081-Speed 3281.33 samples/sec   Loss 4.7929   LearningRate 0.0208   Epoch: 10   Global Step: 44980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:02:31,157-Speed 3329.29 samples/sec   Loss 4.8971   LearningRate 0.0208   Epoch: 10   Global Step: 44990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:02:34,240-Speed 3321.99 samples/sec   Loss 4.8178   LearningRate 0.0208   Epoch: 10   Global Step: 45000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:02:37,323-Speed 3322.37 samples/sec   Loss 4.7995   LearningRate 0.0208   Epoch: 10   Global Step: 45010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:02:40,518-Speed 3205.49 samples/sec   Loss 4.7109   LearningRate 0.0208   Epoch: 10   Global Step: 45020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:02:43,615-Speed 3308.66 samples/sec   Loss 4.8225   LearningRate 0.0207   Epoch: 10   Global Step: 45030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:02:46,693-Speed 3327.00 samples/sec   Loss 4.8032   LearningRate 0.0207   Epoch: 10   Global Step: 45040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:02:49,771-Speed 3327.33 samples/sec   Loss 4.7841   LearningRate 0.0207   Epoch: 10   Global Step: 45050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:02:52,921-Speed 3251.11 samples/sec   Loss 4.8353   LearningRate 0.0207   Epoch: 10   Global Step: 45060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:02:55,997-Speed 3329.87 samples/sec   Loss 4.8800   LearningRate 0.0207   Epoch: 10   Global Step: 45070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:02:59,069-Speed 3334.01 samples/sec   Loss 4.8490   LearningRate 0.0207   Epoch: 10   Global Step: 45080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:03:02,148-Speed 3327.54 samples/sec   Loss 4.7457   LearningRate 0.0207   Epoch: 10   Global Step: 45090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:03:05,232-Speed 3320.36 samples/sec   Loss 4.7156   LearningRate 0.0207   Epoch: 10   Global Step: 45100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:03:08,307-Speed 3330.46 samples/sec   Loss 4.8353   LearningRate 0.0207   Epoch: 10   Global Step: 45110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:03:11,382-Speed 3330.80 samples/sec   Loss 4.8175   LearningRate 0.0206   Epoch: 10   Global Step: 45120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:03:14,466-Speed 3321.66 samples/sec   Loss 4.7967   LearningRate 0.0206   Epoch: 10   Global Step: 45130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:03:17,556-Speed 3314.52 samples/sec   Loss 4.8209   LearningRate 0.0206   Epoch: 10   Global Step: 45140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:03:20,638-Speed 3323.53 samples/sec   Loss 4.8637   LearningRate 0.0206   Epoch: 10   Global Step: 45150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:03:23,721-Speed 3321.68 samples/sec   Loss 4.8900   LearningRate 0.0206   Epoch: 10   Global Step: 45160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:03:26,800-Speed 3327.37 samples/sec   Loss 4.7915   LearningRate 0.0206   Epoch: 10   Global Step: 45170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:03:29,881-Speed 3323.24 samples/sec   Loss 4.8557   LearningRate 0.0206   Epoch: 10   Global Step: 45180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:03:32,962-Speed 3324.66 samples/sec   Loss 4.7339   LearningRate 0.0206   Epoch: 10   Global Step: 45190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:03:36,043-Speed 3324.16 samples/sec   Loss 4.8074   LearningRate 0.0206   Epoch: 10   Global Step: 45200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:03:39,130-Speed 3317.56 samples/sec   Loss 4.7359   LearningRate 0.0206   Epoch: 10   Global Step: 45210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:03:42,212-Speed 3323.71 samples/sec   Loss 4.8828   LearningRate 0.0205   Epoch: 10   Global Step: 45220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:03:45,293-Speed 3324.38 samples/sec   Loss 4.7615   LearningRate 0.0205   Epoch: 10   Global Step: 45230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:03:48,384-Speed 3313.88 samples/sec   Loss 4.7665   LearningRate 0.0205   Epoch: 10   Global Step: 45240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:03:51,467-Speed 3322.63 samples/sec   Loss 4.6964   LearningRate 0.0205   Epoch: 10   Global Step: 45250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:03:54,557-Speed 3314.72 samples/sec   Loss 4.7385   LearningRate 0.0205   Epoch: 10   Global Step: 45260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:03:57,634-Speed 3328.76 samples/sec   Loss 4.7697   LearningRate 0.0205   Epoch: 10   Global Step: 45270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:04:00,700-Speed 3340.38 samples/sec   Loss 4.7793   LearningRate 0.0205   Epoch: 10   Global Step: 45280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:04:03,780-Speed 3324.90 samples/sec   Loss 4.7900   LearningRate 0.0205   Epoch: 10   Global Step: 45290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:04:06,868-Speed 3317.62 samples/sec   Loss 4.7600   LearningRate 0.0205   Epoch: 10   Global Step: 45300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:04:09,949-Speed 3324.10 samples/sec   Loss 4.6209   LearningRate 0.0204   Epoch: 10   Global Step: 45310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:04:13,029-Speed 3325.14 samples/sec   Loss 4.7654   LearningRate 0.0204   Epoch: 10   Global Step: 45320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:04:16,120-Speed 3313.96 samples/sec   Loss 4.8531   LearningRate 0.0204   Epoch: 10   Global Step: 45330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:04:19,203-Speed 3322.04 samples/sec   Loss 4.7580   LearningRate 0.0204   Epoch: 10   Global Step: 45340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:04:22,284-Speed 3324.08 samples/sec   Loss 4.8117   LearningRate 0.0204   Epoch: 10   Global Step: 45350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:04:25,367-Speed 3322.00 samples/sec   Loss 4.7793   LearningRate 0.0204   Epoch: 10   Global Step: 45360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:04:28,463-Speed 3308.15 samples/sec   Loss 4.7179   LearningRate 0.0204   Epoch: 10   Global Step: 45370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:04:31,528-Speed 3341.70 samples/sec   Loss 4.8293   LearningRate 0.0204   Epoch: 10   Global Step: 45380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:04:34,607-Speed 3326.68 samples/sec   Loss 4.7252   LearningRate 0.0204   Epoch: 10   Global Step: 45390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:04:37,701-Speed 3309.98 samples/sec   Loss 4.7705   LearningRate 0.0203   Epoch: 10   Global Step: 45400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:04:40,793-Speed 3312.83 samples/sec   Loss 4.8472   LearningRate 0.0203   Epoch: 10   Global Step: 45410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:04:43,881-Speed 3317.20 samples/sec   Loss 4.7766   LearningRate 0.0203   Epoch: 10   Global Step: 45420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:04:46,968-Speed 3318.08 samples/sec   Loss 4.7256   LearningRate 0.0203   Epoch: 10   Global Step: 45430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:04:50,046-Speed 3327.74 samples/sec   Loss 4.8158   LearningRate 0.0203   Epoch: 10   Global Step: 45440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:04:53,145-Speed 3304.37 samples/sec   Loss 4.8356   LearningRate 0.0203   Epoch: 10   Global Step: 45450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:04:56,232-Speed 3318.47 samples/sec   Loss 4.6995   LearningRate 0.0203   Epoch: 10   Global Step: 45460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:04:59,328-Speed 3307.68 samples/sec   Loss 4.7368   LearningRate 0.0203   Epoch: 10   Global Step: 45470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:05:02,480-Speed 3248.91 samples/sec   Loss 4.6983   LearningRate 0.0203   Epoch: 10   Global Step: 45480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:05:14,769-Speed 833.35 samples/sec   Loss 4.0392   LearningRate 0.0202   Epoch: 11   Global Step: 45490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:05:17,898-Speed 3274.28 samples/sec   Loss 3.2629   LearningRate 0.0202   Epoch: 11   Global Step: 45500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:05:20,989-Speed 3313.68 samples/sec   Loss 3.3602   LearningRate 0.0202   Epoch: 11   Global Step: 45510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:05:24,084-Speed 3308.91 samples/sec   Loss 3.3521   LearningRate 0.0202   Epoch: 11   Global Step: 45520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:05:27,169-Speed 3320.47 samples/sec   Loss 3.3720   LearningRate 0.0202   Epoch: 11   Global Step: 45530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:05:30,249-Speed 3324.56 samples/sec   Loss 3.3334   LearningRate 0.0202   Epoch: 11   Global Step: 45540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:05:33,333-Speed 3321.19 samples/sec   Loss 3.3198   LearningRate 0.0202   Epoch: 11   Global Step: 45550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:05:36,414-Speed 3324.80 samples/sec   Loss 3.3251   LearningRate 0.0202   Epoch: 11   Global Step: 45560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:05:39,501-Speed 3317.69 samples/sec   Loss 3.3296   LearningRate 0.0202   Epoch: 11   Global Step: 45570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:05:42,579-Speed 3327.88 samples/sec   Loss 3.4564   LearningRate 0.0201   Epoch: 11   Global Step: 45580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:05:45,647-Speed 3338.67 samples/sec   Loss 3.3443   LearningRate 0.0201   Epoch: 11   Global Step: 45590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:05:48,727-Speed 3324.94 samples/sec   Loss 3.3808   LearningRate 0.0201   Epoch: 11   Global Step: 45600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:05:51,796-Speed 3337.37 samples/sec   Loss 3.4081   LearningRate 0.0201   Epoch: 11   Global Step: 45610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:05:54,937-Speed 3261.34 samples/sec   Loss 3.4304   LearningRate 0.0201   Epoch: 11   Global Step: 45620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:05:58,023-Speed 3318.50 samples/sec   Loss 3.4282   LearningRate 0.0201   Epoch: 11   Global Step: 45630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:06:01,109-Speed 3318.76 samples/sec   Loss 3.3962   LearningRate 0.0201   Epoch: 11   Global Step: 45640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:06:04,196-Speed 3317.57 samples/sec   Loss 3.2727   LearningRate 0.0201   Epoch: 11   Global Step: 45650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:06:07,282-Speed 3319.41 samples/sec   Loss 3.4410   LearningRate 0.0201   Epoch: 11   Global Step: 45660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:06:10,379-Speed 3307.70 samples/sec   Loss 3.4148   LearningRate 0.0200   Epoch: 11   Global Step: 45670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:06:13,465-Speed 3319.30 samples/sec   Loss 3.4787   LearningRate 0.0200   Epoch: 11   Global Step: 45680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:06:16,550-Speed 3319.09 samples/sec   Loss 3.4697   LearningRate 0.0200   Epoch: 11   Global Step: 45690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:06:19,633-Speed 3322.77 samples/sec   Loss 3.4620   LearningRate 0.0200   Epoch: 11   Global Step: 45700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:06:22,713-Speed 3325.10 samples/sec   Loss 3.3959   LearningRate 0.0200   Epoch: 11   Global Step: 45710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:06:25,800-Speed 3317.64 samples/sec   Loss 3.4670   LearningRate 0.0200   Epoch: 11   Global Step: 45720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:06:28,886-Speed 3319.30 samples/sec   Loss 3.4490   LearningRate 0.0200   Epoch: 11   Global Step: 45730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:06:31,974-Speed 3316.36 samples/sec   Loss 3.3818   LearningRate 0.0200   Epoch: 11   Global Step: 45740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:06:35,072-Speed 3306.66 samples/sec   Loss 3.4230   LearningRate 0.0200   Epoch: 11   Global Step: 45750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:06:38,158-Speed 3319.27 samples/sec   Loss 3.5393   LearningRate 0.0200   Epoch: 11   Global Step: 45760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:06:41,255-Speed 3306.80 samples/sec   Loss 3.4531   LearningRate 0.0199   Epoch: 11   Global Step: 45770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:06:44,338-Speed 3322.53 samples/sec   Loss 3.6237   LearningRate 0.0199   Epoch: 11   Global Step: 45780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:06:47,454-Speed 3286.76 samples/sec   Loss 3.5027   LearningRate 0.0199   Epoch: 11   Global Step: 45790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:06:50,540-Speed 3319.23 samples/sec   Loss 3.5882   LearningRate 0.0199   Epoch: 11   Global Step: 45800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:06:53,624-Speed 3320.70 samples/sec   Loss 3.5152   LearningRate 0.0199   Epoch: 11   Global Step: 45810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 17:06:56,700-Speed 3330.18 samples/sec   Loss 3.5094   LearningRate 0.0199   Epoch: 11   Global Step: 45820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:06:59,797-Speed 3306.27 samples/sec   Loss 3.5860   LearningRate 0.0199   Epoch: 11   Global Step: 45830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:02,885-Speed 3316.66 samples/sec   Loss 3.5940   LearningRate 0.0199   Epoch: 11   Global Step: 45840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:05,982-Speed 3307.30 samples/sec   Loss 3.5343   LearningRate 0.0199   Epoch: 11   Global Step: 45850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:09,079-Speed 3307.78 samples/sec   Loss 3.4992   LearningRate 0.0198   Epoch: 11   Global Step: 45860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:12,166-Speed 3318.32 samples/sec   Loss 3.5779   LearningRate 0.0198   Epoch: 11   Global Step: 45870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:15,258-Speed 3312.55 samples/sec   Loss 3.5574   LearningRate 0.0198   Epoch: 11   Global Step: 45880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:18,340-Speed 3323.58 samples/sec   Loss 3.4795   LearningRate 0.0198   Epoch: 11   Global Step: 45890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:21,428-Speed 3316.58 samples/sec   Loss 3.5298   LearningRate 0.0198   Epoch: 11   Global Step: 45900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:24,523-Speed 3309.79 samples/sec   Loss 3.5591   LearningRate 0.0198   Epoch: 11   Global Step: 45910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:27,591-Speed 3337.96 samples/sec   Loss 3.5753   LearningRate 0.0198   Epoch: 11   Global Step: 45920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:30,683-Speed 3312.38 samples/sec   Loss 3.6095   LearningRate 0.0198   Epoch: 11   Global Step: 45930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:33,773-Speed 3314.75 samples/sec   Loss 3.4952   LearningRate 0.0198   Epoch: 11   Global Step: 45940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:36,859-Speed 3318.89 samples/sec   Loss 3.5789   LearningRate 0.0197   Epoch: 11   Global Step: 45950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:39,951-Speed 3311.96 samples/sec   Loss 3.5075   LearningRate 0.0197   Epoch: 11   Global Step: 45960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:43,035-Speed 3322.28 samples/sec   Loss 3.6391   LearningRate 0.0197   Epoch: 11   Global Step: 45970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:46,119-Speed 3320.61 samples/sec   Loss 3.5910   LearningRate 0.0197   Epoch: 11   Global Step: 45980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:49,214-Speed 3308.79 samples/sec   Loss 3.6073   LearningRate 0.0197   Epoch: 11   Global Step: 45990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:07:52,293-Speed 3326.93 samples/sec   Loss 3.6439   LearningRate 0.0197   Epoch: 11   Global Step: 46000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:08:35,775-[lfw][46000]XNorm: 23.678352
Training: 2022-04-26 17:08:35,776-[lfw][46000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-26 17:08:35,776-[lfw][46000]Accuracy-Highest: 0.99783
Training: 2022-04-26 17:09:26,477-[cfp_fp][46000]XNorm: 22.983038
Training: 2022-04-26 17:09:26,478-[cfp_fp][46000]Accuracy-Flip: 0.98786+-0.00430
Training: 2022-04-26 17:09:26,478-[cfp_fp][46000]Accuracy-Highest: 0.98843
Training: 2022-04-26 17:10:10,038-[agedb_30][46000]XNorm: 24.127660
Training: 2022-04-26 17:10:10,038-[agedb_30][46000]Accuracy-Flip: 0.97400+-0.00655
Training: 2022-04-26 17:10:10,039-[agedb_30][46000]Accuracy-Highest: 0.97550
Training: 2022-04-26 17:10:13,120-Speed 72.71 samples/sec   Loss 3.5062   LearningRate 0.0197   Epoch: 11   Global Step: 46010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:10:16,287-Speed 3234.19 samples/sec   Loss 3.6133   LearningRate 0.0197   Epoch: 11   Global Step: 46020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:10:19,369-Speed 3323.23 samples/sec   Loss 3.6045   LearningRate 0.0197   Epoch: 11   Global Step: 46030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:10:22,451-Speed 3323.25 samples/sec   Loss 3.6769   LearningRate 0.0197   Epoch: 11   Global Step: 46040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:10:25,531-Speed 3325.83 samples/sec   Loss 3.5766   LearningRate 0.0196   Epoch: 11   Global Step: 46050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:10:28,616-Speed 3319.16 samples/sec   Loss 3.5742   LearningRate 0.0196   Epoch: 11   Global Step: 46060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:10:31,706-Speed 3314.48 samples/sec   Loss 3.6936   LearningRate 0.0196   Epoch: 11   Global Step: 46070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:10:34,803-Speed 3307.79 samples/sec   Loss 3.7137   LearningRate 0.0196   Epoch: 11   Global Step: 46080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:10:37,890-Speed 3318.30 samples/sec   Loss 3.5882   LearningRate 0.0196   Epoch: 11   Global Step: 46090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:10:41,014-Speed 3277.88 samples/sec   Loss 3.7101   LearningRate 0.0196   Epoch: 11   Global Step: 46100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:10:44,109-Speed 3309.50 samples/sec   Loss 3.7332   LearningRate 0.0196   Epoch: 11   Global Step: 46110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:10:47,196-Speed 3317.72 samples/sec   Loss 3.6728   LearningRate 0.0196   Epoch: 11   Global Step: 46120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 17:10:50,383-Speed 3214.06 samples/sec   Loss 3.6542   LearningRate 0.0196   Epoch: 11   Global Step: 46130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:10:53,540-Speed 3244.73 samples/sec   Loss 3.6733   LearningRate 0.0195   Epoch: 11   Global Step: 46140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:10:56,636-Speed 3307.92 samples/sec   Loss 3.5773   LearningRate 0.0195   Epoch: 11   Global Step: 46150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:10:59,728-Speed 3312.06 samples/sec   Loss 3.6321   LearningRate 0.0195   Epoch: 11   Global Step: 46160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:11:02,838-Speed 3292.82 samples/sec   Loss 3.7012   LearningRate 0.0195   Epoch: 11   Global Step: 46170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:11:05,936-Speed 3306.37 samples/sec   Loss 3.7620   LearningRate 0.0195   Epoch: 11   Global Step: 46180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:11:09,032-Speed 3309.08 samples/sec   Loss 3.7204   LearningRate 0.0195   Epoch: 11   Global Step: 46190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:11:12,125-Speed 3310.84 samples/sec   Loss 3.6950   LearningRate 0.0195   Epoch: 11   Global Step: 46200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:11:15,217-Speed 3312.55 samples/sec   Loss 3.6975   LearningRate 0.0195   Epoch: 11   Global Step: 46210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:11:18,320-Speed 3300.62 samples/sec   Loss 3.7033   LearningRate 0.0195   Epoch: 11   Global Step: 46220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:11:21,424-Speed 3300.14 samples/sec   Loss 3.5944   LearningRate 0.0194   Epoch: 11   Global Step: 46230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:11:24,530-Speed 3297.88 samples/sec   Loss 3.7474   LearningRate 0.0194   Epoch: 11   Global Step: 46240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:11:27,654-Speed 3278.13 samples/sec   Loss 3.7761   LearningRate 0.0194   Epoch: 11   Global Step: 46250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:11:30,758-Speed 3299.53 samples/sec   Loss 3.8196   LearningRate 0.0194   Epoch: 11   Global Step: 46260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:11:33,856-Speed 3305.66 samples/sec   Loss 3.6573   LearningRate 0.0194   Epoch: 11   Global Step: 46270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:11:36,960-Speed 3300.53 samples/sec   Loss 3.7595   LearningRate 0.0194   Epoch: 11   Global Step: 46280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:11:40,062-Speed 3301.84 samples/sec   Loss 3.7410   LearningRate 0.0194   Epoch: 11   Global Step: 46290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:11:43,160-Speed 3306.03 samples/sec   Loss 3.7772   LearningRate 0.0194   Epoch: 11   Global Step: 46300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:11:46,258-Speed 3306.04 samples/sec   Loss 3.7288   LearningRate 0.0194   Epoch: 11   Global Step: 46310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:11:49,388-Speed 3272.45 samples/sec   Loss 3.7449   LearningRate 0.0194   Epoch: 11   Global Step: 46320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:11:52,523-Speed 3266.52 samples/sec   Loss 3.8295   LearningRate 0.0193   Epoch: 11   Global Step: 46330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:11:55,610-Speed 3318.16 samples/sec   Loss 3.7899   LearningRate 0.0193   Epoch: 11   Global Step: 46340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 17:11:58,689-Speed 3325.99 samples/sec   Loss 3.7954   LearningRate 0.0193   Epoch: 11   Global Step: 46350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:12:01,779-Speed 3315.09 samples/sec   Loss 3.8220   LearningRate 0.0193   Epoch: 11   Global Step: 46360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:12:04,907-Speed 3274.36 samples/sec   Loss 3.7563   LearningRate 0.0193   Epoch: 11   Global Step: 46370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:12:07,994-Speed 3318.41 samples/sec   Loss 3.7700   LearningRate 0.0193   Epoch: 11   Global Step: 46380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:12:11,081-Speed 3317.05 samples/sec   Loss 3.8276   LearningRate 0.0193   Epoch: 11   Global Step: 46390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:12:14,211-Speed 3272.82 samples/sec   Loss 3.7724   LearningRate 0.0193   Epoch: 11   Global Step: 46400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:12:17,422-Speed 3189.40 samples/sec   Loss 3.8505   LearningRate 0.0193   Epoch: 11   Global Step: 46410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:12:20,504-Speed 3323.42 samples/sec   Loss 3.8492   LearningRate 0.0192   Epoch: 11   Global Step: 46420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:12:23,582-Speed 3327.14 samples/sec   Loss 3.8875   LearningRate 0.0192   Epoch: 11   Global Step: 46430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:12:26,678-Speed 3308.95 samples/sec   Loss 3.8136   LearningRate 0.0192   Epoch: 11   Global Step: 46440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:12:29,767-Speed 3315.62 samples/sec   Loss 3.8501   LearningRate 0.0192   Epoch: 11   Global Step: 46450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:12:32,846-Speed 3326.06 samples/sec   Loss 3.7943   LearningRate 0.0192   Epoch: 11   Global Step: 46460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:12:35,943-Speed 3306.67 samples/sec   Loss 3.9096   LearningRate 0.0192   Epoch: 11   Global Step: 46470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:12:39,060-Speed 3286.81 samples/sec   Loss 3.7684   LearningRate 0.0192   Epoch: 11   Global Step: 46480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:12:42,139-Speed 3326.52 samples/sec   Loss 4.0314   LearningRate 0.0192   Epoch: 11   Global Step: 46490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:12:45,218-Speed 3325.88 samples/sec   Loss 3.8532   LearningRate 0.0192   Epoch: 11   Global Step: 46500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:12:48,300-Speed 3323.30 samples/sec   Loss 3.8426   LearningRate 0.0191   Epoch: 11   Global Step: 46510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:12:51,402-Speed 3301.98 samples/sec   Loss 3.9228   LearningRate 0.0191   Epoch: 11   Global Step: 46520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:12:54,490-Speed 3317.37 samples/sec   Loss 3.8897   LearningRate 0.0191   Epoch: 11   Global Step: 46530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:12:57,573-Speed 3321.67 samples/sec   Loss 3.8582   LearningRate 0.0191   Epoch: 11   Global Step: 46540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:00,664-Speed 3313.73 samples/sec   Loss 3.8178   LearningRate 0.0191   Epoch: 11   Global Step: 46550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:03,757-Speed 3311.62 samples/sec   Loss 3.8701   LearningRate 0.0191   Epoch: 11   Global Step: 46560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:06,844-Speed 3317.00 samples/sec   Loss 3.8413   LearningRate 0.0191   Epoch: 11   Global Step: 46570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:09,926-Speed 3324.15 samples/sec   Loss 3.8997   LearningRate 0.0191   Epoch: 11   Global Step: 46580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:13,005-Speed 3326.20 samples/sec   Loss 3.8698   LearningRate 0.0191   Epoch: 11   Global Step: 46590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:16,100-Speed 3309.49 samples/sec   Loss 3.8465   LearningRate 0.0191   Epoch: 11   Global Step: 46600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:19,179-Speed 3326.00 samples/sec   Loss 3.8479   LearningRate 0.0190   Epoch: 11   Global Step: 46610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:22,262-Speed 3322.27 samples/sec   Loss 3.8769   LearningRate 0.0190   Epoch: 11   Global Step: 46620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:25,378-Speed 3287.06 samples/sec   Loss 3.7942   LearningRate 0.0190   Epoch: 11   Global Step: 46630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:28,490-Speed 3291.15 samples/sec   Loss 3.8087   LearningRate 0.0190   Epoch: 11   Global Step: 46640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:31,580-Speed 3314.51 samples/sec   Loss 3.8421   LearningRate 0.0190   Epoch: 11   Global Step: 46650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:34,667-Speed 3317.44 samples/sec   Loss 3.9188   LearningRate 0.0190   Epoch: 11   Global Step: 46660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:37,767-Speed 3305.31 samples/sec   Loss 3.8676   LearningRate 0.0190   Epoch: 11   Global Step: 46670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:40,861-Speed 3310.19 samples/sec   Loss 3.9413   LearningRate 0.0190   Epoch: 11   Global Step: 46680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:43,946-Speed 3319.44 samples/sec   Loss 3.8616   LearningRate 0.0190   Epoch: 11   Global Step: 46690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:47,042-Speed 3308.29 samples/sec   Loss 3.9303   LearningRate 0.0189   Epoch: 11   Global Step: 46700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:50,129-Speed 3318.48 samples/sec   Loss 3.9799   LearningRate 0.0189   Epoch: 11   Global Step: 46710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:53,221-Speed 3311.95 samples/sec   Loss 3.8973   LearningRate 0.0189   Epoch: 11   Global Step: 46720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:56,286-Speed 3341.61 samples/sec   Loss 3.9442   LearningRate 0.0189   Epoch: 11   Global Step: 46730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:13:59,418-Speed 3270.57 samples/sec   Loss 3.9275   LearningRate 0.0189   Epoch: 11   Global Step: 46740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:14:02,502-Speed 3320.08 samples/sec   Loss 3.8941   LearningRate 0.0189   Epoch: 11   Global Step: 46750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:14:05,585-Speed 3322.93 samples/sec   Loss 4.0353   LearningRate 0.0189   Epoch: 11   Global Step: 46760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:14:08,668-Speed 3322.52 samples/sec   Loss 3.8443   LearningRate 0.0189   Epoch: 11   Global Step: 46770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:14:11,754-Speed 3318.31 samples/sec   Loss 3.9728   LearningRate 0.0189   Epoch: 11   Global Step: 46780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:14:14,974-Speed 3180.87 samples/sec   Loss 3.9282   LearningRate 0.0189   Epoch: 11   Global Step: 46790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:14:18,070-Speed 3309.98 samples/sec   Loss 3.9752   LearningRate 0.0188   Epoch: 11   Global Step: 46800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:14:21,171-Speed 3303.02 samples/sec   Loss 4.0399   LearningRate 0.0188   Epoch: 11   Global Step: 46810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:14:24,238-Speed 3339.86 samples/sec   Loss 3.9087   LearningRate 0.0188   Epoch: 11   Global Step: 46820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:14:27,351-Speed 3290.06 samples/sec   Loss 3.8546   LearningRate 0.0188   Epoch: 11   Global Step: 46830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:14:30,470-Speed 3283.38 samples/sec   Loss 4.0133   LearningRate 0.0188   Epoch: 11   Global Step: 46840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:14:33,556-Speed 3319.52 samples/sec   Loss 3.9120   LearningRate 0.0188   Epoch: 11   Global Step: 46850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:14:36,639-Speed 3321.71 samples/sec   Loss 3.9588   LearningRate 0.0188   Epoch: 11   Global Step: 46860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:14:39,732-Speed 3311.86 samples/sec   Loss 3.8964   LearningRate 0.0188   Epoch: 11   Global Step: 46870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:14:42,817-Speed 3320.21 samples/sec   Loss 3.9782   LearningRate 0.0188   Epoch: 11   Global Step: 46880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:14:45,914-Speed 3306.20 samples/sec   Loss 3.9420   LearningRate 0.0187   Epoch: 11   Global Step: 46890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:14:49,001-Speed 3318.44 samples/sec   Loss 3.9370   LearningRate 0.0187   Epoch: 11   Global Step: 46900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:14:52,103-Speed 3301.38 samples/sec   Loss 4.0274   LearningRate 0.0187   Epoch: 11   Global Step: 46910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:14:55,191-Speed 3316.83 samples/sec   Loss 4.0065   LearningRate 0.0187   Epoch: 11   Global Step: 46920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:14:58,279-Speed 3317.08 samples/sec   Loss 4.0319   LearningRate 0.0187   Epoch: 11   Global Step: 46930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:01,377-Speed 3305.96 samples/sec   Loss 3.9980   LearningRate 0.0187   Epoch: 11   Global Step: 46940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:04,509-Speed 3270.01 samples/sec   Loss 3.9240   LearningRate 0.0187   Epoch: 11   Global Step: 46950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:07,594-Speed 3320.12 samples/sec   Loss 4.0864   LearningRate 0.0187   Epoch: 11   Global Step: 46960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:10,690-Speed 3309.25 samples/sec   Loss 4.0403   LearningRate 0.0187   Epoch: 11   Global Step: 46970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:13,773-Speed 3321.41 samples/sec   Loss 3.9543   LearningRate 0.0187   Epoch: 11   Global Step: 46980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:16,858-Speed 3319.63 samples/sec   Loss 4.0173   LearningRate 0.0186   Epoch: 11   Global Step: 46990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:19,946-Speed 3316.85 samples/sec   Loss 3.9058   LearningRate 0.0186   Epoch: 11   Global Step: 47000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:23,060-Speed 3289.35 samples/sec   Loss 3.9653   LearningRate 0.0186   Epoch: 11   Global Step: 47010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:26,184-Speed 3278.56 samples/sec   Loss 4.0214   LearningRate 0.0186   Epoch: 11   Global Step: 47020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:29,276-Speed 3312.68 samples/sec   Loss 3.9089   LearningRate 0.0186   Epoch: 11   Global Step: 47030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:32,368-Speed 3312.32 samples/sec   Loss 4.0567   LearningRate 0.0186   Epoch: 11   Global Step: 47040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:35,453-Speed 3319.90 samples/sec   Loss 4.0829   LearningRate 0.0186   Epoch: 11   Global Step: 47050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:38,549-Speed 3308.20 samples/sec   Loss 3.9096   LearningRate 0.0186   Epoch: 11   Global Step: 47060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:41,635-Speed 3319.74 samples/sec   Loss 3.9961   LearningRate 0.0186   Epoch: 11   Global Step: 47070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:44,723-Speed 3316.93 samples/sec   Loss 3.9930   LearningRate 0.0186   Epoch: 11   Global Step: 47080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:47,813-Speed 3314.57 samples/sec   Loss 4.0056   LearningRate 0.0185   Epoch: 11   Global Step: 47090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:50,906-Speed 3311.34 samples/sec   Loss 3.9566   LearningRate 0.0185   Epoch: 11   Global Step: 47100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:54,080-Speed 3227.10 samples/sec   Loss 4.0600   LearningRate 0.0185   Epoch: 11   Global Step: 47110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:15:57,194-Speed 3288.84 samples/sec   Loss 3.9337   LearningRate 0.0185   Epoch: 11   Global Step: 47120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:00,278-Speed 3321.37 samples/sec   Loss 3.9604   LearningRate 0.0185   Epoch: 11   Global Step: 47130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:03,372-Speed 3309.58 samples/sec   Loss 4.0354   LearningRate 0.0185   Epoch: 11   Global Step: 47140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:06,471-Speed 3305.42 samples/sec   Loss 4.0283   LearningRate 0.0185   Epoch: 11   Global Step: 47150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:09,563-Speed 3312.08 samples/sec   Loss 4.0498   LearningRate 0.0185   Epoch: 11   Global Step: 47160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:12,652-Speed 3316.67 samples/sec   Loss 4.0769   LearningRate 0.0185   Epoch: 11   Global Step: 47170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:15,749-Speed 3306.77 samples/sec   Loss 4.0252   LearningRate 0.0184   Epoch: 11   Global Step: 47180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:18,841-Speed 3311.89 samples/sec   Loss 4.0103   LearningRate 0.0184   Epoch: 11   Global Step: 47190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:21,928-Speed 3318.78 samples/sec   Loss 4.0120   LearningRate 0.0184   Epoch: 11   Global Step: 47200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:25,016-Speed 3316.79 samples/sec   Loss 4.1054   LearningRate 0.0184   Epoch: 11   Global Step: 47210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:28,109-Speed 3311.30 samples/sec   Loss 3.9704   LearningRate 0.0184   Epoch: 11   Global Step: 47220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 17:16:31,185-Speed 3329.73 samples/sec   Loss 3.9960   LearningRate 0.0184   Epoch: 11   Global Step: 47230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:34,309-Speed 3278.98 samples/sec   Loss 4.0699   LearningRate 0.0184   Epoch: 11   Global Step: 47240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:37,400-Speed 3313.24 samples/sec   Loss 4.0618   LearningRate 0.0184   Epoch: 11   Global Step: 47250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:40,518-Speed 3285.21 samples/sec   Loss 4.1092   LearningRate 0.0184   Epoch: 11   Global Step: 47260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:43,612-Speed 3310.51 samples/sec   Loss 4.0969   LearningRate 0.0184   Epoch: 11   Global Step: 47270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:46,699-Speed 3318.01 samples/sec   Loss 4.0401   LearningRate 0.0183   Epoch: 11   Global Step: 47280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:49,788-Speed 3315.24 samples/sec   Loss 4.0812   LearningRate 0.0183   Epoch: 11   Global Step: 47290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:52,877-Speed 3315.63 samples/sec   Loss 4.0262   LearningRate 0.0183   Epoch: 11   Global Step: 47300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:55,966-Speed 3316.39 samples/sec   Loss 4.1349   LearningRate 0.0183   Epoch: 11   Global Step: 47310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:16:59,063-Speed 3307.00 samples/sec   Loss 4.0257   LearningRate 0.0183   Epoch: 11   Global Step: 47320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:17:02,142-Speed 3326.34 samples/sec   Loss 4.1246   LearningRate 0.0183   Epoch: 11   Global Step: 47330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:17:05,225-Speed 3322.49 samples/sec   Loss 4.0887   LearningRate 0.0183   Epoch: 11   Global Step: 47340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:17:08,307-Speed 3323.52 samples/sec   Loss 3.9324   LearningRate 0.0183   Epoch: 11   Global Step: 47350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:17:11,391-Speed 3320.50 samples/sec   Loss 4.0493   LearningRate 0.0183   Epoch: 11   Global Step: 47360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:17:14,483-Speed 3312.56 samples/sec   Loss 4.0593   LearningRate 0.0183   Epoch: 11   Global Step: 47370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:17:17,570-Speed 3317.73 samples/sec   Loss 4.1373   LearningRate 0.0182   Epoch: 11   Global Step: 47380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:17:20,656-Speed 3319.36 samples/sec   Loss 4.2095   LearningRate 0.0182   Epoch: 11   Global Step: 47390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:17:23,745-Speed 3315.50 samples/sec   Loss 3.9699   LearningRate 0.0182   Epoch: 11   Global Step: 47400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:17:26,841-Speed 3308.43 samples/sec   Loss 4.0501   LearningRate 0.0182   Epoch: 11   Global Step: 47410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:17:29,930-Speed 3315.78 samples/sec   Loss 3.9867   LearningRate 0.0182   Epoch: 11   Global Step: 47420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:17:33,013-Speed 3322.00 samples/sec   Loss 4.0951   LearningRate 0.0182   Epoch: 11   Global Step: 47430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:17:36,104-Speed 3313.44 samples/sec   Loss 3.9771   LearningRate 0.0182   Epoch: 11   Global Step: 47440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:17:39,184-Speed 3325.06 samples/sec   Loss 4.0063   LearningRate 0.0182   Epoch: 11   Global Step: 47450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:17:42,267-Speed 3322.98 samples/sec   Loss 4.2370   LearningRate 0.0182   Epoch: 11   Global Step: 47460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:17:45,352-Speed 3319.39 samples/sec   Loss 4.1374   LearningRate 0.0181   Epoch: 11   Global Step: 47470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:17:48,434-Speed 3323.69 samples/sec   Loss 4.0445   LearningRate 0.0181   Epoch: 11   Global Step: 47480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:17:51,516-Speed 3322.45 samples/sec   Loss 4.0900   LearningRate 0.0181   Epoch: 11   Global Step: 47490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:17:54,615-Speed 3305.30 samples/sec   Loss 4.0687   LearningRate 0.0181   Epoch: 11   Global Step: 47500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:17:57,697-Speed 3323.58 samples/sec   Loss 4.1441   LearningRate 0.0181   Epoch: 11   Global Step: 47510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:00,785-Speed 3317.64 samples/sec   Loss 3.9873   LearningRate 0.0181   Epoch: 11   Global Step: 47520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:03,875-Speed 3314.35 samples/sec   Loss 4.0833   LearningRate 0.0181   Epoch: 11   Global Step: 47530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:06,950-Speed 3330.62 samples/sec   Loss 4.0416   LearningRate 0.0181   Epoch: 11   Global Step: 47540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:10,043-Speed 3311.39 samples/sec   Loss 4.1136   LearningRate 0.0181   Epoch: 11   Global Step: 47550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:13,126-Speed 3322.25 samples/sec   Loss 4.0418   LearningRate 0.0181   Epoch: 11   Global Step: 47560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:16,219-Speed 3311.08 samples/sec   Loss 4.0611   LearningRate 0.0180   Epoch: 11   Global Step: 47570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:19,309-Speed 3314.17 samples/sec   Loss 4.0949   LearningRate 0.0180   Epoch: 11   Global Step: 47580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:22,445-Speed 3266.18 samples/sec   Loss 4.1182   LearningRate 0.0180   Epoch: 11   Global Step: 47590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:25,625-Speed 3221.67 samples/sec   Loss 4.2211   LearningRate 0.0180   Epoch: 11   Global Step: 47600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:28,714-Speed 3315.95 samples/sec   Loss 4.0908   LearningRate 0.0180   Epoch: 11   Global Step: 47610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:31,796-Speed 3322.75 samples/sec   Loss 4.1587   LearningRate 0.0180   Epoch: 11   Global Step: 47620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:34,886-Speed 3314.66 samples/sec   Loss 4.0468   LearningRate 0.0180   Epoch: 11   Global Step: 47630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:37,970-Speed 3321.57 samples/sec   Loss 4.1022   LearningRate 0.0180   Epoch: 11   Global Step: 47640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:41,064-Speed 3310.39 samples/sec   Loss 4.0812   LearningRate 0.0180   Epoch: 11   Global Step: 47650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:44,153-Speed 3315.47 samples/sec   Loss 4.1123   LearningRate 0.0180   Epoch: 11   Global Step: 47660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:47,237-Speed 3321.13 samples/sec   Loss 4.1190   LearningRate 0.0179   Epoch: 11   Global Step: 47670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:50,354-Speed 3285.30 samples/sec   Loss 4.1154   LearningRate 0.0179   Epoch: 11   Global Step: 47680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:53,469-Speed 3287.97 samples/sec   Loss 4.0719   LearningRate 0.0179   Epoch: 11   Global Step: 47690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:56,568-Speed 3305.71 samples/sec   Loss 4.0418   LearningRate 0.0179   Epoch: 11   Global Step: 47700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:18:59,777-Speed 3191.61 samples/sec   Loss 4.1447   LearningRate 0.0179   Epoch: 11   Global Step: 47710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:02,874-Speed 3306.50 samples/sec   Loss 4.1398   LearningRate 0.0179   Epoch: 11   Global Step: 47720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:05,972-Speed 3306.95 samples/sec   Loss 4.0759   LearningRate 0.0179   Epoch: 11   Global Step: 47730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:09,037-Speed 3341.49 samples/sec   Loss 4.0787   LearningRate 0.0179   Epoch: 11   Global Step: 47740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:12,126-Speed 3315.36 samples/sec   Loss 4.1810   LearningRate 0.0179   Epoch: 11   Global Step: 47750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:15,237-Speed 3292.40 samples/sec   Loss 4.0681   LearningRate 0.0178   Epoch: 11   Global Step: 47760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:18,329-Speed 3312.75 samples/sec   Loss 4.1976   LearningRate 0.0178   Epoch: 11   Global Step: 47770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:21,412-Speed 3321.78 samples/sec   Loss 4.0906   LearningRate 0.0178   Epoch: 11   Global Step: 47780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:24,500-Speed 3316.68 samples/sec   Loss 4.1740   LearningRate 0.0178   Epoch: 11   Global Step: 47790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:27,642-Speed 3259.94 samples/sec   Loss 3.9968   LearningRate 0.0178   Epoch: 11   Global Step: 47800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:30,732-Speed 3314.69 samples/sec   Loss 4.0970   LearningRate 0.0178   Epoch: 11   Global Step: 47810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:33,816-Speed 3321.11 samples/sec   Loss 4.1314   LearningRate 0.0178   Epoch: 11   Global Step: 47820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:36,904-Speed 3317.46 samples/sec   Loss 4.0633   LearningRate 0.0178   Epoch: 11   Global Step: 47830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:39,989-Speed 3319.84 samples/sec   Loss 4.0561   LearningRate 0.0178   Epoch: 11   Global Step: 47840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-26 17:19:43,058-Speed 3336.94 samples/sec   Loss 4.1158   LearningRate 0.0178   Epoch: 11   Global Step: 47850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:46,144-Speed 3318.81 samples/sec   Loss 4.1271   LearningRate 0.0177   Epoch: 11   Global Step: 47860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:49,251-Speed 3295.98 samples/sec   Loss 4.1318   LearningRate 0.0177   Epoch: 11   Global Step: 47870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:52,352-Speed 3303.38 samples/sec   Loss 4.0572   LearningRate 0.0177   Epoch: 11   Global Step: 47880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:55,462-Speed 3293.48 samples/sec   Loss 4.1448   LearningRate 0.0177   Epoch: 11   Global Step: 47890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:19:58,547-Speed 3320.18 samples/sec   Loss 4.1116   LearningRate 0.0177   Epoch: 11   Global Step: 47900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:20:01,632-Speed 3319.56 samples/sec   Loss 4.1342   LearningRate 0.0177   Epoch: 11   Global Step: 47910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:20:04,719-Speed 3318.67 samples/sec   Loss 4.1689   LearningRate 0.0177   Epoch: 11   Global Step: 47920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:20:07,789-Speed 3336.22 samples/sec   Loss 4.1373   LearningRate 0.0177   Epoch: 11   Global Step: 47930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:20:10,869-Speed 3325.46 samples/sec   Loss 4.1431   LearningRate 0.0177   Epoch: 11   Global Step: 47940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:20:13,972-Speed 3299.95 samples/sec   Loss 4.1723   LearningRate 0.0177   Epoch: 11   Global Step: 47950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:20:17,068-Speed 3308.58 samples/sec   Loss 4.0148   LearningRate 0.0176   Epoch: 11   Global Step: 47960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:20:20,154-Speed 3319.07 samples/sec   Loss 4.0638   LearningRate 0.0176   Epoch: 11   Global Step: 47970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:20:23,248-Speed 3310.46 samples/sec   Loss 4.1233   LearningRate 0.0176   Epoch: 11   Global Step: 47980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:20:26,338-Speed 3313.99 samples/sec   Loss 4.0945   LearningRate 0.0176   Epoch: 11   Global Step: 47990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:20:29,430-Speed 3312.95 samples/sec   Loss 4.1033   LearningRate 0.0176   Epoch: 11   Global Step: 48000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:21:12,852-[lfw][48000]XNorm: 22.833627
Training: 2022-04-26 17:21:12,853-[lfw][48000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-04-26 17:21:12,853-[lfw][48000]Accuracy-Highest: 0.99783
Training: 2022-04-26 17:22:03,633-[cfp_fp][48000]XNorm: 21.953586
Training: 2022-04-26 17:22:03,634-[cfp_fp][48000]Accuracy-Flip: 0.98700+-0.00440
Training: 2022-04-26 17:22:03,634-[cfp_fp][48000]Accuracy-Highest: 0.98843
Training: 2022-04-26 17:22:47,381-[agedb_30][48000]XNorm: 23.053739
Training: 2022-04-26 17:22:47,382-[agedb_30][48000]Accuracy-Flip: 0.97217+-0.00654
Training: 2022-04-26 17:22:47,382-[agedb_30][48000]Accuracy-Highest: 0.97550
Training: 2022-04-26 17:22:50,488-Speed 72.59 samples/sec   Loss 4.1832   LearningRate 0.0176   Epoch: 11   Global Step: 48010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:22:53,562-Speed 3331.35 samples/sec   Loss 4.1409   LearningRate 0.0176   Epoch: 11   Global Step: 48020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:22:56,644-Speed 3323.91 samples/sec   Loss 4.1271   LearningRate 0.0176   Epoch: 11   Global Step: 48030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:22:59,722-Speed 3326.61 samples/sec   Loss 4.1279   LearningRate 0.0176   Epoch: 11   Global Step: 48040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:23:02,830-Speed 3296.10 samples/sec   Loss 4.1571   LearningRate 0.0176   Epoch: 11   Global Step: 48050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:23:05,907-Speed 3328.28 samples/sec   Loss 4.1796   LearningRate 0.0175   Epoch: 11   Global Step: 48060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:23:08,990-Speed 3322.00 samples/sec   Loss 4.1293   LearningRate 0.0175   Epoch: 11   Global Step: 48070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:23:12,073-Speed 3322.56 samples/sec   Loss 4.1917   LearningRate 0.0175   Epoch: 11   Global Step: 48080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:23:15,135-Speed 3344.41 samples/sec   Loss 4.1326   LearningRate 0.0175   Epoch: 11   Global Step: 48090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:23:18,216-Speed 3324.03 samples/sec   Loss 4.1407   LearningRate 0.0175   Epoch: 11   Global Step: 48100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:23:21,310-Speed 3311.18 samples/sec   Loss 4.1806   LearningRate 0.0175   Epoch: 11   Global Step: 48110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:23:24,410-Speed 3304.14 samples/sec   Loss 4.1525   LearningRate 0.0175   Epoch: 11   Global Step: 48120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:23:27,516-Speed 3297.37 samples/sec   Loss 4.2644   LearningRate 0.0175   Epoch: 11   Global Step: 48130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:23:30,596-Speed 3325.24 samples/sec   Loss 4.2079   LearningRate 0.0175   Epoch: 11   Global Step: 48140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:23:33,676-Speed 3325.23 samples/sec   Loss 4.2881   LearningRate 0.0175   Epoch: 11   Global Step: 48150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:23:36,757-Speed 3324.27 samples/sec   Loss 4.1338   LearningRate 0.0174   Epoch: 11   Global Step: 48160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:23:39,837-Speed 3325.72 samples/sec   Loss 4.0877   LearningRate 0.0174   Epoch: 11   Global Step: 48170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:23:42,929-Speed 3312.79 samples/sec   Loss 4.1745   LearningRate 0.0174   Epoch: 11   Global Step: 48180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:23:46,026-Speed 3306.56 samples/sec   Loss 4.2007   LearningRate 0.0174   Epoch: 11   Global Step: 48190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:23:49,108-Speed 3323.33 samples/sec   Loss 4.1609   LearningRate 0.0174   Epoch: 11   Global Step: 48200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:23:52,200-Speed 3312.28 samples/sec   Loss 4.2010   LearningRate 0.0174   Epoch: 11   Global Step: 48210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:23:55,264-Speed 3343.65 samples/sec   Loss 4.0915   LearningRate 0.0174   Epoch: 11   Global Step: 48220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:23:58,345-Speed 3324.30 samples/sec   Loss 4.0907   LearningRate 0.0174   Epoch: 11   Global Step: 48230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:24:01,422-Speed 3328.97 samples/sec   Loss 4.1548   LearningRate 0.0174   Epoch: 11   Global Step: 48240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:24:04,502-Speed 3324.81 samples/sec   Loss 4.2291   LearningRate 0.0174   Epoch: 11   Global Step: 48250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:24:07,581-Speed 3326.03 samples/sec   Loss 4.1494   LearningRate 0.0173   Epoch: 11   Global Step: 48260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:24:10,666-Speed 3319.89 samples/sec   Loss 4.1963   LearningRate 0.0173   Epoch: 11   Global Step: 48270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:24:13,750-Speed 3321.54 samples/sec   Loss 4.1749   LearningRate 0.0173   Epoch: 11   Global Step: 48280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:24:16,835-Speed 3320.13 samples/sec   Loss 4.1492   LearningRate 0.0173   Epoch: 11   Global Step: 48290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:24:19,917-Speed 3323.04 samples/sec   Loss 4.1555   LearningRate 0.0173   Epoch: 11   Global Step: 48300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:24:22,999-Speed 3323.98 samples/sec   Loss 4.1752   LearningRate 0.0173   Epoch: 11   Global Step: 48310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:24:26,083-Speed 3320.44 samples/sec   Loss 4.1952   LearningRate 0.0173   Epoch: 11   Global Step: 48320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:24:29,162-Speed 3327.13 samples/sec   Loss 4.2717   LearningRate 0.0173   Epoch: 11   Global Step: 48330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:24:32,239-Speed 3328.14 samples/sec   Loss 4.1255   LearningRate 0.0173   Epoch: 11   Global Step: 48340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:24:35,314-Speed 3331.08 samples/sec   Loss 4.1769   LearningRate 0.0173   Epoch: 11   Global Step: 48350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:24:38,419-Speed 3298.00 samples/sec   Loss 4.0626   LearningRate 0.0172   Epoch: 11   Global Step: 48360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:24:41,499-Speed 3326.00 samples/sec   Loss 4.0709   LearningRate 0.0172   Epoch: 11   Global Step: 48370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:24:44,576-Speed 3328.06 samples/sec   Loss 4.1942   LearningRate 0.0172   Epoch: 11   Global Step: 48380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:24:47,664-Speed 3316.55 samples/sec   Loss 4.1297   LearningRate 0.0172   Epoch: 11   Global Step: 48390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:24:50,745-Speed 3324.97 samples/sec   Loss 4.1814   LearningRate 0.0172   Epoch: 11   Global Step: 48400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:24:53,832-Speed 3319.13 samples/sec   Loss 4.1209   LearningRate 0.0172   Epoch: 11   Global Step: 48410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:24:56,895-Speed 3344.32 samples/sec   Loss 4.1987   LearningRate 0.0172   Epoch: 11   Global Step: 48420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:24:59,971-Speed 3329.15 samples/sec   Loss 4.2053   LearningRate 0.0172   Epoch: 11   Global Step: 48430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:25:03,028-Speed 3350.25 samples/sec   Loss 4.3090   LearningRate 0.0172   Epoch: 11   Global Step: 48440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:06,109-Speed 3324.23 samples/sec   Loss 4.1997   LearningRate 0.0172   Epoch: 11   Global Step: 48450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:09,187-Speed 3328.30 samples/sec   Loss 4.1909   LearningRate 0.0171   Epoch: 11   Global Step: 48460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:12,267-Speed 3325.53 samples/sec   Loss 4.0777   LearningRate 0.0171   Epoch: 11   Global Step: 48470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:15,353-Speed 3318.67 samples/sec   Loss 4.1290   LearningRate 0.0171   Epoch: 11   Global Step: 48480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:18,434-Speed 3323.82 samples/sec   Loss 4.1869   LearningRate 0.0171   Epoch: 11   Global Step: 48490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:21,528-Speed 3310.16 samples/sec   Loss 4.1435   LearningRate 0.0171   Epoch: 11   Global Step: 48500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:24,615-Speed 3318.27 samples/sec   Loss 4.1648   LearningRate 0.0171   Epoch: 11   Global Step: 48510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:27,763-Speed 3253.94 samples/sec   Loss 4.1406   LearningRate 0.0171   Epoch: 11   Global Step: 48520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:30,890-Speed 3275.75 samples/sec   Loss 4.2221   LearningRate 0.0171   Epoch: 11   Global Step: 48530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:33,964-Speed 3331.00 samples/sec   Loss 4.2353   LearningRate 0.0171   Epoch: 11   Global Step: 48540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:25:37,042-Speed 3327.33 samples/sec   Loss 4.2465   LearningRate 0.0171   Epoch: 11   Global Step: 48550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:25:40,121-Speed 3326.86 samples/sec   Loss 4.2699   LearningRate 0.0170   Epoch: 11   Global Step: 48560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:25:43,187-Speed 3341.06 samples/sec   Loss 4.3104   LearningRate 0.0170   Epoch: 11   Global Step: 48570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:46,271-Speed 3320.22 samples/sec   Loss 4.1818   LearningRate 0.0170   Epoch: 11   Global Step: 48580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:49,370-Speed 3305.56 samples/sec   Loss 4.1888   LearningRate 0.0170   Epoch: 11   Global Step: 48590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:52,453-Speed 3322.21 samples/sec   Loss 4.1180   LearningRate 0.0170   Epoch: 11   Global Step: 48600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:55,534-Speed 3324.82 samples/sec   Loss 4.1956   LearningRate 0.0170   Epoch: 11   Global Step: 48610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:25:58,618-Speed 3321.01 samples/sec   Loss 4.1783   LearningRate 0.0170   Epoch: 11   Global Step: 48620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:01,698-Speed 3325.43 samples/sec   Loss 4.1966   LearningRate 0.0170   Epoch: 11   Global Step: 48630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:04,777-Speed 3326.55 samples/sec   Loss 4.2058   LearningRate 0.0170   Epoch: 11   Global Step: 48640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:07,855-Speed 3327.54 samples/sec   Loss 4.1526   LearningRate 0.0170   Epoch: 11   Global Step: 48650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:10,942-Speed 3317.00 samples/sec   Loss 4.2389   LearningRate 0.0169   Epoch: 11   Global Step: 48660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:14,041-Speed 3305.32 samples/sec   Loss 4.2343   LearningRate 0.0169   Epoch: 11   Global Step: 48670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:26:17,163-Speed 3280.27 samples/sec   Loss 4.3255   LearningRate 0.0169   Epoch: 11   Global Step: 48680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:26:20,228-Speed 3341.71 samples/sec   Loss 4.2412   LearningRate 0.0169   Epoch: 11   Global Step: 48690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:23,307-Speed 3327.14 samples/sec   Loss 4.1453   LearningRate 0.0169   Epoch: 11   Global Step: 48700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:26,390-Speed 3322.67 samples/sec   Loss 4.1411   LearningRate 0.0169   Epoch: 11   Global Step: 48710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:29,473-Speed 3321.07 samples/sec   Loss 4.1985   LearningRate 0.0169   Epoch: 11   Global Step: 48720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:32,557-Speed 3321.66 samples/sec   Loss 4.2204   LearningRate 0.0169   Epoch: 11   Global Step: 48730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:35,667-Speed 3293.66 samples/sec   Loss 4.0705   LearningRate 0.0169   Epoch: 11   Global Step: 48740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:38,756-Speed 3315.03 samples/sec   Loss 4.2389   LearningRate 0.0169   Epoch: 11   Global Step: 48750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:41,855-Speed 3305.20 samples/sec   Loss 4.1236   LearningRate 0.0168   Epoch: 11   Global Step: 48760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:44,935-Speed 3325.82 samples/sec   Loss 4.1760   LearningRate 0.0168   Epoch: 11   Global Step: 48770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:48,049-Speed 3288.67 samples/sec   Loss 4.3326   LearningRate 0.0168   Epoch: 11   Global Step: 48780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:26:51,141-Speed 3312.70 samples/sec   Loss 4.1084   LearningRate 0.0168   Epoch: 11   Global Step: 48790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:26:54,231-Speed 3314.51 samples/sec   Loss 4.1547   LearningRate 0.0168   Epoch: 11   Global Step: 48800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:26:57,327-Speed 3308.53 samples/sec   Loss 4.2298   LearningRate 0.0168   Epoch: 11   Global Step: 48810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:27:00,479-Speed 3249.11 samples/sec   Loss 4.1985   LearningRate 0.0168   Epoch: 11   Global Step: 48820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:27:03,566-Speed 3318.41 samples/sec   Loss 4.1328   LearningRate 0.0168   Epoch: 11   Global Step: 48830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:27:06,671-Speed 3298.46 samples/sec   Loss 4.1214   LearningRate 0.0168   Epoch: 11   Global Step: 48840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:27:09,754-Speed 3322.06 samples/sec   Loss 4.0986   LearningRate 0.0168   Epoch: 11   Global Step: 48850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:27:12,838-Speed 3320.64 samples/sec   Loss 4.2594   LearningRate 0.0167   Epoch: 11   Global Step: 48860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:27:15,929-Speed 3314.39 samples/sec   Loss 4.2123   LearningRate 0.0167   Epoch: 11   Global Step: 48870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:27:19,011-Speed 3322.61 samples/sec   Loss 4.1390   LearningRate 0.0167   Epoch: 11   Global Step: 48880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:27:22,101-Speed 3314.75 samples/sec   Loss 4.1888   LearningRate 0.0167   Epoch: 11   Global Step: 48890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:27:25,271-Speed 3230.91 samples/sec   Loss 4.1404   LearningRate 0.0167   Epoch: 11   Global Step: 48900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:27:28,416-Speed 3256.69 samples/sec   Loss 4.1980   LearningRate 0.0167   Epoch: 11   Global Step: 48910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:27:31,508-Speed 3313.36 samples/sec   Loss 4.1589   LearningRate 0.0167   Epoch: 11   Global Step: 48920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:27:34,588-Speed 3325.02 samples/sec   Loss 4.2502   LearningRate 0.0167   Epoch: 11   Global Step: 48930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:27:37,670-Speed 3322.96 samples/sec   Loss 4.2097   LearningRate 0.0167   Epoch: 11   Global Step: 48940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:27:40,757-Speed 3318.02 samples/sec   Loss 4.2365   LearningRate 0.0167   Epoch: 11   Global Step: 48950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:27:43,848-Speed 3313.21 samples/sec   Loss 4.1258   LearningRate 0.0166   Epoch: 11   Global Step: 48960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:27:46,937-Speed 3315.40 samples/sec   Loss 4.1334   LearningRate 0.0166   Epoch: 11   Global Step: 48970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:27:50,042-Speed 3299.33 samples/sec   Loss 4.1892   LearningRate 0.0166   Epoch: 11   Global Step: 48980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:27:53,213-Speed 3229.31 samples/sec   Loss 4.2474   LearningRate 0.0166   Epoch: 11   Global Step: 48990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:27:56,302-Speed 3315.97 samples/sec   Loss 4.1980   LearningRate 0.0166   Epoch: 11   Global Step: 49000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:27:59,385-Speed 3323.10 samples/sec   Loss 4.1984   LearningRate 0.0166   Epoch: 11   Global Step: 49010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:28:02,473-Speed 3315.98 samples/sec   Loss 4.2457   LearningRate 0.0166   Epoch: 11   Global Step: 49020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:28:05,559-Speed 3319.92 samples/sec   Loss 4.2111   LearningRate 0.0166   Epoch: 11   Global Step: 49030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:28:08,644-Speed 3319.86 samples/sec   Loss 4.1895   LearningRate 0.0166   Epoch: 11   Global Step: 49040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:28:11,739-Speed 3308.29 samples/sec   Loss 4.1156   LearningRate 0.0166   Epoch: 11   Global Step: 49050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:28:14,847-Speed 3296.24 samples/sec   Loss 4.2490   LearningRate 0.0165   Epoch: 11   Global Step: 49060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:28:17,927-Speed 3325.12 samples/sec   Loss 4.1722   LearningRate 0.0165   Epoch: 11   Global Step: 49070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:28:21,030-Speed 3300.53 samples/sec   Loss 4.1366   LearningRate 0.0165   Epoch: 11   Global Step: 49080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:28:24,124-Speed 3310.72 samples/sec   Loss 4.1367   LearningRate 0.0165   Epoch: 11   Global Step: 49090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:28:27,213-Speed 3315.85 samples/sec   Loss 4.1190   LearningRate 0.0165   Epoch: 11   Global Step: 49100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:28:30,293-Speed 3325.73 samples/sec   Loss 4.1745   LearningRate 0.0165   Epoch: 11   Global Step: 49110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:28:33,383-Speed 3314.27 samples/sec   Loss 4.2232   LearningRate 0.0165   Epoch: 11   Global Step: 49120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:28:36,465-Speed 3323.06 samples/sec   Loss 4.2565   LearningRate 0.0165   Epoch: 11   Global Step: 49130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:28:39,563-Speed 3306.69 samples/sec   Loss 4.2120   LearningRate 0.0165   Epoch: 11   Global Step: 49140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:28:42,656-Speed 3310.82 samples/sec   Loss 4.2459   LearningRate 0.0165   Epoch: 11   Global Step: 49150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:28:45,742-Speed 3319.50 samples/sec   Loss 4.2603   LearningRate 0.0164   Epoch: 11   Global Step: 49160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:28:48,860-Speed 3284.72 samples/sec   Loss 4.1319   LearningRate 0.0164   Epoch: 11   Global Step: 49170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:28:51,993-Speed 3268.78 samples/sec   Loss 4.2085   LearningRate 0.0164   Epoch: 11   Global Step: 49180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:28:55,091-Speed 3306.25 samples/sec   Loss 4.2510   LearningRate 0.0164   Epoch: 11   Global Step: 49190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:28:58,177-Speed 3319.02 samples/sec   Loss 4.2282   LearningRate 0.0164   Epoch: 11   Global Step: 49200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:29:01,264-Speed 3318.06 samples/sec   Loss 4.1924   LearningRate 0.0164   Epoch: 11   Global Step: 49210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:29:04,458-Speed 3206.27 samples/sec   Loss 4.2134   LearningRate 0.0164   Epoch: 11   Global Step: 49220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:29:07,560-Speed 3302.68 samples/sec   Loss 4.1874   LearningRate 0.0164   Epoch: 11   Global Step: 49230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:29:10,647-Speed 3317.85 samples/sec   Loss 4.2281   LearningRate 0.0164   Epoch: 11   Global Step: 49240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:29:13,732-Speed 3319.31 samples/sec   Loss 4.2733   LearningRate 0.0164   Epoch: 11   Global Step: 49250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:29:16,828-Speed 3307.86 samples/sec   Loss 4.1436   LearningRate 0.0164   Epoch: 11   Global Step: 49260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:29:19,980-Speed 3249.96 samples/sec   Loss 4.1522   LearningRate 0.0163   Epoch: 11   Global Step: 49270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:29:23,079-Speed 3305.17 samples/sec   Loss 4.2074   LearningRate 0.0163   Epoch: 11   Global Step: 49280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:29:26,176-Speed 3307.02 samples/sec   Loss 4.1033   LearningRate 0.0163   Epoch: 11   Global Step: 49290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:29:29,263-Speed 3318.65 samples/sec   Loss 4.1671   LearningRate 0.0163   Epoch: 11   Global Step: 49300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:29:32,360-Speed 3306.66 samples/sec   Loss 4.2237   LearningRate 0.0163   Epoch: 11   Global Step: 49310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:29:35,447-Speed 3317.29 samples/sec   Loss 4.1938   LearningRate 0.0163   Epoch: 11   Global Step: 49320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:29:38,540-Speed 3312.15 samples/sec   Loss 4.2555   LearningRate 0.0163   Epoch: 11   Global Step: 49330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:29:41,606-Speed 3339.72 samples/sec   Loss 4.2188   LearningRate 0.0163   Epoch: 11   Global Step: 49340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:29:44,691-Speed 3320.26 samples/sec   Loss 4.2329   LearningRate 0.0163   Epoch: 11   Global Step: 49350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:29:47,805-Speed 3288.90 samples/sec   Loss 4.3086   LearningRate 0.0163   Epoch: 11   Global Step: 49360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:29:50,911-Speed 3297.97 samples/sec   Loss 4.1393   LearningRate 0.0162   Epoch: 11   Global Step: 49370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:29:53,996-Speed 3319.17 samples/sec   Loss 4.1746   LearningRate 0.0162   Epoch: 11   Global Step: 49380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:29:57,083-Speed 3318.69 samples/sec   Loss 4.2276   LearningRate 0.0162   Epoch: 11   Global Step: 49390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:30:00,187-Speed 3299.52 samples/sec   Loss 4.2522   LearningRate 0.0162   Epoch: 11   Global Step: 49400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:30:03,275-Speed 3316.95 samples/sec   Loss 4.2007   LearningRate 0.0162   Epoch: 11   Global Step: 49410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:30:06,357-Speed 3323.43 samples/sec   Loss 4.1603   LearningRate 0.0162   Epoch: 11   Global Step: 49420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:30:09,441-Speed 3320.58 samples/sec   Loss 4.1578   LearningRate 0.0162   Epoch: 11   Global Step: 49430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:30:12,540-Speed 3304.93 samples/sec   Loss 4.1673   LearningRate 0.0162   Epoch: 11   Global Step: 49440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:30:15,647-Speed 3296.43 samples/sec   Loss 4.2389   LearningRate 0.0162   Epoch: 11   Global Step: 49450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:30:18,738-Speed 3313.79 samples/sec   Loss 4.2339   LearningRate 0.0162   Epoch: 11   Global Step: 49460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:30:21,830-Speed 3312.18 samples/sec   Loss 4.1709   LearningRate 0.0161   Epoch: 11   Global Step: 49470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:30:24,926-Speed 3308.56 samples/sec   Loss 4.1764   LearningRate 0.0161   Epoch: 11   Global Step: 49480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:30:28,019-Speed 3311.61 samples/sec   Loss 4.2434   LearningRate 0.0161   Epoch: 11   Global Step: 49490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:30:31,106-Speed 3318.46 samples/sec   Loss 4.1678   LearningRate 0.0161   Epoch: 11   Global Step: 49500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:30:34,205-Speed 3304.43 samples/sec   Loss 4.1672   LearningRate 0.0161   Epoch: 11   Global Step: 49510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:30:37,304-Speed 3305.05 samples/sec   Loss 4.2149   LearningRate 0.0161   Epoch: 11   Global Step: 49520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:30:40,393-Speed 3315.93 samples/sec   Loss 4.0650   LearningRate 0.0161   Epoch: 11   Global Step: 49530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:30:43,466-Speed 3333.08 samples/sec   Loss 4.1857   LearningRate 0.0161   Epoch: 11   Global Step: 49540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:30:46,566-Speed 3303.47 samples/sec   Loss 4.1749   LearningRate 0.0161   Epoch: 11   Global Step: 49550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:30:49,674-Speed 3295.35 samples/sec   Loss 4.2342   LearningRate 0.0161   Epoch: 11   Global Step: 49560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:30:52,781-Speed 3296.12 samples/sec   Loss 4.1545   LearningRate 0.0160   Epoch: 11   Global Step: 49570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:30:55,876-Speed 3309.81 samples/sec   Loss 4.1520   LearningRate 0.0160   Epoch: 11   Global Step: 49580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:30:58,967-Speed 3313.52 samples/sec   Loss 4.2916   LearningRate 0.0160   Epoch: 11   Global Step: 49590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:31:02,061-Speed 3310.35 samples/sec   Loss 4.1461   LearningRate 0.0160   Epoch: 11   Global Step: 49600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:31:05,213-Speed 3249.66 samples/sec   Loss 4.1974   LearningRate 0.0160   Epoch: 11   Global Step: 49610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:31:08,288-Speed 3330.37 samples/sec   Loss 4.1069   LearningRate 0.0160   Epoch: 11   Global Step: 49620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:31:20,541-Speed 835.81 samples/sec   Loss 2.7430   LearningRate 0.0160   Epoch: 12   Global Step: 49630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:31:23,678-Speed 3265.20 samples/sec   Loss 2.8119   LearningRate 0.0160   Epoch: 12   Global Step: 49640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:31:26,881-Speed 3197.59 samples/sec   Loss 2.8514   LearningRate 0.0160   Epoch: 12   Global Step: 49650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:31:30,110-Speed 3172.16 samples/sec   Loss 2.8308   LearningRate 0.0160   Epoch: 12   Global Step: 49660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-26 17:31:33,199-Speed 3315.36 samples/sec   Loss 2.8563   LearningRate 0.0160   Epoch: 12   Global Step: 49670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:31:36,297-Speed 3306.50 samples/sec   Loss 2.7713   LearningRate 0.0159   Epoch: 12   Global Step: 49680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:31:39,394-Speed 3306.67 samples/sec   Loss 2.8357   LearningRate 0.0159   Epoch: 12   Global Step: 49690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:31:42,485-Speed 3313.42 samples/sec   Loss 2.8668   LearningRate 0.0159   Epoch: 12   Global Step: 49700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:31:45,583-Speed 3306.51 samples/sec   Loss 2.9120   LearningRate 0.0159   Epoch: 12   Global Step: 49710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:31:48,680-Speed 3307.52 samples/sec   Loss 2.7874   LearningRate 0.0159   Epoch: 12   Global Step: 49720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:31:51,777-Speed 3306.22 samples/sec   Loss 2.8210   LearningRate 0.0159   Epoch: 12   Global Step: 49730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-26 17:31:54,883-Speed 3298.19 samples/sec   Loss 2.8968   LearningRate 0.0159   Epoch: 12   Global Step: 49740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:31:57,976-Speed 3310.54 samples/sec   Loss 2.9558   LearningRate 0.0159   Epoch: 12   Global Step: 49750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:32:01,076-Speed 3304.48 samples/sec   Loss 2.9003   LearningRate 0.0159   Epoch: 12   Global Step: 49760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:32:04,175-Speed 3304.66 samples/sec   Loss 2.8100   LearningRate 0.0159   Epoch: 12   Global Step: 49770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-26 17:32:07,257-Speed 3324.03 samples/sec   Loss 2.9603   LearningRate 0.0158   Epoch: 12   Global Step: 49780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:32:10,351-Speed 3310.28 samples/sec   Loss 2.8753   LearningRate 0.0158   Epoch: 12   Global Step: 49790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:32:13,453-Speed 3302.12 samples/sec   Loss 2.8977   LearningRate 0.0158   Epoch: 12   Global Step: 49800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:32:16,552-Speed 3304.49 samples/sec   Loss 2.9056   LearningRate 0.0158   Epoch: 12   Global Step: 49810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:32:19,667-Speed 3288.05 samples/sec   Loss 2.9099   LearningRate 0.0158   Epoch: 12   Global Step: 49820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:32:22,798-Speed 3271.28 samples/sec   Loss 2.8376   LearningRate 0.0158   Epoch: 12   Global Step: 49830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:32:25,908-Speed 3293.72 samples/sec   Loss 2.8964   LearningRate 0.0158   Epoch: 12   Global Step: 49840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:32:29,021-Speed 3290.13 samples/sec   Loss 2.9250   LearningRate 0.0158   Epoch: 12   Global Step: 49850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:32:32,098-Speed 3328.14 samples/sec   Loss 2.8893   LearningRate 0.0158   Epoch: 12   Global Step: 49860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:32:35,196-Speed 3306.18 samples/sec   Loss 2.8812   LearningRate 0.0158   Epoch: 12   Global Step: 49870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:32:38,317-Speed 3282.15 samples/sec   Loss 2.9603   LearningRate 0.0157   Epoch: 12   Global Step: 49880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:32:41,419-Speed 3301.47 samples/sec   Loss 2.8861   LearningRate 0.0157   Epoch: 12   Global Step: 49890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:32:44,522-Speed 3301.13 samples/sec   Loss 2.9464   LearningRate 0.0157   Epoch: 12   Global Step: 49900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:32:47,620-Speed 3306.37 samples/sec   Loss 2.9229   LearningRate 0.0157   Epoch: 12   Global Step: 49910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:32:50,734-Speed 3288.32 samples/sec   Loss 2.9536   LearningRate 0.0157   Epoch: 12   Global Step: 49920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:32:53,842-Speed 3296.01 samples/sec   Loss 2.9080   LearningRate 0.0157   Epoch: 12   Global Step: 49930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:32:56,937-Speed 3309.04 samples/sec   Loss 3.0133   LearningRate 0.0157   Epoch: 12   Global Step: 49940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:33:00,024-Speed 3318.53 samples/sec   Loss 3.0027   LearningRate 0.0157   Epoch: 12   Global Step: 49950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:33:03,227-Speed 3197.64 samples/sec   Loss 2.9638   LearningRate 0.0157   Epoch: 12   Global Step: 49960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:33:06,315-Speed 3317.40 samples/sec   Loss 3.0330   LearningRate 0.0157   Epoch: 12   Global Step: 49970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:33:09,405-Speed 3314.15 samples/sec   Loss 2.9597   LearningRate 0.0157   Epoch: 12   Global Step: 49980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:33:12,494-Speed 3316.01 samples/sec   Loss 2.9402   LearningRate 0.0156   Epoch: 12   Global Step: 49990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:33:15,582-Speed 3315.78 samples/sec   Loss 2.9484   LearningRate 0.0156   Epoch: 12   Global Step: 50000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:33:59,205-[lfw][50000]XNorm: 22.521160
Training: 2022-04-26 17:33:59,206-[lfw][50000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-26 17:33:59,206-[lfw][50000]Accuracy-Highest: 0.99783
Training: 2022-04-26 17:34:49,840-[cfp_fp][50000]XNorm: 21.702601
Training: 2022-04-26 17:34:49,841-[cfp_fp][50000]Accuracy-Flip: 0.98900+-0.00616
Training: 2022-04-26 17:34:49,841-[cfp_fp][50000]Accuracy-Highest: 0.98900
Training: 2022-04-26 17:35:33,561-[agedb_30][50000]XNorm: 22.732584
Training: 2022-04-26 17:35:33,562-[agedb_30][50000]Accuracy-Flip: 0.97117+-0.00592
Training: 2022-04-26 17:35:33,562-[agedb_30][50000]Accuracy-Highest: 0.97550
Training: 2022-04-26 17:35:36,642-Speed 72.59 samples/sec   Loss 3.0598   LearningRate 0.0156   Epoch: 12   Global Step: 50010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:35:39,719-Speed 3328.99 samples/sec   Loss 2.9895   LearningRate 0.0156   Epoch: 12   Global Step: 50020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:35:42,811-Speed 3312.61 samples/sec   Loss 2.9518   LearningRate 0.0156   Epoch: 12   Global Step: 50030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:35:45,894-Speed 3321.75 samples/sec   Loss 3.0639   LearningRate 0.0156   Epoch: 12   Global Step: 50040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:35:49,178-Speed 3118.82 samples/sec   Loss 3.0285   LearningRate 0.0156   Epoch: 12   Global Step: 50050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:35:52,331-Speed 3248.46 samples/sec   Loss 3.0274   LearningRate 0.0156   Epoch: 12   Global Step: 50060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:35:55,420-Speed 3315.84 samples/sec   Loss 3.0381   LearningRate 0.0156   Epoch: 12   Global Step: 50070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:35:58,511-Speed 3313.56 samples/sec   Loss 3.0609   LearningRate 0.0156   Epoch: 12   Global Step: 50080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:01,595-Speed 3320.92 samples/sec   Loss 3.0562   LearningRate 0.0155   Epoch: 12   Global Step: 50090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:04,684-Speed 3316.48 samples/sec   Loss 3.0188   LearningRate 0.0155   Epoch: 12   Global Step: 50100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:07,767-Speed 3321.68 samples/sec   Loss 3.0438   LearningRate 0.0155   Epoch: 12   Global Step: 50110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:10,864-Speed 3307.19 samples/sec   Loss 3.0440   LearningRate 0.0155   Epoch: 12   Global Step: 50120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:13,945-Speed 3324.66 samples/sec   Loss 3.0543   LearningRate 0.0155   Epoch: 12   Global Step: 50130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:17,028-Speed 3321.75 samples/sec   Loss 3.0479   LearningRate 0.0155   Epoch: 12   Global Step: 50140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:20,109-Speed 3324.40 samples/sec   Loss 3.0994   LearningRate 0.0155   Epoch: 12   Global Step: 50150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:23,192-Speed 3321.95 samples/sec   Loss 3.0020   LearningRate 0.0155   Epoch: 12   Global Step: 50160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-26 17:36:26,276-Speed 3320.69 samples/sec   Loss 3.1680   LearningRate 0.0155   Epoch: 12   Global Step: 50170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:29,369-Speed 3311.94 samples/sec   Loss 2.9512   LearningRate 0.0155   Epoch: 12   Global Step: 50180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:32,449-Speed 3325.57 samples/sec   Loss 3.1291   LearningRate 0.0155   Epoch: 12   Global Step: 50190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:35,577-Speed 3274.28 samples/sec   Loss 3.1290   LearningRate 0.0154   Epoch: 12   Global Step: 50200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:38,657-Speed 3325.37 samples/sec   Loss 3.1598   LearningRate 0.0154   Epoch: 12   Global Step: 50210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:41,745-Speed 3316.72 samples/sec   Loss 3.1142   LearningRate 0.0154   Epoch: 12   Global Step: 50220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:44,826-Speed 3324.67 samples/sec   Loss 3.0821   LearningRate 0.0154   Epoch: 12   Global Step: 50230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:47,905-Speed 3327.13 samples/sec   Loss 3.1454   LearningRate 0.0154   Epoch: 12   Global Step: 50240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:50,994-Speed 3315.27 samples/sec   Loss 3.1099   LearningRate 0.0154   Epoch: 12   Global Step: 50250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:54,084-Speed 3314.18 samples/sec   Loss 2.9991   LearningRate 0.0154   Epoch: 12   Global Step: 50260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:36:57,164-Speed 3325.50 samples/sec   Loss 3.2000   LearningRate 0.0154   Epoch: 12   Global Step: 50270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:37:00,254-Speed 3315.20 samples/sec   Loss 3.1659   LearningRate 0.0154   Epoch: 12   Global Step: 50280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:37:03,402-Speed 3253.61 samples/sec   Loss 3.0445   LearningRate 0.0154   Epoch: 12   Global Step: 50290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:37:06,500-Speed 3306.65 samples/sec   Loss 3.0877   LearningRate 0.0153   Epoch: 12   Global Step: 50300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:37:09,583-Speed 3321.30 samples/sec   Loss 3.1437   LearningRate 0.0153   Epoch: 12   Global Step: 50310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:37:12,668-Speed 3320.47 samples/sec   Loss 3.0777   LearningRate 0.0153   Epoch: 12   Global Step: 50320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:37:15,749-Speed 3324.49 samples/sec   Loss 3.1957   LearningRate 0.0153   Epoch: 12   Global Step: 50330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:37:18,833-Speed 3320.82 samples/sec   Loss 3.0767   LearningRate 0.0153   Epoch: 12   Global Step: 50340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:37:21,919-Speed 3319.48 samples/sec   Loss 3.0740   LearningRate 0.0153   Epoch: 12   Global Step: 50350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:37:25,099-Speed 3220.32 samples/sec   Loss 3.2298   LearningRate 0.0153   Epoch: 12   Global Step: 50360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:37:28,250-Speed 3250.64 samples/sec   Loss 3.1132   LearningRate 0.0153   Epoch: 12   Global Step: 50370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:37:31,332-Speed 3323.63 samples/sec   Loss 3.1449   LearningRate 0.0153   Epoch: 12   Global Step: 50380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:37:34,426-Speed 3310.33 samples/sec   Loss 3.1118   LearningRate 0.0153   Epoch: 12   Global Step: 50390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:37:37,505-Speed 3326.87 samples/sec   Loss 3.2089   LearningRate 0.0153   Epoch: 12   Global Step: 50400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:37:40,613-Speed 3295.58 samples/sec   Loss 3.1495   LearningRate 0.0152   Epoch: 12   Global Step: 50410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:37:43,692-Speed 3325.76 samples/sec   Loss 3.1208   LearningRate 0.0152   Epoch: 12   Global Step: 50420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:37:46,786-Speed 3310.93 samples/sec   Loss 3.1597   LearningRate 0.0152   Epoch: 12   Global Step: 50430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:37:49,872-Speed 3319.06 samples/sec   Loss 3.1985   LearningRate 0.0152   Epoch: 12   Global Step: 50440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:37:52,958-Speed 3318.14 samples/sec   Loss 3.2180   LearningRate 0.0152   Epoch: 12   Global Step: 50450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:37:56,047-Speed 3315.54 samples/sec   Loss 3.2368   LearningRate 0.0152   Epoch: 12   Global Step: 50460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:37:59,116-Speed 3341.80 samples/sec   Loss 3.2120   LearningRate 0.0152   Epoch: 12   Global Step: 50470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:02,207-Speed 3313.74 samples/sec   Loss 3.2622   LearningRate 0.0152   Epoch: 12   Global Step: 50480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:05,322-Speed 3287.73 samples/sec   Loss 3.2486   LearningRate 0.0152   Epoch: 12   Global Step: 50490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:08,424-Speed 3301.96 samples/sec   Loss 3.2442   LearningRate 0.0152   Epoch: 12   Global Step: 50500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:11,504-Speed 3325.16 samples/sec   Loss 3.1909   LearningRate 0.0152   Epoch: 12   Global Step: 50510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:14,590-Speed 3319.60 samples/sec   Loss 3.1855   LearningRate 0.0151   Epoch: 12   Global Step: 50520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:17,675-Speed 3320.15 samples/sec   Loss 3.1839   LearningRate 0.0151   Epoch: 12   Global Step: 50530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:20,759-Speed 3320.56 samples/sec   Loss 3.2121   LearningRate 0.0151   Epoch: 12   Global Step: 50540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:23,851-Speed 3311.89 samples/sec   Loss 3.1751   LearningRate 0.0151   Epoch: 12   Global Step: 50550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:26,938-Speed 3318.18 samples/sec   Loss 3.2736   LearningRate 0.0151   Epoch: 12   Global Step: 50560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:30,005-Speed 3340.36 samples/sec   Loss 3.2770   LearningRate 0.0151   Epoch: 12   Global Step: 50570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:33,084-Speed 3326.67 samples/sec   Loss 3.3016   LearningRate 0.0151   Epoch: 12   Global Step: 50580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:36,167-Speed 3321.94 samples/sec   Loss 3.2251   LearningRate 0.0151   Epoch: 12   Global Step: 50590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:39,253-Speed 3318.92 samples/sec   Loss 3.2734   LearningRate 0.0151   Epoch: 12   Global Step: 50600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:42,345-Speed 3312.04 samples/sec   Loss 3.2542   LearningRate 0.0151   Epoch: 12   Global Step: 50610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:45,423-Speed 3327.95 samples/sec   Loss 3.2245   LearningRate 0.0150   Epoch: 12   Global Step: 50620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:48,506-Speed 3321.69 samples/sec   Loss 3.2523   LearningRate 0.0150   Epoch: 12   Global Step: 50630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:51,609-Speed 3300.48 samples/sec   Loss 3.1922   LearningRate 0.0150   Epoch: 12   Global Step: 50640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:54,696-Speed 3317.76 samples/sec   Loss 3.2948   LearningRate 0.0150   Epoch: 12   Global Step: 50650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:38:57,779-Speed 3322.62 samples/sec   Loss 3.3080   LearningRate 0.0150   Epoch: 12   Global Step: 50660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:00,881-Speed 3302.53 samples/sec   Loss 3.2720   LearningRate 0.0150   Epoch: 12   Global Step: 50670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:03,966-Speed 3320.16 samples/sec   Loss 3.2356   LearningRate 0.0150   Epoch: 12   Global Step: 50680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:07,048-Speed 3322.30 samples/sec   Loss 3.3298   LearningRate 0.0150   Epoch: 12   Global Step: 50690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:10,133-Speed 3321.11 samples/sec   Loss 3.3772   LearningRate 0.0150   Epoch: 12   Global Step: 50700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:13,220-Speed 3316.69 samples/sec   Loss 3.2115   LearningRate 0.0150   Epoch: 12   Global Step: 50710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:16,305-Speed 3320.74 samples/sec   Loss 3.3037   LearningRate 0.0150   Epoch: 12   Global Step: 50720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:19,391-Speed 3319.25 samples/sec   Loss 3.3024   LearningRate 0.0149   Epoch: 12   Global Step: 50730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:22,522-Speed 3270.95 samples/sec   Loss 3.2582   LearningRate 0.0149   Epoch: 12   Global Step: 50740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:25,631-Speed 3294.30 samples/sec   Loss 3.2297   LearningRate 0.0149   Epoch: 12   Global Step: 50750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:28,777-Speed 3256.19 samples/sec   Loss 3.2865   LearningRate 0.0149   Epoch: 12   Global Step: 50760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:31,847-Speed 3335.58 samples/sec   Loss 3.2814   LearningRate 0.0149   Epoch: 12   Global Step: 50770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:34,932-Speed 3320.57 samples/sec   Loss 3.3589   LearningRate 0.0149   Epoch: 12   Global Step: 50780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:38,014-Speed 3323.13 samples/sec   Loss 3.3529   LearningRate 0.0149   Epoch: 12   Global Step: 50790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:41,099-Speed 3320.02 samples/sec   Loss 3.2904   LearningRate 0.0149   Epoch: 12   Global Step: 50800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:44,184-Speed 3319.85 samples/sec   Loss 3.3600   LearningRate 0.0149   Epoch: 12   Global Step: 50810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:39:47,274-Speed 3315.01 samples/sec   Loss 3.3196   LearningRate 0.0149   Epoch: 12   Global Step: 50820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:39:50,354-Speed 3324.96 samples/sec   Loss 3.2301   LearningRate 0.0149   Epoch: 12   Global Step: 50830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:39:53,435-Speed 3323.73 samples/sec   Loss 3.3234   LearningRate 0.0148   Epoch: 12   Global Step: 50840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:39:56,547-Speed 3291.46 samples/sec   Loss 3.2488   LearningRate 0.0148   Epoch: 12   Global Step: 50850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:39:59,636-Speed 3316.06 samples/sec   Loss 3.3415   LearningRate 0.0148   Epoch: 12   Global Step: 50860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:02,729-Speed 3312.00 samples/sec   Loss 3.2909   LearningRate 0.0148   Epoch: 12   Global Step: 50870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:05,820-Speed 3313.16 samples/sec   Loss 3.3545   LearningRate 0.0148   Epoch: 12   Global Step: 50880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:08,902-Speed 3322.49 samples/sec   Loss 3.3768   LearningRate 0.0148   Epoch: 12   Global Step: 50890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:11,999-Speed 3308.19 samples/sec   Loss 3.3349   LearningRate 0.0148   Epoch: 12   Global Step: 50900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:15,090-Speed 3312.54 samples/sec   Loss 3.3422   LearningRate 0.0148   Epoch: 12   Global Step: 50910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:18,182-Speed 3313.12 samples/sec   Loss 3.3698   LearningRate 0.0148   Epoch: 12   Global Step: 50920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:40:21,266-Speed 3320.98 samples/sec   Loss 3.3329   LearningRate 0.0148   Epoch: 12   Global Step: 50930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:40:24,362-Speed 3308.14 samples/sec   Loss 3.3489   LearningRate 0.0147   Epoch: 12   Global Step: 50940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:40:27,434-Speed 3333.93 samples/sec   Loss 3.3439   LearningRate 0.0147   Epoch: 12   Global Step: 50950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:30,520-Speed 3319.19 samples/sec   Loss 3.2937   LearningRate 0.0147   Epoch: 12   Global Step: 50960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:33,631-Speed 3292.38 samples/sec   Loss 3.3491   LearningRate 0.0147   Epoch: 12   Global Step: 50970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:36,742-Speed 3292.89 samples/sec   Loss 3.3514   LearningRate 0.0147   Epoch: 12   Global Step: 50980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:39,861-Speed 3282.75 samples/sec   Loss 3.3893   LearningRate 0.0147   Epoch: 12   Global Step: 50990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:42,957-Speed 3308.44 samples/sec   Loss 3.4123   LearningRate 0.0147   Epoch: 12   Global Step: 51000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:46,044-Speed 3317.71 samples/sec   Loss 3.3371   LearningRate 0.0147   Epoch: 12   Global Step: 51010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:49,155-Speed 3292.70 samples/sec   Loss 3.3210   LearningRate 0.0147   Epoch: 12   Global Step: 51020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:52,239-Speed 3321.44 samples/sec   Loss 3.3161   LearningRate 0.0147   Epoch: 12   Global Step: 51030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:55,332-Speed 3310.91 samples/sec   Loss 3.3449   LearningRate 0.0147   Epoch: 12   Global Step: 51040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:40:58,449-Speed 3286.33 samples/sec   Loss 3.2689   LearningRate 0.0146   Epoch: 12   Global Step: 51050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:01,539-Speed 3314.23 samples/sec   Loss 3.3544   LearningRate 0.0146   Epoch: 12   Global Step: 51060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:04,638-Speed 3305.61 samples/sec   Loss 3.4220   LearningRate 0.0146   Epoch: 12   Global Step: 51070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:07,768-Speed 3272.01 samples/sec   Loss 3.3060   LearningRate 0.0146   Epoch: 12   Global Step: 51080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:10,999-Speed 3169.80 samples/sec   Loss 3.4320   LearningRate 0.0146   Epoch: 12   Global Step: 51090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:14,087-Speed 3316.97 samples/sec   Loss 3.3406   LearningRate 0.0146   Epoch: 12   Global Step: 51100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:17,177-Speed 3314.61 samples/sec   Loss 3.3322   LearningRate 0.0146   Epoch: 12   Global Step: 51110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:20,261-Speed 3320.58 samples/sec   Loss 3.4069   LearningRate 0.0146   Epoch: 12   Global Step: 51120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:23,375-Speed 3289.53 samples/sec   Loss 3.3752   LearningRate 0.0146   Epoch: 12   Global Step: 51130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:26,482-Speed 3296.60 samples/sec   Loss 3.3682   LearningRate 0.0146   Epoch: 12   Global Step: 51140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:29,561-Speed 3326.34 samples/sec   Loss 3.3374   LearningRate 0.0146   Epoch: 12   Global Step: 51150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:32,646-Speed 3320.67 samples/sec   Loss 3.4309   LearningRate 0.0145   Epoch: 12   Global Step: 51160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:35,739-Speed 3311.23 samples/sec   Loss 3.4357   LearningRate 0.0145   Epoch: 12   Global Step: 51170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:38,835-Speed 3307.95 samples/sec   Loss 3.3984   LearningRate 0.0145   Epoch: 12   Global Step: 51180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:41,928-Speed 3311.05 samples/sec   Loss 3.4236   LearningRate 0.0145   Epoch: 12   Global Step: 51190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:41:44,986-Speed 3350.13 samples/sec   Loss 3.4294   LearningRate 0.0145   Epoch: 12   Global Step: 51200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:41:48,163-Speed 3223.90 samples/sec   Loss 3.4301   LearningRate 0.0145   Epoch: 12   Global Step: 51210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:41:51,294-Speed 3270.59 samples/sec   Loss 3.4020   LearningRate 0.0145   Epoch: 12   Global Step: 51220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:41:54,474-Speed 3221.21 samples/sec   Loss 3.3876   LearningRate 0.0145   Epoch: 12   Global Step: 51230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:41:57,582-Speed 3295.32 samples/sec   Loss 3.3533   LearningRate 0.0145   Epoch: 12   Global Step: 51240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:42:00,673-Speed 3313.59 samples/sec   Loss 3.4539   LearningRate 0.0145   Epoch: 12   Global Step: 51250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:42:03,763-Speed 3315.51 samples/sec   Loss 3.4140   LearningRate 0.0145   Epoch: 12   Global Step: 51260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:42:06,846-Speed 3321.93 samples/sec   Loss 3.4275   LearningRate 0.0144   Epoch: 12   Global Step: 51270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:42:09,934-Speed 3316.31 samples/sec   Loss 3.4473   LearningRate 0.0144   Epoch: 12   Global Step: 51280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:42:13,020-Speed 3318.68 samples/sec   Loss 3.3887   LearningRate 0.0144   Epoch: 12   Global Step: 51290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:42:16,111-Speed 3313.89 samples/sec   Loss 3.4329   LearningRate 0.0144   Epoch: 12   Global Step: 51300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:42:19,200-Speed 3316.11 samples/sec   Loss 3.4110   LearningRate 0.0144   Epoch: 12   Global Step: 51310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:42:22,282-Speed 3322.73 samples/sec   Loss 3.3099   LearningRate 0.0144   Epoch: 12   Global Step: 51320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:42:25,414-Speed 3270.30 samples/sec   Loss 3.3882   LearningRate 0.0144   Epoch: 12   Global Step: 51330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:42:28,503-Speed 3316.22 samples/sec   Loss 3.4014   LearningRate 0.0144   Epoch: 12   Global Step: 51340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:42:31,588-Speed 3320.32 samples/sec   Loss 3.4609   LearningRate 0.0144   Epoch: 12   Global Step: 51350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:42:34,699-Speed 3291.79 samples/sec   Loss 3.4342   LearningRate 0.0144   Epoch: 12   Global Step: 51360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:42:37,788-Speed 3315.48 samples/sec   Loss 3.3987   LearningRate 0.0144   Epoch: 12   Global Step: 51370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:42:40,878-Speed 3314.94 samples/sec   Loss 3.4283   LearningRate 0.0143   Epoch: 12   Global Step: 51380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:42:43,969-Speed 3313.73 samples/sec   Loss 3.3430   LearningRate 0.0143   Epoch: 12   Global Step: 51390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:42:47,061-Speed 3311.76 samples/sec   Loss 3.4909   LearningRate 0.0143   Epoch: 12   Global Step: 51400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:42:50,145-Speed 3321.78 samples/sec   Loss 3.5123   LearningRate 0.0143   Epoch: 12   Global Step: 51410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:42:53,324-Speed 3221.46 samples/sec   Loss 3.4374   LearningRate 0.0143   Epoch: 12   Global Step: 51420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:42:56,415-Speed 3313.71 samples/sec   Loss 3.4133   LearningRate 0.0143   Epoch: 12   Global Step: 51430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:42:59,531-Speed 3287.50 samples/sec   Loss 3.4429   LearningRate 0.0143   Epoch: 12   Global Step: 51440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:02,620-Speed 3315.89 samples/sec   Loss 3.4853   LearningRate 0.0143   Epoch: 12   Global Step: 51450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:05,707-Speed 3317.24 samples/sec   Loss 3.4244   LearningRate 0.0143   Epoch: 12   Global Step: 51460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:08,792-Speed 3320.53 samples/sec   Loss 3.4495   LearningRate 0.0143   Epoch: 12   Global Step: 51470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:11,879-Speed 3317.60 samples/sec   Loss 3.4907   LearningRate 0.0143   Epoch: 12   Global Step: 51480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:14,963-Speed 3321.05 samples/sec   Loss 3.4139   LearningRate 0.0142   Epoch: 12   Global Step: 51490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:18,054-Speed 3312.97 samples/sec   Loss 3.4930   LearningRate 0.0142   Epoch: 12   Global Step: 51500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-26 17:43:21,126-Speed 3333.95 samples/sec   Loss 3.4779   LearningRate 0.0142   Epoch: 12   Global Step: 51510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:24,215-Speed 3316.76 samples/sec   Loss 3.4190   LearningRate 0.0142   Epoch: 12   Global Step: 51520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:27,382-Speed 3233.70 samples/sec   Loss 3.4295   LearningRate 0.0142   Epoch: 12   Global Step: 51530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:30,468-Speed 3318.89 samples/sec   Loss 3.3974   LearningRate 0.0142   Epoch: 12   Global Step: 51540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:33,551-Speed 3322.72 samples/sec   Loss 3.4876   LearningRate 0.0142   Epoch: 12   Global Step: 51550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:36,637-Speed 3318.71 samples/sec   Loss 3.4870   LearningRate 0.0142   Epoch: 12   Global Step: 51560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:39,720-Speed 3321.80 samples/sec   Loss 3.4721   LearningRate 0.0142   Epoch: 12   Global Step: 51570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:42,804-Speed 3320.74 samples/sec   Loss 3.4820   LearningRate 0.0142   Epoch: 12   Global Step: 51580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:45,897-Speed 3311.46 samples/sec   Loss 3.5176   LearningRate 0.0142   Epoch: 12   Global Step: 51590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:48,991-Speed 3310.29 samples/sec   Loss 3.4602   LearningRate 0.0141   Epoch: 12   Global Step: 51600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:52,061-Speed 3336.94 samples/sec   Loss 3.5460   LearningRate 0.0141   Epoch: 12   Global Step: 51610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:43:55,130-Speed 3336.98 samples/sec   Loss 3.4494   LearningRate 0.0141   Epoch: 12   Global Step: 51620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:43:58,231-Speed 3302.89 samples/sec   Loss 3.4290   LearningRate 0.0141   Epoch: 12   Global Step: 51630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:44:01,336-Speed 3298.59 samples/sec   Loss 3.5267   LearningRate 0.0141   Epoch: 12   Global Step: 51640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:44:04,427-Speed 3314.14 samples/sec   Loss 3.4549   LearningRate 0.0141   Epoch: 12   Global Step: 51650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:44:07,516-Speed 3315.97 samples/sec   Loss 3.4900   LearningRate 0.0141   Epoch: 12   Global Step: 51660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:44:10,599-Speed 3322.05 samples/sec   Loss 3.5392   LearningRate 0.0141   Epoch: 12   Global Step: 51670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:44:13,700-Speed 3302.29 samples/sec   Loss 3.5430   LearningRate 0.0141   Epoch: 12   Global Step: 51680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:44:16,794-Speed 3310.15 samples/sec   Loss 3.4367   LearningRate 0.0141   Epoch: 12   Global Step: 51690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:44:19,882-Speed 3317.17 samples/sec   Loss 3.4711   LearningRate 0.0141   Epoch: 12   Global Step: 51700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:44:22,966-Speed 3321.01 samples/sec   Loss 3.5988   LearningRate 0.0140   Epoch: 12   Global Step: 51710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:44:26,067-Speed 3302.38 samples/sec   Loss 3.4242   LearningRate 0.0140   Epoch: 12   Global Step: 51720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:44:29,157-Speed 3314.93 samples/sec   Loss 3.5648   LearningRate 0.0140   Epoch: 12   Global Step: 51730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:44:32,245-Speed 3317.46 samples/sec   Loss 3.4703   LearningRate 0.0140   Epoch: 12   Global Step: 51740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:44:35,336-Speed 3313.29 samples/sec   Loss 3.3518   LearningRate 0.0140   Epoch: 12   Global Step: 51750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:44:38,431-Speed 3309.05 samples/sec   Loss 3.4516   LearningRate 0.0140   Epoch: 12   Global Step: 51760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:44:41,543-Speed 3291.39 samples/sec   Loss 3.4411   LearningRate 0.0140   Epoch: 12   Global Step: 51770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:44:44,626-Speed 3321.74 samples/sec   Loss 3.5757   LearningRate 0.0140   Epoch: 12   Global Step: 51780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:44:47,728-Speed 3302.72 samples/sec   Loss 3.5066   LearningRate 0.0140   Epoch: 12   Global Step: 51790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:44:50,820-Speed 3312.13 samples/sec   Loss 3.5045   LearningRate 0.0140   Epoch: 12   Global Step: 51800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:44:53,914-Speed 3310.74 samples/sec   Loss 3.5229   LearningRate 0.0140   Epoch: 12   Global Step: 51810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:44:56,985-Speed 3335.02 samples/sec   Loss 3.4625   LearningRate 0.0139   Epoch: 12   Global Step: 51820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:00,076-Speed 3313.59 samples/sec   Loss 3.5604   LearningRate 0.0139   Epoch: 12   Global Step: 51830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:03,167-Speed 3312.92 samples/sec   Loss 3.5619   LearningRate 0.0139   Epoch: 12   Global Step: 51840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:06,266-Speed 3305.77 samples/sec   Loss 3.5102   LearningRate 0.0139   Epoch: 12   Global Step: 51850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:09,356-Speed 3314.56 samples/sec   Loss 3.5773   LearningRate 0.0139   Epoch: 12   Global Step: 51860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:12,446-Speed 3314.04 samples/sec   Loss 3.4315   LearningRate 0.0139   Epoch: 12   Global Step: 51870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:15,545-Speed 3305.58 samples/sec   Loss 3.5957   LearningRate 0.0139   Epoch: 12   Global Step: 51880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:18,632-Speed 3317.62 samples/sec   Loss 3.4721   LearningRate 0.0139   Epoch: 12   Global Step: 51890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:21,732-Speed 3303.82 samples/sec   Loss 3.5589   LearningRate 0.0139   Epoch: 12   Global Step: 51900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:24,831-Speed 3305.46 samples/sec   Loss 3.5706   LearningRate 0.0139   Epoch: 12   Global Step: 51910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:27,962-Speed 3271.55 samples/sec   Loss 3.5306   LearningRate 0.0139   Epoch: 12   Global Step: 51920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:31,048-Speed 3318.56 samples/sec   Loss 3.5412   LearningRate 0.0138   Epoch: 12   Global Step: 51930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:34,146-Speed 3305.89 samples/sec   Loss 3.4806   LearningRate 0.0138   Epoch: 12   Global Step: 51940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:37,232-Speed 3319.61 samples/sec   Loss 3.4335   LearningRate 0.0138   Epoch: 12   Global Step: 51950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:40,320-Speed 3316.49 samples/sec   Loss 3.6228   LearningRate 0.0138   Epoch: 12   Global Step: 51960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:43,408-Speed 3316.45 samples/sec   Loss 3.4939   LearningRate 0.0138   Epoch: 12   Global Step: 51970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:46,496-Speed 3316.50 samples/sec   Loss 3.5994   LearningRate 0.0138   Epoch: 12   Global Step: 51980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:49,665-Speed 3232.71 samples/sec   Loss 3.4739   LearningRate 0.0138   Epoch: 12   Global Step: 51990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:45:52,777-Speed 3291.14 samples/sec   Loss 3.4925   LearningRate 0.0138   Epoch: 12   Global Step: 52000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:46:36,204-[lfw][52000]XNorm: 22.921602
Training: 2022-04-26 17:46:36,205-[lfw][52000]Accuracy-Flip: 0.99717+-0.00259
Training: 2022-04-26 17:46:36,205-[lfw][52000]Accuracy-Highest: 0.99783
Training: 2022-04-26 17:47:26,812-[cfp_fp][52000]XNorm: 22.140296
Training: 2022-04-26 17:47:26,813-[cfp_fp][52000]Accuracy-Flip: 0.99029+-0.00538
Training: 2022-04-26 17:47:26,813-[cfp_fp][52000]Accuracy-Highest: 0.99029
Training: 2022-04-26 17:48:10,620-[agedb_30][52000]XNorm: 23.196907
Training: 2022-04-26 17:48:10,620-[agedb_30][52000]Accuracy-Flip: 0.97500+-0.00738
Training: 2022-04-26 17:48:10,621-[agedb_30][52000]Accuracy-Highest: 0.97550
Training: 2022-04-26 17:48:13,705-Speed 72.66 samples/sec   Loss 3.4189   LearningRate 0.0138   Epoch: 12   Global Step: 52010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:48:16,783-Speed 3328.53 samples/sec   Loss 3.5501   LearningRate 0.0138   Epoch: 12   Global Step: 52020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:48:19,846-Speed 3343.98 samples/sec   Loss 3.4445   LearningRate 0.0138   Epoch: 12   Global Step: 52030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:48:22,932-Speed 3318.17 samples/sec   Loss 3.5147   LearningRate 0.0137   Epoch: 12   Global Step: 52040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:48:26,021-Speed 3315.57 samples/sec   Loss 3.4512   LearningRate 0.0137   Epoch: 12   Global Step: 52050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:48:29,103-Speed 3324.01 samples/sec   Loss 3.4594   LearningRate 0.0137   Epoch: 12   Global Step: 52060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:48:32,188-Speed 3319.84 samples/sec   Loss 3.5117   LearningRate 0.0137   Epoch: 12   Global Step: 52070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:48:35,272-Speed 3321.71 samples/sec   Loss 3.4973   LearningRate 0.0137   Epoch: 12   Global Step: 52080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:48:38,360-Speed 3317.04 samples/sec   Loss 3.4599   LearningRate 0.0137   Epoch: 12   Global Step: 52090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:48:41,461-Speed 3302.20 samples/sec   Loss 3.4904   LearningRate 0.0137   Epoch: 12   Global Step: 52100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:48:44,560-Speed 3306.60 samples/sec   Loss 3.5031   LearningRate 0.0137   Epoch: 12   Global Step: 52110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:48:47,646-Speed 3318.92 samples/sec   Loss 3.4999   LearningRate 0.0137   Epoch: 12   Global Step: 52120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:48:50,739-Speed 3311.53 samples/sec   Loss 3.5421   LearningRate 0.0137   Epoch: 12   Global Step: 52130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:48:53,850-Speed 3292.31 samples/sec   Loss 3.5613   LearningRate 0.0137   Epoch: 12   Global Step: 52140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:48:56,944-Speed 3310.29 samples/sec   Loss 3.5886   LearningRate 0.0136   Epoch: 12   Global Step: 52150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:49:00,039-Speed 3309.37 samples/sec   Loss 3.5326   LearningRate 0.0136   Epoch: 12   Global Step: 52160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:49:03,129-Speed 3314.13 samples/sec   Loss 3.5112   LearningRate 0.0136   Epoch: 12   Global Step: 52170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:49:06,207-Speed 3327.54 samples/sec   Loss 3.5634   LearningRate 0.0136   Epoch: 12   Global Step: 52180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:49:09,286-Speed 3326.80 samples/sec   Loss 3.5056   LearningRate 0.0136   Epoch: 12   Global Step: 52190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:49:12,368-Speed 3322.57 samples/sec   Loss 3.5613   LearningRate 0.0136   Epoch: 12   Global Step: 52200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:49:15,447-Speed 3326.29 samples/sec   Loss 3.5384   LearningRate 0.0136   Epoch: 12   Global Step: 52210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:49:18,535-Speed 3317.67 samples/sec   Loss 3.5455   LearningRate 0.0136   Epoch: 12   Global Step: 52220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:49:21,621-Speed 3318.56 samples/sec   Loss 3.5535   LearningRate 0.0136   Epoch: 12   Global Step: 52230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:49:24,712-Speed 3313.80 samples/sec   Loss 3.4724   LearningRate 0.0136   Epoch: 12   Global Step: 52240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:49:27,800-Speed 3317.27 samples/sec   Loss 3.6517   LearningRate 0.0136   Epoch: 12   Global Step: 52250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:49:30,885-Speed 3319.81 samples/sec   Loss 3.5406   LearningRate 0.0135   Epoch: 12   Global Step: 52260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:49:33,963-Speed 3327.24 samples/sec   Loss 3.5448   LearningRate 0.0135   Epoch: 12   Global Step: 52270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:49:37,043-Speed 3325.48 samples/sec   Loss 3.5298   LearningRate 0.0135   Epoch: 12   Global Step: 52280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:49:40,131-Speed 3316.60 samples/sec   Loss 3.5097   LearningRate 0.0135   Epoch: 12   Global Step: 52290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:49:43,213-Speed 3323.09 samples/sec   Loss 3.4124   LearningRate 0.0135   Epoch: 12   Global Step: 52300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:49:46,319-Speed 3297.96 samples/sec   Loss 3.4596   LearningRate 0.0135   Epoch: 12   Global Step: 52310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:49:49,510-Speed 3210.03 samples/sec   Loss 3.4865   LearningRate 0.0135   Epoch: 12   Global Step: 52320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:49:52,665-Speed 3245.93 samples/sec   Loss 3.5584   LearningRate 0.0135   Epoch: 12   Global Step: 52330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:49:55,746-Speed 3323.97 samples/sec   Loss 3.6019   LearningRate 0.0135   Epoch: 12   Global Step: 52340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:49:58,826-Speed 3325.38 samples/sec   Loss 3.5468   LearningRate 0.0135   Epoch: 12   Global Step: 52350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:01,908-Speed 3323.88 samples/sec   Loss 3.5625   LearningRate 0.0135   Epoch: 12   Global Step: 52360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:04,995-Speed 3317.87 samples/sec   Loss 3.4507   LearningRate 0.0135   Epoch: 12   Global Step: 52370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:08,081-Speed 3319.01 samples/sec   Loss 3.5359   LearningRate 0.0134   Epoch: 12   Global Step: 52380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:11,165-Speed 3320.56 samples/sec   Loss 3.5166   LearningRate 0.0134   Epoch: 12   Global Step: 52390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:14,290-Speed 3277.64 samples/sec   Loss 3.5955   LearningRate 0.0134   Epoch: 12   Global Step: 52400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:17,409-Speed 3284.38 samples/sec   Loss 3.5466   LearningRate 0.0134   Epoch: 12   Global Step: 52410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:50:20,486-Speed 3328.22 samples/sec   Loss 3.6071   LearningRate 0.0134   Epoch: 12   Global Step: 52420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:50:23,568-Speed 3323.78 samples/sec   Loss 3.5971   LearningRate 0.0134   Epoch: 12   Global Step: 52430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:50:26,646-Speed 3326.89 samples/sec   Loss 3.5311   LearningRate 0.0134   Epoch: 12   Global Step: 52440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:50:29,717-Speed 3335.28 samples/sec   Loss 3.6065   LearningRate 0.0134   Epoch: 12   Global Step: 52450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:32,805-Speed 3317.26 samples/sec   Loss 3.5671   LearningRate 0.0134   Epoch: 12   Global Step: 52460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:35,894-Speed 3315.66 samples/sec   Loss 3.5872   LearningRate 0.0134   Epoch: 12   Global Step: 52470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:38,973-Speed 3326.05 samples/sec   Loss 3.6058   LearningRate 0.0134   Epoch: 12   Global Step: 52480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:42,161-Speed 3212.97 samples/sec   Loss 3.5595   LearningRate 0.0133   Epoch: 12   Global Step: 52490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:45,237-Speed 3329.28 samples/sec   Loss 3.5541   LearningRate 0.0133   Epoch: 12   Global Step: 52500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:48,334-Speed 3307.66 samples/sec   Loss 3.6142   LearningRate 0.0133   Epoch: 12   Global Step: 52510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:51,436-Speed 3301.67 samples/sec   Loss 3.5931   LearningRate 0.0133   Epoch: 12   Global Step: 52520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:54,523-Speed 3317.80 samples/sec   Loss 3.5168   LearningRate 0.0133   Epoch: 12   Global Step: 52530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:50:57,600-Speed 3329.32 samples/sec   Loss 3.5258   LearningRate 0.0133   Epoch: 12   Global Step: 52540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:51:00,762-Speed 3238.36 samples/sec   Loss 3.5765   LearningRate 0.0133   Epoch: 12   Global Step: 52550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:51:03,829-Speed 3339.60 samples/sec   Loss 3.5716   LearningRate 0.0133   Epoch: 12   Global Step: 52560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:51:06,898-Speed 3337.86 samples/sec   Loss 3.5695   LearningRate 0.0133   Epoch: 12   Global Step: 52570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:51:10,082-Speed 3216.95 samples/sec   Loss 3.5722   LearningRate 0.0133   Epoch: 12   Global Step: 52580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:51:13,163-Speed 3323.79 samples/sec   Loss 3.6097   LearningRate 0.0133   Epoch: 12   Global Step: 52590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:51:16,240-Speed 3328.44 samples/sec   Loss 3.5421   LearningRate 0.0132   Epoch: 12   Global Step: 52600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:51:19,322-Speed 3323.62 samples/sec   Loss 3.6899   LearningRate 0.0132   Epoch: 12   Global Step: 52610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:51:22,401-Speed 3326.98 samples/sec   Loss 3.5774   LearningRate 0.0132   Epoch: 12   Global Step: 52620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:51:25,496-Speed 3309.00 samples/sec   Loss 3.5952   LearningRate 0.0132   Epoch: 12   Global Step: 52630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:51:28,578-Speed 3323.54 samples/sec   Loss 3.5483   LearningRate 0.0132   Epoch: 12   Global Step: 52640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:51:31,701-Speed 3279.51 samples/sec   Loss 3.5606   LearningRate 0.0132   Epoch: 12   Global Step: 52650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:51:34,775-Speed 3331.34 samples/sec   Loss 3.5730   LearningRate 0.0132   Epoch: 12   Global Step: 52660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:51:37,853-Speed 3327.43 samples/sec   Loss 3.6997   LearningRate 0.0132   Epoch: 12   Global Step: 52670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:51:40,942-Speed 3315.83 samples/sec   Loss 3.5387   LearningRate 0.0132   Epoch: 12   Global Step: 52680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:51:44,020-Speed 3328.24 samples/sec   Loss 3.5308   LearningRate 0.0132   Epoch: 12   Global Step: 52690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:51:47,099-Speed 3325.78 samples/sec   Loss 3.5808   LearningRate 0.0132   Epoch: 12   Global Step: 52700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:51:50,193-Speed 3311.03 samples/sec   Loss 3.5057   LearningRate 0.0132   Epoch: 12   Global Step: 52710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:51:53,365-Speed 3228.80 samples/sec   Loss 3.5308   LearningRate 0.0131   Epoch: 12   Global Step: 52720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:51:56,446-Speed 3324.55 samples/sec   Loss 3.6287   LearningRate 0.0131   Epoch: 12   Global Step: 52730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:51:59,534-Speed 3315.88 samples/sec   Loss 3.5818   LearningRate 0.0131   Epoch: 12   Global Step: 52740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:52:02,644-Speed 3294.02 samples/sec   Loss 3.4760   LearningRate 0.0131   Epoch: 12   Global Step: 52750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:52:05,748-Speed 3299.68 samples/sec   Loss 3.4978   LearningRate 0.0131   Epoch: 12   Global Step: 52760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:52:08,823-Speed 3330.07 samples/sec   Loss 3.5498   LearningRate 0.0131   Epoch: 12   Global Step: 52770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:11,911-Speed 3317.33 samples/sec   Loss 3.6254   LearningRate 0.0131   Epoch: 12   Global Step: 52780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:15,013-Speed 3302.13 samples/sec   Loss 3.5873   LearningRate 0.0131   Epoch: 12   Global Step: 52790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:18,093-Speed 3324.85 samples/sec   Loss 3.5836   LearningRate 0.0131   Epoch: 12   Global Step: 52800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:21,171-Speed 3328.05 samples/sec   Loss 3.5661   LearningRate 0.0131   Epoch: 12   Global Step: 52810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:24,260-Speed 3315.79 samples/sec   Loss 3.6212   LearningRate 0.0131   Epoch: 12   Global Step: 52820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:27,344-Speed 3321.20 samples/sec   Loss 3.5708   LearningRate 0.0130   Epoch: 12   Global Step: 52830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:30,423-Speed 3326.22 samples/sec   Loss 3.5405   LearningRate 0.0130   Epoch: 12   Global Step: 52840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:33,502-Speed 3327.08 samples/sec   Loss 3.5284   LearningRate 0.0130   Epoch: 12   Global Step: 52850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:36,611-Speed 3293.81 samples/sec   Loss 3.5534   LearningRate 0.0130   Epoch: 12   Global Step: 52860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:39,826-Speed 3185.78 samples/sec   Loss 3.5493   LearningRate 0.0130   Epoch: 12   Global Step: 52870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:42,915-Speed 3315.74 samples/sec   Loss 3.5478   LearningRate 0.0130   Epoch: 12   Global Step: 52880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:46,008-Speed 3310.97 samples/sec   Loss 3.6123   LearningRate 0.0130   Epoch: 12   Global Step: 52890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:49,110-Speed 3302.13 samples/sec   Loss 3.5190   LearningRate 0.0130   Epoch: 12   Global Step: 52900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:52,221-Speed 3292.18 samples/sec   Loss 3.4905   LearningRate 0.0130   Epoch: 12   Global Step: 52910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:55,313-Speed 3313.19 samples/sec   Loss 3.6280   LearningRate 0.0130   Epoch: 12   Global Step: 52920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:52:58,404-Speed 3312.89 samples/sec   Loss 3.7335   LearningRate 0.0130   Epoch: 12   Global Step: 52930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:01,502-Speed 3306.62 samples/sec   Loss 3.6359   LearningRate 0.0129   Epoch: 12   Global Step: 52940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:04,589-Speed 3317.10 samples/sec   Loss 3.6172   LearningRate 0.0129   Epoch: 12   Global Step: 52950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:07,678-Speed 3316.50 samples/sec   Loss 3.6047   LearningRate 0.0129   Epoch: 12   Global Step: 52960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:10,747-Speed 3337.33 samples/sec   Loss 3.4856   LearningRate 0.0129   Epoch: 12   Global Step: 52970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:13,828-Speed 3323.42 samples/sec   Loss 3.4999   LearningRate 0.0129   Epoch: 12   Global Step: 52980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:16,936-Speed 3295.81 samples/sec   Loss 3.6416   LearningRate 0.0129   Epoch: 12   Global Step: 52990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:20,022-Speed 3318.98 samples/sec   Loss 3.5598   LearningRate 0.0129   Epoch: 12   Global Step: 53000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:23,118-Speed 3309.22 samples/sec   Loss 3.5838   LearningRate 0.0129   Epoch: 12   Global Step: 53010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:26,202-Speed 3321.02 samples/sec   Loss 3.5890   LearningRate 0.0129   Epoch: 12   Global Step: 53020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:29,286-Speed 3320.94 samples/sec   Loss 3.6506   LearningRate 0.0129   Epoch: 12   Global Step: 53030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:32,374-Speed 3316.16 samples/sec   Loss 3.6162   LearningRate 0.0129   Epoch: 12   Global Step: 53040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:35,466-Speed 3312.29 samples/sec   Loss 3.6937   LearningRate 0.0129   Epoch: 12   Global Step: 53050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:38,553-Speed 3318.57 samples/sec   Loss 3.5733   LearningRate 0.0128   Epoch: 12   Global Step: 53060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:41,619-Speed 3340.70 samples/sec   Loss 3.5914   LearningRate 0.0128   Epoch: 12   Global Step: 53070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:44,708-Speed 3315.15 samples/sec   Loss 3.4970   LearningRate 0.0128   Epoch: 12   Global Step: 53080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:47,818-Speed 3293.70 samples/sec   Loss 3.5786   LearningRate 0.0128   Epoch: 12   Global Step: 53090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:50,911-Speed 3311.09 samples/sec   Loss 3.5950   LearningRate 0.0128   Epoch: 12   Global Step: 53100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:54,002-Speed 3313.64 samples/sec   Loss 3.6030   LearningRate 0.0128   Epoch: 12   Global Step: 53110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:53:57,101-Speed 3305.52 samples/sec   Loss 3.5916   LearningRate 0.0128   Epoch: 12   Global Step: 53120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:54:00,189-Speed 3316.66 samples/sec   Loss 3.5800   LearningRate 0.0128   Epoch: 12   Global Step: 53130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:54:03,274-Speed 3319.72 samples/sec   Loss 3.6136   LearningRate 0.0128   Epoch: 12   Global Step: 53140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:54:06,363-Speed 3315.72 samples/sec   Loss 3.6293   LearningRate 0.0128   Epoch: 12   Global Step: 53150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:54:09,447-Speed 3321.31 samples/sec   Loss 3.5068   LearningRate 0.0128   Epoch: 12   Global Step: 53160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:54:12,524-Speed 3328.51 samples/sec   Loss 3.6650   LearningRate 0.0128   Epoch: 12   Global Step: 53170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:54:15,611-Speed 3317.36 samples/sec   Loss 3.5678   LearningRate 0.0127   Epoch: 12   Global Step: 53180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:54:18,699-Speed 3316.62 samples/sec   Loss 3.5676   LearningRate 0.0127   Epoch: 12   Global Step: 53190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:54:21,777-Speed 3328.73 samples/sec   Loss 3.5985   LearningRate 0.0127   Epoch: 12   Global Step: 53200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:54:24,863-Speed 3318.75 samples/sec   Loss 3.5808   LearningRate 0.0127   Epoch: 12   Global Step: 53210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:54:27,946-Speed 3321.86 samples/sec   Loss 3.6077   LearningRate 0.0127   Epoch: 12   Global Step: 53220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:54:31,040-Speed 3310.64 samples/sec   Loss 3.6301   LearningRate 0.0127   Epoch: 12   Global Step: 53230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:54:34,123-Speed 3321.91 samples/sec   Loss 3.5308   LearningRate 0.0127   Epoch: 12   Global Step: 53240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:54:37,222-Speed 3305.10 samples/sec   Loss 3.5152   LearningRate 0.0127   Epoch: 12   Global Step: 53250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:54:40,315-Speed 3310.82 samples/sec   Loss 3.6111   LearningRate 0.0127   Epoch: 12   Global Step: 53260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:54:43,420-Speed 3299.20 samples/sec   Loss 3.5909   LearningRate 0.0127   Epoch: 12   Global Step: 53270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:54:46,520-Speed 3304.10 samples/sec   Loss 3.6536   LearningRate 0.0127   Epoch: 12   Global Step: 53280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:54:49,602-Speed 3322.75 samples/sec   Loss 3.6058   LearningRate 0.0126   Epoch: 12   Global Step: 53290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:54:52,691-Speed 3316.54 samples/sec   Loss 3.6537   LearningRate 0.0126   Epoch: 12   Global Step: 53300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:54:55,778-Speed 3317.53 samples/sec   Loss 3.6223   LearningRate 0.0126   Epoch: 12   Global Step: 53310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:54:58,853-Speed 3330.97 samples/sec   Loss 3.6228   LearningRate 0.0126   Epoch: 12   Global Step: 53320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:55:01,948-Speed 3309.55 samples/sec   Loss 3.5116   LearningRate 0.0126   Epoch: 12   Global Step: 53330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:55:05,049-Speed 3302.77 samples/sec   Loss 3.5506   LearningRate 0.0126   Epoch: 12   Global Step: 53340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:55:08,139-Speed 3314.57 samples/sec   Loss 3.5360   LearningRate 0.0126   Epoch: 12   Global Step: 53350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:55:11,228-Speed 3315.79 samples/sec   Loss 3.6096   LearningRate 0.0126   Epoch: 12   Global Step: 53360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:55:14,320-Speed 3311.98 samples/sec   Loss 3.6046   LearningRate 0.0126   Epoch: 12   Global Step: 53370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:55:17,406-Speed 3319.38 samples/sec   Loss 3.5960   LearningRate 0.0126   Epoch: 12   Global Step: 53380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:55:20,491-Speed 3319.91 samples/sec   Loss 3.6288   LearningRate 0.0126   Epoch: 12   Global Step: 53390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:55:23,579-Speed 3317.20 samples/sec   Loss 3.5661   LearningRate 0.0126   Epoch: 12   Global Step: 53400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:55:26,662-Speed 3321.79 samples/sec   Loss 3.6308   LearningRate 0.0125   Epoch: 12   Global Step: 53410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:55:29,748-Speed 3318.93 samples/sec   Loss 3.5364   LearningRate 0.0125   Epoch: 12   Global Step: 53420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:55:32,830-Speed 3323.80 samples/sec   Loss 3.6087   LearningRate 0.0125   Epoch: 12   Global Step: 53430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:55:35,916-Speed 3318.57 samples/sec   Loss 3.5902   LearningRate 0.0125   Epoch: 12   Global Step: 53440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:55:39,014-Speed 3305.36 samples/sec   Loss 3.5534   LearningRate 0.0125   Epoch: 12   Global Step: 53450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:55:42,133-Speed 3284.13 samples/sec   Loss 3.6626   LearningRate 0.0125   Epoch: 12   Global Step: 53460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:55:45,220-Speed 3317.56 samples/sec   Loss 3.6774   LearningRate 0.0125   Epoch: 12   Global Step: 53470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:55:48,319-Speed 3305.61 samples/sec   Loss 3.6368   LearningRate 0.0125   Epoch: 12   Global Step: 53480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:55:51,418-Speed 3304.89 samples/sec   Loss 3.5469   LearningRate 0.0125   Epoch: 12   Global Step: 53490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:55:54,504-Speed 3318.84 samples/sec   Loss 3.5870   LearningRate 0.0125   Epoch: 12   Global Step: 53500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:55:57,591-Speed 3318.42 samples/sec   Loss 3.5640   LearningRate 0.0125   Epoch: 12   Global Step: 53510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:56:00,674-Speed 3321.55 samples/sec   Loss 3.5791   LearningRate 0.0124   Epoch: 12   Global Step: 53520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-26 17:56:03,969-Speed 3108.25 samples/sec   Loss 3.5759   LearningRate 0.0124   Epoch: 12   Global Step: 53530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:56:07,053-Speed 3321.17 samples/sec   Loss 3.5276   LearningRate 0.0124   Epoch: 12   Global Step: 53540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:56:10,410-Speed 3051.32 samples/sec   Loss 3.6156   LearningRate 0.0124   Epoch: 12   Global Step: 53550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:56:14,040-Speed 2821.03 samples/sec   Loss 3.5065   LearningRate 0.0124   Epoch: 12   Global Step: 53560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:56:17,405-Speed 3044.24 samples/sec   Loss 3.6175   LearningRate 0.0124   Epoch: 12   Global Step: 53570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:56:20,497-Speed 3312.45 samples/sec   Loss 3.5457   LearningRate 0.0124   Epoch: 12   Global Step: 53580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:56:23,576-Speed 3326.93 samples/sec   Loss 3.6105   LearningRate 0.0124   Epoch: 12   Global Step: 53590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:56:26,673-Speed 3306.43 samples/sec   Loss 3.6002   LearningRate 0.0124   Epoch: 12   Global Step: 53600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:56:29,780-Speed 3296.29 samples/sec   Loss 3.6345   LearningRate 0.0124   Epoch: 12   Global Step: 53610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:56:32,881-Speed 3302.80 samples/sec   Loss 3.5905   LearningRate 0.0124   Epoch: 12   Global Step: 53620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:56:36,010-Speed 3274.22 samples/sec   Loss 3.6041   LearningRate 0.0124   Epoch: 12   Global Step: 53630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:56:39,129-Speed 3283.78 samples/sec   Loss 3.5612   LearningRate 0.0123   Epoch: 12   Global Step: 53640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:56:42,222-Speed 3311.15 samples/sec   Loss 3.6230   LearningRate 0.0123   Epoch: 12   Global Step: 53650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:56:45,308-Speed 3318.97 samples/sec   Loss 3.5958   LearningRate 0.0123   Epoch: 12   Global Step: 53660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:56:48,408-Speed 3304.14 samples/sec   Loss 3.5025   LearningRate 0.0123   Epoch: 12   Global Step: 53670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:56:51,492-Speed 3321.22 samples/sec   Loss 3.5329   LearningRate 0.0123   Epoch: 12   Global Step: 53680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:56:54,614-Speed 3280.25 samples/sec   Loss 3.5324   LearningRate 0.0123   Epoch: 12   Global Step: 53690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:56:57,705-Speed 3313.18 samples/sec   Loss 3.5540   LearningRate 0.0123   Epoch: 12   Global Step: 53700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:57:00,842-Speed 3265.38 samples/sec   Loss 3.7114   LearningRate 0.0123   Epoch: 12   Global Step: 53710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:57:03,988-Speed 3255.19 samples/sec   Loss 3.5346   LearningRate 0.0123   Epoch: 12   Global Step: 53720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:57:07,075-Speed 3317.85 samples/sec   Loss 3.5743   LearningRate 0.0123   Epoch: 12   Global Step: 53730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:57:10,168-Speed 3311.35 samples/sec   Loss 3.5794   LearningRate 0.0123   Epoch: 12   Global Step: 53740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:57:13,342-Speed 3227.51 samples/sec   Loss 3.6002   LearningRate 0.0123   Epoch: 12   Global Step: 53750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:57:26,741-Speed 764.30 samples/sec   Loss 2.9657   LearningRate 0.0122   Epoch: 13   Global Step: 53760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:57:29,827-Speed 3318.89 samples/sec   Loss 2.2758   LearningRate 0.0122   Epoch: 13   Global Step: 53770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 17:57:32,904-Speed 3329.12 samples/sec   Loss 2.2870   LearningRate 0.0122   Epoch: 13   Global Step: 53780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:57:35,991-Speed 3317.70 samples/sec   Loss 2.3470   LearningRate 0.0122   Epoch: 13   Global Step: 53790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:57:39,073-Speed 3322.66 samples/sec   Loss 2.3125   LearningRate 0.0122   Epoch: 13   Global Step: 53800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:57:42,160-Speed 3318.15 samples/sec   Loss 2.3150   LearningRate 0.0122   Epoch: 13   Global Step: 53810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:57:45,250-Speed 3314.68 samples/sec   Loss 2.4115   LearningRate 0.0122   Epoch: 13   Global Step: 53820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:57:48,383-Speed 3268.65 samples/sec   Loss 2.3750   LearningRate 0.0122   Epoch: 13   Global Step: 53830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:57:51,468-Speed 3320.41 samples/sec   Loss 2.3260   LearningRate 0.0122   Epoch: 13   Global Step: 53840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:57:54,554-Speed 3319.81 samples/sec   Loss 2.3247   LearningRate 0.0122   Epoch: 13   Global Step: 53850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:57:57,642-Speed 3316.62 samples/sec   Loss 2.4168   LearningRate 0.0122   Epoch: 13   Global Step: 53860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:58:00,727-Speed 3319.37 samples/sec   Loss 2.3446   LearningRate 0.0122   Epoch: 13   Global Step: 53870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 17:58:04,009-Speed 3121.16 samples/sec   Loss 2.3262   LearningRate 0.0121   Epoch: 13   Global Step: 53880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:58:07,092-Speed 3322.30 samples/sec   Loss 2.3662   LearningRate 0.0121   Epoch: 13   Global Step: 53890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:58:10,181-Speed 3315.21 samples/sec   Loss 2.4153   LearningRate 0.0121   Epoch: 13   Global Step: 53900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:58:13,490-Speed 3095.54 samples/sec   Loss 2.4866   LearningRate 0.0121   Epoch: 13   Global Step: 53910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:58:16,582-Speed 3311.76 samples/sec   Loss 2.3883   LearningRate 0.0121   Epoch: 13   Global Step: 53920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:58:19,679-Speed 3307.68 samples/sec   Loss 2.3269   LearningRate 0.0121   Epoch: 13   Global Step: 53930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:58:22,776-Speed 3307.76 samples/sec   Loss 2.4299   LearningRate 0.0121   Epoch: 13   Global Step: 53940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:58:25,872-Speed 3307.60 samples/sec   Loss 2.4804   LearningRate 0.0121   Epoch: 13   Global Step: 53950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:58:28,965-Speed 3311.63 samples/sec   Loss 2.3984   LearningRate 0.0121   Epoch: 13   Global Step: 53960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:58:32,060-Speed 3309.12 samples/sec   Loss 2.4087   LearningRate 0.0121   Epoch: 13   Global Step: 53970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:58:35,134-Speed 3332.18 samples/sec   Loss 2.4026   LearningRate 0.0121   Epoch: 13   Global Step: 53980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:58:38,231-Speed 3307.06 samples/sec   Loss 2.4351   LearningRate 0.0121   Epoch: 13   Global Step: 53990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:58:41,318-Speed 3317.41 samples/sec   Loss 2.4030   LearningRate 0.0120   Epoch: 13   Global Step: 54000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 17:59:25,253-[lfw][54000]XNorm: 23.115195
Training: 2022-04-26 17:59:25,254-[lfw][54000]Accuracy-Flip: 0.99750+-0.00250
Training: 2022-04-26 17:59:25,254-[lfw][54000]Accuracy-Highest: 0.99783
Training: 2022-04-26 18:00:16,100-[cfp_fp][54000]XNorm: 22.601873
Training: 2022-04-26 18:00:16,101-[cfp_fp][54000]Accuracy-Flip: 0.98971+-0.00486
Training: 2022-04-26 18:00:16,101-[cfp_fp][54000]Accuracy-Highest: 0.99029
Training: 2022-04-26 18:00:59,778-[agedb_30][54000]XNorm: 23.663181
Training: 2022-04-26 18:00:59,778-[agedb_30][54000]Accuracy-Flip: 0.97550+-0.00876
Training: 2022-04-26 18:00:59,779-[agedb_30][54000]Accuracy-Highest: 0.97550
Training: 2022-04-26 18:01:02,909-Speed 72.32 samples/sec   Loss 2.4276   LearningRate 0.0120   Epoch: 13   Global Step: 54010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:05,992-Speed 3322.81 samples/sec   Loss 2.4707   LearningRate 0.0120   Epoch: 13   Global Step: 54020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:09,077-Speed 3319.34 samples/sec   Loss 2.4440   LearningRate 0.0120   Epoch: 13   Global Step: 54030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:12,157-Speed 3325.50 samples/sec   Loss 2.4730   LearningRate 0.0120   Epoch: 13   Global Step: 54040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:15,243-Speed 3318.82 samples/sec   Loss 2.4521   LearningRate 0.0120   Epoch: 13   Global Step: 54050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:18,330-Speed 3318.02 samples/sec   Loss 2.4232   LearningRate 0.0120   Epoch: 13   Global Step: 54060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:21,429-Speed 3304.97 samples/sec   Loss 2.4707   LearningRate 0.0120   Epoch: 13   Global Step: 54070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:24,532-Speed 3301.02 samples/sec   Loss 2.4741   LearningRate 0.0120   Epoch: 13   Global Step: 54080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-26 18:01:27,701-Speed 3232.40 samples/sec   Loss 2.4405   LearningRate 0.0120   Epoch: 13   Global Step: 54090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:30,794-Speed 3310.56 samples/sec   Loss 2.4808   LearningRate 0.0120   Epoch: 13   Global Step: 54100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:33,889-Speed 3309.24 samples/sec   Loss 2.4054   LearningRate 0.0120   Epoch: 13   Global Step: 54110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:36,983-Speed 3310.20 samples/sec   Loss 2.4473   LearningRate 0.0119   Epoch: 13   Global Step: 54120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:40,078-Speed 3309.52 samples/sec   Loss 2.5073   LearningRate 0.0119   Epoch: 13   Global Step: 54130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:43,175-Speed 3307.78 samples/sec   Loss 2.4762   LearningRate 0.0119   Epoch: 13   Global Step: 54140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:46,278-Speed 3300.32 samples/sec   Loss 2.5425   LearningRate 0.0119   Epoch: 13   Global Step: 54150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:49,376-Speed 3306.42 samples/sec   Loss 2.4732   LearningRate 0.0119   Epoch: 13   Global Step: 54160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:52,490-Speed 3288.11 samples/sec   Loss 2.4891   LearningRate 0.0119   Epoch: 13   Global Step: 54170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:55,582-Speed 3314.18 samples/sec   Loss 2.5138   LearningRate 0.0119   Epoch: 13   Global Step: 54180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:01:58,653-Speed 3334.52 samples/sec   Loss 2.4903   LearningRate 0.0119   Epoch: 13   Global Step: 54190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:01,740-Speed 3317.82 samples/sec   Loss 2.4874   LearningRate 0.0119   Epoch: 13   Global Step: 54200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:04,825-Speed 3320.67 samples/sec   Loss 2.4705   LearningRate 0.0119   Epoch: 13   Global Step: 54210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:07,912-Speed 3317.89 samples/sec   Loss 2.4622   LearningRate 0.0119   Epoch: 13   Global Step: 54220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:10,994-Speed 3322.79 samples/sec   Loss 2.5790   LearningRate 0.0119   Epoch: 13   Global Step: 54230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:14,074-Speed 3326.12 samples/sec   Loss 2.5594   LearningRate 0.0118   Epoch: 13   Global Step: 54240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:17,161-Speed 3316.93 samples/sec   Loss 2.5717   LearningRate 0.0118   Epoch: 13   Global Step: 54250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:20,238-Speed 3328.65 samples/sec   Loss 2.5711   LearningRate 0.0118   Epoch: 13   Global Step: 54260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:23,321-Speed 3322.22 samples/sec   Loss 2.5223   LearningRate 0.0118   Epoch: 13   Global Step: 54270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:26,402-Speed 3325.19 samples/sec   Loss 2.5814   LearningRate 0.0118   Epoch: 13   Global Step: 54280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:29,469-Speed 3339.32 samples/sec   Loss 2.4631   LearningRate 0.0118   Epoch: 13   Global Step: 54290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:32,551-Speed 3323.13 samples/sec   Loss 2.5440   LearningRate 0.0118   Epoch: 13   Global Step: 54300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:35,712-Speed 3240.12 samples/sec   Loss 2.5639   LearningRate 0.0118   Epoch: 13   Global Step: 54310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:38,845-Speed 3269.82 samples/sec   Loss 2.5477   LearningRate 0.0118   Epoch: 13   Global Step: 54320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:41,960-Speed 3287.12 samples/sec   Loss 2.5191   LearningRate 0.0118   Epoch: 13   Global Step: 54330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:45,044-Speed 3321.67 samples/sec   Loss 2.5093   LearningRate 0.0118   Epoch: 13   Global Step: 54340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:48,128-Speed 3320.38 samples/sec   Loss 2.5855   LearningRate 0.0118   Epoch: 13   Global Step: 54350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:02:51,212-Speed 3321.18 samples/sec   Loss 2.5046   LearningRate 0.0117   Epoch: 13   Global Step: 54360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:02:54,386-Speed 3226.98 samples/sec   Loss 2.5974   LearningRate 0.0117   Epoch: 13   Global Step: 54370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:02:57,500-Speed 3289.26 samples/sec   Loss 2.5280   LearningRate 0.0117   Epoch: 13   Global Step: 54380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:00,601-Speed 3302.70 samples/sec   Loss 2.5353   LearningRate 0.0117   Epoch: 13   Global Step: 54390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:03,680-Speed 3326.70 samples/sec   Loss 2.5541   LearningRate 0.0117   Epoch: 13   Global Step: 54400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:06,774-Speed 3311.12 samples/sec   Loss 2.5492   LearningRate 0.0117   Epoch: 13   Global Step: 54410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:09,849-Speed 3330.15 samples/sec   Loss 2.5588   LearningRate 0.0117   Epoch: 13   Global Step: 54420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:12,923-Speed 3332.55 samples/sec   Loss 2.5976   LearningRate 0.0117   Epoch: 13   Global Step: 54430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:16,006-Speed 3321.78 samples/sec   Loss 2.5564   LearningRate 0.0117   Epoch: 13   Global Step: 54440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:19,088-Speed 3322.96 samples/sec   Loss 2.5905   LearningRate 0.0117   Epoch: 13   Global Step: 54450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:22,191-Speed 3301.24 samples/sec   Loss 2.6297   LearningRate 0.0117   Epoch: 13   Global Step: 54460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:03:25,339-Speed 3253.44 samples/sec   Loss 2.6363   LearningRate 0.0117   Epoch: 13   Global Step: 54470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:03:28,581-Speed 3159.00 samples/sec   Loss 2.5968   LearningRate 0.0116   Epoch: 13   Global Step: 54480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:03:31,659-Speed 3327.32 samples/sec   Loss 2.6098   LearningRate 0.0116   Epoch: 13   Global Step: 54490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:03:34,726-Speed 3340.23 samples/sec   Loss 2.5445   LearningRate 0.0116   Epoch: 13   Global Step: 54500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:37,802-Speed 3329.48 samples/sec   Loss 2.6339   LearningRate 0.0116   Epoch: 13   Global Step: 54510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:40,922-Speed 3282.96 samples/sec   Loss 2.5777   LearningRate 0.0116   Epoch: 13   Global Step: 54520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:44,003-Speed 3323.77 samples/sec   Loss 2.6069   LearningRate 0.0116   Epoch: 13   Global Step: 54530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:47,093-Speed 3315.06 samples/sec   Loss 2.6339   LearningRate 0.0116   Epoch: 13   Global Step: 54540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:50,174-Speed 3323.87 samples/sec   Loss 2.6742   LearningRate 0.0116   Epoch: 13   Global Step: 54550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:53,262-Speed 3317.24 samples/sec   Loss 2.6125   LearningRate 0.0116   Epoch: 13   Global Step: 54560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:56,348-Speed 3319.19 samples/sec   Loss 2.6221   LearningRate 0.0116   Epoch: 13   Global Step: 54570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:03:59,431-Speed 3321.71 samples/sec   Loss 2.6097   LearningRate 0.0116   Epoch: 13   Global Step: 54580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:04:02,515-Speed 3321.16 samples/sec   Loss 2.5723   LearningRate 0.0116   Epoch: 13   Global Step: 54590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:04:05,599-Speed 3321.08 samples/sec   Loss 2.6436   LearningRate 0.0115   Epoch: 13   Global Step: 54600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:04:08,678-Speed 3326.77 samples/sec   Loss 2.6462   LearningRate 0.0115   Epoch: 13   Global Step: 54610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:04:11,760-Speed 3323.76 samples/sec   Loss 2.6727   LearningRate 0.0115   Epoch: 13   Global Step: 54620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:04:14,857-Speed 3306.84 samples/sec   Loss 2.5982   LearningRate 0.0115   Epoch: 13   Global Step: 54630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:04:17,947-Speed 3314.03 samples/sec   Loss 2.6291   LearningRate 0.0115   Epoch: 13   Global Step: 54640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:04:21,028-Speed 3325.37 samples/sec   Loss 2.5813   LearningRate 0.0115   Epoch: 13   Global Step: 54650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:04:24,106-Speed 3326.84 samples/sec   Loss 2.6476   LearningRate 0.0115   Epoch: 13   Global Step: 54660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:04:27,193-Speed 3318.03 samples/sec   Loss 2.6751   LearningRate 0.0115   Epoch: 13   Global Step: 54670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:04:30,273-Speed 3325.94 samples/sec   Loss 2.6005   LearningRate 0.0115   Epoch: 13   Global Step: 54680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:04:33,352-Speed 3325.94 samples/sec   Loss 2.7279   LearningRate 0.0115   Epoch: 13   Global Step: 54690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:04:36,418-Speed 3340.59 samples/sec   Loss 2.5985   LearningRate 0.0115   Epoch: 13   Global Step: 54700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:04:39,503-Speed 3320.14 samples/sec   Loss 2.6725   LearningRate 0.0115   Epoch: 13   Global Step: 54710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:04:42,604-Speed 3303.01 samples/sec   Loss 2.6390   LearningRate 0.0114   Epoch: 13   Global Step: 54720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:04:45,685-Speed 3323.80 samples/sec   Loss 2.6846   LearningRate 0.0114   Epoch: 13   Global Step: 54730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:04:48,777-Speed 3313.12 samples/sec   Loss 2.6783   LearningRate 0.0114   Epoch: 13   Global Step: 54740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:04:51,881-Speed 3300.08 samples/sec   Loss 2.6032   LearningRate 0.0114   Epoch: 13   Global Step: 54750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:04:54,980-Speed 3304.58 samples/sec   Loss 2.7609   LearningRate 0.0114   Epoch: 13   Global Step: 54760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:04:58,066-Speed 3318.81 samples/sec   Loss 2.7232   LearningRate 0.0114   Epoch: 13   Global Step: 54770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:05:01,170-Speed 3299.95 samples/sec   Loss 2.7035   LearningRate 0.0114   Epoch: 13   Global Step: 54780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:05:04,277-Speed 3295.98 samples/sec   Loss 2.7426   LearningRate 0.0114   Epoch: 13   Global Step: 54790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:05:07,369-Speed 3312.49 samples/sec   Loss 2.6560   LearningRate 0.0114   Epoch: 13   Global Step: 54800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:05:10,456-Speed 3318.53 samples/sec   Loss 2.6339   LearningRate 0.0114   Epoch: 13   Global Step: 54810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:05:13,540-Speed 3321.02 samples/sec   Loss 2.7310   LearningRate 0.0114   Epoch: 13   Global Step: 54820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:05:16,642-Speed 3301.16 samples/sec   Loss 2.6643   LearningRate 0.0114   Epoch: 13   Global Step: 54830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:05:19,749-Speed 3297.60 samples/sec   Loss 2.6894   LearningRate 0.0113   Epoch: 13   Global Step: 54840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:05:22,833-Speed 3321.07 samples/sec   Loss 2.7488   LearningRate 0.0113   Epoch: 13   Global Step: 54850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:05:25,917-Speed 3321.20 samples/sec   Loss 2.7466   LearningRate 0.0113   Epoch: 13   Global Step: 54860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:05:29,001-Speed 3320.86 samples/sec   Loss 2.6624   LearningRate 0.0113   Epoch: 13   Global Step: 54870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:05:32,084-Speed 3322.76 samples/sec   Loss 2.7802   LearningRate 0.0113   Epoch: 13   Global Step: 54880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:05:35,169-Speed 3319.49 samples/sec   Loss 2.7005   LearningRate 0.0113   Epoch: 13   Global Step: 54890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:05:38,244-Speed 3330.85 samples/sec   Loss 2.6685   LearningRate 0.0113   Epoch: 13   Global Step: 54900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:05:41,334-Speed 3314.75 samples/sec   Loss 2.7248   LearningRate 0.0113   Epoch: 13   Global Step: 54910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:05:44,399-Speed 3341.37 samples/sec   Loss 2.6959   LearningRate 0.0113   Epoch: 13   Global Step: 54920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:05:47,489-Speed 3315.60 samples/sec   Loss 2.6900   LearningRate 0.0113   Epoch: 13   Global Step: 54930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:05:50,652-Speed 3238.05 samples/sec   Loss 2.7382   LearningRate 0.0113   Epoch: 13   Global Step: 54940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:05:53,739-Speed 3317.33 samples/sec   Loss 2.7330   LearningRate 0.0113   Epoch: 13   Global Step: 54950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:05:56,823-Speed 3321.85 samples/sec   Loss 2.7521   LearningRate 0.0113   Epoch: 13   Global Step: 54960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:05:59,920-Speed 3306.55 samples/sec   Loss 2.7147   LearningRate 0.0112   Epoch: 13   Global Step: 54970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:06:03,007-Speed 3318.29 samples/sec   Loss 2.7914   LearningRate 0.0112   Epoch: 13   Global Step: 54980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:06:06,091-Speed 3320.07 samples/sec   Loss 2.7256   LearningRate 0.0112   Epoch: 13   Global Step: 54990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:06:09,178-Speed 3318.41 samples/sec   Loss 2.7777   LearningRate 0.0112   Epoch: 13   Global Step: 55000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:06:12,319-Speed 3260.58 samples/sec   Loss 2.7429   LearningRate 0.0112   Epoch: 13   Global Step: 55010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:06:15,437-Speed 3284.83 samples/sec   Loss 2.6285   LearningRate 0.0112   Epoch: 13   Global Step: 55020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:06:18,528-Speed 3314.30 samples/sec   Loss 2.7404   LearningRate 0.0112   Epoch: 13   Global Step: 55030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:06:21,611-Speed 3321.45 samples/sec   Loss 2.7084   LearningRate 0.0112   Epoch: 13   Global Step: 55040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:06:24,695-Speed 3321.81 samples/sec   Loss 2.7272   LearningRate 0.0112   Epoch: 13   Global Step: 55050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:06:27,780-Speed 3319.69 samples/sec   Loss 2.7004   LearningRate 0.0112   Epoch: 13   Global Step: 55060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:06:30,864-Speed 3320.98 samples/sec   Loss 2.7385   LearningRate 0.0112   Epoch: 13   Global Step: 55070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:06:33,969-Speed 3299.05 samples/sec   Loss 2.7455   LearningRate 0.0112   Epoch: 13   Global Step: 55080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:06:37,125-Speed 3245.18 samples/sec   Loss 2.7510   LearningRate 0.0111   Epoch: 13   Global Step: 55090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:06:40,218-Speed 3310.76 samples/sec   Loss 2.7903   LearningRate 0.0111   Epoch: 13   Global Step: 55100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:06:43,306-Speed 3317.50 samples/sec   Loss 2.7173   LearningRate 0.0111   Epoch: 13   Global Step: 55110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:06:46,348-Speed 3366.36 samples/sec   Loss 2.7489   LearningRate 0.0111   Epoch: 13   Global Step: 55120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:06:49,448-Speed 3304.02 samples/sec   Loss 2.8235   LearningRate 0.0111   Epoch: 13   Global Step: 55130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:06:52,587-Speed 3263.81 samples/sec   Loss 2.7750   LearningRate 0.0111   Epoch: 13   Global Step: 55140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:06:55,677-Speed 3314.58 samples/sec   Loss 2.7788   LearningRate 0.0111   Epoch: 13   Global Step: 55150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:06:58,771-Speed 3309.67 samples/sec   Loss 2.8425   LearningRate 0.0111   Epoch: 13   Global Step: 55160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:07:01,858-Speed 3318.59 samples/sec   Loss 2.7746   LearningRate 0.0111   Epoch: 13   Global Step: 55170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:07:04,963-Speed 3298.43 samples/sec   Loss 2.7898   LearningRate 0.0111   Epoch: 13   Global Step: 55180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:07:08,052-Speed 3315.40 samples/sec   Loss 2.6864   LearningRate 0.0111   Epoch: 13   Global Step: 55190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:07:11,133-Speed 3324.03 samples/sec   Loss 2.7968   LearningRate 0.0111   Epoch: 13   Global Step: 55200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:07:14,237-Speed 3299.68 samples/sec   Loss 2.7857   LearningRate 0.0110   Epoch: 13   Global Step: 55210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:07:17,321-Speed 3321.31 samples/sec   Loss 2.8182   LearningRate 0.0110   Epoch: 13   Global Step: 55220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:07:20,404-Speed 3321.73 samples/sec   Loss 2.8100   LearningRate 0.0110   Epoch: 13   Global Step: 55230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:07:23,505-Speed 3303.74 samples/sec   Loss 2.7610   LearningRate 0.0110   Epoch: 13   Global Step: 55240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:07:26,605-Speed 3303.56 samples/sec   Loss 2.6741   LearningRate 0.0110   Epoch: 13   Global Step: 55250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:07:29,697-Speed 3312.49 samples/sec   Loss 2.7919   LearningRate 0.0110   Epoch: 13   Global Step: 55260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:07:32,780-Speed 3322.34 samples/sec   Loss 2.7677   LearningRate 0.0110   Epoch: 13   Global Step: 55270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:07:35,871-Speed 3313.33 samples/sec   Loss 2.8186   LearningRate 0.0110   Epoch: 13   Global Step: 55280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:07:38,952-Speed 3324.02 samples/sec   Loss 2.7173   LearningRate 0.0110   Epoch: 13   Global Step: 55290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:07:42,038-Speed 3319.87 samples/sec   Loss 2.8365   LearningRate 0.0110   Epoch: 13   Global Step: 55300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:07:45,168-Speed 3271.39 samples/sec   Loss 2.7710   LearningRate 0.0110   Epoch: 13   Global Step: 55310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:07:48,257-Speed 3316.29 samples/sec   Loss 2.7334   LearningRate 0.0110   Epoch: 13   Global Step: 55320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:07:51,442-Speed 3216.25 samples/sec   Loss 2.7734   LearningRate 0.0110   Epoch: 13   Global Step: 55330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:07:54,526-Speed 3320.56 samples/sec   Loss 2.7966   LearningRate 0.0109   Epoch: 13   Global Step: 55340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:07:57,607-Speed 3324.84 samples/sec   Loss 2.7705   LearningRate 0.0109   Epoch: 13   Global Step: 55350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:08:00,696-Speed 3315.22 samples/sec   Loss 2.8213   LearningRate 0.0109   Epoch: 13   Global Step: 55360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:08:03,776-Speed 3324.96 samples/sec   Loss 2.8219   LearningRate 0.0109   Epoch: 13   Global Step: 55370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:08:06,860-Speed 3321.90 samples/sec   Loss 2.7934   LearningRate 0.0109   Epoch: 13   Global Step: 55380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:08:09,993-Speed 3269.04 samples/sec   Loss 2.8067   LearningRate 0.0109   Epoch: 13   Global Step: 55390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:08:13,085-Speed 3312.61 samples/sec   Loss 2.7698   LearningRate 0.0109   Epoch: 13   Global Step: 55400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:08:16,181-Speed 3307.73 samples/sec   Loss 2.8183   LearningRate 0.0109   Epoch: 13   Global Step: 55410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:08:19,265-Speed 3321.58 samples/sec   Loss 2.7749   LearningRate 0.0109   Epoch: 13   Global Step: 55420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:08:22,355-Speed 3314.13 samples/sec   Loss 2.8681   LearningRate 0.0109   Epoch: 13   Global Step: 55430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:08:25,445-Speed 3315.60 samples/sec   Loss 2.8346   LearningRate 0.0109   Epoch: 13   Global Step: 55440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:08:28,531-Speed 3317.92 samples/sec   Loss 2.8355   LearningRate 0.0109   Epoch: 13   Global Step: 55450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:08:31,618-Speed 3318.35 samples/sec   Loss 2.8252   LearningRate 0.0108   Epoch: 13   Global Step: 55460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:08:34,702-Speed 3321.30 samples/sec   Loss 2.8814   LearningRate 0.0108   Epoch: 13   Global Step: 55470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:08:37,785-Speed 3321.37 samples/sec   Loss 2.7583   LearningRate 0.0108   Epoch: 13   Global Step: 55480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:08:40,893-Speed 3295.46 samples/sec   Loss 2.8248   LearningRate 0.0108   Epoch: 13   Global Step: 55490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:08:43,980-Speed 3318.07 samples/sec   Loss 2.8074   LearningRate 0.0108   Epoch: 13   Global Step: 55500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:08:47,074-Speed 3310.40 samples/sec   Loss 2.8238   LearningRate 0.0108   Epoch: 13   Global Step: 55510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:08:50,143-Speed 3337.39 samples/sec   Loss 2.7383   LearningRate 0.0108   Epoch: 13   Global Step: 55520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:08:53,235-Speed 3312.79 samples/sec   Loss 2.8791   LearningRate 0.0108   Epoch: 13   Global Step: 55530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:08:56,317-Speed 3323.42 samples/sec   Loss 2.8798   LearningRate 0.0108   Epoch: 13   Global Step: 55540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:08:59,405-Speed 3316.42 samples/sec   Loss 2.8535   LearningRate 0.0108   Epoch: 13   Global Step: 55550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:09:02,499-Speed 3310.95 samples/sec   Loss 2.8277   LearningRate 0.0108   Epoch: 13   Global Step: 55560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:09:05,582-Speed 3321.09 samples/sec   Loss 2.9383   LearningRate 0.0108   Epoch: 13   Global Step: 55570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:09:08,669-Speed 3318.50 samples/sec   Loss 2.8101   LearningRate 0.0108   Epoch: 13   Global Step: 55580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:09:11,752-Speed 3322.12 samples/sec   Loss 2.7966   LearningRate 0.0107   Epoch: 13   Global Step: 55590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:09:14,838-Speed 3318.69 samples/sec   Loss 2.8231   LearningRate 0.0107   Epoch: 13   Global Step: 55600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:09:17,926-Speed 3317.54 samples/sec   Loss 2.8300   LearningRate 0.0107   Epoch: 13   Global Step: 55610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:09:20,992-Speed 3339.94 samples/sec   Loss 2.9671   LearningRate 0.0107   Epoch: 13   Global Step: 55620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:09:24,079-Speed 3318.38 samples/sec   Loss 2.7564   LearningRate 0.0107   Epoch: 13   Global Step: 55630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:09:27,262-Speed 3218.12 samples/sec   Loss 2.8848   LearningRate 0.0107   Epoch: 13   Global Step: 55640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:09:30,353-Speed 3313.39 samples/sec   Loss 2.7532   LearningRate 0.0107   Epoch: 13   Global Step: 55650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:09:33,487-Speed 3267.84 samples/sec   Loss 2.8361   LearningRate 0.0107   Epoch: 13   Global Step: 55660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:09:36,577-Speed 3314.15 samples/sec   Loss 2.9309   LearningRate 0.0107   Epoch: 13   Global Step: 55670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:09:39,667-Speed 3314.90 samples/sec   Loss 2.8553   LearningRate 0.0107   Epoch: 13   Global Step: 55680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:09:42,748-Speed 3324.45 samples/sec   Loss 2.9082   LearningRate 0.0107   Epoch: 13   Global Step: 55690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:09:45,854-Speed 3297.75 samples/sec   Loss 2.8168   LearningRate 0.0107   Epoch: 13   Global Step: 55700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:09:48,958-Speed 3300.21 samples/sec   Loss 2.8233   LearningRate 0.0107   Epoch: 13   Global Step: 55710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:09:52,056-Speed 3305.66 samples/sec   Loss 2.7824   LearningRate 0.0106   Epoch: 13   Global Step: 55720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:09:55,148-Speed 3312.68 samples/sec   Loss 2.9067   LearningRate 0.0106   Epoch: 13   Global Step: 55730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:09:58,231-Speed 3321.67 samples/sec   Loss 2.8798   LearningRate 0.0106   Epoch: 13   Global Step: 55740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:10:01,315-Speed 3320.90 samples/sec   Loss 2.8610   LearningRate 0.0106   Epoch: 13   Global Step: 55750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:10:04,392-Speed 3329.23 samples/sec   Loss 2.8937   LearningRate 0.0106   Epoch: 13   Global Step: 55760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:10:07,477-Speed 3319.53 samples/sec   Loss 2.8943   LearningRate 0.0106   Epoch: 13   Global Step: 55770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:10:10,583-Speed 3297.68 samples/sec   Loss 2.7867   LearningRate 0.0106   Epoch: 13   Global Step: 55780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:10:13,709-Speed 3276.76 samples/sec   Loss 2.8186   LearningRate 0.0106   Epoch: 13   Global Step: 55790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:10:16,798-Speed 3315.43 samples/sec   Loss 2.9095   LearningRate 0.0106   Epoch: 13   Global Step: 55800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:10:19,883-Speed 3320.31 samples/sec   Loss 2.8496   LearningRate 0.0106   Epoch: 13   Global Step: 55810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:10:22,968-Speed 3319.87 samples/sec   Loss 2.8723   LearningRate 0.0106   Epoch: 13   Global Step: 55820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:10:26,053-Speed 3320.22 samples/sec   Loss 2.8876   LearningRate 0.0106   Epoch: 13   Global Step: 55830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:10:29,152-Speed 3305.63 samples/sec   Loss 2.8444   LearningRate 0.0105   Epoch: 13   Global Step: 55840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:10:32,235-Speed 3321.95 samples/sec   Loss 2.8128   LearningRate 0.0105   Epoch: 13   Global Step: 55850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:10:35,331-Speed 3307.46 samples/sec   Loss 2.8895   LearningRate 0.0105   Epoch: 13   Global Step: 55860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:10:38,417-Speed 3319.48 samples/sec   Loss 2.8574   LearningRate 0.0105   Epoch: 13   Global Step: 55870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:10:41,523-Speed 3297.39 samples/sec   Loss 2.8590   LearningRate 0.0105   Epoch: 13   Global Step: 55880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:10:44,612-Speed 3315.69 samples/sec   Loss 2.9407   LearningRate 0.0105   Epoch: 13   Global Step: 55890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:10:47,712-Speed 3303.65 samples/sec   Loss 2.8704   LearningRate 0.0105   Epoch: 13   Global Step: 55900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:10:50,807-Speed 3309.82 samples/sec   Loss 2.8847   LearningRate 0.0105   Epoch: 13   Global Step: 55910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:10:53,924-Speed 3285.76 samples/sec   Loss 2.9915   LearningRate 0.0105   Epoch: 13   Global Step: 55920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:10:57,018-Speed 3310.55 samples/sec   Loss 2.9445   LearningRate 0.0105   Epoch: 13   Global Step: 55930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:11:00,130-Speed 3291.42 samples/sec   Loss 2.8536   LearningRate 0.0105   Epoch: 13   Global Step: 55940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:11:03,216-Speed 3319.12 samples/sec   Loss 2.8382   LearningRate 0.0105   Epoch: 13   Global Step: 55950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:11:06,279-Speed 3343.76 samples/sec   Loss 2.8424   LearningRate 0.0105   Epoch: 13   Global Step: 55960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:11:09,365-Speed 3318.36 samples/sec   Loss 2.8838   LearningRate 0.0104   Epoch: 13   Global Step: 55970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:11:12,448-Speed 3322.19 samples/sec   Loss 2.9196   LearningRate 0.0104   Epoch: 13   Global Step: 55980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:11:15,539-Speed 3313.72 samples/sec   Loss 2.8609   LearningRate 0.0104   Epoch: 13   Global Step: 55990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:11:18,621-Speed 3323.39 samples/sec   Loss 2.9584   LearningRate 0.0104   Epoch: 13   Global Step: 56000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:12:02,409-[lfw][56000]XNorm: 21.052049
Training: 2022-04-26 18:12:02,409-[lfw][56000]Accuracy-Flip: 0.99833+-0.00224
Training: 2022-04-26 18:12:02,410-[lfw][56000]Accuracy-Highest: 0.99833
Training: 2022-04-26 18:12:53,377-[cfp_fp][56000]XNorm: 21.061235
Training: 2022-04-26 18:12:53,378-[cfp_fp][56000]Accuracy-Flip: 0.98943+-0.00550
Training: 2022-04-26 18:12:53,378-[cfp_fp][56000]Accuracy-Highest: 0.99029
Training: 2022-04-26 18:13:37,187-[agedb_30][56000]XNorm: 21.310468
Training: 2022-04-26 18:13:37,187-[agedb_30][56000]Accuracy-Flip: 0.97650+-0.00709
Training: 2022-04-26 18:13:37,188-[agedb_30][56000]Accuracy-Highest: 0.97650
Training: 2022-04-26 18:13:40,275-Speed 72.29 samples/sec   Loss 2.9475   LearningRate 0.0104   Epoch: 13   Global Step: 56010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:13:43,357-Speed 3322.72 samples/sec   Loss 2.9618   LearningRate 0.0104   Epoch: 13   Global Step: 56020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:13:46,440-Speed 3323.03 samples/sec   Loss 2.8910   LearningRate 0.0104   Epoch: 13   Global Step: 56030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:13:49,525-Speed 3319.43 samples/sec   Loss 2.9352   LearningRate 0.0104   Epoch: 13   Global Step: 56040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:13:52,610-Speed 3319.71 samples/sec   Loss 2.9152   LearningRate 0.0104   Epoch: 13   Global Step: 56050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:13:55,703-Speed 3312.06 samples/sec   Loss 2.9261   LearningRate 0.0104   Epoch: 13   Global Step: 56060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-26 18:13:58,786-Speed 3321.70 samples/sec   Loss 2.8941   LearningRate 0.0104   Epoch: 13   Global Step: 56070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:01,881-Speed 3308.86 samples/sec   Loss 2.9372   LearningRate 0.0104   Epoch: 13   Global Step: 56080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:05,006-Speed 3277.80 samples/sec   Loss 2.8726   LearningRate 0.0104   Epoch: 13   Global Step: 56090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:08,113-Speed 3296.70 samples/sec   Loss 2.8768   LearningRate 0.0103   Epoch: 13   Global Step: 56100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:11,211-Speed 3306.05 samples/sec   Loss 2.9374   LearningRate 0.0103   Epoch: 13   Global Step: 56110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:14,301-Speed 3314.96 samples/sec   Loss 2.9225   LearningRate 0.0103   Epoch: 13   Global Step: 56120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:17,394-Speed 3311.70 samples/sec   Loss 2.8968   LearningRate 0.0103   Epoch: 13   Global Step: 56130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:20,482-Speed 3316.32 samples/sec   Loss 2.8753   LearningRate 0.0103   Epoch: 13   Global Step: 56140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:23,579-Speed 3307.48 samples/sec   Loss 2.8702   LearningRate 0.0103   Epoch: 13   Global Step: 56150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:26,671-Speed 3311.88 samples/sec   Loss 2.9385   LearningRate 0.0103   Epoch: 13   Global Step: 56160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:29,738-Speed 3339.30 samples/sec   Loss 2.8806   LearningRate 0.0103   Epoch: 13   Global Step: 56170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:32,848-Speed 3293.35 samples/sec   Loss 2.8932   LearningRate 0.0103   Epoch: 13   Global Step: 56180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:35,971-Speed 3280.07 samples/sec   Loss 2.9310   LearningRate 0.0103   Epoch: 13   Global Step: 56190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:39,059-Speed 3316.85 samples/sec   Loss 2.8567   LearningRate 0.0103   Epoch: 13   Global Step: 56200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:42,153-Speed 3309.34 samples/sec   Loss 2.9017   LearningRate 0.0103   Epoch: 13   Global Step: 56210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:45,235-Speed 3323.99 samples/sec   Loss 2.8762   LearningRate 0.0103   Epoch: 13   Global Step: 56220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:48,316-Speed 3324.35 samples/sec   Loss 2.9724   LearningRate 0.0102   Epoch: 13   Global Step: 56230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:51,404-Speed 3317.50 samples/sec   Loss 2.9286   LearningRate 0.0102   Epoch: 13   Global Step: 56240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:14:54,473-Speed 3336.65 samples/sec   Loss 2.9216   LearningRate 0.0102   Epoch: 13   Global Step: 56250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:14:57,570-Speed 3306.75 samples/sec   Loss 2.9326   LearningRate 0.0102   Epoch: 13   Global Step: 56260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:15:00,705-Speed 3267.14 samples/sec   Loss 2.8954   LearningRate 0.0102   Epoch: 13   Global Step: 56270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:15:03,815-Speed 3293.70 samples/sec   Loss 2.8789   LearningRate 0.0102   Epoch: 13   Global Step: 56280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:15:06,947-Speed 3270.21 samples/sec   Loss 2.8942   LearningRate 0.0102   Epoch: 13   Global Step: 56290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:15:10,036-Speed 3315.07 samples/sec   Loss 2.9239   LearningRate 0.0102   Epoch: 13   Global Step: 56300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:15:13,138-Speed 3302.32 samples/sec   Loss 2.9126   LearningRate 0.0102   Epoch: 13   Global Step: 56310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:15:16,215-Speed 3328.96 samples/sec   Loss 2.9716   LearningRate 0.0102   Epoch: 13   Global Step: 56320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:15:19,294-Speed 3326.92 samples/sec   Loss 2.9437   LearningRate 0.0102   Epoch: 13   Global Step: 56330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:15:22,391-Speed 3307.31 samples/sec   Loss 2.8045   LearningRate 0.0102   Epoch: 13   Global Step: 56340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:15:25,489-Speed 3305.08 samples/sec   Loss 2.9777   LearningRate 0.0102   Epoch: 13   Global Step: 56350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:15:28,573-Speed 3321.34 samples/sec   Loss 2.9473   LearningRate 0.0101   Epoch: 13   Global Step: 56360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:15:31,654-Speed 3324.73 samples/sec   Loss 2.9146   LearningRate 0.0101   Epoch: 13   Global Step: 56370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:15:34,733-Speed 3326.34 samples/sec   Loss 2.9604   LearningRate 0.0101   Epoch: 13   Global Step: 56380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:15:37,811-Speed 3327.15 samples/sec   Loss 2.8006   LearningRate 0.0101   Epoch: 13   Global Step: 56390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:15:40,887-Speed 3329.57 samples/sec   Loss 2.9751   LearningRate 0.0101   Epoch: 13   Global Step: 56400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:15:43,972-Speed 3320.29 samples/sec   Loss 2.9275   LearningRate 0.0101   Epoch: 13   Global Step: 56410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:15:47,128-Speed 3245.56 samples/sec   Loss 2.9412   LearningRate 0.0101   Epoch: 13   Global Step: 56420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:15:50,365-Speed 3164.43 samples/sec   Loss 2.8909   LearningRate 0.0101   Epoch: 13   Global Step: 56430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:15:53,459-Speed 3310.49 samples/sec   Loss 3.0465   LearningRate 0.0101   Epoch: 13   Global Step: 56440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:15:56,520-Speed 3345.52 samples/sec   Loss 2.9337   LearningRate 0.0101   Epoch: 13   Global Step: 56450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:15:59,599-Speed 3326.57 samples/sec   Loss 2.9137   LearningRate 0.0101   Epoch: 13   Global Step: 56460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:16:02,674-Speed 3331.11 samples/sec   Loss 2.9101   LearningRate 0.0101   Epoch: 13   Global Step: 56470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:16:05,755-Speed 3323.94 samples/sec   Loss 2.9036   LearningRate 0.0101   Epoch: 13   Global Step: 56480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:16:08,848-Speed 3310.91 samples/sec   Loss 2.9544   LearningRate 0.0100   Epoch: 13   Global Step: 56490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:16:11,937-Speed 3315.87 samples/sec   Loss 2.9346   LearningRate 0.0100   Epoch: 13   Global Step: 56500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:16:15,026-Speed 3316.84 samples/sec   Loss 2.9665   LearningRate 0.0100   Epoch: 13   Global Step: 56510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:16:18,112-Speed 3318.88 samples/sec   Loss 2.9612   LearningRate 0.0100   Epoch: 13   Global Step: 56520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:16:21,195-Speed 3322.50 samples/sec   Loss 2.9919   LearningRate 0.0100   Epoch: 13   Global Step: 56530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:16:24,278-Speed 3322.09 samples/sec   Loss 2.9216   LearningRate 0.0100   Epoch: 13   Global Step: 56540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:16:27,364-Speed 3318.86 samples/sec   Loss 3.0338   LearningRate 0.0100   Epoch: 13   Global Step: 56550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:16:30,444-Speed 3325.01 samples/sec   Loss 2.8911   LearningRate 0.0100   Epoch: 13   Global Step: 56560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:16:33,527-Speed 3322.28 samples/sec   Loss 2.8437   LearningRate 0.0100   Epoch: 13   Global Step: 56570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:16:36,654-Speed 3274.62 samples/sec   Loss 2.9526   LearningRate 0.0100   Epoch: 13   Global Step: 56580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:16:39,826-Speed 3229.33 samples/sec   Loss 2.9194   LearningRate 0.0100   Epoch: 13   Global Step: 56590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:16:42,918-Speed 3312.63 samples/sec   Loss 2.8627   LearningRate 0.0100   Epoch: 13   Global Step: 56600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:16:45,996-Speed 3327.52 samples/sec   Loss 2.9776   LearningRate 0.0100   Epoch: 13   Global Step: 56610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:16:49,080-Speed 3321.26 samples/sec   Loss 3.0070   LearningRate 0.0099   Epoch: 13   Global Step: 56620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:16:52,202-Speed 3281.07 samples/sec   Loss 3.0042   LearningRate 0.0099   Epoch: 13   Global Step: 56630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:16:55,334-Speed 3270.42 samples/sec   Loss 2.9414   LearningRate 0.0099   Epoch: 13   Global Step: 56640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:16:58,417-Speed 3321.44 samples/sec   Loss 2.9347   LearningRate 0.0099   Epoch: 13   Global Step: 56650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:17:01,528-Speed 3292.36 samples/sec   Loss 2.8619   LearningRate 0.0099   Epoch: 13   Global Step: 56660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:17:04,628-Speed 3304.25 samples/sec   Loss 2.9227   LearningRate 0.0099   Epoch: 13   Global Step: 56670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:17:07,713-Speed 3319.40 samples/sec   Loss 2.9396   LearningRate 0.0099   Epoch: 13   Global Step: 56680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:17:10,830-Speed 3286.54 samples/sec   Loss 2.9063   LearningRate 0.0099   Epoch: 13   Global Step: 56690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:17:13,909-Speed 3326.42 samples/sec   Loss 2.9804   LearningRate 0.0099   Epoch: 13   Global Step: 56700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:17:17,006-Speed 3306.87 samples/sec   Loss 2.8848   LearningRate 0.0099   Epoch: 13   Global Step: 56710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:17:20,090-Speed 3321.25 samples/sec   Loss 2.9300   LearningRate 0.0099   Epoch: 13   Global Step: 56720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:17:23,171-Speed 3324.48 samples/sec   Loss 2.9546   LearningRate 0.0099   Epoch: 13   Global Step: 56730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:17:26,429-Speed 3143.24 samples/sec   Loss 2.9357   LearningRate 0.0099   Epoch: 13   Global Step: 56740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:17:29,541-Speed 3292.11 samples/sec   Loss 2.8850   LearningRate 0.0098   Epoch: 13   Global Step: 56750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:17:32,622-Speed 3324.31 samples/sec   Loss 2.9565   LearningRate 0.0098   Epoch: 13   Global Step: 56760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:17:35,725-Speed 3301.03 samples/sec   Loss 3.0393   LearningRate 0.0098   Epoch: 13   Global Step: 56770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:17:38,832-Speed 3296.36 samples/sec   Loss 2.9437   LearningRate 0.0098   Epoch: 13   Global Step: 56780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:17:41,914-Speed 3323.24 samples/sec   Loss 2.9316   LearningRate 0.0098   Epoch: 13   Global Step: 56790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:17:44,979-Speed 3342.73 samples/sec   Loss 2.9836   LearningRate 0.0098   Epoch: 13   Global Step: 56800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:17:48,061-Speed 3322.27 samples/sec   Loss 2.9605   LearningRate 0.0098   Epoch: 13   Global Step: 56810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:17:51,151-Speed 3329.87 samples/sec   Loss 2.9783   LearningRate 0.0098   Epoch: 13   Global Step: 56820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:17:54,233-Speed 3323.55 samples/sec   Loss 2.8963   LearningRate 0.0098   Epoch: 13   Global Step: 56830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:17:57,312-Speed 3326.96 samples/sec   Loss 2.9544   LearningRate 0.0098   Epoch: 13   Global Step: 56840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:00,388-Speed 3329.70 samples/sec   Loss 2.9106   LearningRate 0.0098   Epoch: 13   Global Step: 56850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:03,467-Speed 3325.90 samples/sec   Loss 3.0200   LearningRate 0.0098   Epoch: 13   Global Step: 56860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:06,548-Speed 3324.37 samples/sec   Loss 2.9056   LearningRate 0.0098   Epoch: 13   Global Step: 56870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:09,627-Speed 3326.45 samples/sec   Loss 2.9251   LearningRate 0.0097   Epoch: 13   Global Step: 56880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:12,720-Speed 3311.29 samples/sec   Loss 2.9507   LearningRate 0.0097   Epoch: 13   Global Step: 56890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:15,791-Speed 3334.84 samples/sec   Loss 3.0092   LearningRate 0.0097   Epoch: 13   Global Step: 56900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:18,890-Speed 3305.76 samples/sec   Loss 2.9589   LearningRate 0.0097   Epoch: 13   Global Step: 56910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:21,967-Speed 3328.59 samples/sec   Loss 2.9532   LearningRate 0.0097   Epoch: 13   Global Step: 56920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:25,048-Speed 3324.32 samples/sec   Loss 2.9252   LearningRate 0.0097   Epoch: 13   Global Step: 56930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:28,136-Speed 3317.35 samples/sec   Loss 2.8383   LearningRate 0.0097   Epoch: 13   Global Step: 56940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:31,221-Speed 3320.08 samples/sec   Loss 2.9594   LearningRate 0.0097   Epoch: 13   Global Step: 56950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:34,310-Speed 3314.88 samples/sec   Loss 2.9464   LearningRate 0.0097   Epoch: 13   Global Step: 56960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:37,443-Speed 3269.18 samples/sec   Loss 2.9297   LearningRate 0.0097   Epoch: 13   Global Step: 56970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:40,628-Speed 3215.37 samples/sec   Loss 2.9422   LearningRate 0.0097   Epoch: 13   Global Step: 56980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:43,712-Speed 3322.14 samples/sec   Loss 2.9459   LearningRate 0.0097   Epoch: 13   Global Step: 56990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:46,807-Speed 3308.69 samples/sec   Loss 2.9328   LearningRate 0.0097   Epoch: 13   Global Step: 57000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:18:49,926-Speed 3284.56 samples/sec   Loss 2.9045   LearningRate 0.0096   Epoch: 13   Global Step: 57010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:53,095-Speed 3232.23 samples/sec   Loss 2.9286   LearningRate 0.0096   Epoch: 13   Global Step: 57020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:56,200-Speed 3298.50 samples/sec   Loss 2.9337   LearningRate 0.0096   Epoch: 13   Global Step: 57030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:18:59,289-Speed 3315.59 samples/sec   Loss 2.8758   LearningRate 0.0096   Epoch: 13   Global Step: 57040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:02,376-Speed 3317.34 samples/sec   Loss 2.9894   LearningRate 0.0096   Epoch: 13   Global Step: 57050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:05,469-Speed 3311.94 samples/sec   Loss 3.0058   LearningRate 0.0096   Epoch: 13   Global Step: 57060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:08,565-Speed 3307.88 samples/sec   Loss 2.9910   LearningRate 0.0096   Epoch: 13   Global Step: 57070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:11,665-Speed 3303.75 samples/sec   Loss 2.9968   LearningRate 0.0096   Epoch: 13   Global Step: 57080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:14,763-Speed 3306.77 samples/sec   Loss 2.8998   LearningRate 0.0096   Epoch: 13   Global Step: 57090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:17,850-Speed 3317.39 samples/sec   Loss 2.8834   LearningRate 0.0096   Epoch: 13   Global Step: 57100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:20,970-Speed 3282.85 samples/sec   Loss 3.0016   LearningRate 0.0096   Epoch: 13   Global Step: 57110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:19:24,056-Speed 3318.82 samples/sec   Loss 2.9515   LearningRate 0.0096   Epoch: 13   Global Step: 57120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:19:27,124-Speed 3338.61 samples/sec   Loss 2.9573   LearningRate 0.0096   Epoch: 13   Global Step: 57130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:30,209-Speed 3319.94 samples/sec   Loss 2.9410   LearningRate 0.0096   Epoch: 13   Global Step: 57140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:33,307-Speed 3306.40 samples/sec   Loss 2.9760   LearningRate 0.0095   Epoch: 13   Global Step: 57150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:36,390-Speed 3321.81 samples/sec   Loss 3.0182   LearningRate 0.0095   Epoch: 13   Global Step: 57160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:39,478-Speed 3316.35 samples/sec   Loss 2.9640   LearningRate 0.0095   Epoch: 13   Global Step: 57170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:42,569-Speed 3314.90 samples/sec   Loss 2.9964   LearningRate 0.0095   Epoch: 13   Global Step: 57180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:45,652-Speed 3321.61 samples/sec   Loss 3.0096   LearningRate 0.0095   Epoch: 13   Global Step: 57190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:48,730-Speed 3327.60 samples/sec   Loss 2.9480   LearningRate 0.0095   Epoch: 13   Global Step: 57200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:51,821-Speed 3314.01 samples/sec   Loss 2.9304   LearningRate 0.0095   Epoch: 13   Global Step: 57210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:54,906-Speed 3319.71 samples/sec   Loss 2.9031   LearningRate 0.0095   Epoch: 13   Global Step: 57220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:19:57,993-Speed 3317.21 samples/sec   Loss 2.9396   LearningRate 0.0095   Epoch: 13   Global Step: 57230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:20:01,081-Speed 3317.58 samples/sec   Loss 2.8995   LearningRate 0.0095   Epoch: 13   Global Step: 57240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:20:04,167-Speed 3318.54 samples/sec   Loss 3.0219   LearningRate 0.0095   Epoch: 13   Global Step: 57250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:20:07,254-Speed 3317.41 samples/sec   Loss 2.9373   LearningRate 0.0095   Epoch: 13   Global Step: 57260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:20:10,362-Speed 3296.06 samples/sec   Loss 2.9581   LearningRate 0.0095   Epoch: 13   Global Step: 57270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:20:13,481-Speed 3284.11 samples/sec   Loss 2.8915   LearningRate 0.0094   Epoch: 13   Global Step: 57280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:20:16,576-Speed 3309.00 samples/sec   Loss 2.9541   LearningRate 0.0094   Epoch: 13   Global Step: 57290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:20:19,660-Speed 3321.19 samples/sec   Loss 2.9600   LearningRate 0.0094   Epoch: 13   Global Step: 57300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:20:22,741-Speed 3323.92 samples/sec   Loss 2.9810   LearningRate 0.0094   Epoch: 13   Global Step: 57310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:20:25,832-Speed 3314.47 samples/sec   Loss 2.9758   LearningRate 0.0094   Epoch: 13   Global Step: 57320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:20:28,910-Speed 3326.63 samples/sec   Loss 2.9079   LearningRate 0.0094   Epoch: 13   Global Step: 57330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:20:31,977-Speed 3340.41 samples/sec   Loss 2.9715   LearningRate 0.0094   Epoch: 13   Global Step: 57340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:20:35,075-Speed 3305.37 samples/sec   Loss 3.0422   LearningRate 0.0094   Epoch: 13   Global Step: 57350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:20:38,206-Speed 3272.04 samples/sec   Loss 3.0082   LearningRate 0.0094   Epoch: 13   Global Step: 57360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:20:41,304-Speed 3305.54 samples/sec   Loss 2.9065   LearningRate 0.0094   Epoch: 13   Global Step: 57370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:20:44,387-Speed 3322.33 samples/sec   Loss 2.9868   LearningRate 0.0094   Epoch: 13   Global Step: 57380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:20:47,475-Speed 3317.00 samples/sec   Loss 2.9801   LearningRate 0.0094   Epoch: 13   Global Step: 57390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:20:50,610-Speed 3267.14 samples/sec   Loss 2.9377   LearningRate 0.0094   Epoch: 13   Global Step: 57400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:20:53,727-Speed 3285.27 samples/sec   Loss 2.9679   LearningRate 0.0094   Epoch: 13   Global Step: 57410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:20:56,813-Speed 3318.94 samples/sec   Loss 3.0311   LearningRate 0.0093   Epoch: 13   Global Step: 57420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:20:59,913-Speed 3304.38 samples/sec   Loss 2.9240   LearningRate 0.0093   Epoch: 13   Global Step: 57430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:21:03,009-Speed 3308.34 samples/sec   Loss 2.8589   LearningRate 0.0093   Epoch: 13   Global Step: 57440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:21:06,112-Speed 3300.79 samples/sec   Loss 2.9871   LearningRate 0.0093   Epoch: 13   Global Step: 57450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:21:09,193-Speed 3323.86 samples/sec   Loss 3.0047   LearningRate 0.0093   Epoch: 13   Global Step: 57460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:21:12,280-Speed 3317.66 samples/sec   Loss 2.9430   LearningRate 0.0093   Epoch: 13   Global Step: 57470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:21:15,350-Speed 3336.86 samples/sec   Loss 2.9957   LearningRate 0.0093   Epoch: 13   Global Step: 57480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:21:18,435-Speed 3320.16 samples/sec   Loss 2.9924   LearningRate 0.0093   Epoch: 13   Global Step: 57490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:21:21,515-Speed 3324.73 samples/sec   Loss 2.9598   LearningRate 0.0093   Epoch: 13   Global Step: 57500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:21:24,650-Speed 3266.99 samples/sec   Loss 3.0177   LearningRate 0.0093   Epoch: 13   Global Step: 57510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:21:27,806-Speed 3245.33 samples/sec   Loss 2.8831   LearningRate 0.0093   Epoch: 13   Global Step: 57520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:21:30,903-Speed 3307.43 samples/sec   Loss 2.9564   LearningRate 0.0093   Epoch: 13   Global Step: 57530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:21:33,978-Speed 3330.66 samples/sec   Loss 2.9272   LearningRate 0.0093   Epoch: 13   Global Step: 57540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:21:37,075-Speed 3307.47 samples/sec   Loss 3.0140   LearningRate 0.0092   Epoch: 13   Global Step: 57550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:21:40,160-Speed 3320.23 samples/sec   Loss 3.0278   LearningRate 0.0092   Epoch: 13   Global Step: 57560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:21:43,241-Speed 3324.25 samples/sec   Loss 3.0784   LearningRate 0.0092   Epoch: 13   Global Step: 57570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:21:46,324-Speed 3321.51 samples/sec   Loss 2.9561   LearningRate 0.0092   Epoch: 13   Global Step: 57580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:21:49,416-Speed 3313.23 samples/sec   Loss 2.9522   LearningRate 0.0092   Epoch: 13   Global Step: 57590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:21:52,543-Speed 3275.06 samples/sec   Loss 2.9170   LearningRate 0.0092   Epoch: 13   Global Step: 57600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:21:55,636-Speed 3312.29 samples/sec   Loss 2.9926   LearningRate 0.0092   Epoch: 13   Global Step: 57610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:21:58,729-Speed 3310.45 samples/sec   Loss 2.9369   LearningRate 0.0092   Epoch: 13   Global Step: 57620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:22:01,831-Speed 3302.72 samples/sec   Loss 3.0124   LearningRate 0.0092   Epoch: 13   Global Step: 57630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:22:04,918-Speed 3318.33 samples/sec   Loss 2.9289   LearningRate 0.0092   Epoch: 13   Global Step: 57640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:22:08,002-Speed 3320.96 samples/sec   Loss 2.9803   LearningRate 0.0092   Epoch: 13   Global Step: 57650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:22:11,103-Speed 3303.18 samples/sec   Loss 2.9167   LearningRate 0.0092   Epoch: 13   Global Step: 57660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:22:14,198-Speed 3310.49 samples/sec   Loss 2.9542   LearningRate 0.0092   Epoch: 13   Global Step: 57670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:22:17,307-Speed 3293.84 samples/sec   Loss 2.9573   LearningRate 0.0092   Epoch: 13   Global Step: 57680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:22:20,398-Speed 3313.93 samples/sec   Loss 3.0395   LearningRate 0.0091   Epoch: 13   Global Step: 57690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:22:23,482-Speed 3320.81 samples/sec   Loss 2.9663   LearningRate 0.0091   Epoch: 13   Global Step: 57700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:22:26,570-Speed 3316.81 samples/sec   Loss 2.9401   LearningRate 0.0091   Epoch: 13   Global Step: 57710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:22:29,655-Speed 3320.45 samples/sec   Loss 2.9565   LearningRate 0.0091   Epoch: 13   Global Step: 57720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:22:32,749-Speed 3310.81 samples/sec   Loss 2.9207   LearningRate 0.0091   Epoch: 13   Global Step: 57730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:22:35,835-Speed 3318.34 samples/sec   Loss 2.9862   LearningRate 0.0091   Epoch: 13   Global Step: 57740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:22:38,925-Speed 3314.53 samples/sec   Loss 3.0016   LearningRate 0.0091   Epoch: 13   Global Step: 57750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:22:42,138-Speed 3188.16 samples/sec   Loss 3.0264   LearningRate 0.0091   Epoch: 13   Global Step: 57760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:22:45,238-Speed 3303.01 samples/sec   Loss 3.0186   LearningRate 0.0091   Epoch: 13   Global Step: 57770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:22:48,305-Speed 3341.04 samples/sec   Loss 2.9168   LearningRate 0.0091   Epoch: 13   Global Step: 57780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:22:51,437-Speed 3269.46 samples/sec   Loss 3.0265   LearningRate 0.0091   Epoch: 13   Global Step: 57790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:22:54,529-Speed 3313.00 samples/sec   Loss 3.0293   LearningRate 0.0091   Epoch: 13   Global Step: 57800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:22:57,663-Speed 3268.40 samples/sec   Loss 2.8573   LearningRate 0.0091   Epoch: 13   Global Step: 57810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:23:00,766-Speed 3300.16 samples/sec   Loss 2.9687   LearningRate 0.0091   Epoch: 13   Global Step: 57820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:23:03,875-Speed 3294.69 samples/sec   Loss 3.0010   LearningRate 0.0090   Epoch: 13   Global Step: 57830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:23:06,965-Speed 3314.81 samples/sec   Loss 2.9616   LearningRate 0.0090   Epoch: 13   Global Step: 57840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:23:10,052-Speed 3317.55 samples/sec   Loss 2.9063   LearningRate 0.0090   Epoch: 13   Global Step: 57850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:23:13,147-Speed 3309.13 samples/sec   Loss 3.0167   LearningRate 0.0090   Epoch: 13   Global Step: 57860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:23:16,235-Speed 3317.19 samples/sec   Loss 3.0249   LearningRate 0.0090   Epoch: 13   Global Step: 57870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:23:19,382-Speed 3255.08 samples/sec   Loss 2.9403   LearningRate 0.0090   Epoch: 13   Global Step: 57880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:23:22,482-Speed 3303.83 samples/sec   Loss 2.9992   LearningRate 0.0090   Epoch: 13   Global Step: 57890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:23:35,058-Speed 814.32 samples/sec   Loss 1.8995   LearningRate 0.0090   Epoch: 14   Global Step: 57900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:23:38,230-Speed 3229.48 samples/sec   Loss 1.8768   LearningRate 0.0090   Epoch: 14   Global Step: 57910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:23:41,318-Speed 3316.60 samples/sec   Loss 1.8100   LearningRate 0.0090   Epoch: 14   Global Step: 57920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:23:44,415-Speed 3307.27 samples/sec   Loss 1.8415   LearningRate 0.0090   Epoch: 14   Global Step: 57930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:23:47,513-Speed 3305.99 samples/sec   Loss 1.9012   LearningRate 0.0090   Epoch: 14   Global Step: 57940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:23:50,612-Speed 3305.34 samples/sec   Loss 1.8076   LearningRate 0.0090   Epoch: 14   Global Step: 57950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:23:53,704-Speed 3312.47 samples/sec   Loss 1.8760   LearningRate 0.0089   Epoch: 14   Global Step: 57960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:23:56,794-Speed 3314.90 samples/sec   Loss 1.8335   LearningRate 0.0089   Epoch: 14   Global Step: 57970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:23:59,890-Speed 3307.90 samples/sec   Loss 1.8835   LearningRate 0.0089   Epoch: 14   Global Step: 57980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:24:02,981-Speed 3313.75 samples/sec   Loss 1.8705   LearningRate 0.0089   Epoch: 14   Global Step: 57990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:24:06,087-Speed 3298.09 samples/sec   Loss 1.8894   LearningRate 0.0089   Epoch: 14   Global Step: 58000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:24:49,654-[lfw][58000]XNorm: 21.034857
Training: 2022-04-26 18:24:49,655-[lfw][58000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-04-26 18:24:49,655-[lfw][58000]Accuracy-Highest: 0.99833
Training: 2022-04-26 18:25:40,381-[cfp_fp][58000]XNorm: 21.101754
Training: 2022-04-26 18:25:40,381-[cfp_fp][58000]Accuracy-Flip: 0.99114+-0.00473
Training: 2022-04-26 18:25:40,382-[cfp_fp][58000]Accuracy-Highest: 0.99114
Training: 2022-04-26 18:26:23,856-[agedb_30][58000]XNorm: 21.877596
Training: 2022-04-26 18:26:23,857-[agedb_30][58000]Accuracy-Flip: 0.97583+-0.00539
Training: 2022-04-26 18:26:23,857-[agedb_30][58000]Accuracy-Highest: 0.97650
Training: 2022-04-26 18:26:26,939-Speed 72.70 samples/sec   Loss 1.8326   LearningRate 0.0089   Epoch: 14   Global Step: 58010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:26:30,009-Speed 3335.92 samples/sec   Loss 1.9250   LearningRate 0.0089   Epoch: 14   Global Step: 58020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:26:33,087-Speed 3329.17 samples/sec   Loss 1.8647   LearningRate 0.0089   Epoch: 14   Global Step: 58030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:26:36,152-Speed 3341.20 samples/sec   Loss 1.9491   LearningRate 0.0089   Epoch: 14   Global Step: 58040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:26:39,311-Speed 3241.70 samples/sec   Loss 1.8691   LearningRate 0.0089   Epoch: 14   Global Step: 58050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:26:42,409-Speed 3306.30 samples/sec   Loss 1.9011   LearningRate 0.0089   Epoch: 14   Global Step: 58060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:26:45,491-Speed 3323.18 samples/sec   Loss 1.8939   LearningRate 0.0089   Epoch: 14   Global Step: 58070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:26:48,573-Speed 3323.61 samples/sec   Loss 1.8439   LearningRate 0.0089   Epoch: 14   Global Step: 58080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:26:51,656-Speed 3322.30 samples/sec   Loss 1.9464   LearningRate 0.0089   Epoch: 14   Global Step: 58090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:26:54,738-Speed 3323.26 samples/sec   Loss 1.8829   LearningRate 0.0088   Epoch: 14   Global Step: 58100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:26:57,823-Speed 3320.02 samples/sec   Loss 2.0117   LearningRate 0.0088   Epoch: 14   Global Step: 58110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:27:00,910-Speed 3317.30 samples/sec   Loss 1.9123   LearningRate 0.0088   Epoch: 14   Global Step: 58120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:27:04,001-Speed 3313.91 samples/sec   Loss 1.9322   LearningRate 0.0088   Epoch: 14   Global Step: 58130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:27:07,088-Speed 3317.46 samples/sec   Loss 1.9149   LearningRate 0.0088   Epoch: 14   Global Step: 58140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:10,181-Speed 3311.51 samples/sec   Loss 1.8586   LearningRate 0.0088   Epoch: 14   Global Step: 58150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:13,264-Speed 3322.75 samples/sec   Loss 1.9264   LearningRate 0.0088   Epoch: 14   Global Step: 58160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:16,347-Speed 3321.78 samples/sec   Loss 1.9700   LearningRate 0.0088   Epoch: 14   Global Step: 58170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:19,430-Speed 3322.54 samples/sec   Loss 1.9457   LearningRate 0.0088   Epoch: 14   Global Step: 58180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:22,511-Speed 3324.41 samples/sec   Loss 1.8719   LearningRate 0.0088   Epoch: 14   Global Step: 58190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:25,620-Speed 3294.23 samples/sec   Loss 1.8849   LearningRate 0.0088   Epoch: 14   Global Step: 58200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:28,709-Speed 3315.96 samples/sec   Loss 1.9275   LearningRate 0.0088   Epoch: 14   Global Step: 58210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:31,792-Speed 3322.11 samples/sec   Loss 1.9522   LearningRate 0.0088   Epoch: 14   Global Step: 58220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:34,875-Speed 3322.11 samples/sec   Loss 1.9558   LearningRate 0.0088   Epoch: 14   Global Step: 58230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:37,951-Speed 3329.39 samples/sec   Loss 2.0504   LearningRate 0.0087   Epoch: 14   Global Step: 58240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:41,032-Speed 3324.37 samples/sec   Loss 1.9230   LearningRate 0.0087   Epoch: 14   Global Step: 58250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:44,147-Speed 3288.32 samples/sec   Loss 1.9388   LearningRate 0.0087   Epoch: 14   Global Step: 58260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:47,233-Speed 3318.82 samples/sec   Loss 1.9656   LearningRate 0.0087   Epoch: 14   Global Step: 58270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:50,312-Speed 3326.44 samples/sec   Loss 2.0227   LearningRate 0.0087   Epoch: 14   Global Step: 58280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:53,387-Speed 3331.13 samples/sec   Loss 1.9503   LearningRate 0.0087   Epoch: 14   Global Step: 58290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:56,463-Speed 3329.62 samples/sec   Loss 2.0049   LearningRate 0.0087   Epoch: 14   Global Step: 58300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:27:59,542-Speed 3326.25 samples/sec   Loss 1.9463   LearningRate 0.0087   Epoch: 14   Global Step: 58310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:28:02,648-Speed 3298.03 samples/sec   Loss 2.0047   LearningRate 0.0087   Epoch: 14   Global Step: 58320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:28:05,725-Speed 3328.44 samples/sec   Loss 1.9495   LearningRate 0.0087   Epoch: 14   Global Step: 58330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:28:08,785-Speed 3347.42 samples/sec   Loss 1.9728   LearningRate 0.0087   Epoch: 14   Global Step: 58340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:28:11,861-Speed 3330.10 samples/sec   Loss 1.9668   LearningRate 0.0087   Epoch: 14   Global Step: 58350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:28:14,951-Speed 3314.57 samples/sec   Loss 1.9872   LearningRate 0.0087   Epoch: 14   Global Step: 58360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:28:18,030-Speed 3326.47 samples/sec   Loss 1.9411   LearningRate 0.0087   Epoch: 14   Global Step: 58370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:28:21,088-Speed 3349.52 samples/sec   Loss 1.9248   LearningRate 0.0086   Epoch: 14   Global Step: 58380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:28:24,161-Speed 3333.00 samples/sec   Loss 2.0336   LearningRate 0.0086   Epoch: 14   Global Step: 58390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:28:27,235-Speed 3331.51 samples/sec   Loss 2.0585   LearningRate 0.0086   Epoch: 14   Global Step: 58400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:28:30,328-Speed 3311.54 samples/sec   Loss 1.9940   LearningRate 0.0086   Epoch: 14   Global Step: 58410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:28:33,406-Speed 3327.56 samples/sec   Loss 1.9973   LearningRate 0.0086   Epoch: 14   Global Step: 58420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:28:36,508-Speed 3301.63 samples/sec   Loss 2.0129   LearningRate 0.0086   Epoch: 14   Global Step: 58430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:28:39,698-Speed 3211.07 samples/sec   Loss 2.0043   LearningRate 0.0086   Epoch: 14   Global Step: 58440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:28:42,776-Speed 3327.14 samples/sec   Loss 2.0275   LearningRate 0.0086   Epoch: 14   Global Step: 58450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:28:45,856-Speed 3325.58 samples/sec   Loss 2.0231   LearningRate 0.0086   Epoch: 14   Global Step: 58460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:28:48,934-Speed 3327.36 samples/sec   Loss 2.0286   LearningRate 0.0086   Epoch: 14   Global Step: 58470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:28:52,010-Speed 3330.06 samples/sec   Loss 1.9528   LearningRate 0.0086   Epoch: 14   Global Step: 58480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:28:55,085-Speed 3331.22 samples/sec   Loss 2.0405   LearningRate 0.0086   Epoch: 14   Global Step: 58490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:28:58,169-Speed 3320.32 samples/sec   Loss 2.0425   LearningRate 0.0086   Epoch: 14   Global Step: 58500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:29:01,246-Speed 3328.43 samples/sec   Loss 2.1144   LearningRate 0.0086   Epoch: 14   Global Step: 58510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:04,326-Speed 3326.08 samples/sec   Loss 1.9844   LearningRate 0.0085   Epoch: 14   Global Step: 58520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:07,409-Speed 3322.01 samples/sec   Loss 1.9955   LearningRate 0.0085   Epoch: 14   Global Step: 58530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:10,485-Speed 3329.72 samples/sec   Loss 2.0092   LearningRate 0.0085   Epoch: 14   Global Step: 58540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:13,561-Speed 3329.39 samples/sec   Loss 2.0322   LearningRate 0.0085   Epoch: 14   Global Step: 58550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:16,642-Speed 3324.17 samples/sec   Loss 2.0835   LearningRate 0.0085   Epoch: 14   Global Step: 58560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:19,719-Speed 3328.98 samples/sec   Loss 2.0176   LearningRate 0.0085   Epoch: 14   Global Step: 58570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:22,804-Speed 3319.82 samples/sec   Loss 2.0431   LearningRate 0.0085   Epoch: 14   Global Step: 58580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:25,905-Speed 3303.18 samples/sec   Loss 2.0465   LearningRate 0.0085   Epoch: 14   Global Step: 58590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:28,994-Speed 3316.33 samples/sec   Loss 2.0304   LearningRate 0.0085   Epoch: 14   Global Step: 58600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:32,074-Speed 3325.09 samples/sec   Loss 1.9878   LearningRate 0.0085   Epoch: 14   Global Step: 58610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:29:35,151-Speed 3327.87 samples/sec   Loss 2.0169   LearningRate 0.0085   Epoch: 14   Global Step: 58620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:29:38,217-Speed 3341.54 samples/sec   Loss 2.0572   LearningRate 0.0085   Epoch: 14   Global Step: 58630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:41,299-Speed 3323.29 samples/sec   Loss 2.0389   LearningRate 0.0085   Epoch: 14   Global Step: 58640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:44,382-Speed 3321.98 samples/sec   Loss 2.1021   LearningRate 0.0085   Epoch: 14   Global Step: 58650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:47,509-Speed 3275.38 samples/sec   Loss 2.0858   LearningRate 0.0085   Epoch: 14   Global Step: 58660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:50,600-Speed 3313.92 samples/sec   Loss 2.1107   LearningRate 0.0084   Epoch: 14   Global Step: 58670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:53,681-Speed 3324.70 samples/sec   Loss 2.0896   LearningRate 0.0084   Epoch: 14   Global Step: 58680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:56,758-Speed 3327.72 samples/sec   Loss 2.0650   LearningRate 0.0084   Epoch: 14   Global Step: 58690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:29:59,837-Speed 3326.39 samples/sec   Loss 2.0735   LearningRate 0.0084   Epoch: 14   Global Step: 58700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:30:02,930-Speed 3311.87 samples/sec   Loss 2.0639   LearningRate 0.0084   Epoch: 14   Global Step: 58710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:30:06,006-Speed 3329.43 samples/sec   Loss 2.0801   LearningRate 0.0084   Epoch: 14   Global Step: 58720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:30:09,087-Speed 3324.45 samples/sec   Loss 2.0897   LearningRate 0.0084   Epoch: 14   Global Step: 58730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:30:12,164-Speed 3328.59 samples/sec   Loss 2.0382   LearningRate 0.0084   Epoch: 14   Global Step: 58740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:30:15,251-Speed 3318.31 samples/sec   Loss 2.0481   LearningRate 0.0084   Epoch: 14   Global Step: 58750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:30:18,333-Speed 3323.26 samples/sec   Loss 2.1141   LearningRate 0.0084   Epoch: 14   Global Step: 58760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:30:21,408-Speed 3330.91 samples/sec   Loss 2.0442   LearningRate 0.0084   Epoch: 14   Global Step: 58770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:30:24,500-Speed 3312.15 samples/sec   Loss 2.0642   LearningRate 0.0084   Epoch: 14   Global Step: 58780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:30:27,586-Speed 3319.58 samples/sec   Loss 2.0652   LearningRate 0.0084   Epoch: 14   Global Step: 58790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:30:30,669-Speed 3322.09 samples/sec   Loss 2.0728   LearningRate 0.0084   Epoch: 14   Global Step: 58800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:30:33,752-Speed 3321.50 samples/sec   Loss 2.0373   LearningRate 0.0083   Epoch: 14   Global Step: 58810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:30:36,823-Speed 3334.99 samples/sec   Loss 2.1206   LearningRate 0.0083   Epoch: 14   Global Step: 58820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:30:39,967-Speed 3258.21 samples/sec   Loss 2.0926   LearningRate 0.0083   Epoch: 14   Global Step: 58830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:30:43,031-Speed 3342.73 samples/sec   Loss 2.1544   LearningRate 0.0083   Epoch: 14   Global Step: 58840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:30:46,108-Speed 3329.08 samples/sec   Loss 2.1816   LearningRate 0.0083   Epoch: 14   Global Step: 58850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:30:49,190-Speed 3322.72 samples/sec   Loss 2.0927   LearningRate 0.0083   Epoch: 14   Global Step: 58860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:30:52,277-Speed 3318.24 samples/sec   Loss 2.1438   LearningRate 0.0083   Epoch: 14   Global Step: 58870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:30:55,363-Speed 3319.01 samples/sec   Loss 2.0894   LearningRate 0.0083   Epoch: 14   Global Step: 58880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:30:58,461-Speed 3305.75 samples/sec   Loss 2.0714   LearningRate 0.0083   Epoch: 14   Global Step: 58890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:31:01,558-Speed 3307.23 samples/sec   Loss 2.0777   LearningRate 0.0083   Epoch: 14   Global Step: 58900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:31:04,684-Speed 3276.23 samples/sec   Loss 2.0707   LearningRate 0.0083   Epoch: 14   Global Step: 58910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:31:07,761-Speed 3329.59 samples/sec   Loss 2.1332   LearningRate 0.0083   Epoch: 14   Global Step: 58920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:31:10,861-Speed 3303.86 samples/sec   Loss 2.1183   LearningRate 0.0083   Epoch: 14   Global Step: 58930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-26 18:31:13,941-Speed 3324.78 samples/sec   Loss 2.1449   LearningRate 0.0083   Epoch: 14   Global Step: 58940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:31:17,019-Speed 3327.75 samples/sec   Loss 2.1090   LearningRate 0.0082   Epoch: 14   Global Step: 58950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:31:20,097-Speed 3327.48 samples/sec   Loss 2.1402   LearningRate 0.0082   Epoch: 14   Global Step: 58960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:31:23,247-Speed 3252.12 samples/sec   Loss 2.1331   LearningRate 0.0082   Epoch: 14   Global Step: 58970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:31:26,346-Speed 3304.98 samples/sec   Loss 2.2369   LearningRate 0.0082   Epoch: 14   Global Step: 58980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:31:29,451-Speed 3298.05 samples/sec   Loss 2.1750   LearningRate 0.0082   Epoch: 14   Global Step: 58990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:31:32,548-Speed 3307.29 samples/sec   Loss 2.1136   LearningRate 0.0082   Epoch: 14   Global Step: 59000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:31:35,635-Speed 3318.49 samples/sec   Loss 2.1309   LearningRate 0.0082   Epoch: 14   Global Step: 59010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:31:38,720-Speed 3319.39 samples/sec   Loss 2.1539   LearningRate 0.0082   Epoch: 14   Global Step: 59020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:31:41,815-Speed 3309.66 samples/sec   Loss 2.1114   LearningRate 0.0082   Epoch: 14   Global Step: 59030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-26 18:31:44,898-Speed 3322.53 samples/sec   Loss 2.1728   LearningRate 0.0082   Epoch: 14   Global Step: 59040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:31:47,983-Speed 3319.64 samples/sec   Loss 2.1432   LearningRate 0.0082   Epoch: 14   Global Step: 59050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:31:51,063-Speed 3325.08 samples/sec   Loss 2.1404   LearningRate 0.0082   Epoch: 14   Global Step: 59060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:31:54,145-Speed 3323.10 samples/sec   Loss 2.1660   LearningRate 0.0082   Epoch: 14   Global Step: 59070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:31:57,227-Speed 3323.28 samples/sec   Loss 2.1536   LearningRate 0.0082   Epoch: 14   Global Step: 59080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:32:00,325-Speed 3305.82 samples/sec   Loss 2.1595   LearningRate 0.0082   Epoch: 14   Global Step: 59090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:32:03,411-Speed 3319.15 samples/sec   Loss 2.1938   LearningRate 0.0081   Epoch: 14   Global Step: 59100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:32:06,496-Speed 3320.15 samples/sec   Loss 2.1554   LearningRate 0.0081   Epoch: 14   Global Step: 59110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:32:09,584-Speed 3317.02 samples/sec   Loss 2.1811   LearningRate 0.0081   Epoch: 14   Global Step: 59120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:32:12,677-Speed 3311.46 samples/sec   Loss 2.2280   LearningRate 0.0081   Epoch: 14   Global Step: 59130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:32:15,743-Speed 3340.17 samples/sec   Loss 2.1380   LearningRate 0.0081   Epoch: 14   Global Step: 59140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:32:18,822-Speed 3326.56 samples/sec   Loss 2.1318   LearningRate 0.0081   Epoch: 14   Global Step: 59150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:32:21,901-Speed 3326.35 samples/sec   Loss 2.2062   LearningRate 0.0081   Epoch: 14   Global Step: 59160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:32:24,985-Speed 3321.69 samples/sec   Loss 2.1821   LearningRate 0.0081   Epoch: 14   Global Step: 59170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:32:28,072-Speed 3317.64 samples/sec   Loss 2.1137   LearningRate 0.0081   Epoch: 14   Global Step: 59180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-26 18:32:31,138-Speed 3340.92 samples/sec   Loss 2.1503   LearningRate 0.0081   Epoch: 14   Global Step: 59190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:32:34,224-Speed 3318.84 samples/sec   Loss 2.1921   LearningRate 0.0081   Epoch: 14   Global Step: 59200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:32:37,310-Speed 3319.85 samples/sec   Loss 2.1507   LearningRate 0.0081   Epoch: 14   Global Step: 59210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:32:40,398-Speed 3315.93 samples/sec   Loss 2.1881   LearningRate 0.0081   Epoch: 14   Global Step: 59220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:32:43,483-Speed 3320.82 samples/sec   Loss 2.1406   LearningRate 0.0081   Epoch: 14   Global Step: 59230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:32:46,565-Speed 3322.64 samples/sec   Loss 2.2100   LearningRate 0.0080   Epoch: 14   Global Step: 59240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:32:49,650-Speed 3319.85 samples/sec   Loss 2.0973   LearningRate 0.0080   Epoch: 14   Global Step: 59250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:32:52,733-Speed 3322.39 samples/sec   Loss 2.1830   LearningRate 0.0080   Epoch: 14   Global Step: 59260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:32:55,827-Speed 3311.07 samples/sec   Loss 2.1866   LearningRate 0.0080   Epoch: 14   Global Step: 59270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:32:58,911-Speed 3320.52 samples/sec   Loss 2.1892   LearningRate 0.0080   Epoch: 14   Global Step: 59280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:33:02,009-Speed 3306.00 samples/sec   Loss 2.1669   LearningRate 0.0080   Epoch: 14   Global Step: 59290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:33:05,093-Speed 3322.01 samples/sec   Loss 2.1708   LearningRate 0.0080   Epoch: 14   Global Step: 59300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:33:08,187-Speed 3310.31 samples/sec   Loss 2.1248   LearningRate 0.0080   Epoch: 14   Global Step: 59310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:33:11,286-Speed 3305.15 samples/sec   Loss 2.1498   LearningRate 0.0080   Epoch: 14   Global Step: 59320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:33:14,368-Speed 3322.67 samples/sec   Loss 2.1917   LearningRate 0.0080   Epoch: 14   Global Step: 59330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:33:17,472-Speed 3299.75 samples/sec   Loss 2.1581   LearningRate 0.0080   Epoch: 14   Global Step: 59340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:33:20,556-Speed 3321.13 samples/sec   Loss 2.1864   LearningRate 0.0080   Epoch: 14   Global Step: 59350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:33:23,645-Speed 3315.32 samples/sec   Loss 2.2126   LearningRate 0.0080   Epoch: 14   Global Step: 59360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:33:26,748-Speed 3301.18 samples/sec   Loss 2.2259   LearningRate 0.0080   Epoch: 14   Global Step: 59370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:33:29,850-Speed 3302.25 samples/sec   Loss 2.2769   LearningRate 0.0080   Epoch: 14   Global Step: 59380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:33:32,934-Speed 3320.80 samples/sec   Loss 2.1515   LearningRate 0.0079   Epoch: 14   Global Step: 59390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:33:36,014-Speed 3325.10 samples/sec   Loss 2.1898   LearningRate 0.0079   Epoch: 14   Global Step: 59400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:33:39,100-Speed 3319.48 samples/sec   Loss 2.2099   LearningRate 0.0079   Epoch: 14   Global Step: 59410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:33:42,181-Speed 3323.79 samples/sec   Loss 2.1894   LearningRate 0.0079   Epoch: 14   Global Step: 59420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:33:45,263-Speed 3323.53 samples/sec   Loss 2.2594   LearningRate 0.0079   Epoch: 14   Global Step: 59430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:33:48,362-Speed 3305.15 samples/sec   Loss 2.2700   LearningRate 0.0079   Epoch: 14   Global Step: 59440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:33:51,469-Speed 3296.46 samples/sec   Loss 2.2006   LearningRate 0.0079   Epoch: 14   Global Step: 59450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:33:54,583-Speed 3288.77 samples/sec   Loss 2.2573   LearningRate 0.0079   Epoch: 14   Global Step: 59460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:33:57,666-Speed 3323.09 samples/sec   Loss 2.1841   LearningRate 0.0079   Epoch: 14   Global Step: 59470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:34:00,749-Speed 3321.97 samples/sec   Loss 2.1922   LearningRate 0.0079   Epoch: 14   Global Step: 59480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:34:03,839-Speed 3314.74 samples/sec   Loss 2.2717   LearningRate 0.0079   Epoch: 14   Global Step: 59490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:34:06,920-Speed 3323.95 samples/sec   Loss 2.2187   LearningRate 0.0079   Epoch: 14   Global Step: 59500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:34:10,010-Speed 3315.07 samples/sec   Loss 2.1758   LearningRate 0.0079   Epoch: 14   Global Step: 59510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:34:13,162-Speed 3249.22 samples/sec   Loss 2.1287   LearningRate 0.0079   Epoch: 14   Global Step: 59520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:34:16,272-Speed 3293.56 samples/sec   Loss 2.1704   LearningRate 0.0078   Epoch: 14   Global Step: 59530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:34:19,357-Speed 3320.11 samples/sec   Loss 2.2471   LearningRate 0.0078   Epoch: 14   Global Step: 59540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:34:22,440-Speed 3321.58 samples/sec   Loss 2.2188   LearningRate 0.0078   Epoch: 14   Global Step: 59550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:34:25,525-Speed 3319.62 samples/sec   Loss 2.1572   LearningRate 0.0078   Epoch: 14   Global Step: 59560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:34:28,611-Speed 3319.81 samples/sec   Loss 2.2637   LearningRate 0.0078   Epoch: 14   Global Step: 59570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:34:31,709-Speed 3305.68 samples/sec   Loss 2.2828   LearningRate 0.0078   Epoch: 14   Global Step: 59580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:34:34,791-Speed 3323.13 samples/sec   Loss 2.1823   LearningRate 0.0078   Epoch: 14   Global Step: 59590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:34:37,881-Speed 3315.67 samples/sec   Loss 2.2196   LearningRate 0.0078   Epoch: 14   Global Step: 59600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:34:40,968-Speed 3317.88 samples/sec   Loss 2.2227   LearningRate 0.0078   Epoch: 14   Global Step: 59610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:34:44,062-Speed 3309.75 samples/sec   Loss 2.1652   LearningRate 0.0078   Epoch: 14   Global Step: 59620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:34:47,162-Speed 3304.54 samples/sec   Loss 2.2656   LearningRate 0.0078   Epoch: 14   Global Step: 59630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:34:50,229-Speed 3338.71 samples/sec   Loss 2.2673   LearningRate 0.0078   Epoch: 14   Global Step: 59640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:34:53,309-Speed 3325.48 samples/sec   Loss 2.2525   LearningRate 0.0078   Epoch: 14   Global Step: 59650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:34:56,404-Speed 3308.69 samples/sec   Loss 2.1635   LearningRate 0.0078   Epoch: 14   Global Step: 59660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:34:59,500-Speed 3309.28 samples/sec   Loss 2.2293   LearningRate 0.0078   Epoch: 14   Global Step: 59670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:35:02,610-Speed 3293.09 samples/sec   Loss 2.2212   LearningRate 0.0077   Epoch: 14   Global Step: 59680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:35:05,712-Speed 3301.39 samples/sec   Loss 2.2820   LearningRate 0.0077   Epoch: 14   Global Step: 59690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:35:08,799-Speed 3318.73 samples/sec   Loss 2.2309   LearningRate 0.0077   Epoch: 14   Global Step: 59700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:35:11,886-Speed 3318.02 samples/sec   Loss 2.2413   LearningRate 0.0077   Epoch: 14   Global Step: 59710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:35:14,993-Speed 3296.08 samples/sec   Loss 2.2069   LearningRate 0.0077   Epoch: 14   Global Step: 59720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:35:18,088-Speed 3309.61 samples/sec   Loss 2.2289   LearningRate 0.0077   Epoch: 14   Global Step: 59730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:35:21,177-Speed 3315.14 samples/sec   Loss 2.2341   LearningRate 0.0077   Epoch: 14   Global Step: 59740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:35:24,280-Speed 3301.13 samples/sec   Loss 2.2679   LearningRate 0.0077   Epoch: 14   Global Step: 59750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:35:27,366-Speed 3318.30 samples/sec   Loss 2.3128   LearningRate 0.0077   Epoch: 14   Global Step: 59760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:35:30,464-Speed 3306.17 samples/sec   Loss 2.2220   LearningRate 0.0077   Epoch: 14   Global Step: 59770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:35:33,549-Speed 3320.61 samples/sec   Loss 2.2471   LearningRate 0.0077   Epoch: 14   Global Step: 59780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:35:36,634-Speed 3319.93 samples/sec   Loss 2.2006   LearningRate 0.0077   Epoch: 14   Global Step: 59790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:35:39,723-Speed 3315.72 samples/sec   Loss 2.2858   LearningRate 0.0077   Epoch: 14   Global Step: 59800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:35:42,807-Speed 3320.57 samples/sec   Loss 2.1839   LearningRate 0.0077   Epoch: 14   Global Step: 59810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:35:45,897-Speed 3314.51 samples/sec   Loss 2.2576   LearningRate 0.0077   Epoch: 14   Global Step: 59820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:35:49,005-Speed 3295.81 samples/sec   Loss 2.2344   LearningRate 0.0076   Epoch: 14   Global Step: 59830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:35:52,094-Speed 3315.95 samples/sec   Loss 2.2501   LearningRate 0.0076   Epoch: 14   Global Step: 59840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:35:55,214-Speed 3282.47 samples/sec   Loss 2.2875   LearningRate 0.0076   Epoch: 14   Global Step: 59850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:35:58,304-Speed 3314.64 samples/sec   Loss 2.2578   LearningRate 0.0076   Epoch: 14   Global Step: 59860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:36:01,398-Speed 3310.23 samples/sec   Loss 2.2445   LearningRate 0.0076   Epoch: 14   Global Step: 59870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:36:04,488-Speed 3315.21 samples/sec   Loss 2.3157   LearningRate 0.0076   Epoch: 14   Global Step: 59880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:36:07,582-Speed 3310.94 samples/sec   Loss 2.2592   LearningRate 0.0076   Epoch: 14   Global Step: 59890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:36:10,673-Speed 3313.45 samples/sec   Loss 2.2273   LearningRate 0.0076   Epoch: 14   Global Step: 59900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:36:13,760-Speed 3317.04 samples/sec   Loss 2.2219   LearningRate 0.0076   Epoch: 14   Global Step: 59910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:36:16,850-Speed 3315.69 samples/sec   Loss 2.2990   LearningRate 0.0076   Epoch: 14   Global Step: 59920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:36:19,943-Speed 3311.33 samples/sec   Loss 2.2511   LearningRate 0.0076   Epoch: 14   Global Step: 59930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:36:23,040-Speed 3306.76 samples/sec   Loss 2.2915   LearningRate 0.0076   Epoch: 14   Global Step: 59940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:36:26,142-Speed 3301.53 samples/sec   Loss 2.3039   LearningRate 0.0076   Epoch: 14   Global Step: 59950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:36:29,245-Speed 3300.59 samples/sec   Loss 2.3016   LearningRate 0.0076   Epoch: 14   Global Step: 59960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:36:32,337-Speed 3312.58 samples/sec   Loss 2.3078   LearningRate 0.0076   Epoch: 14   Global Step: 59970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:36:35,424-Speed 3318.09 samples/sec   Loss 2.2825   LearningRate 0.0075   Epoch: 14   Global Step: 59980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:36:38,513-Speed 3315.72 samples/sec   Loss 2.2360   LearningRate 0.0075   Epoch: 14   Global Step: 59990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:36:41,638-Speed 3277.29 samples/sec   Loss 2.2696   LearningRate 0.0075   Epoch: 14   Global Step: 60000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:37:25,089-[lfw][60000]XNorm: 21.434355
Training: 2022-04-26 18:37:25,090-[lfw][60000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-26 18:37:25,090-[lfw][60000]Accuracy-Highest: 0.99833
Training: 2022-04-26 18:38:15,466-[cfp_fp][60000]XNorm: 21.303049
Training: 2022-04-26 18:38:15,466-[cfp_fp][60000]Accuracy-Flip: 0.99129+-0.00440
Training: 2022-04-26 18:38:15,467-[cfp_fp][60000]Accuracy-Highest: 0.99129
Training: 2022-04-26 18:38:58,788-[agedb_30][60000]XNorm: 21.759450
Training: 2022-04-26 18:38:58,788-[agedb_30][60000]Accuracy-Flip: 0.97583+-0.00634
Training: 2022-04-26 18:38:58,789-[agedb_30][60000]Accuracy-Highest: 0.97650
Training: 2022-04-26 18:39:01,864-Speed 73.03 samples/sec   Loss 2.2705   LearningRate 0.0075   Epoch: 14   Global Step: 60010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:39:04,924-Speed 3346.77 samples/sec   Loss 2.3092   LearningRate 0.0075   Epoch: 14   Global Step: 60020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:39:08,003-Speed 3326.56 samples/sec   Loss 2.3246   LearningRate 0.0075   Epoch: 14   Global Step: 60030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:39:11,077-Speed 3332.00 samples/sec   Loss 2.3100   LearningRate 0.0075   Epoch: 14   Global Step: 60040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:39:14,153-Speed 3330.11 samples/sec   Loss 2.2609   LearningRate 0.0075   Epoch: 14   Global Step: 60050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:39:17,233-Speed 3325.54 samples/sec   Loss 2.2840   LearningRate 0.0075   Epoch: 14   Global Step: 60060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:39:20,312-Speed 3326.27 samples/sec   Loss 2.2780   LearningRate 0.0075   Epoch: 14   Global Step: 60070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:39:23,416-Speed 3299.41 samples/sec   Loss 2.2427   LearningRate 0.0075   Epoch: 14   Global Step: 60080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:39:26,495-Speed 3326.41 samples/sec   Loss 2.2638   LearningRate 0.0075   Epoch: 14   Global Step: 60090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:39:29,582-Speed 3317.85 samples/sec   Loss 2.2701   LearningRate 0.0075   Epoch: 14   Global Step: 60100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:39:32,673-Speed 3314.03 samples/sec   Loss 2.2544   LearningRate 0.0075   Epoch: 14   Global Step: 60110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:39:35,761-Speed 3316.91 samples/sec   Loss 2.2607   LearningRate 0.0075   Epoch: 14   Global Step: 60120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:39:38,844-Speed 3321.70 samples/sec   Loss 2.2399   LearningRate 0.0074   Epoch: 14   Global Step: 60130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:39:41,930-Speed 3319.74 samples/sec   Loss 2.3027   LearningRate 0.0074   Epoch: 14   Global Step: 60140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:39:45,016-Speed 3318.01 samples/sec   Loss 2.3097   LearningRate 0.0074   Epoch: 14   Global Step: 60150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:39:48,135-Speed 3284.05 samples/sec   Loss 2.2479   LearningRate 0.0074   Epoch: 14   Global Step: 60160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:39:51,438-Speed 3101.01 samples/sec   Loss 2.2210   LearningRate 0.0074   Epoch: 14   Global Step: 60170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:39:54,559-Speed 3281.64 samples/sec   Loss 2.2737   LearningRate 0.0074   Epoch: 14   Global Step: 60180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:39:57,647-Speed 3316.95 samples/sec   Loss 2.2806   LearningRate 0.0074   Epoch: 14   Global Step: 60190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:40:00,744-Speed 3307.40 samples/sec   Loss 2.3105   LearningRate 0.0074   Epoch: 14   Global Step: 60200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:40:03,824-Speed 3325.36 samples/sec   Loss 2.2462   LearningRate 0.0074   Epoch: 14   Global Step: 60210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:40:06,900-Speed 3329.65 samples/sec   Loss 2.3107   LearningRate 0.0074   Epoch: 14   Global Step: 60220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:40:09,990-Speed 3314.29 samples/sec   Loss 2.2387   LearningRate 0.0074   Epoch: 14   Global Step: 60230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:40:13,072-Speed 3323.04 samples/sec   Loss 2.2380   LearningRate 0.0074   Epoch: 14   Global Step: 60240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:40:16,159-Speed 3318.33 samples/sec   Loss 2.2704   LearningRate 0.0074   Epoch: 14   Global Step: 60250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:40:19,236-Speed 3327.96 samples/sec   Loss 2.3188   LearningRate 0.0074   Epoch: 14   Global Step: 60260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:40:22,310-Speed 3331.96 samples/sec   Loss 2.3396   LearningRate 0.0074   Epoch: 14   Global Step: 60270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:40:25,385-Speed 3331.02 samples/sec   Loss 2.2237   LearningRate 0.0073   Epoch: 14   Global Step: 60280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:40:28,465-Speed 3325.65 samples/sec   Loss 2.2367   LearningRate 0.0073   Epoch: 14   Global Step: 60290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:40:31,542-Speed 3328.46 samples/sec   Loss 2.2993   LearningRate 0.0073   Epoch: 14   Global Step: 60300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:40:34,618-Speed 3329.69 samples/sec   Loss 2.2659   LearningRate 0.0073   Epoch: 14   Global Step: 60310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:40:37,685-Speed 3339.60 samples/sec   Loss 2.2720   LearningRate 0.0073   Epoch: 14   Global Step: 60320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:40:40,753-Speed 3337.98 samples/sec   Loss 2.3520   LearningRate 0.0073   Epoch: 14   Global Step: 60330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:40:43,825-Speed 3335.22 samples/sec   Loss 2.3152   LearningRate 0.0073   Epoch: 14   Global Step: 60340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:40:46,883-Speed 3348.78 samples/sec   Loss 2.3041   LearningRate 0.0073   Epoch: 14   Global Step: 60350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:40:49,970-Speed 3318.12 samples/sec   Loss 2.2857   LearningRate 0.0073   Epoch: 14   Global Step: 60360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:40:53,042-Speed 3334.02 samples/sec   Loss 2.3275   LearningRate 0.0073   Epoch: 14   Global Step: 60370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:40:56,111-Speed 3336.72 samples/sec   Loss 2.3836   LearningRate 0.0073   Epoch: 14   Global Step: 60380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:40:59,183-Speed 3334.69 samples/sec   Loss 2.3420   LearningRate 0.0073   Epoch: 14   Global Step: 60390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:02,252-Speed 3336.43 samples/sec   Loss 2.2642   LearningRate 0.0073   Epoch: 14   Global Step: 60400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:05,326-Speed 3332.99 samples/sec   Loss 2.2979   LearningRate 0.0073   Epoch: 14   Global Step: 60410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:08,399-Speed 3332.32 samples/sec   Loss 2.2975   LearningRate 0.0073   Epoch: 14   Global Step: 60420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:11,471-Speed 3334.69 samples/sec   Loss 2.4490   LearningRate 0.0073   Epoch: 14   Global Step: 60430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:14,535-Speed 3342.47 samples/sec   Loss 2.3279   LearningRate 0.0072   Epoch: 14   Global Step: 60440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:17,601-Speed 3340.52 samples/sec   Loss 2.3569   LearningRate 0.0072   Epoch: 14   Global Step: 60450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:41:20,672-Speed 3334.73 samples/sec   Loss 2.3086   LearningRate 0.0072   Epoch: 14   Global Step: 60460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:41:23,758-Speed 3319.15 samples/sec   Loss 2.3191   LearningRate 0.0072   Epoch: 14   Global Step: 60470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:41:26,821-Speed 3343.84 samples/sec   Loss 2.2727   LearningRate 0.0072   Epoch: 14   Global Step: 60480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:29,888-Speed 3339.70 samples/sec   Loss 2.3333   LearningRate 0.0072   Epoch: 14   Global Step: 60490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:33,001-Speed 3289.90 samples/sec   Loss 2.3755   LearningRate 0.0072   Epoch: 14   Global Step: 60500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:36,068-Speed 3339.64 samples/sec   Loss 2.3147   LearningRate 0.0072   Epoch: 14   Global Step: 60510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:39,137-Speed 3337.91 samples/sec   Loss 2.2902   LearningRate 0.0072   Epoch: 14   Global Step: 60520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:42,207-Speed 3336.46 samples/sec   Loss 2.3649   LearningRate 0.0072   Epoch: 14   Global Step: 60530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:45,275-Speed 3337.35 samples/sec   Loss 2.2307   LearningRate 0.0072   Epoch: 14   Global Step: 60540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:48,361-Speed 3319.58 samples/sec   Loss 2.2415   LearningRate 0.0072   Epoch: 14   Global Step: 60550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:51,455-Speed 3309.45 samples/sec   Loss 2.2567   LearningRate 0.0072   Epoch: 14   Global Step: 60560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:54,536-Speed 3324.55 samples/sec   Loss 2.2838   LearningRate 0.0072   Epoch: 14   Global Step: 60570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:41:57,605-Speed 3337.43 samples/sec   Loss 2.3000   LearningRate 0.0072   Epoch: 14   Global Step: 60580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:42:00,681-Speed 3330.07 samples/sec   Loss 2.3788   LearningRate 0.0071   Epoch: 14   Global Step: 60590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:03,758-Speed 3328.37 samples/sec   Loss 2.3001   LearningRate 0.0071   Epoch: 14   Global Step: 60600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:06,840-Speed 3323.23 samples/sec   Loss 2.3474   LearningRate 0.0071   Epoch: 14   Global Step: 60610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:09,923-Speed 3322.87 samples/sec   Loss 2.3297   LearningRate 0.0071   Epoch: 14   Global Step: 60620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:12,996-Speed 3332.76 samples/sec   Loss 2.3933   LearningRate 0.0071   Epoch: 14   Global Step: 60630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:16,200-Speed 3197.35 samples/sec   Loss 2.2828   LearningRate 0.0071   Epoch: 14   Global Step: 60640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:19,305-Speed 3298.08 samples/sec   Loss 2.2803   LearningRate 0.0071   Epoch: 14   Global Step: 60650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:22,387-Speed 3323.65 samples/sec   Loss 2.3836   LearningRate 0.0071   Epoch: 14   Global Step: 60660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:25,462-Speed 3331.16 samples/sec   Loss 2.2709   LearningRate 0.0071   Epoch: 14   Global Step: 60670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:28,537-Speed 3330.28 samples/sec   Loss 2.2795   LearningRate 0.0071   Epoch: 14   Global Step: 60680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:31,610-Speed 3333.16 samples/sec   Loss 2.2724   LearningRate 0.0071   Epoch: 14   Global Step: 60690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:42:34,664-Speed 3353.63 samples/sec   Loss 2.3269   LearningRate 0.0071   Epoch: 14   Global Step: 60700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:37,744-Speed 3325.95 samples/sec   Loss 2.3538   LearningRate 0.0071   Epoch: 14   Global Step: 60710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:40,821-Speed 3327.62 samples/sec   Loss 2.2958   LearningRate 0.0071   Epoch: 14   Global Step: 60720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:43,894-Speed 3333.18 samples/sec   Loss 2.3137   LearningRate 0.0071   Epoch: 14   Global Step: 60730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:46,973-Speed 3326.76 samples/sec   Loss 2.3305   LearningRate 0.0071   Epoch: 14   Global Step: 60740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:50,054-Speed 3325.26 samples/sec   Loss 2.3051   LearningRate 0.0070   Epoch: 14   Global Step: 60750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:53,136-Speed 3322.57 samples/sec   Loss 2.3594   LearningRate 0.0070   Epoch: 14   Global Step: 60760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:42:56,193-Speed 3350.76 samples/sec   Loss 2.3078   LearningRate 0.0070   Epoch: 14   Global Step: 60770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:42:59,273-Speed 3324.93 samples/sec   Loss 2.2739   LearningRate 0.0070   Epoch: 14   Global Step: 60780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:43:02,352-Speed 3326.84 samples/sec   Loss 2.3115   LearningRate 0.0070   Epoch: 14   Global Step: 60790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:43:05,431-Speed 3327.07 samples/sec   Loss 2.3806   LearningRate 0.0070   Epoch: 14   Global Step: 60800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:43:08,507-Speed 3329.05 samples/sec   Loss 2.3207   LearningRate 0.0070   Epoch: 14   Global Step: 60810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:43:11,588-Speed 3325.03 samples/sec   Loss 2.2802   LearningRate 0.0070   Epoch: 14   Global Step: 60820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:43:14,673-Speed 3318.93 samples/sec   Loss 2.3221   LearningRate 0.0070   Epoch: 14   Global Step: 60830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:43:17,758-Speed 3321.38 samples/sec   Loss 2.2527   LearningRate 0.0070   Epoch: 14   Global Step: 60840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:43:20,837-Speed 3326.10 samples/sec   Loss 2.2710   LearningRate 0.0070   Epoch: 14   Global Step: 60850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:43:23,919-Speed 3322.87 samples/sec   Loss 2.2889   LearningRate 0.0070   Epoch: 14   Global Step: 60860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:43:27,036-Speed 3286.17 samples/sec   Loss 2.3014   LearningRate 0.0070   Epoch: 14   Global Step: 60870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:43:30,203-Speed 3233.61 samples/sec   Loss 2.3382   LearningRate 0.0070   Epoch: 14   Global Step: 60880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:43:33,280-Speed 3328.98 samples/sec   Loss 2.2848   LearningRate 0.0070   Epoch: 14   Global Step: 60890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:43:36,354-Speed 3332.08 samples/sec   Loss 2.3282   LearningRate 0.0069   Epoch: 14   Global Step: 60900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:43:39,434-Speed 3325.26 samples/sec   Loss 2.3365   LearningRate 0.0069   Epoch: 14   Global Step: 60910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:43:42,514-Speed 3325.33 samples/sec   Loss 2.2983   LearningRate 0.0069   Epoch: 14   Global Step: 60920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:43:45,591-Speed 3328.68 samples/sec   Loss 2.2309   LearningRate 0.0069   Epoch: 14   Global Step: 60930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:43:48,687-Speed 3308.80 samples/sec   Loss 2.2759   LearningRate 0.0069   Epoch: 14   Global Step: 60940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:43:51,764-Speed 3328.06 samples/sec   Loss 2.3170   LearningRate 0.0069   Epoch: 14   Global Step: 60950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:43:54,844-Speed 3325.96 samples/sec   Loss 2.2946   LearningRate 0.0069   Epoch: 14   Global Step: 60960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:43:57,922-Speed 3327.23 samples/sec   Loss 2.2876   LearningRate 0.0069   Epoch: 14   Global Step: 60970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:44:01,000-Speed 3327.14 samples/sec   Loss 2.3619   LearningRate 0.0069   Epoch: 14   Global Step: 60980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:44:04,080-Speed 3325.49 samples/sec   Loss 2.3483   LearningRate 0.0069   Epoch: 14   Global Step: 60990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:44:07,163-Speed 3322.72 samples/sec   Loss 2.2995   LearningRate 0.0069   Epoch: 14   Global Step: 61000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:44:10,248-Speed 3319.05 samples/sec   Loss 2.2875   LearningRate 0.0069   Epoch: 14   Global Step: 61010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:44:13,381-Speed 3269.20 samples/sec   Loss 2.3259   LearningRate 0.0069   Epoch: 14   Global Step: 61020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:44:16,509-Speed 3274.62 samples/sec   Loss 2.3650   LearningRate 0.0069   Epoch: 14   Global Step: 61030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:44:19,668-Speed 3243.21 samples/sec   Loss 2.3083   LearningRate 0.0069   Epoch: 14   Global Step: 61040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:44:22,750-Speed 3322.85 samples/sec   Loss 2.3103   LearningRate 0.0069   Epoch: 14   Global Step: 61050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:44:25,837-Speed 3317.69 samples/sec   Loss 2.3967   LearningRate 0.0068   Epoch: 14   Global Step: 61060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:44:28,898-Speed 3345.93 samples/sec   Loss 2.3153   LearningRate 0.0068   Epoch: 14   Global Step: 61070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:44:31,976-Speed 3327.98 samples/sec   Loss 2.2757   LearningRate 0.0068   Epoch: 14   Global Step: 61080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:44:35,058-Speed 3322.49 samples/sec   Loss 2.3016   LearningRate 0.0068   Epoch: 14   Global Step: 61090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:44:38,122-Speed 3343.30 samples/sec   Loss 2.2594   LearningRate 0.0068   Epoch: 14   Global Step: 61100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:44:41,204-Speed 3323.64 samples/sec   Loss 2.3238   LearningRate 0.0068   Epoch: 14   Global Step: 61110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:44:44,292-Speed 3316.07 samples/sec   Loss 2.3794   LearningRate 0.0068   Epoch: 14   Global Step: 61120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:44:47,371-Speed 3326.75 samples/sec   Loss 2.3114   LearningRate 0.0068   Epoch: 14   Global Step: 61130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:44:50,462-Speed 3313.61 samples/sec   Loss 2.3254   LearningRate 0.0068   Epoch: 14   Global Step: 61140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:44:53,556-Speed 3311.21 samples/sec   Loss 2.3576   LearningRate 0.0068   Epoch: 14   Global Step: 61150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:44:56,632-Speed 3328.98 samples/sec   Loss 2.2992   LearningRate 0.0068   Epoch: 14   Global Step: 61160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:44:59,712-Speed 3325.67 samples/sec   Loss 2.3399   LearningRate 0.0068   Epoch: 14   Global Step: 61170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:45:02,800-Speed 3316.20 samples/sec   Loss 2.3284   LearningRate 0.0068   Epoch: 14   Global Step: 61180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:45:05,884-Speed 3321.47 samples/sec   Loss 2.3213   LearningRate 0.0068   Epoch: 14   Global Step: 61190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:45:08,969-Speed 3319.85 samples/sec   Loss 2.2926   LearningRate 0.0068   Epoch: 14   Global Step: 61200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:45:12,045-Speed 3329.73 samples/sec   Loss 2.3359   LearningRate 0.0068   Epoch: 14   Global Step: 61210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:45:15,109-Speed 3342.59 samples/sec   Loss 2.3437   LearningRate 0.0067   Epoch: 14   Global Step: 61220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:45:18,190-Speed 3325.09 samples/sec   Loss 2.3517   LearningRate 0.0067   Epoch: 14   Global Step: 61230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:45:21,272-Speed 3323.60 samples/sec   Loss 2.3076   LearningRate 0.0067   Epoch: 14   Global Step: 61240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:45:24,340-Speed 3338.23 samples/sec   Loss 2.3291   LearningRate 0.0067   Epoch: 14   Global Step: 61250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:45:27,433-Speed 3311.39 samples/sec   Loss 2.3302   LearningRate 0.0067   Epoch: 14   Global Step: 61260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:45:30,518-Speed 3319.75 samples/sec   Loss 2.3428   LearningRate 0.0067   Epoch: 14   Global Step: 61270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:45:33,598-Speed 3324.84 samples/sec   Loss 2.2924   LearningRate 0.0067   Epoch: 14   Global Step: 61280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:45:36,676-Speed 3328.09 samples/sec   Loss 2.3653   LearningRate 0.0067   Epoch: 14   Global Step: 61290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:45:39,757-Speed 3324.56 samples/sec   Loss 2.3262   LearningRate 0.0067   Epoch: 14   Global Step: 61300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:45:42,836-Speed 3326.50 samples/sec   Loss 2.3437   LearningRate 0.0067   Epoch: 14   Global Step: 61310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:45:45,915-Speed 3326.75 samples/sec   Loss 2.3504   LearningRate 0.0067   Epoch: 14   Global Step: 61320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:45:49,015-Speed 3304.09 samples/sec   Loss 2.3314   LearningRate 0.0067   Epoch: 14   Global Step: 61330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:45:52,090-Speed 3330.19 samples/sec   Loss 2.4042   LearningRate 0.0067   Epoch: 14   Global Step: 61340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:45:55,173-Speed 3322.40 samples/sec   Loss 2.2777   LearningRate 0.0067   Epoch: 14   Global Step: 61350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:45:58,252-Speed 3326.14 samples/sec   Loss 2.3561   LearningRate 0.0067   Epoch: 14   Global Step: 61360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:01,333-Speed 3324.09 samples/sec   Loss 2.3525   LearningRate 0.0067   Epoch: 14   Global Step: 61370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:04,413-Speed 3325.35 samples/sec   Loss 2.4029   LearningRate 0.0066   Epoch: 14   Global Step: 61380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:07,494-Speed 3324.44 samples/sec   Loss 2.3635   LearningRate 0.0066   Epoch: 14   Global Step: 61390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:10,572-Speed 3327.95 samples/sec   Loss 2.2637   LearningRate 0.0066   Epoch: 14   Global Step: 61400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:13,652-Speed 3325.46 samples/sec   Loss 2.3691   LearningRate 0.0066   Epoch: 14   Global Step: 61410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:16,731-Speed 3326.85 samples/sec   Loss 2.3643   LearningRate 0.0066   Epoch: 14   Global Step: 61420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:19,812-Speed 3324.17 samples/sec   Loss 2.3783   LearningRate 0.0066   Epoch: 14   Global Step: 61430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:22,895-Speed 3321.89 samples/sec   Loss 2.3713   LearningRate 0.0066   Epoch: 14   Global Step: 61440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:25,973-Speed 3327.38 samples/sec   Loss 2.3444   LearningRate 0.0066   Epoch: 14   Global Step: 61450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:46:29,039-Speed 3341.13 samples/sec   Loss 2.3395   LearningRate 0.0066   Epoch: 14   Global Step: 61460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:32,126-Speed 3317.47 samples/sec   Loss 2.3337   LearningRate 0.0066   Epoch: 14   Global Step: 61470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:35,215-Speed 3315.76 samples/sec   Loss 2.2591   LearningRate 0.0066   Epoch: 14   Global Step: 61480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:38,305-Speed 3314.78 samples/sec   Loss 2.3246   LearningRate 0.0066   Epoch: 14   Global Step: 61490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:41,433-Speed 3274.09 samples/sec   Loss 2.3448   LearningRate 0.0066   Epoch: 14   Global Step: 61500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:44,518-Speed 3320.46 samples/sec   Loss 2.2622   LearningRate 0.0066   Epoch: 14   Global Step: 61510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:47,609-Speed 3313.44 samples/sec   Loss 2.3824   LearningRate 0.0066   Epoch: 14   Global Step: 61520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:50,693-Speed 3320.99 samples/sec   Loss 2.3978   LearningRate 0.0066   Epoch: 14   Global Step: 61530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:53,784-Speed 3313.95 samples/sec   Loss 2.3296   LearningRate 0.0065   Epoch: 14   Global Step: 61540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:56,861-Speed 3327.80 samples/sec   Loss 2.3951   LearningRate 0.0065   Epoch: 14   Global Step: 61550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:46:59,944-Speed 3322.29 samples/sec   Loss 2.2960   LearningRate 0.0065   Epoch: 14   Global Step: 61560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:47:03,034-Speed 3315.23 samples/sec   Loss 2.2815   LearningRate 0.0065   Epoch: 14   Global Step: 61570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:47:06,115-Speed 3323.99 samples/sec   Loss 2.3233   LearningRate 0.0065   Epoch: 14   Global Step: 61580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:47:09,199-Speed 3321.51 samples/sec   Loss 2.3658   LearningRate 0.0065   Epoch: 14   Global Step: 61590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:47:12,262-Speed 3344.08 samples/sec   Loss 2.3251   LearningRate 0.0065   Epoch: 14   Global Step: 61600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:47:15,343-Speed 3324.41 samples/sec   Loss 2.3238   LearningRate 0.0065   Epoch: 14   Global Step: 61610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:47:18,418-Speed 3331.03 samples/sec   Loss 2.3925   LearningRate 0.0065   Epoch: 14   Global Step: 61620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:47:21,496-Speed 3327.02 samples/sec   Loss 2.3501   LearningRate 0.0065   Epoch: 14   Global Step: 61630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:47:24,579-Speed 3322.12 samples/sec   Loss 2.3069   LearningRate 0.0065   Epoch: 14   Global Step: 61640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:47:27,658-Speed 3326.77 samples/sec   Loss 2.3755   LearningRate 0.0065   Epoch: 14   Global Step: 61650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:47:30,790-Speed 3269.95 samples/sec   Loss 2.3913   LearningRate 0.0065   Epoch: 14   Global Step: 61660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:47:33,868-Speed 3327.41 samples/sec   Loss 2.3171   LearningRate 0.0065   Epoch: 14   Global Step: 61670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:47:36,947-Speed 3325.94 samples/sec   Loss 2.3142   LearningRate 0.0065   Epoch: 14   Global Step: 61680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:47:40,039-Speed 3313.53 samples/sec   Loss 2.3014   LearningRate 0.0065   Epoch: 14   Global Step: 61690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:47:43,118-Speed 3326.04 samples/sec   Loss 2.4159   LearningRate 0.0064   Epoch: 14   Global Step: 61700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:47:46,198-Speed 3325.66 samples/sec   Loss 2.3634   LearningRate 0.0064   Epoch: 14   Global Step: 61710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:47:49,305-Speed 3297.31 samples/sec   Loss 2.3150   LearningRate 0.0064   Epoch: 14   Global Step: 61720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:47:52,395-Speed 3314.03 samples/sec   Loss 2.2791   LearningRate 0.0064   Epoch: 14   Global Step: 61730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:47:55,477-Speed 3323.80 samples/sec   Loss 2.3848   LearningRate 0.0064   Epoch: 14   Global Step: 61740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:47:58,557-Speed 3324.62 samples/sec   Loss 2.3442   LearningRate 0.0064   Epoch: 14   Global Step: 61750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:48:01,646-Speed 3316.55 samples/sec   Loss 2.4284   LearningRate 0.0064   Epoch: 14   Global Step: 61760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:48:04,729-Speed 3321.52 samples/sec   Loss 2.3532   LearningRate 0.0064   Epoch: 14   Global Step: 61770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:48:07,812-Speed 3322.09 samples/sec   Loss 2.2748   LearningRate 0.0064   Epoch: 14   Global Step: 61780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:48:10,897-Speed 3320.45 samples/sec   Loss 2.3338   LearningRate 0.0064   Epoch: 14   Global Step: 61790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:48:13,963-Speed 3341.01 samples/sec   Loss 2.3704   LearningRate 0.0064   Epoch: 14   Global Step: 61800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:48:17,043-Speed 3324.89 samples/sec   Loss 2.3415   LearningRate 0.0064   Epoch: 14   Global Step: 61810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:48:20,143-Speed 3304.06 samples/sec   Loss 2.3879   LearningRate 0.0064   Epoch: 14   Global Step: 61820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:48:23,222-Speed 3326.89 samples/sec   Loss 2.2814   LearningRate 0.0064   Epoch: 14   Global Step: 61830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:48:26,302-Speed 3325.26 samples/sec   Loss 2.3279   LearningRate 0.0064   Epoch: 14   Global Step: 61840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:48:29,382-Speed 3325.67 samples/sec   Loss 2.2649   LearningRate 0.0064   Epoch: 14   Global Step: 61850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:48:32,462-Speed 3324.81 samples/sec   Loss 2.3906   LearningRate 0.0064   Epoch: 14   Global Step: 61860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:48:35,551-Speed 3316.22 samples/sec   Loss 2.3856   LearningRate 0.0063   Epoch: 14   Global Step: 61870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:48:38,628-Speed 3327.56 samples/sec   Loss 2.3639   LearningRate 0.0063   Epoch: 14   Global Step: 61880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:48:41,696-Speed 3339.80 samples/sec   Loss 2.3578   LearningRate 0.0063   Epoch: 14   Global Step: 61890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:48:44,786-Speed 3314.68 samples/sec   Loss 2.3212   LearningRate 0.0063   Epoch: 14   Global Step: 61900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:48:47,870-Speed 3320.91 samples/sec   Loss 2.3504   LearningRate 0.0063   Epoch: 14   Global Step: 61910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:48:50,954-Speed 3320.99 samples/sec   Loss 2.3825   LearningRate 0.0063   Epoch: 14   Global Step: 61920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:48:54,038-Speed 3320.35 samples/sec   Loss 2.3513   LearningRate 0.0063   Epoch: 14   Global Step: 61930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:48:57,124-Speed 3319.55 samples/sec   Loss 2.3981   LearningRate 0.0063   Epoch: 14   Global Step: 61940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:49:00,207-Speed 3322.42 samples/sec   Loss 2.3368   LearningRate 0.0063   Epoch: 14   Global Step: 61950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:49:03,288-Speed 3323.58 samples/sec   Loss 2.3906   LearningRate 0.0063   Epoch: 14   Global Step: 61960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:49:06,375-Speed 3318.37 samples/sec   Loss 2.3511   LearningRate 0.0063   Epoch: 14   Global Step: 61970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:49:09,459-Speed 3320.84 samples/sec   Loss 2.3109   LearningRate 0.0063   Epoch: 14   Global Step: 61980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:49:12,541-Speed 3323.80 samples/sec   Loss 2.4216   LearningRate 0.0063   Epoch: 14   Global Step: 61990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:49:15,623-Speed 3322.36 samples/sec   Loss 2.3487   LearningRate 0.0063   Epoch: 14   Global Step: 62000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:49:59,273-[lfw][62000]XNorm: 22.630310
Training: 2022-04-26 18:49:59,273-[lfw][62000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-26 18:49:59,274-[lfw][62000]Accuracy-Highest: 0.99833
Training: 2022-04-26 18:50:49,922-[cfp_fp][62000]XNorm: 22.530930
Training: 2022-04-26 18:50:49,923-[cfp_fp][62000]Accuracy-Flip: 0.99129+-0.00449
Training: 2022-04-26 18:50:49,923-[cfp_fp][62000]Accuracy-Highest: 0.99129
Training: 2022-04-26 18:51:33,626-[agedb_30][62000]XNorm: 23.051043
Training: 2022-04-26 18:51:33,626-[agedb_30][62000]Accuracy-Flip: 0.97633+-0.00605
Training: 2022-04-26 18:51:33,627-[agedb_30][62000]Accuracy-Highest: 0.97650
Training: 2022-04-26 18:51:36,690-Speed 72.59 samples/sec   Loss 2.4173   LearningRate 0.0063   Epoch: 14   Global Step: 62010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:51:39,826-Speed 3265.94 samples/sec   Loss 2.3876   LearningRate 0.0063   Epoch: 14   Global Step: 62020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:51:52,802-Speed 789.21 samples/sec   Loss 1.8921   LearningRate 0.0062   Epoch: 15   Global Step: 62030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:51:55,882-Speed 3325.50 samples/sec   Loss 1.4211   LearningRate 0.0062   Epoch: 15   Global Step: 62040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:51:59,065-Speed 3218.23 samples/sec   Loss 1.4010   LearningRate 0.0062   Epoch: 15   Global Step: 62050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:52:02,139-Speed 3331.92 samples/sec   Loss 1.4414   LearningRate 0.0062   Epoch: 15   Global Step: 62060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:52:05,222-Speed 3323.02 samples/sec   Loss 1.3953   LearningRate 0.0062   Epoch: 15   Global Step: 62070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:52:08,319-Speed 3306.95 samples/sec   Loss 1.3610   LearningRate 0.0062   Epoch: 15   Global Step: 62080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:52:11,413-Speed 3309.61 samples/sec   Loss 1.3637   LearningRate 0.0062   Epoch: 15   Global Step: 62090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:52:14,536-Speed 3279.68 samples/sec   Loss 1.4834   LearningRate 0.0062   Epoch: 15   Global Step: 62100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:52:17,643-Speed 3296.63 samples/sec   Loss 1.3901   LearningRate 0.0062   Epoch: 15   Global Step: 62110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:52:20,725-Speed 3323.37 samples/sec   Loss 1.4368   LearningRate 0.0062   Epoch: 15   Global Step: 62120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:52:23,800-Speed 3331.31 samples/sec   Loss 1.3996   LearningRate 0.0062   Epoch: 15   Global Step: 62130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:52:26,886-Speed 3318.44 samples/sec   Loss 1.4397   LearningRate 0.0062   Epoch: 15   Global Step: 62140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:52:29,971-Speed 3319.80 samples/sec   Loss 1.3975   LearningRate 0.0062   Epoch: 15   Global Step: 62150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:52:33,057-Speed 3319.41 samples/sec   Loss 1.4167   LearningRate 0.0062   Epoch: 15   Global Step: 62160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:52:36,172-Speed 3288.55 samples/sec   Loss 1.4382   LearningRate 0.0062   Epoch: 15   Global Step: 62170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:52:39,331-Speed 3241.94 samples/sec   Loss 1.4439   LearningRate 0.0062   Epoch: 15   Global Step: 62180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:52:42,422-Speed 3313.85 samples/sec   Loss 1.4608   LearningRate 0.0062   Epoch: 15   Global Step: 62190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:52:45,512-Speed 3314.79 samples/sec   Loss 1.3997   LearningRate 0.0061   Epoch: 15   Global Step: 62200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:52:48,593-Speed 3323.61 samples/sec   Loss 1.4325   LearningRate 0.0061   Epoch: 15   Global Step: 62210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:52:51,678-Speed 3320.16 samples/sec   Loss 1.4482   LearningRate 0.0061   Epoch: 15   Global Step: 62220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:52:54,766-Speed 3317.26 samples/sec   Loss 1.4467   LearningRate 0.0061   Epoch: 15   Global Step: 62230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:52:57,863-Speed 3306.53 samples/sec   Loss 1.4463   LearningRate 0.0061   Epoch: 15   Global Step: 62240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:53:00,945-Speed 3323.67 samples/sec   Loss 1.4099   LearningRate 0.0061   Epoch: 15   Global Step: 62250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:53:04,019-Speed 3331.91 samples/sec   Loss 1.3817   LearningRate 0.0061   Epoch: 15   Global Step: 62260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:53:07,099-Speed 3324.52 samples/sec   Loss 1.4416   LearningRate 0.0061   Epoch: 15   Global Step: 62270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:53:10,186-Speed 3317.87 samples/sec   Loss 1.4275   LearningRate 0.0061   Epoch: 15   Global Step: 62280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 18:53:13,270-Speed 3322.31 samples/sec   Loss 1.4686   LearningRate 0.0061   Epoch: 15   Global Step: 62290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:53:16,349-Speed 3326.55 samples/sec   Loss 1.5298   LearningRate 0.0061   Epoch: 15   Global Step: 62300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:53:19,428-Speed 3326.21 samples/sec   Loss 1.4454   LearningRate 0.0061   Epoch: 15   Global Step: 62310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:53:22,504-Speed 3329.04 samples/sec   Loss 1.4607   LearningRate 0.0061   Epoch: 15   Global Step: 62320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:53:25,587-Speed 3322.80 samples/sec   Loss 1.4511   LearningRate 0.0061   Epoch: 15   Global Step: 62330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:53:28,664-Speed 3328.01 samples/sec   Loss 1.4847   LearningRate 0.0061   Epoch: 15   Global Step: 62340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:53:31,743-Speed 3326.67 samples/sec   Loss 1.4101   LearningRate 0.0061   Epoch: 15   Global Step: 62350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:53:34,827-Speed 3321.79 samples/sec   Loss 1.4600   LearningRate 0.0060   Epoch: 15   Global Step: 62360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:53:37,901-Speed 3331.73 samples/sec   Loss 1.4777   LearningRate 0.0060   Epoch: 15   Global Step: 62370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:53:40,978-Speed 3328.35 samples/sec   Loss 1.4385   LearningRate 0.0060   Epoch: 15   Global Step: 62380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:53:44,054-Speed 3329.84 samples/sec   Loss 1.4953   LearningRate 0.0060   Epoch: 15   Global Step: 62390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:53:47,167-Speed 3290.29 samples/sec   Loss 1.4700   LearningRate 0.0060   Epoch: 15   Global Step: 62400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:53:50,244-Speed 3329.26 samples/sec   Loss 1.4979   LearningRate 0.0060   Epoch: 15   Global Step: 62410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:53:53,316-Speed 3333.33 samples/sec   Loss 1.4624   LearningRate 0.0060   Epoch: 15   Global Step: 62420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:53:56,389-Speed 3333.89 samples/sec   Loss 1.4683   LearningRate 0.0060   Epoch: 15   Global Step: 62430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:53:59,469-Speed 3325.75 samples/sec   Loss 1.4959   LearningRate 0.0060   Epoch: 15   Global Step: 62440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:54:02,562-Speed 3310.63 samples/sec   Loss 1.5186   LearningRate 0.0060   Epoch: 15   Global Step: 62450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:54:05,637-Speed 3331.42 samples/sec   Loss 1.4345   LearningRate 0.0060   Epoch: 15   Global Step: 62460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:54:08,710-Speed 3332.47 samples/sec   Loss 1.5232   LearningRate 0.0060   Epoch: 15   Global Step: 62470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:54:11,794-Speed 3321.95 samples/sec   Loss 1.5009   LearningRate 0.0060   Epoch: 15   Global Step: 62480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:54:14,871-Speed 3328.68 samples/sec   Loss 1.4767   LearningRate 0.0060   Epoch: 15   Global Step: 62490   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-26 18:54:17,931-Speed 3346.88 samples/sec   Loss 1.5206   LearningRate 0.0060   Epoch: 15   Global Step: 62500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:54:21,010-Speed 3326.35 samples/sec   Loss 1.5243   LearningRate 0.0060   Epoch: 15   Global Step: 62510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:54:24,077-Speed 3339.63 samples/sec   Loss 1.4919   LearningRate 0.0060   Epoch: 15   Global Step: 62520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:54:27,148-Speed 3334.54 samples/sec   Loss 1.5123   LearningRate 0.0059   Epoch: 15   Global Step: 62530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:54:30,224-Speed 3330.08 samples/sec   Loss 1.5060   LearningRate 0.0059   Epoch: 15   Global Step: 62540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:54:33,297-Speed 3333.02 samples/sec   Loss 1.5054   LearningRate 0.0059   Epoch: 15   Global Step: 62550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:54:36,370-Speed 3333.14 samples/sec   Loss 1.5327   LearningRate 0.0059   Epoch: 15   Global Step: 62560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:54:39,453-Speed 3321.66 samples/sec   Loss 1.5048   LearningRate 0.0059   Epoch: 15   Global Step: 62570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:54:42,534-Speed 3325.02 samples/sec   Loss 1.5140   LearningRate 0.0059   Epoch: 15   Global Step: 62580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:54:45,623-Speed 3315.47 samples/sec   Loss 1.4687   LearningRate 0.0059   Epoch: 15   Global Step: 62590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:54:48,698-Speed 3331.59 samples/sec   Loss 1.4868   LearningRate 0.0059   Epoch: 15   Global Step: 62600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:54:51,780-Speed 3323.04 samples/sec   Loss 1.5533   LearningRate 0.0059   Epoch: 15   Global Step: 62610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:54:54,855-Speed 3330.80 samples/sec   Loss 1.5171   LearningRate 0.0059   Epoch: 15   Global Step: 62620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:54:57,929-Speed 3331.59 samples/sec   Loss 1.4971   LearningRate 0.0059   Epoch: 15   Global Step: 62630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:55:00,999-Speed 3335.77 samples/sec   Loss 1.4974   LearningRate 0.0059   Epoch: 15   Global Step: 62640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:55:04,084-Speed 3320.08 samples/sec   Loss 1.5814   LearningRate 0.0059   Epoch: 15   Global Step: 62650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:55:07,159-Speed 3331.45 samples/sec   Loss 1.5181   LearningRate 0.0059   Epoch: 15   Global Step: 62660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:55:10,233-Speed 3332.07 samples/sec   Loss 1.5103   LearningRate 0.0059   Epoch: 15   Global Step: 62670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:55:13,314-Speed 3324.08 samples/sec   Loss 1.4983   LearningRate 0.0059   Epoch: 15   Global Step: 62680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:55:16,389-Speed 3331.21 samples/sec   Loss 1.5366   LearningRate 0.0059   Epoch: 15   Global Step: 62690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:55:19,461-Speed 3333.16 samples/sec   Loss 1.5632   LearningRate 0.0058   Epoch: 15   Global Step: 62700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:55:22,539-Speed 3328.18 samples/sec   Loss 1.5693   LearningRate 0.0058   Epoch: 15   Global Step: 62710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:55:25,619-Speed 3325.54 samples/sec   Loss 1.5571   LearningRate 0.0058   Epoch: 15   Global Step: 62720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:55:28,693-Speed 3331.44 samples/sec   Loss 1.5197   LearningRate 0.0058   Epoch: 15   Global Step: 62730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:55:31,772-Speed 3327.00 samples/sec   Loss 1.5759   LearningRate 0.0058   Epoch: 15   Global Step: 62740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:55:34,852-Speed 3325.31 samples/sec   Loss 1.5577   LearningRate 0.0058   Epoch: 15   Global Step: 62750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:55:37,930-Speed 3327.60 samples/sec   Loss 1.5125   LearningRate 0.0058   Epoch: 15   Global Step: 62760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:55:41,004-Speed 3330.87 samples/sec   Loss 1.5148   LearningRate 0.0058   Epoch: 15   Global Step: 62770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:55:44,080-Speed 3330.92 samples/sec   Loss 1.5385   LearningRate 0.0058   Epoch: 15   Global Step: 62780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:55:47,164-Speed 3320.93 samples/sec   Loss 1.5590   LearningRate 0.0058   Epoch: 15   Global Step: 62790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:55:50,234-Speed 3335.80 samples/sec   Loss 1.5363   LearningRate 0.0058   Epoch: 15   Global Step: 62800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:55:53,313-Speed 3326.81 samples/sec   Loss 1.5524   LearningRate 0.0058   Epoch: 15   Global Step: 62810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:55:56,387-Speed 3331.97 samples/sec   Loss 1.5650   LearningRate 0.0058   Epoch: 15   Global Step: 62820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:55:59,460-Speed 3332.38 samples/sec   Loss 1.5365   LearningRate 0.0058   Epoch: 15   Global Step: 62830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:56:02,520-Speed 3346.77 samples/sec   Loss 1.5502   LearningRate 0.0058   Epoch: 15   Global Step: 62840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:56:05,581-Speed 3346.77 samples/sec   Loss 1.5659   LearningRate 0.0058   Epoch: 15   Global Step: 62850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:56:08,663-Speed 3322.69 samples/sec   Loss 1.5854   LearningRate 0.0058   Epoch: 15   Global Step: 62860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:56:11,741-Speed 3328.02 samples/sec   Loss 1.5480   LearningRate 0.0057   Epoch: 15   Global Step: 62870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:56:14,820-Speed 3327.00 samples/sec   Loss 1.5620   LearningRate 0.0057   Epoch: 15   Global Step: 62880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:56:17,892-Speed 3333.52 samples/sec   Loss 1.5509   LearningRate 0.0057   Epoch: 15   Global Step: 62890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:56:20,967-Speed 3331.03 samples/sec   Loss 1.5337   LearningRate 0.0057   Epoch: 15   Global Step: 62900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:56:24,044-Speed 3329.12 samples/sec   Loss 1.5822   LearningRate 0.0057   Epoch: 15   Global Step: 62910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:56:27,129-Speed 3319.59 samples/sec   Loss 1.5505   LearningRate 0.0057   Epoch: 15   Global Step: 62920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:56:30,206-Speed 3327.89 samples/sec   Loss 1.5697   LearningRate 0.0057   Epoch: 15   Global Step: 62930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:56:33,290-Speed 3321.27 samples/sec   Loss 1.5676   LearningRate 0.0057   Epoch: 15   Global Step: 62940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:56:36,366-Speed 3330.02 samples/sec   Loss 1.5739   LearningRate 0.0057   Epoch: 15   Global Step: 62950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:56:39,482-Speed 3287.37 samples/sec   Loss 1.5182   LearningRate 0.0057   Epoch: 15   Global Step: 62960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:56:42,562-Speed 3324.42 samples/sec   Loss 1.6192   LearningRate 0.0057   Epoch: 15   Global Step: 62970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:56:45,640-Speed 3328.51 samples/sec   Loss 1.5708   LearningRate 0.0057   Epoch: 15   Global Step: 62980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:56:48,718-Speed 3327.39 samples/sec   Loss 1.5496   LearningRate 0.0057   Epoch: 15   Global Step: 62990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:56:51,781-Speed 3344.21 samples/sec   Loss 1.6454   LearningRate 0.0057   Epoch: 15   Global Step: 63000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:56:54,859-Speed 3327.02 samples/sec   Loss 1.6034   LearningRate 0.0057   Epoch: 15   Global Step: 63010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:56:57,934-Speed 3331.05 samples/sec   Loss 1.5772   LearningRate 0.0057   Epoch: 15   Global Step: 63020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:57:01,014-Speed 3325.58 samples/sec   Loss 1.6400   LearningRate 0.0057   Epoch: 15   Global Step: 63030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:57:04,207-Speed 3207.46 samples/sec   Loss 1.5439   LearningRate 0.0057   Epoch: 15   Global Step: 63040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:57:07,295-Speed 3316.31 samples/sec   Loss 1.6092   LearningRate 0.0056   Epoch: 15   Global Step: 63050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:57:10,372-Speed 3329.52 samples/sec   Loss 1.5945   LearningRate 0.0056   Epoch: 15   Global Step: 63060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:57:13,466-Speed 3309.65 samples/sec   Loss 1.6518   LearningRate 0.0056   Epoch: 15   Global Step: 63070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:57:16,541-Speed 3331.44 samples/sec   Loss 1.5733   LearningRate 0.0056   Epoch: 15   Global Step: 63080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:57:19,617-Speed 3330.05 samples/sec   Loss 1.6420   LearningRate 0.0056   Epoch: 15   Global Step: 63090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:57:22,700-Speed 3321.32 samples/sec   Loss 1.5777   LearningRate 0.0056   Epoch: 15   Global Step: 63100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:57:25,947-Speed 3155.12 samples/sec   Loss 1.6526   LearningRate 0.0056   Epoch: 15   Global Step: 63110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:57:29,030-Speed 3321.38 samples/sec   Loss 1.5493   LearningRate 0.0056   Epoch: 15   Global Step: 63120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:57:32,107-Speed 3329.07 samples/sec   Loss 1.5886   LearningRate 0.0056   Epoch: 15   Global Step: 63130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:57:35,210-Speed 3300.09 samples/sec   Loss 1.5621   LearningRate 0.0056   Epoch: 15   Global Step: 63140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:57:38,298-Speed 3317.88 samples/sec   Loss 1.6071   LearningRate 0.0056   Epoch: 15   Global Step: 63150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:57:41,383-Speed 3319.91 samples/sec   Loss 1.6220   LearningRate 0.0056   Epoch: 15   Global Step: 63160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:57:44,462-Speed 3326.10 samples/sec   Loss 1.6256   LearningRate 0.0056   Epoch: 15   Global Step: 63170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:57:47,541-Speed 3326.42 samples/sec   Loss 1.6068   LearningRate 0.0056   Epoch: 15   Global Step: 63180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:57:50,632-Speed 3314.02 samples/sec   Loss 1.5433   LearningRate 0.0056   Epoch: 15   Global Step: 63190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:57:53,693-Speed 3346.11 samples/sec   Loss 1.6043   LearningRate 0.0056   Epoch: 15   Global Step: 63200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:57:56,770-Speed 3327.75 samples/sec   Loss 1.6025   LearningRate 0.0056   Epoch: 15   Global Step: 63210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:57:59,850-Speed 3326.22 samples/sec   Loss 1.5966   LearningRate 0.0055   Epoch: 15   Global Step: 63220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:58:02,931-Speed 3323.78 samples/sec   Loss 1.6212   LearningRate 0.0055   Epoch: 15   Global Step: 63230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:58:06,009-Speed 3327.83 samples/sec   Loss 1.6862   LearningRate 0.0055   Epoch: 15   Global Step: 63240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:58:09,088-Speed 3326.99 samples/sec   Loss 1.6158   LearningRate 0.0055   Epoch: 15   Global Step: 63250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:58:12,171-Speed 3322.05 samples/sec   Loss 1.5681   LearningRate 0.0055   Epoch: 15   Global Step: 63260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:58:15,254-Speed 3321.34 samples/sec   Loss 1.5703   LearningRate 0.0055   Epoch: 15   Global Step: 63270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:58:18,390-Speed 3266.32 samples/sec   Loss 1.6649   LearningRate 0.0055   Epoch: 15   Global Step: 63280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:58:21,474-Speed 3321.83 samples/sec   Loss 1.6062   LearningRate 0.0055   Epoch: 15   Global Step: 63290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:58:24,553-Speed 3326.15 samples/sec   Loss 1.5655   LearningRate 0.0055   Epoch: 15   Global Step: 63300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:58:27,642-Speed 3315.77 samples/sec   Loss 1.5983   LearningRate 0.0055   Epoch: 15   Global Step: 63310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:58:30,718-Speed 3329.69 samples/sec   Loss 1.6327   LearningRate 0.0055   Epoch: 15   Global Step: 63320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:58:33,793-Speed 3331.41 samples/sec   Loss 1.6387   LearningRate 0.0055   Epoch: 15   Global Step: 63330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:58:36,872-Speed 3326.44 samples/sec   Loss 1.5775   LearningRate 0.0055   Epoch: 15   Global Step: 63340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:58:39,951-Speed 3326.08 samples/sec   Loss 1.6041   LearningRate 0.0055   Epoch: 15   Global Step: 63350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:58:43,036-Speed 3320.14 samples/sec   Loss 1.6257   LearningRate 0.0055   Epoch: 15   Global Step: 63360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:58:46,104-Speed 3338.81 samples/sec   Loss 1.5906   LearningRate 0.0055   Epoch: 15   Global Step: 63370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:58:49,186-Speed 3322.77 samples/sec   Loss 1.5755   LearningRate 0.0055   Epoch: 15   Global Step: 63380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:58:52,281-Speed 3309.91 samples/sec   Loss 1.6177   LearningRate 0.0055   Epoch: 15   Global Step: 63390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:58:55,370-Speed 3315.82 samples/sec   Loss 1.6299   LearningRate 0.0054   Epoch: 15   Global Step: 63400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:58:58,452-Speed 3322.49 samples/sec   Loss 1.6257   LearningRate 0.0054   Epoch: 15   Global Step: 63410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:59:01,532-Speed 3325.31 samples/sec   Loss 1.5702   LearningRate 0.0054   Epoch: 15   Global Step: 63420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:59:04,615-Speed 3322.85 samples/sec   Loss 1.6822   LearningRate 0.0054   Epoch: 15   Global Step: 63430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:59:07,694-Speed 3326.09 samples/sec   Loss 1.5859   LearningRate 0.0054   Epoch: 15   Global Step: 63440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:59:10,777-Speed 3322.02 samples/sec   Loss 1.6231   LearningRate 0.0054   Epoch: 15   Global Step: 63450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:59:13,877-Speed 3304.07 samples/sec   Loss 1.6110   LearningRate 0.0054   Epoch: 15   Global Step: 63460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 18:59:16,961-Speed 3321.18 samples/sec   Loss 1.6781   LearningRate 0.0054   Epoch: 15   Global Step: 63470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:59:20,045-Speed 3320.64 samples/sec   Loss 1.6201   LearningRate 0.0054   Epoch: 15   Global Step: 63480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:59:23,134-Speed 3316.43 samples/sec   Loss 1.6669   LearningRate 0.0054   Epoch: 15   Global Step: 63490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:59:26,228-Speed 3310.45 samples/sec   Loss 1.6021   LearningRate 0.0054   Epoch: 15   Global Step: 63500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:59:29,314-Speed 3318.75 samples/sec   Loss 1.5978   LearningRate 0.0054   Epoch: 15   Global Step: 63510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:59:32,400-Speed 3318.82 samples/sec   Loss 1.6509   LearningRate 0.0054   Epoch: 15   Global Step: 63520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:59:35,482-Speed 3323.00 samples/sec   Loss 1.6541   LearningRate 0.0054   Epoch: 15   Global Step: 63530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:59:38,567-Speed 3320.20 samples/sec   Loss 1.6592   LearningRate 0.0054   Epoch: 15   Global Step: 63540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:59:41,653-Speed 3318.19 samples/sec   Loss 1.6120   LearningRate 0.0054   Epoch: 15   Global Step: 63550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:59:44,734-Speed 3325.17 samples/sec   Loss 1.6919   LearningRate 0.0054   Epoch: 15   Global Step: 63560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:59:47,831-Speed 3306.35 samples/sec   Loss 1.7177   LearningRate 0.0054   Epoch: 15   Global Step: 63570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:59:50,965-Speed 3269.32 samples/sec   Loss 1.6942   LearningRate 0.0053   Epoch: 15   Global Step: 63580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:59:54,050-Speed 3319.29 samples/sec   Loss 1.7233   LearningRate 0.0053   Epoch: 15   Global Step: 63590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 18:59:57,144-Speed 3311.09 samples/sec   Loss 1.6761   LearningRate 0.0053   Epoch: 15   Global Step: 63600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:00:00,233-Speed 3315.92 samples/sec   Loss 1.6463   LearningRate 0.0053   Epoch: 15   Global Step: 63610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:00:03,340-Speed 3296.16 samples/sec   Loss 1.6369   LearningRate 0.0053   Epoch: 15   Global Step: 63620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:00:06,426-Speed 3318.41 samples/sec   Loss 1.6829   LearningRate 0.0053   Epoch: 15   Global Step: 63630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:00:09,510-Speed 3321.27 samples/sec   Loss 1.6747   LearningRate 0.0053   Epoch: 15   Global Step: 63640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:00:12,573-Speed 3343.40 samples/sec   Loss 1.6802   LearningRate 0.0053   Epoch: 15   Global Step: 63650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:00:15,663-Speed 3315.48 samples/sec   Loss 1.6268   LearningRate 0.0053   Epoch: 15   Global Step: 63660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:00:18,744-Speed 3323.41 samples/sec   Loss 1.6499   LearningRate 0.0053   Epoch: 15   Global Step: 63670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:00:21,828-Speed 3322.39 samples/sec   Loss 1.6461   LearningRate 0.0053   Epoch: 15   Global Step: 63680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:00:24,938-Speed 3293.19 samples/sec   Loss 1.6279   LearningRate 0.0053   Epoch: 15   Global Step: 63690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:00:28,019-Speed 3324.41 samples/sec   Loss 1.6211   LearningRate 0.0053   Epoch: 15   Global Step: 63700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:00:31,104-Speed 3319.68 samples/sec   Loss 1.6503   LearningRate 0.0053   Epoch: 15   Global Step: 63710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:00:34,190-Speed 3318.76 samples/sec   Loss 1.6999   LearningRate 0.0053   Epoch: 15   Global Step: 63720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:00:37,274-Speed 3321.09 samples/sec   Loss 1.6841   LearningRate 0.0053   Epoch: 15   Global Step: 63730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:00:40,358-Speed 3321.21 samples/sec   Loss 1.6659   LearningRate 0.0053   Epoch: 15   Global Step: 63740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:00:43,444-Speed 3318.50 samples/sec   Loss 1.6275   LearningRate 0.0053   Epoch: 15   Global Step: 63750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:00:46,526-Speed 3323.94 samples/sec   Loss 1.6652   LearningRate 0.0052   Epoch: 15   Global Step: 63760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:00:49,615-Speed 3314.94 samples/sec   Loss 1.6858   LearningRate 0.0052   Epoch: 15   Global Step: 63770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:00:52,703-Speed 3317.34 samples/sec   Loss 1.7280   LearningRate 0.0052   Epoch: 15   Global Step: 63780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:00:55,849-Speed 3255.55 samples/sec   Loss 1.6705   LearningRate 0.0052   Epoch: 15   Global Step: 63790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:00:58,940-Speed 3313.76 samples/sec   Loss 1.6066   LearningRate 0.0052   Epoch: 15   Global Step: 63800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:01:02,027-Speed 3317.86 samples/sec   Loss 1.6738   LearningRate 0.0052   Epoch: 15   Global Step: 63810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:01:05,099-Speed 3333.46 samples/sec   Loss 1.6903   LearningRate 0.0052   Epoch: 15   Global Step: 63820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:01:08,184-Speed 3320.16 samples/sec   Loss 1.6471   LearningRate 0.0052   Epoch: 15   Global Step: 63830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:01:11,269-Speed 3320.51 samples/sec   Loss 1.6251   LearningRate 0.0052   Epoch: 15   Global Step: 63840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:01:14,353-Speed 3320.69 samples/sec   Loss 1.7130   LearningRate 0.0052   Epoch: 15   Global Step: 63850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:01:17,440-Speed 3318.22 samples/sec   Loss 1.7066   LearningRate 0.0052   Epoch: 15   Global Step: 63860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:01:20,535-Speed 3308.46 samples/sec   Loss 1.6579   LearningRate 0.0052   Epoch: 15   Global Step: 63870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:01:23,631-Speed 3309.25 samples/sec   Loss 1.6981   LearningRate 0.0052   Epoch: 15   Global Step: 63880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:01:26,721-Speed 3314.81 samples/sec   Loss 1.7306   LearningRate 0.0052   Epoch: 15   Global Step: 63890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:01:29,807-Speed 3318.15 samples/sec   Loss 1.6796   LearningRate 0.0052   Epoch: 15   Global Step: 63900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:01:32,891-Speed 3321.38 samples/sec   Loss 1.7047   LearningRate 0.0052   Epoch: 15   Global Step: 63910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:01:35,979-Speed 3316.86 samples/sec   Loss 1.6992   LearningRate 0.0052   Epoch: 15   Global Step: 63920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:01:39,074-Speed 3309.01 samples/sec   Loss 1.6289   LearningRate 0.0052   Epoch: 15   Global Step: 63930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:01:42,158-Speed 3321.19 samples/sec   Loss 1.7416   LearningRate 0.0051   Epoch: 15   Global Step: 63940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:01:45,250-Speed 3313.18 samples/sec   Loss 1.7200   LearningRate 0.0051   Epoch: 15   Global Step: 63950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:01:48,336-Speed 3318.07 samples/sec   Loss 1.6837   LearningRate 0.0051   Epoch: 15   Global Step: 63960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:01:51,421-Speed 3320.72 samples/sec   Loss 1.6835   LearningRate 0.0051   Epoch: 15   Global Step: 63970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:01:54,505-Speed 3320.62 samples/sec   Loss 1.6625   LearningRate 0.0051   Epoch: 15   Global Step: 63980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:01:57,587-Speed 3323.31 samples/sec   Loss 1.6506   LearningRate 0.0051   Epoch: 15   Global Step: 63990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:02:00,673-Speed 3319.47 samples/sec   Loss 1.6922   LearningRate 0.0051   Epoch: 15   Global Step: 64000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:02:44,331-[lfw][64000]XNorm: 21.660623
Training: 2022-04-26 19:02:44,331-[lfw][64000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-04-26 19:02:44,332-[lfw][64000]Accuracy-Highest: 0.99833
Training: 2022-04-26 19:03:34,740-[cfp_fp][64000]XNorm: 22.001157
Training: 2022-04-26 19:03:34,740-[cfp_fp][64000]Accuracy-Flip: 0.99114+-0.00494
Training: 2022-04-26 19:03:34,741-[cfp_fp][64000]Accuracy-Highest: 0.99129
Training: 2022-04-26 19:04:18,110-[agedb_30][64000]XNorm: 22.273601
Training: 2022-04-26 19:04:18,110-[agedb_30][64000]Accuracy-Flip: 0.97683+-0.00626
Training: 2022-04-26 19:04:18,111-[agedb_30][64000]Accuracy-Highest: 0.97683
Training: 2022-04-26 19:04:21,192-Speed 72.87 samples/sec   Loss 1.7109   LearningRate 0.0051   Epoch: 15   Global Step: 64010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:04:24,250-Speed 3349.87 samples/sec   Loss 1.6945   LearningRate 0.0051   Epoch: 15   Global Step: 64020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:04:27,321-Speed 3334.93 samples/sec   Loss 1.6727   LearningRate 0.0051   Epoch: 15   Global Step: 64030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:04:30,393-Speed 3333.81 samples/sec   Loss 1.6864   LearningRate 0.0051   Epoch: 15   Global Step: 64040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:04:33,473-Speed 3325.61 samples/sec   Loss 1.6626   LearningRate 0.0051   Epoch: 15   Global Step: 64050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:04:36,578-Speed 3298.49 samples/sec   Loss 1.6844   LearningRate 0.0051   Epoch: 15   Global Step: 64060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:04:39,661-Speed 3322.29 samples/sec   Loss 1.6613   LearningRate 0.0051   Epoch: 15   Global Step: 64070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:04:42,747-Speed 3319.17 samples/sec   Loss 1.6833   LearningRate 0.0051   Epoch: 15   Global Step: 64080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:04:45,825-Speed 3326.68 samples/sec   Loss 1.7002   LearningRate 0.0051   Epoch: 15   Global Step: 64090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:04:48,918-Speed 3311.90 samples/sec   Loss 1.6529   LearningRate 0.0051   Epoch: 15   Global Step: 64100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:04:52,014-Speed 3308.80 samples/sec   Loss 1.7056   LearningRate 0.0051   Epoch: 15   Global Step: 64110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:04:55,104-Speed 3314.26 samples/sec   Loss 1.6947   LearningRate 0.0050   Epoch: 15   Global Step: 64120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:04:58,193-Speed 3315.97 samples/sec   Loss 1.6917   LearningRate 0.0050   Epoch: 15   Global Step: 64130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:01,293-Speed 3303.74 samples/sec   Loss 1.7017   LearningRate 0.0050   Epoch: 15   Global Step: 64140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:04,428-Speed 3267.24 samples/sec   Loss 1.7067   LearningRate 0.0050   Epoch: 15   Global Step: 64150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:07,518-Speed 3314.07 samples/sec   Loss 1.6859   LearningRate 0.0050   Epoch: 15   Global Step: 64160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:10,614-Speed 3308.76 samples/sec   Loss 1.7018   LearningRate 0.0050   Epoch: 15   Global Step: 64170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:13,717-Speed 3300.50 samples/sec   Loss 1.6630   LearningRate 0.0050   Epoch: 15   Global Step: 64180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:16,807-Speed 3314.97 samples/sec   Loss 1.6577   LearningRate 0.0050   Epoch: 15   Global Step: 64190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:19,906-Speed 3304.81 samples/sec   Loss 1.7387   LearningRate 0.0050   Epoch: 15   Global Step: 64200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:23,001-Speed 3308.76 samples/sec   Loss 1.6660   LearningRate 0.0050   Epoch: 15   Global Step: 64210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:26,087-Speed 3319.38 samples/sec   Loss 1.6922   LearningRate 0.0050   Epoch: 15   Global Step: 64220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:29,161-Speed 3332.02 samples/sec   Loss 1.6887   LearningRate 0.0050   Epoch: 15   Global Step: 64230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:32,249-Speed 3317.43 samples/sec   Loss 1.6808   LearningRate 0.0050   Epoch: 15   Global Step: 64240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:35,331-Speed 3322.51 samples/sec   Loss 1.6980   LearningRate 0.0050   Epoch: 15   Global Step: 64250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:38,413-Speed 3323.79 samples/sec   Loss 1.7354   LearningRate 0.0050   Epoch: 15   Global Step: 64260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:41,486-Speed 3332.66 samples/sec   Loss 1.7053   LearningRate 0.0050   Epoch: 15   Global Step: 64270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:44,571-Speed 3319.67 samples/sec   Loss 1.7395   LearningRate 0.0050   Epoch: 15   Global Step: 64280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:47,645-Speed 3331.46 samples/sec   Loss 1.6997   LearningRate 0.0050   Epoch: 15   Global Step: 64290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:05:50,706-Speed 3346.51 samples/sec   Loss 1.6532   LearningRate 0.0050   Epoch: 15   Global Step: 64300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:05:53,782-Speed 3329.65 samples/sec   Loss 1.7223   LearningRate 0.0049   Epoch: 15   Global Step: 64310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:05:56,857-Speed 3331.47 samples/sec   Loss 1.6632   LearningRate 0.0049   Epoch: 15   Global Step: 64320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:05:59,936-Speed 3326.59 samples/sec   Loss 1.7271   LearningRate 0.0049   Epoch: 15   Global Step: 64330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:03,010-Speed 3331.70 samples/sec   Loss 1.6962   LearningRate 0.0049   Epoch: 15   Global Step: 64340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:06,082-Speed 3334.09 samples/sec   Loss 1.6996   LearningRate 0.0049   Epoch: 15   Global Step: 64350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:09,151-Speed 3336.53 samples/sec   Loss 1.7138   LearningRate 0.0049   Epoch: 15   Global Step: 64360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:12,229-Speed 3327.52 samples/sec   Loss 1.7271   LearningRate 0.0049   Epoch: 15   Global Step: 64370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:15,303-Speed 3332.52 samples/sec   Loss 1.7686   LearningRate 0.0049   Epoch: 15   Global Step: 64380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:18,369-Speed 3340.08 samples/sec   Loss 1.7714   LearningRate 0.0049   Epoch: 15   Global Step: 64390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:21,437-Speed 3338.84 samples/sec   Loss 1.7087   LearningRate 0.0049   Epoch: 15   Global Step: 64400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:06:24,489-Speed 3355.99 samples/sec   Loss 1.7217   LearningRate 0.0049   Epoch: 15   Global Step: 64410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:27,552-Speed 3344.08 samples/sec   Loss 1.7583   LearningRate 0.0049   Epoch: 15   Global Step: 64420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:30,622-Speed 3335.91 samples/sec   Loss 1.6939   LearningRate 0.0049   Epoch: 15   Global Step: 64430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:33,695-Speed 3333.77 samples/sec   Loss 1.7202   LearningRate 0.0049   Epoch: 15   Global Step: 64440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:36,769-Speed 3331.74 samples/sec   Loss 1.7152   LearningRate 0.0049   Epoch: 15   Global Step: 64450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:39,847-Speed 3327.42 samples/sec   Loss 1.6948   LearningRate 0.0049   Epoch: 15   Global Step: 64460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:42,925-Speed 3327.05 samples/sec   Loss 1.7374   LearningRate 0.0049   Epoch: 15   Global Step: 64470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:46,003-Speed 3328.09 samples/sec   Loss 1.7218   LearningRate 0.0049   Epoch: 15   Global Step: 64480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:49,081-Speed 3326.53 samples/sec   Loss 1.7451   LearningRate 0.0048   Epoch: 15   Global Step: 64490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:52,161-Speed 3325.82 samples/sec   Loss 1.7779   LearningRate 0.0048   Epoch: 15   Global Step: 64500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:55,243-Speed 3324.03 samples/sec   Loss 1.7113   LearningRate 0.0048   Epoch: 15   Global Step: 64510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:06:58,311-Speed 3338.65 samples/sec   Loss 1.6893   LearningRate 0.0048   Epoch: 15   Global Step: 64520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:07:01,385-Speed 3331.98 samples/sec   Loss 1.6841   LearningRate 0.0048   Epoch: 15   Global Step: 64530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:07:04,454-Speed 3336.36 samples/sec   Loss 1.6938   LearningRate 0.0048   Epoch: 15   Global Step: 64540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:07:07,526-Speed 3334.04 samples/sec   Loss 1.6841   LearningRate 0.0048   Epoch: 15   Global Step: 64550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:07:10,596-Speed 3336.56 samples/sec   Loss 1.7462   LearningRate 0.0048   Epoch: 15   Global Step: 64560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:07:13,676-Speed 3325.35 samples/sec   Loss 1.7621   LearningRate 0.0048   Epoch: 15   Global Step: 64570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:07:16,748-Speed 3334.18 samples/sec   Loss 1.7304   LearningRate 0.0048   Epoch: 15   Global Step: 64580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:07:19,819-Speed 3335.38 samples/sec   Loss 1.7132   LearningRate 0.0048   Epoch: 15   Global Step: 64590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:07:22,901-Speed 3323.59 samples/sec   Loss 1.7714   LearningRate 0.0048   Epoch: 15   Global Step: 64600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:07:25,981-Speed 3325.22 samples/sec   Loss 1.6487   LearningRate 0.0048   Epoch: 15   Global Step: 64610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:07:29,072-Speed 3313.22 samples/sec   Loss 1.7167   LearningRate 0.0048   Epoch: 15   Global Step: 64620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:07:32,143-Speed 3335.30 samples/sec   Loss 1.7038   LearningRate 0.0048   Epoch: 15   Global Step: 64630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:07:35,218-Speed 3330.96 samples/sec   Loss 1.7300   LearningRate 0.0048   Epoch: 15   Global Step: 64640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:07:38,292-Speed 3331.21 samples/sec   Loss 1.7430   LearningRate 0.0048   Epoch: 15   Global Step: 64650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:07:41,373-Speed 3324.97 samples/sec   Loss 1.8155   LearningRate 0.0048   Epoch: 15   Global Step: 64660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:07:44,451-Speed 3326.93 samples/sec   Loss 1.6847   LearningRate 0.0048   Epoch: 15   Global Step: 64670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:07:47,536-Speed 3320.15 samples/sec   Loss 1.7584   LearningRate 0.0047   Epoch: 15   Global Step: 64680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:07:50,622-Speed 3319.70 samples/sec   Loss 1.7173   LearningRate 0.0047   Epoch: 15   Global Step: 64690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:07:53,694-Speed 3333.33 samples/sec   Loss 1.7457   LearningRate 0.0047   Epoch: 15   Global Step: 64700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:07:56,781-Speed 3318.39 samples/sec   Loss 1.7298   LearningRate 0.0047   Epoch: 15   Global Step: 64710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:07:59,859-Speed 3327.27 samples/sec   Loss 1.6980   LearningRate 0.0047   Epoch: 15   Global Step: 64720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:08:02,935-Speed 3330.13 samples/sec   Loss 1.6763   LearningRate 0.0047   Epoch: 15   Global Step: 64730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:08:06,014-Speed 3326.74 samples/sec   Loss 1.6960   LearningRate 0.0047   Epoch: 15   Global Step: 64740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:08:09,086-Speed 3333.64 samples/sec   Loss 1.6942   LearningRate 0.0047   Epoch: 15   Global Step: 64750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:08:12,167-Speed 3323.75 samples/sec   Loss 1.7416   LearningRate 0.0047   Epoch: 15   Global Step: 64760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:08:15,249-Speed 3323.71 samples/sec   Loss 1.7265   LearningRate 0.0047   Epoch: 15   Global Step: 64770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:08:18,323-Speed 3331.88 samples/sec   Loss 1.7820   LearningRate 0.0047   Epoch: 15   Global Step: 64780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:08:21,400-Speed 3328.35 samples/sec   Loss 1.6803   LearningRate 0.0047   Epoch: 15   Global Step: 64790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:08:24,470-Speed 3336.43 samples/sec   Loss 1.7132   LearningRate 0.0047   Epoch: 15   Global Step: 64800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:08:27,549-Speed 3326.55 samples/sec   Loss 1.7277   LearningRate 0.0047   Epoch: 15   Global Step: 64810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:08:30,623-Speed 3331.91 samples/sec   Loss 1.7078   LearningRate 0.0047   Epoch: 15   Global Step: 64820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:08:33,698-Speed 3331.08 samples/sec   Loss 1.7180   LearningRate 0.0047   Epoch: 15   Global Step: 64830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:08:36,770-Speed 3333.91 samples/sec   Loss 1.7059   LearningRate 0.0047   Epoch: 15   Global Step: 64840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:08:39,830-Speed 3347.04 samples/sec   Loss 1.6846   LearningRate 0.0047   Epoch: 15   Global Step: 64850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:08:42,907-Speed 3328.92 samples/sec   Loss 1.7152   LearningRate 0.0047   Epoch: 15   Global Step: 64860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:08:45,991-Speed 3320.17 samples/sec   Loss 1.6991   LearningRate 0.0046   Epoch: 15   Global Step: 64870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:08:49,075-Speed 3321.71 samples/sec   Loss 1.7143   LearningRate 0.0046   Epoch: 15   Global Step: 64880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:08:52,177-Speed 3301.90 samples/sec   Loss 1.7290   LearningRate 0.0046   Epoch: 15   Global Step: 64890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:08:55,254-Speed 3328.24 samples/sec   Loss 1.7193   LearningRate 0.0046   Epoch: 15   Global Step: 64900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:08:58,344-Speed 3314.43 samples/sec   Loss 1.7625   LearningRate 0.0046   Epoch: 15   Global Step: 64910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:09:01,423-Speed 3327.32 samples/sec   Loss 1.6587   LearningRate 0.0046   Epoch: 15   Global Step: 64920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:09:04,503-Speed 3325.14 samples/sec   Loss 1.7684   LearningRate 0.0046   Epoch: 15   Global Step: 64930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:09:07,578-Speed 3331.03 samples/sec   Loss 1.7642   LearningRate 0.0046   Epoch: 15   Global Step: 64940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:09:10,653-Speed 3330.46 samples/sec   Loss 1.7547   LearningRate 0.0046   Epoch: 15   Global Step: 64950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:09:13,718-Speed 3341.90 samples/sec   Loss 1.7005   LearningRate 0.0046   Epoch: 15   Global Step: 64960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:09:16,791-Speed 3332.73 samples/sec   Loss 1.7500   LearningRate 0.0046   Epoch: 15   Global Step: 64970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:09:19,866-Speed 3330.96 samples/sec   Loss 1.6835   LearningRate 0.0046   Epoch: 15   Global Step: 64980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:09:22,943-Speed 3329.42 samples/sec   Loss 1.6803   LearningRate 0.0046   Epoch: 15   Global Step: 64990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:09:26,028-Speed 3319.12 samples/sec   Loss 1.7207   LearningRate 0.0046   Epoch: 15   Global Step: 65000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:09:29,107-Speed 3326.99 samples/sec   Loss 1.6416   LearningRate 0.0046   Epoch: 15   Global Step: 65010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:09:32,179-Speed 3334.22 samples/sec   Loss 1.7159   LearningRate 0.0046   Epoch: 15   Global Step: 65020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:09:35,253-Speed 3332.20 samples/sec   Loss 1.6731   LearningRate 0.0046   Epoch: 15   Global Step: 65030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:09:38,328-Speed 3330.34 samples/sec   Loss 1.7986   LearningRate 0.0046   Epoch: 15   Global Step: 65040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:09:41,408-Speed 3324.86 samples/sec   Loss 1.7269   LearningRate 0.0046   Epoch: 15   Global Step: 65050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:09:44,487-Speed 3326.71 samples/sec   Loss 1.7610   LearningRate 0.0045   Epoch: 15   Global Step: 65060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:09:47,564-Speed 3328.68 samples/sec   Loss 1.7188   LearningRate 0.0045   Epoch: 15   Global Step: 65070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:09:50,643-Speed 3326.92 samples/sec   Loss 1.7498   LearningRate 0.0045   Epoch: 15   Global Step: 65080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:09:53,722-Speed 3326.96 samples/sec   Loss 1.7342   LearningRate 0.0045   Epoch: 15   Global Step: 65090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:09:56,796-Speed 3331.31 samples/sec   Loss 1.7594   LearningRate 0.0045   Epoch: 15   Global Step: 65100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:09:59,872-Speed 3329.68 samples/sec   Loss 1.7285   LearningRate 0.0045   Epoch: 15   Global Step: 65110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:10:02,951-Speed 3326.86 samples/sec   Loss 1.7331   LearningRate 0.0045   Epoch: 15   Global Step: 65120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:10:06,032-Speed 3324.20 samples/sec   Loss 1.7280   LearningRate 0.0045   Epoch: 15   Global Step: 65130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:10:09,108-Speed 3329.02 samples/sec   Loss 1.7548   LearningRate 0.0045   Epoch: 15   Global Step: 65140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:10:12,208-Speed 3304.16 samples/sec   Loss 1.7226   LearningRate 0.0045   Epoch: 15   Global Step: 65150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:10:15,293-Speed 3319.63 samples/sec   Loss 1.7072   LearningRate 0.0045   Epoch: 15   Global Step: 65160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:10:18,371-Speed 3327.79 samples/sec   Loss 1.7269   LearningRate 0.0045   Epoch: 15   Global Step: 65170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:10:21,476-Speed 3299.06 samples/sec   Loss 1.7381   LearningRate 0.0045   Epoch: 15   Global Step: 65180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:10:24,584-Speed 3296.12 samples/sec   Loss 1.6638   LearningRate 0.0045   Epoch: 15   Global Step: 65190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:10:27,661-Speed 3327.95 samples/sec   Loss 1.7550   LearningRate 0.0045   Epoch: 15   Global Step: 65200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:10:30,742-Speed 3324.71 samples/sec   Loss 1.7483   LearningRate 0.0045   Epoch: 15   Global Step: 65210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:10:33,822-Speed 3325.58 samples/sec   Loss 1.8164   LearningRate 0.0045   Epoch: 15   Global Step: 65220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:10:36,910-Speed 3316.36 samples/sec   Loss 1.6566   LearningRate 0.0045   Epoch: 15   Global Step: 65230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:10:40,076-Speed 3235.22 samples/sec   Loss 1.6985   LearningRate 0.0045   Epoch: 15   Global Step: 65240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:10:43,153-Speed 3328.40 samples/sec   Loss 1.6800   LearningRate 0.0045   Epoch: 15   Global Step: 65250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:10:46,253-Speed 3303.58 samples/sec   Loss 1.7333   LearningRate 0.0044   Epoch: 15   Global Step: 65260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:10:49,355-Speed 3303.03 samples/sec   Loss 1.7641   LearningRate 0.0044   Epoch: 15   Global Step: 65270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:10:52,499-Speed 3256.97 samples/sec   Loss 1.6870   LearningRate 0.0044   Epoch: 15   Global Step: 65280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:10:55,606-Speed 3296.78 samples/sec   Loss 1.7578   LearningRate 0.0044   Epoch: 15   Global Step: 65290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:10:58,733-Speed 3275.73 samples/sec   Loss 1.7389   LearningRate 0.0044   Epoch: 15   Global Step: 65300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:11:01,818-Speed 3319.53 samples/sec   Loss 1.7160   LearningRate 0.0044   Epoch: 15   Global Step: 65310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:11:04,901-Speed 3322.60 samples/sec   Loss 1.6871   LearningRate 0.0044   Epoch: 15   Global Step: 65320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:11:07,983-Speed 3323.44 samples/sec   Loss 1.7486   LearningRate 0.0044   Epoch: 15   Global Step: 65330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:11:11,064-Speed 3323.89 samples/sec   Loss 1.7344   LearningRate 0.0044   Epoch: 15   Global Step: 65340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:11:14,141-Speed 3328.42 samples/sec   Loss 1.7468   LearningRate 0.0044   Epoch: 15   Global Step: 65350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:11:17,229-Speed 3316.45 samples/sec   Loss 1.6999   LearningRate 0.0044   Epoch: 15   Global Step: 65360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:11:20,316-Speed 3317.99 samples/sec   Loss 1.7407   LearningRate 0.0044   Epoch: 15   Global Step: 65370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:11:23,399-Speed 3323.06 samples/sec   Loss 1.7893   LearningRate 0.0044   Epoch: 15   Global Step: 65380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:11:26,484-Speed 3319.46 samples/sec   Loss 1.7054   LearningRate 0.0044   Epoch: 15   Global Step: 65390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:11:29,607-Speed 3279.50 samples/sec   Loss 1.7556   LearningRate 0.0044   Epoch: 15   Global Step: 65400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:11:32,687-Speed 3325.53 samples/sec   Loss 1.7740   LearningRate 0.0044   Epoch: 15   Global Step: 65410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:11:35,770-Speed 3322.62 samples/sec   Loss 1.7304   LearningRate 0.0044   Epoch: 15   Global Step: 65420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:11:38,859-Speed 3314.88 samples/sec   Loss 1.7493   LearningRate 0.0044   Epoch: 15   Global Step: 65430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:11:41,926-Speed 3340.01 samples/sec   Loss 1.7268   LearningRate 0.0044   Epoch: 15   Global Step: 65440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:11:45,005-Speed 3326.58 samples/sec   Loss 1.7844   LearningRate 0.0044   Epoch: 15   Global Step: 65450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:11:48,085-Speed 3325.09 samples/sec   Loss 1.7667   LearningRate 0.0043   Epoch: 15   Global Step: 65460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:11:51,171-Speed 3318.97 samples/sec   Loss 1.7436   LearningRate 0.0043   Epoch: 15   Global Step: 65470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:11:54,257-Speed 3319.60 samples/sec   Loss 1.7623   LearningRate 0.0043   Epoch: 15   Global Step: 65480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:11:57,335-Speed 3327.18 samples/sec   Loss 1.6994   LearningRate 0.0043   Epoch: 15   Global Step: 65490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:12:00,415-Speed 3325.33 samples/sec   Loss 1.7377   LearningRate 0.0043   Epoch: 15   Global Step: 65500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:12:03,498-Speed 3321.95 samples/sec   Loss 1.7357   LearningRate 0.0043   Epoch: 15   Global Step: 65510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:12:06,583-Speed 3320.20 samples/sec   Loss 1.6841   LearningRate 0.0043   Epoch: 15   Global Step: 65520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:12:09,671-Speed 3317.03 samples/sec   Loss 1.6962   LearningRate 0.0043   Epoch: 15   Global Step: 65530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:12:12,762-Speed 3312.77 samples/sec   Loss 1.8198   LearningRate 0.0043   Epoch: 15   Global Step: 65540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:15,875-Speed 3290.24 samples/sec   Loss 1.7277   LearningRate 0.0043   Epoch: 15   Global Step: 65550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:18,975-Speed 3304.28 samples/sec   Loss 1.7565   LearningRate 0.0043   Epoch: 15   Global Step: 65560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:22,066-Speed 3314.32 samples/sec   Loss 1.7024   LearningRate 0.0043   Epoch: 15   Global Step: 65570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:25,150-Speed 3320.35 samples/sec   Loss 1.7355   LearningRate 0.0043   Epoch: 15   Global Step: 65580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:28,230-Speed 3325.62 samples/sec   Loss 1.7020   LearningRate 0.0043   Epoch: 15   Global Step: 65590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:31,309-Speed 3326.01 samples/sec   Loss 1.7454   LearningRate 0.0043   Epoch: 15   Global Step: 65600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:34,414-Speed 3299.39 samples/sec   Loss 1.7966   LearningRate 0.0043   Epoch: 15   Global Step: 65610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:37,507-Speed 3310.74 samples/sec   Loss 1.7166   LearningRate 0.0043   Epoch: 15   Global Step: 65620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:40,588-Speed 3324.14 samples/sec   Loss 1.6901   LearningRate 0.0043   Epoch: 15   Global Step: 65630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:43,665-Speed 3329.80 samples/sec   Loss 1.6831   LearningRate 0.0043   Epoch: 15   Global Step: 65640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:46,755-Speed 3314.28 samples/sec   Loss 1.7531   LearningRate 0.0043   Epoch: 15   Global Step: 65650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:49,841-Speed 3318.58 samples/sec   Loss 1.7418   LearningRate 0.0042   Epoch: 15   Global Step: 65660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:52,932-Speed 3314.11 samples/sec   Loss 1.7571   LearningRate 0.0042   Epoch: 15   Global Step: 65670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:56,021-Speed 3321.13 samples/sec   Loss 1.7327   LearningRate 0.0042   Epoch: 15   Global Step: 65680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:12:59,164-Speed 3258.50 samples/sec   Loss 1.7577   LearningRate 0.0042   Epoch: 15   Global Step: 65690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:13:02,265-Speed 3303.08 samples/sec   Loss 1.8243   LearningRate 0.0042   Epoch: 15   Global Step: 65700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:13:05,355-Speed 3315.06 samples/sec   Loss 1.7352   LearningRate 0.0042   Epoch: 15   Global Step: 65710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:13:08,442-Speed 3318.08 samples/sec   Loss 1.7505   LearningRate 0.0042   Epoch: 15   Global Step: 65720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:13:11,527-Speed 3319.49 samples/sec   Loss 1.7635   LearningRate 0.0042   Epoch: 15   Global Step: 65730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:13:14,614-Speed 3318.15 samples/sec   Loss 1.7458   LearningRate 0.0042   Epoch: 15   Global Step: 65740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:13:17,701-Speed 3318.25 samples/sec   Loss 1.7500   LearningRate 0.0042   Epoch: 15   Global Step: 65750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:13:20,786-Speed 3319.88 samples/sec   Loss 1.7315   LearningRate 0.0042   Epoch: 15   Global Step: 65760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:13:23,885-Speed 3305.24 samples/sec   Loss 1.8149   LearningRate 0.0042   Epoch: 15   Global Step: 65770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:13:26,973-Speed 3316.87 samples/sec   Loss 1.7373   LearningRate 0.0042   Epoch: 15   Global Step: 65780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:13:30,061-Speed 3316.32 samples/sec   Loss 1.7799   LearningRate 0.0042   Epoch: 15   Global Step: 65790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:13:33,151-Speed 3315.06 samples/sec   Loss 1.6882   LearningRate 0.0042   Epoch: 15   Global Step: 65800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:13:36,238-Speed 3317.76 samples/sec   Loss 1.7256   LearningRate 0.0042   Epoch: 15   Global Step: 65810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:13:39,329-Speed 3313.19 samples/sec   Loss 1.7115   LearningRate 0.0042   Epoch: 15   Global Step: 65820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:13:42,418-Speed 3315.98 samples/sec   Loss 1.6928   LearningRate 0.0042   Epoch: 15   Global Step: 65830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:13:45,503-Speed 3319.07 samples/sec   Loss 1.7405   LearningRate 0.0042   Epoch: 15   Global Step: 65840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:13:48,593-Speed 3315.87 samples/sec   Loss 1.7632   LearningRate 0.0042   Epoch: 15   Global Step: 65850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:13:51,678-Speed 3319.85 samples/sec   Loss 1.7320   LearningRate 0.0041   Epoch: 15   Global Step: 65860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:13:54,747-Speed 3337.31 samples/sec   Loss 1.7236   LearningRate 0.0041   Epoch: 15   Global Step: 65870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:13:57,838-Speed 3313.24 samples/sec   Loss 1.7115   LearningRate 0.0041   Epoch: 15   Global Step: 65880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:14:00,923-Speed 3319.82 samples/sec   Loss 1.8090   LearningRate 0.0041   Epoch: 15   Global Step: 65890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:14:04,007-Speed 3321.26 samples/sec   Loss 1.8288   LearningRate 0.0041   Epoch: 15   Global Step: 65900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:14:07,099-Speed 3312.73 samples/sec   Loss 1.7953   LearningRate 0.0041   Epoch: 15   Global Step: 65910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:14:10,187-Speed 3316.74 samples/sec   Loss 1.7466   LearningRate 0.0041   Epoch: 15   Global Step: 65920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:14:13,271-Speed 3320.69 samples/sec   Loss 1.7322   LearningRate 0.0041   Epoch: 15   Global Step: 65930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:14:16,356-Speed 3320.45 samples/sec   Loss 1.7577   LearningRate 0.0041   Epoch: 15   Global Step: 65940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:14:19,447-Speed 3314.54 samples/sec   Loss 1.7884   LearningRate 0.0041   Epoch: 15   Global Step: 65950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:14:22,535-Speed 3316.56 samples/sec   Loss 1.7446   LearningRate 0.0041   Epoch: 15   Global Step: 65960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:14:25,622-Speed 3317.05 samples/sec   Loss 1.7772   LearningRate 0.0041   Epoch: 15   Global Step: 65970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:14:28,719-Speed 3307.17 samples/sec   Loss 1.7759   LearningRate 0.0041   Epoch: 15   Global Step: 65980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:14:31,816-Speed 3307.25 samples/sec   Loss 1.7683   LearningRate 0.0041   Epoch: 15   Global Step: 65990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:14:34,917-Speed 3303.04 samples/sec   Loss 1.7637   LearningRate 0.0041   Epoch: 15   Global Step: 66000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:15:18,496-[lfw][66000]XNorm: 21.606378
Training: 2022-04-26 19:15:18,497-[lfw][66000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-04-26 19:15:18,497-[lfw][66000]Accuracy-Highest: 0.99833
Training: 2022-04-26 19:16:09,290-[cfp_fp][66000]XNorm: 21.855225
Training: 2022-04-26 19:16:09,291-[cfp_fp][66000]Accuracy-Flip: 0.99200+-0.00504
Training: 2022-04-26 19:16:09,291-[cfp_fp][66000]Accuracy-Highest: 0.99200
Training: 2022-04-26 19:16:52,887-[agedb_30][66000]XNorm: 22.236197
Training: 2022-04-26 19:16:52,888-[agedb_30][66000]Accuracy-Flip: 0.97950+-0.00654
Training: 2022-04-26 19:16:52,888-[agedb_30][66000]Accuracy-Highest: 0.97950
Training: 2022-04-26 19:16:55,941-Speed 72.61 samples/sec   Loss 1.6885   LearningRate 0.0041   Epoch: 15   Global Step: 66010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:16:59,009-Speed 3338.71 samples/sec   Loss 1.7469   LearningRate 0.0041   Epoch: 15   Global Step: 66020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:17:02,109-Speed 3303.80 samples/sec   Loss 1.7801   LearningRate 0.0041   Epoch: 15   Global Step: 66030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:17:05,185-Speed 3329.41 samples/sec   Loss 1.7391   LearningRate 0.0041   Epoch: 15   Global Step: 66040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:17:08,260-Speed 3331.21 samples/sec   Loss 1.6852   LearningRate 0.0041   Epoch: 15   Global Step: 66050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:17:11,340-Speed 3324.74 samples/sec   Loss 1.7730   LearningRate 0.0040   Epoch: 15   Global Step: 66060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:17:14,425-Speed 3320.84 samples/sec   Loss 1.7647   LearningRate 0.0040   Epoch: 15   Global Step: 66070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:17:17,499-Speed 3332.02 samples/sec   Loss 1.7125   LearningRate 0.0040   Epoch: 15   Global Step: 66080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:17:20,574-Speed 3331.04 samples/sec   Loss 1.7471   LearningRate 0.0040   Epoch: 15   Global Step: 66090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:17:23,653-Speed 3325.79 samples/sec   Loss 1.7611   LearningRate 0.0040   Epoch: 15   Global Step: 66100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:17:26,747-Speed 3310.42 samples/sec   Loss 1.7651   LearningRate 0.0040   Epoch: 15   Global Step: 66110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:17:29,893-Speed 3255.38 samples/sec   Loss 1.7829   LearningRate 0.0040   Epoch: 15   Global Step: 66120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:17:32,981-Speed 3317.44 samples/sec   Loss 1.7506   LearningRate 0.0040   Epoch: 15   Global Step: 66130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:17:36,062-Speed 3324.49 samples/sec   Loss 1.7221   LearningRate 0.0040   Epoch: 15   Global Step: 66140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:17:39,204-Speed 3259.64 samples/sec   Loss 1.7691   LearningRate 0.0040   Epoch: 15   Global Step: 66150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:17:42,288-Speed 3320.23 samples/sec   Loss 1.7790   LearningRate 0.0040   Epoch: 15   Global Step: 66160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:17:54,843-Speed 815.71 samples/sec   Loss 1.0828   LearningRate 0.0040   Epoch: 16   Global Step: 66170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:17:57,939-Speed 3308.04 samples/sec   Loss 1.0288   LearningRate 0.0040   Epoch: 16   Global Step: 66180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:18:01,021-Speed 3324.31 samples/sec   Loss 1.0340   LearningRate 0.0040   Epoch: 16   Global Step: 66190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:18:04,086-Speed 3340.97 samples/sec   Loss 1.0005   LearningRate 0.0040   Epoch: 16   Global Step: 66200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:18:07,168-Speed 3323.31 samples/sec   Loss 1.0177   LearningRate 0.0040   Epoch: 16   Global Step: 66210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:18:10,276-Speed 3295.24 samples/sec   Loss 1.0267   LearningRate 0.0040   Epoch: 16   Global Step: 66220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:18:13,384-Speed 3295.46 samples/sec   Loss 1.0122   LearningRate 0.0040   Epoch: 16   Global Step: 66230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:18:16,473-Speed 3315.26 samples/sec   Loss 0.9839   LearningRate 0.0040   Epoch: 16   Global Step: 66240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:18:19,557-Speed 3321.56 samples/sec   Loss 1.0363   LearningRate 0.0040   Epoch: 16   Global Step: 66250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:18:22,633-Speed 3330.36 samples/sec   Loss 1.0266   LearningRate 0.0040   Epoch: 16   Global Step: 66260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:18:25,717-Speed 3321.59 samples/sec   Loss 1.0170   LearningRate 0.0039   Epoch: 16   Global Step: 66270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:18:28,809-Speed 3312.30 samples/sec   Loss 1.0704   LearningRate 0.0039   Epoch: 16   Global Step: 66280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:18:31,883-Speed 3331.56 samples/sec   Loss 1.0407   LearningRate 0.0039   Epoch: 16   Global Step: 66290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:18:34,960-Speed 3328.84 samples/sec   Loss 1.0659   LearningRate 0.0039   Epoch: 16   Global Step: 66300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:18:38,043-Speed 3322.37 samples/sec   Loss 1.0212   LearningRate 0.0039   Epoch: 16   Global Step: 66310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:18:41,121-Speed 3326.83 samples/sec   Loss 1.0275   LearningRate 0.0039   Epoch: 16   Global Step: 66320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:18:44,203-Speed 3324.05 samples/sec   Loss 1.0811   LearningRate 0.0039   Epoch: 16   Global Step: 66330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:18:47,285-Speed 3322.15 samples/sec   Loss 1.0536   LearningRate 0.0039   Epoch: 16   Global Step: 66340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:18:50,375-Speed 3315.73 samples/sec   Loss 1.0414   LearningRate 0.0039   Epoch: 16   Global Step: 66350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:18:53,510-Speed 3267.16 samples/sec   Loss 1.0488   LearningRate 0.0039   Epoch: 16   Global Step: 66360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:18:56,589-Speed 3326.24 samples/sec   Loss 1.0225   LearningRate 0.0039   Epoch: 16   Global Step: 66370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:18:59,767-Speed 3222.77 samples/sec   Loss 1.0496   LearningRate 0.0039   Epoch: 16   Global Step: 66380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:02,850-Speed 3321.92 samples/sec   Loss 1.0574   LearningRate 0.0039   Epoch: 16   Global Step: 66390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:05,930-Speed 3325.54 samples/sec   Loss 1.0631   LearningRate 0.0039   Epoch: 16   Global Step: 66400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:09,006-Speed 3329.59 samples/sec   Loss 1.0821   LearningRate 0.0039   Epoch: 16   Global Step: 66410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:12,088-Speed 3322.95 samples/sec   Loss 1.0243   LearningRate 0.0039   Epoch: 16   Global Step: 66420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:15,174-Speed 3319.31 samples/sec   Loss 1.0582   LearningRate 0.0039   Epoch: 16   Global Step: 66430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:18,272-Speed 3306.16 samples/sec   Loss 1.0027   LearningRate 0.0039   Epoch: 16   Global Step: 66440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:21,354-Speed 3323.58 samples/sec   Loss 1.0370   LearningRate 0.0039   Epoch: 16   Global Step: 66450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:24,462-Speed 3295.63 samples/sec   Loss 1.0645   LearningRate 0.0039   Epoch: 16   Global Step: 66460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:27,548-Speed 3319.06 samples/sec   Loss 1.0605   LearningRate 0.0039   Epoch: 16   Global Step: 66470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:30,626-Speed 3327.42 samples/sec   Loss 1.0645   LearningRate 0.0038   Epoch: 16   Global Step: 66480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:19:33,693-Speed 3339.56 samples/sec   Loss 1.0338   LearningRate 0.0038   Epoch: 16   Global Step: 66490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:36,766-Speed 3333.14 samples/sec   Loss 1.0722   LearningRate 0.0038   Epoch: 16   Global Step: 66500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:39,839-Speed 3332.96 samples/sec   Loss 1.0490   LearningRate 0.0038   Epoch: 16   Global Step: 66510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:42,911-Speed 3333.71 samples/sec   Loss 1.0514   LearningRate 0.0038   Epoch: 16   Global Step: 66520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:45,995-Speed 3322.31 samples/sec   Loss 1.0516   LearningRate 0.0038   Epoch: 16   Global Step: 66530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:49,085-Speed 3314.74 samples/sec   Loss 1.0582   LearningRate 0.0038   Epoch: 16   Global Step: 66540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:52,168-Speed 3322.20 samples/sec   Loss 1.0900   LearningRate 0.0038   Epoch: 16   Global Step: 66550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:55,252-Speed 3320.85 samples/sec   Loss 1.0757   LearningRate 0.0038   Epoch: 16   Global Step: 66560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:19:58,327-Speed 3330.09 samples/sec   Loss 1.1043   LearningRate 0.0038   Epoch: 16   Global Step: 66570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:20:01,400-Speed 3333.22 samples/sec   Loss 1.0654   LearningRate 0.0038   Epoch: 16   Global Step: 66580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:20:04,482-Speed 3323.57 samples/sec   Loss 0.9913   LearningRate 0.0038   Epoch: 16   Global Step: 66590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:20:07,557-Speed 3330.97 samples/sec   Loss 1.0689   LearningRate 0.0038   Epoch: 16   Global Step: 66600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:20:10,641-Speed 3320.64 samples/sec   Loss 1.0486   LearningRate 0.0038   Epoch: 16   Global Step: 66610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:20:13,723-Speed 3323.71 samples/sec   Loss 1.1260   LearningRate 0.0038   Epoch: 16   Global Step: 66620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:20:16,807-Speed 3320.99 samples/sec   Loss 1.0758   LearningRate 0.0038   Epoch: 16   Global Step: 66630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:20:19,881-Speed 3332.87 samples/sec   Loss 1.1153   LearningRate 0.0038   Epoch: 16   Global Step: 66640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:20:22,957-Speed 3329.29 samples/sec   Loss 1.0431   LearningRate 0.0038   Epoch: 16   Global Step: 66650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:20:26,029-Speed 3333.35 samples/sec   Loss 1.0953   LearningRate 0.0038   Epoch: 16   Global Step: 66660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:20:29,089-Speed 3347.92 samples/sec   Loss 1.0709   LearningRate 0.0038   Epoch: 16   Global Step: 66670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:20:32,164-Speed 3330.91 samples/sec   Loss 1.0855   LearningRate 0.0038   Epoch: 16   Global Step: 66680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:20:35,239-Speed 3330.32 samples/sec   Loss 1.1096   LearningRate 0.0037   Epoch: 16   Global Step: 66690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:20:38,427-Speed 3212.84 samples/sec   Loss 1.1075   LearningRate 0.0037   Epoch: 16   Global Step: 66700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:20:41,548-Speed 3281.54 samples/sec   Loss 1.0761   LearningRate 0.0037   Epoch: 16   Global Step: 66710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:20:44,625-Speed 3328.63 samples/sec   Loss 1.0695   LearningRate 0.0037   Epoch: 16   Global Step: 66720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:20:47,705-Speed 3326.36 samples/sec   Loss 1.0860   LearningRate 0.0037   Epoch: 16   Global Step: 66730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:20:50,787-Speed 3323.18 samples/sec   Loss 1.1141   LearningRate 0.0037   Epoch: 16   Global Step: 66740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:20:53,862-Speed 3330.10 samples/sec   Loss 1.1077   LearningRate 0.0037   Epoch: 16   Global Step: 66750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:20:56,946-Speed 3321.11 samples/sec   Loss 1.0215   LearningRate 0.0037   Epoch: 16   Global Step: 66760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:21:00,025-Speed 3326.47 samples/sec   Loss 1.0888   LearningRate 0.0037   Epoch: 16   Global Step: 66770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:03,106-Speed 3324.99 samples/sec   Loss 1.0723   LearningRate 0.0037   Epoch: 16   Global Step: 66780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:06,184-Speed 3327.49 samples/sec   Loss 1.0991   LearningRate 0.0037   Epoch: 16   Global Step: 66790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:09,266-Speed 3322.52 samples/sec   Loss 1.0701   LearningRate 0.0037   Epoch: 16   Global Step: 66800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:12,345-Speed 3326.59 samples/sec   Loss 1.0901   LearningRate 0.0037   Epoch: 16   Global Step: 66810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:15,422-Speed 3328.37 samples/sec   Loss 1.1937   LearningRate 0.0037   Epoch: 16   Global Step: 66820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:18,503-Speed 3324.87 samples/sec   Loss 1.0757   LearningRate 0.0037   Epoch: 16   Global Step: 66830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:21,581-Speed 3328.49 samples/sec   Loss 1.1563   LearningRate 0.0037   Epoch: 16   Global Step: 66840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:24,686-Speed 3297.84 samples/sec   Loss 1.0795   LearningRate 0.0037   Epoch: 16   Global Step: 66850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:27,762-Speed 3329.71 samples/sec   Loss 1.0897   LearningRate 0.0037   Epoch: 16   Global Step: 66860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:30,840-Speed 3327.33 samples/sec   Loss 1.1336   LearningRate 0.0037   Epoch: 16   Global Step: 66870   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-26 19:21:33,905-Speed 3342.06 samples/sec   Loss 1.0830   LearningRate 0.0037   Epoch: 16   Global Step: 66880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:36,993-Speed 3316.52 samples/sec   Loss 1.1286   LearningRate 0.0037   Epoch: 16   Global Step: 66890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:40,067-Speed 3331.87 samples/sec   Loss 1.1029   LearningRate 0.0037   Epoch: 16   Global Step: 66900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:43,141-Speed 3332.00 samples/sec   Loss 1.0838   LearningRate 0.0036   Epoch: 16   Global Step: 66910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:46,218-Speed 3329.05 samples/sec   Loss 1.1068   LearningRate 0.0036   Epoch: 16   Global Step: 66920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:49,295-Speed 3329.01 samples/sec   Loss 1.1138   LearningRate 0.0036   Epoch: 16   Global Step: 66930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:52,367-Speed 3334.05 samples/sec   Loss 1.0913   LearningRate 0.0036   Epoch: 16   Global Step: 66940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:55,451-Speed 3320.56 samples/sec   Loss 1.0772   LearningRate 0.0036   Epoch: 16   Global Step: 66950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:21:58,526-Speed 3331.33 samples/sec   Loss 1.0428   LearningRate 0.0036   Epoch: 16   Global Step: 66960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:22:01,604-Speed 3327.67 samples/sec   Loss 1.1159   LearningRate 0.0036   Epoch: 16   Global Step: 66970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:22:04,662-Speed 3349.08 samples/sec   Loss 1.0622   LearningRate 0.0036   Epoch: 16   Global Step: 66980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:22:07,738-Speed 3330.02 samples/sec   Loss 1.1177   LearningRate 0.0036   Epoch: 16   Global Step: 66990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:22:10,816-Speed 3327.32 samples/sec   Loss 1.1350   LearningRate 0.0036   Epoch: 16   Global Step: 67000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:22:13,913-Speed 3306.27 samples/sec   Loss 1.1334   LearningRate 0.0036   Epoch: 16   Global Step: 67010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:22:17,052-Speed 3262.89 samples/sec   Loss 1.0970   LearningRate 0.0036   Epoch: 16   Global Step: 67020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:22:20,139-Speed 3318.58 samples/sec   Loss 1.0863   LearningRate 0.0036   Epoch: 16   Global Step: 67030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:22:23,228-Speed 3316.07 samples/sec   Loss 1.0991   LearningRate 0.0036   Epoch: 16   Global Step: 67040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:22:26,330-Speed 3301.76 samples/sec   Loss 1.1533   LearningRate 0.0036   Epoch: 16   Global Step: 67050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:22:29,409-Speed 3326.76 samples/sec   Loss 1.1404   LearningRate 0.0036   Epoch: 16   Global Step: 67060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:22:32,485-Speed 3329.48 samples/sec   Loss 1.0622   LearningRate 0.0036   Epoch: 16   Global Step: 67070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:22:35,604-Speed 3284.05 samples/sec   Loss 1.1274   LearningRate 0.0036   Epoch: 16   Global Step: 67080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:22:38,720-Speed 3287.00 samples/sec   Loss 1.1516   LearningRate 0.0036   Epoch: 16   Global Step: 67090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:22:41,799-Speed 3325.67 samples/sec   Loss 1.1163   LearningRate 0.0036   Epoch: 16   Global Step: 67100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:22:44,897-Speed 3305.80 samples/sec   Loss 1.0970   LearningRate 0.0036   Epoch: 16   Global Step: 67110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:22:47,979-Speed 3323.38 samples/sec   Loss 1.1377   LearningRate 0.0035   Epoch: 16   Global Step: 67120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:22:51,063-Speed 3321.43 samples/sec   Loss 1.1140   LearningRate 0.0035   Epoch: 16   Global Step: 67130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:22:54,218-Speed 3246.52 samples/sec   Loss 1.1431   LearningRate 0.0035   Epoch: 16   Global Step: 67140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:22:57,296-Speed 3327.40 samples/sec   Loss 1.1729   LearningRate 0.0035   Epoch: 16   Global Step: 67150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:23:00,435-Speed 3263.39 samples/sec   Loss 1.1187   LearningRate 0.0035   Epoch: 16   Global Step: 67160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:23:03,545-Speed 3293.50 samples/sec   Loss 1.1479   LearningRate 0.0035   Epoch: 16   Global Step: 67170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:23:06,653-Speed 3295.23 samples/sec   Loss 1.1027   LearningRate 0.0035   Epoch: 16   Global Step: 67180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:23:09,730-Speed 3328.64 samples/sec   Loss 1.1902   LearningRate 0.0035   Epoch: 16   Global Step: 67190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:23:12,813-Speed 3321.48 samples/sec   Loss 1.1632   LearningRate 0.0035   Epoch: 16   Global Step: 67200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:23:15,891-Speed 3327.81 samples/sec   Loss 1.0918   LearningRate 0.0035   Epoch: 16   Global Step: 67210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:23:18,969-Speed 3327.81 samples/sec   Loss 1.1169   LearningRate 0.0035   Epoch: 16   Global Step: 67220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:23:22,049-Speed 3325.64 samples/sec   Loss 1.1487   LearningRate 0.0035   Epoch: 16   Global Step: 67230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:23:25,123-Speed 3331.25 samples/sec   Loss 1.1185   LearningRate 0.0035   Epoch: 16   Global Step: 67240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:23:28,221-Speed 3306.32 samples/sec   Loss 1.1139   LearningRate 0.0035   Epoch: 16   Global Step: 67250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:23:31,301-Speed 3325.99 samples/sec   Loss 1.1224   LearningRate 0.0035   Epoch: 16   Global Step: 67260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:23:34,382-Speed 3324.30 samples/sec   Loss 1.1502   LearningRate 0.0035   Epoch: 16   Global Step: 67270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:23:37,459-Speed 3328.19 samples/sec   Loss 1.1580   LearningRate 0.0035   Epoch: 16   Global Step: 67280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:23:40,536-Speed 3328.39 samples/sec   Loss 1.1362   LearningRate 0.0035   Epoch: 16   Global Step: 67290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:23:43,615-Speed 3327.01 samples/sec   Loss 1.1161   LearningRate 0.0035   Epoch: 16   Global Step: 67300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:23:46,711-Speed 3307.96 samples/sec   Loss 1.1554   LearningRate 0.0035   Epoch: 16   Global Step: 67310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:23:49,800-Speed 3316.36 samples/sec   Loss 1.1090   LearningRate 0.0035   Epoch: 16   Global Step: 67320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:23:52,885-Speed 3319.84 samples/sec   Loss 1.1189   LearningRate 0.0035   Epoch: 16   Global Step: 67330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:23:55,967-Speed 3323.69 samples/sec   Loss 1.1992   LearningRate 0.0034   Epoch: 16   Global Step: 67340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:23:59,031-Speed 3342.12 samples/sec   Loss 1.1464   LearningRate 0.0034   Epoch: 16   Global Step: 67350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:24:02,104-Speed 3333.54 samples/sec   Loss 1.1748   LearningRate 0.0034   Epoch: 16   Global Step: 67360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:24:05,198-Speed 3310.14 samples/sec   Loss 1.1191   LearningRate 0.0034   Epoch: 16   Global Step: 67370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:24:08,284-Speed 3318.65 samples/sec   Loss 1.1041   LearningRate 0.0034   Epoch: 16   Global Step: 67380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:24:11,368-Speed 3320.74 samples/sec   Loss 1.1381   LearningRate 0.0034   Epoch: 16   Global Step: 67390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:24:14,447-Speed 3326.61 samples/sec   Loss 1.1465   LearningRate 0.0034   Epoch: 16   Global Step: 67400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:24:17,531-Speed 3321.11 samples/sec   Loss 1.1108   LearningRate 0.0034   Epoch: 16   Global Step: 67410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:24:20,613-Speed 3323.59 samples/sec   Loss 1.1570   LearningRate 0.0034   Epoch: 16   Global Step: 67420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:24:23,689-Speed 3330.31 samples/sec   Loss 1.0859   LearningRate 0.0034   Epoch: 16   Global Step: 67430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:24:26,791-Speed 3301.61 samples/sec   Loss 1.2102   LearningRate 0.0034   Epoch: 16   Global Step: 67440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:24:29,872-Speed 3324.44 samples/sec   Loss 1.1563   LearningRate 0.0034   Epoch: 16   Global Step: 67450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:24:32,946-Speed 3331.14 samples/sec   Loss 1.1291   LearningRate 0.0034   Epoch: 16   Global Step: 67460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:24:36,034-Speed 3317.29 samples/sec   Loss 1.1660   LearningRate 0.0034   Epoch: 16   Global Step: 67470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:24:39,109-Speed 3330.27 samples/sec   Loss 1.1755   LearningRate 0.0034   Epoch: 16   Global Step: 67480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:24:42,187-Speed 3328.37 samples/sec   Loss 1.1595   LearningRate 0.0034   Epoch: 16   Global Step: 67490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:24:45,273-Speed 3319.14 samples/sec   Loss 1.1459   LearningRate 0.0034   Epoch: 16   Global Step: 67500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:24:48,362-Speed 3315.36 samples/sec   Loss 1.1493   LearningRate 0.0034   Epoch: 16   Global Step: 67510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:24:51,455-Speed 3311.65 samples/sec   Loss 1.1924   LearningRate 0.0034   Epoch: 16   Global Step: 67520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:24:54,540-Speed 3320.36 samples/sec   Loss 1.1478   LearningRate 0.0034   Epoch: 16   Global Step: 67530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:24:57,619-Speed 3326.11 samples/sec   Loss 1.1710   LearningRate 0.0034   Epoch: 16   Global Step: 67540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:25:00,711-Speed 3313.19 samples/sec   Loss 1.1156   LearningRate 0.0034   Epoch: 16   Global Step: 67550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:25:03,801-Speed 3313.96 samples/sec   Loss 1.2034   LearningRate 0.0034   Epoch: 16   Global Step: 67560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:25:06,880-Speed 3327.08 samples/sec   Loss 1.2072   LearningRate 0.0033   Epoch: 16   Global Step: 67570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:25:09,956-Speed 3329.52 samples/sec   Loss 1.1456   LearningRate 0.0033   Epoch: 16   Global Step: 67580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:25:13,023-Speed 3339.78 samples/sec   Loss 1.1164   LearningRate 0.0033   Epoch: 16   Global Step: 67590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:25:16,120-Speed 3307.25 samples/sec   Loss 1.1592   LearningRate 0.0033   Epoch: 16   Global Step: 67600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:25:19,203-Speed 3321.51 samples/sec   Loss 1.1657   LearningRate 0.0033   Epoch: 16   Global Step: 67610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:25:22,285-Speed 3323.44 samples/sec   Loss 1.1275   LearningRate 0.0033   Epoch: 16   Global Step: 67620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:25:25,365-Speed 3325.30 samples/sec   Loss 1.1341   LearningRate 0.0033   Epoch: 16   Global Step: 67630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:25:28,444-Speed 3326.82 samples/sec   Loss 1.1186   LearningRate 0.0033   Epoch: 16   Global Step: 67640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:25:31,521-Speed 3328.93 samples/sec   Loss 1.1350   LearningRate 0.0033   Epoch: 16   Global Step: 67650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:25:34,626-Speed 3298.38 samples/sec   Loss 1.1572   LearningRate 0.0033   Epoch: 16   Global Step: 67660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:25:37,717-Speed 3313.00 samples/sec   Loss 1.1237   LearningRate 0.0033   Epoch: 16   Global Step: 67670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:25:40,826-Speed 3294.90 samples/sec   Loss 1.1659   LearningRate 0.0033   Epoch: 16   Global Step: 67680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:25:43,917-Speed 3312.99 samples/sec   Loss 1.1160   LearningRate 0.0033   Epoch: 16   Global Step: 67690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:25:47,012-Speed 3309.70 samples/sec   Loss 1.1390   LearningRate 0.0033   Epoch: 16   Global Step: 67700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:25:50,101-Speed 3315.90 samples/sec   Loss 1.1452   LearningRate 0.0033   Epoch: 16   Global Step: 67710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:25:53,169-Speed 3338.97 samples/sec   Loss 1.1997   LearningRate 0.0033   Epoch: 16   Global Step: 67720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:25:56,252-Speed 3321.58 samples/sec   Loss 1.2014   LearningRate 0.0033   Epoch: 16   Global Step: 67730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:25:59,337-Speed 3320.36 samples/sec   Loss 1.1867   LearningRate 0.0033   Epoch: 16   Global Step: 67740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:26:02,462-Speed 3277.60 samples/sec   Loss 1.1692   LearningRate 0.0033   Epoch: 16   Global Step: 67750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:26:05,569-Speed 3295.78 samples/sec   Loss 1.1478   LearningRate 0.0033   Epoch: 16   Global Step: 67760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:26:08,650-Speed 3324.36 samples/sec   Loss 1.1722   LearningRate 0.0033   Epoch: 16   Global Step: 67770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:26:11,737-Speed 3317.87 samples/sec   Loss 1.2081   LearningRate 0.0033   Epoch: 16   Global Step: 67780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:26:14,943-Speed 3194.69 samples/sec   Loss 1.1398   LearningRate 0.0033   Epoch: 16   Global Step: 67790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:26:18,048-Speed 3298.49 samples/sec   Loss 1.1656   LearningRate 0.0032   Epoch: 16   Global Step: 67800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:26:21,132-Speed 3322.29 samples/sec   Loss 1.1974   LearningRate 0.0032   Epoch: 16   Global Step: 67810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:26:24,224-Speed 3311.93 samples/sec   Loss 1.2375   LearningRate 0.0032   Epoch: 16   Global Step: 67820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:26:27,330-Speed 3297.38 samples/sec   Loss 1.1638   LearningRate 0.0032   Epoch: 16   Global Step: 67830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:26:30,411-Speed 3324.76 samples/sec   Loss 1.1936   LearningRate 0.0032   Epoch: 16   Global Step: 67840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:26:33,500-Speed 3315.34 samples/sec   Loss 1.1686   LearningRate 0.0032   Epoch: 16   Global Step: 67850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:26:36,587-Speed 3318.18 samples/sec   Loss 1.1911   LearningRate 0.0032   Epoch: 16   Global Step: 67860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:26:39,683-Speed 3307.54 samples/sec   Loss 1.1623   LearningRate 0.0032   Epoch: 16   Global Step: 67870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:26:42,766-Speed 3322.51 samples/sec   Loss 1.1761   LearningRate 0.0032   Epoch: 16   Global Step: 67880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:26:45,854-Speed 3317.33 samples/sec   Loss 1.1551   LearningRate 0.0032   Epoch: 16   Global Step: 67890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:26:48,943-Speed 3315.47 samples/sec   Loss 1.2195   LearningRate 0.0032   Epoch: 16   Global Step: 67900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:26:52,039-Speed 3308.64 samples/sec   Loss 1.1496   LearningRate 0.0032   Epoch: 16   Global Step: 67910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:26:55,111-Speed 3334.29 samples/sec   Loss 1.1385   LearningRate 0.0032   Epoch: 16   Global Step: 67920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:26:58,246-Speed 3266.47 samples/sec   Loss 1.1962   LearningRate 0.0032   Epoch: 16   Global Step: 67930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:27:01,339-Speed 3312.41 samples/sec   Loss 1.1557   LearningRate 0.0032   Epoch: 16   Global Step: 67940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:27:04,431-Speed 3312.34 samples/sec   Loss 1.1941   LearningRate 0.0032   Epoch: 16   Global Step: 67950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:27:07,548-Speed 3285.29 samples/sec   Loss 1.1975   LearningRate 0.0032   Epoch: 16   Global Step: 67960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:27:10,641-Speed 3311.61 samples/sec   Loss 1.1687   LearningRate 0.0032   Epoch: 16   Global Step: 67970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:27:13,725-Speed 3320.83 samples/sec   Loss 1.1525   LearningRate 0.0032   Epoch: 16   Global Step: 67980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:27:16,819-Speed 3311.29 samples/sec   Loss 1.1545   LearningRate 0.0032   Epoch: 16   Global Step: 67990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:27:19,906-Speed 3317.31 samples/sec   Loss 1.1920   LearningRate 0.0032   Epoch: 16   Global Step: 68000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:28:03,426-[lfw][68000]XNorm: 22.442956
Training: 2022-04-26 19:28:03,427-[lfw][68000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-26 19:28:03,427-[lfw][68000]Accuracy-Highest: 0.99833
Training: 2022-04-26 19:28:53,876-[cfp_fp][68000]XNorm: 22.610309
Training: 2022-04-26 19:28:53,877-[cfp_fp][68000]Accuracy-Flip: 0.99186+-0.00491
Training: 2022-04-26 19:28:53,877-[cfp_fp][68000]Accuracy-Highest: 0.99200
Training: 2022-04-26 19:29:37,232-[agedb_30][68000]XNorm: 22.960249
Training: 2022-04-26 19:29:37,233-[agedb_30][68000]Accuracy-Flip: 0.97650+-0.00594
Training: 2022-04-26 19:29:37,233-[agedb_30][68000]Accuracy-Highest: 0.97950
Training: 2022-04-26 19:29:40,350-Speed 72.91 samples/sec   Loss 1.1761   LearningRate 0.0032   Epoch: 16   Global Step: 68010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:29:43,406-Speed 3351.73 samples/sec   Loss 1.1988   LearningRate 0.0032   Epoch: 16   Global Step: 68020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:29:46,484-Speed 3327.30 samples/sec   Loss 1.1865   LearningRate 0.0031   Epoch: 16   Global Step: 68030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:29:49,557-Speed 3332.91 samples/sec   Loss 1.1552   LearningRate 0.0031   Epoch: 16   Global Step: 68040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:29:52,639-Speed 3323.34 samples/sec   Loss 1.1631   LearningRate 0.0031   Epoch: 16   Global Step: 68050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:29:55,722-Speed 3322.55 samples/sec   Loss 1.1934   LearningRate 0.0031   Epoch: 16   Global Step: 68060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:29:58,804-Speed 3322.51 samples/sec   Loss 1.1909   LearningRate 0.0031   Epoch: 16   Global Step: 68070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:30:01,867-Speed 3344.45 samples/sec   Loss 1.1975   LearningRate 0.0031   Epoch: 16   Global Step: 68080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:30:04,959-Speed 3312.05 samples/sec   Loss 1.1571   LearningRate 0.0031   Epoch: 16   Global Step: 68090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:30:08,055-Speed 3308.59 samples/sec   Loss 1.1858   LearningRate 0.0031   Epoch: 16   Global Step: 68100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:30:11,143-Speed 3316.77 samples/sec   Loss 1.1748   LearningRate 0.0031   Epoch: 16   Global Step: 68110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:30:14,273-Speed 3272.08 samples/sec   Loss 1.1621   LearningRate 0.0031   Epoch: 16   Global Step: 68120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:30:17,366-Speed 3311.63 samples/sec   Loss 1.1770   LearningRate 0.0031   Epoch: 16   Global Step: 68130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:30:20,453-Speed 3318.70 samples/sec   Loss 1.1651   LearningRate 0.0031   Epoch: 16   Global Step: 68140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:30:23,529-Speed 3329.01 samples/sec   Loss 1.1543   LearningRate 0.0031   Epoch: 16   Global Step: 68150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:30:26,621-Speed 3312.44 samples/sec   Loss 1.2284   LearningRate 0.0031   Epoch: 16   Global Step: 68160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:30:29,725-Speed 3299.17 samples/sec   Loss 1.1677   LearningRate 0.0031   Epoch: 16   Global Step: 68170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:30:32,822-Speed 3307.64 samples/sec   Loss 1.1255   LearningRate 0.0031   Epoch: 16   Global Step: 68180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:30:35,967-Speed 3256.38 samples/sec   Loss 1.1590   LearningRate 0.0031   Epoch: 16   Global Step: 68190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:30:39,112-Speed 3257.65 samples/sec   Loss 1.1514   LearningRate 0.0031   Epoch: 16   Global Step: 68200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:30:42,321-Speed 3191.33 samples/sec   Loss 1.1768   LearningRate 0.0031   Epoch: 16   Global Step: 68210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:30:45,419-Speed 3305.90 samples/sec   Loss 1.2116   LearningRate 0.0031   Epoch: 16   Global Step: 68220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:30:48,519-Speed 3304.17 samples/sec   Loss 1.2155   LearningRate 0.0031   Epoch: 16   Global Step: 68230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:30:51,615-Speed 3307.98 samples/sec   Loss 1.1819   LearningRate 0.0031   Epoch: 16   Global Step: 68240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:30:54,712-Speed 3307.01 samples/sec   Loss 1.1913   LearningRate 0.0031   Epoch: 16   Global Step: 68250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:30:57,810-Speed 3306.81 samples/sec   Loss 1.1821   LearningRate 0.0030   Epoch: 16   Global Step: 68260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:31:00,913-Speed 3300.82 samples/sec   Loss 1.1911   LearningRate 0.0030   Epoch: 16   Global Step: 68270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:31:04,012-Speed 3304.64 samples/sec   Loss 1.1987   LearningRate 0.0030   Epoch: 16   Global Step: 68280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:31:07,117-Speed 3299.11 samples/sec   Loss 1.1968   LearningRate 0.0030   Epoch: 16   Global Step: 68290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:31:10,207-Speed 3314.21 samples/sec   Loss 1.1875   LearningRate 0.0030   Epoch: 16   Global Step: 68300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:31:13,296-Speed 3315.75 samples/sec   Loss 1.1919   LearningRate 0.0030   Epoch: 16   Global Step: 68310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:31:16,371-Speed 3330.84 samples/sec   Loss 1.1828   LearningRate 0.0030   Epoch: 16   Global Step: 68320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:31:19,458-Speed 3317.96 samples/sec   Loss 1.1963   LearningRate 0.0030   Epoch: 16   Global Step: 68330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:31:22,533-Speed 3330.44 samples/sec   Loss 1.2020   LearningRate 0.0030   Epoch: 16   Global Step: 68340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:31:25,634-Speed 3303.54 samples/sec   Loss 1.1738   LearningRate 0.0030   Epoch: 16   Global Step: 68350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:31:28,869-Speed 3165.97 samples/sec   Loss 1.2271   LearningRate 0.0030   Epoch: 16   Global Step: 68360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:31:31,947-Speed 3327.91 samples/sec   Loss 1.1292   LearningRate 0.0030   Epoch: 16   Global Step: 68370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:31:35,027-Speed 3324.40 samples/sec   Loss 1.2157   LearningRate 0.0030   Epoch: 16   Global Step: 68380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:31:38,122-Speed 3310.31 samples/sec   Loss 1.2150   LearningRate 0.0030   Epoch: 16   Global Step: 68390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:31:41,199-Speed 3328.39 samples/sec   Loss 1.2170   LearningRate 0.0030   Epoch: 16   Global Step: 68400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:31:44,275-Speed 3329.63 samples/sec   Loss 1.1945   LearningRate 0.0030   Epoch: 16   Global Step: 68410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-26 19:31:47,348-Speed 3333.48 samples/sec   Loss 1.2474   LearningRate 0.0030   Epoch: 16   Global Step: 68420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:31:50,443-Speed 3309.44 samples/sec   Loss 1.1986   LearningRate 0.0030   Epoch: 16   Global Step: 68430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:31:53,520-Speed 3327.71 samples/sec   Loss 1.1761   LearningRate 0.0030   Epoch: 16   Global Step: 68440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:31:56,586-Speed 3340.80 samples/sec   Loss 1.1549   LearningRate 0.0030   Epoch: 16   Global Step: 68450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:31:59,652-Speed 3340.00 samples/sec   Loss 1.1959   LearningRate 0.0030   Epoch: 16   Global Step: 68460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:32:02,723-Speed 3335.09 samples/sec   Loss 1.2090   LearningRate 0.0030   Epoch: 16   Global Step: 68470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:32:05,816-Speed 3311.70 samples/sec   Loss 1.2116   LearningRate 0.0030   Epoch: 16   Global Step: 68480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:32:08,885-Speed 3337.30 samples/sec   Loss 1.1955   LearningRate 0.0030   Epoch: 16   Global Step: 68490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:32:11,966-Speed 3324.56 samples/sec   Loss 1.2000   LearningRate 0.0029   Epoch: 16   Global Step: 68500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:32:15,091-Speed 3277.50 samples/sec   Loss 1.2144   LearningRate 0.0029   Epoch: 16   Global Step: 68510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:32:18,193-Speed 3301.98 samples/sec   Loss 1.2689   LearningRate 0.0029   Epoch: 16   Global Step: 68520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:32:21,268-Speed 3330.68 samples/sec   Loss 1.1858   LearningRate 0.0029   Epoch: 16   Global Step: 68530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:32:24,352-Speed 3321.60 samples/sec   Loss 1.2122   LearningRate 0.0029   Epoch: 16   Global Step: 68540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:32:27,419-Speed 3339.18 samples/sec   Loss 1.1335   LearningRate 0.0029   Epoch: 16   Global Step: 68550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-26 19:32:30,474-Speed 3353.17 samples/sec   Loss 1.1971   LearningRate 0.0029   Epoch: 16   Global Step: 68560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:32:33,555-Speed 3324.10 samples/sec   Loss 1.2601   LearningRate 0.0029   Epoch: 16   Global Step: 68570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:32:36,704-Speed 3252.40 samples/sec   Loss 1.2543   LearningRate 0.0029   Epoch: 16   Global Step: 68580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:32:39,795-Speed 3313.13 samples/sec   Loss 1.1966   LearningRate 0.0029   Epoch: 16   Global Step: 68590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-26 19:32:42,994-Speed 3201.68 samples/sec   Loss 1.2279   LearningRate 0.0029   Epoch: 16   Global Step: 68600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:32:46,079-Speed 3320.19 samples/sec   Loss 1.1736   LearningRate 0.0029   Epoch: 16   Global Step: 68610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:32:49,148-Speed 3338.17 samples/sec   Loss 1.2485   LearningRate 0.0029   Epoch: 16   Global Step: 68620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:32:52,297-Speed 3252.49 samples/sec   Loss 1.1858   LearningRate 0.0029   Epoch: 16   Global Step: 68630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:32:55,434-Speed 3264.67 samples/sec   Loss 1.1741   LearningRate 0.0029   Epoch: 16   Global Step: 68640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:32:58,506-Speed 3334.50 samples/sec   Loss 1.1899   LearningRate 0.0029   Epoch: 16   Global Step: 68650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:33:01,578-Speed 3333.26 samples/sec   Loss 1.1919   LearningRate 0.0029   Epoch: 16   Global Step: 68660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:04,654-Speed 3330.07 samples/sec   Loss 1.1592   LearningRate 0.0029   Epoch: 16   Global Step: 68670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:07,757-Speed 3301.08 samples/sec   Loss 1.2151   LearningRate 0.0029   Epoch: 16   Global Step: 68680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:10,831-Speed 3331.23 samples/sec   Loss 1.2271   LearningRate 0.0029   Epoch: 16   Global Step: 68690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:13,912-Speed 3324.52 samples/sec   Loss 1.2111   LearningRate 0.0029   Epoch: 16   Global Step: 68700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:16,988-Speed 3330.67 samples/sec   Loss 1.1615   LearningRate 0.0029   Epoch: 16   Global Step: 68710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:20,066-Speed 3326.80 samples/sec   Loss 1.2112   LearningRate 0.0029   Epoch: 16   Global Step: 68720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:23,149-Speed 3322.17 samples/sec   Loss 1.2710   LearningRate 0.0029   Epoch: 16   Global Step: 68730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:26,255-Speed 3297.87 samples/sec   Loss 1.2283   LearningRate 0.0028   Epoch: 16   Global Step: 68740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:29,352-Speed 3307.63 samples/sec   Loss 1.2189   LearningRate 0.0028   Epoch: 16   Global Step: 68750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:32,411-Speed 3347.45 samples/sec   Loss 1.1865   LearningRate 0.0028   Epoch: 16   Global Step: 68760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:35,482-Speed 3335.63 samples/sec   Loss 1.2423   LearningRate 0.0028   Epoch: 16   Global Step: 68770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:38,554-Speed 3333.70 samples/sec   Loss 1.2508   LearningRate 0.0028   Epoch: 16   Global Step: 68780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:41,638-Speed 3321.88 samples/sec   Loss 1.2031   LearningRate 0.0028   Epoch: 16   Global Step: 68790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:44,729-Speed 3313.13 samples/sec   Loss 1.2444   LearningRate 0.0028   Epoch: 16   Global Step: 68800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:33:47,803-Speed 3332.28 samples/sec   Loss 1.2379   LearningRate 0.0028   Epoch: 16   Global Step: 68810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:33:50,900-Speed 3306.73 samples/sec   Loss 1.1664   LearningRate 0.0028   Epoch: 16   Global Step: 68820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:33:53,978-Speed 3328.56 samples/sec   Loss 1.1910   LearningRate 0.0028   Epoch: 16   Global Step: 68830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:33:57,056-Speed 3327.25 samples/sec   Loss 1.1881   LearningRate 0.0028   Epoch: 16   Global Step: 68840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:00,130-Speed 3332.04 samples/sec   Loss 1.1842   LearningRate 0.0028   Epoch: 16   Global Step: 68850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:03,200-Speed 3335.52 samples/sec   Loss 1.2315   LearningRate 0.0028   Epoch: 16   Global Step: 68860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:06,279-Speed 3327.03 samples/sec   Loss 1.1583   LearningRate 0.0028   Epoch: 16   Global Step: 68870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:09,350-Speed 3335.35 samples/sec   Loss 1.2082   LearningRate 0.0028   Epoch: 16   Global Step: 68880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:12,422-Speed 3333.63 samples/sec   Loss 1.2133   LearningRate 0.0028   Epoch: 16   Global Step: 68890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:15,499-Speed 3329.25 samples/sec   Loss 1.2306   LearningRate 0.0028   Epoch: 16   Global Step: 68900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:18,570-Speed 3334.45 samples/sec   Loss 1.2016   LearningRate 0.0028   Epoch: 16   Global Step: 68910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:34:21,642-Speed 3334.83 samples/sec   Loss 1.1698   LearningRate 0.0028   Epoch: 16   Global Step: 68920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:34:24,733-Speed 3313.52 samples/sec   Loss 1.1931   LearningRate 0.0028   Epoch: 16   Global Step: 68930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:34:27,810-Speed 3329.08 samples/sec   Loss 1.2618   LearningRate 0.0028   Epoch: 16   Global Step: 68940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:34:30,866-Speed 3350.87 samples/sec   Loss 1.2360   LearningRate 0.0028   Epoch: 16   Global Step: 68950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:33,943-Speed 3328.45 samples/sec   Loss 1.2375   LearningRate 0.0028   Epoch: 16   Global Step: 68960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:37,130-Speed 3214.13 samples/sec   Loss 1.1605   LearningRate 0.0028   Epoch: 16   Global Step: 68970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:40,225-Speed 3309.57 samples/sec   Loss 1.2344   LearningRate 0.0028   Epoch: 16   Global Step: 68980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:43,328-Speed 3302.77 samples/sec   Loss 1.2330   LearningRate 0.0027   Epoch: 16   Global Step: 68990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:46,435-Speed 3295.96 samples/sec   Loss 1.2486   LearningRate 0.0027   Epoch: 16   Global Step: 69000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:49,514-Speed 3326.73 samples/sec   Loss 1.2763   LearningRate 0.0027   Epoch: 16   Global Step: 69010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:52,597-Speed 3322.61 samples/sec   Loss 1.1646   LearningRate 0.0027   Epoch: 16   Global Step: 69020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:55,689-Speed 3312.40 samples/sec   Loss 1.2099   LearningRate 0.0027   Epoch: 16   Global Step: 69030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:34:58,764-Speed 3330.71 samples/sec   Loss 1.1842   LearningRate 0.0027   Epoch: 16   Global Step: 69040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:35:01,878-Speed 3288.74 samples/sec   Loss 1.1996   LearningRate 0.0027   Epoch: 16   Global Step: 69050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:35:04,960-Speed 3323.04 samples/sec   Loss 1.2038   LearningRate 0.0027   Epoch: 16   Global Step: 69060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:35:08,048-Speed 3316.80 samples/sec   Loss 1.2003   LearningRate 0.0027   Epoch: 16   Global Step: 69070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:35:11,150-Speed 3302.40 samples/sec   Loss 1.1494   LearningRate 0.0027   Epoch: 16   Global Step: 69080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:35:14,227-Speed 3328.28 samples/sec   Loss 1.2177   LearningRate 0.0027   Epoch: 16   Global Step: 69090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:35:17,304-Speed 3328.19 samples/sec   Loss 1.2260   LearningRate 0.0027   Epoch: 16   Global Step: 69100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:35:20,381-Speed 3329.37 samples/sec   Loss 1.2097   LearningRate 0.0027   Epoch: 16   Global Step: 69110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:35:23,472-Speed 3314.31 samples/sec   Loss 1.2346   LearningRate 0.0027   Epoch: 16   Global Step: 69120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:35:26,590-Speed 3285.45 samples/sec   Loss 1.1869   LearningRate 0.0027   Epoch: 16   Global Step: 69130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:35:29,717-Speed 3274.81 samples/sec   Loss 1.2195   LearningRate 0.0027   Epoch: 16   Global Step: 69140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:35:32,802-Speed 3320.14 samples/sec   Loss 1.1930   LearningRate 0.0027   Epoch: 16   Global Step: 69150   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-26 19:35:35,862-Speed 3347.67 samples/sec   Loss 1.2219   LearningRate 0.0027   Epoch: 16   Global Step: 69160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:35:38,940-Speed 3327.08 samples/sec   Loss 1.2267   LearningRate 0.0027   Epoch: 16   Global Step: 69170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:35:42,028-Speed 3316.51 samples/sec   Loss 1.1814   LearningRate 0.0027   Epoch: 16   Global Step: 69180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:35:45,095-Speed 3340.39 samples/sec   Loss 1.2195   LearningRate 0.0027   Epoch: 16   Global Step: 69190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:35:48,218-Speed 3279.09 samples/sec   Loss 1.1817   LearningRate 0.0027   Epoch: 16   Global Step: 69200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:35:51,332-Speed 3288.69 samples/sec   Loss 1.2267   LearningRate 0.0027   Epoch: 16   Global Step: 69210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:35:54,411-Speed 3326.77 samples/sec   Loss 1.2070   LearningRate 0.0027   Epoch: 16   Global Step: 69220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:35:57,492-Speed 3324.82 samples/sec   Loss 1.1811   LearningRate 0.0027   Epoch: 16   Global Step: 69230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:36:00,571-Speed 3326.43 samples/sec   Loss 1.2133   LearningRate 0.0026   Epoch: 16   Global Step: 69240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:36:03,652-Speed 3324.39 samples/sec   Loss 1.2097   LearningRate 0.0026   Epoch: 16   Global Step: 69250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:36:06,727-Speed 3330.30 samples/sec   Loss 1.2220   LearningRate 0.0026   Epoch: 16   Global Step: 69260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:36:09,809-Speed 3323.68 samples/sec   Loss 1.2272   LearningRate 0.0026   Epoch: 16   Global Step: 69270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:36:12,888-Speed 3326.55 samples/sec   Loss 1.2638   LearningRate 0.0026   Epoch: 16   Global Step: 69280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:36:16,016-Speed 3274.09 samples/sec   Loss 1.1807   LearningRate 0.0026   Epoch: 16   Global Step: 69290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:36:19,099-Speed 3322.30 samples/sec   Loss 1.1888   LearningRate 0.0026   Epoch: 16   Global Step: 69300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:36:22,177-Speed 3327.57 samples/sec   Loss 1.2237   LearningRate 0.0026   Epoch: 16   Global Step: 69310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:36:25,257-Speed 3325.69 samples/sec   Loss 1.2829   LearningRate 0.0026   Epoch: 16   Global Step: 69320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:36:28,337-Speed 3325.74 samples/sec   Loss 1.1858   LearningRate 0.0026   Epoch: 16   Global Step: 69330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:36:31,416-Speed 3325.70 samples/sec   Loss 1.2282   LearningRate 0.0026   Epoch: 16   Global Step: 69340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:36:34,492-Speed 3329.75 samples/sec   Loss 1.2088   LearningRate 0.0026   Epoch: 16   Global Step: 69350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:36:37,569-Speed 3328.55 samples/sec   Loss 1.1621   LearningRate 0.0026   Epoch: 16   Global Step: 69360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:36:40,649-Speed 3325.30 samples/sec   Loss 1.2350   LearningRate 0.0026   Epoch: 16   Global Step: 69370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:36:43,726-Speed 3328.86 samples/sec   Loss 1.2129   LearningRate 0.0026   Epoch: 16   Global Step: 69380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:36:46,784-Speed 3349.80 samples/sec   Loss 1.2248   LearningRate 0.0026   Epoch: 16   Global Step: 69390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:36:49,869-Speed 3319.58 samples/sec   Loss 1.2120   LearningRate 0.0026   Epoch: 16   Global Step: 69400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:36:52,966-Speed 3307.44 samples/sec   Loss 1.2085   LearningRate 0.0026   Epoch: 16   Global Step: 69410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:36:56,047-Speed 3324.64 samples/sec   Loss 1.1977   LearningRate 0.0026   Epoch: 16   Global Step: 69420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:36:59,124-Speed 3328.79 samples/sec   Loss 1.2171   LearningRate 0.0026   Epoch: 16   Global Step: 69430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:37:02,202-Speed 3326.80 samples/sec   Loss 1.2245   LearningRate 0.0026   Epoch: 16   Global Step: 69440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:37:05,298-Speed 3308.17 samples/sec   Loss 1.2165   LearningRate 0.0026   Epoch: 16   Global Step: 69450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:37:08,360-Speed 3344.73 samples/sec   Loss 1.2875   LearningRate 0.0026   Epoch: 16   Global Step: 69460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:37:11,437-Speed 3329.47 samples/sec   Loss 1.1985   LearningRate 0.0026   Epoch: 16   Global Step: 69470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:37:14,531-Speed 3309.90 samples/sec   Loss 1.2048   LearningRate 0.0026   Epoch: 16   Global Step: 69480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:37:17,613-Speed 3323.79 samples/sec   Loss 1.2147   LearningRate 0.0026   Epoch: 16   Global Step: 69490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:37:20,702-Speed 3314.86 samples/sec   Loss 1.2047   LearningRate 0.0025   Epoch: 16   Global Step: 69500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:37:23,784-Speed 3324.18 samples/sec   Loss 1.2332   LearningRate 0.0025   Epoch: 16   Global Step: 69510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:37:26,864-Speed 3325.40 samples/sec   Loss 1.2387   LearningRate 0.0025   Epoch: 16   Global Step: 69520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:37:29,945-Speed 3324.11 samples/sec   Loss 1.1915   LearningRate 0.0025   Epoch: 16   Global Step: 69530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:37:33,029-Speed 3321.20 samples/sec   Loss 1.1866   LearningRate 0.0025   Epoch: 16   Global Step: 69540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:37:36,117-Speed 3316.88 samples/sec   Loss 1.1439   LearningRate 0.0025   Epoch: 16   Global Step: 69550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:37:39,201-Speed 3320.71 samples/sec   Loss 1.1746   LearningRate 0.0025   Epoch: 16   Global Step: 69560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:37:42,290-Speed 3315.86 samples/sec   Loss 1.2192   LearningRate 0.0025   Epoch: 16   Global Step: 69570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:37:45,384-Speed 3310.16 samples/sec   Loss 1.1872   LearningRate 0.0025   Epoch: 16   Global Step: 69580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:37:48,479-Speed 3310.05 samples/sec   Loss 1.2256   LearningRate 0.0025   Epoch: 16   Global Step: 69590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:37:51,563-Speed 3321.00 samples/sec   Loss 1.2358   LearningRate 0.0025   Epoch: 16   Global Step: 69600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:37:54,654-Speed 3313.41 samples/sec   Loss 1.2740   LearningRate 0.0025   Epoch: 16   Global Step: 69610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:37:57,744-Speed 3314.37 samples/sec   Loss 1.1795   LearningRate 0.0025   Epoch: 16   Global Step: 69620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:38:00,827-Speed 3322.86 samples/sec   Loss 1.2479   LearningRate 0.0025   Epoch: 16   Global Step: 69630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:38:03,913-Speed 3318.31 samples/sec   Loss 1.2479   LearningRate 0.0025   Epoch: 16   Global Step: 69640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:38:06,994-Speed 3324.73 samples/sec   Loss 1.1869   LearningRate 0.0025   Epoch: 16   Global Step: 69650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:38:10,061-Speed 3338.78 samples/sec   Loss 1.2461   LearningRate 0.0025   Epoch: 16   Global Step: 69660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:38:13,155-Speed 3311.28 samples/sec   Loss 1.2382   LearningRate 0.0025   Epoch: 16   Global Step: 69670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:38:16,303-Speed 3253.61 samples/sec   Loss 1.1746   LearningRate 0.0025   Epoch: 16   Global Step: 69680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:38:19,386-Speed 3321.68 samples/sec   Loss 1.2017   LearningRate 0.0025   Epoch: 16   Global Step: 69690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:38:22,481-Speed 3309.53 samples/sec   Loss 1.1917   LearningRate 0.0025   Epoch: 16   Global Step: 69700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:38:25,567-Speed 3318.53 samples/sec   Loss 1.2342   LearningRate 0.0025   Epoch: 16   Global Step: 69710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:38:28,649-Speed 3323.81 samples/sec   Loss 1.1927   LearningRate 0.0025   Epoch: 16   Global Step: 69720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:38:31,734-Speed 3319.94 samples/sec   Loss 1.1807   LearningRate 0.0025   Epoch: 16   Global Step: 69730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:38:34,817-Speed 3321.95 samples/sec   Loss 1.2222   LearningRate 0.0025   Epoch: 16   Global Step: 69740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:38:37,902-Speed 3320.58 samples/sec   Loss 1.1982   LearningRate 0.0025   Epoch: 16   Global Step: 69750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:38:41,007-Speed 3297.99 samples/sec   Loss 1.1804   LearningRate 0.0024   Epoch: 16   Global Step: 69760   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-26 19:38:44,059-Speed 3355.78 samples/sec   Loss 1.1857   LearningRate 0.0024   Epoch: 16   Global Step: 69770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:38:47,152-Speed 3311.48 samples/sec   Loss 1.1752   LearningRate 0.0024   Epoch: 16   Global Step: 69780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:38:50,331-Speed 3222.12 samples/sec   Loss 1.1787   LearningRate 0.0024   Epoch: 16   Global Step: 69790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:38:53,420-Speed 3315.73 samples/sec   Loss 1.2675   LearningRate 0.0024   Epoch: 16   Global Step: 69800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:38:56,511-Speed 3314.17 samples/sec   Loss 1.2128   LearningRate 0.0024   Epoch: 16   Global Step: 69810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:38:59,615-Speed 3299.34 samples/sec   Loss 1.2036   LearningRate 0.0024   Epoch: 16   Global Step: 69820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:02,702-Speed 3317.95 samples/sec   Loss 1.2359   LearningRate 0.0024   Epoch: 16   Global Step: 69830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:05,794-Speed 3313.05 samples/sec   Loss 1.1749   LearningRate 0.0024   Epoch: 16   Global Step: 69840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:08,877-Speed 3321.18 samples/sec   Loss 1.1775   LearningRate 0.0024   Epoch: 16   Global Step: 69850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:11,962-Speed 3320.17 samples/sec   Loss 1.1761   LearningRate 0.0024   Epoch: 16   Global Step: 69860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:15,052-Speed 3314.86 samples/sec   Loss 1.2838   LearningRate 0.0024   Epoch: 16   Global Step: 69870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:39:18,138-Speed 3319.32 samples/sec   Loss 1.1750   LearningRate 0.0024   Epoch: 16   Global Step: 69880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:39:21,207-Speed 3336.46 samples/sec   Loss 1.2560   LearningRate 0.0024   Epoch: 16   Global Step: 69890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:24,383-Speed 3224.85 samples/sec   Loss 1.2417   LearningRate 0.0024   Epoch: 16   Global Step: 69900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:27,890-Speed 2920.69 samples/sec   Loss 1.2836   LearningRate 0.0024   Epoch: 16   Global Step: 69910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:31,190-Speed 3103.71 samples/sec   Loss 1.2197   LearningRate 0.0024   Epoch: 16   Global Step: 69920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:34,274-Speed 3321.78 samples/sec   Loss 1.2228   LearningRate 0.0024   Epoch: 16   Global Step: 69930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:37,364-Speed 3313.88 samples/sec   Loss 1.2628   LearningRate 0.0024   Epoch: 16   Global Step: 69940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:40,445-Speed 3324.66 samples/sec   Loss 1.1935   LearningRate 0.0024   Epoch: 16   Global Step: 69950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:43,531-Speed 3318.95 samples/sec   Loss 1.2220   LearningRate 0.0024   Epoch: 16   Global Step: 69960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:46,685-Speed 3247.52 samples/sec   Loss 1.2890   LearningRate 0.0024   Epoch: 16   Global Step: 69970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:49,796-Speed 3292.45 samples/sec   Loss 1.2635   LearningRate 0.0024   Epoch: 16   Global Step: 69980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:39:52,884-Speed 3316.70 samples/sec   Loss 1.1664   LearningRate 0.0024   Epoch: 16   Global Step: 69990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:39:55,978-Speed 3310.51 samples/sec   Loss 1.2289   LearningRate 0.0024   Epoch: 16   Global Step: 70000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:40:39,373-[lfw][70000]XNorm: 22.071769
Training: 2022-04-26 19:40:39,373-[lfw][70000]Accuracy-Flip: 0.99817+-0.00217
Training: 2022-04-26 19:40:39,374-[lfw][70000]Accuracy-Highest: 0.99833
Training: 2022-04-26 19:41:29,745-[cfp_fp][70000]XNorm: 22.305632
Training: 2022-04-26 19:41:29,746-[cfp_fp][70000]Accuracy-Flip: 0.99043+-0.00586
Training: 2022-04-26 19:41:29,746-[cfp_fp][70000]Accuracy-Highest: 0.99200
Training: 2022-04-26 19:42:13,278-[agedb_30][70000]XNorm: 22.536780
Training: 2022-04-26 19:42:13,279-[agedb_30][70000]Accuracy-Flip: 0.97733+-0.00642
Training: 2022-04-26 19:42:13,279-[agedb_30][70000]Accuracy-Highest: 0.97950
Training: 2022-04-26 19:42:16,351-Speed 72.95 samples/sec   Loss 1.2550   LearningRate 0.0024   Epoch: 16   Global Step: 70010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:42:19,432-Speed 3324.13 samples/sec   Loss 1.2515   LearningRate 0.0024   Epoch: 16   Global Step: 70020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:42:22,515-Speed 3322.38 samples/sec   Loss 1.2387   LearningRate 0.0023   Epoch: 16   Global Step: 70030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:42:25,624-Speed 3293.93 samples/sec   Loss 1.2359   LearningRate 0.0023   Epoch: 16   Global Step: 70040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:42:28,717-Speed 3311.59 samples/sec   Loss 1.2205   LearningRate 0.0023   Epoch: 16   Global Step: 70050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:42:31,804-Speed 3317.84 samples/sec   Loss 1.2038   LearningRate 0.0023   Epoch: 16   Global Step: 70060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:42:34,875-Speed 3334.77 samples/sec   Loss 1.1981   LearningRate 0.0023   Epoch: 16   Global Step: 70070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 19:42:37,975-Speed 3304.64 samples/sec   Loss 1.2533   LearningRate 0.0023   Epoch: 16   Global Step: 70080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 19:42:41,072-Speed 3306.60 samples/sec   Loss 1.2324   LearningRate 0.0023   Epoch: 16   Global Step: 70090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 19:42:44,160-Speed 3316.75 samples/sec   Loss 1.2770   LearningRate 0.0023   Epoch: 16   Global Step: 70100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 19:42:47,260-Speed 3303.87 samples/sec   Loss 1.2198   LearningRate 0.0023   Epoch: 16   Global Step: 70110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 19:42:50,371-Speed 3292.70 samples/sec   Loss 1.2095   LearningRate 0.0023   Epoch: 16   Global Step: 70120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 19:42:53,512-Speed 3260.65 samples/sec   Loss 1.2520   LearningRate 0.0023   Epoch: 16   Global Step: 70130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 19:42:56,606-Speed 3310.81 samples/sec   Loss 1.2432   LearningRate 0.0023   Epoch: 16   Global Step: 70140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 19:42:59,734-Speed 3274.62 samples/sec   Loss 1.1759   LearningRate 0.0023   Epoch: 16   Global Step: 70150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 19:43:02,827-Speed 3310.69 samples/sec   Loss 1.2203   LearningRate 0.0023   Epoch: 16   Global Step: 70160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 19:43:05,919-Speed 3312.36 samples/sec   Loss 1.1802   LearningRate 0.0023   Epoch: 16   Global Step: 70170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:43:09,011-Speed 3313.17 samples/sec   Loss 1.1568   LearningRate 0.0023   Epoch: 16   Global Step: 70180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:43:12,113-Speed 3302.02 samples/sec   Loss 1.2663   LearningRate 0.0023   Epoch: 16   Global Step: 70190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:43:15,204-Speed 3313.49 samples/sec   Loss 1.2177   LearningRate 0.0023   Epoch: 16   Global Step: 70200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:43:18,302-Speed 3305.75 samples/sec   Loss 1.2588   LearningRate 0.0023   Epoch: 16   Global Step: 70210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:43:21,392-Speed 3314.50 samples/sec   Loss 1.2788   LearningRate 0.0023   Epoch: 16   Global Step: 70220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:43:24,494-Speed 3301.52 samples/sec   Loss 1.2482   LearningRate 0.0023   Epoch: 16   Global Step: 70230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:43:27,797-Speed 3101.30 samples/sec   Loss 1.2194   LearningRate 0.0023   Epoch: 16   Global Step: 70240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:43:30,888-Speed 3313.67 samples/sec   Loss 1.2607   LearningRate 0.0023   Epoch: 16   Global Step: 70250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:43:34,186-Speed 3104.97 samples/sec   Loss 1.2223   LearningRate 0.0023   Epoch: 16   Global Step: 70260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:43:37,281-Speed 3309.65 samples/sec   Loss 1.1892   LearningRate 0.0023   Epoch: 16   Global Step: 70270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:43:40,373-Speed 3312.81 samples/sec   Loss 1.1987   LearningRate 0.0023   Epoch: 16   Global Step: 70280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:43:43,524-Speed 3250.12 samples/sec   Loss 1.1833   LearningRate 0.0023   Epoch: 16   Global Step: 70290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:43:55,942-Speed 824.67 samples/sec   Loss 0.9654   LearningRate 0.0022   Epoch: 17   Global Step: 70300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:43:59,026-Speed 3321.70 samples/sec   Loss 0.7138   LearningRate 0.0022   Epoch: 17   Global Step: 70310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:02,111-Speed 3320.16 samples/sec   Loss 0.7472   LearningRate 0.0022   Epoch: 17   Global Step: 70320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:05,205-Speed 3310.39 samples/sec   Loss 0.7436   LearningRate 0.0022   Epoch: 17   Global Step: 70330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:08,294-Speed 3316.26 samples/sec   Loss 0.7536   LearningRate 0.0022   Epoch: 17   Global Step: 70340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:11,378-Speed 3320.11 samples/sec   Loss 0.7162   LearningRate 0.0022   Epoch: 17   Global Step: 70350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:14,470-Speed 3313.27 samples/sec   Loss 0.7394   LearningRate 0.0022   Epoch: 17   Global Step: 70360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:17,629-Speed 3242.06 samples/sec   Loss 0.7380   LearningRate 0.0022   Epoch: 17   Global Step: 70370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:20,714-Speed 3320.17 samples/sec   Loss 0.7380   LearningRate 0.0022   Epoch: 17   Global Step: 70380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:23,793-Speed 3326.35 samples/sec   Loss 0.7704   LearningRate 0.0022   Epoch: 17   Global Step: 70390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:26,874-Speed 3324.91 samples/sec   Loss 0.7357   LearningRate 0.0022   Epoch: 17   Global Step: 70400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:29,956-Speed 3322.76 samples/sec   Loss 0.7261   LearningRate 0.0022   Epoch: 17   Global Step: 70410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:33,034-Speed 3327.97 samples/sec   Loss 0.7790   LearningRate 0.0022   Epoch: 17   Global Step: 70420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:36,189-Speed 3246.32 samples/sec   Loss 0.7614   LearningRate 0.0022   Epoch: 17   Global Step: 70430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:39,294-Speed 3298.59 samples/sec   Loss 0.7554   LearningRate 0.0022   Epoch: 17   Global Step: 70440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:42,417-Speed 3280.47 samples/sec   Loss 0.7866   LearningRate 0.0022   Epoch: 17   Global Step: 70450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:45,496-Speed 3325.67 samples/sec   Loss 0.7479   LearningRate 0.0022   Epoch: 17   Global Step: 70460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:48,576-Speed 3325.76 samples/sec   Loss 0.7589   LearningRate 0.0022   Epoch: 17   Global Step: 70470   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-26 19:44:51,638-Speed 3344.88 samples/sec   Loss 0.7425   LearningRate 0.0022   Epoch: 17   Global Step: 70480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:54,756-Speed 3285.18 samples/sec   Loss 0.7160   LearningRate 0.0022   Epoch: 17   Global Step: 70490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:44:57,831-Speed 3330.32 samples/sec   Loss 0.7170   LearningRate 0.0022   Epoch: 17   Global Step: 70500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:45:00,919-Speed 3316.95 samples/sec   Loss 0.7631   LearningRate 0.0022   Epoch: 17   Global Step: 70510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:45:04,006-Speed 3317.89 samples/sec   Loss 0.7523   LearningRate 0.0022   Epoch: 17   Global Step: 70520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:45:07,087-Speed 3324.96 samples/sec   Loss 0.7566   LearningRate 0.0022   Epoch: 17   Global Step: 70530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:45:10,168-Speed 3323.56 samples/sec   Loss 0.7261   LearningRate 0.0022   Epoch: 17   Global Step: 70540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:45:13,237-Speed 3338.05 samples/sec   Loss 0.7247   LearningRate 0.0022   Epoch: 17   Global Step: 70550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:45:16,326-Speed 3315.51 samples/sec   Loss 0.7271   LearningRate 0.0022   Epoch: 17   Global Step: 70560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:45:19,638-Speed 3092.38 samples/sec   Loss 0.7466   LearningRate 0.0022   Epoch: 17   Global Step: 70570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:45:22,718-Speed 3325.17 samples/sec   Loss 0.7403   LearningRate 0.0021   Epoch: 17   Global Step: 70580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:45:25,804-Speed 3318.84 samples/sec   Loss 0.7183   LearningRate 0.0021   Epoch: 17   Global Step: 70590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:45:29,098-Speed 3109.40 samples/sec   Loss 0.7540   LearningRate 0.0021   Epoch: 17   Global Step: 70600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:45:32,184-Speed 3319.33 samples/sec   Loss 0.7416   LearningRate 0.0021   Epoch: 17   Global Step: 70610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:45:35,281-Speed 3307.14 samples/sec   Loss 0.7554   LearningRate 0.0021   Epoch: 17   Global Step: 70620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:45:38,363-Speed 3323.88 samples/sec   Loss 0.7557   LearningRate 0.0021   Epoch: 17   Global Step: 70630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:45:41,447-Speed 3320.88 samples/sec   Loss 0.7573   LearningRate 0.0021   Epoch: 17   Global Step: 70640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:45:44,532-Speed 3320.11 samples/sec   Loss 0.7660   LearningRate 0.0021   Epoch: 17   Global Step: 70650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:45:47,622-Speed 3314.55 samples/sec   Loss 0.7437   LearningRate 0.0021   Epoch: 17   Global Step: 70660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:45:50,713-Speed 3313.67 samples/sec   Loss 0.7718   LearningRate 0.0021   Epoch: 17   Global Step: 70670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:45:53,800-Speed 3318.21 samples/sec   Loss 0.7334   LearningRate 0.0021   Epoch: 17   Global Step: 70680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:45:56,885-Speed 3319.99 samples/sec   Loss 0.7339   LearningRate 0.0021   Epoch: 17   Global Step: 70690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:45:59,969-Speed 3320.52 samples/sec   Loss 0.8094   LearningRate 0.0021   Epoch: 17   Global Step: 70700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:46:03,063-Speed 3310.73 samples/sec   Loss 0.7134   LearningRate 0.0021   Epoch: 17   Global Step: 70710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:46:06,129-Speed 3340.28 samples/sec   Loss 0.7549   LearningRate 0.0021   Epoch: 17   Global Step: 70720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:09,215-Speed 3319.72 samples/sec   Loss 0.7767   LearningRate 0.0021   Epoch: 17   Global Step: 70730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:12,318-Speed 3300.81 samples/sec   Loss 0.7331   LearningRate 0.0021   Epoch: 17   Global Step: 70740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:15,411-Speed 3310.92 samples/sec   Loss 0.7394   LearningRate 0.0021   Epoch: 17   Global Step: 70750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:18,495-Speed 3321.13 samples/sec   Loss 0.7558   LearningRate 0.0021   Epoch: 17   Global Step: 70760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:21,580-Speed 3320.33 samples/sec   Loss 0.7580   LearningRate 0.0021   Epoch: 17   Global Step: 70770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:24,686-Speed 3297.52 samples/sec   Loss 0.7371   LearningRate 0.0021   Epoch: 17   Global Step: 70780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:27,770-Speed 3320.82 samples/sec   Loss 0.7604   LearningRate 0.0021   Epoch: 17   Global Step: 70790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:30,868-Speed 3306.40 samples/sec   Loss 0.7488   LearningRate 0.0021   Epoch: 17   Global Step: 70800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:33,962-Speed 3310.37 samples/sec   Loss 0.7652   LearningRate 0.0021   Epoch: 17   Global Step: 70810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:37,220-Speed 3143.57 samples/sec   Loss 0.7747   LearningRate 0.0021   Epoch: 17   Global Step: 70820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:40,322-Speed 3302.44 samples/sec   Loss 0.7725   LearningRate 0.0021   Epoch: 17   Global Step: 70830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:43,428-Speed 3297.73 samples/sec   Loss 0.7177   LearningRate 0.0021   Epoch: 17   Global Step: 70840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:46,518-Speed 3315.14 samples/sec   Loss 0.7612   LearningRate 0.0021   Epoch: 17   Global Step: 70850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:49,604-Speed 3318.53 samples/sec   Loss 0.7406   LearningRate 0.0020   Epoch: 17   Global Step: 70860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:52,704-Speed 3303.76 samples/sec   Loss 0.7898   LearningRate 0.0020   Epoch: 17   Global Step: 70870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:55,805-Speed 3303.03 samples/sec   Loss 0.7916   LearningRate 0.0020   Epoch: 17   Global Step: 70880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:46:58,905-Speed 3303.93 samples/sec   Loss 0.7520   LearningRate 0.0020   Epoch: 17   Global Step: 70890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:47:02,002-Speed 3307.38 samples/sec   Loss 0.7649   LearningRate 0.0020   Epoch: 17   Global Step: 70900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:47:05,092-Speed 3314.37 samples/sec   Loss 0.7728   LearningRate 0.0020   Epoch: 17   Global Step: 70910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:47:08,178-Speed 3319.44 samples/sec   Loss 0.7472   LearningRate 0.0020   Epoch: 17   Global Step: 70920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:11,264-Speed 3318.19 samples/sec   Loss 0.7652   LearningRate 0.0020   Epoch: 17   Global Step: 70930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:14,349-Speed 3320.61 samples/sec   Loss 0.7746   LearningRate 0.0020   Epoch: 17   Global Step: 70940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:17,458-Speed 3294.44 samples/sec   Loss 0.7731   LearningRate 0.0020   Epoch: 17   Global Step: 70950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:20,547-Speed 3315.82 samples/sec   Loss 0.7686   LearningRate 0.0020   Epoch: 17   Global Step: 70960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:23,640-Speed 3310.97 samples/sec   Loss 0.7679   LearningRate 0.0020   Epoch: 17   Global Step: 70970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:26,730-Speed 3315.31 samples/sec   Loss 0.7545   LearningRate 0.0020   Epoch: 17   Global Step: 70980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:29,831-Speed 3302.84 samples/sec   Loss 0.7861   LearningRate 0.0020   Epoch: 17   Global Step: 70990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:32,917-Speed 3318.90 samples/sec   Loss 0.7525   LearningRate 0.0020   Epoch: 17   Global Step: 71000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:36,004-Speed 3317.13 samples/sec   Loss 0.7632   LearningRate 0.0020   Epoch: 17   Global Step: 71010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:39,092-Speed 3317.20 samples/sec   Loss 0.7460   LearningRate 0.0020   Epoch: 17   Global Step: 71020   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-26 19:47:42,172-Speed 3325.83 samples/sec   Loss 0.7674   LearningRate 0.0020   Epoch: 17   Global Step: 71030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:45,271-Speed 3305.13 samples/sec   Loss 0.7787   LearningRate 0.0020   Epoch: 17   Global Step: 71040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:48,388-Speed 3285.72 samples/sec   Loss 0.7625   LearningRate 0.0020   Epoch: 17   Global Step: 71050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:51,491-Speed 3300.41 samples/sec   Loss 0.7965   LearningRate 0.0020   Epoch: 17   Global Step: 71060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:54,651-Speed 3242.01 samples/sec   Loss 0.7632   LearningRate 0.0020   Epoch: 17   Global Step: 71070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:47:57,742-Speed 3313.17 samples/sec   Loss 0.7933   LearningRate 0.0020   Epoch: 17   Global Step: 71080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:48:00,835-Speed 3311.19 samples/sec   Loss 0.7781   LearningRate 0.0020   Epoch: 17   Global Step: 71090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:48:03,944-Speed 3294.07 samples/sec   Loss 0.7444   LearningRate 0.0020   Epoch: 17   Global Step: 71100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:48:07,030-Speed 3319.15 samples/sec   Loss 0.8122   LearningRate 0.0020   Epoch: 17   Global Step: 71110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:48:10,120-Speed 3314.29 samples/sec   Loss 0.7672   LearningRate 0.0020   Epoch: 17   Global Step: 71120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:48:13,234-Speed 3290.22 samples/sec   Loss 0.7552   LearningRate 0.0020   Epoch: 17   Global Step: 71130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:48:16,406-Speed 3228.56 samples/sec   Loss 0.7951   LearningRate 0.0020   Epoch: 17   Global Step: 71140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:48:19,493-Speed 3318.06 samples/sec   Loss 0.7702   LearningRate 0.0020   Epoch: 17   Global Step: 71150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:48:22,582-Speed 3315.43 samples/sec   Loss 0.7939   LearningRate 0.0019   Epoch: 17   Global Step: 71160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:48:25,673-Speed 3314.14 samples/sec   Loss 0.7653   LearningRate 0.0019   Epoch: 17   Global Step: 71170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:48:28,763-Speed 3313.93 samples/sec   Loss 0.7634   LearningRate 0.0019   Epoch: 17   Global Step: 71180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:48:31,850-Speed 3318.91 samples/sec   Loss 0.7870   LearningRate 0.0019   Epoch: 17   Global Step: 71190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:48:34,943-Speed 3310.52 samples/sec   Loss 0.7760   LearningRate 0.0019   Epoch: 17   Global Step: 71200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:48:38,121-Speed 3223.19 samples/sec   Loss 0.7354   LearningRate 0.0019   Epoch: 17   Global Step: 71210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:48:41,219-Speed 3305.30 samples/sec   Loss 0.7852   LearningRate 0.0019   Epoch: 17   Global Step: 71220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:48:44,313-Speed 3310.82 samples/sec   Loss 0.7656   LearningRate 0.0019   Epoch: 17   Global Step: 71230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:48:47,400-Speed 3318.11 samples/sec   Loss 0.7829   LearningRate 0.0019   Epoch: 17   Global Step: 71240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:48:50,496-Speed 3308.71 samples/sec   Loss 0.7801   LearningRate 0.0019   Epoch: 17   Global Step: 71250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:48:53,613-Speed 3286.51 samples/sec   Loss 0.8033   LearningRate 0.0019   Epoch: 17   Global Step: 71260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:48:56,700-Speed 3317.75 samples/sec   Loss 0.7580   LearningRate 0.0019   Epoch: 17   Global Step: 71270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:48:59,797-Speed 3306.24 samples/sec   Loss 0.7491   LearningRate 0.0019   Epoch: 17   Global Step: 71280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:49:02,884-Speed 3318.01 samples/sec   Loss 0.7857   LearningRate 0.0019   Epoch: 17   Global Step: 71290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:49:05,975-Speed 3314.17 samples/sec   Loss 0.7655   LearningRate 0.0019   Epoch: 17   Global Step: 71300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:49:09,076-Speed 3302.26 samples/sec   Loss 0.7738   LearningRate 0.0019   Epoch: 17   Global Step: 71310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:49:12,167-Speed 3313.90 samples/sec   Loss 0.7979   LearningRate 0.0019   Epoch: 17   Global Step: 71320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:49:15,271-Speed 3298.66 samples/sec   Loss 0.7813   LearningRate 0.0019   Epoch: 17   Global Step: 71330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:49:18,361-Speed 3316.12 samples/sec   Loss 0.7912   LearningRate 0.0019   Epoch: 17   Global Step: 71340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:49:21,474-Speed 3290.09 samples/sec   Loss 0.7637   LearningRate 0.0019   Epoch: 17   Global Step: 71350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:49:24,570-Speed 3308.39 samples/sec   Loss 0.8016   LearningRate 0.0019   Epoch: 17   Global Step: 71360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:49:27,682-Speed 3290.86 samples/sec   Loss 0.7663   LearningRate 0.0019   Epoch: 17   Global Step: 71370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:49:30,796-Speed 3289.12 samples/sec   Loss 0.8000   LearningRate 0.0019   Epoch: 17   Global Step: 71380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:49:33,882-Speed 3318.96 samples/sec   Loss 0.7825   LearningRate 0.0019   Epoch: 17   Global Step: 71390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:49:36,979-Speed 3307.31 samples/sec   Loss 0.8148   LearningRate 0.0019   Epoch: 17   Global Step: 71400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:49:40,062-Speed 3322.14 samples/sec   Loss 0.7671   LearningRate 0.0019   Epoch: 17   Global Step: 71410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:49:43,163-Speed 3302.14 samples/sec   Loss 0.7923   LearningRate 0.0019   Epoch: 17   Global Step: 71420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:49:46,253-Speed 3315.56 samples/sec   Loss 0.7440   LearningRate 0.0019   Epoch: 17   Global Step: 71430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:49:49,448-Speed 3205.91 samples/sec   Loss 0.8017   LearningRate 0.0019   Epoch: 17   Global Step: 71440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:49:52,584-Speed 3265.00 samples/sec   Loss 0.7998   LearningRate 0.0019   Epoch: 17   Global Step: 71450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:49:55,689-Speed 3298.67 samples/sec   Loss 0.7690   LearningRate 0.0018   Epoch: 17   Global Step: 71460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:49:58,778-Speed 3316.85 samples/sec   Loss 0.7745   LearningRate 0.0018   Epoch: 17   Global Step: 71470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:01,873-Speed 3308.13 samples/sec   Loss 0.7822   LearningRate 0.0018   Epoch: 17   Global Step: 71480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:04,961-Speed 3316.71 samples/sec   Loss 0.7424   LearningRate 0.0018   Epoch: 17   Global Step: 71490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:08,054-Speed 3311.98 samples/sec   Loss 0.8231   LearningRate 0.0018   Epoch: 17   Global Step: 71500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:11,148-Speed 3310.98 samples/sec   Loss 0.7990   LearningRate 0.0018   Epoch: 17   Global Step: 71510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:14,238-Speed 3313.96 samples/sec   Loss 0.7778   LearningRate 0.0018   Epoch: 17   Global Step: 71520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:17,330-Speed 3312.86 samples/sec   Loss 0.7655   LearningRate 0.0018   Epoch: 17   Global Step: 71530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:20,417-Speed 3317.57 samples/sec   Loss 0.8064   LearningRate 0.0018   Epoch: 17   Global Step: 71540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:50:23,489-Speed 3334.98 samples/sec   Loss 0.7606   LearningRate 0.0018   Epoch: 17   Global Step: 71550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:26,577-Speed 3316.38 samples/sec   Loss 0.8264   LearningRate 0.0018   Epoch: 17   Global Step: 71560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:29,664-Speed 3317.69 samples/sec   Loss 0.7880   LearningRate 0.0018   Epoch: 17   Global Step: 71570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:32,749-Speed 3319.79 samples/sec   Loss 0.8117   LearningRate 0.0018   Epoch: 17   Global Step: 71580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:35,843-Speed 3310.61 samples/sec   Loss 0.7736   LearningRate 0.0018   Epoch: 17   Global Step: 71590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:38,930-Speed 3318.10 samples/sec   Loss 0.7824   LearningRate 0.0018   Epoch: 17   Global Step: 71600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:42,016-Speed 3318.39 samples/sec   Loss 0.8069   LearningRate 0.0018   Epoch: 17   Global Step: 71610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:45,102-Speed 3319.68 samples/sec   Loss 0.7653   LearningRate 0.0018   Epoch: 17   Global Step: 71620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:48,192-Speed 3314.98 samples/sec   Loss 0.8148   LearningRate 0.0018   Epoch: 17   Global Step: 71630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:51,283-Speed 3313.37 samples/sec   Loss 0.7852   LearningRate 0.0018   Epoch: 17   Global Step: 71640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:50:54,462-Speed 3221.96 samples/sec   Loss 0.7944   LearningRate 0.0018   Epoch: 17   Global Step: 71650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:50:57,552-Speed 3314.27 samples/sec   Loss 0.8065   LearningRate 0.0018   Epoch: 17   Global Step: 71660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:51:00,689-Speed 3264.72 samples/sec   Loss 0.7601   LearningRate 0.0018   Epoch: 17   Global Step: 71670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:51:03,791-Speed 3301.71 samples/sec   Loss 0.7926   LearningRate 0.0018   Epoch: 17   Global Step: 71680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:51:06,883-Speed 3312.94 samples/sec   Loss 0.7507   LearningRate 0.0018   Epoch: 17   Global Step: 71690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:51:09,956-Speed 3333.10 samples/sec   Loss 0.7949   LearningRate 0.0018   Epoch: 17   Global Step: 71700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:13,045-Speed 3316.08 samples/sec   Loss 0.7805   LearningRate 0.0018   Epoch: 17   Global Step: 71710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:16,141-Speed 3308.63 samples/sec   Loss 0.8077   LearningRate 0.0018   Epoch: 17   Global Step: 71720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:19,229-Speed 3316.74 samples/sec   Loss 0.8112   LearningRate 0.0018   Epoch: 17   Global Step: 71730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:22,373-Speed 3257.37 samples/sec   Loss 0.7852   LearningRate 0.0018   Epoch: 17   Global Step: 71740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:25,480-Speed 3296.53 samples/sec   Loss 0.8045   LearningRate 0.0018   Epoch: 17   Global Step: 71750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:28,573-Speed 3312.07 samples/sec   Loss 0.8028   LearningRate 0.0017   Epoch: 17   Global Step: 71760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:31,667-Speed 3310.17 samples/sec   Loss 0.7973   LearningRate 0.0017   Epoch: 17   Global Step: 71770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:34,756-Speed 3315.01 samples/sec   Loss 0.7983   LearningRate 0.0017   Epoch: 17   Global Step: 71780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:37,841-Speed 3320.17 samples/sec   Loss 0.7915   LearningRate 0.0017   Epoch: 17   Global Step: 71790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:40,934-Speed 3311.43 samples/sec   Loss 0.8129   LearningRate 0.0017   Epoch: 17   Global Step: 71800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:51:44,003-Speed 3337.45 samples/sec   Loss 0.8200   LearningRate 0.0017   Epoch: 17   Global Step: 71810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:47,098-Speed 3309.68 samples/sec   Loss 0.7692   LearningRate 0.0017   Epoch: 17   Global Step: 71820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:50,191-Speed 3312.00 samples/sec   Loss 0.8566   LearningRate 0.0017   Epoch: 17   Global Step: 71830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:53,296-Speed 3298.67 samples/sec   Loss 0.8262   LearningRate 0.0017   Epoch: 17   Global Step: 71840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:56,389-Speed 3311.34 samples/sec   Loss 0.8028   LearningRate 0.0017   Epoch: 17   Global Step: 71850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:51:59,472-Speed 3322.26 samples/sec   Loss 0.8127   LearningRate 0.0017   Epoch: 17   Global Step: 71860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:52:02,562-Speed 3314.03 samples/sec   Loss 0.8081   LearningRate 0.0017   Epoch: 17   Global Step: 71870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:52:05,657-Speed 3309.15 samples/sec   Loss 0.7664   LearningRate 0.0017   Epoch: 17   Global Step: 71880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:52:08,747-Speed 3314.47 samples/sec   Loss 0.7881   LearningRate 0.0017   Epoch: 17   Global Step: 71890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:52:11,851-Speed 3299.47 samples/sec   Loss 0.8268   LearningRate 0.0017   Epoch: 17   Global Step: 71900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:52:14,975-Speed 3278.93 samples/sec   Loss 0.8294   LearningRate 0.0017   Epoch: 17   Global Step: 71910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:52:18,132-Speed 3244.34 samples/sec   Loss 0.7753   LearningRate 0.0017   Epoch: 17   Global Step: 71920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:52:21,197-Speed 3341.49 samples/sec   Loss 0.7935   LearningRate 0.0017   Epoch: 17   Global Step: 71930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:52:24,285-Speed 3317.66 samples/sec   Loss 0.8251   LearningRate 0.0017   Epoch: 17   Global Step: 71940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:52:27,373-Speed 3316.41 samples/sec   Loss 0.7931   LearningRate 0.0017   Epoch: 17   Global Step: 71950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:52:30,470-Speed 3307.26 samples/sec   Loss 0.7645   LearningRate 0.0017   Epoch: 17   Global Step: 71960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:52:33,559-Speed 3315.32 samples/sec   Loss 0.7923   LearningRate 0.0017   Epoch: 17   Global Step: 71970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:52:36,647-Speed 3316.85 samples/sec   Loss 0.8003   LearningRate 0.0017   Epoch: 17   Global Step: 71980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:52:39,734-Speed 3317.45 samples/sec   Loss 0.8053   LearningRate 0.0017   Epoch: 17   Global Step: 71990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:52:42,864-Speed 3272.19 samples/sec   Loss 0.8076   LearningRate 0.0017   Epoch: 17   Global Step: 72000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:53:26,388-[lfw][72000]XNorm: 21.185405
Training: 2022-04-26 19:53:26,388-[lfw][72000]Accuracy-Flip: 0.99850+-0.00203
Training: 2022-04-26 19:53:26,389-[lfw][72000]Accuracy-Highest: 0.99850
Training: 2022-04-26 19:54:16,907-[cfp_fp][72000]XNorm: 21.831528
Training: 2022-04-26 19:54:16,908-[cfp_fp][72000]Accuracy-Flip: 0.99286+-0.00419
Training: 2022-04-26 19:54:16,908-[cfp_fp][72000]Accuracy-Highest: 0.99286
Training: 2022-04-26 19:55:00,537-[agedb_30][72000]XNorm: 21.874107
Training: 2022-04-26 19:55:00,537-[agedb_30][72000]Accuracy-Flip: 0.97833+-0.00645
Training: 2022-04-26 19:55:00,538-[agedb_30][72000]Accuracy-Highest: 0.97950
Training: 2022-04-26 19:55:03,662-Speed 72.73 samples/sec   Loss 0.8210   LearningRate 0.0017   Epoch: 17   Global Step: 72010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:55:06,753-Speed 3313.37 samples/sec   Loss 0.7925   LearningRate 0.0017   Epoch: 17   Global Step: 72020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:55:09,843-Speed 3314.71 samples/sec   Loss 0.8165   LearningRate 0.0017   Epoch: 17   Global Step: 72030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:55:12,911-Speed 3338.73 samples/sec   Loss 0.7874   LearningRate 0.0017   Epoch: 17   Global Step: 72040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:55:16,005-Speed 3310.56 samples/sec   Loss 0.8040   LearningRate 0.0017   Epoch: 17   Global Step: 72050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:55:19,083-Speed 3327.61 samples/sec   Loss 0.7901   LearningRate 0.0017   Epoch: 17   Global Step: 72060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:55:22,157-Speed 3331.14 samples/sec   Loss 0.7563   LearningRate 0.0017   Epoch: 17   Global Step: 72070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:55:25,239-Speed 3324.45 samples/sec   Loss 0.8272   LearningRate 0.0016   Epoch: 17   Global Step: 72080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:55:28,319-Speed 3324.62 samples/sec   Loss 0.8004   LearningRate 0.0016   Epoch: 17   Global Step: 72090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:55:31,397-Speed 3328.06 samples/sec   Loss 0.7893   LearningRate 0.0016   Epoch: 17   Global Step: 72100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:55:34,473-Speed 3329.36 samples/sec   Loss 0.8231   LearningRate 0.0016   Epoch: 17   Global Step: 72110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:55:37,562-Speed 3315.46 samples/sec   Loss 0.8011   LearningRate 0.0016   Epoch: 17   Global Step: 72120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:55:40,642-Speed 3326.23 samples/sec   Loss 0.7959   LearningRate 0.0016   Epoch: 17   Global Step: 72130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:55:43,723-Speed 3324.01 samples/sec   Loss 0.8082   LearningRate 0.0016   Epoch: 17   Global Step: 72140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:55:46,828-Speed 3298.67 samples/sec   Loss 0.7741   LearningRate 0.0016   Epoch: 17   Global Step: 72150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:55:49,953-Speed 3277.32 samples/sec   Loss 0.7843   LearningRate 0.0016   Epoch: 17   Global Step: 72160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:55:53,066-Speed 3290.84 samples/sec   Loss 0.7926   LearningRate 0.0016   Epoch: 17   Global Step: 72170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:55:56,150-Speed 3320.39 samples/sec   Loss 0.8354   LearningRate 0.0016   Epoch: 17   Global Step: 72180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:55:59,236-Speed 3319.78 samples/sec   Loss 0.7817   LearningRate 0.0016   Epoch: 17   Global Step: 72190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:56:02,323-Speed 3318.04 samples/sec   Loss 0.8146   LearningRate 0.0016   Epoch: 17   Global Step: 72200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:56:05,406-Speed 3321.52 samples/sec   Loss 0.7808   LearningRate 0.0016   Epoch: 17   Global Step: 72210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:56:08,495-Speed 3316.42 samples/sec   Loss 0.8175   LearningRate 0.0016   Epoch: 17   Global Step: 72220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:56:11,605-Speed 3293.64 samples/sec   Loss 0.7728   LearningRate 0.0016   Epoch: 17   Global Step: 72230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:56:14,695-Speed 3314.92 samples/sec   Loss 0.8058   LearningRate 0.0016   Epoch: 17   Global Step: 72240   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-26 19:56:17,846-Speed 3249.52 samples/sec   Loss 0.8034   LearningRate 0.0016   Epoch: 17   Global Step: 72250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:56:20,934-Speed 3316.95 samples/sec   Loss 0.7965   LearningRate 0.0016   Epoch: 17   Global Step: 72260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:56:24,028-Speed 3310.90 samples/sec   Loss 0.8425   LearningRate 0.0016   Epoch: 17   Global Step: 72270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:56:27,120-Speed 3312.95 samples/sec   Loss 0.8346   LearningRate 0.0016   Epoch: 17   Global Step: 72280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:56:30,211-Speed 3313.74 samples/sec   Loss 0.8155   LearningRate 0.0016   Epoch: 17   Global Step: 72290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:56:33,296-Speed 3319.22 samples/sec   Loss 0.8125   LearningRate 0.0016   Epoch: 17   Global Step: 72300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:56:36,384-Speed 3317.39 samples/sec   Loss 0.8286   LearningRate 0.0016   Epoch: 17   Global Step: 72310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:56:39,477-Speed 3311.73 samples/sec   Loss 0.8166   LearningRate 0.0016   Epoch: 17   Global Step: 72320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:56:42,544-Speed 3338.80 samples/sec   Loss 0.7984   LearningRate 0.0016   Epoch: 17   Global Step: 72330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:56:45,629-Speed 3320.51 samples/sec   Loss 0.7915   LearningRate 0.0016   Epoch: 17   Global Step: 72340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:56:48,713-Speed 3320.86 samples/sec   Loss 0.8293   LearningRate 0.0016   Epoch: 17   Global Step: 72350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:56:51,805-Speed 3311.96 samples/sec   Loss 0.8316   LearningRate 0.0016   Epoch: 17   Global Step: 72360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:56:54,898-Speed 3312.80 samples/sec   Loss 0.8041   LearningRate 0.0016   Epoch: 17   Global Step: 72370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:56:57,982-Speed 3321.05 samples/sec   Loss 0.7968   LearningRate 0.0016   Epoch: 17   Global Step: 72380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:01,065-Speed 3321.97 samples/sec   Loss 0.7857   LearningRate 0.0016   Epoch: 17   Global Step: 72390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:04,148-Speed 3322.30 samples/sec   Loss 0.8010   LearningRate 0.0016   Epoch: 17   Global Step: 72400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:07,248-Speed 3303.39 samples/sec   Loss 0.8177   LearningRate 0.0015   Epoch: 17   Global Step: 72410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:10,325-Speed 3328.58 samples/sec   Loss 0.8164   LearningRate 0.0015   Epoch: 17   Global Step: 72420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:13,404-Speed 3326.53 samples/sec   Loss 0.8009   LearningRate 0.0015   Epoch: 17   Global Step: 72430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:57:16,486-Speed 3322.96 samples/sec   Loss 0.8159   LearningRate 0.0015   Epoch: 17   Global Step: 72440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:57:19,561-Speed 3331.52 samples/sec   Loss 0.8003   LearningRate 0.0015   Epoch: 17   Global Step: 72450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:57:22,650-Speed 3315.29 samples/sec   Loss 0.8117   LearningRate 0.0015   Epoch: 17   Global Step: 72460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:25,732-Speed 3323.68 samples/sec   Loss 0.7994   LearningRate 0.0015   Epoch: 17   Global Step: 72470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:28,810-Speed 3327.71 samples/sec   Loss 0.7848   LearningRate 0.0015   Epoch: 17   Global Step: 72480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:31,888-Speed 3327.78 samples/sec   Loss 0.8111   LearningRate 0.0015   Epoch: 17   Global Step: 72490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:34,971-Speed 3321.51 samples/sec   Loss 0.8105   LearningRate 0.0015   Epoch: 17   Global Step: 72500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:38,051-Speed 3325.66 samples/sec   Loss 0.8028   LearningRate 0.0015   Epoch: 17   Global Step: 72510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:41,128-Speed 3328.38 samples/sec   Loss 0.8187   LearningRate 0.0015   Epoch: 17   Global Step: 72520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:44,205-Speed 3328.73 samples/sec   Loss 0.8269   LearningRate 0.0015   Epoch: 17   Global Step: 72530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:47,283-Speed 3327.59 samples/sec   Loss 0.8324   LearningRate 0.0015   Epoch: 17   Global Step: 72540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:50,367-Speed 3321.08 samples/sec   Loss 0.8070   LearningRate 0.0015   Epoch: 17   Global Step: 72550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:57:53,452-Speed 3320.30 samples/sec   Loss 0.8007   LearningRate 0.0015   Epoch: 17   Global Step: 72560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:57:56,525-Speed 3332.28 samples/sec   Loss 0.8245   LearningRate 0.0015   Epoch: 17   Global Step: 72570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:57:59,601-Speed 3330.38 samples/sec   Loss 0.8099   LearningRate 0.0015   Epoch: 17   Global Step: 72580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:02,676-Speed 3331.44 samples/sec   Loss 0.8501   LearningRate 0.0015   Epoch: 17   Global Step: 72590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:05,750-Speed 3331.05 samples/sec   Loss 0.8008   LearningRate 0.0015   Epoch: 17   Global Step: 72600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:08,831-Speed 3324.24 samples/sec   Loss 0.8280   LearningRate 0.0015   Epoch: 17   Global Step: 72610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:11,913-Speed 3324.08 samples/sec   Loss 0.8163   LearningRate 0.0015   Epoch: 17   Global Step: 72620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:15,019-Speed 3296.92 samples/sec   Loss 0.8081   LearningRate 0.0015   Epoch: 17   Global Step: 72630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:18,096-Speed 3329.65 samples/sec   Loss 0.8334   LearningRate 0.0015   Epoch: 17   Global Step: 72640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:21,173-Speed 3328.04 samples/sec   Loss 0.7859   LearningRate 0.0015   Epoch: 17   Global Step: 72650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:24,240-Speed 3339.60 samples/sec   Loss 0.7603   LearningRate 0.0015   Epoch: 17   Global Step: 72660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:27,320-Speed 3325.66 samples/sec   Loss 0.7679   LearningRate 0.0015   Epoch: 17   Global Step: 72670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:30,414-Speed 3310.92 samples/sec   Loss 0.8260   LearningRate 0.0015   Epoch: 17   Global Step: 72680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:33,495-Speed 3324.27 samples/sec   Loss 0.8309   LearningRate 0.0015   Epoch: 17   Global Step: 72690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:36,599-Speed 3299.18 samples/sec   Loss 0.8064   LearningRate 0.0015   Epoch: 17   Global Step: 72700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:39,782-Speed 3217.63 samples/sec   Loss 0.8247   LearningRate 0.0015   Epoch: 17   Global Step: 72710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:42,909-Speed 3275.87 samples/sec   Loss 0.7681   LearningRate 0.0015   Epoch: 17   Global Step: 72720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:45,989-Speed 3324.87 samples/sec   Loss 0.8191   LearningRate 0.0015   Epoch: 17   Global Step: 72730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:49,074-Speed 3320.01 samples/sec   Loss 0.8469   LearningRate 0.0015   Epoch: 17   Global Step: 72740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:52,156-Speed 3324.23 samples/sec   Loss 0.8119   LearningRate 0.0014   Epoch: 17   Global Step: 72750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:55,218-Speed 3344.35 samples/sec   Loss 0.8579   LearningRate 0.0014   Epoch: 17   Global Step: 72760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:58:58,298-Speed 3325.46 samples/sec   Loss 0.8415   LearningRate 0.0014   Epoch: 17   Global Step: 72770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:59:01,378-Speed 3326.99 samples/sec   Loss 0.8307   LearningRate 0.0014   Epoch: 17   Global Step: 72780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:59:04,455-Speed 3328.81 samples/sec   Loss 0.8035   LearningRate 0.0014   Epoch: 17   Global Step: 72790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:59:07,531-Speed 3330.11 samples/sec   Loss 0.7939   LearningRate 0.0014   Epoch: 17   Global Step: 72800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:59:10,621-Speed 3314.35 samples/sec   Loss 0.7881   LearningRate 0.0014   Epoch: 17   Global Step: 72810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:59:13,705-Speed 3321.57 samples/sec   Loss 0.7860   LearningRate 0.0014   Epoch: 17   Global Step: 72820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:59:16,788-Speed 3322.22 samples/sec   Loss 0.7829   LearningRate 0.0014   Epoch: 17   Global Step: 72830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:59:19,872-Speed 3320.87 samples/sec   Loss 0.8021   LearningRate 0.0014   Epoch: 17   Global Step: 72840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:59:22,952-Speed 3325.58 samples/sec   Loss 0.8089   LearningRate 0.0014   Epoch: 17   Global Step: 72850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:59:26,056-Speed 3299.61 samples/sec   Loss 0.7961   LearningRate 0.0014   Epoch: 17   Global Step: 72860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:59:29,137-Speed 3323.81 samples/sec   Loss 0.8280   LearningRate 0.0014   Epoch: 17   Global Step: 72870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:59:32,224-Speed 3317.93 samples/sec   Loss 0.8054   LearningRate 0.0014   Epoch: 17   Global Step: 72880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 19:59:35,324-Speed 3304.59 samples/sec   Loss 0.7943   LearningRate 0.0014   Epoch: 17   Global Step: 72890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:59:38,423-Speed 3305.29 samples/sec   Loss 0.8151   LearningRate 0.0014   Epoch: 17   Global Step: 72900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:59:41,518-Speed 3309.29 samples/sec   Loss 0.8158   LearningRate 0.0014   Epoch: 17   Global Step: 72910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:59:44,600-Speed 3322.81 samples/sec   Loss 0.8088   LearningRate 0.0014   Epoch: 17   Global Step: 72920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:59:47,721-Speed 3281.80 samples/sec   Loss 0.8274   LearningRate 0.0014   Epoch: 17   Global Step: 72930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:59:50,804-Speed 3322.27 samples/sec   Loss 0.8456   LearningRate 0.0014   Epoch: 17   Global Step: 72940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:59:53,892-Speed 3317.07 samples/sec   Loss 0.8170   LearningRate 0.0014   Epoch: 17   Global Step: 72950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 19:59:56,976-Speed 3321.09 samples/sec   Loss 0.8229   LearningRate 0.0014   Epoch: 17   Global Step: 72960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:00:00,058-Speed 3323.29 samples/sec   Loss 0.8272   LearningRate 0.0014   Epoch: 17   Global Step: 72970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:00:03,141-Speed 3322.97 samples/sec   Loss 0.8233   LearningRate 0.0014   Epoch: 17   Global Step: 72980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:00:06,201-Speed 3347.04 samples/sec   Loss 0.8485   LearningRate 0.0014   Epoch: 17   Global Step: 72990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:00:09,263-Speed 3344.60 samples/sec   Loss 0.8424   LearningRate 0.0014   Epoch: 17   Global Step: 73000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:12,358-Speed 3308.89 samples/sec   Loss 0.8117   LearningRate 0.0014   Epoch: 17   Global Step: 73010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:15,443-Speed 3320.52 samples/sec   Loss 0.8076   LearningRate 0.0014   Epoch: 17   Global Step: 73020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:18,526-Speed 3322.22 samples/sec   Loss 0.8368   LearningRate 0.0014   Epoch: 17   Global Step: 73030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:21,613-Speed 3318.21 samples/sec   Loss 0.8143   LearningRate 0.0014   Epoch: 17   Global Step: 73040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:24,700-Speed 3317.33 samples/sec   Loss 0.8261   LearningRate 0.0014   Epoch: 17   Global Step: 73050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:27,796-Speed 3308.82 samples/sec   Loss 0.8548   LearningRate 0.0014   Epoch: 17   Global Step: 73060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:30,888-Speed 3312.74 samples/sec   Loss 0.8286   LearningRate 0.0014   Epoch: 17   Global Step: 73070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:33,991-Speed 3300.81 samples/sec   Loss 0.8296   LearningRate 0.0014   Epoch: 17   Global Step: 73080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:37,074-Speed 3321.80 samples/sec   Loss 0.8338   LearningRate 0.0014   Epoch: 17   Global Step: 73090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:40,156-Speed 3323.47 samples/sec   Loss 0.8294   LearningRate 0.0013   Epoch: 17   Global Step: 73100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:43,242-Speed 3319.22 samples/sec   Loss 0.8254   LearningRate 0.0013   Epoch: 17   Global Step: 73110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:46,326-Speed 3321.51 samples/sec   Loss 0.8455   LearningRate 0.0013   Epoch: 17   Global Step: 73120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:49,409-Speed 3321.80 samples/sec   Loss 0.8265   LearningRate 0.0013   Epoch: 17   Global Step: 73130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:52,494-Speed 3319.86 samples/sec   Loss 0.8012   LearningRate 0.0013   Epoch: 17   Global Step: 73140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:55,581-Speed 3318.33 samples/sec   Loss 0.8421   LearningRate 0.0013   Epoch: 17   Global Step: 73150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:00:58,671-Speed 3314.51 samples/sec   Loss 0.8210   LearningRate 0.0013   Epoch: 17   Global Step: 73160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:01:01,760-Speed 3315.55 samples/sec   Loss 0.8062   LearningRate 0.0013   Epoch: 17   Global Step: 73170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:01:04,854-Speed 3310.46 samples/sec   Loss 0.7850   LearningRate 0.0013   Epoch: 17   Global Step: 73180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:01:07,940-Speed 3319.54 samples/sec   Loss 0.8145   LearningRate 0.0013   Epoch: 17   Global Step: 73190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:01:11,033-Speed 3311.24 samples/sec   Loss 0.8215   LearningRate 0.0013   Epoch: 17   Global Step: 73200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:01:14,125-Speed 3312.17 samples/sec   Loss 0.7818   LearningRate 0.0013   Epoch: 17   Global Step: 73210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:01:17,214-Speed 3315.98 samples/sec   Loss 0.7946   LearningRate 0.0013   Epoch: 17   Global Step: 73220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:01:20,299-Speed 3320.22 samples/sec   Loss 0.8340   LearningRate 0.0013   Epoch: 17   Global Step: 73230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:01:23,394-Speed 3308.78 samples/sec   Loss 0.8326   LearningRate 0.0013   Epoch: 17   Global Step: 73240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:01:26,496-Speed 3301.70 samples/sec   Loss 0.8121   LearningRate 0.0013   Epoch: 17   Global Step: 73250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:01:29,590-Speed 3311.42 samples/sec   Loss 0.7820   LearningRate 0.0013   Epoch: 17   Global Step: 73260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:01:32,674-Speed 3321.27 samples/sec   Loss 0.8134   LearningRate 0.0013   Epoch: 17   Global Step: 73270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:01:35,745-Speed 3335.00 samples/sec   Loss 0.8302   LearningRate 0.0013   Epoch: 17   Global Step: 73280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:01:38,887-Speed 3259.60 samples/sec   Loss 0.8400   LearningRate 0.0013   Epoch: 17   Global Step: 73290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:01:41,977-Speed 3315.40 samples/sec   Loss 0.8598   LearningRate 0.0013   Epoch: 17   Global Step: 73300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:01:45,063-Speed 3319.66 samples/sec   Loss 0.8174   LearningRate 0.0013   Epoch: 17   Global Step: 73310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:01:48,146-Speed 3322.58 samples/sec   Loss 0.8081   LearningRate 0.0013   Epoch: 17   Global Step: 73320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:01:51,230-Speed 3320.62 samples/sec   Loss 0.7977   LearningRate 0.0013   Epoch: 17   Global Step: 73330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:01:54,327-Speed 3307.04 samples/sec   Loss 0.7984   LearningRate 0.0013   Epoch: 17   Global Step: 73340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:01:57,418-Speed 3313.63 samples/sec   Loss 0.8109   LearningRate 0.0013   Epoch: 17   Global Step: 73350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:02:00,508-Speed 3315.26 samples/sec   Loss 0.8602   LearningRate 0.0013   Epoch: 17   Global Step: 73360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:02:03,590-Speed 3323.08 samples/sec   Loss 0.8383   LearningRate 0.0013   Epoch: 17   Global Step: 73370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:02:06,676-Speed 3318.82 samples/sec   Loss 0.8131   LearningRate 0.0013   Epoch: 17   Global Step: 73380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:02:09,770-Speed 3310.89 samples/sec   Loss 0.8125   LearningRate 0.0013   Epoch: 17   Global Step: 73390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:02:12,855-Speed 3319.66 samples/sec   Loss 0.8333   LearningRate 0.0013   Epoch: 17   Global Step: 73400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:02:15,941-Speed 3319.20 samples/sec   Loss 0.8060   LearningRate 0.0013   Epoch: 17   Global Step: 73410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:02:19,021-Speed 3324.74 samples/sec   Loss 0.8322   LearningRate 0.0013   Epoch: 17   Global Step: 73420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:02:22,111-Speed 3314.71 samples/sec   Loss 0.8034   LearningRate 0.0013   Epoch: 17   Global Step: 73430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:02:25,201-Speed 3315.15 samples/sec   Loss 0.7849   LearningRate 0.0013   Epoch: 17   Global Step: 73440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:02:28,285-Speed 3321.03 samples/sec   Loss 0.8346   LearningRate 0.0013   Epoch: 17   Global Step: 73450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:02:31,371-Speed 3319.66 samples/sec   Loss 0.8112   LearningRate 0.0012   Epoch: 17   Global Step: 73460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:02:34,458-Speed 3317.05 samples/sec   Loss 0.8368   LearningRate 0.0012   Epoch: 17   Global Step: 73470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:02:37,535-Speed 3329.52 samples/sec   Loss 0.7883   LearningRate 0.0012   Epoch: 17   Global Step: 73480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:02:40,620-Speed 3319.78 samples/sec   Loss 0.8114   LearningRate 0.0012   Epoch: 17   Global Step: 73490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:02:43,707-Speed 3317.12 samples/sec   Loss 0.8174   LearningRate 0.0012   Epoch: 17   Global Step: 73500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:02:46,776-Speed 3337.41 samples/sec   Loss 0.8407   LearningRate 0.0012   Epoch: 17   Global Step: 73510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:02:49,860-Speed 3321.41 samples/sec   Loss 0.8295   LearningRate 0.0012   Epoch: 17   Global Step: 73520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:02:52,962-Speed 3301.37 samples/sec   Loss 0.8205   LearningRate 0.0012   Epoch: 17   Global Step: 73530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:02:56,050-Speed 3316.89 samples/sec   Loss 0.8361   LearningRate 0.0012   Epoch: 17   Global Step: 73540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:02:59,148-Speed 3306.51 samples/sec   Loss 0.8078   LearningRate 0.0012   Epoch: 17   Global Step: 73550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:02,244-Speed 3308.19 samples/sec   Loss 0.8116   LearningRate 0.0012   Epoch: 17   Global Step: 73560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:05,347-Speed 3302.36 samples/sec   Loss 0.8188   LearningRate 0.0012   Epoch: 17   Global Step: 73570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:08,434-Speed 3317.66 samples/sec   Loss 0.8360   LearningRate 0.0012   Epoch: 17   Global Step: 73580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:11,517-Speed 3322.25 samples/sec   Loss 0.8079   LearningRate 0.0012   Epoch: 17   Global Step: 73590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:14,608-Speed 3313.83 samples/sec   Loss 0.8329   LearningRate 0.0012   Epoch: 17   Global Step: 73600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:17,682-Speed 3331.68 samples/sec   Loss 0.8264   LearningRate 0.0012   Epoch: 17   Global Step: 73610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:20,770-Speed 3317.02 samples/sec   Loss 0.8449   LearningRate 0.0012   Epoch: 17   Global Step: 73620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:23,867-Speed 3306.76 samples/sec   Loss 0.8282   LearningRate 0.0012   Epoch: 17   Global Step: 73630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:26,980-Speed 3289.90 samples/sec   Loss 0.8146   LearningRate 0.0012   Epoch: 17   Global Step: 73640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:30,073-Speed 3312.33 samples/sec   Loss 0.8231   LearningRate 0.0012   Epoch: 17   Global Step: 73650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:33,159-Speed 3318.69 samples/sec   Loss 0.8513   LearningRate 0.0012   Epoch: 17   Global Step: 73660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:36,251-Speed 3313.19 samples/sec   Loss 0.8437   LearningRate 0.0012   Epoch: 17   Global Step: 73670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:39,341-Speed 3314.17 samples/sec   Loss 0.8166   LearningRate 0.0012   Epoch: 17   Global Step: 73680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:42,427-Speed 3319.04 samples/sec   Loss 0.8192   LearningRate 0.0012   Epoch: 17   Global Step: 73690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:45,520-Speed 3311.02 samples/sec   Loss 0.8170   LearningRate 0.0012   Epoch: 17   Global Step: 73700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:03:48,617-Speed 3307.64 samples/sec   Loss 0.8147   LearningRate 0.0012   Epoch: 17   Global Step: 73710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:03:51,736-Speed 3284.54 samples/sec   Loss 0.8393   LearningRate 0.0012   Epoch: 17   Global Step: 73720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:03:54,832-Speed 3308.28 samples/sec   Loss 0.8316   LearningRate 0.0012   Epoch: 17   Global Step: 73730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:03:57,904-Speed 3333.60 samples/sec   Loss 0.8265   LearningRate 0.0012   Epoch: 17   Global Step: 73740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:04:01,001-Speed 3306.80 samples/sec   Loss 0.8314   LearningRate 0.0012   Epoch: 17   Global Step: 73750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:04:04,092-Speed 3314.32 samples/sec   Loss 0.8135   LearningRate 0.0012   Epoch: 17   Global Step: 73760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:04:07,191-Speed 3305.47 samples/sec   Loss 0.8259   LearningRate 0.0012   Epoch: 17   Global Step: 73770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:04:10,291-Speed 3303.41 samples/sec   Loss 0.7907   LearningRate 0.0012   Epoch: 17   Global Step: 73780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:04:13,447-Speed 3246.03 samples/sec   Loss 0.8693   LearningRate 0.0012   Epoch: 17   Global Step: 73790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:04:16,552-Speed 3298.45 samples/sec   Loss 0.8462   LearningRate 0.0012   Epoch: 17   Global Step: 73800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:04:19,644-Speed 3312.41 samples/sec   Loss 0.8255   LearningRate 0.0012   Epoch: 17   Global Step: 73810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:04:22,727-Speed 3321.94 samples/sec   Loss 0.8174   LearningRate 0.0012   Epoch: 17   Global Step: 73820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:04:25,819-Speed 3313.12 samples/sec   Loss 0.8115   LearningRate 0.0012   Epoch: 17   Global Step: 73830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:04:28,911-Speed 3312.17 samples/sec   Loss 0.8271   LearningRate 0.0011   Epoch: 17   Global Step: 73840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:04:32,037-Speed 3276.83 samples/sec   Loss 0.8761   LearningRate 0.0011   Epoch: 17   Global Step: 73850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:04:35,121-Speed 3321.45 samples/sec   Loss 0.8207   LearningRate 0.0011   Epoch: 17   Global Step: 73860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:04:38,225-Speed 3299.07 samples/sec   Loss 0.8185   LearningRate 0.0011   Epoch: 17   Global Step: 73870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:04:41,309-Speed 3321.48 samples/sec   Loss 0.8104   LearningRate 0.0011   Epoch: 17   Global Step: 73880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:04:44,395-Speed 3319.69 samples/sec   Loss 0.8076   LearningRate 0.0011   Epoch: 17   Global Step: 73890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:04:47,486-Speed 3312.91 samples/sec   Loss 0.7885   LearningRate 0.0011   Epoch: 17   Global Step: 73900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:04:50,570-Speed 3321.00 samples/sec   Loss 0.8522   LearningRate 0.0011   Epoch: 17   Global Step: 73910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:04:53,660-Speed 3315.71 samples/sec   Loss 0.8838   LearningRate 0.0011   Epoch: 17   Global Step: 73920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:04:56,729-Speed 3336.28 samples/sec   Loss 0.8291   LearningRate 0.0011   Epoch: 17   Global Step: 73930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:04:59,818-Speed 3316.38 samples/sec   Loss 0.8061   LearningRate 0.0011   Epoch: 17   Global Step: 73940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:05:02,910-Speed 3312.66 samples/sec   Loss 0.8232   LearningRate 0.0011   Epoch: 17   Global Step: 73950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:05:05,996-Speed 3319.44 samples/sec   Loss 0.8059   LearningRate 0.0011   Epoch: 17   Global Step: 73960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:05:09,082-Speed 3318.68 samples/sec   Loss 0.8080   LearningRate 0.0011   Epoch: 17   Global Step: 73970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:05:12,173-Speed 3313.68 samples/sec   Loss 0.8598   LearningRate 0.0011   Epoch: 17   Global Step: 73980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:05:15,259-Speed 3319.59 samples/sec   Loss 0.8036   LearningRate 0.0011   Epoch: 17   Global Step: 73990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:05:18,352-Speed 3310.85 samples/sec   Loss 0.8294   LearningRate 0.0011   Epoch: 17   Global Step: 74000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:06:01,887-[lfw][74000]XNorm: 21.599039
Training: 2022-04-26 20:06:01,887-[lfw][74000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-26 20:06:01,888-[lfw][74000]Accuracy-Highest: 0.99850
Training: 2022-04-26 20:06:52,452-[cfp_fp][74000]XNorm: 22.344805
Training: 2022-04-26 20:06:52,453-[cfp_fp][74000]Accuracy-Flip: 0.99200+-0.00504
Training: 2022-04-26 20:06:52,453-[cfp_fp][74000]Accuracy-Highest: 0.99286
Training: 2022-04-26 20:07:36,026-[agedb_30][74000]XNorm: 22.294547
Training: 2022-04-26 20:07:36,027-[agedb_30][74000]Accuracy-Flip: 0.97767+-0.00646
Training: 2022-04-26 20:07:36,027-[agedb_30][74000]Accuracy-Highest: 0.97950
Training: 2022-04-26 20:07:39,117-Speed 72.75 samples/sec   Loss 0.7962   LearningRate 0.0011   Epoch: 17   Global Step: 74010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:07:42,187-Speed 3336.31 samples/sec   Loss 0.8198   LearningRate 0.0011   Epoch: 17   Global Step: 74020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:07:45,276-Speed 3316.23 samples/sec   Loss 0.7957   LearningRate 0.0011   Epoch: 17   Global Step: 74030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:07:48,337-Speed 3345.71 samples/sec   Loss 0.8396   LearningRate 0.0011   Epoch: 17   Global Step: 74040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:07:51,416-Speed 3326.68 samples/sec   Loss 0.8262   LearningRate 0.0011   Epoch: 17   Global Step: 74050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:07:54,509-Speed 3310.81 samples/sec   Loss 0.8322   LearningRate 0.0011   Epoch: 17   Global Step: 74060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:07:57,594-Speed 3320.52 samples/sec   Loss 0.8125   LearningRate 0.0011   Epoch: 17   Global Step: 74070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:08:00,670-Speed 3329.20 samples/sec   Loss 0.8437   LearningRate 0.0011   Epoch: 17   Global Step: 74080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:08:03,757-Speed 3318.28 samples/sec   Loss 0.8169   LearningRate 0.0011   Epoch: 17   Global Step: 74090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:08:06,845-Speed 3316.80 samples/sec   Loss 0.8100   LearningRate 0.0011   Epoch: 17   Global Step: 74100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:08:10,002-Speed 3244.48 samples/sec   Loss 0.8258   LearningRate 0.0011   Epoch: 17   Global Step: 74110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:08:13,174-Speed 3228.10 samples/sec   Loss 0.8289   LearningRate 0.0011   Epoch: 17   Global Step: 74120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:08:16,278-Speed 3300.00 samples/sec   Loss 0.8393   LearningRate 0.0011   Epoch: 17   Global Step: 74130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:08:19,342-Speed 3343.77 samples/sec   Loss 0.8237   LearningRate 0.0011   Epoch: 17   Global Step: 74140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:08:22,424-Speed 3323.46 samples/sec   Loss 0.8079   LearningRate 0.0011   Epoch: 17   Global Step: 74150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:08:25,507-Speed 3321.33 samples/sec   Loss 0.8218   LearningRate 0.0011   Epoch: 17   Global Step: 74160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:08:28,593-Speed 3319.01 samples/sec   Loss 0.8313   LearningRate 0.0011   Epoch: 17   Global Step: 74170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:08:31,671-Speed 3328.63 samples/sec   Loss 0.8182   LearningRate 0.0011   Epoch: 17   Global Step: 74180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:08:34,786-Speed 3287.62 samples/sec   Loss 0.8431   LearningRate 0.0011   Epoch: 17   Global Step: 74190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:08:37,852-Speed 3340.80 samples/sec   Loss 0.8014   LearningRate 0.0011   Epoch: 17   Global Step: 74200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 20:08:40,976-Speed 3278.96 samples/sec   Loss 0.7985   LearningRate 0.0011   Epoch: 17   Global Step: 74210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 20:08:44,113-Speed 3265.14 samples/sec   Loss 0.8241   LearningRate 0.0011   Epoch: 17   Global Step: 74220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 20:08:47,212-Speed 3304.67 samples/sec   Loss 0.7808   LearningRate 0.0010   Epoch: 17   Global Step: 74230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 20:08:50,291-Speed 3327.16 samples/sec   Loss 0.8366   LearningRate 0.0010   Epoch: 17   Global Step: 74240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 20:08:53,390-Speed 3304.44 samples/sec   Loss 0.7975   LearningRate 0.0010   Epoch: 17   Global Step: 74250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 20:08:56,469-Speed 3326.53 samples/sec   Loss 0.8337   LearningRate 0.0010   Epoch: 17   Global Step: 74260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 20:08:59,547-Speed 3327.20 samples/sec   Loss 0.8426   LearningRate 0.0010   Epoch: 17   Global Step: 74270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 20:09:02,635-Speed 3317.49 samples/sec   Loss 0.8036   LearningRate 0.0010   Epoch: 17   Global Step: 74280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 20:09:05,714-Speed 3326.76 samples/sec   Loss 0.8206   LearningRate 0.0010   Epoch: 17   Global Step: 74290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-26 20:09:08,789-Speed 3330.99 samples/sec   Loss 0.8284   LearningRate 0.0010   Epoch: 17   Global Step: 74300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:09:11,866-Speed 3327.94 samples/sec   Loss 0.8010   LearningRate 0.0010   Epoch: 17   Global Step: 74310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:09:14,944-Speed 3328.16 samples/sec   Loss 0.7912   LearningRate 0.0010   Epoch: 17   Global Step: 74320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:09:18,029-Speed 3320.32 samples/sec   Loss 0.8506   LearningRate 0.0010   Epoch: 17   Global Step: 74330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:09:21,107-Speed 3327.60 samples/sec   Loss 0.8235   LearningRate 0.0010   Epoch: 17   Global Step: 74340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:09:24,196-Speed 3315.21 samples/sec   Loss 0.8148   LearningRate 0.0010   Epoch: 17   Global Step: 74350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:09:27,274-Speed 3327.42 samples/sec   Loss 0.8094   LearningRate 0.0010   Epoch: 17   Global Step: 74360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:09:30,351-Speed 3329.05 samples/sec   Loss 0.8368   LearningRate 0.0010   Epoch: 17   Global Step: 74370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:09:33,426-Speed 3330.19 samples/sec   Loss 0.8270   LearningRate 0.0010   Epoch: 17   Global Step: 74380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:09:36,501-Speed 3331.55 samples/sec   Loss 0.7828   LearningRate 0.0010   Epoch: 17   Global Step: 74390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:09:39,576-Speed 3330.88 samples/sec   Loss 0.8435   LearningRate 0.0010   Epoch: 17   Global Step: 74400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:09:42,651-Speed 3331.21 samples/sec   Loss 0.7931   LearningRate 0.0010   Epoch: 17   Global Step: 74410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:09:45,782-Speed 3271.48 samples/sec   Loss 0.8147   LearningRate 0.0010   Epoch: 17   Global Step: 74420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:09:48,964-Speed 3218.54 samples/sec   Loss 0.8109   LearningRate 0.0010   Epoch: 17   Global Step: 74430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:01,277-Speed 831.76 samples/sec   Loss 0.5476   LearningRate 0.0010   Epoch: 18   Global Step: 74440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:04,352-Speed 3331.23 samples/sec   Loss 0.5304   LearningRate 0.0010   Epoch: 18   Global Step: 74450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:07,432-Speed 3324.62 samples/sec   Loss 0.5294   LearningRate 0.0010   Epoch: 18   Global Step: 74460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:10,514-Speed 3322.96 samples/sec   Loss 0.5140   LearningRate 0.0010   Epoch: 18   Global Step: 74470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:13,654-Speed 3262.14 samples/sec   Loss 0.5541   LearningRate 0.0010   Epoch: 18   Global Step: 74480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:16,784-Speed 3272.24 samples/sec   Loss 0.5625   LearningRate 0.0010   Epoch: 18   Global Step: 74490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:19,854-Speed 3336.93 samples/sec   Loss 0.5440   LearningRate 0.0010   Epoch: 18   Global Step: 74500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:22,934-Speed 3324.97 samples/sec   Loss 0.5463   LearningRate 0.0010   Epoch: 18   Global Step: 74510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:26,014-Speed 3325.78 samples/sec   Loss 0.5547   LearningRate 0.0010   Epoch: 18   Global Step: 74520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:29,095-Speed 3323.91 samples/sec   Loss 0.5944   LearningRate 0.0010   Epoch: 18   Global Step: 74530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:32,169-Speed 3332.16 samples/sec   Loss 0.5577   LearningRate 0.0010   Epoch: 18   Global Step: 74540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:35,358-Speed 3211.76 samples/sec   Loss 0.5558   LearningRate 0.0010   Epoch: 18   Global Step: 74550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:38,488-Speed 3272.43 samples/sec   Loss 0.6051   LearningRate 0.0010   Epoch: 18   Global Step: 74560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:41,613-Speed 3276.63 samples/sec   Loss 0.5293   LearningRate 0.0010   Epoch: 18   Global Step: 74570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:44,692-Speed 3327.10 samples/sec   Loss 0.5513   LearningRate 0.0010   Epoch: 18   Global Step: 74580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:47,775-Speed 3322.93 samples/sec   Loss 0.5735   LearningRate 0.0010   Epoch: 18   Global Step: 74590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:50,873-Speed 3317.08 samples/sec   Loss 0.5873   LearningRate 0.0010   Epoch: 18   Global Step: 74600   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-26 20:10:53,973-Speed 3304.12 samples/sec   Loss 0.5519   LearningRate 0.0010   Epoch: 18   Global Step: 74610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:10:57,062-Speed 3315.71 samples/sec   Loss 0.5369   LearningRate 0.0010   Epoch: 18   Global Step: 74620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:11:00,132-Speed 3335.85 samples/sec   Loss 0.5590   LearningRate 0.0010   Epoch: 18   Global Step: 74630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:11:03,216-Speed 3321.89 samples/sec   Loss 0.5357   LearningRate 0.0009   Epoch: 18   Global Step: 74640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:11:06,297-Speed 3324.50 samples/sec   Loss 0.5753   LearningRate 0.0009   Epoch: 18   Global Step: 74650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:11:09,391-Speed 3310.28 samples/sec   Loss 0.5684   LearningRate 0.0009   Epoch: 18   Global Step: 74660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:11:12,478-Speed 3318.74 samples/sec   Loss 0.5565   LearningRate 0.0009   Epoch: 18   Global Step: 74670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:11:15,558-Speed 3325.46 samples/sec   Loss 0.5399   LearningRate 0.0009   Epoch: 18   Global Step: 74680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:11:18,639-Speed 3323.91 samples/sec   Loss 0.5664   LearningRate 0.0009   Epoch: 18   Global Step: 74690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:11:21,744-Speed 3298.90 samples/sec   Loss 0.5531   LearningRate 0.0009   Epoch: 18   Global Step: 74700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:11:24,852-Speed 3295.35 samples/sec   Loss 0.5483   LearningRate 0.0009   Epoch: 18   Global Step: 74710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:11:28,031-Speed 3221.85 samples/sec   Loss 0.5513   LearningRate 0.0009   Epoch: 18   Global Step: 74720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:11:31,116-Speed 3319.32 samples/sec   Loss 0.5746   LearningRate 0.0009   Epoch: 18   Global Step: 74730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:11:34,211-Speed 3309.31 samples/sec   Loss 0.5423   LearningRate 0.0009   Epoch: 18   Global Step: 74740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:11:37,293-Speed 3322.99 samples/sec   Loss 0.5535   LearningRate 0.0009   Epoch: 18   Global Step: 74750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:11:40,382-Speed 3316.19 samples/sec   Loss 0.5620   LearningRate 0.0009   Epoch: 18   Global Step: 74760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:11:43,471-Speed 3316.03 samples/sec   Loss 0.6080   LearningRate 0.0009   Epoch: 18   Global Step: 74770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:11:46,555-Speed 3321.04 samples/sec   Loss 0.5571   LearningRate 0.0009   Epoch: 18   Global Step: 74780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:11:49,646-Speed 3313.27 samples/sec   Loss 0.5618   LearningRate 0.0009   Epoch: 18   Global Step: 74790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:11:52,738-Speed 3312.28 samples/sec   Loss 0.5757   LearningRate 0.0009   Epoch: 18   Global Step: 74800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:11:55,823-Speed 3320.26 samples/sec   Loss 0.5893   LearningRate 0.0009   Epoch: 18   Global Step: 74810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:11:58,904-Speed 3323.71 samples/sec   Loss 0.5855   LearningRate 0.0009   Epoch: 18   Global Step: 74820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:01,991-Speed 3318.62 samples/sec   Loss 0.5542   LearningRate 0.0009   Epoch: 18   Global Step: 74830   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-26 20:12:05,063-Speed 3333.74 samples/sec   Loss 0.5559   LearningRate 0.0009   Epoch: 18   Global Step: 74840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:08,145-Speed 3323.58 samples/sec   Loss 0.5339   LearningRate 0.0009   Epoch: 18   Global Step: 74850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:11,234-Speed 3315.25 samples/sec   Loss 0.5306   LearningRate 0.0009   Epoch: 18   Global Step: 74860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:14,325-Speed 3314.59 samples/sec   Loss 0.5622   LearningRate 0.0009   Epoch: 18   Global Step: 74870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:17,408-Speed 3321.87 samples/sec   Loss 0.5600   LearningRate 0.0009   Epoch: 18   Global Step: 74880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:20,510-Speed 3301.24 samples/sec   Loss 0.5444   LearningRate 0.0009   Epoch: 18   Global Step: 74890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:23,599-Speed 3316.29 samples/sec   Loss 0.5455   LearningRate 0.0009   Epoch: 18   Global Step: 74900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:26,700-Speed 3302.54 samples/sec   Loss 0.5889   LearningRate 0.0009   Epoch: 18   Global Step: 74910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:29,789-Speed 3316.01 samples/sec   Loss 0.5807   LearningRate 0.0009   Epoch: 18   Global Step: 74920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:32,888-Speed 3304.30 samples/sec   Loss 0.5730   LearningRate 0.0009   Epoch: 18   Global Step: 74930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:35,960-Speed 3334.54 samples/sec   Loss 0.5779   LearningRate 0.0009   Epoch: 18   Global Step: 74940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:39,105-Speed 3256.56 samples/sec   Loss 0.6015   LearningRate 0.0009   Epoch: 18   Global Step: 74950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:42,202-Speed 3307.17 samples/sec   Loss 0.5328   LearningRate 0.0009   Epoch: 18   Global Step: 74960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:45,290-Speed 3317.02 samples/sec   Loss 0.5505   LearningRate 0.0009   Epoch: 18   Global Step: 74970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:48,376-Speed 3319.28 samples/sec   Loss 0.5687   LearningRate 0.0009   Epoch: 18   Global Step: 74980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:51,465-Speed 3315.43 samples/sec   Loss 0.5667   LearningRate 0.0009   Epoch: 18   Global Step: 74990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:54,590-Speed 3277.72 samples/sec   Loss 0.5652   LearningRate 0.0009   Epoch: 18   Global Step: 75000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:12:57,684-Speed 3310.24 samples/sec   Loss 0.5572   LearningRate 0.0009   Epoch: 18   Global Step: 75010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:13:00,808-Speed 3278.56 samples/sec   Loss 0.5735   LearningRate 0.0009   Epoch: 18   Global Step: 75020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:13:03,915-Speed 3296.79 samples/sec   Loss 0.5245   LearningRate 0.0009   Epoch: 18   Global Step: 75030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:13:07,005-Speed 3314.19 samples/sec   Loss 0.5809   LearningRate 0.0009   Epoch: 18   Global Step: 75040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:13:10,091-Speed 3319.79 samples/sec   Loss 0.5218   LearningRate 0.0009   Epoch: 18   Global Step: 75050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:13:13,174-Speed 3321.92 samples/sec   Loss 0.5815   LearningRate 0.0009   Epoch: 18   Global Step: 75060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:13:16,256-Speed 3323.03 samples/sec   Loss 0.5594   LearningRate 0.0009   Epoch: 18   Global Step: 75070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:13:19,342-Speed 3319.43 samples/sec   Loss 0.5406   LearningRate 0.0008   Epoch: 18   Global Step: 75080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:13:22,435-Speed 3311.07 samples/sec   Loss 0.5833   LearningRate 0.0008   Epoch: 18   Global Step: 75090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:13:25,515-Speed 3325.14 samples/sec   Loss 0.5526   LearningRate 0.0008   Epoch: 18   Global Step: 75100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:13:28,603-Speed 3317.42 samples/sec   Loss 0.5832   LearningRate 0.0008   Epoch: 18   Global Step: 75110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:13:31,684-Speed 3323.77 samples/sec   Loss 0.5604   LearningRate 0.0008   Epoch: 18   Global Step: 75120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:13:34,766-Speed 3323.54 samples/sec   Loss 0.5669   LearningRate 0.0008   Epoch: 18   Global Step: 75130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:13:37,851-Speed 3319.37 samples/sec   Loss 0.5675   LearningRate 0.0008   Epoch: 18   Global Step: 75140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:13:40,932-Speed 3324.81 samples/sec   Loss 0.5791   LearningRate 0.0008   Epoch: 18   Global Step: 75150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:13:44,011-Speed 3326.63 samples/sec   Loss 0.5698   LearningRate 0.0008   Epoch: 18   Global Step: 75160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:13:47,115-Speed 3299.81 samples/sec   Loss 0.6030   LearningRate 0.0008   Epoch: 18   Global Step: 75170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:13:50,202-Speed 3318.46 samples/sec   Loss 0.5552   LearningRate 0.0008   Epoch: 18   Global Step: 75180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:13:53,330-Speed 3274.19 samples/sec   Loss 0.5299   LearningRate 0.0008   Epoch: 18   Global Step: 75190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:13:56,418-Speed 3315.95 samples/sec   Loss 0.5689   LearningRate 0.0008   Epoch: 18   Global Step: 75200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:13:59,507-Speed 3316.27 samples/sec   Loss 0.5701   LearningRate 0.0008   Epoch: 18   Global Step: 75210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:14:02,576-Speed 3337.01 samples/sec   Loss 0.5617   LearningRate 0.0008   Epoch: 18   Global Step: 75220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:14:05,662-Speed 3318.70 samples/sec   Loss 0.5479   LearningRate 0.0008   Epoch: 18   Global Step: 75230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:14:08,747-Speed 3319.95 samples/sec   Loss 0.5786   LearningRate 0.0008   Epoch: 18   Global Step: 75240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:14:11,828-Speed 3324.30 samples/sec   Loss 0.5363   LearningRate 0.0008   Epoch: 18   Global Step: 75250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:14:14,921-Speed 3311.48 samples/sec   Loss 0.5843   LearningRate 0.0008   Epoch: 18   Global Step: 75260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:14:18,014-Speed 3311.57 samples/sec   Loss 0.5569   LearningRate 0.0008   Epoch: 18   Global Step: 75270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:14:21,096-Speed 3323.78 samples/sec   Loss 0.5867   LearningRate 0.0008   Epoch: 18   Global Step: 75280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:14:24,179-Speed 3321.90 samples/sec   Loss 0.5434   LearningRate 0.0008   Epoch: 18   Global Step: 75290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:14:27,259-Speed 3325.36 samples/sec   Loss 0.5630   LearningRate 0.0008   Epoch: 18   Global Step: 75300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:14:30,343-Speed 3321.20 samples/sec   Loss 0.5901   LearningRate 0.0008   Epoch: 18   Global Step: 75310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:14:33,430-Speed 3318.37 samples/sec   Loss 0.5762   LearningRate 0.0008   Epoch: 18   Global Step: 75320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:14:36,537-Speed 3296.78 samples/sec   Loss 0.5796   LearningRate 0.0008   Epoch: 18   Global Step: 75330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:14:39,622-Speed 3319.95 samples/sec   Loss 0.5893   LearningRate 0.0008   Epoch: 18   Global Step: 75340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:14:42,710-Speed 3316.77 samples/sec   Loss 0.5425   LearningRate 0.0008   Epoch: 18   Global Step: 75350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:14:45,781-Speed 3335.56 samples/sec   Loss 0.6193   LearningRate 0.0008   Epoch: 18   Global Step: 75360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:14:48,878-Speed 3306.94 samples/sec   Loss 0.5788   LearningRate 0.0008   Epoch: 18   Global Step: 75370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:14:52,002-Speed 3278.90 samples/sec   Loss 0.5509   LearningRate 0.0008   Epoch: 18   Global Step: 75380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:14:55,091-Speed 3315.21 samples/sec   Loss 0.5800   LearningRate 0.0008   Epoch: 18   Global Step: 75390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:14:58,179-Speed 3316.83 samples/sec   Loss 0.5739   LearningRate 0.0008   Epoch: 18   Global Step: 75400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:15:01,266-Speed 3317.99 samples/sec   Loss 0.5503   LearningRate 0.0008   Epoch: 18   Global Step: 75410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:15:04,364-Speed 3305.83 samples/sec   Loss 0.6163   LearningRate 0.0008   Epoch: 18   Global Step: 75420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:15:07,477-Speed 3290.36 samples/sec   Loss 0.5805   LearningRate 0.0008   Epoch: 18   Global Step: 75430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:15:10,570-Speed 3311.43 samples/sec   Loss 0.5633   LearningRate 0.0008   Epoch: 18   Global Step: 75440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:15:13,659-Speed 3316.22 samples/sec   Loss 0.5513   LearningRate 0.0008   Epoch: 18   Global Step: 75450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:15:16,744-Speed 3319.85 samples/sec   Loss 0.5749   LearningRate 0.0008   Epoch: 18   Global Step: 75460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:15:19,827-Speed 3322.03 samples/sec   Loss 0.5780   LearningRate 0.0008   Epoch: 18   Global Step: 75470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:15:22,914-Speed 3317.54 samples/sec   Loss 0.5557   LearningRate 0.0008   Epoch: 18   Global Step: 75480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:15:26,000-Speed 3319.55 samples/sec   Loss 0.5650   LearningRate 0.0008   Epoch: 18   Global Step: 75490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:15:29,083-Speed 3322.27 samples/sec   Loss 0.5914   LearningRate 0.0008   Epoch: 18   Global Step: 75500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:15:32,166-Speed 3321.29 samples/sec   Loss 0.5461   LearningRate 0.0008   Epoch: 18   Global Step: 75510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:15:35,257-Speed 3313.81 samples/sec   Loss 0.5374   LearningRate 0.0008   Epoch: 18   Global Step: 75520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:15:38,344-Speed 3318.50 samples/sec   Loss 0.5482   LearningRate 0.0008   Epoch: 18   Global Step: 75530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:15:41,433-Speed 3316.63 samples/sec   Loss 0.6081   LearningRate 0.0007   Epoch: 18   Global Step: 75540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:15:44,516-Speed 3321.68 samples/sec   Loss 0.5653   LearningRate 0.0007   Epoch: 18   Global Step: 75550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:15:47,586-Speed 3336.29 samples/sec   Loss 0.5690   LearningRate 0.0007   Epoch: 18   Global Step: 75560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:15:50,674-Speed 3316.84 samples/sec   Loss 0.5717   LearningRate 0.0007   Epoch: 18   Global Step: 75570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:15:53,763-Speed 3315.14 samples/sec   Loss 0.5832   LearningRate 0.0007   Epoch: 18   Global Step: 75580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:15:56,862-Speed 3305.79 samples/sec   Loss 0.5664   LearningRate 0.0007   Epoch: 18   Global Step: 75590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:15:59,958-Speed 3307.67 samples/sec   Loss 0.5591   LearningRate 0.0007   Epoch: 18   Global Step: 75600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:16:03,048-Speed 3315.36 samples/sec   Loss 0.5835   LearningRate 0.0007   Epoch: 18   Global Step: 75610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:16:06,136-Speed 3316.64 samples/sec   Loss 0.5667   LearningRate 0.0007   Epoch: 18   Global Step: 75620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:16:09,231-Speed 3309.05 samples/sec   Loss 0.5644   LearningRate 0.0007   Epoch: 18   Global Step: 75630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:16:12,319-Speed 3317.04 samples/sec   Loss 0.5784   LearningRate 0.0007   Epoch: 18   Global Step: 75640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:16:15,402-Speed 3322.14 samples/sec   Loss 0.5643   LearningRate 0.0007   Epoch: 18   Global Step: 75650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:16:18,489-Speed 3317.98 samples/sec   Loss 0.5655   LearningRate 0.0007   Epoch: 18   Global Step: 75660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:16:21,573-Speed 3320.75 samples/sec   Loss 0.5657   LearningRate 0.0007   Epoch: 18   Global Step: 75670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:16:24,668-Speed 3309.61 samples/sec   Loss 0.5833   LearningRate 0.0007   Epoch: 18   Global Step: 75680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:16:27,754-Speed 3318.77 samples/sec   Loss 0.5660   LearningRate 0.0007   Epoch: 18   Global Step: 75690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:16:30,841-Speed 3318.15 samples/sec   Loss 0.5794   LearningRate 0.0007   Epoch: 18   Global Step: 75700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:16:33,930-Speed 3316.42 samples/sec   Loss 0.6052   LearningRate 0.0007   Epoch: 18   Global Step: 75710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:16:37,017-Speed 3317.23 samples/sec   Loss 0.5815   LearningRate 0.0007   Epoch: 18   Global Step: 75720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:16:40,105-Speed 3316.65 samples/sec   Loss 0.5604   LearningRate 0.0007   Epoch: 18   Global Step: 75730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:16:43,195-Speed 3314.77 samples/sec   Loss 0.5943   LearningRate 0.0007   Epoch: 18   Global Step: 75740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:16:46,283-Speed 3317.66 samples/sec   Loss 0.5902   LearningRate 0.0007   Epoch: 18   Global Step: 75750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:16:49,353-Speed 3335.62 samples/sec   Loss 0.5678   LearningRate 0.0007   Epoch: 18   Global Step: 75760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:16:52,573-Speed 3180.59 samples/sec   Loss 0.5851   LearningRate 0.0007   Epoch: 18   Global Step: 75770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:16:55,655-Speed 3323.36 samples/sec   Loss 0.5776   LearningRate 0.0007   Epoch: 18   Global Step: 75780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:16:58,749-Speed 3310.83 samples/sec   Loss 0.5694   LearningRate 0.0007   Epoch: 18   Global Step: 75790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:17:01,834-Speed 3319.77 samples/sec   Loss 0.5939   LearningRate 0.0007   Epoch: 18   Global Step: 75800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:17:04,909-Speed 3330.52 samples/sec   Loss 0.5645   LearningRate 0.0007   Epoch: 18   Global Step: 75810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:17:07,998-Speed 3315.20 samples/sec   Loss 0.5804   LearningRate 0.0007   Epoch: 18   Global Step: 75820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:17:11,089-Speed 3314.47 samples/sec   Loss 0.5646   LearningRate 0.0007   Epoch: 18   Global Step: 75830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:17:14,177-Speed 3316.71 samples/sec   Loss 0.5780   LearningRate 0.0007   Epoch: 18   Global Step: 75840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:17:17,270-Speed 3312.00 samples/sec   Loss 0.5595   LearningRate 0.0007   Epoch: 18   Global Step: 75850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:17:20,362-Speed 3311.87 samples/sec   Loss 0.5765   LearningRate 0.0007   Epoch: 18   Global Step: 75860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:17:23,453-Speed 3313.57 samples/sec   Loss 0.5568   LearningRate 0.0007   Epoch: 18   Global Step: 75870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:17:26,551-Speed 3306.10 samples/sec   Loss 0.5937   LearningRate 0.0007   Epoch: 18   Global Step: 75880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:17:29,678-Speed 3275.35 samples/sec   Loss 0.5716   LearningRate 0.0007   Epoch: 18   Global Step: 75890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:17:32,764-Speed 3318.78 samples/sec   Loss 0.5728   LearningRate 0.0007   Epoch: 18   Global Step: 75900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:17:35,850-Speed 3319.80 samples/sec   Loss 0.5644   LearningRate 0.0007   Epoch: 18   Global Step: 75910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:17:38,944-Speed 3310.23 samples/sec   Loss 0.5952   LearningRate 0.0007   Epoch: 18   Global Step: 75920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:17:42,032-Speed 3316.62 samples/sec   Loss 0.5900   LearningRate 0.0007   Epoch: 18   Global Step: 75930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:17:45,114-Speed 3322.83 samples/sec   Loss 0.5957   LearningRate 0.0007   Epoch: 18   Global Step: 75940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:17:48,246-Speed 3270.62 samples/sec   Loss 0.5527   LearningRate 0.0007   Epoch: 18   Global Step: 75950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:17:51,384-Speed 3263.83 samples/sec   Loss 0.5887   LearningRate 0.0007   Epoch: 18   Global Step: 75960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:17:54,520-Speed 3265.92 samples/sec   Loss 0.5776   LearningRate 0.0007   Epoch: 18   Global Step: 75970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:17:57,608-Speed 3317.46 samples/sec   Loss 0.5895   LearningRate 0.0007   Epoch: 18   Global Step: 75980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:18:00,693-Speed 3319.40 samples/sec   Loss 0.5717   LearningRate 0.0007   Epoch: 18   Global Step: 75990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:18:03,778-Speed 3320.60 samples/sec   Loss 0.5649   LearningRate 0.0007   Epoch: 18   Global Step: 76000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:18:47,320-[lfw][76000]XNorm: 20.971660
Training: 2022-04-26 20:18:47,320-[lfw][76000]Accuracy-Flip: 0.99833+-0.00224
Training: 2022-04-26 20:18:47,321-[lfw][76000]Accuracy-Highest: 0.99850
Training: 2022-04-26 20:19:38,142-[cfp_fp][76000]XNorm: 21.790617
Training: 2022-04-26 20:19:38,142-[cfp_fp][76000]Accuracy-Flip: 0.99357+-0.00448
Training: 2022-04-26 20:19:38,143-[cfp_fp][76000]Accuracy-Highest: 0.99357
Training: 2022-04-26 20:20:21,843-[agedb_30][76000]XNorm: 21.695545
Training: 2022-04-26 20:20:21,844-[agedb_30][76000]Accuracy-Flip: 0.97767+-0.00646
Training: 2022-04-26 20:20:21,844-[agedb_30][76000]Accuracy-Highest: 0.97950
Training: 2022-04-26 20:20:24,923-Speed 72.55 samples/sec   Loss 0.5999   LearningRate 0.0007   Epoch: 18   Global Step: 76010   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-26 20:20:27,984-Speed 3345.58 samples/sec   Loss 0.5789   LearningRate 0.0007   Epoch: 18   Global Step: 76020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:20:31,068-Speed 3321.27 samples/sec   Loss 0.5706   LearningRate 0.0007   Epoch: 18   Global Step: 76030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:20:34,150-Speed 3323.56 samples/sec   Loss 0.5768   LearningRate 0.0006   Epoch: 18   Global Step: 76040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:20:37,229-Speed 3325.84 samples/sec   Loss 0.6011   LearningRate 0.0006   Epoch: 18   Global Step: 76050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:20:40,324-Speed 3309.66 samples/sec   Loss 0.5801   LearningRate 0.0006   Epoch: 18   Global Step: 76060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:20:43,412-Speed 3317.82 samples/sec   Loss 0.5754   LearningRate 0.0006   Epoch: 18   Global Step: 76070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:20:46,497-Speed 3319.62 samples/sec   Loss 0.5916   LearningRate 0.0006   Epoch: 18   Global Step: 76080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:20:49,585-Speed 3316.77 samples/sec   Loss 0.5748   LearningRate 0.0006   Epoch: 18   Global Step: 76090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:20:52,671-Speed 3319.23 samples/sec   Loss 0.5566   LearningRate 0.0006   Epoch: 18   Global Step: 76100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:20:55,752-Speed 3323.54 samples/sec   Loss 0.5685   LearningRate 0.0006   Epoch: 18   Global Step: 76110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:20:58,831-Speed 3327.35 samples/sec   Loss 0.5971   LearningRate 0.0006   Epoch: 18   Global Step: 76120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:21:01,919-Speed 3316.46 samples/sec   Loss 0.5911   LearningRate 0.0006   Epoch: 18   Global Step: 76130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:21:04,984-Speed 3341.06 samples/sec   Loss 0.5564   LearningRate 0.0006   Epoch: 18   Global Step: 76140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:08,066-Speed 3323.72 samples/sec   Loss 0.5730   LearningRate 0.0006   Epoch: 18   Global Step: 76150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:11,153-Speed 3317.73 samples/sec   Loss 0.5681   LearningRate 0.0006   Epoch: 18   Global Step: 76160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:14,239-Speed 3319.37 samples/sec   Loss 0.5539   LearningRate 0.0006   Epoch: 18   Global Step: 76170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:17,321-Speed 3322.81 samples/sec   Loss 0.5650   LearningRate 0.0006   Epoch: 18   Global Step: 76180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:20,399-Speed 3327.72 samples/sec   Loss 0.5590   LearningRate 0.0006   Epoch: 18   Global Step: 76190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:23,503-Speed 3299.18 samples/sec   Loss 0.5769   LearningRate 0.0006   Epoch: 18   Global Step: 76200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:26,585-Speed 3323.33 samples/sec   Loss 0.5546   LearningRate 0.0006   Epoch: 18   Global Step: 76210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:29,661-Speed 3330.36 samples/sec   Loss 0.5714   LearningRate 0.0006   Epoch: 18   Global Step: 76220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:32,744-Speed 3321.51 samples/sec   Loss 0.5705   LearningRate 0.0006   Epoch: 18   Global Step: 76230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:35,831-Speed 3317.94 samples/sec   Loss 0.5873   LearningRate 0.0006   Epoch: 18   Global Step: 76240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:21:38,902-Speed 3335.55 samples/sec   Loss 0.5816   LearningRate 0.0006   Epoch: 18   Global Step: 76250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:41,991-Speed 3316.12 samples/sec   Loss 0.5779   LearningRate 0.0006   Epoch: 18   Global Step: 76260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:45,076-Speed 3319.59 samples/sec   Loss 0.5927   LearningRate 0.0006   Epoch: 18   Global Step: 76270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:48,159-Speed 3322.49 samples/sec   Loss 0.5776   LearningRate 0.0006   Epoch: 18   Global Step: 76280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:51,257-Speed 3306.02 samples/sec   Loss 0.5766   LearningRate 0.0006   Epoch: 18   Global Step: 76290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:54,340-Speed 3322.09 samples/sec   Loss 0.5946   LearningRate 0.0006   Epoch: 18   Global Step: 76300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:21:57,419-Speed 3326.37 samples/sec   Loss 0.5587   LearningRate 0.0006   Epoch: 18   Global Step: 76310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:22:00,499-Speed 3325.67 samples/sec   Loss 0.5600   LearningRate 0.0006   Epoch: 18   Global Step: 76320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:22:03,581-Speed 3322.88 samples/sec   Loss 0.5860   LearningRate 0.0006   Epoch: 18   Global Step: 76330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:22:06,659-Speed 3327.28 samples/sec   Loss 0.5855   LearningRate 0.0006   Epoch: 18   Global Step: 76340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:22:09,737-Speed 3327.67 samples/sec   Loss 0.5695   LearningRate 0.0006   Epoch: 18   Global Step: 76350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:22:12,827-Speed 3314.67 samples/sec   Loss 0.5860   LearningRate 0.0006   Epoch: 18   Global Step: 76360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:22:15,892-Speed 3342.54 samples/sec   Loss 0.5937   LearningRate 0.0006   Epoch: 18   Global Step: 76370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:22:18,967-Speed 3330.95 samples/sec   Loss 0.5747   LearningRate 0.0006   Epoch: 18   Global Step: 76380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:22:22,045-Speed 3327.20 samples/sec   Loss 0.5794   LearningRate 0.0006   Epoch: 18   Global Step: 76390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:22:25,122-Speed 3328.03 samples/sec   Loss 0.5738   LearningRate 0.0006   Epoch: 18   Global Step: 76400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:22:28,205-Speed 3322.60 samples/sec   Loss 0.5873   LearningRate 0.0006   Epoch: 18   Global Step: 76410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:22:31,350-Speed 3256.54 samples/sec   Loss 0.5646   LearningRate 0.0006   Epoch: 18   Global Step: 76420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:22:34,436-Speed 3318.84 samples/sec   Loss 0.5652   LearningRate 0.0006   Epoch: 18   Global Step: 76430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:22:37,518-Speed 3323.11 samples/sec   Loss 0.5512   LearningRate 0.0006   Epoch: 18   Global Step: 76440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:22:40,600-Speed 3323.09 samples/sec   Loss 0.5937   LearningRate 0.0006   Epoch: 18   Global Step: 76450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:22:43,685-Speed 3320.56 samples/sec   Loss 0.5775   LearningRate 0.0006   Epoch: 18   Global Step: 76460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:22:46,768-Speed 3322.30 samples/sec   Loss 0.5540   LearningRate 0.0006   Epoch: 18   Global Step: 76470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:22:49,858-Speed 3315.00 samples/sec   Loss 0.5897   LearningRate 0.0006   Epoch: 18   Global Step: 76480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:22:52,967-Speed 3294.10 samples/sec   Loss 0.5477   LearningRate 0.0006   Epoch: 18   Global Step: 76490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:22:56,045-Speed 3327.85 samples/sec   Loss 0.5716   LearningRate 0.0006   Epoch: 18   Global Step: 76500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:22:59,119-Speed 3331.99 samples/sec   Loss 0.5936   LearningRate 0.0006   Epoch: 18   Global Step: 76510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:23:02,191-Speed 3333.49 samples/sec   Loss 0.5707   LearningRate 0.0006   Epoch: 18   Global Step: 76520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:23:05,265-Speed 3331.91 samples/sec   Loss 0.5921   LearningRate 0.0006   Epoch: 18   Global Step: 76530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:23:08,342-Speed 3328.10 samples/sec   Loss 0.6036   LearningRate 0.0006   Epoch: 18   Global Step: 76540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:23:11,419-Speed 3328.84 samples/sec   Loss 0.5986   LearningRate 0.0006   Epoch: 18   Global Step: 76550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:23:14,483-Speed 3342.96 samples/sec   Loss 0.5696   LearningRate 0.0006   Epoch: 18   Global Step: 76560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:23:17,559-Speed 3329.70 samples/sec   Loss 0.5519   LearningRate 0.0005   Epoch: 18   Global Step: 76570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:23:20,634-Speed 3330.80 samples/sec   Loss 0.5691   LearningRate 0.0005   Epoch: 18   Global Step: 76580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:23:23,716-Speed 3323.93 samples/sec   Loss 0.5829   LearningRate 0.0005   Epoch: 18   Global Step: 76590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:23:26,800-Speed 3320.15 samples/sec   Loss 0.6100   LearningRate 0.0005   Epoch: 18   Global Step: 76600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:23:29,876-Speed 3330.02 samples/sec   Loss 0.5699   LearningRate 0.0005   Epoch: 18   Global Step: 76610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:23:32,952-Speed 3329.79 samples/sec   Loss 0.5866   LearningRate 0.0005   Epoch: 18   Global Step: 76620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:23:36,031-Speed 3327.26 samples/sec   Loss 0.5802   LearningRate 0.0005   Epoch: 18   Global Step: 76630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:23:39,109-Speed 3327.31 samples/sec   Loss 0.5588   LearningRate 0.0005   Epoch: 18   Global Step: 76640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:23:42,197-Speed 3316.54 samples/sec   Loss 0.5573   LearningRate 0.0005   Epoch: 18   Global Step: 76650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:23:45,277-Speed 3325.96 samples/sec   Loss 0.5653   LearningRate 0.0005   Epoch: 18   Global Step: 76660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:23:48,346-Speed 3337.66 samples/sec   Loss 0.5561   LearningRate 0.0005   Epoch: 18   Global Step: 76670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:23:51,434-Speed 3316.75 samples/sec   Loss 0.5637   LearningRate 0.0005   Epoch: 18   Global Step: 76680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:23:54,516-Speed 3324.29 samples/sec   Loss 0.5644   LearningRate 0.0005   Epoch: 18   Global Step: 76690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:23:57,594-Speed 3326.70 samples/sec   Loss 0.6038   LearningRate 0.0005   Epoch: 18   Global Step: 76700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:00,677-Speed 3322.63 samples/sec   Loss 0.5920   LearningRate 0.0005   Epoch: 18   Global Step: 76710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:03,754-Speed 3328.59 samples/sec   Loss 0.5791   LearningRate 0.0005   Epoch: 18   Global Step: 76720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:06,830-Speed 3330.02 samples/sec   Loss 0.5827   LearningRate 0.0005   Epoch: 18   Global Step: 76730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:09,917-Speed 3317.40 samples/sec   Loss 0.5791   LearningRate 0.0005   Epoch: 18   Global Step: 76740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:13,006-Speed 3315.97 samples/sec   Loss 0.5879   LearningRate 0.0005   Epoch: 18   Global Step: 76750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:16,088-Speed 3323.00 samples/sec   Loss 0.5882   LearningRate 0.0005   Epoch: 18   Global Step: 76760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:19,166-Speed 3328.14 samples/sec   Loss 0.5709   LearningRate 0.0005   Epoch: 18   Global Step: 76770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:24:22,246-Speed 3325.47 samples/sec   Loss 0.6020   LearningRate 0.0005   Epoch: 18   Global Step: 76780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:24:25,327-Speed 3324.06 samples/sec   Loss 0.5819   LearningRate 0.0005   Epoch: 18   Global Step: 76790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:24:28,410-Speed 3322.39 samples/sec   Loss 0.5760   LearningRate 0.0005   Epoch: 18   Global Step: 76800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:24:31,486-Speed 3329.51 samples/sec   Loss 0.5762   LearningRate 0.0005   Epoch: 18   Global Step: 76810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:24:34,550-Speed 3342.71 samples/sec   Loss 0.5622   LearningRate 0.0005   Epoch: 18   Global Step: 76820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:37,631-Speed 3324.72 samples/sec   Loss 0.5834   LearningRate 0.0005   Epoch: 18   Global Step: 76830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:40,727-Speed 3307.70 samples/sec   Loss 0.5842   LearningRate 0.0005   Epoch: 18   Global Step: 76840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:43,811-Speed 3322.05 samples/sec   Loss 0.5622   LearningRate 0.0005   Epoch: 18   Global Step: 76850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:46,890-Speed 3325.84 samples/sec   Loss 0.5618   LearningRate 0.0005   Epoch: 18   Global Step: 76860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:49,969-Speed 3326.78 samples/sec   Loss 0.5640   LearningRate 0.0005   Epoch: 18   Global Step: 76870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:53,067-Speed 3306.59 samples/sec   Loss 0.5804   LearningRate 0.0005   Epoch: 18   Global Step: 76880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:56,151-Speed 3320.09 samples/sec   Loss 0.5853   LearningRate 0.0005   Epoch: 18   Global Step: 76890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:24:59,236-Speed 3320.35 samples/sec   Loss 0.5392   LearningRate 0.0005   Epoch: 18   Global Step: 76900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:25:02,338-Speed 3301.99 samples/sec   Loss 0.5941   LearningRate 0.0005   Epoch: 18   Global Step: 76910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:25:05,431-Speed 3311.86 samples/sec   Loss 0.5964   LearningRate 0.0005   Epoch: 18   Global Step: 76920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:08,510-Speed 3326.46 samples/sec   Loss 0.5748   LearningRate 0.0005   Epoch: 18   Global Step: 76930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:11,597-Speed 3317.62 samples/sec   Loss 0.6183   LearningRate 0.0005   Epoch: 18   Global Step: 76940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:14,689-Speed 3312.82 samples/sec   Loss 0.5856   LearningRate 0.0005   Epoch: 18   Global Step: 76950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:17,783-Speed 3310.12 samples/sec   Loss 0.5677   LearningRate 0.0005   Epoch: 18   Global Step: 76960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:20,867-Speed 3320.76 samples/sec   Loss 0.5659   LearningRate 0.0005   Epoch: 18   Global Step: 76970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:23,947-Speed 3325.76 samples/sec   Loss 0.5739   LearningRate 0.0005   Epoch: 18   Global Step: 76980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:27,035-Speed 3316.67 samples/sec   Loss 0.5657   LearningRate 0.0005   Epoch: 18   Global Step: 76990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:30,122-Speed 3317.63 samples/sec   Loss 0.5664   LearningRate 0.0005   Epoch: 18   Global Step: 77000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:33,215-Speed 3311.85 samples/sec   Loss 0.5689   LearningRate 0.0005   Epoch: 18   Global Step: 77010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:36,313-Speed 3305.53 samples/sec   Loss 0.5814   LearningRate 0.0005   Epoch: 18   Global Step: 77020   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-26 20:25:39,381-Speed 3338.49 samples/sec   Loss 0.5657   LearningRate 0.0005   Epoch: 18   Global Step: 77030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:42,462-Speed 3324.56 samples/sec   Loss 0.6002   LearningRate 0.0005   Epoch: 18   Global Step: 77040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:45,543-Speed 3325.00 samples/sec   Loss 0.5925   LearningRate 0.0005   Epoch: 18   Global Step: 77050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:48,629-Speed 3318.61 samples/sec   Loss 0.5698   LearningRate 0.0005   Epoch: 18   Global Step: 77060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:51,709-Speed 3325.35 samples/sec   Loss 0.5607   LearningRate 0.0005   Epoch: 18   Global Step: 77070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:54,801-Speed 3312.48 samples/sec   Loss 0.5512   LearningRate 0.0005   Epoch: 18   Global Step: 77080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:25:57,883-Speed 3323.21 samples/sec   Loss 0.5678   LearningRate 0.0005   Epoch: 18   Global Step: 77090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:26:00,971-Speed 3316.52 samples/sec   Loss 0.5592   LearningRate 0.0005   Epoch: 18   Global Step: 77100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:26:04,065-Speed 3310.99 samples/sec   Loss 0.5464   LearningRate 0.0005   Epoch: 18   Global Step: 77110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:26:07,158-Speed 3311.53 samples/sec   Loss 0.5868   LearningRate 0.0005   Epoch: 18   Global Step: 77120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:26:10,218-Speed 3347.52 samples/sec   Loss 0.5767   LearningRate 0.0005   Epoch: 18   Global Step: 77130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:26:13,314-Speed 3307.86 samples/sec   Loss 0.5537   LearningRate 0.0005   Epoch: 18   Global Step: 77140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:26:16,403-Speed 3316.32 samples/sec   Loss 0.5770   LearningRate 0.0005   Epoch: 18   Global Step: 77150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:26:19,485-Speed 3322.93 samples/sec   Loss 0.5756   LearningRate 0.0004   Epoch: 18   Global Step: 77160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:26:22,566-Speed 3323.86 samples/sec   Loss 0.5811   LearningRate 0.0004   Epoch: 18   Global Step: 77170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:26:25,652-Speed 3319.66 samples/sec   Loss 0.5646   LearningRate 0.0004   Epoch: 18   Global Step: 77180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:26:28,732-Speed 3324.76 samples/sec   Loss 0.5571   LearningRate 0.0004   Epoch: 18   Global Step: 77190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:26:31,819-Speed 3317.90 samples/sec   Loss 0.5694   LearningRate 0.0004   Epoch: 18   Global Step: 77200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:26:34,901-Speed 3322.92 samples/sec   Loss 0.5685   LearningRate 0.0004   Epoch: 18   Global Step: 77210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:26:37,989-Speed 3317.06 samples/sec   Loss 0.5768   LearningRate 0.0004   Epoch: 18   Global Step: 77220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:26:41,081-Speed 3312.34 samples/sec   Loss 0.5825   LearningRate 0.0004   Epoch: 18   Global Step: 77230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:26:44,163-Speed 3323.82 samples/sec   Loss 0.5660   LearningRate 0.0004   Epoch: 18   Global Step: 77240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:26:47,253-Speed 3314.27 samples/sec   Loss 0.5949   LearningRate 0.0004   Epoch: 18   Global Step: 77250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:26:50,341-Speed 3317.38 samples/sec   Loss 0.5927   LearningRate 0.0004   Epoch: 18   Global Step: 77260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:26:53,428-Speed 3317.99 samples/sec   Loss 0.5900   LearningRate 0.0004   Epoch: 18   Global Step: 77270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:26:56,510-Speed 3323.21 samples/sec   Loss 0.5873   LearningRate 0.0004   Epoch: 18   Global Step: 77280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:26:59,607-Speed 3306.28 samples/sec   Loss 0.5688   LearningRate 0.0004   Epoch: 18   Global Step: 77290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:02,698-Speed 3313.61 samples/sec   Loss 0.6032   LearningRate 0.0004   Epoch: 18   Global Step: 77300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:05,784-Speed 3318.66 samples/sec   Loss 0.5647   LearningRate 0.0004   Epoch: 18   Global Step: 77310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:08,871-Speed 3318.73 samples/sec   Loss 0.5689   LearningRate 0.0004   Epoch: 18   Global Step: 77320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:11,956-Speed 3320.48 samples/sec   Loss 0.5787   LearningRate 0.0004   Epoch: 18   Global Step: 77330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:15,044-Speed 3316.99 samples/sec   Loss 0.5853   LearningRate 0.0004   Epoch: 18   Global Step: 77340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:18,131-Speed 3317.63 samples/sec   Loss 0.5662   LearningRate 0.0004   Epoch: 18   Global Step: 77350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:21,214-Speed 3321.22 samples/sec   Loss 0.5800   LearningRate 0.0004   Epoch: 18   Global Step: 77360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:24,305-Speed 3314.87 samples/sec   Loss 0.5978   LearningRate 0.0004   Epoch: 18   Global Step: 77370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:27,389-Speed 3320.25 samples/sec   Loss 0.5643   LearningRate 0.0004   Epoch: 18   Global Step: 77380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:30,480-Speed 3313.96 samples/sec   Loss 0.5801   LearningRate 0.0004   Epoch: 18   Global Step: 77390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:33,560-Speed 3324.73 samples/sec   Loss 0.5812   LearningRate 0.0004   Epoch: 18   Global Step: 77400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:36,643-Speed 3322.44 samples/sec   Loss 0.5723   LearningRate 0.0004   Epoch: 18   Global Step: 77410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:39,730-Speed 3318.31 samples/sec   Loss 0.5713   LearningRate 0.0004   Epoch: 18   Global Step: 77420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:42,812-Speed 3324.08 samples/sec   Loss 0.6147   LearningRate 0.0004   Epoch: 18   Global Step: 77430   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-26 20:27:45,881-Speed 3337.07 samples/sec   Loss 0.5685   LearningRate 0.0004   Epoch: 18   Global Step: 77440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:48,971-Speed 3314.20 samples/sec   Loss 0.5792   LearningRate 0.0004   Epoch: 18   Global Step: 77450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:52,060-Speed 3315.37 samples/sec   Loss 0.5635   LearningRate 0.0004   Epoch: 18   Global Step: 77460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:55,142-Speed 3323.23 samples/sec   Loss 0.5611   LearningRate 0.0004   Epoch: 18   Global Step: 77470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:27:58,226-Speed 3321.67 samples/sec   Loss 0.5780   LearningRate 0.0004   Epoch: 18   Global Step: 77480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:28:01,315-Speed 3314.86 samples/sec   Loss 0.5510   LearningRate 0.0004   Epoch: 18   Global Step: 77490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:28:04,399-Speed 3321.47 samples/sec   Loss 0.5951   LearningRate 0.0004   Epoch: 18   Global Step: 77500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:28:07,487-Speed 3316.70 samples/sec   Loss 0.5476   LearningRate 0.0004   Epoch: 18   Global Step: 77510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:28:10,570-Speed 3323.32 samples/sec   Loss 0.5789   LearningRate 0.0004   Epoch: 18   Global Step: 77520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:28:13,865-Speed 3107.47 samples/sec   Loss 0.5394   LearningRate 0.0004   Epoch: 18   Global Step: 77530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:28:16,949-Speed 3321.00 samples/sec   Loss 0.5550   LearningRate 0.0004   Epoch: 18   Global Step: 77540   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-26 20:28:20,021-Speed 3334.91 samples/sec   Loss 0.5743   LearningRate 0.0004   Epoch: 18   Global Step: 77550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:28:23,108-Speed 3317.70 samples/sec   Loss 0.5647   LearningRate 0.0004   Epoch: 18   Global Step: 77560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:28:26,191-Speed 3321.88 samples/sec   Loss 0.5901   LearningRate 0.0004   Epoch: 18   Global Step: 77570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:28:29,276-Speed 3320.28 samples/sec   Loss 0.5781   LearningRate 0.0004   Epoch: 18   Global Step: 77580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:28:32,342-Speed 3339.63 samples/sec   Loss 0.6001   LearningRate 0.0004   Epoch: 18   Global Step: 77590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:28:35,424-Speed 3323.59 samples/sec   Loss 0.5495   LearningRate 0.0004   Epoch: 18   Global Step: 77600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:28:38,521-Speed 3307.69 samples/sec   Loss 0.5681   LearningRate 0.0004   Epoch: 18   Global Step: 77610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:28:41,609-Speed 3317.13 samples/sec   Loss 0.5951   LearningRate 0.0004   Epoch: 18   Global Step: 77620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:28:44,695-Speed 3318.82 samples/sec   Loss 0.5689   LearningRate 0.0004   Epoch: 18   Global Step: 77630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:28:47,781-Speed 3319.17 samples/sec   Loss 0.6078   LearningRate 0.0004   Epoch: 18   Global Step: 77640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:28:50,868-Speed 3317.50 samples/sec   Loss 0.5786   LearningRate 0.0004   Epoch: 18   Global Step: 77650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:28:53,981-Speed 3289.80 samples/sec   Loss 0.5920   LearningRate 0.0004   Epoch: 18   Global Step: 77660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:28:57,073-Speed 3313.03 samples/sec   Loss 0.5876   LearningRate 0.0004   Epoch: 18   Global Step: 77670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:29:00,226-Speed 3248.10 samples/sec   Loss 0.5743   LearningRate 0.0004   Epoch: 18   Global Step: 77680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:29:03,317-Speed 3313.32 samples/sec   Loss 0.5825   LearningRate 0.0004   Epoch: 18   Global Step: 77690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:29:06,409-Speed 3312.37 samples/sec   Loss 0.5826   LearningRate 0.0004   Epoch: 18   Global Step: 77700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:29:09,504-Speed 3310.11 samples/sec   Loss 0.5574   LearningRate 0.0004   Epoch: 18   Global Step: 77710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:29:12,591-Speed 3317.72 samples/sec   Loss 0.5870   LearningRate 0.0004   Epoch: 18   Global Step: 77720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:29:15,678-Speed 3317.92 samples/sec   Loss 0.5591   LearningRate 0.0004   Epoch: 18   Global Step: 77730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:29:18,763-Speed 3320.46 samples/sec   Loss 0.5498   LearningRate 0.0004   Epoch: 18   Global Step: 77740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:29:21,847-Speed 3320.05 samples/sec   Loss 0.5534   LearningRate 0.0004   Epoch: 18   Global Step: 77750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:29:24,951-Speed 3299.94 samples/sec   Loss 0.5855   LearningRate 0.0004   Epoch: 18   Global Step: 77760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:29:28,041-Speed 3314.62 samples/sec   Loss 0.6102   LearningRate 0.0004   Epoch: 18   Global Step: 77770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:29:31,133-Speed 3312.65 samples/sec   Loss 0.5604   LearningRate 0.0004   Epoch: 18   Global Step: 77780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:29:34,218-Speed 3320.16 samples/sec   Loss 0.6154   LearningRate 0.0004   Epoch: 18   Global Step: 77790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:29:37,308-Speed 3314.52 samples/sec   Loss 0.5678   LearningRate 0.0004   Epoch: 18   Global Step: 77800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:29:40,396-Speed 3317.68 samples/sec   Loss 0.5709   LearningRate 0.0003   Epoch: 18   Global Step: 77810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:29:43,479-Speed 3321.73 samples/sec   Loss 0.5809   LearningRate 0.0003   Epoch: 18   Global Step: 77820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:29:46,570-Speed 3314.15 samples/sec   Loss 0.5563   LearningRate 0.0003   Epoch: 18   Global Step: 77830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:29:49,662-Speed 3312.24 samples/sec   Loss 0.5967   LearningRate 0.0003   Epoch: 18   Global Step: 77840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:29:52,766-Speed 3298.91 samples/sec   Loss 0.5872   LearningRate 0.0003   Epoch: 18   Global Step: 77850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:29:55,851-Speed 3320.22 samples/sec   Loss 0.5728   LearningRate 0.0003   Epoch: 18   Global Step: 77860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:29:58,938-Speed 3317.83 samples/sec   Loss 0.5981   LearningRate 0.0003   Epoch: 18   Global Step: 77870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-26 20:30:02,008-Speed 3336.78 samples/sec   Loss 0.5924   LearningRate 0.0003   Epoch: 18   Global Step: 77880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:30:05,094-Speed 3318.34 samples/sec   Loss 0.5618   LearningRate 0.0003   Epoch: 18   Global Step: 77890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:30:08,189-Speed 3309.87 samples/sec   Loss 0.5886   LearningRate 0.0003   Epoch: 18   Global Step: 77900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:30:11,279-Speed 3314.49 samples/sec   Loss 0.6030   LearningRate 0.0003   Epoch: 18   Global Step: 77910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:30:14,365-Speed 3319.38 samples/sec   Loss 0.5579   LearningRate 0.0003   Epoch: 18   Global Step: 77920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:30:17,448-Speed 3321.60 samples/sec   Loss 0.5579   LearningRate 0.0003   Epoch: 18   Global Step: 77930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:30:20,533-Speed 3320.47 samples/sec   Loss 0.5656   LearningRate 0.0003   Epoch: 18   Global Step: 77940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:30:23,621-Speed 3315.89 samples/sec   Loss 0.5685   LearningRate 0.0003   Epoch: 18   Global Step: 77950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:30:26,708-Speed 3317.81 samples/sec   Loss 0.5830   LearningRate 0.0003   Epoch: 18   Global Step: 77960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:30:29,806-Speed 3306.45 samples/sec   Loss 0.5799   LearningRate 0.0003   Epoch: 18   Global Step: 77970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-26 20:30:32,891-Speed 3319.77 samples/sec   Loss 0.5728   LearningRate 0.0003   Epoch: 18   Global Step: 77980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:30:35,987-Speed 3307.94 samples/sec   Loss 0.5704   LearningRate 0.0003   Epoch: 18   Global Step: 77990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:30:39,080-Speed 3312.47 samples/sec   Loss 0.5836   LearningRate 0.0003   Epoch: 18   Global Step: 78000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:31:22,796-[lfw][78000]XNorm: 21.118087
Training: 2022-04-26 20:31:22,797-[lfw][78000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-26 20:31:22,797-[lfw][78000]Accuracy-Highest: 0.99850
Training: 2022-04-26 20:32:13,435-[cfp_fp][78000]XNorm: 21.934907
Training: 2022-04-26 20:32:13,436-[cfp_fp][78000]Accuracy-Flip: 0.99257+-0.00498
Training: 2022-04-26 20:32:13,437-[cfp_fp][78000]Accuracy-Highest: 0.99357
Training: 2022-04-26 20:32:56,869-[agedb_30][78000]XNorm: 21.847820
Training: 2022-04-26 20:32:56,869-[agedb_30][78000]Accuracy-Flip: 0.97733+-0.00676
Training: 2022-04-26 20:32:56,870-[agedb_30][78000]Accuracy-Highest: 0.97950
Training: 2022-04-26 20:32:59,959-Speed 72.69 samples/sec   Loss 0.5667   LearningRate 0.0003   Epoch: 18   Global Step: 78010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:33:03,055-Speed 3307.94 samples/sec   Loss 0.5720   LearningRate 0.0003   Epoch: 18   Global Step: 78020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:33:06,174-Speed 3283.71 samples/sec   Loss 0.6048   LearningRate 0.0003   Epoch: 18   Global Step: 78030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:33:09,252-Speed 3327.16 samples/sec   Loss 0.5737   LearningRate 0.0003   Epoch: 18   Global Step: 78040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:33:12,335-Speed 3322.31 samples/sec   Loss 0.5738   LearningRate 0.0003   Epoch: 18   Global Step: 78050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:33:15,419-Speed 3321.04 samples/sec   Loss 0.5651   LearningRate 0.0003   Epoch: 18   Global Step: 78060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:33:18,501-Speed 3323.68 samples/sec   Loss 0.5323   LearningRate 0.0003   Epoch: 18   Global Step: 78070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:33:21,596-Speed 3309.04 samples/sec   Loss 0.5624   LearningRate 0.0003   Epoch: 18   Global Step: 78080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:33:24,681-Speed 3319.39 samples/sec   Loss 0.5817   LearningRate 0.0003   Epoch: 18   Global Step: 78090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:33:27,777-Speed 3308.56 samples/sec   Loss 0.5898   LearningRate 0.0003   Epoch: 18   Global Step: 78100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:33:30,866-Speed 3315.43 samples/sec   Loss 0.5572   LearningRate 0.0003   Epoch: 18   Global Step: 78110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:33:33,957-Speed 3314.48 samples/sec   Loss 0.5663   LearningRate 0.0003   Epoch: 18   Global Step: 78120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:33:37,036-Speed 3326.23 samples/sec   Loss 0.5840   LearningRate 0.0003   Epoch: 18   Global Step: 78130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:33:40,123-Speed 3318.19 samples/sec   Loss 0.5939   LearningRate 0.0003   Epoch: 18   Global Step: 78140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:33:43,210-Speed 3318.06 samples/sec   Loss 0.5857   LearningRate 0.0003   Epoch: 18   Global Step: 78150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:33:46,300-Speed 3314.56 samples/sec   Loss 0.5969   LearningRate 0.0003   Epoch: 18   Global Step: 78160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:33:49,392-Speed 3312.61 samples/sec   Loss 0.5732   LearningRate 0.0003   Epoch: 18   Global Step: 78170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:33:52,478-Speed 3318.25 samples/sec   Loss 0.5582   LearningRate 0.0003   Epoch: 18   Global Step: 78180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:33:55,564-Speed 3318.89 samples/sec   Loss 0.6085   LearningRate 0.0003   Epoch: 18   Global Step: 78190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:33:58,663-Speed 3304.81 samples/sec   Loss 0.5731   LearningRate 0.0003   Epoch: 18   Global Step: 78200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:34:01,747-Speed 3321.57 samples/sec   Loss 0.5520   LearningRate 0.0003   Epoch: 18   Global Step: 78210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:34:04,830-Speed 3322.05 samples/sec   Loss 0.5660   LearningRate 0.0003   Epoch: 18   Global Step: 78220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:34:07,912-Speed 3323.32 samples/sec   Loss 0.5409   LearningRate 0.0003   Epoch: 18   Global Step: 78230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:10,994-Speed 3323.48 samples/sec   Loss 0.5623   LearningRate 0.0003   Epoch: 18   Global Step: 78240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:14,096-Speed 3302.13 samples/sec   Loss 0.5692   LearningRate 0.0003   Epoch: 18   Global Step: 78250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:17,181-Speed 3319.64 samples/sec   Loss 0.6212   LearningRate 0.0003   Epoch: 18   Global Step: 78260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:20,257-Speed 3329.16 samples/sec   Loss 0.5947   LearningRate 0.0003   Epoch: 18   Global Step: 78270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:23,340-Speed 3322.98 samples/sec   Loss 0.5648   LearningRate 0.0003   Epoch: 18   Global Step: 78280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:26,427-Speed 3317.90 samples/sec   Loss 0.5751   LearningRate 0.0003   Epoch: 18   Global Step: 78290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:29,515-Speed 3316.49 samples/sec   Loss 0.5687   LearningRate 0.0003   Epoch: 18   Global Step: 78300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:32,596-Speed 3324.02 samples/sec   Loss 0.5881   LearningRate 0.0003   Epoch: 18   Global Step: 78310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:35,679-Speed 3322.72 samples/sec   Loss 0.5700   LearningRate 0.0003   Epoch: 18   Global Step: 78320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:38,735-Speed 3352.05 samples/sec   Loss 0.5814   LearningRate 0.0003   Epoch: 18   Global Step: 78330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:41,810-Speed 3330.09 samples/sec   Loss 0.5937   LearningRate 0.0003   Epoch: 18   Global Step: 78340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:44,898-Speed 3317.32 samples/sec   Loss 0.5597   LearningRate 0.0003   Epoch: 18   Global Step: 78350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:47,985-Speed 3317.66 samples/sec   Loss 0.5547   LearningRate 0.0003   Epoch: 18   Global Step: 78360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:51,061-Speed 3329.00 samples/sec   Loss 0.5486   LearningRate 0.0003   Epoch: 18   Global Step: 78370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:54,156-Speed 3310.30 samples/sec   Loss 0.5736   LearningRate 0.0003   Epoch: 18   Global Step: 78380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:34:57,250-Speed 3309.43 samples/sec   Loss 0.5616   LearningRate 0.0003   Epoch: 18   Global Step: 78390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:35:00,324-Speed 3332.50 samples/sec   Loss 0.5631   LearningRate 0.0003   Epoch: 18   Global Step: 78400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:35:03,409-Speed 3320.52 samples/sec   Loss 0.5774   LearningRate 0.0003   Epoch: 18   Global Step: 78410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:35:06,488-Speed 3326.17 samples/sec   Loss 0.5686   LearningRate 0.0003   Epoch: 18   Global Step: 78420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:35:09,547-Speed 3348.69 samples/sec   Loss 0.5748   LearningRate 0.0003   Epoch: 18   Global Step: 78430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:35:12,627-Speed 3324.79 samples/sec   Loss 0.5688   LearningRate 0.0003   Epoch: 18   Global Step: 78440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:35:15,684-Speed 3350.64 samples/sec   Loss 0.5955   LearningRate 0.0003   Epoch: 18   Global Step: 78450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:35:18,756-Speed 3334.02 samples/sec   Loss 0.5872   LearningRate 0.0003   Epoch: 18   Global Step: 78460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:35:21,829-Speed 3333.33 samples/sec   Loss 0.5690   LearningRate 0.0003   Epoch: 18   Global Step: 78470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:35:24,903-Speed 3330.81 samples/sec   Loss 0.5546   LearningRate 0.0003   Epoch: 18   Global Step: 78480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:35:27,981-Speed 3327.39 samples/sec   Loss 0.5939   LearningRate 0.0003   Epoch: 18   Global Step: 78490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:35:31,064-Speed 3322.91 samples/sec   Loss 0.5593   LearningRate 0.0003   Epoch: 18   Global Step: 78500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:35:34,146-Speed 3322.70 samples/sec   Loss 0.5660   LearningRate 0.0003   Epoch: 18   Global Step: 78510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:35:37,226-Speed 3325.52 samples/sec   Loss 0.5836   LearningRate 0.0003   Epoch: 18   Global Step: 78520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:35:40,311-Speed 3321.12 samples/sec   Loss 0.5814   LearningRate 0.0003   Epoch: 18   Global Step: 78530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:35:43,392-Speed 3324.28 samples/sec   Loss 0.5767   LearningRate 0.0003   Epoch: 18   Global Step: 78540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:35:46,465-Speed 3332.07 samples/sec   Loss 0.5633   LearningRate 0.0003   Epoch: 18   Global Step: 78550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:35:49,595-Speed 3272.39 samples/sec   Loss 0.5751   LearningRate 0.0003   Epoch: 18   Global Step: 78560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:36:02,058-Speed 821.70 samples/sec   Loss 0.4897   LearningRate 0.0002   Epoch: 19   Global Step: 78570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:36:05,135-Speed 3329.29 samples/sec   Loss 0.4610   LearningRate 0.0002   Epoch: 19   Global Step: 78580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:36:08,207-Speed 3333.28 samples/sec   Loss 0.4864   LearningRate 0.0002   Epoch: 19   Global Step: 78590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:36:11,287-Speed 3326.02 samples/sec   Loss 0.4760   LearningRate 0.0002   Epoch: 19   Global Step: 78600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:36:14,418-Speed 3271.30 samples/sec   Loss 0.4575   LearningRate 0.0002   Epoch: 19   Global Step: 78610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:36:17,527-Speed 3294.50 samples/sec   Loss 0.4683   LearningRate 0.0002   Epoch: 19   Global Step: 78620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:36:20,613-Speed 3318.98 samples/sec   Loss 0.4620   LearningRate 0.0002   Epoch: 19   Global Step: 78630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:36:23,689-Speed 3330.30 samples/sec   Loss 0.4816   LearningRate 0.0002   Epoch: 19   Global Step: 78640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:36:26,777-Speed 3315.75 samples/sec   Loss 0.4711   LearningRate 0.0002   Epoch: 19   Global Step: 78650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:36:29,856-Speed 3326.90 samples/sec   Loss 0.4628   LearningRate 0.0002   Epoch: 19   Global Step: 78660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:36:32,942-Speed 3318.89 samples/sec   Loss 0.4683   LearningRate 0.0002   Epoch: 19   Global Step: 78670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:36:36,035-Speed 3311.68 samples/sec   Loss 0.4737   LearningRate 0.0002   Epoch: 19   Global Step: 78680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:36:39,240-Speed 3195.13 samples/sec   Loss 0.4359   LearningRate 0.0002   Epoch: 19   Global Step: 78690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:36:42,470-Speed 3171.83 samples/sec   Loss 0.4779   LearningRate 0.0002   Epoch: 19   Global Step: 78700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:36:45,549-Speed 3325.94 samples/sec   Loss 0.4864   LearningRate 0.0002   Epoch: 19   Global Step: 78710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:36:48,625-Speed 3330.15 samples/sec   Loss 0.4876   LearningRate 0.0002   Epoch: 19   Global Step: 78720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:36:51,772-Speed 3254.62 samples/sec   Loss 0.5035   LearningRate 0.0002   Epoch: 19   Global Step: 78730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:36:54,857-Speed 3320.19 samples/sec   Loss 0.4767   LearningRate 0.0002   Epoch: 19   Global Step: 78740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:36:57,931-Speed 3331.83 samples/sec   Loss 0.4675   LearningRate 0.0002   Epoch: 19   Global Step: 78750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:37:01,027-Speed 3307.81 samples/sec   Loss 0.4492   LearningRate 0.0002   Epoch: 19   Global Step: 78760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:37:04,118-Speed 3314.26 samples/sec   Loss 0.4754   LearningRate 0.0002   Epoch: 19   Global Step: 78770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:37:07,202-Speed 3321.46 samples/sec   Loss 0.4547   LearningRate 0.0002   Epoch: 19   Global Step: 78780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:37:10,275-Speed 3333.49 samples/sec   Loss 0.4690   LearningRate 0.0002   Epoch: 19   Global Step: 78790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:37:13,355-Speed 3324.82 samples/sec   Loss 0.4746   LearningRate 0.0002   Epoch: 19   Global Step: 78800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:37:16,436-Speed 3324.26 samples/sec   Loss 0.4594   LearningRate 0.0002   Epoch: 19   Global Step: 78810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:37:19,535-Speed 3305.46 samples/sec   Loss 0.4671   LearningRate 0.0002   Epoch: 19   Global Step: 78820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:37:22,628-Speed 3311.20 samples/sec   Loss 0.4610   LearningRate 0.0002   Epoch: 19   Global Step: 78830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:37:25,706-Speed 3327.06 samples/sec   Loss 0.4809   LearningRate 0.0002   Epoch: 19   Global Step: 78840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:37:28,791-Speed 3320.74 samples/sec   Loss 0.4649   LearningRate 0.0002   Epoch: 19   Global Step: 78850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:37:31,876-Speed 3319.66 samples/sec   Loss 0.4633   LearningRate 0.0002   Epoch: 19   Global Step: 78860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:37:34,971-Speed 3309.38 samples/sec   Loss 0.4566   LearningRate 0.0002   Epoch: 19   Global Step: 78870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:37:38,065-Speed 3310.72 samples/sec   Loss 0.4873   LearningRate 0.0002   Epoch: 19   Global Step: 78880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:37:41,145-Speed 3325.07 samples/sec   Loss 0.4556   LearningRate 0.0002   Epoch: 19   Global Step: 78890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:37:44,255-Speed 3292.94 samples/sec   Loss 0.4667   LearningRate 0.0002   Epoch: 19   Global Step: 78900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:37:47,321-Speed 3340.50 samples/sec   Loss 0.4875   LearningRate 0.0002   Epoch: 19   Global Step: 78910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:37:50,429-Speed 3296.30 samples/sec   Loss 0.4853   LearningRate 0.0002   Epoch: 19   Global Step: 78920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:37:53,531-Speed 3301.88 samples/sec   Loss 0.4788   LearningRate 0.0002   Epoch: 19   Global Step: 78930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:37:56,616-Speed 3319.52 samples/sec   Loss 0.4757   LearningRate 0.0002   Epoch: 19   Global Step: 78940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:37:59,748-Speed 3269.97 samples/sec   Loss 0.4838   LearningRate 0.0002   Epoch: 19   Global Step: 78950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:02,837-Speed 3316.64 samples/sec   Loss 0.4967   LearningRate 0.0002   Epoch: 19   Global Step: 78960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:05,935-Speed 3305.98 samples/sec   Loss 0.4821   LearningRate 0.0002   Epoch: 19   Global Step: 78970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:09,035-Speed 3303.07 samples/sec   Loss 0.4499   LearningRate 0.0002   Epoch: 19   Global Step: 78980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:12,123-Speed 3316.93 samples/sec   Loss 0.4730   LearningRate 0.0002   Epoch: 19   Global Step: 78990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:15,220-Speed 3307.72 samples/sec   Loss 0.4754   LearningRate 0.0002   Epoch: 19   Global Step: 79000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:18,310-Speed 3313.70 samples/sec   Loss 0.4637   LearningRate 0.0002   Epoch: 19   Global Step: 79010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:38:21,402-Speed 3313.93 samples/sec   Loss 0.4674   LearningRate 0.0002   Epoch: 19   Global Step: 79020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:38:24,494-Speed 3311.66 samples/sec   Loss 0.4569   LearningRate 0.0002   Epoch: 19   Global Step: 79030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:38:27,573-Speed 3326.63 samples/sec   Loss 0.4489   LearningRate 0.0002   Epoch: 19   Global Step: 79040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:30,658-Speed 3320.69 samples/sec   Loss 0.4532   LearningRate 0.0002   Epoch: 19   Global Step: 79050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:33,743-Speed 3319.54 samples/sec   Loss 0.4693   LearningRate 0.0002   Epoch: 19   Global Step: 79060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:36,830-Speed 3317.85 samples/sec   Loss 0.4828   LearningRate 0.0002   Epoch: 19   Global Step: 79070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:40,014-Speed 3216.30 samples/sec   Loss 0.4481   LearningRate 0.0002   Epoch: 19   Global Step: 79080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:43,121-Speed 3296.67 samples/sec   Loss 0.4656   LearningRate 0.0002   Epoch: 19   Global Step: 79090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:46,214-Speed 3311.15 samples/sec   Loss 0.4703   LearningRate 0.0002   Epoch: 19   Global Step: 79100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:49,305-Speed 3313.82 samples/sec   Loss 0.4911   LearningRate 0.0002   Epoch: 19   Global Step: 79110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:52,403-Speed 3306.74 samples/sec   Loss 0.4652   LearningRate 0.0002   Epoch: 19   Global Step: 79120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:55,509-Speed 3297.05 samples/sec   Loss 0.4817   LearningRate 0.0002   Epoch: 19   Global Step: 79130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:38:58,581-Speed 3334.93 samples/sec   Loss 0.4859   LearningRate 0.0002   Epoch: 19   Global Step: 79140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:39:01,675-Speed 3309.38 samples/sec   Loss 0.4722   LearningRate 0.0002   Epoch: 19   Global Step: 79150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:39:04,760-Speed 3321.45 samples/sec   Loss 0.4595   LearningRate 0.0002   Epoch: 19   Global Step: 79160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:39:07,849-Speed 3315.92 samples/sec   Loss 0.4920   LearningRate 0.0002   Epoch: 19   Global Step: 79170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:39:10,930-Speed 3323.63 samples/sec   Loss 0.4564   LearningRate 0.0002   Epoch: 19   Global Step: 79180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:39:14,015-Speed 3319.68 samples/sec   Loss 0.4495   LearningRate 0.0002   Epoch: 19   Global Step: 79190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:39:17,098-Speed 3323.18 samples/sec   Loss 0.4932   LearningRate 0.0002   Epoch: 19   Global Step: 79200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:39:20,180-Speed 3323.49 samples/sec   Loss 0.4716   LearningRate 0.0002   Epoch: 19   Global Step: 79210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:39:23,280-Speed 3303.28 samples/sec   Loss 0.4936   LearningRate 0.0002   Epoch: 19   Global Step: 79220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:39:26,365-Speed 3320.37 samples/sec   Loss 0.4636   LearningRate 0.0002   Epoch: 19   Global Step: 79230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:39:29,452-Speed 3317.49 samples/sec   Loss 0.4755   LearningRate 0.0002   Epoch: 19   Global Step: 79240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:39:32,541-Speed 3316.59 samples/sec   Loss 0.4715   LearningRate 0.0002   Epoch: 19   Global Step: 79250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:39:35,634-Speed 3311.47 samples/sec   Loss 0.4636   LearningRate 0.0002   Epoch: 19   Global Step: 79260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:39:38,720-Speed 3318.82 samples/sec   Loss 0.4776   LearningRate 0.0002   Epoch: 19   Global Step: 79270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:39:41,809-Speed 3315.84 samples/sec   Loss 0.4993   LearningRate 0.0002   Epoch: 19   Global Step: 79280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:39:44,893-Speed 3320.41 samples/sec   Loss 0.4691   LearningRate 0.0002   Epoch: 19   Global Step: 79290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:39:47,986-Speed 3311.57 samples/sec   Loss 0.4619   LearningRate 0.0002   Epoch: 19   Global Step: 79300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:39:51,083-Speed 3308.22 samples/sec   Loss 0.4727   LearningRate 0.0002   Epoch: 19   Global Step: 79310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:39:54,166-Speed 3321.52 samples/sec   Loss 0.4562   LearningRate 0.0002   Epoch: 19   Global Step: 79320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:39:57,248-Speed 3323.04 samples/sec   Loss 0.4632   LearningRate 0.0002   Epoch: 19   Global Step: 79330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:00,343-Speed 3309.51 samples/sec   Loss 0.4511   LearningRate 0.0002   Epoch: 19   Global Step: 79340   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-26 20:40:03,409-Speed 3340.66 samples/sec   Loss 0.4906   LearningRate 0.0002   Epoch: 19   Global Step: 79350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:06,498-Speed 3316.20 samples/sec   Loss 0.4582   LearningRate 0.0002   Epoch: 19   Global Step: 79360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:09,578-Speed 3325.02 samples/sec   Loss 0.4701   LearningRate 0.0002   Epoch: 19   Global Step: 79370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:12,672-Speed 3310.21 samples/sec   Loss 0.4567   LearningRate 0.0002   Epoch: 19   Global Step: 79380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:15,765-Speed 3311.13 samples/sec   Loss 0.4723   LearningRate 0.0002   Epoch: 19   Global Step: 79390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:18,851-Speed 3319.04 samples/sec   Loss 0.4697   LearningRate 0.0002   Epoch: 19   Global Step: 79400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:21,934-Speed 3322.46 samples/sec   Loss 0.4775   LearningRate 0.0002   Epoch: 19   Global Step: 79410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:25,021-Speed 3318.08 samples/sec   Loss 0.4583   LearningRate 0.0002   Epoch: 19   Global Step: 79420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:28,104-Speed 3322.06 samples/sec   Loss 0.4712   LearningRate 0.0002   Epoch: 19   Global Step: 79430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:31,190-Speed 3319.43 samples/sec   Loss 0.4849   LearningRate 0.0002   Epoch: 19   Global Step: 79440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:34,258-Speed 3338.27 samples/sec   Loss 0.4359   LearningRate 0.0002   Epoch: 19   Global Step: 79450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:37,374-Speed 3286.71 samples/sec   Loss 0.4767   LearningRate 0.0002   Epoch: 19   Global Step: 79460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:40,464-Speed 3315.36 samples/sec   Loss 0.4499   LearningRate 0.0002   Epoch: 19   Global Step: 79470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:43,551-Speed 3317.34 samples/sec   Loss 0.4996   LearningRate 0.0002   Epoch: 19   Global Step: 79480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:46,636-Speed 3319.78 samples/sec   Loss 0.4683   LearningRate 0.0002   Epoch: 19   Global Step: 79490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:49,747-Speed 3292.64 samples/sec   Loss 0.4741   LearningRate 0.0001   Epoch: 19   Global Step: 79500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:52,895-Speed 3254.04 samples/sec   Loss 0.5016   LearningRate 0.0001   Epoch: 19   Global Step: 79510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:55,980-Speed 3319.60 samples/sec   Loss 0.4936   LearningRate 0.0001   Epoch: 19   Global Step: 79520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:40:59,127-Speed 3254.52 samples/sec   Loss 0.4594   LearningRate 0.0001   Epoch: 19   Global Step: 79530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:02,375-Speed 3153.44 samples/sec   Loss 0.4731   LearningRate 0.0001   Epoch: 19   Global Step: 79540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:05,475-Speed 3304.40 samples/sec   Loss 0.5056   LearningRate 0.0001   Epoch: 19   Global Step: 79550   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-26 20:41:08,545-Speed 3336.19 samples/sec   Loss 0.4834   LearningRate 0.0001   Epoch: 19   Global Step: 79560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:11,630-Speed 3319.72 samples/sec   Loss 0.4643   LearningRate 0.0001   Epoch: 19   Global Step: 79570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:14,728-Speed 3305.87 samples/sec   Loss 0.4791   LearningRate 0.0001   Epoch: 19   Global Step: 79580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:17,857-Speed 3273.52 samples/sec   Loss 0.4781   LearningRate 0.0001   Epoch: 19   Global Step: 79590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:20,942-Speed 3321.03 samples/sec   Loss 0.4596   LearningRate 0.0001   Epoch: 19   Global Step: 79600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:24,054-Speed 3291.11 samples/sec   Loss 0.4495   LearningRate 0.0001   Epoch: 19   Global Step: 79610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:27,227-Speed 3227.76 samples/sec   Loss 0.4625   LearningRate 0.0001   Epoch: 19   Global Step: 79620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:30,325-Speed 3305.63 samples/sec   Loss 0.4423   LearningRate 0.0001   Epoch: 19   Global Step: 79630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:33,409-Speed 3321.03 samples/sec   Loss 0.4603   LearningRate 0.0001   Epoch: 19   Global Step: 79640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:36,497-Speed 3316.61 samples/sec   Loss 0.5118   LearningRate 0.0001   Epoch: 19   Global Step: 79650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:39,572-Speed 3331.21 samples/sec   Loss 0.4667   LearningRate 0.0001   Epoch: 19   Global Step: 79660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:42,673-Speed 3302.33 samples/sec   Loss 0.4431   LearningRate 0.0001   Epoch: 19   Global Step: 79670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:45,806-Speed 3268.79 samples/sec   Loss 0.4810   LearningRate 0.0001   Epoch: 19   Global Step: 79680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:49,006-Speed 3201.22 samples/sec   Loss 0.4661   LearningRate 0.0001   Epoch: 19   Global Step: 79690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:52,101-Speed 3309.44 samples/sec   Loss 0.4662   LearningRate 0.0001   Epoch: 19   Global Step: 79700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:55,187-Speed 3319.94 samples/sec   Loss 0.4898   LearningRate 0.0001   Epoch: 19   Global Step: 79710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:41:58,259-Speed 3334.13 samples/sec   Loss 0.4561   LearningRate 0.0001   Epoch: 19   Global Step: 79720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:42:01,356-Speed 3307.14 samples/sec   Loss 0.4639   LearningRate 0.0001   Epoch: 19   Global Step: 79730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:42:04,444-Speed 3316.51 samples/sec   Loss 0.4562   LearningRate 0.0001   Epoch: 19   Global Step: 79740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:42:07,531-Speed 3318.78 samples/sec   Loss 0.4523   LearningRate 0.0001   Epoch: 19   Global Step: 79750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:42:10,651-Speed 3282.33 samples/sec   Loss 0.4920   LearningRate 0.0001   Epoch: 19   Global Step: 79760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:42:13,873-Speed 3179.52 samples/sec   Loss 0.4638   LearningRate 0.0001   Epoch: 19   Global Step: 79770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:42:16,967-Speed 3310.09 samples/sec   Loss 0.4836   LearningRate 0.0001   Epoch: 19   Global Step: 79780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:42:20,075-Speed 3295.63 samples/sec   Loss 0.4538   LearningRate 0.0001   Epoch: 19   Global Step: 79790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:42:23,187-Speed 3290.96 samples/sec   Loss 0.4792   LearningRate 0.0001   Epoch: 19   Global Step: 79800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:42:26,274-Speed 3318.86 samples/sec   Loss 0.4620   LearningRate 0.0001   Epoch: 19   Global Step: 79810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:42:29,357-Speed 3321.89 samples/sec   Loss 0.4618   LearningRate 0.0001   Epoch: 19   Global Step: 79820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:42:32,447-Speed 3314.89 samples/sec   Loss 0.4797   LearningRate 0.0001   Epoch: 19   Global Step: 79830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:42:35,599-Speed 3249.25 samples/sec   Loss 0.4808   LearningRate 0.0001   Epoch: 19   Global Step: 79840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:42:38,834-Speed 3165.53 samples/sec   Loss 0.4643   LearningRate 0.0001   Epoch: 19   Global Step: 79850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:42:41,938-Speed 3299.53 samples/sec   Loss 0.4553   LearningRate 0.0001   Epoch: 19   Global Step: 79860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:42:45,033-Speed 3310.05 samples/sec   Loss 0.4502   LearningRate 0.0001   Epoch: 19   Global Step: 79870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:42:48,155-Speed 3280.46 samples/sec   Loss 0.4886   LearningRate 0.0001   Epoch: 19   Global Step: 79880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:42:51,283-Speed 3274.15 samples/sec   Loss 0.4607   LearningRate 0.0001   Epoch: 19   Global Step: 79890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:42:54,377-Speed 3311.51 samples/sec   Loss 0.4601   LearningRate 0.0001   Epoch: 19   Global Step: 79900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:42:57,474-Speed 3306.46 samples/sec   Loss 0.4835   LearningRate 0.0001   Epoch: 19   Global Step: 79910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:43:00,571-Speed 3307.80 samples/sec   Loss 0.4559   LearningRate 0.0001   Epoch: 19   Global Step: 79920   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-26 20:43:03,708-Speed 3264.47 samples/sec   Loss 0.4585   LearningRate 0.0001   Epoch: 19   Global Step: 79930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:43:06,799-Speed 3313.64 samples/sec   Loss 0.4444   LearningRate 0.0001   Epoch: 19   Global Step: 79940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:43:09,895-Speed 3308.66 samples/sec   Loss 0.4430   LearningRate 0.0001   Epoch: 19   Global Step: 79950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:43:12,962-Speed 3340.07 samples/sec   Loss 0.4896   LearningRate 0.0001   Epoch: 19   Global Step: 79960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:43:16,070-Speed 3295.01 samples/sec   Loss 0.4920   LearningRate 0.0001   Epoch: 19   Global Step: 79970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:43:19,153-Speed 3322.80 samples/sec   Loss 0.4700   LearningRate 0.0001   Epoch: 19   Global Step: 79980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:43:22,244-Speed 3313.14 samples/sec   Loss 0.4581   LearningRate 0.0001   Epoch: 19   Global Step: 79990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:43:25,331-Speed 3318.54 samples/sec   Loss 0.4662   LearningRate 0.0001   Epoch: 19   Global Step: 80000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:44:09,177-[lfw][80000]XNorm: 21.058709
Training: 2022-04-26 20:44:09,178-[lfw][80000]Accuracy-Flip: 0.99833+-0.00224
Training: 2022-04-26 20:44:09,178-[lfw][80000]Accuracy-Highest: 0.99850
Training: 2022-04-26 20:45:00,273-[cfp_fp][80000]XNorm: 21.836647
Training: 2022-04-26 20:45:00,274-[cfp_fp][80000]Accuracy-Flip: 0.99314+-0.00502
Training: 2022-04-26 20:45:00,274-[cfp_fp][80000]Accuracy-Highest: 0.99357
Training: 2022-04-26 20:45:44,314-[agedb_30][80000]XNorm: 21.775674
Training: 2022-04-26 20:45:44,315-[agedb_30][80000]Accuracy-Flip: 0.97800+-0.00640
Training: 2022-04-26 20:45:44,315-[agedb_30][80000]Accuracy-Highest: 0.97950
Training: 2022-04-26 20:45:47,406-Speed 72.07 samples/sec   Loss 0.4791   LearningRate 0.0001   Epoch: 19   Global Step: 80010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:45:50,495-Speed 3315.83 samples/sec   Loss 0.4702   LearningRate 0.0001   Epoch: 19   Global Step: 80020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:45:53,586-Speed 3312.83 samples/sec   Loss 0.4703   LearningRate 0.0001   Epoch: 19   Global Step: 80030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:45:56,684-Speed 3306.29 samples/sec   Loss 0.4664   LearningRate 0.0001   Epoch: 19   Global Step: 80040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:45:59,773-Speed 3316.33 samples/sec   Loss 0.4567   LearningRate 0.0001   Epoch: 19   Global Step: 80050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:46:02,860-Speed 3318.08 samples/sec   Loss 0.4480   LearningRate 0.0001   Epoch: 19   Global Step: 80060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:46:05,945-Speed 3319.93 samples/sec   Loss 0.4760   LearningRate 0.0001   Epoch: 19   Global Step: 80070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:46:09,039-Speed 3309.68 samples/sec   Loss 0.4699   LearningRate 0.0001   Epoch: 19   Global Step: 80080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:46:12,242-Speed 3197.94 samples/sec   Loss 0.4560   LearningRate 0.0001   Epoch: 19   Global Step: 80090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:46:15,348-Speed 3298.18 samples/sec   Loss 0.4601   LearningRate 0.0001   Epoch: 19   Global Step: 80100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:46:18,441-Speed 3311.31 samples/sec   Loss 0.4887   LearningRate 0.0001   Epoch: 19   Global Step: 80110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:46:21,539-Speed 3305.63 samples/sec   Loss 0.4782   LearningRate 0.0001   Epoch: 19   Global Step: 80120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:46:24,605-Speed 3340.57 samples/sec   Loss 0.4756   LearningRate 0.0001   Epoch: 19   Global Step: 80130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:46:27,695-Speed 3314.96 samples/sec   Loss 0.4841   LearningRate 0.0001   Epoch: 19   Global Step: 80140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:46:30,783-Speed 3316.62 samples/sec   Loss 0.4667   LearningRate 0.0001   Epoch: 19   Global Step: 80150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:46:33,866-Speed 3322.07 samples/sec   Loss 0.4803   LearningRate 0.0001   Epoch: 19   Global Step: 80160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:46:36,947-Speed 3324.08 samples/sec   Loss 0.4695   LearningRate 0.0001   Epoch: 19   Global Step: 80170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:46:40,035-Speed 3316.94 samples/sec   Loss 0.4747   LearningRate 0.0001   Epoch: 19   Global Step: 80180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:46:43,121-Speed 3318.97 samples/sec   Loss 0.4594   LearningRate 0.0001   Epoch: 19   Global Step: 80190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:46:46,203-Speed 3323.87 samples/sec   Loss 0.4732   LearningRate 0.0001   Epoch: 19   Global Step: 80200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:46:49,290-Speed 3317.01 samples/sec   Loss 0.4761   LearningRate 0.0001   Epoch: 19   Global Step: 80210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:46:52,400-Speed 3293.40 samples/sec   Loss 0.4680   LearningRate 0.0001   Epoch: 19   Global Step: 80220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:46:55,487-Speed 3319.00 samples/sec   Loss 0.4563   LearningRate 0.0001   Epoch: 19   Global Step: 80230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:46:58,584-Speed 3306.93 samples/sec   Loss 0.4684   LearningRate 0.0001   Epoch: 19   Global Step: 80240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:01,680-Speed 3308.48 samples/sec   Loss 0.4381   LearningRate 0.0001   Epoch: 19   Global Step: 80250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:04,762-Speed 3322.67 samples/sec   Loss 0.4752   LearningRate 0.0001   Epoch: 19   Global Step: 80260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:07,844-Speed 3323.21 samples/sec   Loss 0.4788   LearningRate 0.0001   Epoch: 19   Global Step: 80270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:10,928-Speed 3321.60 samples/sec   Loss 0.4684   LearningRate 0.0001   Epoch: 19   Global Step: 80280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:14,009-Speed 3324.10 samples/sec   Loss 0.4753   LearningRate 0.0001   Epoch: 19   Global Step: 80290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:17,100-Speed 3313.16 samples/sec   Loss 0.4681   LearningRate 0.0001   Epoch: 19   Global Step: 80300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:20,184-Speed 3321.55 samples/sec   Loss 0.4682   LearningRate 0.0001   Epoch: 19   Global Step: 80310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:23,277-Speed 3311.66 samples/sec   Loss 0.4525   LearningRate 0.0001   Epoch: 19   Global Step: 80320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:26,364-Speed 3317.49 samples/sec   Loss 0.4910   LearningRate 0.0001   Epoch: 19   Global Step: 80330   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-26 20:47:29,443-Speed 3327.31 samples/sec   Loss 0.4736   LearningRate 0.0001   Epoch: 19   Global Step: 80340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:32,525-Speed 3322.59 samples/sec   Loss 0.4699   LearningRate 0.0001   Epoch: 19   Global Step: 80350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:35,606-Speed 3324.68 samples/sec   Loss 0.4624   LearningRate 0.0001   Epoch: 19   Global Step: 80360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:38,697-Speed 3313.32 samples/sec   Loss 0.4687   LearningRate 0.0001   Epoch: 19   Global Step: 80370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:41,784-Speed 3317.57 samples/sec   Loss 0.4872   LearningRate 0.0001   Epoch: 19   Global Step: 80380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:44,876-Speed 3312.74 samples/sec   Loss 0.4495   LearningRate 0.0001   Epoch: 19   Global Step: 80390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:48,004-Speed 3274.79 samples/sec   Loss 0.4573   LearningRate 0.0001   Epoch: 19   Global Step: 80400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:51,141-Speed 3264.70 samples/sec   Loss 0.4573   LearningRate 0.0001   Epoch: 19   Global Step: 80410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:54,226-Speed 3319.89 samples/sec   Loss 0.4886   LearningRate 0.0001   Epoch: 19   Global Step: 80420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:47:57,307-Speed 3324.83 samples/sec   Loss 0.4609   LearningRate 0.0001   Epoch: 19   Global Step: 80430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:00,386-Speed 3325.83 samples/sec   Loss 0.4849   LearningRate 0.0001   Epoch: 19   Global Step: 80440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:03,480-Speed 3310.67 samples/sec   Loss 0.4462   LearningRate 0.0001   Epoch: 19   Global Step: 80450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:06,559-Speed 3325.95 samples/sec   Loss 0.4993   LearningRate 0.0001   Epoch: 19   Global Step: 80460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:09,662-Speed 3301.76 samples/sec   Loss 0.4589   LearningRate 0.0001   Epoch: 19   Global Step: 80470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:12,917-Speed 3147.02 samples/sec   Loss 0.4753   LearningRate 0.0001   Epoch: 19   Global Step: 80480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:16,084-Speed 3233.38 samples/sec   Loss 0.4559   LearningRate 0.0001   Epoch: 19   Global Step: 80490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:19,160-Speed 3330.30 samples/sec   Loss 0.4557   LearningRate 0.0001   Epoch: 19   Global Step: 80500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:22,241-Speed 3324.05 samples/sec   Loss 0.4851   LearningRate 0.0001   Epoch: 19   Global Step: 80510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:25,333-Speed 3312.52 samples/sec   Loss 0.4756   LearningRate 0.0001   Epoch: 19   Global Step: 80520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:28,422-Speed 3315.99 samples/sec   Loss 0.4743   LearningRate 0.0001   Epoch: 19   Global Step: 80530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:31,505-Speed 3322.40 samples/sec   Loss 0.4783   LearningRate 0.0001   Epoch: 19   Global Step: 80540   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-26 20:48:34,568-Speed 3343.80 samples/sec   Loss 0.4474   LearningRate 0.0001   Epoch: 19   Global Step: 80550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:37,651-Speed 3321.84 samples/sec   Loss 0.5015   LearningRate 0.0001   Epoch: 19   Global Step: 80560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:40,731-Speed 3325.29 samples/sec   Loss 0.4706   LearningRate 0.0001   Epoch: 19   Global Step: 80570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:43,810-Speed 3327.32 samples/sec   Loss 0.4657   LearningRate 0.0001   Epoch: 19   Global Step: 80580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:46,888-Speed 3327.56 samples/sec   Loss 0.4855   LearningRate 0.0001   Epoch: 19   Global Step: 80590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:50,028-Speed 3261.86 samples/sec   Loss 0.4773   LearningRate 0.0001   Epoch: 19   Global Step: 80600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:53,122-Speed 3310.55 samples/sec   Loss 0.4693   LearningRate 0.0001   Epoch: 19   Global Step: 80610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:56,206-Speed 3320.94 samples/sec   Loss 0.4637   LearningRate 0.0001   Epoch: 19   Global Step: 80620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:48:59,287-Speed 3324.18 samples/sec   Loss 0.4513   LearningRate 0.0001   Epoch: 19   Global Step: 80630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:02,364-Speed 3328.91 samples/sec   Loss 0.4844   LearningRate 0.0001   Epoch: 19   Global Step: 80640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:05,429-Speed 3341.06 samples/sec   Loss 0.4620   LearningRate 0.0001   Epoch: 19   Global Step: 80650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:08,508-Speed 3326.98 samples/sec   Loss 0.4596   LearningRate 0.0001   Epoch: 19   Global Step: 80660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:11,589-Speed 3323.90 samples/sec   Loss 0.4667   LearningRate 0.0001   Epoch: 19   Global Step: 80670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:14,683-Speed 3310.92 samples/sec   Loss 0.4751   LearningRate 0.0001   Epoch: 19   Global Step: 80680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:17,766-Speed 3322.48 samples/sec   Loss 0.4825   LearningRate 0.0001   Epoch: 19   Global Step: 80690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:20,865-Speed 3305.95 samples/sec   Loss 0.4733   LearningRate 0.0001   Epoch: 19   Global Step: 80700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:23,945-Speed 3324.87 samples/sec   Loss 0.4762   LearningRate 0.0001   Epoch: 19   Global Step: 80710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:27,098-Speed 3248.16 samples/sec   Loss 0.4666   LearningRate 0.0001   Epoch: 19   Global Step: 80720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:30,183-Speed 3320.30 samples/sec   Loss 0.4717   LearningRate 0.0001   Epoch: 19   Global Step: 80730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:33,266-Speed 3322.16 samples/sec   Loss 0.4703   LearningRate 0.0001   Epoch: 19   Global Step: 80740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:36,356-Speed 3315.18 samples/sec   Loss 0.4636   LearningRate 0.0001   Epoch: 19   Global Step: 80750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:39,453-Speed 3306.82 samples/sec   Loss 0.4609   LearningRate 0.0001   Epoch: 19   Global Step: 80760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:42,564-Speed 3292.22 samples/sec   Loss 0.4755   LearningRate 0.0001   Epoch: 19   Global Step: 80770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:45,646-Speed 3323.33 samples/sec   Loss 0.4734   LearningRate 0.0001   Epoch: 19   Global Step: 80780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:48,742-Speed 3309.05 samples/sec   Loss 0.4727   LearningRate 0.0001   Epoch: 19   Global Step: 80790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:51,818-Speed 3329.77 samples/sec   Loss 0.4568   LearningRate 0.0001   Epoch: 19   Global Step: 80800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:54,897-Speed 3325.45 samples/sec   Loss 0.4789   LearningRate 0.0001   Epoch: 19   Global Step: 80810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:49:57,978-Speed 3325.21 samples/sec   Loss 0.4800   LearningRate 0.0001   Epoch: 19   Global Step: 80820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:01,071-Speed 3311.06 samples/sec   Loss 0.4972   LearningRate 0.0001   Epoch: 19   Global Step: 80830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:04,154-Speed 3322.59 samples/sec   Loss 0.4807   LearningRate 0.0001   Epoch: 19   Global Step: 80840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:07,225-Speed 3334.84 samples/sec   Loss 0.4426   LearningRate 0.0001   Epoch: 19   Global Step: 80850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:10,328-Speed 3300.97 samples/sec   Loss 0.4578   LearningRate 0.0000   Epoch: 19   Global Step: 80860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:13,447-Speed 3283.35 samples/sec   Loss 0.4674   LearningRate 0.0000   Epoch: 19   Global Step: 80870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:16,530-Speed 3322.62 samples/sec   Loss 0.4627   LearningRate 0.0000   Epoch: 19   Global Step: 80880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:19,612-Speed 3323.42 samples/sec   Loss 0.4724   LearningRate 0.0000   Epoch: 19   Global Step: 80890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:22,695-Speed 3322.15 samples/sec   Loss 0.4969   LearningRate 0.0000   Epoch: 19   Global Step: 80900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:25,776-Speed 3323.97 samples/sec   Loss 0.4597   LearningRate 0.0000   Epoch: 19   Global Step: 80910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:28,856-Speed 3325.58 samples/sec   Loss 0.4802   LearningRate 0.0000   Epoch: 19   Global Step: 80920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:31,939-Speed 3322.25 samples/sec   Loss 0.4726   LearningRate 0.0000   Epoch: 19   Global Step: 80930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:35,027-Speed 3316.89 samples/sec   Loss 0.4942   LearningRate 0.0000   Epoch: 19   Global Step: 80940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:38,119-Speed 3311.94 samples/sec   Loss 0.4410   LearningRate 0.0000   Epoch: 19   Global Step: 80950   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-26 20:50:41,193-Speed 3332.29 samples/sec   Loss 0.4768   LearningRate 0.0000   Epoch: 19   Global Step: 80960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:44,285-Speed 3312.44 samples/sec   Loss 0.4432   LearningRate 0.0000   Epoch: 19   Global Step: 80970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:47,371-Speed 3318.89 samples/sec   Loss 0.4577   LearningRate 0.0000   Epoch: 19   Global Step: 80980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:50:50,540-Speed 3231.99 samples/sec   Loss 0.4730   LearningRate 0.0000   Epoch: 19   Global Step: 80990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:50:53,743-Speed 3197.91 samples/sec   Loss 0.4615   LearningRate 0.0000   Epoch: 19   Global Step: 81000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:50:56,835-Speed 3312.58 samples/sec   Loss 0.4755   LearningRate 0.0000   Epoch: 19   Global Step: 81010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:50:59,926-Speed 3312.80 samples/sec   Loss 0.4756   LearningRate 0.0000   Epoch: 19   Global Step: 81020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:03,018-Speed 3313.25 samples/sec   Loss 0.4820   LearningRate 0.0000   Epoch: 19   Global Step: 81030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:06,100-Speed 3323.42 samples/sec   Loss 0.4607   LearningRate 0.0000   Epoch: 19   Global Step: 81040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:09,180-Speed 3324.77 samples/sec   Loss 0.4560   LearningRate 0.0000   Epoch: 19   Global Step: 81050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:12,268-Speed 3317.43 samples/sec   Loss 0.4657   LearningRate 0.0000   Epoch: 19   Global Step: 81060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:15,351-Speed 3322.26 samples/sec   Loss 0.4598   LearningRate 0.0000   Epoch: 19   Global Step: 81070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:18,433-Speed 3323.52 samples/sec   Loss 0.4649   LearningRate 0.0000   Epoch: 19   Global Step: 81080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:21,559-Speed 3275.99 samples/sec   Loss 0.4919   LearningRate 0.0000   Epoch: 19   Global Step: 81090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:51:24,656-Speed 3307.30 samples/sec   Loss 0.4793   LearningRate 0.0000   Epoch: 19   Global Step: 81100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:27,788-Speed 3269.92 samples/sec   Loss 0.4646   LearningRate 0.0000   Epoch: 19   Global Step: 81110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:30,873-Speed 3319.97 samples/sec   Loss 0.4800   LearningRate 0.0000   Epoch: 19   Global Step: 81120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:33,956-Speed 3321.92 samples/sec   Loss 0.4536   LearningRate 0.0000   Epoch: 19   Global Step: 81130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:37,034-Speed 3327.85 samples/sec   Loss 0.4454   LearningRate 0.0000   Epoch: 19   Global Step: 81140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:40,136-Speed 3302.17 samples/sec   Loss 0.4901   LearningRate 0.0000   Epoch: 19   Global Step: 81150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:43,214-Speed 3326.91 samples/sec   Loss 0.4540   LearningRate 0.0000   Epoch: 19   Global Step: 81160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:46,327-Speed 3290.76 samples/sec   Loss 0.4603   LearningRate 0.0000   Epoch: 19   Global Step: 81170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:49,446-Speed 3284.43 samples/sec   Loss 0.4829   LearningRate 0.0000   Epoch: 19   Global Step: 81180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:52,565-Speed 3283.26 samples/sec   Loss 0.4891   LearningRate 0.0000   Epoch: 19   Global Step: 81190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:51:55,648-Speed 3322.21 samples/sec   Loss 0.4430   LearningRate 0.0000   Epoch: 19   Global Step: 81200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:51:58,734-Speed 3318.25 samples/sec   Loss 0.4624   LearningRate 0.0000   Epoch: 19   Global Step: 81210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:01,819-Speed 3320.82 samples/sec   Loss 0.4453   LearningRate 0.0000   Epoch: 19   Global Step: 81220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:04,910-Speed 3313.22 samples/sec   Loss 0.4652   LearningRate 0.0000   Epoch: 19   Global Step: 81230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:08,039-Speed 3273.27 samples/sec   Loss 0.4704   LearningRate 0.0000   Epoch: 19   Global Step: 81240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:11,127-Speed 3316.18 samples/sec   Loss 0.4768   LearningRate 0.0000   Epoch: 19   Global Step: 81250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:14,216-Speed 3316.15 samples/sec   Loss 0.4432   LearningRate 0.0000   Epoch: 19   Global Step: 81260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:17,297-Speed 3324.56 samples/sec   Loss 0.4481   LearningRate 0.0000   Epoch: 19   Global Step: 81270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:20,378-Speed 3324.99 samples/sec   Loss 0.4841   LearningRate 0.0000   Epoch: 19   Global Step: 81280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:23,464-Speed 3318.85 samples/sec   Loss 0.4709   LearningRate 0.0000   Epoch: 19   Global Step: 81290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:26,544-Speed 3324.89 samples/sec   Loss 0.5021   LearningRate 0.0000   Epoch: 19   Global Step: 81300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:29,636-Speed 3312.76 samples/sec   Loss 0.4731   LearningRate 0.0000   Epoch: 19   Global Step: 81310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:32,716-Speed 3325.50 samples/sec   Loss 0.4904   LearningRate 0.0000   Epoch: 19   Global Step: 81320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:35,799-Speed 3324.74 samples/sec   Loss 0.4661   LearningRate 0.0000   Epoch: 19   Global Step: 81330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:38,884-Speed 3320.20 samples/sec   Loss 0.4730   LearningRate 0.0000   Epoch: 19   Global Step: 81340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:42,038-Speed 3247.07 samples/sec   Loss 0.4794   LearningRate 0.0000   Epoch: 19   Global Step: 81350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:45,127-Speed 3316.37 samples/sec   Loss 0.4537   LearningRate 0.0000   Epoch: 19   Global Step: 81360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:48,219-Speed 3312.21 samples/sec   Loss 0.4608   LearningRate 0.0000   Epoch: 19   Global Step: 81370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:52:51,415-Speed 3205.03 samples/sec   Loss 0.4780   LearningRate 0.0000   Epoch: 19   Global Step: 81380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:52:54,564-Speed 3253.14 samples/sec   Loss 0.4673   LearningRate 0.0000   Epoch: 19   Global Step: 81390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:52:57,665-Speed 3302.84 samples/sec   Loss 0.4726   LearningRate 0.0000   Epoch: 19   Global Step: 81400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:53:00,750-Speed 3320.39 samples/sec   Loss 0.4814   LearningRate 0.0000   Epoch: 19   Global Step: 81410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:53:03,834-Speed 3320.97 samples/sec   Loss 0.4982   LearningRate 0.0000   Epoch: 19   Global Step: 81420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:53:06,922-Speed 3316.50 samples/sec   Loss 0.4607   LearningRate 0.0000   Epoch: 19   Global Step: 81430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:53:10,008-Speed 3319.04 samples/sec   Loss 0.4634   LearningRate 0.0000   Epoch: 19   Global Step: 81440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:53:13,095-Speed 3318.36 samples/sec   Loss 0.4996   LearningRate 0.0000   Epoch: 19   Global Step: 81450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:53:16,181-Speed 3319.57 samples/sec   Loss 0.4758   LearningRate 0.0000   Epoch: 19   Global Step: 81460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:53:19,265-Speed 3320.68 samples/sec   Loss 0.4447   LearningRate 0.0000   Epoch: 19   Global Step: 81470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:53:22,362-Speed 3308.06 samples/sec   Loss 0.4816   LearningRate 0.0000   Epoch: 19   Global Step: 81480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:53:25,449-Speed 3317.51 samples/sec   Loss 0.4476   LearningRate 0.0000   Epoch: 19   Global Step: 81490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:53:28,536-Speed 3317.56 samples/sec   Loss 0.4632   LearningRate 0.0000   Epoch: 19   Global Step: 81500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:53:31,621-Speed 3320.08 samples/sec   Loss 0.4686   LearningRate 0.0000   Epoch: 19   Global Step: 81510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:53:34,722-Speed 3303.48 samples/sec   Loss 0.4813   LearningRate 0.0000   Epoch: 19   Global Step: 81520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:53:37,809-Speed 3317.90 samples/sec   Loss 0.4610   LearningRate 0.0000   Epoch: 19   Global Step: 81530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:53:40,896-Speed 3317.79 samples/sec   Loss 0.4741   LearningRate 0.0000   Epoch: 19   Global Step: 81540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:53:43,983-Speed 3317.30 samples/sec   Loss 0.4754   LearningRate 0.0000   Epoch: 19   Global Step: 81550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:53:47,097-Speed 3289.63 samples/sec   Loss 0.4636   LearningRate 0.0000   Epoch: 19   Global Step: 81560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:53:50,217-Speed 3282.88 samples/sec   Loss 0.4638   LearningRate 0.0000   Epoch: 19   Global Step: 81570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:53:53,301-Speed 3320.52 samples/sec   Loss 0.4562   LearningRate 0.0000   Epoch: 19   Global Step: 81580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:53:56,395-Speed 3310.27 samples/sec   Loss 0.4538   LearningRate 0.0000   Epoch: 19   Global Step: 81590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:53:59,510-Speed 3288.34 samples/sec   Loss 0.4760   LearningRate 0.0000   Epoch: 19   Global Step: 81600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:02,598-Speed 3317.31 samples/sec   Loss 0.4737   LearningRate 0.0000   Epoch: 19   Global Step: 81610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:05,692-Speed 3309.85 samples/sec   Loss 0.4804   LearningRate 0.0000   Epoch: 19   Global Step: 81620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:08,792-Speed 3304.80 samples/sec   Loss 0.4670   LearningRate 0.0000   Epoch: 19   Global Step: 81630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:11,905-Speed 3290.04 samples/sec   Loss 0.4642   LearningRate 0.0000   Epoch: 19   Global Step: 81640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:14,997-Speed 3312.26 samples/sec   Loss 0.4498   LearningRate 0.0000   Epoch: 19   Global Step: 81650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:18,087-Speed 3314.42 samples/sec   Loss 0.4604   LearningRate 0.0000   Epoch: 19   Global Step: 81660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:21,179-Speed 3312.01 samples/sec   Loss 0.4553   LearningRate 0.0000   Epoch: 19   Global Step: 81670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:24,263-Speed 3322.00 samples/sec   Loss 0.4806   LearningRate 0.0000   Epoch: 19   Global Step: 81680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:54:27,343-Speed 3324.55 samples/sec   Loss 0.4618   LearningRate 0.0000   Epoch: 19   Global Step: 81690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:30,428-Speed 3321.19 samples/sec   Loss 0.5026   LearningRate 0.0000   Epoch: 19   Global Step: 81700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:33,518-Speed 3314.57 samples/sec   Loss 0.4806   LearningRate 0.0000   Epoch: 19   Global Step: 81710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:36,684-Speed 3235.14 samples/sec   Loss 0.4687   LearningRate 0.0000   Epoch: 19   Global Step: 81720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:39,780-Speed 3308.07 samples/sec   Loss 0.4702   LearningRate 0.0000   Epoch: 19   Global Step: 81730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:42,870-Speed 3314.65 samples/sec   Loss 0.4554   LearningRate 0.0000   Epoch: 19   Global Step: 81740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:45,957-Speed 3317.76 samples/sec   Loss 0.4631   LearningRate 0.0000   Epoch: 19   Global Step: 81750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:49,045-Speed 3317.67 samples/sec   Loss 0.4757   LearningRate 0.0000   Epoch: 19   Global Step: 81760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:52,130-Speed 3319.79 samples/sec   Loss 0.4666   LearningRate 0.0000   Epoch: 19   Global Step: 81770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:55,232-Speed 3302.05 samples/sec   Loss 0.4705   LearningRate 0.0000   Epoch: 19   Global Step: 81780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:54:58,363-Speed 3270.42 samples/sec   Loss 0.4688   LearningRate 0.0000   Epoch: 19   Global Step: 81790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:55:01,481-Speed 3286.05 samples/sec   Loss 0.4912   LearningRate 0.0000   Epoch: 19   Global Step: 81800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:55:04,582-Speed 3303.03 samples/sec   Loss 0.4424   LearningRate 0.0000   Epoch: 19   Global Step: 81810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:55:07,672-Speed 3314.28 samples/sec   Loss 0.4893   LearningRate 0.0000   Epoch: 19   Global Step: 81820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:55:10,761-Speed 3316.12 samples/sec   Loss 0.4723   LearningRate 0.0000   Epoch: 19   Global Step: 81830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:55:13,853-Speed 3311.88 samples/sec   Loss 0.4890   LearningRate 0.0000   Epoch: 19   Global Step: 81840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:55:16,961-Speed 3295.35 samples/sec   Loss 0.4871   LearningRate 0.0000   Epoch: 19   Global Step: 81850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:55:20,053-Speed 3312.75 samples/sec   Loss 0.4759   LearningRate 0.0000   Epoch: 19   Global Step: 81860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:55:23,165-Speed 3291.54 samples/sec   Loss 0.4816   LearningRate 0.0000   Epoch: 19   Global Step: 81870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:55:26,262-Speed 3306.91 samples/sec   Loss 0.4778   LearningRate 0.0000   Epoch: 19   Global Step: 81880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:55:29,358-Speed 3309.26 samples/sec   Loss 0.4600   LearningRate 0.0000   Epoch: 19   Global Step: 81890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:55:32,448-Speed 3314.40 samples/sec   Loss 0.4743   LearningRate 0.0000   Epoch: 19   Global Step: 81900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:55:35,532-Speed 3320.57 samples/sec   Loss 0.4514   LearningRate 0.0000   Epoch: 19   Global Step: 81910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:55:38,634-Speed 3302.36 samples/sec   Loss 0.4738   LearningRate 0.0000   Epoch: 19   Global Step: 81920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:55:41,721-Speed 3317.75 samples/sec   Loss 0.4570   LearningRate 0.0000   Epoch: 19   Global Step: 81930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:55:44,808-Speed 3317.97 samples/sec   Loss 0.4636   LearningRate 0.0000   Epoch: 19   Global Step: 81940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:55:47,916-Speed 3295.57 samples/sec   Loss 0.4700   LearningRate 0.0000   Epoch: 19   Global Step: 81950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:55:51,019-Speed 3300.60 samples/sec   Loss 0.4875   LearningRate 0.0000   Epoch: 19   Global Step: 81960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:55:54,107-Speed 3316.38 samples/sec   Loss 0.4685   LearningRate 0.0000   Epoch: 19   Global Step: 81970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:55:57,205-Speed 3306.29 samples/sec   Loss 0.4803   LearningRate 0.0000   Epoch: 19   Global Step: 81980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:56:00,336-Speed 3271.91 samples/sec   Loss 0.4640   LearningRate 0.0000   Epoch: 19   Global Step: 81990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:56:03,508-Speed 3228.48 samples/sec   Loss 0.4395   LearningRate 0.0000   Epoch: 19   Global Step: 82000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:56:47,215-[lfw][82000]XNorm: 21.022730
Training: 2022-04-26 20:56:47,216-[lfw][82000]Accuracy-Flip: 0.99833+-0.00224
Training: 2022-04-26 20:56:47,216-[lfw][82000]Accuracy-Highest: 0.99850
Training: 2022-04-26 20:57:37,954-[cfp_fp][82000]XNorm: 21.861879
Training: 2022-04-26 20:57:37,955-[cfp_fp][82000]Accuracy-Flip: 0.99357+-0.00479
Training: 2022-04-26 20:57:37,955-[cfp_fp][82000]Accuracy-Highest: 0.99357
Training: 2022-04-26 20:58:21,647-[agedb_30][82000]XNorm: 21.773780
Training: 2022-04-26 20:58:21,648-[agedb_30][82000]Accuracy-Flip: 0.97883+-0.00667
Training: 2022-04-26 20:58:21,648-[agedb_30][82000]Accuracy-Highest: 0.97950
Training: 2022-04-26 20:58:24,769-Speed 72.49 samples/sec   Loss 0.4800   LearningRate 0.0000   Epoch: 19   Global Step: 82010   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-26 20:58:27,864-Speed 3309.46 samples/sec   Loss 0.4674   LearningRate 0.0000   Epoch: 19   Global Step: 82020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:58:30,953-Speed 3316.00 samples/sec   Loss 0.4764   LearningRate 0.0000   Epoch: 19   Global Step: 82030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:58:34,038-Speed 3319.76 samples/sec   Loss 0.4781   LearningRate 0.0000   Epoch: 19   Global Step: 82040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:58:37,118-Speed 3324.60 samples/sec   Loss 0.4687   LearningRate 0.0000   Epoch: 19   Global Step: 82050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:58:40,206-Speed 3316.63 samples/sec   Loss 0.4530   LearningRate 0.0000   Epoch: 19   Global Step: 82060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:58:43,291-Speed 3320.69 samples/sec   Loss 0.4887   LearningRate 0.0000   Epoch: 19   Global Step: 82070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:58:46,361-Speed 3335.71 samples/sec   Loss 0.4860   LearningRate 0.0000   Epoch: 19   Global Step: 82080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:58:49,463-Speed 3302.20 samples/sec   Loss 0.4716   LearningRate 0.0000   Epoch: 19   Global Step: 82090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:58:52,571-Speed 3295.69 samples/sec   Loss 0.4623   LearningRate 0.0000   Epoch: 19   Global Step: 82100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:58:55,669-Speed 3305.82 samples/sec   Loss 0.5001   LearningRate 0.0000   Epoch: 19   Global Step: 82110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:58:58,781-Speed 3291.70 samples/sec   Loss 0.4600   LearningRate 0.0000   Epoch: 19   Global Step: 82120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:59:01,986-Speed 3195.30 samples/sec   Loss 0.4573   LearningRate 0.0000   Epoch: 19   Global Step: 82130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:59:05,150-Speed 3237.29 samples/sec   Loss 0.4537   LearningRate 0.0000   Epoch: 19   Global Step: 82140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:59:08,238-Speed 3316.88 samples/sec   Loss 0.4661   LearningRate 0.0000   Epoch: 19   Global Step: 82150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:59:11,330-Speed 3313.09 samples/sec   Loss 0.4734   LearningRate 0.0000   Epoch: 19   Global Step: 82160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:59:14,417-Speed 3317.76 samples/sec   Loss 0.4576   LearningRate 0.0000   Epoch: 19   Global Step: 82170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:59:17,515-Speed 3306.11 samples/sec   Loss 0.4854   LearningRate 0.0000   Epoch: 19   Global Step: 82180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:59:20,597-Speed 3323.57 samples/sec   Loss 0.4357   LearningRate 0.0000   Epoch: 19   Global Step: 82190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:59:23,683-Speed 3318.41 samples/sec   Loss 0.4694   LearningRate 0.0000   Epoch: 19   Global Step: 82200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:59:26,783-Speed 3304.70 samples/sec   Loss 0.4855   LearningRate 0.0000   Epoch: 19   Global Step: 82210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:59:29,870-Speed 3317.10 samples/sec   Loss 0.4711   LearningRate 0.0000   Epoch: 19   Global Step: 82220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:59:32,957-Speed 3318.96 samples/sec   Loss 0.4743   LearningRate 0.0000   Epoch: 19   Global Step: 82230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:59:36,039-Speed 3323.50 samples/sec   Loss 0.4546   LearningRate 0.0000   Epoch: 19   Global Step: 82240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 20:59:39,115-Speed 3328.82 samples/sec   Loss 0.4399   LearningRate 0.0000   Epoch: 19   Global Step: 82250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:59:42,203-Speed 3317.62 samples/sec   Loss 0.4679   LearningRate 0.0000   Epoch: 19   Global Step: 82260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:59:45,315-Speed 3290.80 samples/sec   Loss 0.4753   LearningRate 0.0000   Epoch: 19   Global Step: 82270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:59:48,526-Speed 3189.87 samples/sec   Loss 0.4714   LearningRate 0.0000   Epoch: 19   Global Step: 82280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:59:51,638-Speed 3291.20 samples/sec   Loss 0.4683   LearningRate 0.0000   Epoch: 19   Global Step: 82290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:59:54,721-Speed 3322.64 samples/sec   Loss 0.4656   LearningRate 0.0000   Epoch: 19   Global Step: 82300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 20:59:57,800-Speed 3326.53 samples/sec   Loss 0.4675   LearningRate 0.0000   Epoch: 19   Global Step: 82310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:00:00,884-Speed 3321.13 samples/sec   Loss 0.4668   LearningRate 0.0000   Epoch: 19   Global Step: 82320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:00:03,968-Speed 3320.54 samples/sec   Loss 0.4667   LearningRate 0.0000   Epoch: 19   Global Step: 82330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:00:07,053-Speed 3320.47 samples/sec   Loss 0.4874   LearningRate 0.0000   Epoch: 19   Global Step: 82340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:00:10,132-Speed 3325.94 samples/sec   Loss 0.4815   LearningRate 0.0000   Epoch: 19   Global Step: 82350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:13,231-Speed 3305.27 samples/sec   Loss 0.4750   LearningRate 0.0000   Epoch: 19   Global Step: 82360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:16,317-Speed 3319.25 samples/sec   Loss 0.4767   LearningRate 0.0000   Epoch: 19   Global Step: 82370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:19,408-Speed 3313.27 samples/sec   Loss 0.4861   LearningRate 0.0000   Epoch: 19   Global Step: 82380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:22,488-Speed 3326.04 samples/sec   Loss 0.4563   LearningRate 0.0000   Epoch: 19   Global Step: 82390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:25,567-Speed 3325.88 samples/sec   Loss 0.4639   LearningRate 0.0000   Epoch: 19   Global Step: 82400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:28,645-Speed 3328.25 samples/sec   Loss 0.4767   LearningRate 0.0000   Epoch: 19   Global Step: 82410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:31,726-Speed 3323.63 samples/sec   Loss 0.4909   LearningRate 0.0000   Epoch: 19   Global Step: 82420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:34,812-Speed 3318.95 samples/sec   Loss 0.4600   LearningRate 0.0000   Epoch: 19   Global Step: 82430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:37,916-Speed 3299.80 samples/sec   Loss 0.4675   LearningRate 0.0000   Epoch: 19   Global Step: 82440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:40,993-Speed 3328.11 samples/sec   Loss 0.4753   LearningRate 0.0000   Epoch: 19   Global Step: 82450   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-26 21:00:44,061-Speed 3338.97 samples/sec   Loss 0.4730   LearningRate 0.0000   Epoch: 19   Global Step: 82460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:47,149-Speed 3317.14 samples/sec   Loss 0.5050   LearningRate 0.0000   Epoch: 19   Global Step: 82470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:50,231-Speed 3323.80 samples/sec   Loss 0.4660   LearningRate 0.0000   Epoch: 19   Global Step: 82480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:53,309-Speed 3327.10 samples/sec   Loss 0.4857   LearningRate 0.0000   Epoch: 19   Global Step: 82490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:56,403-Speed 3310.30 samples/sec   Loss 0.4618   LearningRate 0.0000   Epoch: 19   Global Step: 82500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:00:59,518-Speed 3288.30 samples/sec   Loss 0.4563   LearningRate 0.0000   Epoch: 19   Global Step: 82510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:01:02,788-Speed 3132.72 samples/sec   Loss 0.4904   LearningRate 0.0000   Epoch: 19   Global Step: 82520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:01:05,893-Speed 3298.40 samples/sec   Loss 0.4782   LearningRate 0.0000   Epoch: 19   Global Step: 82530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:01:08,953-Speed 3346.82 samples/sec   Loss 0.4670   LearningRate 0.0000   Epoch: 19   Global Step: 82540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:01:12,041-Speed 3316.88 samples/sec   Loss 0.4620   LearningRate 0.0000   Epoch: 19   Global Step: 82550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:01:15,132-Speed 3314.12 samples/sec   Loss 0.4750   LearningRate 0.0000   Epoch: 19   Global Step: 82560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:01:18,210-Speed 3327.84 samples/sec   Loss 0.4752   LearningRate 0.0000   Epoch: 19   Global Step: 82570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:01:21,331-Speed 3281.56 samples/sec   Loss 0.4636   LearningRate 0.0000   Epoch: 19   Global Step: 82580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:01:24,407-Speed 3329.41 samples/sec   Loss 0.4681   LearningRate 0.0000   Epoch: 19   Global Step: 82590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:01:27,491-Speed 3321.24 samples/sec   Loss 0.4626   LearningRate 0.0000   Epoch: 19   Global Step: 82600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:01:30,574-Speed 3322.51 samples/sec   Loss 0.4788   LearningRate 0.0000   Epoch: 19   Global Step: 82610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:01:33,666-Speed 3311.39 samples/sec   Loss 0.4555   LearningRate 0.0000   Epoch: 19   Global Step: 82620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:01:36,755-Speed 3316.29 samples/sec   Loss 0.4837   LearningRate 0.0000   Epoch: 19   Global Step: 82630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:01:39,849-Speed 3309.80 samples/sec   Loss 0.4537   LearningRate 0.0000   Epoch: 19   Global Step: 82640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:01:42,927-Speed 3328.02 samples/sec   Loss 0.4789   LearningRate 0.0000   Epoch: 19   Global Step: 82650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-26 21:01:46,007-Speed 3325.33 samples/sec   Loss 0.4860   LearningRate 0.0000   Epoch: 19   Global Step: 82660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:01:49,091-Speed 3321.53 samples/sec   Loss 0.4913   LearningRate 0.0000   Epoch: 19   Global Step: 82670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:01:52,179-Speed 3316.45 samples/sec   Loss 0.4503   LearningRate 0.0000   Epoch: 19   Global Step: 82680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:01:55,335-Speed 3245.98 samples/sec   Loss 0.4776   LearningRate 0.0000   Epoch: 19   Global Step: 82690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-26 21:01:58,426-Speed 3313.03 samples/sec   Loss 0.4679   LearningRate 0.0000   Epoch: 19   Global Step: 82700   Fp16 Grad Scale: 32768   Required: -0 hours