Training: 2022-04-10 23:13:31,380-rank_id: 0
Training: 2022-04-10 23:13:45,393-: margin_list              [1.0, 0.5, 0.0]
Training: 2022-04-10 23:13:45,393-: network                  r100
Training: 2022-04-10 23:13:45,393-: resume                   False
Training: 2022-04-10 23:13:45,393-: output                   work_dirs/ms1mv3_r100
Training: 2022-04-10 23:13:45,393-: embedding_size           512
Training: 2022-04-10 23:13:45,393-: sample_rate              1.0
Training: 2022-04-10 23:13:45,393-: interclass_filtering_threshold 0
Training: 2022-04-10 23:13:45,393-: fp16                     True
Training: 2022-04-10 23:13:45,393-: batch_size               128
Training: 2022-04-10 23:13:45,393-: optimizer                sgd
Training: 2022-04-10 23:13:45,394-: lr                       0.1
Training: 2022-04-10 23:13:45,394-: momentum                 0.9
Training: 2022-04-10 23:13:45,394-: weight_decay             0.0005
Training: 2022-04-10 23:13:45,394-: verbose                  2000
Training: 2022-04-10 23:13:45,394-: frequent                 10
Training: 2022-04-10 23:13:45,394-: dali                     False
Training: 2022-04-10 23:13:45,394-: rec                      /train_tmp/ms1m-retinaface-t1
Training: 2022-04-10 23:13:45,394-: num_classes              93431
Training: 2022-04-10 23:13:45,394-: num_image                5179510
Training: 2022-04-10 23:13:45,394-: num_epoch                20
Training: 2022-04-10 23:13:45,394-: warmup_epoch             0
Training: 2022-04-10 23:13:45,394-: val_targets              ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-10 23:13:45,394-: total_batch_size         1024
Training: 2022-04-10 23:13:45,394-: warmup_step              0
Training: 2022-04-10 23:13:45,394-: total_step               101160
Training: 2022-04-10 23:14:52,660-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-10 23:14:58,211-Speed 3389.74 samples/sec   Loss 47.3273   LearningRate 0.1000   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-10 23:15:01,211-Speed 3415.40 samples/sec   Loss 48.1647   LearningRate 0.0999   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-10 23:15:04,203-Speed 3423.93 samples/sec   Loss 48.8566   LearningRate 0.0999   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-10 23:15:07,154-Speed 3470.90 samples/sec   Loss 48.2520   LearningRate 0.0999   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-10 23:15:10,093-Speed 3486.58 samples/sec   Loss 48.0507   LearningRate 0.0999   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-10 23:15:13,110-Speed 3394.80 samples/sec   Loss 47.8045   LearningRate 0.0999   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-10 23:15:16,120-Speed 3402.99 samples/sec   Loss 47.9883   LearningRate 0.0998   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-10 23:15:19,057-Speed 3488.52 samples/sec   Loss 47.7421   LearningRate 0.0998   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-10 23:15:21,999-Speed 3481.71 samples/sec   Loss 47.3690   LearningRate 0.0998   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-10 23:15:24,975-Speed 3443.00 samples/sec   Loss 47.0640   LearningRate 0.0998   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-10 23:15:27,908-Speed 3492.52 samples/sec   Loss 46.9104   LearningRate 0.0998   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-10 23:15:30,855-Speed 3475.43 samples/sec   Loss 46.8356   LearningRate 0.0997   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-10 23:15:33,784-Speed 3498.00 samples/sec   Loss 46.7744   LearningRate 0.0997   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-10 23:15:36,734-Speed 3471.41 samples/sec   Loss 46.4647   LearningRate 0.0997   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-10 23:15:39,687-Speed 3469.98 samples/sec   Loss 46.3438   LearningRate 0.0997   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-10 23:15:42,641-Speed 3466.93 samples/sec   Loss 46.1223   LearningRate 0.0997   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-10 23:15:45,583-Speed 3482.05 samples/sec   Loss 45.9521   LearningRate 0.0996   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-10 23:15:48,529-Speed 3476.79 samples/sec   Loss 45.9236   LearningRate 0.0996   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-10 23:15:51,466-Speed 3487.14 samples/sec   Loss 45.6441   LearningRate 0.0996   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-10 23:15:54,416-Speed 3472.69 samples/sec   Loss 45.5037   LearningRate 0.0996   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-10 23:15:57,364-Speed 3475.02 samples/sec   Loss 45.3420   LearningRate 0.0996   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:16:00,320-Speed 3464.86 samples/sec   Loss 45.1344   LearningRate 0.0995   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:16:03,276-Speed 3465.52 samples/sec   Loss 45.1216   LearningRate 0.0995   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:16:06,221-Speed 3477.70 samples/sec   Loss 44.9215   LearningRate 0.0995   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:16:09,162-Speed 3482.34 samples/sec   Loss 44.7898   LearningRate 0.0995   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:16:12,112-Speed 3471.78 samples/sec   Loss 44.5184   LearningRate 0.0995   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:16:15,059-Speed 3476.27 samples/sec   Loss 44.4373   LearningRate 0.0994   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:16:18,002-Speed 3480.02 samples/sec   Loss 44.2242   LearningRate 0.0994   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:16:20,956-Speed 3467.10 samples/sec   Loss 44.0992   LearningRate 0.0994   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:16:23,896-Speed 3485.04 samples/sec   Loss 43.9775   LearningRate 0.0994   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:16:26,857-Speed 3458.23 samples/sec   Loss 43.7149   LearningRate 0.0994   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:16:29,809-Speed 3470.43 samples/sec   Loss 43.6324   LearningRate 0.0993   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:16:32,770-Speed 3458.30 samples/sec   Loss 43.4586   LearningRate 0.0993   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:16:35,750-Speed 3437.56 samples/sec   Loss 43.3129   LearningRate 0.0993   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:16:38,691-Speed 3482.52 samples/sec   Loss 43.2031   LearningRate 0.0993   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:16:41,637-Speed 3476.53 samples/sec   Loss 42.9775   LearningRate 0.0993   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:16:44,578-Speed 3483.58 samples/sec   Loss 42.7885   LearningRate 0.0993   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:16:47,520-Speed 3481.99 samples/sec   Loss 42.5459   LearningRate 0.0992   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:16:50,468-Speed 3474.45 samples/sec   Loss 42.5405   LearningRate 0.0992   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:16:53,413-Speed 3478.08 samples/sec   Loss 42.3622   LearningRate 0.0992   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-10 23:16:56,346-Speed 3491.10 samples/sec   Loss 42.1799   LearningRate 0.0992   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-10 23:16:59,306-Speed 3460.78 samples/sec   Loss 41.9984   LearningRate 0.0992   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-10 23:17:02,284-Speed 3439.09 samples/sec   Loss 41.9358   LearningRate 0.0991   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-10 23:17:05,227-Speed 3480.40 samples/sec   Loss 41.7629   LearningRate 0.0991   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-10 23:17:08,172-Speed 3478.18 samples/sec   Loss 41.5721   LearningRate 0.0991   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-10 23:17:11,122-Speed 3471.46 samples/sec   Loss 41.4248   LearningRate 0.0991   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:14,068-Speed 3477.57 samples/sec   Loss 41.2698   LearningRate 0.0991   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:17,003-Speed 3489.43 samples/sec   Loss 41.0891   LearningRate 0.0990   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:19,946-Speed 3481.57 samples/sec   Loss 40.9541   LearningRate 0.0990   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:22,891-Speed 3478.61 samples/sec   Loss 40.8092   LearningRate 0.0990   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:25,836-Speed 3478.02 samples/sec   Loss 40.6291   LearningRate 0.0990   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:28,780-Speed 3478.47 samples/sec   Loss 40.4760   LearningRate 0.0990   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:31,716-Speed 3488.51 samples/sec   Loss 40.4010   LearningRate 0.0989   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:34,651-Speed 3489.78 samples/sec   Loss 40.2737   LearningRate 0.0989   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:37,591-Speed 3484.00 samples/sec   Loss 40.1611   LearningRate 0.0989   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:40,512-Speed 3507.13 samples/sec   Loss 39.9895   LearningRate 0.0989   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:43,449-Speed 3489.13 samples/sec   Loss 39.7490   LearningRate 0.0989   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:46,459-Speed 3402.57 samples/sec   Loss 39.4536   LearningRate 0.0988   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:49,393-Speed 3490.80 samples/sec   Loss 39.3885   LearningRate 0.0988   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:52,343-Speed 3471.68 samples/sec   Loss 39.2799   LearningRate 0.0988   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:55,280-Speed 3487.30 samples/sec   Loss 39.1567   LearningRate 0.0988   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:17:58,220-Speed 3483.58 samples/sec   Loss 39.0062   LearningRate 0.0988   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:01,167-Speed 3475.76 samples/sec   Loss 38.8358   LearningRate 0.0987   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:04,119-Speed 3470.01 samples/sec   Loss 38.6928   LearningRate 0.0987   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:07,065-Speed 3477.25 samples/sec   Loss 38.5280   LearningRate 0.0987   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:09,991-Speed 3500.30 samples/sec   Loss 38.4055   LearningRate 0.0987   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:12,928-Speed 3487.83 samples/sec   Loss 38.1617   LearningRate 0.0987   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:15,861-Speed 3492.78 samples/sec   Loss 38.0510   LearningRate 0.0986   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:18,798-Speed 3486.53 samples/sec   Loss 37.8527   LearningRate 0.0986   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:21,737-Speed 3485.37 samples/sec   Loss 37.6648   LearningRate 0.0986   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:24,681-Speed 3479.34 samples/sec   Loss 37.5056   LearningRate 0.0986   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:27,616-Speed 3490.16 samples/sec   Loss 37.4346   LearningRate 0.0986   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:30,577-Speed 3459.36 samples/sec   Loss 37.2564   LearningRate 0.0985   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:33,522-Speed 3477.18 samples/sec   Loss 37.1208   LearningRate 0.0985   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:36,469-Speed 3476.14 samples/sec   Loss 36.9085   LearningRate 0.0985   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:39,407-Speed 3485.78 samples/sec   Loss 36.7451   LearningRate 0.0985   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-10 23:18:42,345-Speed 3486.93 samples/sec   Loss 36.7186   LearningRate 0.0985   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:45,282-Speed 3487.04 samples/sec   Loss 36.4872   LearningRate 0.0984   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:48,230-Speed 3474.73 samples/sec   Loss 36.2792   LearningRate 0.0984   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:51,177-Speed 3475.37 samples/sec   Loss 36.1131   LearningRate 0.0984   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:54,120-Speed 3480.27 samples/sec   Loss 35.9162   LearningRate 0.0984   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:18:57,069-Speed 3473.48 samples/sec   Loss 35.8651   LearningRate 0.0984   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:19:00,003-Speed 3491.15 samples/sec   Loss 35.5578   LearningRate 0.0983   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:19:02,948-Speed 3478.07 samples/sec   Loss 35.5299   LearningRate 0.0983   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:19:05,888-Speed 3483.80 samples/sec   Loss 35.4439   LearningRate 0.0983   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:19:08,825-Speed 3488.10 samples/sec   Loss 35.0125   LearningRate 0.0983   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:19:11,752-Speed 3498.44 samples/sec   Loss 34.9572   LearningRate 0.0983   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:19:14,691-Speed 3485.20 samples/sec   Loss 34.7387   LearningRate 0.0982   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:19:17,635-Speed 3479.18 samples/sec   Loss 34.5511   LearningRate 0.0982   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:19:20,574-Speed 3485.09 samples/sec   Loss 34.4487   LearningRate 0.0982   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:19:23,519-Speed 3478.15 samples/sec   Loss 34.4024   LearningRate 0.0982   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:19:26,474-Speed 3466.25 samples/sec   Loss 34.1118   LearningRate 0.0982   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:19:29,413-Speed 3485.20 samples/sec   Loss 34.0529   LearningRate 0.0982   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:19:32,377-Speed 3455.86 samples/sec   Loss 33.8308   LearningRate 0.0981   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:19:35,319-Speed 3482.07 samples/sec   Loss 33.5956   LearningRate 0.0981   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:19:38,260-Speed 3482.51 samples/sec   Loss 33.4252   LearningRate 0.0981   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:19:41,204-Speed 3478.44 samples/sec   Loss 33.5193   LearningRate 0.0981   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-10 23:19:44,136-Speed 3493.20 samples/sec   Loss 33.2280   LearningRate 0.0981   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:19:47,113-Speed 3441.34 samples/sec   Loss 33.0301   LearningRate 0.0980   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:19:50,058-Speed 3477.38 samples/sec   Loss 32.8899   LearningRate 0.0980   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:19:53,008-Speed 3472.16 samples/sec   Loss 32.6804   LearningRate 0.0980   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:19:55,960-Speed 3469.50 samples/sec   Loss 32.5569   LearningRate 0.0980   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:19:58,928-Speed 3451.85 samples/sec   Loss 32.3279   LearningRate 0.0980   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:01,879-Speed 3470.70 samples/sec   Loss 32.0823   LearningRate 0.0979   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:04,826-Speed 3475.49 samples/sec   Loss 32.1485   LearningRate 0.0979   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:07,768-Speed 3481.71 samples/sec   Loss 31.7788   LearningRate 0.0979   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:10,711-Speed 3480.24 samples/sec   Loss 31.5781   LearningRate 0.0979   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:13,651-Speed 3484.24 samples/sec   Loss 31.3841   LearningRate 0.0979   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:16,605-Speed 3467.35 samples/sec   Loss 31.3892   LearningRate 0.0978   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:19,552-Speed 3475.16 samples/sec   Loss 31.1897   LearningRate 0.0978   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:22,498-Speed 3476.58 samples/sec   Loss 31.0769   LearningRate 0.0978   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:25,442-Speed 3479.85 samples/sec   Loss 30.5718   LearningRate 0.0978   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:28,392-Speed 3471.65 samples/sec   Loss 30.8429   LearningRate 0.0978   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:31,337-Speed 3478.60 samples/sec   Loss 30.5111   LearningRate 0.0977   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:34,289-Speed 3469.38 samples/sec   Loss 30.3706   LearningRate 0.0977   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:37,238-Speed 3473.65 samples/sec   Loss 30.1590   LearningRate 0.0977   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:40,191-Speed 3469.12 samples/sec   Loss 29.9044   LearningRate 0.0977   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:43,125-Speed 3490.05 samples/sec   Loss 29.8432   LearningRate 0.0977   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:46,068-Speed 3480.99 samples/sec   Loss 29.5799   LearningRate 0.0976   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:49,016-Speed 3474.23 samples/sec   Loss 29.5285   LearningRate 0.0976   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:51,964-Speed 3474.44 samples/sec   Loss 29.4015   LearningRate 0.0976   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:54,924-Speed 3460.49 samples/sec   Loss 29.3319   LearningRate 0.0976   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:20:57,868-Speed 3479.75 samples/sec   Loss 29.0804   LearningRate 0.0976   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:00,816-Speed 3473.84 samples/sec   Loss 29.0961   LearningRate 0.0975   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:03,803-Speed 3428.84 samples/sec   Loss 28.7543   LearningRate 0.0975   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:06,752-Speed 3472.93 samples/sec   Loss 28.5811   LearningRate 0.0975   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:09,699-Speed 3476.19 samples/sec   Loss 28.5935   LearningRate 0.0975   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:12,639-Speed 3483.28 samples/sec   Loss 28.3610   LearningRate 0.0975   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:15,605-Speed 3454.25 samples/sec   Loss 28.2017   LearningRate 0.0974   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:18,555-Speed 3471.79 samples/sec   Loss 28.3622   LearningRate 0.0974   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:21,507-Speed 3468.97 samples/sec   Loss 27.8651   LearningRate 0.0974   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:24,475-Speed 3451.60 samples/sec   Loss 27.7559   LearningRate 0.0974   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:27,429-Speed 3468.17 samples/sec   Loss 27.8152   LearningRate 0.0974   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:30,378-Speed 3473.05 samples/sec   Loss 27.3315   LearningRate 0.0973   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:33,327-Speed 3473.35 samples/sec   Loss 27.1738   LearningRate 0.0973   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:36,275-Speed 3473.92 samples/sec   Loss 27.1084   LearningRate 0.0973   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:39,221-Speed 3477.03 samples/sec   Loss 27.0984   LearningRate 0.0973   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:42,160-Speed 3484.94 samples/sec   Loss 26.8077   LearningRate 0.0973   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:45,115-Speed 3466.37 samples/sec   Loss 26.8609   LearningRate 0.0973   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:48,064-Speed 3473.08 samples/sec   Loss 26.9001   LearningRate 0.0972   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:51,021-Speed 3463.80 samples/sec   Loss 26.5722   LearningRate 0.0972   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:53,973-Speed 3470.16 samples/sec   Loss 26.4863   LearningRate 0.0972   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:56,931-Speed 3462.80 samples/sec   Loss 26.5693   LearningRate 0.0972   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:21:59,878-Speed 3475.87 samples/sec   Loss 26.2359   LearningRate 0.0972   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:02,831-Speed 3468.21 samples/sec   Loss 26.2516   LearningRate 0.0971   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:05,779-Speed 3474.89 samples/sec   Loss 25.5802   LearningRate 0.0971   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:08,730-Speed 3470.39 samples/sec   Loss 25.5674   LearningRate 0.0971   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:11,673-Speed 3480.25 samples/sec   Loss 25.4488   LearningRate 0.0971   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:14,625-Speed 3469.62 samples/sec   Loss 25.5923   LearningRate 0.0971   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:17,616-Speed 3424.36 samples/sec   Loss 25.2241   LearningRate 0.0970   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:20,578-Speed 3459.12 samples/sec   Loss 25.0249   LearningRate 0.0970   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:23,541-Speed 3457.13 samples/sec   Loss 25.1807   LearningRate 0.0970   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:26,494-Speed 3467.95 samples/sec   Loss 25.2119   LearningRate 0.0970   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:29,453-Speed 3461.34 samples/sec   Loss 24.9304   LearningRate 0.0970   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:32,439-Speed 3430.89 samples/sec   Loss 24.8900   LearningRate 0.0969   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:35,394-Speed 3466.59 samples/sec   Loss 24.7611   LearningRate 0.0969   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:38,349-Speed 3466.44 samples/sec   Loss 24.4857   LearningRate 0.0969   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:41,302-Speed 3468.95 samples/sec   Loss 24.2593   LearningRate 0.0969   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:44,256-Speed 3467.30 samples/sec   Loss 24.2353   LearningRate 0.0969   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:47,214-Speed 3463.05 samples/sec   Loss 24.2247   LearningRate 0.0968   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:50,182-Speed 3450.58 samples/sec   Loss 24.2177   LearningRate 0.0968   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:53,137-Speed 3466.93 samples/sec   Loss 24.1287   LearningRate 0.0968   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:56,091-Speed 3467.05 samples/sec   Loss 23.9479   LearningRate 0.0968   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:22:59,098-Speed 3406.65 samples/sec   Loss 23.8429   LearningRate 0.0968   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:02,088-Speed 3425.12 samples/sec   Loss 23.6208   LearningRate 0.0967   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:05,046-Speed 3462.85 samples/sec   Loss 23.5947   LearningRate 0.0967   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:08,000-Speed 3466.79 samples/sec   Loss 23.4163   LearningRate 0.0967   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:10,955-Speed 3466.74 samples/sec   Loss 23.2704   LearningRate 0.0967   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-10 23:23:13,898-Speed 3479.75 samples/sec   Loss 23.5878   LearningRate 0.0967   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:16,856-Speed 3463.99 samples/sec   Loss 23.2828   LearningRate 0.0966   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:19,809-Speed 3467.98 samples/sec   Loss 23.1734   LearningRate 0.0966   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:22,762-Speed 3469.22 samples/sec   Loss 22.7724   LearningRate 0.0966   Epoch: 0   Global Step: 1730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:25,720-Speed 3462.49 samples/sec   Loss 22.8237   LearningRate 0.0966   Epoch: 0   Global Step: 1740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:28,681-Speed 3459.42 samples/sec   Loss 22.6503   LearningRate 0.0966   Epoch: 0   Global Step: 1750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:31,634-Speed 3467.84 samples/sec   Loss 22.8688   LearningRate 0.0966   Epoch: 0   Global Step: 1760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:34,592-Speed 3462.81 samples/sec   Loss 22.3705   LearningRate 0.0965   Epoch: 0   Global Step: 1770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:37,553-Speed 3459.40 samples/sec   Loss 22.1893   LearningRate 0.0965   Epoch: 0   Global Step: 1780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:40,520-Speed 3451.67 samples/sec   Loss 22.2668   LearningRate 0.0965   Epoch: 0   Global Step: 1790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:43,468-Speed 3474.16 samples/sec   Loss 22.4202   LearningRate 0.0965   Epoch: 0   Global Step: 1800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:46,424-Speed 3466.07 samples/sec   Loss 22.1842   LearningRate 0.0965   Epoch: 0   Global Step: 1810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:49,379-Speed 3466.07 samples/sec   Loss 21.9713   LearningRate 0.0964   Epoch: 0   Global Step: 1820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:52,334-Speed 3466.75 samples/sec   Loss 22.0354   LearningRate 0.0964   Epoch: 0   Global Step: 1830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:55,286-Speed 3469.84 samples/sec   Loss 21.8810   LearningRate 0.0964   Epoch: 0   Global Step: 1840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:23:58,235-Speed 3472.30 samples/sec   Loss 21.7592   LearningRate 0.0964   Epoch: 0   Global Step: 1850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:24:01,243-Speed 3405.59 samples/sec   Loss 21.7708   LearningRate 0.0964   Epoch: 0   Global Step: 1860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:24:04,188-Speed 3478.23 samples/sec   Loss 21.5837   LearningRate 0.0963   Epoch: 0   Global Step: 1870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:24:07,160-Speed 3445.49 samples/sec   Loss 21.6370   LearningRate 0.0963   Epoch: 0   Global Step: 1880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:24:10,109-Speed 3473.67 samples/sec   Loss 21.4710   LearningRate 0.0963   Epoch: 0   Global Step: 1890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:24:13,063-Speed 3467.61 samples/sec   Loss 21.3677   LearningRate 0.0963   Epoch: 0   Global Step: 1900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:24:16,021-Speed 3463.87 samples/sec   Loss 21.2666   LearningRate 0.0963   Epoch: 0   Global Step: 1910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:24:18,969-Speed 3473.72 samples/sec   Loss 21.0998   LearningRate 0.0962   Epoch: 0   Global Step: 1920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:24:21,917-Speed 3474.56 samples/sec   Loss 21.1563   LearningRate 0.0962   Epoch: 0   Global Step: 1930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:24:24,872-Speed 3465.90 samples/sec   Loss 21.0580   LearningRate 0.0962   Epoch: 0   Global Step: 1940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:24:27,827-Speed 3466.21 samples/sec   Loss 21.0078   LearningRate 0.0962   Epoch: 0   Global Step: 1950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:24:30,797-Speed 3448.84 samples/sec   Loss 20.8539   LearningRate 0.0962   Epoch: 0   Global Step: 1960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-10 23:24:33,748-Speed 3470.46 samples/sec   Loss 20.7606   LearningRate 0.0961   Epoch: 0   Global Step: 1970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:24:36,713-Speed 3454.45 samples/sec   Loss 20.7871   LearningRate 0.0961   Epoch: 0   Global Step: 1980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:24:39,674-Speed 3459.98 samples/sec   Loss 20.5702   LearningRate 0.0961   Epoch: 0   Global Step: 1990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:24:42,626-Speed 3469.07 samples/sec   Loss 20.4358   LearningRate 0.0961   Epoch: 0   Global Step: 2000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-10 23:25:26,979-[lfw][2000]XNorm: 21.043712
Training: 2022-04-10 23:25:26,980-[lfw][2000]Accuracy-Flip: 0.98433+-0.00389
Training: 2022-04-10 23:25:26,980-[lfw][2000]Accuracy-Highest: 0.98433
Training: 2022-04-10 23:26:18,162-[cfp_fp][2000]XNorm: 18.296019
Training: 2022-04-10 23:26:18,163-[cfp_fp][2000]Accuracy-Flip: 0.81600+-0.02091
Training: 2022-04-10 23:26:18,164-[cfp_fp][2000]Accuracy-Highest: 0.81600
Training: 2022-04-10 23:27:02,046-[agedb_30][2000]XNorm: 20.396503
Training: 2022-04-10 23:27:02,047-[agedb_30][2000]Accuracy-Flip: 0.88017+-0.02066
Training: 2022-04-10 23:27:02,048-[agedb_30][2000]Accuracy-Highest: 0.88017
Training: 2022-04-10 23:27:04,986-Speed 71.93 samples/sec   Loss 20.4094   LearningRate 0.0961   Epoch: 0   Global Step: 2010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:07,914-Speed 3498.12 samples/sec   Loss 20.3666   LearningRate 0.0960   Epoch: 0   Global Step: 2020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:10,844-Speed 3495.79 samples/sec   Loss 20.3427   LearningRate 0.0960   Epoch: 0   Global Step: 2030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:13,775-Speed 3494.18 samples/sec   Loss 20.4137   LearningRate 0.0960   Epoch: 0   Global Step: 2040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:16,728-Speed 3468.54 samples/sec   Loss 20.1456   LearningRate 0.0960   Epoch: 0   Global Step: 2050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:19,663-Speed 3491.03 samples/sec   Loss 20.0656   LearningRate 0.0960   Epoch: 0   Global Step: 2060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:22,587-Speed 3502.86 samples/sec   Loss 20.2408   LearningRate 0.0959   Epoch: 0   Global Step: 2070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:25,536-Speed 3472.92 samples/sec   Loss 19.9981   LearningRate 0.0959   Epoch: 0   Global Step: 2080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:28,470-Speed 3490.93 samples/sec   Loss 19.8215   LearningRate 0.0959   Epoch: 0   Global Step: 2090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:31,448-Speed 3439.43 samples/sec   Loss 19.8146   LearningRate 0.0959   Epoch: 0   Global Step: 2100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:34,388-Speed 3484.56 samples/sec   Loss 19.6353   LearningRate 0.0959   Epoch: 0   Global Step: 2110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:37,339-Speed 3470.56 samples/sec   Loss 19.8534   LearningRate 0.0959   Epoch: 0   Global Step: 2120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:40,285-Speed 3477.10 samples/sec   Loss 19.5803   LearningRate 0.0958   Epoch: 0   Global Step: 2130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:43,227-Speed 3481.63 samples/sec   Loss 19.6852   LearningRate 0.0958   Epoch: 0   Global Step: 2140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:46,169-Speed 3481.01 samples/sec   Loss 19.5680   LearningRate 0.0958   Epoch: 0   Global Step: 2150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:49,113-Speed 3479.80 samples/sec   Loss 19.4155   LearningRate 0.0958   Epoch: 0   Global Step: 2160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:52,042-Speed 3496.17 samples/sec   Loss 19.2233   LearningRate 0.0958   Epoch: 0   Global Step: 2170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:54,990-Speed 3474.57 samples/sec   Loss 19.4111   LearningRate 0.0957   Epoch: 0   Global Step: 2180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:27:57,939-Speed 3474.17 samples/sec   Loss 19.0449   LearningRate 0.0957   Epoch: 0   Global Step: 2190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:00,893-Speed 3466.70 samples/sec   Loss 19.2510   LearningRate 0.0957   Epoch: 0   Global Step: 2200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:03,840-Speed 3476.37 samples/sec   Loss 19.3937   LearningRate 0.0957   Epoch: 0   Global Step: 2210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:06,782-Speed 3480.85 samples/sec   Loss 19.0701   LearningRate 0.0957   Epoch: 0   Global Step: 2220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:09,725-Speed 3481.32 samples/sec   Loss 19.0803   LearningRate 0.0956   Epoch: 0   Global Step: 2230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:12,669-Speed 3478.46 samples/sec   Loss 18.9225   LearningRate 0.0956   Epoch: 0   Global Step: 2240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:15,624-Speed 3466.03 samples/sec   Loss 19.0080   LearningRate 0.0956   Epoch: 0   Global Step: 2250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:18,562-Speed 3486.16 samples/sec   Loss 18.9760   LearningRate 0.0956   Epoch: 0   Global Step: 2260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:21,501-Speed 3485.45 samples/sec   Loss 18.9391   LearningRate 0.0956   Epoch: 0   Global Step: 2270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:24,437-Speed 3488.25 samples/sec   Loss 18.8347   LearningRate 0.0955   Epoch: 0   Global Step: 2280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:27,399-Speed 3458.55 samples/sec   Loss 18.8076   LearningRate 0.0955   Epoch: 0   Global Step: 2290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:30,339-Speed 3483.79 samples/sec   Loss 18.6863   LearningRate 0.0955   Epoch: 0   Global Step: 2300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:33,283-Speed 3479.12 samples/sec   Loss 18.6960   LearningRate 0.0955   Epoch: 0   Global Step: 2310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:36,232-Speed 3473.09 samples/sec   Loss 18.6664   LearningRate 0.0955   Epoch: 0   Global Step: 2320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:39,172-Speed 3484.59 samples/sec   Loss 18.6709   LearningRate 0.0954   Epoch: 0   Global Step: 2330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:42,109-Speed 3487.53 samples/sec   Loss 18.5786   LearningRate 0.0954   Epoch: 0   Global Step: 2340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:45,053-Speed 3479.01 samples/sec   Loss 18.4264   LearningRate 0.0954   Epoch: 0   Global Step: 2350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:48,018-Speed 3454.40 samples/sec   Loss 18.4042   LearningRate 0.0954   Epoch: 0   Global Step: 2360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:50,987-Speed 3450.31 samples/sec   Loss 18.3494   LearningRate 0.0954   Epoch: 0   Global Step: 2370   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-10 23:28:53,935-Speed 3474.47 samples/sec   Loss 18.4600   LearningRate 0.0953   Epoch: 0   Global Step: 2380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:56,881-Speed 3476.03 samples/sec   Loss 18.2423   LearningRate 0.0953   Epoch: 0   Global Step: 2390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:28:59,828-Speed 3476.09 samples/sec   Loss 18.2164   LearningRate 0.0953   Epoch: 0   Global Step: 2400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:02,764-Speed 3488.61 samples/sec   Loss 18.2572   LearningRate 0.0953   Epoch: 0   Global Step: 2410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:05,715-Speed 3470.65 samples/sec   Loss 18.2174   LearningRate 0.0953   Epoch: 0   Global Step: 2420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:08,653-Speed 3486.65 samples/sec   Loss 18.2329   LearningRate 0.0953   Epoch: 0   Global Step: 2430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:11,615-Speed 3458.76 samples/sec   Loss 18.1085   LearningRate 0.0952   Epoch: 0   Global Step: 2440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:14,578-Speed 3456.43 samples/sec   Loss 17.9340   LearningRate 0.0952   Epoch: 0   Global Step: 2450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:17,522-Speed 3478.52 samples/sec   Loss 17.8165   LearningRate 0.0952   Epoch: 0   Global Step: 2460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:20,464-Speed 3482.64 samples/sec   Loss 17.8689   LearningRate 0.0952   Epoch: 0   Global Step: 2470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:23,389-Speed 3500.99 samples/sec   Loss 17.8758   LearningRate 0.0952   Epoch: 0   Global Step: 2480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:26,332-Speed 3480.74 samples/sec   Loss 17.6671   LearningRate 0.0951   Epoch: 0   Global Step: 2490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:29,272-Speed 3484.66 samples/sec   Loss 17.8096   LearningRate 0.0951   Epoch: 0   Global Step: 2500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:32,212-Speed 3483.60 samples/sec   Loss 17.5732   LearningRate 0.0951   Epoch: 0   Global Step: 2510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:35,154-Speed 3481.80 samples/sec   Loss 17.6953   LearningRate 0.0951   Epoch: 0   Global Step: 2520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:38,113-Speed 3461.07 samples/sec   Loss 17.5401   LearningRate 0.0951   Epoch: 0   Global Step: 2530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:41,054-Speed 3482.29 samples/sec   Loss 17.5734   LearningRate 0.0950   Epoch: 0   Global Step: 2540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:44,011-Speed 3463.57 samples/sec   Loss 17.6846   LearningRate 0.0950   Epoch: 0   Global Step: 2550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:46,953-Speed 3482.04 samples/sec   Loss 17.5826   LearningRate 0.0950   Epoch: 0   Global Step: 2560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:49,897-Speed 3479.69 samples/sec   Loss 17.3718   LearningRate 0.0950   Epoch: 0   Global Step: 2570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:52,830-Speed 3491.44 samples/sec   Loss 17.4996   LearningRate 0.0950   Epoch: 0   Global Step: 2580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:55,779-Speed 3473.79 samples/sec   Loss 17.4728   LearningRate 0.0949   Epoch: 0   Global Step: 2590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:29:58,732-Speed 3468.61 samples/sec   Loss 17.3714   LearningRate 0.0949   Epoch: 0   Global Step: 2600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:01,676-Speed 3479.26 samples/sec   Loss 17.3197   LearningRate 0.0949   Epoch: 0   Global Step: 2610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:04,631-Speed 3465.83 samples/sec   Loss 17.2131   LearningRate 0.0949   Epoch: 0   Global Step: 2620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:07,578-Speed 3475.29 samples/sec   Loss 17.2678   LearningRate 0.0949   Epoch: 0   Global Step: 2630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:10,525-Speed 3475.31 samples/sec   Loss 17.3222   LearningRate 0.0948   Epoch: 0   Global Step: 2640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:13,472-Speed 3476.89 samples/sec   Loss 17.2733   LearningRate 0.0948   Epoch: 0   Global Step: 2650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:16,429-Speed 3463.74 samples/sec   Loss 17.2502   LearningRate 0.0948   Epoch: 0   Global Step: 2660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:19,374-Speed 3478.50 samples/sec   Loss 17.2241   LearningRate 0.0948   Epoch: 0   Global Step: 2670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:22,313-Speed 3484.57 samples/sec   Loss 17.4127   LearningRate 0.0948   Epoch: 0   Global Step: 2680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:25,258-Speed 3478.35 samples/sec   Loss 17.0828   LearningRate 0.0948   Epoch: 0   Global Step: 2690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:28,201-Speed 3479.47 samples/sec   Loss 16.9950   LearningRate 0.0947   Epoch: 0   Global Step: 2700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:31,151-Speed 3472.03 samples/sec   Loss 17.0025   LearningRate 0.0947   Epoch: 0   Global Step: 2710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:34,105-Speed 3467.44 samples/sec   Loss 16.8680   LearningRate 0.0947   Epoch: 0   Global Step: 2720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:37,065-Speed 3460.35 samples/sec   Loss 16.8243   LearningRate 0.0947   Epoch: 0   Global Step: 2730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:40,009-Speed 3479.54 samples/sec   Loss 16.7777   LearningRate 0.0947   Epoch: 0   Global Step: 2740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:42,954-Speed 3478.19 samples/sec   Loss 16.9445   LearningRate 0.0946   Epoch: 0   Global Step: 2750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:45,900-Speed 3476.84 samples/sec   Loss 16.7681   LearningRate 0.0946   Epoch: 0   Global Step: 2760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:48,844-Speed 3479.49 samples/sec   Loss 16.5929   LearningRate 0.0946   Epoch: 0   Global Step: 2770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:51,780-Speed 3488.31 samples/sec   Loss 16.8676   LearningRate 0.0946   Epoch: 0   Global Step: 2780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:54,742-Speed 3457.88 samples/sec   Loss 17.0562   LearningRate 0.0946   Epoch: 0   Global Step: 2790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:30:57,689-Speed 3475.65 samples/sec   Loss 16.6161   LearningRate 0.0945   Epoch: 0   Global Step: 2800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:31:00,634-Speed 3477.56 samples/sec   Loss 16.5615   LearningRate 0.0945   Epoch: 0   Global Step: 2810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:31:03,581-Speed 3476.36 samples/sec   Loss 16.7329   LearningRate 0.0945   Epoch: 0   Global Step: 2820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:31:06,525-Speed 3478.89 samples/sec   Loss 16.6488   LearningRate 0.0945   Epoch: 0   Global Step: 2830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:31:09,466-Speed 3482.91 samples/sec   Loss 16.3466   LearningRate 0.0945   Epoch: 0   Global Step: 2840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:31:12,411-Speed 3477.75 samples/sec   Loss 16.6189   LearningRate 0.0944   Epoch: 0   Global Step: 2850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:31:15,354-Speed 3480.23 samples/sec   Loss 16.5782   LearningRate 0.0944   Epoch: 0   Global Step: 2860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:31:18,296-Speed 3481.89 samples/sec   Loss 16.2836   LearningRate 0.0944   Epoch: 0   Global Step: 2870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:31:21,226-Speed 3495.12 samples/sec   Loss 16.4012   LearningRate 0.0944   Epoch: 0   Global Step: 2880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:31:24,173-Speed 3476.39 samples/sec   Loss 16.3922   LearningRate 0.0944   Epoch: 0   Global Step: 2890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:31:27,116-Speed 3480.24 samples/sec   Loss 16.5935   LearningRate 0.0943   Epoch: 0   Global Step: 2900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:31:30,065-Speed 3473.79 samples/sec   Loss 16.5198   LearningRate 0.0943   Epoch: 0   Global Step: 2910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:31:33,021-Speed 3464.50 samples/sec   Loss 16.2683   LearningRate 0.0943   Epoch: 0   Global Step: 2920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:31:35,967-Speed 3477.19 samples/sec   Loss 16.2620   LearningRate 0.0943   Epoch: 0   Global Step: 2930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:31:38,911-Speed 3478.68 samples/sec   Loss 16.2917   LearningRate 0.0943   Epoch: 0   Global Step: 2940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:31:41,858-Speed 3475.30 samples/sec   Loss 16.3678   LearningRate 0.0943   Epoch: 0   Global Step: 2950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:31:44,805-Speed 3476.50 samples/sec   Loss 16.0961   LearningRate 0.0942   Epoch: 0   Global Step: 2960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:31:47,761-Speed 3464.85 samples/sec   Loss 16.2049   LearningRate 0.0942   Epoch: 0   Global Step: 2970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:31:50,728-Speed 3452.01 samples/sec   Loss 16.1717   LearningRate 0.0942   Epoch: 0   Global Step: 2980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:31:53,668-Speed 3483.21 samples/sec   Loss 15.9842   LearningRate 0.0942   Epoch: 0   Global Step: 2990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:31:56,610-Speed 3482.16 samples/sec   Loss 16.3690   LearningRate 0.0942   Epoch: 0   Global Step: 3000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:31:59,552-Speed 3481.41 samples/sec   Loss 16.0127   LearningRate 0.0941   Epoch: 0   Global Step: 3010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:32:02,495-Speed 3480.49 samples/sec   Loss 15.9795   LearningRate 0.0941   Epoch: 0   Global Step: 3020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:32:05,442-Speed 3475.89 samples/sec   Loss 15.8814   LearningRate 0.0941   Epoch: 0   Global Step: 3030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:32:08,386-Speed 3479.67 samples/sec   Loss 16.0682   LearningRate 0.0941   Epoch: 0   Global Step: 3040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:32:11,441-Speed 3352.08 samples/sec   Loss 16.0123   LearningRate 0.0941   Epoch: 0   Global Step: 3050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:32:14,394-Speed 3468.51 samples/sec   Loss 16.2272   LearningRate 0.0940   Epoch: 0   Global Step: 3060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:32:17,345-Speed 3471.21 samples/sec   Loss 16.0537   LearningRate 0.0940   Epoch: 0   Global Step: 3070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:32:20,287-Speed 3481.88 samples/sec   Loss 16.0204   LearningRate 0.0940   Epoch: 0   Global Step: 3080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:32:23,284-Speed 3416.51 samples/sec   Loss 15.7814   LearningRate 0.0940   Epoch: 0   Global Step: 3090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:32:26,234-Speed 3472.04 samples/sec   Loss 15.8375   LearningRate 0.0940   Epoch: 0   Global Step: 3100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:32:29,178-Speed 3479.70 samples/sec   Loss 15.8500   LearningRate 0.0939   Epoch: 0   Global Step: 3110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:32:32,132-Speed 3467.08 samples/sec   Loss 15.7463   LearningRate 0.0939   Epoch: 0   Global Step: 3120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:32:35,074-Speed 3481.77 samples/sec   Loss 15.9577   LearningRate 0.0939   Epoch: 0   Global Step: 3130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:32:38,017-Speed 3480.07 samples/sec   Loss 15.6909   LearningRate 0.0939   Epoch: 0   Global Step: 3140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:32:40,972-Speed 3466.24 samples/sec   Loss 15.8044   LearningRate 0.0939   Epoch: 0   Global Step: 3150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:32:43,914-Speed 3482.49 samples/sec   Loss 15.7948   LearningRate 0.0939   Epoch: 0   Global Step: 3160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:32:46,858-Speed 3478.70 samples/sec   Loss 15.6278   LearningRate 0.0938   Epoch: 0   Global Step: 3170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:32:49,804-Speed 3476.36 samples/sec   Loss 15.6456   LearningRate 0.0938   Epoch: 0   Global Step: 3180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:32:52,791-Speed 3429.24 samples/sec   Loss 15.6605   LearningRate 0.0938   Epoch: 0   Global Step: 3190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:32:55,739-Speed 3473.82 samples/sec   Loss 15.5953   LearningRate 0.0938   Epoch: 0   Global Step: 3200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:32:58,697-Speed 3463.75 samples/sec   Loss 15.3188   LearningRate 0.0938   Epoch: 0   Global Step: 3210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:01,642-Speed 3477.99 samples/sec   Loss 15.5015   LearningRate 0.0937   Epoch: 0   Global Step: 3220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:04,586-Speed 3478.92 samples/sec   Loss 15.4277   LearningRate 0.0937   Epoch: 0   Global Step: 3230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:07,531-Speed 3478.40 samples/sec   Loss 15.2575   LearningRate 0.0937   Epoch: 0   Global Step: 3240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:10,473-Speed 3481.55 samples/sec   Loss 15.3591   LearningRate 0.0937   Epoch: 0   Global Step: 3250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:13,430-Speed 3463.82 samples/sec   Loss 15.4233   LearningRate 0.0937   Epoch: 0   Global Step: 3260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:16,375-Speed 3477.33 samples/sec   Loss 15.5429   LearningRate 0.0936   Epoch: 0   Global Step: 3270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:19,319-Speed 3479.16 samples/sec   Loss 15.2401   LearningRate 0.0936   Epoch: 0   Global Step: 3280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:22,266-Speed 3476.00 samples/sec   Loss 15.4057   LearningRate 0.0936   Epoch: 0   Global Step: 3290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:25,215-Speed 3473.34 samples/sec   Loss 15.5889   LearningRate 0.0936   Epoch: 0   Global Step: 3300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:28,147-Speed 3493.72 samples/sec   Loss 15.4451   LearningRate 0.0936   Epoch: 0   Global Step: 3310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:31,093-Speed 3476.86 samples/sec   Loss 15.2470   LearningRate 0.0935   Epoch: 0   Global Step: 3320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:34,036-Speed 3480.23 samples/sec   Loss 15.2365   LearningRate 0.0935   Epoch: 0   Global Step: 3330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:36,982-Speed 3476.83 samples/sec   Loss 15.1041   LearningRate 0.0935   Epoch: 0   Global Step: 3340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:39,925-Speed 3480.64 samples/sec   Loss 15.2607   LearningRate 0.0935   Epoch: 0   Global Step: 3350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:42,872-Speed 3475.43 samples/sec   Loss 15.0939   LearningRate 0.0935   Epoch: 0   Global Step: 3360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:45,818-Speed 3476.47 samples/sec   Loss 15.2522   LearningRate 0.0934   Epoch: 0   Global Step: 3370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:48,767-Speed 3472.78 samples/sec   Loss 15.2954   LearningRate 0.0934   Epoch: 0   Global Step: 3380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:51,713-Speed 3477.38 samples/sec   Loss 15.2037   LearningRate 0.0934   Epoch: 0   Global Step: 3390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:54,657-Speed 3478.81 samples/sec   Loss 15.2174   LearningRate 0.0934   Epoch: 0   Global Step: 3400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:33:57,591-Speed 3490.82 samples/sec   Loss 15.2697   LearningRate 0.0934   Epoch: 0   Global Step: 3410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:34:00,537-Speed 3476.97 samples/sec   Loss 15.0435   LearningRate 0.0934   Epoch: 0   Global Step: 3420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:34:03,483-Speed 3476.84 samples/sec   Loss 14.8872   LearningRate 0.0933   Epoch: 0   Global Step: 3430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:34:06,427-Speed 3479.16 samples/sec   Loss 15.1141   LearningRate 0.0933   Epoch: 0   Global Step: 3440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:34:09,371-Speed 3479.39 samples/sec   Loss 15.1307   LearningRate 0.0933   Epoch: 0   Global Step: 3450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:34:12,319-Speed 3474.52 samples/sec   Loss 15.0686   LearningRate 0.0933   Epoch: 0   Global Step: 3460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:34:15,263-Speed 3478.52 samples/sec   Loss 15.1910   LearningRate 0.0933   Epoch: 0   Global Step: 3470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:34:18,206-Speed 3480.48 samples/sec   Loss 14.9751   LearningRate 0.0932   Epoch: 0   Global Step: 3480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:34:21,154-Speed 3474.62 samples/sec   Loss 15.2985   LearningRate 0.0932   Epoch: 0   Global Step: 3490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:34:24,102-Speed 3473.92 samples/sec   Loss 14.8868   LearningRate 0.0932   Epoch: 0   Global Step: 3500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:34:27,048-Speed 3477.10 samples/sec   Loss 15.0179   LearningRate 0.0932   Epoch: 0   Global Step: 3510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:34:29,995-Speed 3476.56 samples/sec   Loss 14.8241   LearningRate 0.0932   Epoch: 0   Global Step: 3520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:34:32,946-Speed 3470.52 samples/sec   Loss 14.8029   LearningRate 0.0931   Epoch: 0   Global Step: 3530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:34:35,894-Speed 3474.74 samples/sec   Loss 14.7983   LearningRate 0.0931   Epoch: 0   Global Step: 3540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:34:38,849-Speed 3465.48 samples/sec   Loss 14.6971   LearningRate 0.0931   Epoch: 0   Global Step: 3550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:34:41,794-Speed 3478.54 samples/sec   Loss 14.7619   LearningRate 0.0931   Epoch: 0   Global Step: 3560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:34:44,741-Speed 3476.51 samples/sec   Loss 14.7581   LearningRate 0.0931   Epoch: 0   Global Step: 3570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:34:47,685-Speed 3478.57 samples/sec   Loss 14.8906   LearningRate 0.0930   Epoch: 0   Global Step: 3580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:34:50,640-Speed 3466.95 samples/sec   Loss 14.6776   LearningRate 0.0930   Epoch: 0   Global Step: 3590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:34:53,589-Speed 3473.33 samples/sec   Loss 14.7566   LearningRate 0.0930   Epoch: 0   Global Step: 3600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:34:56,528-Speed 3484.81 samples/sec   Loss 14.8743   LearningRate 0.0930   Epoch: 0   Global Step: 3610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:34:59,486-Speed 3462.80 samples/sec   Loss 14.7309   LearningRate 0.0930   Epoch: 0   Global Step: 3620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:02,437-Speed 3470.41 samples/sec   Loss 14.7657   LearningRate 0.0930   Epoch: 0   Global Step: 3630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:05,389-Speed 3469.77 samples/sec   Loss 14.8496   LearningRate 0.0929   Epoch: 0   Global Step: 3640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:08,360-Speed 3448.48 samples/sec   Loss 14.8394   LearningRate 0.0929   Epoch: 0   Global Step: 3650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:11,326-Speed 3452.61 samples/sec   Loss 14.7216   LearningRate 0.0929   Epoch: 0   Global Step: 3660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:14,360-Speed 3375.97 samples/sec   Loss 14.4789   LearningRate 0.0929   Epoch: 0   Global Step: 3670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:17,344-Speed 3432.86 samples/sec   Loss 14.8183   LearningRate 0.0929   Epoch: 0   Global Step: 3680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:20,290-Speed 3476.99 samples/sec   Loss 14.5628   LearningRate 0.0928   Epoch: 0   Global Step: 3690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:23,239-Speed 3473.84 samples/sec   Loss 14.4129   LearningRate 0.0928   Epoch: 0   Global Step: 3700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:26,172-Speed 3491.42 samples/sec   Loss 14.6359   LearningRate 0.0928   Epoch: 0   Global Step: 3710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:29,117-Speed 3478.92 samples/sec   Loss 14.4456   LearningRate 0.0928   Epoch: 0   Global Step: 3720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:32,064-Speed 3475.01 samples/sec   Loss 14.6669   LearningRate 0.0928   Epoch: 0   Global Step: 3730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:35,011-Speed 3475.23 samples/sec   Loss 14.6601   LearningRate 0.0927   Epoch: 0   Global Step: 3740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:37,968-Speed 3464.79 samples/sec   Loss 14.5969   LearningRate 0.0927   Epoch: 0   Global Step: 3750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:40,923-Speed 3465.07 samples/sec   Loss 14.6361   LearningRate 0.0927   Epoch: 0   Global Step: 3760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:43,879-Speed 3465.47 samples/sec   Loss 14.4914   LearningRate 0.0927   Epoch: 0   Global Step: 3770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:46,842-Speed 3456.58 samples/sec   Loss 14.5048   LearningRate 0.0927   Epoch: 0   Global Step: 3780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:49,788-Speed 3478.10 samples/sec   Loss 14.3368   LearningRate 0.0926   Epoch: 0   Global Step: 3790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:52,736-Speed 3474.11 samples/sec   Loss 14.1301   LearningRate 0.0926   Epoch: 0   Global Step: 3800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:35:55,680-Speed 3478.91 samples/sec   Loss 14.5078   LearningRate 0.0926   Epoch: 0   Global Step: 3810   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-10 23:35:58,614-Speed 3491.32 samples/sec   Loss 14.3325   LearningRate 0.0926   Epoch: 0   Global Step: 3820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:36:01,564-Speed 3471.11 samples/sec   Loss 14.4112   LearningRate 0.0926   Epoch: 0   Global Step: 3830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:36:04,517-Speed 3468.88 samples/sec   Loss 14.3881   LearningRate 0.0926   Epoch: 0   Global Step: 3840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:36:07,460-Speed 3480.20 samples/sec   Loss 14.4072   LearningRate 0.0925   Epoch: 0   Global Step: 3850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:36:10,404-Speed 3480.38 samples/sec   Loss 14.5736   LearningRate 0.0925   Epoch: 0   Global Step: 3860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:36:13,350-Speed 3476.13 samples/sec   Loss 14.3342   LearningRate 0.0925   Epoch: 0   Global Step: 3870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:36:16,303-Speed 3468.44 samples/sec   Loss 14.4362   LearningRate 0.0925   Epoch: 0   Global Step: 3880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:36:19,250-Speed 3475.85 samples/sec   Loss 14.1982   LearningRate 0.0925   Epoch: 0   Global Step: 3890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:36:22,197-Speed 3475.31 samples/sec   Loss 14.4430   LearningRate 0.0924   Epoch: 0   Global Step: 3900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:36:25,145-Speed 3474.92 samples/sec   Loss 14.3865   LearningRate 0.0924   Epoch: 0   Global Step: 3910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:36:28,078-Speed 3491.46 samples/sec   Loss 14.3897   LearningRate 0.0924   Epoch: 0   Global Step: 3920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:36:30,998-Speed 3507.90 samples/sec   Loss 14.2903   LearningRate 0.0924   Epoch: 0   Global Step: 3930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-10 23:36:33,941-Speed 3480.16 samples/sec   Loss 14.4002   LearningRate 0.0924   Epoch: 0   Global Step: 3940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-10 23:36:36,896-Speed 3465.93 samples/sec   Loss 14.3475   LearningRate 0.0923   Epoch: 0   Global Step: 3950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-10 23:36:39,853-Speed 3464.66 samples/sec   Loss 14.1693   LearningRate 0.0923   Epoch: 0   Global Step: 3960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-10 23:36:42,800-Speed 3474.89 samples/sec   Loss 14.3017   LearningRate 0.0923   Epoch: 0   Global Step: 3970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-10 23:36:45,753-Speed 3469.60 samples/sec   Loss 14.2195   LearningRate 0.0923   Epoch: 0   Global Step: 3980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-10 23:36:48,705-Speed 3469.64 samples/sec   Loss 14.4081   LearningRate 0.0923   Epoch: 0   Global Step: 3990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-10 23:36:51,654-Speed 3473.34 samples/sec   Loss 14.0259   LearningRate 0.0922   Epoch: 0   Global Step: 4000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-10 23:37:35,780-[lfw][4000]XNorm: 22.088006
Training: 2022-04-10 23:37:35,781-[lfw][4000]Accuracy-Flip: 0.99150+-0.00456
Training: 2022-04-10 23:37:35,781-[lfw][4000]Accuracy-Highest: 0.99150
Training: 2022-04-10 23:38:27,089-[cfp_fp][4000]XNorm: 19.753306
Training: 2022-04-10 23:38:27,090-[cfp_fp][4000]Accuracy-Flip: 0.90800+-0.01447
Training: 2022-04-10 23:38:27,090-[cfp_fp][4000]Accuracy-Highest: 0.90800
Training: 2022-04-10 23:39:11,310-[agedb_30][4000]XNorm: 21.772712
Training: 2022-04-10 23:39:11,311-[agedb_30][4000]Accuracy-Flip: 0.94500+-0.01254
Training: 2022-04-10 23:39:11,311-[agedb_30][4000]Accuracy-Highest: 0.94500
Training: 2022-04-10 23:39:14,250-Speed 71.81 samples/sec   Loss 14.1161   LearningRate 0.0922   Epoch: 0   Global Step: 4010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-10 23:39:17,193-Speed 3480.36 samples/sec   Loss 14.0775   LearningRate 0.0922   Epoch: 0   Global Step: 4020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-10 23:39:20,122-Speed 3496.31 samples/sec   Loss 14.3225   LearningRate 0.0922   Epoch: 0   Global Step: 4030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-10 23:39:23,050-Speed 3498.92 samples/sec   Loss 13.9899   LearningRate 0.0922   Epoch: 0   Global Step: 4040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-10 23:39:25,980-Speed 3495.71 samples/sec   Loss 14.0594   LearningRate 0.0922   Epoch: 0   Global Step: 4050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-10 23:39:28,911-Speed 3495.47 samples/sec   Loss 14.2243   LearningRate 0.0921   Epoch: 0   Global Step: 4060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-10 23:39:31,841-Speed 3495.20 samples/sec   Loss 14.0425   LearningRate 0.0921   Epoch: 0   Global Step: 4070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-10 23:39:34,775-Speed 3491.11 samples/sec   Loss 14.0778   LearningRate 0.0921   Epoch: 0   Global Step: 4080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-10 23:39:37,711-Speed 3488.62 samples/sec   Loss 14.2550   LearningRate 0.0921   Epoch: 0   Global Step: 4090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-10 23:39:40,645-Speed 3490.25 samples/sec   Loss 13.8633   LearningRate 0.0921   Epoch: 0   Global Step: 4100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-10 23:39:43,585-Speed 3485.05 samples/sec   Loss 13.9405   LearningRate 0.0920   Epoch: 0   Global Step: 4110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-10 23:39:46,521-Speed 3487.97 samples/sec   Loss 14.0109   LearningRate 0.0920   Epoch: 0   Global Step: 4120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-10 23:39:49,458-Speed 3487.54 samples/sec   Loss 14.0522   LearningRate 0.0920   Epoch: 0   Global Step: 4130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:39:52,392-Speed 3491.02 samples/sec   Loss 14.0803   LearningRate 0.0920   Epoch: 0   Global Step: 4140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:39:55,326-Speed 3491.83 samples/sec   Loss 14.0520   LearningRate 0.0920   Epoch: 0   Global Step: 4150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:39:58,260-Speed 3491.13 samples/sec   Loss 13.9538   LearningRate 0.0919   Epoch: 0   Global Step: 4160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:01,226-Speed 3453.12 samples/sec   Loss 14.0756   LearningRate 0.0919   Epoch: 0   Global Step: 4170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:04,161-Speed 3489.84 samples/sec   Loss 13.9413   LearningRate 0.0919   Epoch: 0   Global Step: 4180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:07,102-Speed 3482.09 samples/sec   Loss 13.8732   LearningRate 0.0919   Epoch: 0   Global Step: 4190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:10,041-Speed 3485.58 samples/sec   Loss 13.9583   LearningRate 0.0919   Epoch: 0   Global Step: 4200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:12,994-Speed 3468.36 samples/sec   Loss 14.0690   LearningRate 0.0918   Epoch: 0   Global Step: 4210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:15,946-Speed 3470.04 samples/sec   Loss 14.0037   LearningRate 0.0918   Epoch: 0   Global Step: 4220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:18,889-Speed 3480.77 samples/sec   Loss 13.9181   LearningRate 0.0918   Epoch: 0   Global Step: 4230   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-10 23:40:21,815-Speed 3499.99 samples/sec   Loss 13.6873   LearningRate 0.0918   Epoch: 0   Global Step: 4240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:24,753-Speed 3486.55 samples/sec   Loss 13.8838   LearningRate 0.0918   Epoch: 0   Global Step: 4250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:27,691-Speed 3486.02 samples/sec   Loss 13.8686   LearningRate 0.0918   Epoch: 0   Global Step: 4260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:30,670-Speed 3438.41 samples/sec   Loss 13.8681   LearningRate 0.0917   Epoch: 0   Global Step: 4270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:33,608-Speed 3486.61 samples/sec   Loss 13.8516   LearningRate 0.0917   Epoch: 0   Global Step: 4280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:36,549-Speed 3482.46 samples/sec   Loss 13.9355   LearningRate 0.0917   Epoch: 0   Global Step: 4290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:39,502-Speed 3468.47 samples/sec   Loss 13.8635   LearningRate 0.0917   Epoch: 0   Global Step: 4300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:42,448-Speed 3476.64 samples/sec   Loss 13.7553   LearningRate 0.0917   Epoch: 0   Global Step: 4310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:45,387-Speed 3485.64 samples/sec   Loss 13.9129   LearningRate 0.0916   Epoch: 0   Global Step: 4320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:48,323-Speed 3488.19 samples/sec   Loss 13.8145   LearningRate 0.0916   Epoch: 0   Global Step: 4330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:51,249-Speed 3500.79 samples/sec   Loss 13.6574   LearningRate 0.0916   Epoch: 0   Global Step: 4340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:54,209-Speed 3460.82 samples/sec   Loss 13.6588   LearningRate 0.0916   Epoch: 0   Global Step: 4350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:40:57,149-Speed 3483.72 samples/sec   Loss 13.5343   LearningRate 0.0916   Epoch: 0   Global Step: 4360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:00,095-Speed 3475.83 samples/sec   Loss 13.7914   LearningRate 0.0915   Epoch: 0   Global Step: 4370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:03,055-Speed 3461.51 samples/sec   Loss 13.8595   LearningRate 0.0915   Epoch: 0   Global Step: 4380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:06,001-Speed 3476.05 samples/sec   Loss 13.8218   LearningRate 0.0915   Epoch: 0   Global Step: 4390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:08,943-Speed 3481.65 samples/sec   Loss 13.4874   LearningRate 0.0915   Epoch: 0   Global Step: 4400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:11,887-Speed 3480.07 samples/sec   Loss 13.9341   LearningRate 0.0915   Epoch: 0   Global Step: 4410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:14,825-Speed 3485.15 samples/sec   Loss 13.7402   LearningRate 0.0915   Epoch: 0   Global Step: 4420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:17,777-Speed 3469.83 samples/sec   Loss 13.6212   LearningRate 0.0914   Epoch: 0   Global Step: 4430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:20,706-Speed 3496.79 samples/sec   Loss 13.7647   LearningRate 0.0914   Epoch: 0   Global Step: 4440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:23,648-Speed 3481.75 samples/sec   Loss 13.7384   LearningRate 0.0914   Epoch: 0   Global Step: 4450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:26,608-Speed 3459.93 samples/sec   Loss 13.6635   LearningRate 0.0914   Epoch: 0   Global Step: 4460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:29,554-Speed 3477.71 samples/sec   Loss 13.8048   LearningRate 0.0914   Epoch: 0   Global Step: 4470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:32,494-Speed 3483.17 samples/sec   Loss 13.5424   LearningRate 0.0913   Epoch: 0   Global Step: 4480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:35,433-Speed 3485.97 samples/sec   Loss 13.7155   LearningRate 0.0913   Epoch: 0   Global Step: 4490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:38,379-Speed 3477.18 samples/sec   Loss 13.5128   LearningRate 0.0913   Epoch: 0   Global Step: 4500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:41,322-Speed 3480.12 samples/sec   Loss 13.5312   LearningRate 0.0913   Epoch: 0   Global Step: 4510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:44,274-Speed 3468.99 samples/sec   Loss 13.4966   LearningRate 0.0913   Epoch: 0   Global Step: 4520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:47,213-Speed 3485.80 samples/sec   Loss 13.6289   LearningRate 0.0912   Epoch: 0   Global Step: 4530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:50,140-Speed 3498.48 samples/sec   Loss 13.5116   LearningRate 0.0912   Epoch: 0   Global Step: 4540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:53,085-Speed 3478.03 samples/sec   Loss 13.5298   LearningRate 0.0912   Epoch: 0   Global Step: 4550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:56,028-Speed 3480.70 samples/sec   Loss 13.5292   LearningRate 0.0912   Epoch: 0   Global Step: 4560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:41:58,976-Speed 3474.85 samples/sec   Loss 13.6103   LearningRate 0.0912   Epoch: 0   Global Step: 4570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:01,916-Speed 3484.33 samples/sec   Loss 13.4669   LearningRate 0.0912   Epoch: 0   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:04,883-Speed 3451.67 samples/sec   Loss 13.4326   LearningRate 0.0911   Epoch: 0   Global Step: 4590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:07,826-Speed 3480.21 samples/sec   Loss 13.4220   LearningRate 0.0911   Epoch: 0   Global Step: 4600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:10,769-Speed 3480.12 samples/sec   Loss 13.4041   LearningRate 0.0911   Epoch: 0   Global Step: 4610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:13,706-Speed 3487.74 samples/sec   Loss 13.4651   LearningRate 0.0911   Epoch: 0   Global Step: 4620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:16,652-Speed 3476.78 samples/sec   Loss 13.4470   LearningRate 0.0911   Epoch: 0   Global Step: 4630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:19,588-Speed 3488.80 samples/sec   Loss 13.4588   LearningRate 0.0910   Epoch: 0   Global Step: 4640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:22,534-Speed 3476.65 samples/sec   Loss 13.4937   LearningRate 0.0910   Epoch: 0   Global Step: 4650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:25,475-Speed 3482.77 samples/sec   Loss 13.3665   LearningRate 0.0910   Epoch: 0   Global Step: 4660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:28,425-Speed 3472.21 samples/sec   Loss 13.5306   LearningRate 0.0910   Epoch: 0   Global Step: 4670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:31,365-Speed 3482.88 samples/sec   Loss 13.4390   LearningRate 0.0910   Epoch: 0   Global Step: 4680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:34,308-Speed 3480.93 samples/sec   Loss 13.5699   LearningRate 0.0909   Epoch: 0   Global Step: 4690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:37,246-Speed 3486.22 samples/sec   Loss 13.3970   LearningRate 0.0909   Epoch: 0   Global Step: 4700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:40,191-Speed 3478.26 samples/sec   Loss 13.4241   LearningRate 0.0909   Epoch: 0   Global Step: 4710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:43,137-Speed 3476.73 samples/sec   Loss 13.5188   LearningRate 0.0909   Epoch: 0   Global Step: 4720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:46,080-Speed 3480.07 samples/sec   Loss 13.3743   LearningRate 0.0909   Epoch: 0   Global Step: 4730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:49,011-Speed 3494.59 samples/sec   Loss 13.4729   LearningRate 0.0908   Epoch: 0   Global Step: 4740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:51,955-Speed 3479.33 samples/sec   Loss 13.3007   LearningRate 0.0908   Epoch: 0   Global Step: 4750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:54,908-Speed 3469.13 samples/sec   Loss 13.3960   LearningRate 0.0908   Epoch: 0   Global Step: 4760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:42:57,850-Speed 3481.60 samples/sec   Loss 13.3533   LearningRate 0.0908   Epoch: 0   Global Step: 4770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:43:00,796-Speed 3475.76 samples/sec   Loss 13.2949   LearningRate 0.0908   Epoch: 0   Global Step: 4780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:43:03,738-Speed 3481.88 samples/sec   Loss 13.5648   LearningRate 0.0908   Epoch: 0   Global Step: 4790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:43:06,681-Speed 3480.92 samples/sec   Loss 13.4591   LearningRate 0.0907   Epoch: 0   Global Step: 4800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:43:09,623-Speed 3481.68 samples/sec   Loss 13.3121   LearningRate 0.0907   Epoch: 0   Global Step: 4810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:12,573-Speed 3471.21 samples/sec   Loss 13.2440   LearningRate 0.0907   Epoch: 0   Global Step: 4820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:15,511-Speed 3485.79 samples/sec   Loss 13.3317   LearningRate 0.0907   Epoch: 0   Global Step: 4830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:18,465-Speed 3468.75 samples/sec   Loss 13.2900   LearningRate 0.0907   Epoch: 0   Global Step: 4840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:21,422-Speed 3462.93 samples/sec   Loss 13.2052   LearningRate 0.0906   Epoch: 0   Global Step: 4850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:24,394-Speed 3447.18 samples/sec   Loss 13.1424   LearningRate 0.0906   Epoch: 0   Global Step: 4860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:27,370-Speed 3441.56 samples/sec   Loss 13.0181   LearningRate 0.0906   Epoch: 0   Global Step: 4870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:30,316-Speed 3475.89 samples/sec   Loss 13.1294   LearningRate 0.0906   Epoch: 0   Global Step: 4880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:33,255-Speed 3485.99 samples/sec   Loss 13.3682   LearningRate 0.0906   Epoch: 0   Global Step: 4890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:36,196-Speed 3482.19 samples/sec   Loss 13.1552   LearningRate 0.0905   Epoch: 0   Global Step: 4900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:39,137-Speed 3483.38 samples/sec   Loss 13.3048   LearningRate 0.0905   Epoch: 0   Global Step: 4910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:42,082-Speed 3477.34 samples/sec   Loss 12.9094   LearningRate 0.0905   Epoch: 0   Global Step: 4920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:45,032-Speed 3472.36 samples/sec   Loss 13.1425   LearningRate 0.0905   Epoch: 0   Global Step: 4930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:47,975-Speed 3480.39 samples/sec   Loss 12.9680   LearningRate 0.0905   Epoch: 0   Global Step: 4940   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-10 23:43:50,908-Speed 3492.31 samples/sec   Loss 13.0194   LearningRate 0.0905   Epoch: 0   Global Step: 4950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:53,858-Speed 3473.48 samples/sec   Loss 13.1550   LearningRate 0.0904   Epoch: 0   Global Step: 4960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:56,802-Speed 3479.01 samples/sec   Loss 13.2820   LearningRate 0.0904   Epoch: 0   Global Step: 4970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:43:59,745-Speed 3480.39 samples/sec   Loss 13.0207   LearningRate 0.0904   Epoch: 0   Global Step: 4980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:44:02,679-Speed 3490.24 samples/sec   Loss 13.2569   LearningRate 0.0904   Epoch: 0   Global Step: 4990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:44:05,633-Speed 3466.94 samples/sec   Loss 13.0524   LearningRate 0.0904   Epoch: 0   Global Step: 5000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:44:08,577-Speed 3479.39 samples/sec   Loss 13.2455   LearningRate 0.0903   Epoch: 0   Global Step: 5010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:44:11,518-Speed 3483.71 samples/sec   Loss 13.1097   LearningRate 0.0903   Epoch: 0   Global Step: 5020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:44:14,458-Speed 3484.62 samples/sec   Loss 13.3204   LearningRate 0.0903   Epoch: 0   Global Step: 5030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:44:17,400-Speed 3481.01 samples/sec   Loss 13.0376   LearningRate 0.0903   Epoch: 0   Global Step: 5040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:44:20,416-Speed 3396.82 samples/sec   Loss 13.0390   LearningRate 0.0903   Epoch: 0   Global Step: 5050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:44:33,052-Speed 810.41 samples/sec   Loss 12.8230   LearningRate 0.0902   Epoch: 1   Global Step: 5060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:44:36,006-Speed 3468.78 samples/sec   Loss 12.1781   LearningRate 0.0902   Epoch: 1   Global Step: 5070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:44:38,966-Speed 3459.94 samples/sec   Loss 12.0808   LearningRate 0.0902   Epoch: 1   Global Step: 5080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:44:41,913-Speed 3475.03 samples/sec   Loss 12.1553   LearningRate 0.0902   Epoch: 1   Global Step: 5090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:44:44,860-Speed 3476.36 samples/sec   Loss 12.2992   LearningRate 0.0902   Epoch: 1   Global Step: 5100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:44:47,808-Speed 3473.83 samples/sec   Loss 12.0251   LearningRate 0.0902   Epoch: 1   Global Step: 5110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:44:50,748-Speed 3483.78 samples/sec   Loss 12.1304   LearningRate 0.0901   Epoch: 1   Global Step: 5120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:44:53,715-Speed 3452.31 samples/sec   Loss 12.2892   LearningRate 0.0901   Epoch: 1   Global Step: 5130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:44:56,693-Speed 3438.92 samples/sec   Loss 12.3380   LearningRate 0.0901   Epoch: 1   Global Step: 5140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:44:59,643-Speed 3473.26 samples/sec   Loss 12.0149   LearningRate 0.0901   Epoch: 1   Global Step: 5150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:02,588-Speed 3477.41 samples/sec   Loss 12.4538   LearningRate 0.0901   Epoch: 1   Global Step: 5160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:05,532-Speed 3479.43 samples/sec   Loss 12.2624   LearningRate 0.0900   Epoch: 1   Global Step: 5170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:08,482-Speed 3472.06 samples/sec   Loss 12.3121   LearningRate 0.0900   Epoch: 1   Global Step: 5180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:11,432-Speed 3470.95 samples/sec   Loss 12.1283   LearningRate 0.0900   Epoch: 1   Global Step: 5190   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-10 23:45:14,370-Speed 3486.87 samples/sec   Loss 12.3784   LearningRate 0.0900   Epoch: 1   Global Step: 5200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:17,341-Speed 3447.72 samples/sec   Loss 12.3968   LearningRate 0.0900   Epoch: 1   Global Step: 5210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:20,303-Speed 3458.54 samples/sec   Loss 12.3967   LearningRate 0.0899   Epoch: 1   Global Step: 5220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:23,250-Speed 3475.54 samples/sec   Loss 12.3252   LearningRate 0.0899   Epoch: 1   Global Step: 5230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:26,199-Speed 3472.66 samples/sec   Loss 12.2831   LearningRate 0.0899   Epoch: 1   Global Step: 5240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:29,141-Speed 3482.05 samples/sec   Loss 12.3086   LearningRate 0.0899   Epoch: 1   Global Step: 5250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:32,106-Speed 3454.74 samples/sec   Loss 12.5128   LearningRate 0.0899   Epoch: 1   Global Step: 5260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:35,055-Speed 3473.40 samples/sec   Loss 12.3510   LearningRate 0.0899   Epoch: 1   Global Step: 5270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:38,004-Speed 3472.34 samples/sec   Loss 12.2736   LearningRate 0.0898   Epoch: 1   Global Step: 5280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:40,949-Speed 3477.95 samples/sec   Loss 12.4296   LearningRate 0.0898   Epoch: 1   Global Step: 5290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:44,238-Speed 3114.66 samples/sec   Loss 12.3285   LearningRate 0.0898   Epoch: 1   Global Step: 5300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:47,185-Speed 3474.92 samples/sec   Loss 12.3063   LearningRate 0.0898   Epoch: 1   Global Step: 5310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:50,178-Speed 3422.06 samples/sec   Loss 12.5340   LearningRate 0.0898   Epoch: 1   Global Step: 5320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:53,124-Speed 3477.98 samples/sec   Loss 12.5000   LearningRate 0.0897   Epoch: 1   Global Step: 5330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:56,066-Speed 3480.56 samples/sec   Loss 12.4060   LearningRate 0.0897   Epoch: 1   Global Step: 5340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:45:59,012-Speed 3477.76 samples/sec   Loss 12.3636   LearningRate 0.0897   Epoch: 1   Global Step: 5350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:01,972-Speed 3459.21 samples/sec   Loss 12.4789   LearningRate 0.0897   Epoch: 1   Global Step: 5360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:04,922-Speed 3472.28 samples/sec   Loss 12.2905   LearningRate 0.0897   Epoch: 1   Global Step: 5370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:07,877-Speed 3466.97 samples/sec   Loss 12.4001   LearningRate 0.0896   Epoch: 1   Global Step: 5380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:10,827-Speed 3471.74 samples/sec   Loss 12.5907   LearningRate 0.0896   Epoch: 1   Global Step: 5390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:13,762-Speed 3490.21 samples/sec   Loss 12.3940   LearningRate 0.0896   Epoch: 1   Global Step: 5400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:16,726-Speed 3455.08 samples/sec   Loss 12.5151   LearningRate 0.0896   Epoch: 1   Global Step: 5410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:19,672-Speed 3478.16 samples/sec   Loss 12.6263   LearningRate 0.0896   Epoch: 1   Global Step: 5420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:22,618-Speed 3475.59 samples/sec   Loss 12.5725   LearningRate 0.0896   Epoch: 1   Global Step: 5430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:25,582-Speed 3456.14 samples/sec   Loss 12.3621   LearningRate 0.0895   Epoch: 1   Global Step: 5440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:28,573-Speed 3424.12 samples/sec   Loss 12.4196   LearningRate 0.0895   Epoch: 1   Global Step: 5450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:31,517-Speed 3479.13 samples/sec   Loss 12.7327   LearningRate 0.0895   Epoch: 1   Global Step: 5460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:34,464-Speed 3475.39 samples/sec   Loss 12.2393   LearningRate 0.0895   Epoch: 1   Global Step: 5470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:37,411-Speed 3475.45 samples/sec   Loss 12.5070   LearningRate 0.0895   Epoch: 1   Global Step: 5480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:40,357-Speed 3477.29 samples/sec   Loss 12.3363   LearningRate 0.0894   Epoch: 1   Global Step: 5490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:43,306-Speed 3472.93 samples/sec   Loss 12.4028   LearningRate 0.0894   Epoch: 1   Global Step: 5500   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-10 23:46:46,241-Speed 3490.58 samples/sec   Loss 12.5042   LearningRate 0.0894   Epoch: 1   Global Step: 5510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:49,193-Speed 3469.76 samples/sec   Loss 12.5440   LearningRate 0.0894   Epoch: 1   Global Step: 5520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:52,144-Speed 3470.28 samples/sec   Loss 12.1591   LearningRate 0.0894   Epoch: 1   Global Step: 5530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:55,105-Speed 3459.71 samples/sec   Loss 12.4319   LearningRate 0.0893   Epoch: 1   Global Step: 5540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:46:58,051-Speed 3477.07 samples/sec   Loss 12.2398   LearningRate 0.0893   Epoch: 1   Global Step: 5550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:00,995-Speed 3478.27 samples/sec   Loss 12.4642   LearningRate 0.0893   Epoch: 1   Global Step: 5560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:03,944-Speed 3474.06 samples/sec   Loss 12.4296   LearningRate 0.0893   Epoch: 1   Global Step: 5570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:06,889-Speed 3477.17 samples/sec   Loss 12.4890   LearningRate 0.0893   Epoch: 1   Global Step: 5580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:09,837-Speed 3475.01 samples/sec   Loss 12.5329   LearningRate 0.0893   Epoch: 1   Global Step: 5590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:12,784-Speed 3475.10 samples/sec   Loss 12.2382   LearningRate 0.0892   Epoch: 1   Global Step: 5600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:15,723-Speed 3485.97 samples/sec   Loss 12.6216   LearningRate 0.0892   Epoch: 1   Global Step: 5610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:18,679-Speed 3464.43 samples/sec   Loss 12.4460   LearningRate 0.0892   Epoch: 1   Global Step: 5620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:21,623-Speed 3479.02 samples/sec   Loss 12.3326   LearningRate 0.0892   Epoch: 1   Global Step: 5630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:24,570-Speed 3475.81 samples/sec   Loss 12.5092   LearningRate 0.0892   Epoch: 1   Global Step: 5640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:27,514-Speed 3479.43 samples/sec   Loss 12.4432   LearningRate 0.0891   Epoch: 1   Global Step: 5650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:30,462-Speed 3473.99 samples/sec   Loss 12.3638   LearningRate 0.0891   Epoch: 1   Global Step: 5660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:33,406-Speed 3479.20 samples/sec   Loss 12.5555   LearningRate 0.0891   Epoch: 1   Global Step: 5670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:36,357-Speed 3470.63 samples/sec   Loss 12.4436   LearningRate 0.0891   Epoch: 1   Global Step: 5680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:39,300-Speed 3480.85 samples/sec   Loss 12.4471   LearningRate 0.0891   Epoch: 1   Global Step: 5690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:42,244-Speed 3479.47 samples/sec   Loss 12.4754   LearningRate 0.0890   Epoch: 1   Global Step: 5700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:45,175-Speed 3494.65 samples/sec   Loss 12.4369   LearningRate 0.0890   Epoch: 1   Global Step: 5710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:48,122-Speed 3475.31 samples/sec   Loss 12.5726   LearningRate 0.0890   Epoch: 1   Global Step: 5720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:51,064-Speed 3481.58 samples/sec   Loss 12.4748   LearningRate 0.0890   Epoch: 1   Global Step: 5730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:54,013-Speed 3472.70 samples/sec   Loss 12.5552   LearningRate 0.0890   Epoch: 1   Global Step: 5740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:56,962-Speed 3473.83 samples/sec   Loss 12.2501   LearningRate 0.0890   Epoch: 1   Global Step: 5750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:47:59,943-Speed 3436.05 samples/sec   Loss 12.2534   LearningRate 0.0889   Epoch: 1   Global Step: 5760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:02,888-Speed 3477.96 samples/sec   Loss 12.4558   LearningRate 0.0889   Epoch: 1   Global Step: 5770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:05,880-Speed 3423.72 samples/sec   Loss 12.2403   LearningRate 0.0889   Epoch: 1   Global Step: 5780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:08,823-Speed 3479.53 samples/sec   Loss 12.3632   LearningRate 0.0889   Epoch: 1   Global Step: 5790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:11,768-Speed 3478.26 samples/sec   Loss 12.3575   LearningRate 0.0889   Epoch: 1   Global Step: 5800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:14,715-Speed 3475.62 samples/sec   Loss 12.2082   LearningRate 0.0888   Epoch: 1   Global Step: 5810   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-10 23:48:17,650-Speed 3489.83 samples/sec   Loss 12.2937   LearningRate 0.0888   Epoch: 1   Global Step: 5820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:20,611-Speed 3459.58 samples/sec   Loss 12.3293   LearningRate 0.0888   Epoch: 1   Global Step: 5830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:23,556-Speed 3478.08 samples/sec   Loss 12.2936   LearningRate 0.0888   Epoch: 1   Global Step: 5840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:26,507-Speed 3470.72 samples/sec   Loss 12.2519   LearningRate 0.0888   Epoch: 1   Global Step: 5850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:29,460-Speed 3468.11 samples/sec   Loss 12.4462   LearningRate 0.0887   Epoch: 1   Global Step: 5860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:32,404-Speed 3479.99 samples/sec   Loss 12.4253   LearningRate 0.0887   Epoch: 1   Global Step: 5870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:35,347-Speed 3479.91 samples/sec   Loss 12.3812   LearningRate 0.0887   Epoch: 1   Global Step: 5880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:38,294-Speed 3476.00 samples/sec   Loss 12.4489   LearningRate 0.0887   Epoch: 1   Global Step: 5890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:41,242-Speed 3474.35 samples/sec   Loss 12.4769   LearningRate 0.0887   Epoch: 1   Global Step: 5900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:44,187-Speed 3477.57 samples/sec   Loss 12.3100   LearningRate 0.0887   Epoch: 1   Global Step: 5910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:47,119-Speed 3492.83 samples/sec   Loss 12.3306   LearningRate 0.0886   Epoch: 1   Global Step: 5920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:50,067-Speed 3474.73 samples/sec   Loss 12.4002   LearningRate 0.0886   Epoch: 1   Global Step: 5930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:53,023-Speed 3464.96 samples/sec   Loss 12.4834   LearningRate 0.0886   Epoch: 1   Global Step: 5940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:55,976-Speed 3470.69 samples/sec   Loss 12.2588   LearningRate 0.0886   Epoch: 1   Global Step: 5950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:48:58,920-Speed 3480.08 samples/sec   Loss 12.3713   LearningRate 0.0886   Epoch: 1   Global Step: 5960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:49:01,865-Speed 3477.09 samples/sec   Loss 12.2901   LearningRate 0.0885   Epoch: 1   Global Step: 5970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:49:04,812-Speed 3475.80 samples/sec   Loss 12.3571   LearningRate 0.0885   Epoch: 1   Global Step: 5980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:49:07,758-Speed 3476.41 samples/sec   Loss 12.2130   LearningRate 0.0885   Epoch: 1   Global Step: 5990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:49:10,713-Speed 3466.61 samples/sec   Loss 12.3169   LearningRate 0.0885   Epoch: 1   Global Step: 6000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:49:54,741-[lfw][6000]XNorm: 22.300878
Training: 2022-04-10 23:49:54,741-[lfw][6000]Accuracy-Flip: 0.99417+-0.00327
Training: 2022-04-10 23:49:54,742-[lfw][6000]Accuracy-Highest: 0.99417
Training: 2022-04-10 23:50:45,888-[cfp_fp][6000]XNorm: 19.629825
Training: 2022-04-10 23:50:45,889-[cfp_fp][6000]Accuracy-Flip: 0.92186+-0.01157
Training: 2022-04-10 23:50:45,889-[cfp_fp][6000]Accuracy-Highest: 0.92186
Training: 2022-04-10 23:51:30,104-[agedb_30][6000]XNorm: 22.114248
Training: 2022-04-10 23:51:30,105-[agedb_30][6000]Accuracy-Flip: 0.95683+-0.00603
Training: 2022-04-10 23:51:30,106-[agedb_30][6000]Accuracy-Highest: 0.95683
Training: 2022-04-10 23:51:33,048-Speed 71.94 samples/sec   Loss 12.4022   LearningRate 0.0885   Epoch: 1   Global Step: 6010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:51:35,978-Speed 3495.99 samples/sec   Loss 12.4552   LearningRate 0.0885   Epoch: 1   Global Step: 6020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:51:38,934-Speed 3464.31 samples/sec   Loss 12.3357   LearningRate 0.0884   Epoch: 1   Global Step: 6030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:51:41,869-Speed 3490.47 samples/sec   Loss 12.1732   LearningRate 0.0884   Epoch: 1   Global Step: 6040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:51:45,887-Speed 2549.02 samples/sec   Loss 12.4505   LearningRate 0.0884   Epoch: 1   Global Step: 6050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:51:48,816-Speed 3497.24 samples/sec   Loss 12.0741   LearningRate 0.0884   Epoch: 1   Global Step: 6060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:51:51,763-Speed 3475.40 samples/sec   Loss 12.4680   LearningRate 0.0884   Epoch: 1   Global Step: 6070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:51:54,728-Speed 3454.19 samples/sec   Loss 12.1639   LearningRate 0.0883   Epoch: 1   Global Step: 6080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:51:57,682-Speed 3467.61 samples/sec   Loss 12.2994   LearningRate 0.0883   Epoch: 1   Global Step: 6090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:00,668-Speed 3430.11 samples/sec   Loss 12.2925   LearningRate 0.0883   Epoch: 1   Global Step: 6100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:03,663-Speed 3419.71 samples/sec   Loss 12.1398   LearningRate 0.0883   Epoch: 1   Global Step: 6110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:06,623-Speed 3461.51 samples/sec   Loss 12.1477   LearningRate 0.0883   Epoch: 1   Global Step: 6120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:09,577-Speed 3467.17 samples/sec   Loss 12.3056   LearningRate 0.0882   Epoch: 1   Global Step: 6130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:12,515-Speed 3486.35 samples/sec   Loss 12.3175   LearningRate 0.0882   Epoch: 1   Global Step: 6140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:15,466-Speed 3470.22 samples/sec   Loss 12.1865   LearningRate 0.0882   Epoch: 1   Global Step: 6150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:18,408-Speed 3481.42 samples/sec   Loss 12.4013   LearningRate 0.0882   Epoch: 1   Global Step: 6160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:21,353-Speed 3478.05 samples/sec   Loss 12.2280   LearningRate 0.0882   Epoch: 1   Global Step: 6170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:24,344-Speed 3424.30 samples/sec   Loss 12.3790   LearningRate 0.0882   Epoch: 1   Global Step: 6180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:27,297-Speed 3469.03 samples/sec   Loss 12.1374   LearningRate 0.0881   Epoch: 1   Global Step: 6190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:30,282-Speed 3430.88 samples/sec   Loss 12.3414   LearningRate 0.0881   Epoch: 1   Global Step: 6200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:33,239-Speed 3464.52 samples/sec   Loss 12.2641   LearningRate 0.0881   Epoch: 1   Global Step: 6210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:36,171-Speed 3492.90 samples/sec   Loss 12.1719   LearningRate 0.0881   Epoch: 1   Global Step: 6220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:39,125-Speed 3468.17 samples/sec   Loss 12.1504   LearningRate 0.0881   Epoch: 1   Global Step: 6230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:42,072-Speed 3474.89 samples/sec   Loss 12.1188   LearningRate 0.0880   Epoch: 1   Global Step: 6240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:45,031-Speed 3462.40 samples/sec   Loss 12.2398   LearningRate 0.0880   Epoch: 1   Global Step: 6250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:47,985-Speed 3466.93 samples/sec   Loss 12.2128   LearningRate 0.0880   Epoch: 1   Global Step: 6260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:50,946-Speed 3458.58 samples/sec   Loss 12.2987   LearningRate 0.0880   Epoch: 1   Global Step: 6270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:53,913-Speed 3452.47 samples/sec   Loss 12.1412   LearningRate 0.0880   Epoch: 1   Global Step: 6280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:56,859-Speed 3476.85 samples/sec   Loss 12.1731   LearningRate 0.0880   Epoch: 1   Global Step: 6290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:52:59,814-Speed 3466.65 samples/sec   Loss 12.3244   LearningRate 0.0879   Epoch: 1   Global Step: 6300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:02,756-Speed 3481.88 samples/sec   Loss 11.9956   LearningRate 0.0879   Epoch: 1   Global Step: 6310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:05,694-Speed 3485.49 samples/sec   Loss 12.2231   LearningRate 0.0879   Epoch: 1   Global Step: 6320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:08,638-Speed 3479.18 samples/sec   Loss 12.1260   LearningRate 0.0879   Epoch: 1   Global Step: 6330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:11,588-Speed 3471.94 samples/sec   Loss 12.1495   LearningRate 0.0879   Epoch: 1   Global Step: 6340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:14,625-Speed 3372.96 samples/sec   Loss 12.0013   LearningRate 0.0878   Epoch: 1   Global Step: 6350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:17,613-Speed 3428.55 samples/sec   Loss 12.0774   LearningRate 0.0878   Epoch: 1   Global Step: 6360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:20,557-Speed 3478.23 samples/sec   Loss 12.1664   LearningRate 0.0878   Epoch: 1   Global Step: 6370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:23,501-Speed 3479.29 samples/sec   Loss 12.1729   LearningRate 0.0878   Epoch: 1   Global Step: 6380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:26,455-Speed 3468.03 samples/sec   Loss 12.2487   LearningRate 0.0878   Epoch: 1   Global Step: 6390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:29,402-Speed 3475.54 samples/sec   Loss 12.1561   LearningRate 0.0877   Epoch: 1   Global Step: 6400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:32,345-Speed 3480.50 samples/sec   Loss 12.2183   LearningRate 0.0877   Epoch: 1   Global Step: 6410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:35,275-Speed 3495.90 samples/sec   Loss 12.1373   LearningRate 0.0877   Epoch: 1   Global Step: 6420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:38,242-Speed 3451.82 samples/sec   Loss 11.9975   LearningRate 0.0877   Epoch: 1   Global Step: 6430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:41,182-Speed 3483.88 samples/sec   Loss 12.2602   LearningRate 0.0877   Epoch: 1   Global Step: 6440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:44,123-Speed 3483.04 samples/sec   Loss 12.0863   LearningRate 0.0877   Epoch: 1   Global Step: 6450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:47,062-Speed 3484.96 samples/sec   Loss 11.8688   LearningRate 0.0876   Epoch: 1   Global Step: 6460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:50,005-Speed 3480.47 samples/sec   Loss 12.1537   LearningRate 0.0876   Epoch: 1   Global Step: 6470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:52,947-Speed 3481.66 samples/sec   Loss 12.1329   LearningRate 0.0876   Epoch: 1   Global Step: 6480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:55,889-Speed 3481.12 samples/sec   Loss 11.9842   LearningRate 0.0876   Epoch: 1   Global Step: 6490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:53:58,840-Speed 3471.28 samples/sec   Loss 12.1020   LearningRate 0.0876   Epoch: 1   Global Step: 6500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:54:01,796-Speed 3465.61 samples/sec   Loss 12.0479   LearningRate 0.0875   Epoch: 1   Global Step: 6510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:54:04,755-Speed 3460.57 samples/sec   Loss 11.9309   LearningRate 0.0875   Epoch: 1   Global Step: 6520   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-10 23:54:07,687-Speed 3493.72 samples/sec   Loss 11.8735   LearningRate 0.0875   Epoch: 1   Global Step: 6530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:54:10,628-Speed 3483.46 samples/sec   Loss 12.0220   LearningRate 0.0875   Epoch: 1   Global Step: 6540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:54:13,573-Speed 3477.36 samples/sec   Loss 11.8366   LearningRate 0.0875   Epoch: 1   Global Step: 6550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:54:16,523-Speed 3471.83 samples/sec   Loss 11.8319   LearningRate 0.0875   Epoch: 1   Global Step: 6560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:54:19,484-Speed 3459.01 samples/sec   Loss 12.0006   LearningRate 0.0874   Epoch: 1   Global Step: 6570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:54:22,439-Speed 3466.84 samples/sec   Loss 11.8724   LearningRate 0.0874   Epoch: 1   Global Step: 6580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-10 23:54:25,414-Speed 3443.33 samples/sec   Loss 11.9883   LearningRate 0.0874   Epoch: 1   Global Step: 6590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:54:28,389-Speed 3441.88 samples/sec   Loss 12.0070   LearningRate 0.0874   Epoch: 1   Global Step: 6600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:54:31,335-Speed 3477.30 samples/sec   Loss 12.2978   LearningRate 0.0874   Epoch: 1   Global Step: 6610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:54:34,274-Speed 3485.31 samples/sec   Loss 11.9962   LearningRate 0.0873   Epoch: 1   Global Step: 6620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:54:37,210-Speed 3488.84 samples/sec   Loss 12.0382   LearningRate 0.0873   Epoch: 1   Global Step: 6630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:54:40,151-Speed 3481.59 samples/sec   Loss 12.0740   LearningRate 0.0873   Epoch: 1   Global Step: 6640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:54:43,094-Speed 3480.70 samples/sec   Loss 12.0850   LearningRate 0.0873   Epoch: 1   Global Step: 6650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:54:46,039-Speed 3478.41 samples/sec   Loss 11.8458   LearningRate 0.0873   Epoch: 1   Global Step: 6660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:54:49,049-Speed 3402.16 samples/sec   Loss 11.9904   LearningRate 0.0872   Epoch: 1   Global Step: 6670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:54:52,007-Speed 3463.78 samples/sec   Loss 11.9327   LearningRate 0.0872   Epoch: 1   Global Step: 6680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:54:54,960-Speed 3467.87 samples/sec   Loss 12.0475   LearningRate 0.0872   Epoch: 1   Global Step: 6690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:54:57,929-Speed 3450.29 samples/sec   Loss 12.0875   LearningRate 0.0872   Epoch: 1   Global Step: 6700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:00,875-Speed 3476.84 samples/sec   Loss 11.9386   LearningRate 0.0872   Epoch: 1   Global Step: 6710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:03,831-Speed 3464.67 samples/sec   Loss 11.8308   LearningRate 0.0872   Epoch: 1   Global Step: 6720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:06,767-Speed 3488.28 samples/sec   Loss 12.0707   LearningRate 0.0871   Epoch: 1   Global Step: 6730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:09,714-Speed 3476.22 samples/sec   Loss 11.8525   LearningRate 0.0871   Epoch: 1   Global Step: 6740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:12,657-Speed 3480.21 samples/sec   Loss 11.7587   LearningRate 0.0871   Epoch: 1   Global Step: 6750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:15,609-Speed 3468.80 samples/sec   Loss 11.9857   LearningRate 0.0871   Epoch: 1   Global Step: 6760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:18,568-Speed 3462.30 samples/sec   Loss 12.0411   LearningRate 0.0871   Epoch: 1   Global Step: 6770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:21,513-Speed 3478.04 samples/sec   Loss 11.9991   LearningRate 0.0870   Epoch: 1   Global Step: 6780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:24,456-Speed 3480.34 samples/sec   Loss 11.7230   LearningRate 0.0870   Epoch: 1   Global Step: 6790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:27,408-Speed 3469.98 samples/sec   Loss 11.8984   LearningRate 0.0870   Epoch: 1   Global Step: 6800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:30,353-Speed 3477.22 samples/sec   Loss 11.7486   LearningRate 0.0870   Epoch: 1   Global Step: 6810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:33,299-Speed 3477.97 samples/sec   Loss 11.9252   LearningRate 0.0870   Epoch: 1   Global Step: 6820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:36,245-Speed 3475.95 samples/sec   Loss 12.0043   LearningRate 0.0870   Epoch: 1   Global Step: 6830   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-10 23:55:39,177-Speed 3492.85 samples/sec   Loss 11.7754   LearningRate 0.0869   Epoch: 1   Global Step: 6840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:42,167-Speed 3426.54 samples/sec   Loss 11.9587   LearningRate 0.0869   Epoch: 1   Global Step: 6850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:45,117-Speed 3471.59 samples/sec   Loss 11.8065   LearningRate 0.0869   Epoch: 1   Global Step: 6860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:48,082-Speed 3455.46 samples/sec   Loss 11.7068   LearningRate 0.0869   Epoch: 1   Global Step: 6870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:51,030-Speed 3474.59 samples/sec   Loss 11.6219   LearningRate 0.0869   Epoch: 1   Global Step: 6880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:53,987-Speed 3462.85 samples/sec   Loss 11.8841   LearningRate 0.0868   Epoch: 1   Global Step: 6890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:56,934-Speed 3476.21 samples/sec   Loss 11.8361   LearningRate 0.0868   Epoch: 1   Global Step: 6900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:55:59,904-Speed 3448.61 samples/sec   Loss 12.0788   LearningRate 0.0868   Epoch: 1   Global Step: 6910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:02,931-Speed 3383.28 samples/sec   Loss 11.8765   LearningRate 0.0868   Epoch: 1   Global Step: 6920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:05,943-Speed 3400.84 samples/sec   Loss 11.8234   LearningRate 0.0868   Epoch: 1   Global Step: 6930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:08,876-Speed 3491.61 samples/sec   Loss 11.6733   LearningRate 0.0867   Epoch: 1   Global Step: 6940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:11,832-Speed 3465.86 samples/sec   Loss 11.7718   LearningRate 0.0867   Epoch: 1   Global Step: 6950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:14,782-Speed 3471.17 samples/sec   Loss 11.8572   LearningRate 0.0867   Epoch: 1   Global Step: 6960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:17,746-Speed 3456.28 samples/sec   Loss 11.8900   LearningRate 0.0867   Epoch: 1   Global Step: 6970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:20,690-Speed 3479.33 samples/sec   Loss 11.7922   LearningRate 0.0867   Epoch: 1   Global Step: 6980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:23,640-Speed 3472.29 samples/sec   Loss 11.5616   LearningRate 0.0867   Epoch: 1   Global Step: 6990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:26,589-Speed 3473.38 samples/sec   Loss 11.9076   LearningRate 0.0866   Epoch: 1   Global Step: 7000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:29,537-Speed 3474.19 samples/sec   Loss 11.6790   LearningRate 0.0866   Epoch: 1   Global Step: 7010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:32,487-Speed 3471.68 samples/sec   Loss 11.8545   LearningRate 0.0866   Epoch: 1   Global Step: 7020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:35,442-Speed 3466.67 samples/sec   Loss 11.6806   LearningRate 0.0866   Epoch: 1   Global Step: 7030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:38,380-Speed 3485.73 samples/sec   Loss 11.7693   LearningRate 0.0866   Epoch: 1   Global Step: 7040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:41,346-Speed 3453.15 samples/sec   Loss 11.5106   LearningRate 0.0865   Epoch: 1   Global Step: 7050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:44,297-Speed 3470.39 samples/sec   Loss 11.7681   LearningRate 0.0865   Epoch: 1   Global Step: 7060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:47,319-Speed 3390.80 samples/sec   Loss 11.7098   LearningRate 0.0865   Epoch: 1   Global Step: 7070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:50,336-Speed 3394.22 samples/sec   Loss 11.6533   LearningRate 0.0865   Epoch: 1   Global Step: 7080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:53,298-Speed 3458.11 samples/sec   Loss 11.7890   LearningRate 0.0865   Epoch: 1   Global Step: 7090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:56,246-Speed 3474.33 samples/sec   Loss 11.9361   LearningRate 0.0865   Epoch: 1   Global Step: 7100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:56:59,196-Speed 3472.19 samples/sec   Loss 11.7460   LearningRate 0.0864   Epoch: 1   Global Step: 7110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:02,148-Speed 3470.18 samples/sec   Loss 11.8761   LearningRate 0.0864   Epoch: 1   Global Step: 7120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:05,104-Speed 3464.00 samples/sec   Loss 11.7792   LearningRate 0.0864   Epoch: 1   Global Step: 7130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:08,044-Speed 3484.52 samples/sec   Loss 11.6496   LearningRate 0.0864   Epoch: 1   Global Step: 7140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:11,006-Speed 3457.43 samples/sec   Loss 11.9215   LearningRate 0.0864   Epoch: 1   Global Step: 7150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:13,976-Speed 3449.08 samples/sec   Loss 11.7510   LearningRate 0.0863   Epoch: 1   Global Step: 7160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:17,028-Speed 3356.48 samples/sec   Loss 11.7474   LearningRate 0.0863   Epoch: 1   Global Step: 7170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:19,980-Speed 3469.04 samples/sec   Loss 11.6004   LearningRate 0.0863   Epoch: 1   Global Step: 7180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:22,938-Speed 3462.82 samples/sec   Loss 11.6602   LearningRate 0.0863   Epoch: 1   Global Step: 7190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:25,887-Speed 3473.66 samples/sec   Loss 11.7124   LearningRate 0.0863   Epoch: 1   Global Step: 7200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:28,844-Speed 3463.92 samples/sec   Loss 11.7582   LearningRate 0.0863   Epoch: 1   Global Step: 7210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:31,793-Speed 3473.24 samples/sec   Loss 11.8634   LearningRate 0.0862   Epoch: 1   Global Step: 7220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:34,756-Speed 3456.32 samples/sec   Loss 11.6674   LearningRate 0.0862   Epoch: 1   Global Step: 7230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:37,711-Speed 3466.67 samples/sec   Loss 11.7760   LearningRate 0.0862   Epoch: 1   Global Step: 7240   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-10 23:57:40,665-Speed 3466.97 samples/sec   Loss 11.6824   LearningRate 0.0862   Epoch: 1   Global Step: 7250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:43,614-Speed 3473.52 samples/sec   Loss 11.7328   LearningRate 0.0862   Epoch: 1   Global Step: 7260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:46,563-Speed 3473.95 samples/sec   Loss 11.6366   LearningRate 0.0861   Epoch: 1   Global Step: 7270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:49,508-Speed 3476.72 samples/sec   Loss 11.5545   LearningRate 0.0861   Epoch: 1   Global Step: 7280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:52,471-Speed 3457.66 samples/sec   Loss 11.7007   LearningRate 0.0861   Epoch: 1   Global Step: 7290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:55,419-Speed 3474.31 samples/sec   Loss 11.7966   LearningRate 0.0861   Epoch: 1   Global Step: 7300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:57:58,377-Speed 3462.27 samples/sec   Loss 11.6010   LearningRate 0.0861   Epoch: 1   Global Step: 7310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:58:01,345-Speed 3450.79 samples/sec   Loss 11.6120   LearningRate 0.0861   Epoch: 1   Global Step: 7320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:58:04,300-Speed 3466.25 samples/sec   Loss 11.7669   LearningRate 0.0860   Epoch: 1   Global Step: 7330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:58:07,245-Speed 3478.37 samples/sec   Loss 11.6287   LearningRate 0.0860   Epoch: 1   Global Step: 7340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:58:10,188-Speed 3480.93 samples/sec   Loss 11.6830   LearningRate 0.0860   Epoch: 1   Global Step: 7350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:58:13,135-Speed 3475.55 samples/sec   Loss 11.4124   LearningRate 0.0860   Epoch: 1   Global Step: 7360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:58:16,082-Speed 3475.07 samples/sec   Loss 11.7005   LearningRate 0.0860   Epoch: 1   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:58:19,032-Speed 3472.02 samples/sec   Loss 11.6631   LearningRate 0.0859   Epoch: 1   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:58:21,979-Speed 3475.31 samples/sec   Loss 11.6418   LearningRate 0.0859   Epoch: 1   Global Step: 7390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:58:24,978-Speed 3415.28 samples/sec   Loss 11.6721   LearningRate 0.0859   Epoch: 1   Global Step: 7400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:58:27,963-Speed 3431.80 samples/sec   Loss 11.5858   LearningRate 0.0859   Epoch: 1   Global Step: 7410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:58:30,920-Speed 3463.28 samples/sec   Loss 11.7076   LearningRate 0.0859   Epoch: 1   Global Step: 7420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:58:33,867-Speed 3476.18 samples/sec   Loss 11.5258   LearningRate 0.0858   Epoch: 1   Global Step: 7430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:58:36,817-Speed 3472.61 samples/sec   Loss 11.5597   LearningRate 0.0858   Epoch: 1   Global Step: 7440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:58:39,778-Speed 3458.78 samples/sec   Loss 11.5078   LearningRate 0.0858   Epoch: 1   Global Step: 7450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:58:42,727-Speed 3473.72 samples/sec   Loss 11.6566   LearningRate 0.0858   Epoch: 1   Global Step: 7460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:58:45,672-Speed 3477.02 samples/sec   Loss 11.6676   LearningRate 0.0858   Epoch: 1   Global Step: 7470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:58:48,630-Speed 3462.79 samples/sec   Loss 11.3536   LearningRate 0.0858   Epoch: 1   Global Step: 7480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:58:51,605-Speed 3443.06 samples/sec   Loss 11.7683   LearningRate 0.0857   Epoch: 1   Global Step: 7490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:58:54,570-Speed 3454.11 samples/sec   Loss 11.5962   LearningRate 0.0857   Epoch: 1   Global Step: 7500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:58:57,517-Speed 3476.90 samples/sec   Loss 11.5463   LearningRate 0.0857   Epoch: 1   Global Step: 7510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:00,478-Speed 3458.07 samples/sec   Loss 11.4329   LearningRate 0.0857   Epoch: 1   Global Step: 7520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:03,431-Speed 3469.71 samples/sec   Loss 11.4502   LearningRate 0.0857   Epoch: 1   Global Step: 7530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:06,378-Speed 3474.60 samples/sec   Loss 11.4844   LearningRate 0.0856   Epoch: 1   Global Step: 7540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:09,348-Speed 3448.94 samples/sec   Loss 11.4424   LearningRate 0.0856   Epoch: 1   Global Step: 7550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:12,319-Speed 3447.75 samples/sec   Loss 11.7001   LearningRate 0.0856   Epoch: 1   Global Step: 7560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:15,288-Speed 3450.48 samples/sec   Loss 11.6527   LearningRate 0.0856   Epoch: 1   Global Step: 7570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:18,247-Speed 3461.80 samples/sec   Loss 11.6383   LearningRate 0.0856   Epoch: 1   Global Step: 7580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:21,195-Speed 3473.43 samples/sec   Loss 11.3164   LearningRate 0.0856   Epoch: 1   Global Step: 7590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:24,177-Speed 3435.00 samples/sec   Loss 11.5831   LearningRate 0.0855   Epoch: 1   Global Step: 7600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:27,139-Speed 3458.45 samples/sec   Loss 11.5515   LearningRate 0.0855   Epoch: 1   Global Step: 7610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-10 23:59:30,090-Speed 3470.92 samples/sec   Loss 11.4223   LearningRate 0.0855   Epoch: 1   Global Step: 7620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:33,060-Speed 3448.78 samples/sec   Loss 11.4480   LearningRate 0.0855   Epoch: 1   Global Step: 7630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:36,053-Speed 3421.28 samples/sec   Loss 11.4409   LearningRate 0.0855   Epoch: 1   Global Step: 7640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:39,035-Speed 3435.34 samples/sec   Loss 11.4609   LearningRate 0.0854   Epoch: 1   Global Step: 7650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:41,984-Speed 3474.12 samples/sec   Loss 11.4083   LearningRate 0.0854   Epoch: 1   Global Step: 7660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:44,933-Speed 3472.70 samples/sec   Loss 11.4867   LearningRate 0.0854   Epoch: 1   Global Step: 7670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:47,878-Speed 3478.39 samples/sec   Loss 11.4616   LearningRate 0.0854   Epoch: 1   Global Step: 7680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:50,831-Speed 3467.59 samples/sec   Loss 11.4773   LearningRate 0.0854   Epoch: 1   Global Step: 7690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:53,782-Speed 3471.91 samples/sec   Loss 11.5718   LearningRate 0.0854   Epoch: 1   Global Step: 7700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:56,733-Speed 3469.77 samples/sec   Loss 11.3854   LearningRate 0.0853   Epoch: 1   Global Step: 7710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-10 23:59:59,708-Speed 3443.20 samples/sec   Loss 11.3365   LearningRate 0.0853   Epoch: 1   Global Step: 7720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:00:02,667-Speed 3462.18 samples/sec   Loss 11.4711   LearningRate 0.0853   Epoch: 1   Global Step: 7730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:00:05,650-Speed 3433.14 samples/sec   Loss 11.4543   LearningRate 0.0853   Epoch: 1   Global Step: 7740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:00:08,631-Speed 3436.80 samples/sec   Loss 11.3512   LearningRate 0.0853   Epoch: 1   Global Step: 7750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:00:11,580-Speed 3473.31 samples/sec   Loss 11.4632   LearningRate 0.0852   Epoch: 1   Global Step: 7760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:00:14,531-Speed 3470.49 samples/sec   Loss 11.3771   LearningRate 0.0852   Epoch: 1   Global Step: 7770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:00:17,484-Speed 3468.76 samples/sec   Loss 11.5698   LearningRate 0.0852   Epoch: 1   Global Step: 7780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:00:20,434-Speed 3472.08 samples/sec   Loss 11.5089   LearningRate 0.0852   Epoch: 1   Global Step: 7790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:00:23,405-Speed 3447.55 samples/sec   Loss 11.4040   LearningRate 0.0852   Epoch: 1   Global Step: 7800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:00:26,356-Speed 3470.32 samples/sec   Loss 11.4682   LearningRate 0.0852   Epoch: 1   Global Step: 7810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:00:29,301-Speed 3478.02 samples/sec   Loss 11.2888   LearningRate 0.0851   Epoch: 1   Global Step: 7820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:00:32,247-Speed 3476.45 samples/sec   Loss 11.3165   LearningRate 0.0851   Epoch: 1   Global Step: 7830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:00:35,188-Speed 3483.23 samples/sec   Loss 11.5606   LearningRate 0.0851   Epoch: 1   Global Step: 7840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:00:38,144-Speed 3465.67 samples/sec   Loss 11.2641   LearningRate 0.0851   Epoch: 1   Global Step: 7850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:00:41,098-Speed 3467.59 samples/sec   Loss 11.4801   LearningRate 0.0851   Epoch: 1   Global Step: 7860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:00:44,044-Speed 3476.15 samples/sec   Loss 11.3758   LearningRate 0.0850   Epoch: 1   Global Step: 7870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:00:46,995-Speed 3470.66 samples/sec   Loss 11.3019   LearningRate 0.0850   Epoch: 1   Global Step: 7880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:00:49,946-Speed 3471.21 samples/sec   Loss 11.2423   LearningRate 0.0850   Epoch: 1   Global Step: 7890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:00:52,898-Speed 3469.89 samples/sec   Loss 11.3940   LearningRate 0.0850   Epoch: 1   Global Step: 7900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:00:55,889-Speed 3424.19 samples/sec   Loss 11.3758   LearningRate 0.0850   Epoch: 1   Global Step: 7910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:00:58,839-Speed 3472.34 samples/sec   Loss 11.5091   LearningRate 0.0850   Epoch: 1   Global Step: 7920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:01:01,797-Speed 3462.83 samples/sec   Loss 11.5032   LearningRate 0.0849   Epoch: 1   Global Step: 7930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:01:04,753-Speed 3465.10 samples/sec   Loss 11.2371   LearningRate 0.0849   Epoch: 1   Global Step: 7940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:01:07,701-Speed 3473.97 samples/sec   Loss 11.2223   LearningRate 0.0849   Epoch: 1   Global Step: 7950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:01:10,705-Speed 3410.46 samples/sec   Loss 11.2812   LearningRate 0.0849   Epoch: 1   Global Step: 7960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:01:13,757-Speed 3356.00 samples/sec   Loss 11.2735   LearningRate 0.0849   Epoch: 1   Global Step: 7970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:01:16,737-Speed 3437.13 samples/sec   Loss 11.3711   LearningRate 0.0848   Epoch: 1   Global Step: 7980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:01:19,686-Speed 3473.06 samples/sec   Loss 11.1554   LearningRate 0.0848   Epoch: 1   Global Step: 7990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:01:22,634-Speed 3474.39 samples/sec   Loss 11.3769   LearningRate 0.0848   Epoch: 1   Global Step: 8000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:02:07,094-[lfw][8000]XNorm: 21.365277
Training: 2022-04-11 00:02:07,095-[lfw][8000]Accuracy-Flip: 0.99467+-0.00340
Training: 2022-04-11 00:02:07,095-[lfw][8000]Accuracy-Highest: 0.99467
Training: 2022-04-11 00:02:58,495-[cfp_fp][8000]XNorm: 18.598027
Training: 2022-04-11 00:02:58,496-[cfp_fp][8000]Accuracy-Flip: 0.91271+-0.01554
Training: 2022-04-11 00:02:58,496-[cfp_fp][8000]Accuracy-Highest: 0.92186
Training: 2022-04-11 00:03:42,819-[agedb_30][8000]XNorm: 20.583255
Training: 2022-04-11 00:03:42,820-[agedb_30][8000]Accuracy-Flip: 0.95633+-0.00636
Training: 2022-04-11 00:03:42,821-[agedb_30][8000]Accuracy-Highest: 0.95683
Training: 2022-04-11 00:03:45,755-Speed 71.55 samples/sec   Loss 11.3956   LearningRate 0.0848   Epoch: 1   Global Step: 8010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 00:03:48,693-Speed 3485.69 samples/sec   Loss 11.3459   LearningRate 0.0848   Epoch: 1   Global Step: 8020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 00:03:51,622-Speed 3497.56 samples/sec   Loss 11.4367   LearningRate 0.0848   Epoch: 1   Global Step: 8030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:03:54,551-Speed 3496.95 samples/sec   Loss 11.4454   LearningRate 0.0847   Epoch: 1   Global Step: 8040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:03:57,487-Speed 3488.88 samples/sec   Loss 11.2438   LearningRate 0.0847   Epoch: 1   Global Step: 8050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:04:00,421-Speed 3491.40 samples/sec   Loss 11.4209   LearningRate 0.0847   Epoch: 1   Global Step: 8060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:04:03,359-Speed 3485.61 samples/sec   Loss 11.2991   LearningRate 0.0847   Epoch: 1   Global Step: 8070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:04:06,360-Speed 3413.33 samples/sec   Loss 11.1986   LearningRate 0.0847   Epoch: 1   Global Step: 8080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:04:09,346-Speed 3429.94 samples/sec   Loss 11.2959   LearningRate 0.0846   Epoch: 1   Global Step: 8090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:04:12,282-Speed 3488.64 samples/sec   Loss 11.4003   LearningRate 0.0846   Epoch: 1   Global Step: 8100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:04:15,217-Speed 3489.56 samples/sec   Loss 11.1969   LearningRate 0.0846   Epoch: 1   Global Step: 8110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:04:18,159-Speed 3481.72 samples/sec   Loss 11.2675   LearningRate 0.0846   Epoch: 1   Global Step: 8120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:04:21,096-Speed 3488.27 samples/sec   Loss 11.2961   LearningRate 0.0846   Epoch: 1   Global Step: 8130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:04:24,021-Speed 3502.00 samples/sec   Loss 11.2967   LearningRate 0.0846   Epoch: 1   Global Step: 8140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:04:26,977-Speed 3464.04 samples/sec   Loss 11.2988   LearningRate 0.0845   Epoch: 1   Global Step: 8150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:04:29,917-Speed 3484.71 samples/sec   Loss 11.1302   LearningRate 0.0845   Epoch: 1   Global Step: 8160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:04:32,855-Speed 3486.02 samples/sec   Loss 11.1367   LearningRate 0.0845   Epoch: 1   Global Step: 8170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:04:35,794-Speed 3485.08 samples/sec   Loss 11.2728   LearningRate 0.0845   Epoch: 1   Global Step: 8180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:04:38,719-Speed 3501.77 samples/sec   Loss 11.0672   LearningRate 0.0845   Epoch: 1   Global Step: 8190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:04:41,666-Speed 3475.53 samples/sec   Loss 11.3449   LearningRate 0.0844   Epoch: 1   Global Step: 8200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:04:44,607-Speed 3482.80 samples/sec   Loss 11.4069   LearningRate 0.0844   Epoch: 1   Global Step: 8210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:04:47,551-Speed 3479.87 samples/sec   Loss 11.3550   LearningRate 0.0844   Epoch: 1   Global Step: 8220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:04:50,511-Speed 3460.39 samples/sec   Loss 11.2673   LearningRate 0.0844   Epoch: 1   Global Step: 8230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:04:53,451-Speed 3483.79 samples/sec   Loss 11.4142   LearningRate 0.0844   Epoch: 1   Global Step: 8240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:04:56,390-Speed 3484.55 samples/sec   Loss 11.2936   LearningRate 0.0844   Epoch: 1   Global Step: 8250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:04:59,322-Speed 3494.09 samples/sec   Loss 11.3169   LearningRate 0.0843   Epoch: 1   Global Step: 8260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-11 00:05:02,270-Speed 3473.48 samples/sec   Loss 11.3213   LearningRate 0.0843   Epoch: 1   Global Step: 8270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-11 00:05:05,209-Speed 3485.69 samples/sec   Loss 11.2319   LearningRate 0.0843   Epoch: 1   Global Step: 8280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-11 00:05:08,146-Speed 3487.00 samples/sec   Loss 11.0289   LearningRate 0.0843   Epoch: 1   Global Step: 8290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-11 00:05:11,091-Speed 3477.72 samples/sec   Loss 11.2461   LearningRate 0.0843   Epoch: 1   Global Step: 8300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-11 00:05:14,028-Speed 3488.47 samples/sec   Loss 11.3124   LearningRate 0.0842   Epoch: 1   Global Step: 8310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-11 00:05:17,006-Speed 3439.53 samples/sec   Loss 11.2455   LearningRate 0.0842   Epoch: 1   Global Step: 8320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-11 00:05:19,943-Speed 3487.20 samples/sec   Loss 11.2093   LearningRate 0.0842   Epoch: 1   Global Step: 8330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-11 00:05:22,880-Speed 3487.37 samples/sec   Loss 11.3841   LearningRate 0.0842   Epoch: 1   Global Step: 8340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-11 00:05:25,817-Speed 3488.42 samples/sec   Loss 11.0479   LearningRate 0.0842   Epoch: 1   Global Step: 8350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-11 00:05:28,754-Speed 3486.38 samples/sec   Loss 11.1351   LearningRate 0.0842   Epoch: 1   Global Step: 8360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:05:31,711-Speed 3463.53 samples/sec   Loss 11.4995   LearningRate 0.0841   Epoch: 1   Global Step: 8370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:05:34,659-Speed 3475.87 samples/sec   Loss 11.1889   LearningRate 0.0841   Epoch: 1   Global Step: 8380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:05:37,605-Speed 3476.56 samples/sec   Loss 11.2275   LearningRate 0.0841   Epoch: 1   Global Step: 8390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:05:40,593-Speed 3427.72 samples/sec   Loss 11.0607   LearningRate 0.0841   Epoch: 1   Global Step: 8400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:05:43,531-Speed 3485.66 samples/sec   Loss 11.2486   LearningRate 0.0841   Epoch: 1   Global Step: 8410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:05:46,477-Speed 3476.89 samples/sec   Loss 11.2538   LearningRate 0.0840   Epoch: 1   Global Step: 8420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:05:49,446-Speed 3450.35 samples/sec   Loss 11.1443   LearningRate 0.0840   Epoch: 1   Global Step: 8430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:05:52,389-Speed 3481.06 samples/sec   Loss 11.1824   LearningRate 0.0840   Epoch: 1   Global Step: 8440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:05:55,325-Speed 3487.98 samples/sec   Loss 11.0909   LearningRate 0.0840   Epoch: 1   Global Step: 8450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:05:58,264-Speed 3484.93 samples/sec   Loss 11.1965   LearningRate 0.0840   Epoch: 1   Global Step: 8460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:06:01,217-Speed 3468.53 samples/sec   Loss 11.2537   LearningRate 0.0840   Epoch: 1   Global Step: 8470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:06:04,162-Speed 3479.02 samples/sec   Loss 11.1734   LearningRate 0.0839   Epoch: 1   Global Step: 8480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:06:07,128-Speed 3453.48 samples/sec   Loss 11.1457   LearningRate 0.0839   Epoch: 1   Global Step: 8490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:06:10,070-Speed 3481.99 samples/sec   Loss 11.0985   LearningRate 0.0839   Epoch: 1   Global Step: 8500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:06:13,097-Speed 3383.17 samples/sec   Loss 11.1213   LearningRate 0.0839   Epoch: 1   Global Step: 8510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:16,051-Speed 3467.72 samples/sec   Loss 11.1169   LearningRate 0.0839   Epoch: 1   Global Step: 8520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:18,991-Speed 3483.56 samples/sec   Loss 11.0478   LearningRate 0.0838   Epoch: 1   Global Step: 8530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:21,939-Speed 3474.42 samples/sec   Loss 11.0467   LearningRate 0.0838   Epoch: 1   Global Step: 8540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:24,889-Speed 3472.04 samples/sec   Loss 11.0596   LearningRate 0.0838   Epoch: 1   Global Step: 8550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:27,838-Speed 3472.99 samples/sec   Loss 11.3323   LearningRate 0.0838   Epoch: 1   Global Step: 8560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:30,801-Speed 3457.54 samples/sec   Loss 11.1410   LearningRate 0.0838   Epoch: 1   Global Step: 8570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:33,744-Speed 3479.83 samples/sec   Loss 11.1214   LearningRate 0.0838   Epoch: 1   Global Step: 8580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:36,683-Speed 3485.60 samples/sec   Loss 11.2429   LearningRate 0.0837   Epoch: 1   Global Step: 8590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:39,623-Speed 3483.37 samples/sec   Loss 11.0917   LearningRate 0.0837   Epoch: 1   Global Step: 8600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:42,555-Speed 3493.16 samples/sec   Loss 11.3084   LearningRate 0.0837   Epoch: 1   Global Step: 8610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:45,494-Speed 3485.46 samples/sec   Loss 11.0126   LearningRate 0.0837   Epoch: 1   Global Step: 8620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:48,437-Speed 3480.33 samples/sec   Loss 11.0178   LearningRate 0.0837   Epoch: 1   Global Step: 8630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:51,410-Speed 3444.94 samples/sec   Loss 11.2206   LearningRate 0.0836   Epoch: 1   Global Step: 8640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:54,371-Speed 3458.98 samples/sec   Loss 11.1990   LearningRate 0.0836   Epoch: 1   Global Step: 8650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:06:57,320-Speed 3474.28 samples/sec   Loss 11.1074   LearningRate 0.0836   Epoch: 1   Global Step: 8660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:07:00,261-Speed 3482.83 samples/sec   Loss 11.1375   LearningRate 0.0836   Epoch: 1   Global Step: 8670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:07:03,202-Speed 3482.16 samples/sec   Loss 11.1194   LearningRate 0.0836   Epoch: 1   Global Step: 8680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:07:06,145-Speed 3479.93 samples/sec   Loss 11.0757   LearningRate 0.0836   Epoch: 1   Global Step: 8690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:07:09,086-Speed 3482.99 samples/sec   Loss 11.1700   LearningRate 0.0835   Epoch: 1   Global Step: 8700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:07:12,031-Speed 3478.35 samples/sec   Loss 11.2349   LearningRate 0.0835   Epoch: 1   Global Step: 8710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:07:14,977-Speed 3476.87 samples/sec   Loss 11.0493   LearningRate 0.0835   Epoch: 1   Global Step: 8720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:07:18,010-Speed 3377.06 samples/sec   Loss 11.0805   LearningRate 0.0835   Epoch: 1   Global Step: 8730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:07:20,951-Speed 3482.89 samples/sec   Loss 11.2576   LearningRate 0.0835   Epoch: 1   Global Step: 8740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:07:23,897-Speed 3476.72 samples/sec   Loss 11.1401   LearningRate 0.0834   Epoch: 1   Global Step: 8750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:07:26,837-Speed 3484.07 samples/sec   Loss 10.9691   LearningRate 0.0834   Epoch: 1   Global Step: 8760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:07:29,779-Speed 3482.22 samples/sec   Loss 10.9638   LearningRate 0.0834   Epoch: 1   Global Step: 8770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:07:32,718-Speed 3484.27 samples/sec   Loss 11.0016   LearningRate 0.0834   Epoch: 1   Global Step: 8780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:07:35,660-Speed 3481.65 samples/sec   Loss 10.9286   LearningRate 0.0834   Epoch: 1   Global Step: 8790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:07:38,591-Speed 3494.82 samples/sec   Loss 11.2384   LearningRate 0.0834   Epoch: 1   Global Step: 8800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:07:41,530-Speed 3484.73 samples/sec   Loss 11.0215   LearningRate 0.0833   Epoch: 1   Global Step: 8810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:07:44,469-Speed 3484.61 samples/sec   Loss 10.9443   LearningRate 0.0833   Epoch: 1   Global Step: 8820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:07:47,420-Speed 3470.64 samples/sec   Loss 10.8001   LearningRate 0.0833   Epoch: 1   Global Step: 8830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:07:50,369-Speed 3474.04 samples/sec   Loss 11.0068   LearningRate 0.0833   Epoch: 1   Global Step: 8840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:07:53,314-Speed 3478.08 samples/sec   Loss 11.0704   LearningRate 0.0833   Epoch: 1   Global Step: 8850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:07:56,256-Speed 3482.37 samples/sec   Loss 10.8681   LearningRate 0.0833   Epoch: 1   Global Step: 8860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:07:59,200-Speed 3478.82 samples/sec   Loss 11.0944   LearningRate 0.0832   Epoch: 1   Global Step: 8870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:08:02,145-Speed 3477.33 samples/sec   Loss 10.9767   LearningRate 0.0832   Epoch: 1   Global Step: 8880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:08:05,090-Speed 3479.01 samples/sec   Loss 10.9145   LearningRate 0.0832   Epoch: 1   Global Step: 8890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:08:08,032-Speed 3480.90 samples/sec   Loss 10.8843   LearningRate 0.0832   Epoch: 1   Global Step: 8900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:10,971-Speed 3485.13 samples/sec   Loss 11.0924   LearningRate 0.0832   Epoch: 1   Global Step: 8910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:13,920-Speed 3472.69 samples/sec   Loss 11.0196   LearningRate 0.0831   Epoch: 1   Global Step: 8920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:16,864-Speed 3479.84 samples/sec   Loss 10.9484   LearningRate 0.0831   Epoch: 1   Global Step: 8930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:19,811-Speed 3476.32 samples/sec   Loss 11.0023   LearningRate 0.0831   Epoch: 1   Global Step: 8940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:22,748-Speed 3487.24 samples/sec   Loss 11.0219   LearningRate 0.0831   Epoch: 1   Global Step: 8950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:25,693-Speed 3477.52 samples/sec   Loss 10.9338   LearningRate 0.0831   Epoch: 1   Global Step: 8960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:28,636-Speed 3479.96 samples/sec   Loss 11.0331   LearningRate 0.0831   Epoch: 1   Global Step: 8970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:31,581-Speed 3478.00 samples/sec   Loss 11.0181   LearningRate 0.0830   Epoch: 1   Global Step: 8980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:34,525-Speed 3479.31 samples/sec   Loss 10.9242   LearningRate 0.0830   Epoch: 1   Global Step: 8990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:37,465-Speed 3484.48 samples/sec   Loss 10.9845   LearningRate 0.0830   Epoch: 1   Global Step: 9000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:40,414-Speed 3473.35 samples/sec   Loss 11.1768   LearningRate 0.0830   Epoch: 1   Global Step: 9010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:43,356-Speed 3482.11 samples/sec   Loss 10.8713   LearningRate 0.0830   Epoch: 1   Global Step: 9020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:46,299-Speed 3480.17 samples/sec   Loss 10.9125   LearningRate 0.0829   Epoch: 1   Global Step: 9030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:49,280-Speed 3435.28 samples/sec   Loss 11.1757   LearningRate 0.0829   Epoch: 1   Global Step: 9040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:52,228-Speed 3474.13 samples/sec   Loss 10.8795   LearningRate 0.0829   Epoch: 1   Global Step: 9050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:55,174-Speed 3477.41 samples/sec   Loss 10.9134   LearningRate 0.0829   Epoch: 1   Global Step: 9060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:08:58,115-Speed 3482.09 samples/sec   Loss 10.9638   LearningRate 0.0829   Epoch: 1   Global Step: 9070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:01,059-Speed 3480.00 samples/sec   Loss 10.9109   LearningRate 0.0829   Epoch: 1   Global Step: 9080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:04,003-Speed 3478.46 samples/sec   Loss 11.1490   LearningRate 0.0828   Epoch: 1   Global Step: 9090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:06,937-Speed 3491.64 samples/sec   Loss 10.8609   LearningRate 0.0828   Epoch: 1   Global Step: 9100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:09,881-Speed 3479.06 samples/sec   Loss 10.8922   LearningRate 0.0828   Epoch: 1   Global Step: 9110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:12,830-Speed 3472.88 samples/sec   Loss 10.8471   LearningRate 0.0828   Epoch: 1   Global Step: 9120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:15,781-Speed 3471.70 samples/sec   Loss 10.9641   LearningRate 0.0828   Epoch: 1   Global Step: 9130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:18,727-Speed 3475.81 samples/sec   Loss 10.9927   LearningRate 0.0827   Epoch: 1   Global Step: 9140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:21,674-Speed 3475.29 samples/sec   Loss 10.8262   LearningRate 0.0827   Epoch: 1   Global Step: 9150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:24,618-Speed 3479.92 samples/sec   Loss 10.9571   LearningRate 0.0827   Epoch: 1   Global Step: 9160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:27,569-Speed 3470.16 samples/sec   Loss 10.8033   LearningRate 0.0827   Epoch: 1   Global Step: 9170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:30,512-Speed 3481.44 samples/sec   Loss 10.7804   LearningRate 0.0827   Epoch: 1   Global Step: 9180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:33,459-Speed 3475.11 samples/sec   Loss 10.8388   LearningRate 0.0827   Epoch: 1   Global Step: 9190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:36,392-Speed 3492.68 samples/sec   Loss 10.8360   LearningRate 0.0826   Epoch: 1   Global Step: 9200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:39,340-Speed 3474.47 samples/sec   Loss 11.0047   LearningRate 0.0826   Epoch: 1   Global Step: 9210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:42,290-Speed 3471.44 samples/sec   Loss 10.8708   LearningRate 0.0826   Epoch: 1   Global Step: 9220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:45,240-Speed 3473.27 samples/sec   Loss 10.8828   LearningRate 0.0826   Epoch: 1   Global Step: 9230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:48,186-Speed 3475.96 samples/sec   Loss 10.7763   LearningRate 0.0826   Epoch: 1   Global Step: 9240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:51,141-Speed 3466.18 samples/sec   Loss 10.9823   LearningRate 0.0825   Epoch: 1   Global Step: 9250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:54,097-Speed 3465.36 samples/sec   Loss 10.8639   LearningRate 0.0825   Epoch: 1   Global Step: 9260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:09:57,055-Speed 3461.97 samples/sec   Loss 10.7900   LearningRate 0.0825   Epoch: 1   Global Step: 9270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:00,092-Speed 3373.28 samples/sec   Loss 10.6864   LearningRate 0.0825   Epoch: 1   Global Step: 9280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:03,042-Speed 3471.88 samples/sec   Loss 10.8869   LearningRate 0.0825   Epoch: 1   Global Step: 9290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:05,989-Speed 3476.89 samples/sec   Loss 10.7884   LearningRate 0.0825   Epoch: 1   Global Step: 9300   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 00:10:08,928-Speed 3484.46 samples/sec   Loss 10.9562   LearningRate 0.0824   Epoch: 1   Global Step: 9310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:11,871-Speed 3479.93 samples/sec   Loss 10.8761   LearningRate 0.0824   Epoch: 1   Global Step: 9320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:14,820-Speed 3473.06 samples/sec   Loss 10.7988   LearningRate 0.0824   Epoch: 1   Global Step: 9330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:17,768-Speed 3475.16 samples/sec   Loss 10.8431   LearningRate 0.0824   Epoch: 1   Global Step: 9340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:20,718-Speed 3471.97 samples/sec   Loss 10.7508   LearningRate 0.0824   Epoch: 1   Global Step: 9350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:23,664-Speed 3476.04 samples/sec   Loss 11.0304   LearningRate 0.0824   Epoch: 1   Global Step: 9360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:26,614-Speed 3473.20 samples/sec   Loss 10.7336   LearningRate 0.0823   Epoch: 1   Global Step: 9370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:29,564-Speed 3472.44 samples/sec   Loss 10.8956   LearningRate 0.0823   Epoch: 1   Global Step: 9380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:32,508-Speed 3478.24 samples/sec   Loss 10.9184   LearningRate 0.0823   Epoch: 1   Global Step: 9390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:35,472-Speed 3455.78 samples/sec   Loss 10.7233   LearningRate 0.0823   Epoch: 1   Global Step: 9400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:38,429-Speed 3463.99 samples/sec   Loss 10.8200   LearningRate 0.0823   Epoch: 1   Global Step: 9410   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 00:10:41,366-Speed 3487.92 samples/sec   Loss 10.7712   LearningRate 0.0822   Epoch: 1   Global Step: 9420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:44,316-Speed 3471.25 samples/sec   Loss 10.8268   LearningRate 0.0822   Epoch: 1   Global Step: 9430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:10:47,253-Speed 3487.84 samples/sec   Loss 11.1065   LearningRate 0.0822   Epoch: 1   Global Step: 9440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:10:50,210-Speed 3463.54 samples/sec   Loss 10.6985   LearningRate 0.0822   Epoch: 1   Global Step: 9450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:10:53,182-Speed 3447.30 samples/sec   Loss 10.9344   LearningRate 0.0822   Epoch: 1   Global Step: 9460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:10:56,128-Speed 3476.89 samples/sec   Loss 10.7886   LearningRate 0.0822   Epoch: 1   Global Step: 9470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:10:59,075-Speed 3475.59 samples/sec   Loss 10.6894   LearningRate 0.0821   Epoch: 1   Global Step: 9480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:11:02,023-Speed 3473.91 samples/sec   Loss 10.8038   LearningRate 0.0821   Epoch: 1   Global Step: 9490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:11:04,964-Speed 3482.58 samples/sec   Loss 10.8463   LearningRate 0.0821   Epoch: 1   Global Step: 9500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:11:07,912-Speed 3474.17 samples/sec   Loss 10.7818   LearningRate 0.0821   Epoch: 1   Global Step: 9510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:11:10,861-Speed 3473.91 samples/sec   Loss 10.7658   LearningRate 0.0821   Epoch: 1   Global Step: 9520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:11:13,808-Speed 3475.93 samples/sec   Loss 10.7793   LearningRate 0.0820   Epoch: 1   Global Step: 9530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:11:16,759-Speed 3469.65 samples/sec   Loss 10.6763   LearningRate 0.0820   Epoch: 1   Global Step: 9540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:11:19,708-Speed 3474.25 samples/sec   Loss 10.6945   LearningRate 0.0820   Epoch: 1   Global Step: 9550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:11:22,654-Speed 3476.25 samples/sec   Loss 10.8963   LearningRate 0.0820   Epoch: 1   Global Step: 9560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:11:25,607-Speed 3469.84 samples/sec   Loss 10.5752   LearningRate 0.0820   Epoch: 1   Global Step: 9570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:11:28,551-Speed 3478.42 samples/sec   Loss 10.9165   LearningRate 0.0820   Epoch: 1   Global Step: 9580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:11:31,514-Speed 3456.72 samples/sec   Loss 11.0723   LearningRate 0.0819   Epoch: 1   Global Step: 9590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:11:34,468-Speed 3467.13 samples/sec   Loss 10.8353   LearningRate 0.0819   Epoch: 1   Global Step: 9600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:11:37,412-Speed 3478.91 samples/sec   Loss 10.6384   LearningRate 0.0819   Epoch: 1   Global Step: 9610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:11:40,391-Speed 3439.00 samples/sec   Loss 10.6024   LearningRate 0.0819   Epoch: 1   Global Step: 9620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:11:43,355-Speed 3455.71 samples/sec   Loss 10.5457   LearningRate 0.0819   Epoch: 1   Global Step: 9630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:11:46,294-Speed 3484.86 samples/sec   Loss 10.8110   LearningRate 0.0818   Epoch: 1   Global Step: 9640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:11:49,242-Speed 3475.11 samples/sec   Loss 10.7032   LearningRate 0.0818   Epoch: 1   Global Step: 9650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:11:52,189-Speed 3475.43 samples/sec   Loss 10.8520   LearningRate 0.0818   Epoch: 1   Global Step: 9660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:11:55,132-Speed 3480.20 samples/sec   Loss 10.6608   LearningRate 0.0818   Epoch: 1   Global Step: 9670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:11:58,080-Speed 3474.48 samples/sec   Loss 10.6234   LearningRate 0.0818   Epoch: 1   Global Step: 9680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:01,025-Speed 3478.06 samples/sec   Loss 10.7683   LearningRate 0.0818   Epoch: 1   Global Step: 9690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:03,971-Speed 3476.31 samples/sec   Loss 10.8788   LearningRate 0.0817   Epoch: 1   Global Step: 9700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:06,916-Speed 3478.15 samples/sec   Loss 10.6470   LearningRate 0.0817   Epoch: 1   Global Step: 9710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:09,863-Speed 3475.67 samples/sec   Loss 10.7109   LearningRate 0.0817   Epoch: 1   Global Step: 9720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:12,812-Speed 3472.98 samples/sec   Loss 10.5228   LearningRate 0.0817   Epoch: 1   Global Step: 9730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:15,776-Speed 3456.48 samples/sec   Loss 10.6189   LearningRate 0.0817   Epoch: 1   Global Step: 9740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:18,724-Speed 3474.30 samples/sec   Loss 10.6864   LearningRate 0.0817   Epoch: 1   Global Step: 9750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:21,675-Speed 3470.38 samples/sec   Loss 10.5862   LearningRate 0.0816   Epoch: 1   Global Step: 9760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:24,617-Speed 3481.72 samples/sec   Loss 10.5983   LearningRate 0.0816   Epoch: 1   Global Step: 9770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:27,565-Speed 3474.63 samples/sec   Loss 10.6979   LearningRate 0.0816   Epoch: 1   Global Step: 9780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:30,516-Speed 3470.41 samples/sec   Loss 10.6763   LearningRate 0.0816   Epoch: 1   Global Step: 9790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:33,464-Speed 3474.98 samples/sec   Loss 10.8362   LearningRate 0.0816   Epoch: 1   Global Step: 9800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:36,405-Speed 3482.36 samples/sec   Loss 10.6316   LearningRate 0.0815   Epoch: 1   Global Step: 9810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:39,352-Speed 3476.14 samples/sec   Loss 10.6581   LearningRate 0.0815   Epoch: 1   Global Step: 9820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:42,298-Speed 3477.20 samples/sec   Loss 10.5686   LearningRate 0.0815   Epoch: 1   Global Step: 9830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:45,243-Speed 3478.40 samples/sec   Loss 10.6004   LearningRate 0.0815   Epoch: 1   Global Step: 9840   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 00:12:48,187-Speed 3479.11 samples/sec   Loss 10.7134   LearningRate 0.0815   Epoch: 1   Global Step: 9850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:51,131-Speed 3478.25 samples/sec   Loss 10.8971   LearningRate 0.0815   Epoch: 1   Global Step: 9860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:54,088-Speed 3464.40 samples/sec   Loss 10.6830   LearningRate 0.0814   Epoch: 1   Global Step: 9870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:12:57,053-Speed 3454.71 samples/sec   Loss 10.7138   LearningRate 0.0814   Epoch: 1   Global Step: 9880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:13:00,001-Speed 3473.65 samples/sec   Loss 10.7532   LearningRate 0.0814   Epoch: 1   Global Step: 9890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:13:02,969-Speed 3452.01 samples/sec   Loss 10.6971   LearningRate 0.0814   Epoch: 1   Global Step: 9900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:13:05,905-Speed 3488.35 samples/sec   Loss 10.7690   LearningRate 0.0814   Epoch: 1   Global Step: 9910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:13:08,849-Speed 3479.37 samples/sec   Loss 10.5667   LearningRate 0.0813   Epoch: 1   Global Step: 9920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:13:11,794-Speed 3478.11 samples/sec   Loss 10.5397   LearningRate 0.0813   Epoch: 1   Global Step: 9930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:13:14,752-Speed 3461.88 samples/sec   Loss 10.8904   LearningRate 0.0813   Epoch: 1   Global Step: 9940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:13:17,704-Speed 3470.37 samples/sec   Loss 10.6242   LearningRate 0.0813   Epoch: 1   Global Step: 9950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:13:20,646-Speed 3481.79 samples/sec   Loss 10.5891   LearningRate 0.0813   Epoch: 1   Global Step: 9960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:13:23,607-Speed 3458.19 samples/sec   Loss 10.4777   LearningRate 0.0813   Epoch: 1   Global Step: 9970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:13:26,554-Speed 3476.26 samples/sec   Loss 10.6451   LearningRate 0.0812   Epoch: 1   Global Step: 9980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:13:29,500-Speed 3477.27 samples/sec   Loss 10.7612   LearningRate 0.0812   Epoch: 1   Global Step: 9990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:13:32,443-Speed 3480.05 samples/sec   Loss 10.6047   LearningRate 0.0812   Epoch: 1   Global Step: 10000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:14:16,575-[lfw][10000]XNorm: 21.927683
Training: 2022-04-11 00:14:16,576-[lfw][10000]Accuracy-Flip: 0.99517+-0.00320
Training: 2022-04-11 00:14:16,576-[lfw][10000]Accuracy-Highest: 0.99517
Training: 2022-04-11 00:15:07,799-[cfp_fp][10000]XNorm: 18.705165
Training: 2022-04-11 00:15:07,800-[cfp_fp][10000]Accuracy-Flip: 0.91743+-0.01279
Training: 2022-04-11 00:15:07,800-[cfp_fp][10000]Accuracy-Highest: 0.92186
Training: 2022-04-11 00:15:51,796-[agedb_30][10000]XNorm: 21.474425
Training: 2022-04-11 00:15:51,796-[agedb_30][10000]Accuracy-Flip: 0.96000+-0.00913
Training: 2022-04-11 00:15:51,797-[agedb_30][10000]Accuracy-Highest: 0.96000
Training: 2022-04-11 00:15:54,743-Speed 71.96 samples/sec   Loss 10.5565   LearningRate 0.0812   Epoch: 1   Global Step: 10010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:15:57,670-Speed 3499.08 samples/sec   Loss 10.5759   LearningRate 0.0812   Epoch: 1   Global Step: 10020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:16:00,594-Speed 3503.00 samples/sec   Loss 10.6135   LearningRate 0.0812   Epoch: 1   Global Step: 10030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:16:03,532-Speed 3487.30 samples/sec   Loss 10.7502   LearningRate 0.0811   Epoch: 1   Global Step: 10040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:16:06,459-Speed 3499.10 samples/sec   Loss 10.6382   LearningRate 0.0811   Epoch: 1   Global Step: 10050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:16:09,390-Speed 3494.53 samples/sec   Loss 10.4997   LearningRate 0.0811   Epoch: 1   Global Step: 10060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:16:12,318-Speed 3498.81 samples/sec   Loss 10.5600   LearningRate 0.0811   Epoch: 1   Global Step: 10070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:16:15,265-Speed 3474.59 samples/sec   Loss 10.5732   LearningRate 0.0811   Epoch: 1   Global Step: 10080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:16:18,198-Speed 3493.10 samples/sec   Loss 10.5349   LearningRate 0.0810   Epoch: 1   Global Step: 10090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:16:21,128-Speed 3495.71 samples/sec   Loss 10.6567   LearningRate 0.0810   Epoch: 1   Global Step: 10100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:16:24,102-Speed 3443.50 samples/sec   Loss 10.4294   LearningRate 0.0810   Epoch: 1   Global Step: 10110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:16:37,672-Speed 754.70 samples/sec   Loss 10.1366   LearningRate 0.0810   Epoch: 2   Global Step: 10120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:16:40,634-Speed 3458.63 samples/sec   Loss 9.6732   LearningRate 0.0810   Epoch: 2   Global Step: 10130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:16:43,584-Speed 3472.39 samples/sec   Loss 9.9733   LearningRate 0.0810   Epoch: 2   Global Step: 10140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:16:46,542-Speed 3462.91 samples/sec   Loss 10.0234   LearningRate 0.0809   Epoch: 2   Global Step: 10150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:16:49,500-Speed 3463.31 samples/sec   Loss 9.7080   LearningRate 0.0809   Epoch: 2   Global Step: 10160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:16:52,456-Speed 3465.04 samples/sec   Loss 9.6377   LearningRate 0.0809   Epoch: 2   Global Step: 10170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:16:55,405-Speed 3473.58 samples/sec   Loss 9.8527   LearningRate 0.0809   Epoch: 2   Global Step: 10180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:16:58,382-Speed 3440.48 samples/sec   Loss 9.8739   LearningRate 0.0809   Epoch: 2   Global Step: 10190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:17:01,331-Speed 3473.65 samples/sec   Loss 9.7343   LearningRate 0.0809   Epoch: 2   Global Step: 10200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:17:04,270-Speed 3484.89 samples/sec   Loss 9.8940   LearningRate 0.0808   Epoch: 2   Global Step: 10210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:17:07,215-Speed 3477.82 samples/sec   Loss 9.9619   LearningRate 0.0808   Epoch: 2   Global Step: 10220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:17:10,161-Speed 3476.84 samples/sec   Loss 9.8723   LearningRate 0.0808   Epoch: 2   Global Step: 10230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:17:13,098-Speed 3487.64 samples/sec   Loss 9.9105   LearningRate 0.0808   Epoch: 2   Global Step: 10240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:17:16,037-Speed 3485.28 samples/sec   Loss 9.6189   LearningRate 0.0808   Epoch: 2   Global Step: 10250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:17:19,000-Speed 3457.46 samples/sec   Loss 9.7835   LearningRate 0.0807   Epoch: 2   Global Step: 10260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:17:21,942-Speed 3480.83 samples/sec   Loss 9.8646   LearningRate 0.0807   Epoch: 2   Global Step: 10270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:17:24,892-Speed 3472.55 samples/sec   Loss 9.9167   LearningRate 0.0807   Epoch: 2   Global Step: 10280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:17:27,848-Speed 3464.80 samples/sec   Loss 9.9956   LearningRate 0.0807   Epoch: 2   Global Step: 10290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:17:30,784-Speed 3488.60 samples/sec   Loss 9.9784   LearningRate 0.0807   Epoch: 2   Global Step: 10300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:17:33,710-Speed 3501.40 samples/sec   Loss 9.9448   LearningRate 0.0807   Epoch: 2   Global Step: 10310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:17:36,653-Speed 3480.04 samples/sec   Loss 9.9315   LearningRate 0.0806   Epoch: 2   Global Step: 10320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:17:39,609-Speed 3465.18 samples/sec   Loss 9.9768   LearningRate 0.0806   Epoch: 2   Global Step: 10330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:17:42,548-Speed 3484.26 samples/sec   Loss 10.0251   LearningRate 0.0806   Epoch: 2   Global Step: 10340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:17:45,490-Speed 3481.03 samples/sec   Loss 10.0981   LearningRate 0.0806   Epoch: 2   Global Step: 10350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:17:48,431-Speed 3484.04 samples/sec   Loss 10.0712   LearningRate 0.0806   Epoch: 2   Global Step: 10360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:17:51,378-Speed 3475.70 samples/sec   Loss 10.0176   LearningRate 0.0805   Epoch: 2   Global Step: 10370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:17:54,323-Speed 3478.12 samples/sec   Loss 10.1186   LearningRate 0.0805   Epoch: 2   Global Step: 10380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:17:57,265-Speed 3480.49 samples/sec   Loss 9.9449   LearningRate 0.0805   Epoch: 2   Global Step: 10390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:18:00,209-Speed 3479.57 samples/sec   Loss 10.0306   LearningRate 0.0805   Epoch: 2   Global Step: 10400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:18:03,161-Speed 3470.16 samples/sec   Loss 10.1588   LearningRate 0.0805   Epoch: 2   Global Step: 10410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:18:06,105-Speed 3478.75 samples/sec   Loss 9.9971   LearningRate 0.0805   Epoch: 2   Global Step: 10420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:18:09,049-Speed 3479.59 samples/sec   Loss 10.1750   LearningRate 0.0804   Epoch: 2   Global Step: 10430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:18:11,992-Speed 3480.39 samples/sec   Loss 10.1048   LearningRate 0.0804   Epoch: 2   Global Step: 10440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:18:14,938-Speed 3476.92 samples/sec   Loss 10.0031   LearningRate 0.0804   Epoch: 2   Global Step: 10450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:18:17,881-Speed 3480.67 samples/sec   Loss 9.9977   LearningRate 0.0804   Epoch: 2   Global Step: 10460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:18:20,824-Speed 3479.88 samples/sec   Loss 9.8892   LearningRate 0.0804   Epoch: 2   Global Step: 10470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:18:23,803-Speed 3438.06 samples/sec   Loss 9.9977   LearningRate 0.0804   Epoch: 2   Global Step: 10480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:18:26,747-Speed 3479.46 samples/sec   Loss 9.9510   LearningRate 0.0803   Epoch: 2   Global Step: 10490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:18:29,708-Speed 3459.83 samples/sec   Loss 10.1474   LearningRate 0.0803   Epoch: 2   Global Step: 10500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:18:32,656-Speed 3473.50 samples/sec   Loss 10.0370   LearningRate 0.0803   Epoch: 2   Global Step: 10510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:18:35,602-Speed 3476.76 samples/sec   Loss 10.0398   LearningRate 0.0803   Epoch: 2   Global Step: 10520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:18:38,550-Speed 3475.32 samples/sec   Loss 10.0722   LearningRate 0.0803   Epoch: 2   Global Step: 10530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:18:41,513-Speed 3456.89 samples/sec   Loss 10.3676   LearningRate 0.0802   Epoch: 2   Global Step: 10540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:18:44,455-Speed 3481.86 samples/sec   Loss 10.2310   LearningRate 0.0802   Epoch: 2   Global Step: 10550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:18:47,399-Speed 3478.69 samples/sec   Loss 10.1496   LearningRate 0.0802   Epoch: 2   Global Step: 10560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:18:50,326-Speed 3499.07 samples/sec   Loss 10.2190   LearningRate 0.0802   Epoch: 2   Global Step: 10570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:18:53,270-Speed 3479.04 samples/sec   Loss 10.2912   LearningRate 0.0802   Epoch: 2   Global Step: 10580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:18:56,211-Speed 3483.19 samples/sec   Loss 10.2302   LearningRate 0.0802   Epoch: 2   Global Step: 10590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:18:59,155-Speed 3479.06 samples/sec   Loss 10.1436   LearningRate 0.0801   Epoch: 2   Global Step: 10600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:19:02,106-Speed 3470.66 samples/sec   Loss 10.2062   LearningRate 0.0801   Epoch: 2   Global Step: 10610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:19:05,061-Speed 3465.94 samples/sec   Loss 10.2153   LearningRate 0.0801   Epoch: 2   Global Step: 10620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:19:08,016-Speed 3466.20 samples/sec   Loss 10.1390   LearningRate 0.0801   Epoch: 2   Global Step: 10630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:19:10,957-Speed 3483.51 samples/sec   Loss 10.1548   LearningRate 0.0801   Epoch: 2   Global Step: 10640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:19:13,905-Speed 3474.36 samples/sec   Loss 10.2886   LearningRate 0.0801   Epoch: 2   Global Step: 10650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:19:16,847-Speed 3481.35 samples/sec   Loss 10.0659   LearningRate 0.0800   Epoch: 2   Global Step: 10660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:19:19,787-Speed 3484.28 samples/sec   Loss 10.1937   LearningRate 0.0800   Epoch: 2   Global Step: 10670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:19:22,725-Speed 3486.03 samples/sec   Loss 10.3913   LearningRate 0.0800   Epoch: 2   Global Step: 10680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:19:25,668-Speed 3479.99 samples/sec   Loss 10.2654   LearningRate 0.0800   Epoch: 2   Global Step: 10690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:19:28,615-Speed 3476.53 samples/sec   Loss 10.3601   LearningRate 0.0800   Epoch: 2   Global Step: 10700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:19:31,565-Speed 3472.10 samples/sec   Loss 10.2640   LearningRate 0.0799   Epoch: 2   Global Step: 10710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:19:34,508-Speed 3479.93 samples/sec   Loss 10.2876   LearningRate 0.0799   Epoch: 2   Global Step: 10720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:19:37,454-Speed 3477.52 samples/sec   Loss 10.2673   LearningRate 0.0799   Epoch: 2   Global Step: 10730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:19:40,400-Speed 3476.66 samples/sec   Loss 10.2056   LearningRate 0.0799   Epoch: 2   Global Step: 10740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:19:43,356-Speed 3464.40 samples/sec   Loss 10.2850   LearningRate 0.0799   Epoch: 2   Global Step: 10750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:19:46,299-Speed 3480.27 samples/sec   Loss 10.0498   LearningRate 0.0799   Epoch: 2   Global Step: 10760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:19:49,232-Speed 3492.51 samples/sec   Loss 10.2209   LearningRate 0.0798   Epoch: 2   Global Step: 10770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:19:52,176-Speed 3479.13 samples/sec   Loss 10.3262   LearningRate 0.0798   Epoch: 2   Global Step: 10780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:19:55,129-Speed 3469.26 samples/sec   Loss 10.2149   LearningRate 0.0798   Epoch: 2   Global Step: 10790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:19:58,069-Speed 3483.31 samples/sec   Loss 10.1068   LearningRate 0.0798   Epoch: 2   Global Step: 10800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:01,019-Speed 3471.90 samples/sec   Loss 10.1653   LearningRate 0.0798   Epoch: 2   Global Step: 10810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:03,973-Speed 3468.25 samples/sec   Loss 10.3409   LearningRate 0.0798   Epoch: 2   Global Step: 10820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:06,913-Speed 3483.57 samples/sec   Loss 10.2709   LearningRate 0.0797   Epoch: 2   Global Step: 10830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:09,855-Speed 3481.43 samples/sec   Loss 10.0901   LearningRate 0.0797   Epoch: 2   Global Step: 10840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:12,803-Speed 3474.29 samples/sec   Loss 10.1778   LearningRate 0.0797   Epoch: 2   Global Step: 10850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:15,748-Speed 3477.40 samples/sec   Loss 10.3112   LearningRate 0.0797   Epoch: 2   Global Step: 10860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:18,697-Speed 3473.84 samples/sec   Loss 10.1159   LearningRate 0.0797   Epoch: 2   Global Step: 10870   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 00:20:21,630-Speed 3492.55 samples/sec   Loss 10.1064   LearningRate 0.0796   Epoch: 2   Global Step: 10880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:24,582-Speed 3469.73 samples/sec   Loss 10.1144   LearningRate 0.0796   Epoch: 2   Global Step: 10890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:27,527-Speed 3477.60 samples/sec   Loss 10.1725   LearningRate 0.0796   Epoch: 2   Global Step: 10900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:30,469-Speed 3481.89 samples/sec   Loss 10.2907   LearningRate 0.0796   Epoch: 2   Global Step: 10910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:33,416-Speed 3476.18 samples/sec   Loss 10.0941   LearningRate 0.0796   Epoch: 2   Global Step: 10920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:36,356-Speed 3482.81 samples/sec   Loss 10.1132   LearningRate 0.0796   Epoch: 2   Global Step: 10930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:39,300-Speed 3479.79 samples/sec   Loss 10.3284   LearningRate 0.0795   Epoch: 2   Global Step: 10940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:42,244-Speed 3479.01 samples/sec   Loss 10.0877   LearningRate 0.0795   Epoch: 2   Global Step: 10950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:45,195-Speed 3471.26 samples/sec   Loss 10.1720   LearningRate 0.0795   Epoch: 2   Global Step: 10960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:48,139-Speed 3478.70 samples/sec   Loss 10.1693   LearningRate 0.0795   Epoch: 2   Global Step: 10970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:51,095-Speed 3464.39 samples/sec   Loss 10.2850   LearningRate 0.0795   Epoch: 2   Global Step: 10980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:54,049-Speed 3468.09 samples/sec   Loss 10.3233   LearningRate 0.0795   Epoch: 2   Global Step: 10990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:56,994-Speed 3478.58 samples/sec   Loss 10.2199   LearningRate 0.0794   Epoch: 2   Global Step: 11000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:20:59,937-Speed 3479.80 samples/sec   Loss 10.2208   LearningRate 0.0794   Epoch: 2   Global Step: 11010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:21:02,886-Speed 3473.94 samples/sec   Loss 10.1829   LearningRate 0.0794   Epoch: 2   Global Step: 11020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:21:05,885-Speed 3414.32 samples/sec   Loss 10.2947   LearningRate 0.0794   Epoch: 2   Global Step: 11030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:21:08,830-Speed 3478.93 samples/sec   Loss 10.1567   LearningRate 0.0794   Epoch: 2   Global Step: 11040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:21:11,781-Speed 3470.41 samples/sec   Loss 10.3202   LearningRate 0.0793   Epoch: 2   Global Step: 11050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:21:14,742-Speed 3459.23 samples/sec   Loss 10.1414   LearningRate 0.0793   Epoch: 2   Global Step: 11060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:21:17,708-Speed 3453.71 samples/sec   Loss 10.0655   LearningRate 0.0793   Epoch: 2   Global Step: 11070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:21:20,660-Speed 3468.86 samples/sec   Loss 10.3676   LearningRate 0.0793   Epoch: 2   Global Step: 11080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:21:23,603-Speed 3480.74 samples/sec   Loss 10.1590   LearningRate 0.0793   Epoch: 2   Global Step: 11090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:21:26,551-Speed 3474.33 samples/sec   Loss 10.1222   LearningRate 0.0793   Epoch: 2   Global Step: 11100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:21:29,496-Speed 3478.65 samples/sec   Loss 10.1634   LearningRate 0.0792   Epoch: 2   Global Step: 11110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:21:32,437-Speed 3482.98 samples/sec   Loss 10.2929   LearningRate 0.0792   Epoch: 2   Global Step: 11120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:21:35,386-Speed 3472.97 samples/sec   Loss 10.2778   LearningRate 0.0792   Epoch: 2   Global Step: 11130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:21:38,331-Speed 3477.80 samples/sec   Loss 10.1859   LearningRate 0.0792   Epoch: 2   Global Step: 11140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:21:41,284-Speed 3467.82 samples/sec   Loss 10.2221   LearningRate 0.0792   Epoch: 2   Global Step: 11150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:21:44,227-Speed 3480.86 samples/sec   Loss 10.1977   LearningRate 0.0792   Epoch: 2   Global Step: 11160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:21:47,165-Speed 3486.22 samples/sec   Loss 10.1989   LearningRate 0.0791   Epoch: 2   Global Step: 11170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:21:50,112-Speed 3476.36 samples/sec   Loss 10.1648   LearningRate 0.0791   Epoch: 2   Global Step: 11180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:21:53,057-Speed 3477.14 samples/sec   Loss 10.1744   LearningRate 0.0791   Epoch: 2   Global Step: 11190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:21:56,002-Speed 3478.74 samples/sec   Loss 10.1891   LearningRate 0.0791   Epoch: 2   Global Step: 11200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:21:58,958-Speed 3464.96 samples/sec   Loss 10.3090   LearningRate 0.0791   Epoch: 2   Global Step: 11210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:22:01,900-Speed 3480.56 samples/sec   Loss 10.1413   LearningRate 0.0790   Epoch: 2   Global Step: 11220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:22:04,846-Speed 3477.36 samples/sec   Loss 10.1574   LearningRate 0.0790   Epoch: 2   Global Step: 11230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:22:07,788-Speed 3481.47 samples/sec   Loss 10.1244   LearningRate 0.0790   Epoch: 2   Global Step: 11240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:22:10,745-Speed 3464.43 samples/sec   Loss 10.2755   LearningRate 0.0790   Epoch: 2   Global Step: 11250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:22:13,712-Speed 3452.08 samples/sec   Loss 10.2731   LearningRate 0.0790   Epoch: 2   Global Step: 11260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:22:16,661-Speed 3472.83 samples/sec   Loss 10.3630   LearningRate 0.0790   Epoch: 2   Global Step: 11270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:22:19,612-Speed 3472.17 samples/sec   Loss 10.1987   LearningRate 0.0789   Epoch: 2   Global Step: 11280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:22:22,564-Speed 3469.23 samples/sec   Loss 10.2661   LearningRate 0.0789   Epoch: 2   Global Step: 11290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:22:25,515-Speed 3470.30 samples/sec   Loss 10.1114   LearningRate 0.0789   Epoch: 2   Global Step: 11300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:22:28,459-Speed 3480.12 samples/sec   Loss 10.0523   LearningRate 0.0789   Epoch: 2   Global Step: 11310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:22:31,411-Speed 3469.21 samples/sec   Loss 10.1899   LearningRate 0.0789   Epoch: 2   Global Step: 11320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:22:34,360-Speed 3473.50 samples/sec   Loss 10.1066   LearningRate 0.0789   Epoch: 2   Global Step: 11330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:22:37,309-Speed 3472.83 samples/sec   Loss 10.2930   LearningRate 0.0788   Epoch: 2   Global Step: 11340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:22:40,262-Speed 3468.25 samples/sec   Loss 10.2239   LearningRate 0.0788   Epoch: 2   Global Step: 11350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:22:43,212-Speed 3471.67 samples/sec   Loss 10.1424   LearningRate 0.0788   Epoch: 2   Global Step: 11360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:22:46,149-Speed 3488.50 samples/sec   Loss 10.1483   LearningRate 0.0788   Epoch: 2   Global Step: 11370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:22:49,099-Speed 3472.65 samples/sec   Loss 10.1128   LearningRate 0.0788   Epoch: 2   Global Step: 11380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:22:52,203-Speed 3299.81 samples/sec   Loss 10.2886   LearningRate 0.0787   Epoch: 2   Global Step: 11390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:22:55,192-Speed 3426.65 samples/sec   Loss 10.3126   LearningRate 0.0787   Epoch: 2   Global Step: 11400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:22:58,156-Speed 3455.27 samples/sec   Loss 10.2704   LearningRate 0.0787   Epoch: 2   Global Step: 11410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:01,109-Speed 3468.29 samples/sec   Loss 10.3630   LearningRate 0.0787   Epoch: 2   Global Step: 11420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:04,066-Speed 3464.36 samples/sec   Loss 10.1821   LearningRate 0.0787   Epoch: 2   Global Step: 11430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:07,018-Speed 3469.47 samples/sec   Loss 10.0344   LearningRate 0.0787   Epoch: 2   Global Step: 11440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:09,984-Speed 3453.23 samples/sec   Loss 10.3817   LearningRate 0.0786   Epoch: 2   Global Step: 11450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:12,938-Speed 3467.59 samples/sec   Loss 10.2356   LearningRate 0.0786   Epoch: 2   Global Step: 11460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:15,891-Speed 3468.53 samples/sec   Loss 10.1494   LearningRate 0.0786   Epoch: 2   Global Step: 11470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:18,843-Speed 3469.77 samples/sec   Loss 10.3678   LearningRate 0.0786   Epoch: 2   Global Step: 11480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:21,793-Speed 3472.58 samples/sec   Loss 10.2936   LearningRate 0.0786   Epoch: 2   Global Step: 11490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:24,734-Speed 3482.21 samples/sec   Loss 10.1895   LearningRate 0.0786   Epoch: 2   Global Step: 11500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:27,690-Speed 3464.72 samples/sec   Loss 10.3514   LearningRate 0.0785   Epoch: 2   Global Step: 11510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:30,639-Speed 3472.75 samples/sec   Loss 10.3094   LearningRate 0.0785   Epoch: 2   Global Step: 11520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:33,586-Speed 3476.14 samples/sec   Loss 10.0682   LearningRate 0.0785   Epoch: 2   Global Step: 11530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:36,531-Speed 3477.69 samples/sec   Loss 10.0601   LearningRate 0.0785   Epoch: 2   Global Step: 11540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:39,484-Speed 3469.17 samples/sec   Loss 10.0765   LearningRate 0.0785   Epoch: 2   Global Step: 11550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:42,451-Speed 3451.92 samples/sec   Loss 10.0818   LearningRate 0.0785   Epoch: 2   Global Step: 11560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:45,390-Speed 3485.26 samples/sec   Loss 10.0914   LearningRate 0.0784   Epoch: 2   Global Step: 11570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:48,338-Speed 3474.37 samples/sec   Loss 10.0568   LearningRate 0.0784   Epoch: 2   Global Step: 11580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:51,288-Speed 3472.45 samples/sec   Loss 10.0414   LearningRate 0.0784   Epoch: 2   Global Step: 11590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:54,248-Speed 3460.72 samples/sec   Loss 10.2063   LearningRate 0.0784   Epoch: 2   Global Step: 11600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:23:57,192-Speed 3477.92 samples/sec   Loss 10.2756   LearningRate 0.0784   Epoch: 2   Global Step: 11610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:00,152-Speed 3460.69 samples/sec   Loss 10.0915   LearningRate 0.0783   Epoch: 2   Global Step: 11620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:03,119-Speed 3452.66 samples/sec   Loss 10.1502   LearningRate 0.0783   Epoch: 2   Global Step: 11630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:06,067-Speed 3473.79 samples/sec   Loss 10.0497   LearningRate 0.0783   Epoch: 2   Global Step: 11640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:09,025-Speed 3463.36 samples/sec   Loss 10.1324   LearningRate 0.0783   Epoch: 2   Global Step: 11650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:11,985-Speed 3460.72 samples/sec   Loss 10.0955   LearningRate 0.0783   Epoch: 2   Global Step: 11660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:14,927-Speed 3481.25 samples/sec   Loss 10.0003   LearningRate 0.0783   Epoch: 2   Global Step: 11670   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 00:24:17,865-Speed 3485.80 samples/sec   Loss 10.0109   LearningRate 0.0782   Epoch: 2   Global Step: 11680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:20,810-Speed 3478.64 samples/sec   Loss 10.1873   LearningRate 0.0782   Epoch: 2   Global Step: 11690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:23,778-Speed 3450.51 samples/sec   Loss 10.1648   LearningRate 0.0782   Epoch: 2   Global Step: 11700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:26,743-Speed 3455.09 samples/sec   Loss 10.1423   LearningRate 0.0782   Epoch: 2   Global Step: 11710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:29,689-Speed 3477.18 samples/sec   Loss 10.1896   LearningRate 0.0782   Epoch: 2   Global Step: 11720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:32,640-Speed 3469.98 samples/sec   Loss 10.1062   LearningRate 0.0782   Epoch: 2   Global Step: 11730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:35,587-Speed 3476.33 samples/sec   Loss 10.0507   LearningRate 0.0781   Epoch: 2   Global Step: 11740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:38,537-Speed 3471.65 samples/sec   Loss 10.2359   LearningRate 0.0781   Epoch: 2   Global Step: 11750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:41,486-Speed 3473.41 samples/sec   Loss 10.0763   LearningRate 0.0781   Epoch: 2   Global Step: 11760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:44,434-Speed 3474.82 samples/sec   Loss 9.9989   LearningRate 0.0781   Epoch: 2   Global Step: 11770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:47,369-Speed 3489.35 samples/sec   Loss 9.9400   LearningRate 0.0781   Epoch: 2   Global Step: 11780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:50,317-Speed 3474.82 samples/sec   Loss 10.1238   LearningRate 0.0780   Epoch: 2   Global Step: 11790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:53,261-Speed 3478.88 samples/sec   Loss 9.9622   LearningRate 0.0780   Epoch: 2   Global Step: 11800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:56,207-Speed 3476.30 samples/sec   Loss 10.1916   LearningRate 0.0780   Epoch: 2   Global Step: 11810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:24:59,158-Speed 3470.72 samples/sec   Loss 10.1340   LearningRate 0.0780   Epoch: 2   Global Step: 11820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:02,126-Speed 3450.41 samples/sec   Loss 10.1384   LearningRate 0.0780   Epoch: 2   Global Step: 11830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:05,071-Speed 3478.61 samples/sec   Loss 10.0256   LearningRate 0.0780   Epoch: 2   Global Step: 11840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:08,025-Speed 3467.47 samples/sec   Loss 10.0877   LearningRate 0.0779   Epoch: 2   Global Step: 11850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:10,969-Speed 3478.70 samples/sec   Loss 10.0810   LearningRate 0.0779   Epoch: 2   Global Step: 11860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:13,924-Speed 3467.27 samples/sec   Loss 10.2245   LearningRate 0.0779   Epoch: 2   Global Step: 11870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:16,869-Speed 3477.52 samples/sec   Loss 9.9986   LearningRate 0.0779   Epoch: 2   Global Step: 11880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:19,813-Speed 3478.74 samples/sec   Loss 10.0716   LearningRate 0.0779   Epoch: 2   Global Step: 11890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:22,767-Speed 3468.16 samples/sec   Loss 10.2958   LearningRate 0.0779   Epoch: 2   Global Step: 11900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:25,713-Speed 3476.35 samples/sec   Loss 10.0581   LearningRate 0.0778   Epoch: 2   Global Step: 11910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:28,656-Speed 3480.54 samples/sec   Loss 10.1387   LearningRate 0.0778   Epoch: 2   Global Step: 11920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:31,603-Speed 3475.99 samples/sec   Loss 10.1102   LearningRate 0.0778   Epoch: 2   Global Step: 11930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:34,553-Speed 3471.90 samples/sec   Loss 9.9909   LearningRate 0.0778   Epoch: 2   Global Step: 11940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:37,514-Speed 3459.04 samples/sec   Loss 9.9988   LearningRate 0.0778   Epoch: 2   Global Step: 11950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:40,459-Speed 3478.50 samples/sec   Loss 9.9152   LearningRate 0.0778   Epoch: 2   Global Step: 11960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:43,417-Speed 3462.62 samples/sec   Loss 10.2338   LearningRate 0.0777   Epoch: 2   Global Step: 11970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:46,360-Speed 3480.00 samples/sec   Loss 10.0492   LearningRate 0.0777   Epoch: 2   Global Step: 11980   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 00:25:49,294-Speed 3490.33 samples/sec   Loss 10.0646   LearningRate 0.0777   Epoch: 2   Global Step: 11990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:25:52,241-Speed 3475.78 samples/sec   Loss 10.1239   LearningRate 0.0777   Epoch: 2   Global Step: 12000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:26:36,602-[lfw][12000]XNorm: 21.938753
Training: 2022-04-11 00:26:36,603-[lfw][12000]Accuracy-Flip: 0.99667+-0.00422
Training: 2022-04-11 00:26:36,603-[lfw][12000]Accuracy-Highest: 0.99667
Training: 2022-04-11 00:27:28,170-[cfp_fp][12000]XNorm: 19.444285
Training: 2022-04-11 00:27:28,171-[cfp_fp][12000]Accuracy-Flip: 0.93443+-0.00824
Training: 2022-04-11 00:27:28,171-[cfp_fp][12000]Accuracy-Highest: 0.93443
Training: 2022-04-11 00:28:12,271-[agedb_30][12000]XNorm: 21.466312
Training: 2022-04-11 00:28:12,271-[agedb_30][12000]Accuracy-Flip: 0.96250+-0.00720
Training: 2022-04-11 00:28:12,272-[agedb_30][12000]Accuracy-Highest: 0.96250
Training: 2022-04-11 00:28:15,202-Speed 71.63 samples/sec   Loss 9.8680   LearningRate 0.0777   Epoch: 2   Global Step: 12010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:18,127-Speed 3501.23 samples/sec   Loss 10.0220   LearningRate 0.0776   Epoch: 2   Global Step: 12020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:21,052-Speed 3500.88 samples/sec   Loss 10.0308   LearningRate 0.0776   Epoch: 2   Global Step: 12030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:23,982-Speed 3496.50 samples/sec   Loss 10.0333   LearningRate 0.0776   Epoch: 2   Global Step: 12040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:26,913-Speed 3494.99 samples/sec   Loss 9.9918   LearningRate 0.0776   Epoch: 2   Global Step: 12050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:29,842-Speed 3496.86 samples/sec   Loss 10.0190   LearningRate 0.0776   Epoch: 2   Global Step: 12060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:32,779-Speed 3487.07 samples/sec   Loss 10.1706   LearningRate 0.0776   Epoch: 2   Global Step: 12070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:35,722-Speed 3480.38 samples/sec   Loss 10.0897   LearningRate 0.0775   Epoch: 2   Global Step: 12080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:38,648-Speed 3500.91 samples/sec   Loss 10.1149   LearningRate 0.0775   Epoch: 2   Global Step: 12090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:41,590-Speed 3481.63 samples/sec   Loss 10.0674   LearningRate 0.0775   Epoch: 2   Global Step: 12100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:44,521-Speed 3494.07 samples/sec   Loss 10.0346   LearningRate 0.0775   Epoch: 2   Global Step: 12110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:47,522-Speed 3413.54 samples/sec   Loss 9.9361   LearningRate 0.0775   Epoch: 2   Global Step: 12120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:50,588-Speed 3340.54 samples/sec   Loss 10.0987   LearningRate 0.0775   Epoch: 2   Global Step: 12130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:53,536-Speed 3474.29 samples/sec   Loss 10.0346   LearningRate 0.0774   Epoch: 2   Global Step: 12140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:56,474-Speed 3486.14 samples/sec   Loss 10.0401   LearningRate 0.0774   Epoch: 2   Global Step: 12150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:28:59,422-Speed 3475.27 samples/sec   Loss 10.0083   LearningRate 0.0774   Epoch: 2   Global Step: 12160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:29:02,364-Speed 3480.37 samples/sec   Loss 10.0419   LearningRate 0.0774   Epoch: 2   Global Step: 12170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:29:05,346-Speed 3435.68 samples/sec   Loss 10.0734   LearningRate 0.0774   Epoch: 2   Global Step: 12180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:29:08,276-Speed 3495.71 samples/sec   Loss 10.1108   LearningRate 0.0774   Epoch: 2   Global Step: 12190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:29:11,219-Speed 3480.05 samples/sec   Loss 10.1133   LearningRate 0.0773   Epoch: 2   Global Step: 12200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:29:14,162-Speed 3480.58 samples/sec   Loss 9.9720   LearningRate 0.0773   Epoch: 2   Global Step: 12210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:29:17,103-Speed 3481.94 samples/sec   Loss 9.8578   LearningRate 0.0773   Epoch: 2   Global Step: 12220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:29:20,048-Speed 3478.14 samples/sec   Loss 10.0405   LearningRate 0.0773   Epoch: 2   Global Step: 12230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:29:22,987-Speed 3485.28 samples/sec   Loss 10.1310   LearningRate 0.0773   Epoch: 2   Global Step: 12240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:29:25,936-Speed 3473.51 samples/sec   Loss 10.1484   LearningRate 0.0772   Epoch: 2   Global Step: 12250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:29:28,884-Speed 3473.67 samples/sec   Loss 10.0598   LearningRate 0.0772   Epoch: 2   Global Step: 12260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:29:31,828-Speed 3479.22 samples/sec   Loss 10.0615   LearningRate 0.0772   Epoch: 2   Global Step: 12270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:29:34,791-Speed 3457.78 samples/sec   Loss 9.8628   LearningRate 0.0772   Epoch: 2   Global Step: 12280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:29:37,709-Speed 3509.42 samples/sec   Loss 10.0013   LearningRate 0.0772   Epoch: 2   Global Step: 12290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:29:40,662-Speed 3468.99 samples/sec   Loss 9.9669   LearningRate 0.0772   Epoch: 2   Global Step: 12300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:29:43,607-Speed 3477.87 samples/sec   Loss 10.0582   LearningRate 0.0771   Epoch: 2   Global Step: 12310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:29:46,563-Speed 3464.85 samples/sec   Loss 10.1298   LearningRate 0.0771   Epoch: 2   Global Step: 12320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:29:49,523-Speed 3460.93 samples/sec   Loss 10.0152   LearningRate 0.0771   Epoch: 2   Global Step: 12330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:29:52,464-Speed 3481.98 samples/sec   Loss 9.8918   LearningRate 0.0771   Epoch: 2   Global Step: 12340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:29:55,417-Speed 3468.42 samples/sec   Loss 9.8080   LearningRate 0.0771   Epoch: 2   Global Step: 12350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:29:58,355-Speed 3486.27 samples/sec   Loss 9.8347   LearningRate 0.0771   Epoch: 2   Global Step: 12360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:30:01,311-Speed 3465.67 samples/sec   Loss 9.9167   LearningRate 0.0770   Epoch: 2   Global Step: 12370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:30:04,267-Speed 3464.83 samples/sec   Loss 10.0324   LearningRate 0.0770   Epoch: 2   Global Step: 12380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:30:07,209-Speed 3481.91 samples/sec   Loss 9.8912   LearningRate 0.0770   Epoch: 2   Global Step: 12390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:30:10,144-Speed 3490.94 samples/sec   Loss 9.9471   LearningRate 0.0770   Epoch: 2   Global Step: 12400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:30:13,096-Speed 3469.73 samples/sec   Loss 9.9826   LearningRate 0.0770   Epoch: 2   Global Step: 12410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:30:16,045-Speed 3472.10 samples/sec   Loss 9.9954   LearningRate 0.0770   Epoch: 2   Global Step: 12420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:30:19,002-Speed 3477.00 samples/sec   Loss 9.9182   LearningRate 0.0769   Epoch: 2   Global Step: 12430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:30:21,938-Speed 3489.08 samples/sec   Loss 9.9411   LearningRate 0.0769   Epoch: 2   Global Step: 12440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:30:24,908-Speed 3448.89 samples/sec   Loss 9.9870   LearningRate 0.0769   Epoch: 2   Global Step: 12450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:30:27,846-Speed 3485.64 samples/sec   Loss 9.7935   LearningRate 0.0769   Epoch: 2   Global Step: 12460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:30:30,792-Speed 3476.60 samples/sec   Loss 9.8425   LearningRate 0.0769   Epoch: 2   Global Step: 12470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:30:33,736-Speed 3479.97 samples/sec   Loss 9.9496   LearningRate 0.0768   Epoch: 2   Global Step: 12480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:30:36,665-Speed 3496.13 samples/sec   Loss 9.8958   LearningRate 0.0768   Epoch: 2   Global Step: 12490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:30:39,609-Speed 3479.39 samples/sec   Loss 9.9050   LearningRate 0.0768   Epoch: 2   Global Step: 12500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:30:42,553-Speed 3479.46 samples/sec   Loss 9.9236   LearningRate 0.0768   Epoch: 2   Global Step: 12510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:30:45,496-Speed 3480.36 samples/sec   Loss 9.8512   LearningRate 0.0768   Epoch: 2   Global Step: 12520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:30:48,436-Speed 3483.40 samples/sec   Loss 10.1636   LearningRate 0.0768   Epoch: 2   Global Step: 12530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:30:51,399-Speed 3456.88 samples/sec   Loss 10.0044   LearningRate 0.0767   Epoch: 2   Global Step: 12540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:30:54,347-Speed 3474.44 samples/sec   Loss 9.8958   LearningRate 0.0767   Epoch: 2   Global Step: 12550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:30:57,296-Speed 3472.83 samples/sec   Loss 10.1365   LearningRate 0.0767   Epoch: 2   Global Step: 12560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:31:00,237-Speed 3483.38 samples/sec   Loss 9.8862   LearningRate 0.0767   Epoch: 2   Global Step: 12570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:31:03,182-Speed 3477.52 samples/sec   Loss 9.8115   LearningRate 0.0767   Epoch: 2   Global Step: 12580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:31:06,129-Speed 3475.93 samples/sec   Loss 9.8728   LearningRate 0.0767   Epoch: 2   Global Step: 12590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:09,071-Speed 3481.92 samples/sec   Loss 9.9812   LearningRate 0.0766   Epoch: 2   Global Step: 12600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:12,014-Speed 3479.72 samples/sec   Loss 9.7707   LearningRate 0.0766   Epoch: 2   Global Step: 12610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:14,957-Speed 3480.51 samples/sec   Loss 9.7266   LearningRate 0.0766   Epoch: 2   Global Step: 12620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:17,905-Speed 3474.67 samples/sec   Loss 9.9295   LearningRate 0.0766   Epoch: 2   Global Step: 12630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:20,841-Speed 3488.65 samples/sec   Loss 9.8625   LearningRate 0.0766   Epoch: 2   Global Step: 12640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:23,780-Speed 3484.58 samples/sec   Loss 10.0104   LearningRate 0.0766   Epoch: 2   Global Step: 12650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:26,726-Speed 3476.79 samples/sec   Loss 9.8294   LearningRate 0.0765   Epoch: 2   Global Step: 12660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:29,668-Speed 3481.37 samples/sec   Loss 9.8044   LearningRate 0.0765   Epoch: 2   Global Step: 12670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:32,610-Speed 3481.76 samples/sec   Loss 9.9243   LearningRate 0.0765   Epoch: 2   Global Step: 12680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:35,554-Speed 3479.35 samples/sec   Loss 9.8680   LearningRate 0.0765   Epoch: 2   Global Step: 12690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:38,516-Speed 3457.74 samples/sec   Loss 9.9869   LearningRate 0.0765   Epoch: 2   Global Step: 12700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:41,458-Speed 3481.26 samples/sec   Loss 9.9466   LearningRate 0.0765   Epoch: 2   Global Step: 12710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:44,408-Speed 3472.19 samples/sec   Loss 9.8115   LearningRate 0.0764   Epoch: 2   Global Step: 12720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:47,369-Speed 3459.41 samples/sec   Loss 9.8337   LearningRate 0.0764   Epoch: 2   Global Step: 12730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:50,314-Speed 3478.73 samples/sec   Loss 9.7123   LearningRate 0.0764   Epoch: 2   Global Step: 12740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:53,251-Speed 3486.88 samples/sec   Loss 9.8252   LearningRate 0.0764   Epoch: 2   Global Step: 12750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:56,193-Speed 3481.33 samples/sec   Loss 9.9863   LearningRate 0.0764   Epoch: 2   Global Step: 12760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:31:59,137-Speed 3479.14 samples/sec   Loss 9.9289   LearningRate 0.0763   Epoch: 2   Global Step: 12770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:32:02,083-Speed 3477.53 samples/sec   Loss 9.8744   LearningRate 0.0763   Epoch: 2   Global Step: 12780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:32:05,012-Speed 3496.62 samples/sec   Loss 9.7733   LearningRate 0.0763   Epoch: 2   Global Step: 12790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:32:07,965-Speed 3468.46 samples/sec   Loss 9.9080   LearningRate 0.0763   Epoch: 2   Global Step: 12800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:32:10,904-Speed 3484.54 samples/sec   Loss 9.9513   LearningRate 0.0763   Epoch: 2   Global Step: 12810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:13,873-Speed 3450.28 samples/sec   Loss 9.7982   LearningRate 0.0763   Epoch: 2   Global Step: 12820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:16,814-Speed 3482.17 samples/sec   Loss 9.8683   LearningRate 0.0762   Epoch: 2   Global Step: 12830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:19,772-Speed 3463.81 samples/sec   Loss 9.8924   LearningRate 0.0762   Epoch: 2   Global Step: 12840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:22,706-Speed 3490.25 samples/sec   Loss 9.9606   LearningRate 0.0762   Epoch: 2   Global Step: 12850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:25,646-Speed 3483.28 samples/sec   Loss 9.9361   LearningRate 0.0762   Epoch: 2   Global Step: 12860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:28,590-Speed 3479.74 samples/sec   Loss 9.7970   LearningRate 0.0762   Epoch: 2   Global Step: 12870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:31,552-Speed 3457.70 samples/sec   Loss 9.8542   LearningRate 0.0762   Epoch: 2   Global Step: 12880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:34,496-Speed 3479.12 samples/sec   Loss 9.8963   LearningRate 0.0761   Epoch: 2   Global Step: 12890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:37,445-Speed 3473.43 samples/sec   Loss 9.6312   LearningRate 0.0761   Epoch: 2   Global Step: 12900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:40,390-Speed 3478.31 samples/sec   Loss 9.9222   LearningRate 0.0761   Epoch: 2   Global Step: 12910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:43,334-Speed 3479.14 samples/sec   Loss 9.9380   LearningRate 0.0761   Epoch: 2   Global Step: 12920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:46,292-Speed 3461.74 samples/sec   Loss 9.8566   LearningRate 0.0761   Epoch: 2   Global Step: 12930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:49,255-Speed 3457.85 samples/sec   Loss 9.7941   LearningRate 0.0761   Epoch: 2   Global Step: 12940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:52,214-Speed 3461.22 samples/sec   Loss 9.7632   LearningRate 0.0760   Epoch: 2   Global Step: 12950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:55,168-Speed 3467.53 samples/sec   Loss 9.6781   LearningRate 0.0760   Epoch: 2   Global Step: 12960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:32:58,144-Speed 3441.42 samples/sec   Loss 9.9019   LearningRate 0.0760   Epoch: 2   Global Step: 12970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:33:01,096-Speed 3470.16 samples/sec   Loss 9.7168   LearningRate 0.0760   Epoch: 2   Global Step: 12980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:33:04,037-Speed 3483.18 samples/sec   Loss 9.9061   LearningRate 0.0760   Epoch: 2   Global Step: 12990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:33:06,977-Speed 3482.91 samples/sec   Loss 9.6492   LearningRate 0.0759   Epoch: 2   Global Step: 13000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:33:09,917-Speed 3484.62 samples/sec   Loss 9.7487   LearningRate 0.0759   Epoch: 2   Global Step: 13010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:12,997-Speed 3325.44 samples/sec   Loss 9.9081   LearningRate 0.0759   Epoch: 2   Global Step: 13020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:15,943-Speed 3476.23 samples/sec   Loss 9.9340   LearningRate 0.0759   Epoch: 2   Global Step: 13030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:18,897-Speed 3467.78 samples/sec   Loss 9.7068   LearningRate 0.0759   Epoch: 2   Global Step: 13040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:21,864-Speed 3451.27 samples/sec   Loss 9.7997   LearningRate 0.0759   Epoch: 2   Global Step: 13050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:24,827-Speed 3457.51 samples/sec   Loss 9.9323   LearningRate 0.0758   Epoch: 2   Global Step: 13060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:27,783-Speed 3465.31 samples/sec   Loss 9.7419   LearningRate 0.0758   Epoch: 2   Global Step: 13070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:30,730-Speed 3474.96 samples/sec   Loss 9.8883   LearningRate 0.0758   Epoch: 2   Global Step: 13080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:33,676-Speed 3477.70 samples/sec   Loss 9.8366   LearningRate 0.0758   Epoch: 2   Global Step: 13090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:36,644-Speed 3449.85 samples/sec   Loss 9.8144   LearningRate 0.0758   Epoch: 2   Global Step: 13100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:39,627-Speed 3434.31 samples/sec   Loss 9.8951   LearningRate 0.0758   Epoch: 2   Global Step: 13110   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 00:33:42,559-Speed 3493.65 samples/sec   Loss 9.9003   LearningRate 0.0757   Epoch: 2   Global Step: 13120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:45,501-Speed 3481.75 samples/sec   Loss 9.6523   LearningRate 0.0757   Epoch: 2   Global Step: 13130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:48,480-Speed 3437.21 samples/sec   Loss 9.6702   LearningRate 0.0757   Epoch: 2   Global Step: 13140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:51,439-Speed 3461.17 samples/sec   Loss 9.7681   LearningRate 0.0757   Epoch: 2   Global Step: 13150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:54,387-Speed 3474.94 samples/sec   Loss 9.7408   LearningRate 0.0757   Epoch: 2   Global Step: 13160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:33:57,335-Speed 3475.06 samples/sec   Loss 9.7661   LearningRate 0.0757   Epoch: 2   Global Step: 13170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:00,283-Speed 3474.61 samples/sec   Loss 9.7333   LearningRate 0.0756   Epoch: 2   Global Step: 13180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:03,224-Speed 3482.05 samples/sec   Loss 9.7453   LearningRate 0.0756   Epoch: 2   Global Step: 13190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:06,174-Speed 3472.40 samples/sec   Loss 9.8725   LearningRate 0.0756   Epoch: 2   Global Step: 13200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:09,121-Speed 3475.04 samples/sec   Loss 9.9518   LearningRate 0.0756   Epoch: 2   Global Step: 13210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:12,076-Speed 3466.72 samples/sec   Loss 9.7194   LearningRate 0.0756   Epoch: 2   Global Step: 13220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:15,018-Speed 3481.56 samples/sec   Loss 9.7270   LearningRate 0.0756   Epoch: 2   Global Step: 13230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:17,959-Speed 3482.64 samples/sec   Loss 9.9181   LearningRate 0.0755   Epoch: 2   Global Step: 13240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:20,912-Speed 3468.70 samples/sec   Loss 9.9028   LearningRate 0.0755   Epoch: 2   Global Step: 13250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:23,856-Speed 3479.69 samples/sec   Loss 9.7772   LearningRate 0.0755   Epoch: 2   Global Step: 13260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:26,806-Speed 3471.50 samples/sec   Loss 9.7652   LearningRate 0.0755   Epoch: 2   Global Step: 13270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:34:29,757-Speed 3470.92 samples/sec   Loss 9.7983   LearningRate 0.0755   Epoch: 2   Global Step: 13280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:34:32,719-Speed 3457.56 samples/sec   Loss 9.6997   LearningRate 0.0755   Epoch: 2   Global Step: 13290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:34:35,655-Speed 3488.93 samples/sec   Loss 9.8253   LearningRate 0.0754   Epoch: 2   Global Step: 13300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:38,616-Speed 3459.26 samples/sec   Loss 9.8321   LearningRate 0.0754   Epoch: 2   Global Step: 13310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:41,559-Speed 3479.88 samples/sec   Loss 9.6488   LearningRate 0.0754   Epoch: 2   Global Step: 13320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:44,517-Speed 3463.35 samples/sec   Loss 9.8277   LearningRate 0.0754   Epoch: 2   Global Step: 13330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:47,512-Speed 3419.39 samples/sec   Loss 9.7660   LearningRate 0.0754   Epoch: 2   Global Step: 13340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:50,469-Speed 3463.70 samples/sec   Loss 9.8617   LearningRate 0.0753   Epoch: 2   Global Step: 13350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:53,438-Speed 3450.62 samples/sec   Loss 9.8579   LearningRate 0.0753   Epoch: 2   Global Step: 13360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:56,380-Speed 3481.48 samples/sec   Loss 9.6904   LearningRate 0.0753   Epoch: 2   Global Step: 13370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:34:59,347-Speed 3457.12 samples/sec   Loss 9.7853   LearningRate 0.0753   Epoch: 2   Global Step: 13380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:35:02,316-Speed 3449.77 samples/sec   Loss 9.7171   LearningRate 0.0753   Epoch: 2   Global Step: 13390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:35:05,260-Speed 3480.07 samples/sec   Loss 9.8099   LearningRate 0.0753   Epoch: 2   Global Step: 13400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:35:08,212-Speed 3468.80 samples/sec   Loss 9.6157   LearningRate 0.0752   Epoch: 2   Global Step: 13410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:35:11,172-Speed 3461.21 samples/sec   Loss 9.8527   LearningRate 0.0752   Epoch: 2   Global Step: 13420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:35:14,134-Speed 3458.17 samples/sec   Loss 9.8391   LearningRate 0.0752   Epoch: 2   Global Step: 13430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:35:17,077-Speed 3479.85 samples/sec   Loss 9.7945   LearningRate 0.0752   Epoch: 2   Global Step: 13440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:35:20,046-Speed 3449.65 samples/sec   Loss 9.8227   LearningRate 0.0752   Epoch: 2   Global Step: 13450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:35:23,005-Speed 3462.07 samples/sec   Loss 9.5863   LearningRate 0.0752   Epoch: 2   Global Step: 13460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:35:25,967-Speed 3457.84 samples/sec   Loss 9.8524   LearningRate 0.0751   Epoch: 2   Global Step: 13470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:35:28,914-Speed 3475.56 samples/sec   Loss 9.6595   LearningRate 0.0751   Epoch: 2   Global Step: 13480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:35:31,875-Speed 3458.88 samples/sec   Loss 9.7884   LearningRate 0.0751   Epoch: 2   Global Step: 13490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:35:34,818-Speed 3480.91 samples/sec   Loss 9.7783   LearningRate 0.0751   Epoch: 2   Global Step: 13500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:35:37,774-Speed 3465.21 samples/sec   Loss 9.7517   LearningRate 0.0751   Epoch: 2   Global Step: 13510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:35:40,726-Speed 3469.11 samples/sec   Loss 9.8072   LearningRate 0.0751   Epoch: 2   Global Step: 13520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:35:43,680-Speed 3467.81 samples/sec   Loss 9.7926   LearningRate 0.0750   Epoch: 2   Global Step: 13530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:35:46,643-Speed 3456.00 samples/sec   Loss 9.5404   LearningRate 0.0750   Epoch: 2   Global Step: 13540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:35:49,601-Speed 3463.47 samples/sec   Loss 9.6945   LearningRate 0.0750   Epoch: 2   Global Step: 13550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:35:52,548-Speed 3476.22 samples/sec   Loss 9.7041   LearningRate 0.0750   Epoch: 2   Global Step: 13560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:35:55,488-Speed 3483.46 samples/sec   Loss 9.6416   LearningRate 0.0750   Epoch: 2   Global Step: 13570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:35:58,434-Speed 3476.69 samples/sec   Loss 9.6934   LearningRate 0.0750   Epoch: 2   Global Step: 13580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:36:01,402-Speed 3450.37 samples/sec   Loss 9.6871   LearningRate 0.0749   Epoch: 2   Global Step: 13590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:36:04,346-Speed 3479.96 samples/sec   Loss 9.6387   LearningRate 0.0749   Epoch: 2   Global Step: 13600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:36:07,308-Speed 3458.14 samples/sec   Loss 9.4735   LearningRate 0.0749   Epoch: 2   Global Step: 13610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:36:10,272-Speed 3454.72 samples/sec   Loss 9.5100   LearningRate 0.0749   Epoch: 2   Global Step: 13620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:36:13,250-Speed 3439.94 samples/sec   Loss 9.6190   LearningRate 0.0749   Epoch: 2   Global Step: 13630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:36:16,204-Speed 3467.35 samples/sec   Loss 9.8004   LearningRate 0.0749   Epoch: 2   Global Step: 13640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:36:19,151-Speed 3475.52 samples/sec   Loss 9.7631   LearningRate 0.0748   Epoch: 2   Global Step: 13650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:36:22,101-Speed 3472.64 samples/sec   Loss 9.7602   LearningRate 0.0748   Epoch: 2   Global Step: 13660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:36:25,072-Speed 3446.94 samples/sec   Loss 9.7543   LearningRate 0.0748   Epoch: 2   Global Step: 13670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:36:28,097-Speed 3386.41 samples/sec   Loss 9.7730   LearningRate 0.0748   Epoch: 2   Global Step: 13680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:36:31,041-Speed 3479.08 samples/sec   Loss 9.7210   LearningRate 0.0748   Epoch: 2   Global Step: 13690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:36:34,017-Speed 3440.74 samples/sec   Loss 9.6029   LearningRate 0.0747   Epoch: 2   Global Step: 13700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:36:36,967-Speed 3471.94 samples/sec   Loss 9.7087   LearningRate 0.0747   Epoch: 2   Global Step: 13710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:36:39,931-Speed 3456.88 samples/sec   Loss 9.8465   LearningRate 0.0747   Epoch: 2   Global Step: 13720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:36:42,884-Speed 3467.98 samples/sec   Loss 9.5611   LearningRate 0.0747   Epoch: 2   Global Step: 13730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:36:45,839-Speed 3466.26 samples/sec   Loss 9.7864   LearningRate 0.0747   Epoch: 2   Global Step: 13740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:36:48,821-Speed 3435.06 samples/sec   Loss 9.6425   LearningRate 0.0747   Epoch: 2   Global Step: 13750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:36:51,788-Speed 3452.25 samples/sec   Loss 9.7526   LearningRate 0.0746   Epoch: 2   Global Step: 13760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:36:54,752-Speed 3455.92 samples/sec   Loss 9.4969   LearningRate 0.0746   Epoch: 2   Global Step: 13770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:36:57,693-Speed 3481.67 samples/sec   Loss 9.6464   LearningRate 0.0746   Epoch: 2   Global Step: 13780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:00,662-Speed 3450.18 samples/sec   Loss 9.5830   LearningRate 0.0746   Epoch: 2   Global Step: 13790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:03,613-Speed 3471.20 samples/sec   Loss 9.7357   LearningRate 0.0746   Epoch: 2   Global Step: 13800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:06,576-Speed 3455.84 samples/sec   Loss 9.6069   LearningRate 0.0746   Epoch: 2   Global Step: 13810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:09,537-Speed 3459.83 samples/sec   Loss 9.6552   LearningRate 0.0745   Epoch: 2   Global Step: 13820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:12,484-Speed 3476.09 samples/sec   Loss 9.6316   LearningRate 0.0745   Epoch: 2   Global Step: 13830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:15,436-Speed 3470.21 samples/sec   Loss 9.4862   LearningRate 0.0745   Epoch: 2   Global Step: 13840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:18,377-Speed 3481.60 samples/sec   Loss 9.5470   LearningRate 0.0745   Epoch: 2   Global Step: 13850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:21,323-Speed 3477.29 samples/sec   Loss 9.5061   LearningRate 0.0745   Epoch: 2   Global Step: 13860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:24,271-Speed 3474.68 samples/sec   Loss 9.7181   LearningRate 0.0745   Epoch: 2   Global Step: 13870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:27,228-Speed 3463.43 samples/sec   Loss 9.5638   LearningRate 0.0744   Epoch: 2   Global Step: 13880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:30,196-Speed 3451.27 samples/sec   Loss 9.6382   LearningRate 0.0744   Epoch: 2   Global Step: 13890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:33,151-Speed 3466.27 samples/sec   Loss 9.6084   LearningRate 0.0744   Epoch: 2   Global Step: 13900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:36,128-Speed 3440.81 samples/sec   Loss 9.6391   LearningRate 0.0744   Epoch: 2   Global Step: 13910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:39,076-Speed 3474.61 samples/sec   Loss 9.6264   LearningRate 0.0744   Epoch: 2   Global Step: 13920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:42,019-Speed 3479.52 samples/sec   Loss 9.4870   LearningRate 0.0744   Epoch: 2   Global Step: 13930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:44,987-Speed 3451.74 samples/sec   Loss 9.6476   LearningRate 0.0743   Epoch: 2   Global Step: 13940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:47,942-Speed 3466.28 samples/sec   Loss 9.6736   LearningRate 0.0743   Epoch: 2   Global Step: 13950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:50,890-Speed 3473.60 samples/sec   Loss 9.7638   LearningRate 0.0743   Epoch: 2   Global Step: 13960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:53,831-Speed 3482.98 samples/sec   Loss 9.6652   LearningRate 0.0743   Epoch: 2   Global Step: 13970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:56,790-Speed 3460.90 samples/sec   Loss 9.7038   LearningRate 0.0743   Epoch: 2   Global Step: 13980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:37:59,753-Speed 3456.96 samples/sec   Loss 9.6262   LearningRate 0.0743   Epoch: 2   Global Step: 13990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:38:02,707-Speed 3467.97 samples/sec   Loss 9.5170   LearningRate 0.0742   Epoch: 2   Global Step: 14000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:38:46,998-[lfw][14000]XNorm: 21.923426
Training: 2022-04-11 00:38:46,998-[lfw][14000]Accuracy-Flip: 0.99617+-0.00366
Training: 2022-04-11 00:38:46,999-[lfw][14000]Accuracy-Highest: 0.99667
Training: 2022-04-11 00:39:38,479-[cfp_fp][14000]XNorm: 19.558783
Training: 2022-04-11 00:39:38,480-[cfp_fp][14000]Accuracy-Flip: 0.94229+-0.00967
Training: 2022-04-11 00:39:38,480-[cfp_fp][14000]Accuracy-Highest: 0.94229
Training: 2022-04-11 00:40:22,629-[agedb_30][14000]XNorm: 21.703387
Training: 2022-04-11 00:40:22,629-[agedb_30][14000]Accuracy-Flip: 0.96667+-0.00969
Training: 2022-04-11 00:40:22,630-[agedb_30][14000]Accuracy-Highest: 0.96667
Training: 2022-04-11 00:40:25,567-Speed 71.68 samples/sec   Loss 9.6770   LearningRate 0.0742   Epoch: 2   Global Step: 14010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:40:28,517-Speed 3471.69 samples/sec   Loss 9.6177   LearningRate 0.0742   Epoch: 2   Global Step: 14020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:40:31,443-Speed 3499.89 samples/sec   Loss 9.4818   LearningRate 0.0742   Epoch: 2   Global Step: 14030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:40:34,372-Speed 3497.96 samples/sec   Loss 9.4125   LearningRate 0.0742   Epoch: 2   Global Step: 14040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:40:37,303-Speed 3493.71 samples/sec   Loss 9.5620   LearningRate 0.0742   Epoch: 2   Global Step: 14050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:40:40,255-Speed 3469.81 samples/sec   Loss 9.6779   LearningRate 0.0741   Epoch: 2   Global Step: 14060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:40:43,216-Speed 3459.12 samples/sec   Loss 9.6255   LearningRate 0.0741   Epoch: 2   Global Step: 14070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:40:46,181-Speed 3454.41 samples/sec   Loss 9.4516   LearningRate 0.0741   Epoch: 2   Global Step: 14080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:40:49,149-Speed 3451.90 samples/sec   Loss 9.7405   LearningRate 0.0741   Epoch: 2   Global Step: 14090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:40:52,127-Speed 3439.71 samples/sec   Loss 9.4861   LearningRate 0.0741   Epoch: 2   Global Step: 14100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:40:55,113-Speed 3429.80 samples/sec   Loss 9.5171   LearningRate 0.0740   Epoch: 2   Global Step: 14110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:40:58,076-Speed 3457.39 samples/sec   Loss 9.5814   LearningRate 0.0740   Epoch: 2   Global Step: 14120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:01,024-Speed 3474.08 samples/sec   Loss 9.5803   LearningRate 0.0740   Epoch: 2   Global Step: 14130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:03,973-Speed 3472.65 samples/sec   Loss 9.6806   LearningRate 0.0740   Epoch: 2   Global Step: 14140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:06,948-Speed 3443.03 samples/sec   Loss 9.4502   LearningRate 0.0740   Epoch: 2   Global Step: 14150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:09,894-Speed 3477.79 samples/sec   Loss 9.7680   LearningRate 0.0740   Epoch: 2   Global Step: 14160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:12,856-Speed 3457.62 samples/sec   Loss 9.5583   LearningRate 0.0739   Epoch: 2   Global Step: 14170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:15,814-Speed 3463.32 samples/sec   Loss 9.5171   LearningRate 0.0739   Epoch: 2   Global Step: 14180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:18,766-Speed 3469.61 samples/sec   Loss 9.6826   LearningRate 0.0739   Epoch: 2   Global Step: 14190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:21,714-Speed 3473.61 samples/sec   Loss 9.5349   LearningRate 0.0739   Epoch: 2   Global Step: 14200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:24,674-Speed 3461.29 samples/sec   Loss 9.5859   LearningRate 0.0739   Epoch: 2   Global Step: 14210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:41:27,616-Speed 3481.49 samples/sec   Loss 9.4997   LearningRate 0.0739   Epoch: 2   Global Step: 14220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:41:30,549-Speed 3491.70 samples/sec   Loss 9.6657   LearningRate 0.0738   Epoch: 2   Global Step: 14230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:33,492-Speed 3480.14 samples/sec   Loss 9.6374   LearningRate 0.0738   Epoch: 2   Global Step: 14240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:36,437-Speed 3478.71 samples/sec   Loss 9.5466   LearningRate 0.0738   Epoch: 2   Global Step: 14250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:39,387-Speed 3472.40 samples/sec   Loss 9.4953   LearningRate 0.0738   Epoch: 2   Global Step: 14260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:42,328-Speed 3482.65 samples/sec   Loss 9.5309   LearningRate 0.0738   Epoch: 2   Global Step: 14270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:45,279-Speed 3470.37 samples/sec   Loss 9.4478   LearningRate 0.0738   Epoch: 2   Global Step: 14280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:48,238-Speed 3461.73 samples/sec   Loss 9.4340   LearningRate 0.0737   Epoch: 2   Global Step: 14290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:51,205-Speed 3452.34 samples/sec   Loss 9.6822   LearningRate 0.0737   Epoch: 2   Global Step: 14300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:54,164-Speed 3461.61 samples/sec   Loss 9.5854   LearningRate 0.0737   Epoch: 2   Global Step: 14310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:41:57,113-Speed 3473.76 samples/sec   Loss 9.8083   LearningRate 0.0737   Epoch: 2   Global Step: 14320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:42:00,068-Speed 3466.57 samples/sec   Loss 9.6451   LearningRate 0.0737   Epoch: 2   Global Step: 14330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:42:03,002-Speed 3490.74 samples/sec   Loss 9.7147   LearningRate 0.0737   Epoch: 2   Global Step: 14340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:42:05,968-Speed 3453.09 samples/sec   Loss 9.6601   LearningRate 0.0736   Epoch: 2   Global Step: 14350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:42:08,948-Speed 3437.99 samples/sec   Loss 9.5535   LearningRate 0.0736   Epoch: 2   Global Step: 14360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:42:11,913-Speed 3454.03 samples/sec   Loss 9.8244   LearningRate 0.0736   Epoch: 2   Global Step: 14370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:42:14,883-Speed 3448.71 samples/sec   Loss 9.5734   LearningRate 0.0736   Epoch: 2   Global Step: 14380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:42:17,860-Speed 3440.56 samples/sec   Loss 9.5068   LearningRate 0.0736   Epoch: 2   Global Step: 14390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:42:20,835-Speed 3443.00 samples/sec   Loss 9.6177   LearningRate 0.0736   Epoch: 2   Global Step: 14400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:42:23,785-Speed 3472.47 samples/sec   Loss 9.3836   LearningRate 0.0735   Epoch: 2   Global Step: 14410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:42:26,751-Speed 3453.20 samples/sec   Loss 9.5097   LearningRate 0.0735   Epoch: 2   Global Step: 14420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:42:29,704-Speed 3469.56 samples/sec   Loss 9.7503   LearningRate 0.0735   Epoch: 2   Global Step: 14430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:42:32,647-Speed 3479.28 samples/sec   Loss 9.5778   LearningRate 0.0735   Epoch: 2   Global Step: 14440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:42:35,597-Speed 3472.63 samples/sec   Loss 9.5491   LearningRate 0.0735   Epoch: 2   Global Step: 14450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:42:38,533-Speed 3487.80 samples/sec   Loss 9.4678   LearningRate 0.0735   Epoch: 2   Global Step: 14460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:42:41,499-Speed 3453.79 samples/sec   Loss 9.5195   LearningRate 0.0734   Epoch: 2   Global Step: 14470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:42:44,447-Speed 3474.76 samples/sec   Loss 9.4482   LearningRate 0.0734   Epoch: 2   Global Step: 14480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:42:47,440-Speed 3421.59 samples/sec   Loss 9.7919   LearningRate 0.0734   Epoch: 2   Global Step: 14490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:42:50,408-Speed 3451.24 samples/sec   Loss 9.5517   LearningRate 0.0734   Epoch: 2   Global Step: 14500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:42:53,403-Speed 3420.27 samples/sec   Loss 9.6874   LearningRate 0.0734   Epoch: 2   Global Step: 14510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:42:56,366-Speed 3456.82 samples/sec   Loss 9.3809   LearningRate 0.0734   Epoch: 2   Global Step: 14520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:42:59,336-Speed 3449.21 samples/sec   Loss 9.5937   LearningRate 0.0733   Epoch: 2   Global Step: 14530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:43:02,298-Speed 3457.47 samples/sec   Loss 9.5450   LearningRate 0.0733   Epoch: 2   Global Step: 14540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:05,276-Speed 3440.01 samples/sec   Loss 9.5679   LearningRate 0.0733   Epoch: 2   Global Step: 14550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:08,225-Speed 3473.09 samples/sec   Loss 9.5972   LearningRate 0.0733   Epoch: 2   Global Step: 14560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:11,179-Speed 3466.48 samples/sec   Loss 9.6184   LearningRate 0.0733   Epoch: 2   Global Step: 14570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:14,132-Speed 3469.30 samples/sec   Loss 9.5035   LearningRate 0.0733   Epoch: 2   Global Step: 14580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:17,215-Speed 3322.08 samples/sec   Loss 9.4878   LearningRate 0.0732   Epoch: 2   Global Step: 14590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:20,155-Speed 3484.23 samples/sec   Loss 9.4687   LearningRate 0.0732   Epoch: 2   Global Step: 14600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:23,101-Speed 3477.37 samples/sec   Loss 9.4931   LearningRate 0.0732   Epoch: 2   Global Step: 14610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:26,058-Speed 3463.44 samples/sec   Loss 9.5728   LearningRate 0.0732   Epoch: 2   Global Step: 14620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:29,010-Speed 3470.33 samples/sec   Loss 9.6299   LearningRate 0.0732   Epoch: 2   Global Step: 14630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:31,963-Speed 3467.58 samples/sec   Loss 9.6385   LearningRate 0.0732   Epoch: 2   Global Step: 14640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:43:34,905-Speed 3482.03 samples/sec   Loss 9.5287   LearningRate 0.0731   Epoch: 2   Global Step: 14650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:43:37,874-Speed 3449.70 samples/sec   Loss 9.4722   LearningRate 0.0731   Epoch: 2   Global Step: 14660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:43:40,844-Speed 3448.84 samples/sec   Loss 9.6254   LearningRate 0.0731   Epoch: 2   Global Step: 14670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:43:43,796-Speed 3469.26 samples/sec   Loss 9.4434   LearningRate 0.0731   Epoch: 2   Global Step: 14680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:46,748-Speed 3470.36 samples/sec   Loss 9.5526   LearningRate 0.0731   Epoch: 2   Global Step: 14690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:49,716-Speed 3451.53 samples/sec   Loss 9.4908   LearningRate 0.0730   Epoch: 2   Global Step: 14700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:52,692-Speed 3440.76 samples/sec   Loss 9.4986   LearningRate 0.0730   Epoch: 2   Global Step: 14710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:55,645-Speed 3469.61 samples/sec   Loss 9.4429   LearningRate 0.0730   Epoch: 2   Global Step: 14720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:43:58,592-Speed 3475.51 samples/sec   Loss 9.6485   LearningRate 0.0730   Epoch: 2   Global Step: 14730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:01,542-Speed 3471.46 samples/sec   Loss 9.5113   LearningRate 0.0730   Epoch: 2   Global Step: 14740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:04,533-Speed 3425.39 samples/sec   Loss 9.4456   LearningRate 0.0730   Epoch: 2   Global Step: 14750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:07,482-Speed 3472.16 samples/sec   Loss 9.5816   LearningRate 0.0729   Epoch: 2   Global Step: 14760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:10,449-Speed 3453.03 samples/sec   Loss 9.5263   LearningRate 0.0729   Epoch: 2   Global Step: 14770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:13,402-Speed 3467.96 samples/sec   Loss 9.5076   LearningRate 0.0729   Epoch: 2   Global Step: 14780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:44:16,373-Speed 3448.20 samples/sec   Loss 9.4207   LearningRate 0.0729   Epoch: 2   Global Step: 14790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:19,325-Speed 3469.48 samples/sec   Loss 9.4481   LearningRate 0.0729   Epoch: 2   Global Step: 14800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:22,298-Speed 3445.65 samples/sec   Loss 9.6597   LearningRate 0.0729   Epoch: 2   Global Step: 14810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:25,255-Speed 3463.42 samples/sec   Loss 9.3225   LearningRate 0.0728   Epoch: 2   Global Step: 14820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:28,196-Speed 3482.26 samples/sec   Loss 9.3751   LearningRate 0.0728   Epoch: 2   Global Step: 14830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:31,145-Speed 3473.18 samples/sec   Loss 9.3780   LearningRate 0.0728   Epoch: 2   Global Step: 14840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:34,097-Speed 3470.55 samples/sec   Loss 9.3688   LearningRate 0.0728   Epoch: 2   Global Step: 14850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:37,075-Speed 3438.44 samples/sec   Loss 9.4319   LearningRate 0.0728   Epoch: 2   Global Step: 14860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:40,038-Speed 3458.25 samples/sec   Loss 9.3590   LearningRate 0.0728   Epoch: 2   Global Step: 14870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:43,004-Speed 3453.08 samples/sec   Loss 9.5596   LearningRate 0.0727   Epoch: 2   Global Step: 14880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:44:45,985-Speed 3436.70 samples/sec   Loss 9.5427   LearningRate 0.0727   Epoch: 2   Global Step: 14890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:44:48,948-Speed 3456.46 samples/sec   Loss 9.4663   LearningRate 0.0727   Epoch: 2   Global Step: 14900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:44:51,910-Speed 3458.13 samples/sec   Loss 9.4194   LearningRate 0.0727   Epoch: 2   Global Step: 14910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:44:54,879-Speed 3449.77 samples/sec   Loss 9.2537   LearningRate 0.0727   Epoch: 2   Global Step: 14920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:44:57,841-Speed 3458.29 samples/sec   Loss 9.4605   LearningRate 0.0727   Epoch: 2   Global Step: 14930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:45:00,825-Speed 3432.85 samples/sec   Loss 9.5765   LearningRate 0.0726   Epoch: 2   Global Step: 14940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:45:03,825-Speed 3413.90 samples/sec   Loss 9.5575   LearningRate 0.0726   Epoch: 2   Global Step: 14950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:45:06,775-Speed 3470.95 samples/sec   Loss 9.3635   LearningRate 0.0726   Epoch: 2   Global Step: 14960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:45:09,745-Speed 3448.99 samples/sec   Loss 9.3955   LearningRate 0.0726   Epoch: 2   Global Step: 14970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:45:12,686-Speed 3483.44 samples/sec   Loss 9.3479   LearningRate 0.0726   Epoch: 2   Global Step: 14980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:45:15,697-Speed 3401.66 samples/sec   Loss 9.4412   LearningRate 0.0726   Epoch: 2   Global Step: 14990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:45:18,684-Speed 3428.65 samples/sec   Loss 9.7677   LearningRate 0.0725   Epoch: 2   Global Step: 15000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:45:21,640-Speed 3464.99 samples/sec   Loss 9.4409   LearningRate 0.0725   Epoch: 2   Global Step: 15010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:45:24,594-Speed 3467.36 samples/sec   Loss 9.5401   LearningRate 0.0725   Epoch: 2   Global Step: 15020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:45:27,593-Speed 3415.70 samples/sec   Loss 9.4827   LearningRate 0.0725   Epoch: 2   Global Step: 15030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:45:30,587-Speed 3421.07 samples/sec   Loss 9.3116   LearningRate 0.0725   Epoch: 2   Global Step: 15040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:45:33,546-Speed 3461.40 samples/sec   Loss 9.2753   LearningRate 0.0725   Epoch: 2   Global Step: 15050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:45:36,514-Speed 3450.55 samples/sec   Loss 9.5322   LearningRate 0.0724   Epoch: 2   Global Step: 15060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:45:39,483-Speed 3451.06 samples/sec   Loss 9.2781   LearningRate 0.0724   Epoch: 2   Global Step: 15070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:45:42,458-Speed 3442.51 samples/sec   Loss 9.5776   LearningRate 0.0724   Epoch: 2   Global Step: 15080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:45:45,449-Speed 3424.65 samples/sec   Loss 9.5741   LearningRate 0.0724   Epoch: 2   Global Step: 15090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:45:48,418-Speed 3450.08 samples/sec   Loss 9.3886   LearningRate 0.0724   Epoch: 2   Global Step: 15100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:45:51,388-Speed 3448.38 samples/sec   Loss 9.4053   LearningRate 0.0724   Epoch: 2   Global Step: 15110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:45:54,362-Speed 3444.05 samples/sec   Loss 9.4148   LearningRate 0.0723   Epoch: 2   Global Step: 15120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:45:57,334-Speed 3446.64 samples/sec   Loss 9.2706   LearningRate 0.0723   Epoch: 2   Global Step: 15130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:46:00,309-Speed 3443.26 samples/sec   Loss 9.3397   LearningRate 0.0723   Epoch: 2   Global Step: 15140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:46:03,322-Speed 3398.78 samples/sec   Loss 9.4930   LearningRate 0.0723   Epoch: 2   Global Step: 15150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:46:06,294-Speed 3446.81 samples/sec   Loss 9.5672   LearningRate 0.0723   Epoch: 2   Global Step: 15160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:46:09,325-Speed 3378.96 samples/sec   Loss 9.2819   LearningRate 0.0723   Epoch: 2   Global Step: 15170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:46:22,161-Speed 797.84 samples/sec   Loss 9.0798   LearningRate 0.0722   Epoch: 3   Global Step: 15180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:46:25,157-Speed 3420.51 samples/sec   Loss 8.5934   LearningRate 0.0722   Epoch: 3   Global Step: 15190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:46:28,246-Speed 3315.53 samples/sec   Loss 8.5682   LearningRate 0.0722   Epoch: 3   Global Step: 15200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:46:31,231-Speed 3432.77 samples/sec   Loss 8.4610   LearningRate 0.0722   Epoch: 3   Global Step: 15210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:46:34,232-Speed 3414.63 samples/sec   Loss 8.5846   LearningRate 0.0722   Epoch: 3   Global Step: 15220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:46:37,191-Speed 3461.51 samples/sec   Loss 8.5228   LearningRate 0.0722   Epoch: 3   Global Step: 15230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:46:40,156-Speed 3454.49 samples/sec   Loss 8.6295   LearningRate 0.0721   Epoch: 3   Global Step: 15240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:46:43,126-Speed 3449.11 samples/sec   Loss 8.5577   LearningRate 0.0721   Epoch: 3   Global Step: 15250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:46:46,092-Speed 3454.21 samples/sec   Loss 8.8430   LearningRate 0.0721   Epoch: 3   Global Step: 15260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:46:49,116-Speed 3387.25 samples/sec   Loss 8.6771   LearningRate 0.0721   Epoch: 3   Global Step: 15270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:46:52,178-Speed 3344.86 samples/sec   Loss 8.8033   LearningRate 0.0721   Epoch: 3   Global Step: 15280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:46:55,140-Speed 3458.11 samples/sec   Loss 8.6733   LearningRate 0.0721   Epoch: 3   Global Step: 15290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:46:58,116-Speed 3442.45 samples/sec   Loss 8.8458   LearningRate 0.0720   Epoch: 3   Global Step: 15300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:47:01,110-Speed 3421.29 samples/sec   Loss 8.8210   LearningRate 0.0720   Epoch: 3   Global Step: 15310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:47:04,112-Speed 3411.88 samples/sec   Loss 8.8028   LearningRate 0.0720   Epoch: 3   Global Step: 15320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:47:07,133-Speed 3390.26 samples/sec   Loss 8.7743   LearningRate 0.0720   Epoch: 3   Global Step: 15330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:47:10,094-Speed 3460.24 samples/sec   Loss 8.7730   LearningRate 0.0720   Epoch: 3   Global Step: 15340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:47:13,060-Speed 3453.41 samples/sec   Loss 8.8098   LearningRate 0.0720   Epoch: 3   Global Step: 15350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:47:16,064-Speed 3410.71 samples/sec   Loss 8.7565   LearningRate 0.0719   Epoch: 3   Global Step: 15360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:47:19,043-Speed 3437.96 samples/sec   Loss 8.7937   LearningRate 0.0719   Epoch: 3   Global Step: 15370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:47:22,002-Speed 3461.02 samples/sec   Loss 8.8227   LearningRate 0.0719   Epoch: 3   Global Step: 15380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:47:24,965-Speed 3457.66 samples/sec   Loss 8.9030   LearningRate 0.0719   Epoch: 3   Global Step: 15390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:47:28,089-Speed 3278.18 samples/sec   Loss 8.9754   LearningRate 0.0719   Epoch: 3   Global Step: 15400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:47:31,203-Speed 3289.63 samples/sec   Loss 8.6840   LearningRate 0.0719   Epoch: 3   Global Step: 15410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:47:34,159-Speed 3464.80 samples/sec   Loss 8.7727   LearningRate 0.0718   Epoch: 3   Global Step: 15420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:47:37,121-Speed 3459.00 samples/sec   Loss 8.6750   LearningRate 0.0718   Epoch: 3   Global Step: 15430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:47:40,074-Speed 3468.58 samples/sec   Loss 8.7623   LearningRate 0.0718   Epoch: 3   Global Step: 15440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:47:43,053-Speed 3437.51 samples/sec   Loss 8.8885   LearningRate 0.0718   Epoch: 3   Global Step: 15450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:47:46,046-Speed 3423.00 samples/sec   Loss 8.8977   LearningRate 0.0718   Epoch: 3   Global Step: 15460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:47:49,027-Speed 3436.16 samples/sec   Loss 8.8639   LearningRate 0.0718   Epoch: 3   Global Step: 15470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:47:51,985-Speed 3463.61 samples/sec   Loss 8.9185   LearningRate 0.0717   Epoch: 3   Global Step: 15480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:47:54,963-Speed 3439.08 samples/sec   Loss 8.9685   LearningRate 0.0717   Epoch: 3   Global Step: 15490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:47:57,952-Speed 3427.13 samples/sec   Loss 9.0139   LearningRate 0.0717   Epoch: 3   Global Step: 15500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:48:00,936-Speed 3433.11 samples/sec   Loss 8.9678   LearningRate 0.0717   Epoch: 3   Global Step: 15510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:48:03,932-Speed 3418.61 samples/sec   Loss 8.9816   LearningRate 0.0717   Epoch: 3   Global Step: 15520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:48:06,908-Speed 3441.10 samples/sec   Loss 8.8973   LearningRate 0.0717   Epoch: 3   Global Step: 15530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:48:09,879-Speed 3448.45 samples/sec   Loss 8.9878   LearningRate 0.0716   Epoch: 3   Global Step: 15540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:48:12,852-Speed 3445.30 samples/sec   Loss 9.0036   LearningRate 0.0716   Epoch: 3   Global Step: 15550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:48:15,843-Speed 3425.04 samples/sec   Loss 8.9717   LearningRate 0.0716   Epoch: 3   Global Step: 15560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:48:18,842-Speed 3415.17 samples/sec   Loss 8.8409   LearningRate 0.0716   Epoch: 3   Global Step: 15570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:48:21,813-Speed 3447.86 samples/sec   Loss 8.7602   LearningRate 0.0716   Epoch: 3   Global Step: 15580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:48:24,805-Speed 3423.19 samples/sec   Loss 8.9194   LearningRate 0.0716   Epoch: 3   Global Step: 15590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:48:27,793-Speed 3427.97 samples/sec   Loss 9.0528   LearningRate 0.0715   Epoch: 3   Global Step: 15600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:48:30,789-Speed 3418.62 samples/sec   Loss 9.0788   LearningRate 0.0715   Epoch: 3   Global Step: 15610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:48:33,737-Speed 3474.14 samples/sec   Loss 8.9407   LearningRate 0.0715   Epoch: 3   Global Step: 15620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:48:36,681-Speed 3479.33 samples/sec   Loss 8.8296   LearningRate 0.0715   Epoch: 3   Global Step: 15630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:48:39,664-Speed 3433.98 samples/sec   Loss 9.0283   LearningRate 0.0715   Epoch: 3   Global Step: 15640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:48:42,658-Speed 3421.46 samples/sec   Loss 9.0809   LearningRate 0.0715   Epoch: 3   Global Step: 15650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:48:45,647-Speed 3427.47 samples/sec   Loss 8.9635   LearningRate 0.0714   Epoch: 3   Global Step: 15660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:48:48,602-Speed 3466.05 samples/sec   Loss 9.1361   LearningRate 0.0714   Epoch: 3   Global Step: 15670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:48:51,591-Speed 3426.43 samples/sec   Loss 8.9753   LearningRate 0.0714   Epoch: 3   Global Step: 15680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:48:54,613-Speed 3389.83 samples/sec   Loss 8.9423   LearningRate 0.0714   Epoch: 3   Global Step: 15690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:48:57,607-Speed 3421.21 samples/sec   Loss 8.7794   LearningRate 0.0714   Epoch: 3   Global Step: 15700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:49:00,663-Speed 3351.44 samples/sec   Loss 9.0464   LearningRate 0.0714   Epoch: 3   Global Step: 15710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:49:03,707-Speed 3365.00 samples/sec   Loss 9.1324   LearningRate 0.0713   Epoch: 3   Global Step: 15720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:49:06,710-Speed 3410.47 samples/sec   Loss 9.0828   LearningRate 0.0713   Epoch: 3   Global Step: 15730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:09,725-Speed 3397.74 samples/sec   Loss 8.9976   LearningRate 0.0713   Epoch: 3   Global Step: 15740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:12,694-Speed 3449.97 samples/sec   Loss 8.9621   LearningRate 0.0713   Epoch: 3   Global Step: 15750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:15,660-Speed 3453.60 samples/sec   Loss 9.0713   LearningRate 0.0713   Epoch: 3   Global Step: 15760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:18,628-Speed 3450.21 samples/sec   Loss 8.7920   LearningRate 0.0713   Epoch: 3   Global Step: 15770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:21,604-Speed 3442.61 samples/sec   Loss 9.0916   LearningRate 0.0712   Epoch: 3   Global Step: 15780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:24,568-Speed 3456.19 samples/sec   Loss 9.1287   LearningRate 0.0712   Epoch: 3   Global Step: 15790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:27,534-Speed 3453.74 samples/sec   Loss 9.1006   LearningRate 0.0712   Epoch: 3   Global Step: 15800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:30,509-Speed 3442.54 samples/sec   Loss 9.1528   LearningRate 0.0712   Epoch: 3   Global Step: 15810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:33,462-Speed 3468.79 samples/sec   Loss 9.0359   LearningRate 0.0712   Epoch: 3   Global Step: 15820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:36,421-Speed 3461.71 samples/sec   Loss 9.0522   LearningRate 0.0712   Epoch: 3   Global Step: 15830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:39,416-Speed 3420.16 samples/sec   Loss 8.9895   LearningRate 0.0711   Epoch: 3   Global Step: 15840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:42,389-Speed 3445.87 samples/sec   Loss 8.8675   LearningRate 0.0711   Epoch: 3   Global Step: 15850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:45,405-Speed 3396.65 samples/sec   Loss 9.2030   LearningRate 0.0711   Epoch: 3   Global Step: 15860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:48,385-Speed 3436.84 samples/sec   Loss 9.0171   LearningRate 0.0711   Epoch: 3   Global Step: 15870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:51,364-Speed 3438.27 samples/sec   Loss 9.1811   LearningRate 0.0711   Epoch: 3   Global Step: 15880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:54,336-Speed 3446.15 samples/sec   Loss 9.0621   LearningRate 0.0711   Epoch: 3   Global Step: 15890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:49:57,294-Speed 3462.98 samples/sec   Loss 8.9758   LearningRate 0.0710   Epoch: 3   Global Step: 15900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:50:00,278-Speed 3433.23 samples/sec   Loss 9.2585   LearningRate 0.0710   Epoch: 3   Global Step: 15910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:50:03,259-Speed 3436.08 samples/sec   Loss 9.0530   LearningRate 0.0710   Epoch: 3   Global Step: 15920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:50:06,206-Speed 3476.09 samples/sec   Loss 9.0952   LearningRate 0.0710   Epoch: 3   Global Step: 15930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:50:09,187-Speed 3436.26 samples/sec   Loss 9.2475   LearningRate 0.0710   Epoch: 3   Global Step: 15940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:50:12,174-Speed 3428.41 samples/sec   Loss 9.1878   LearningRate 0.0710   Epoch: 3   Global Step: 15950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:50:15,151-Speed 3451.21 samples/sec   Loss 9.0750   LearningRate 0.0709   Epoch: 3   Global Step: 15960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:50:18,147-Speed 3418.95 samples/sec   Loss 9.2180   LearningRate 0.0709   Epoch: 3   Global Step: 15970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:50:21,113-Speed 3453.91 samples/sec   Loss 9.2224   LearningRate 0.0709   Epoch: 3   Global Step: 15980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:50:24,105-Speed 3423.10 samples/sec   Loss 9.1421   LearningRate 0.0709   Epoch: 3   Global Step: 15990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:50:27,069-Speed 3455.64 samples/sec   Loss 8.9650   LearningRate 0.0709   Epoch: 3   Global Step: 16000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:51:11,213-[lfw][16000]XNorm: 21.934007
Training: 2022-04-11 00:51:11,214-[lfw][16000]Accuracy-Flip: 0.99533+-0.00340
Training: 2022-04-11 00:51:11,214-[lfw][16000]Accuracy-Highest: 0.99667
Training: 2022-04-11 00:52:02,617-[cfp_fp][16000]XNorm: 19.244206
Training: 2022-04-11 00:52:02,618-[cfp_fp][16000]Accuracy-Flip: 0.95514+-0.01221
Training: 2022-04-11 00:52:02,618-[cfp_fp][16000]Accuracy-Highest: 0.95514
Training: 2022-04-11 00:52:46,694-[agedb_30][16000]XNorm: 21.524140
Training: 2022-04-11 00:52:46,694-[agedb_30][16000]Accuracy-Flip: 0.97133+-0.00816
Training: 2022-04-11 00:52:46,695-[agedb_30][16000]Accuracy-Highest: 0.97133
Training: 2022-04-11 00:52:49,664-Speed 71.81 samples/sec   Loss 9.0826   LearningRate 0.0709   Epoch: 3   Global Step: 16010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:52:52,627-Speed 3456.11 samples/sec   Loss 9.0774   LearningRate 0.0708   Epoch: 3   Global Step: 16020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:52:55,605-Speed 3439.50 samples/sec   Loss 9.2613   LearningRate 0.0708   Epoch: 3   Global Step: 16030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:52:58,582-Speed 3440.12 samples/sec   Loss 8.9511   LearningRate 0.0708   Epoch: 3   Global Step: 16040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:53:01,544-Speed 3458.66 samples/sec   Loss 9.0930   LearningRate 0.0708   Epoch: 3   Global Step: 16050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:53:04,512-Speed 3451.87 samples/sec   Loss 9.1432   LearningRate 0.0708   Epoch: 3   Global Step: 16060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:07,472-Speed 3460.03 samples/sec   Loss 9.1107   LearningRate 0.0708   Epoch: 3   Global Step: 16070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:10,461-Speed 3426.35 samples/sec   Loss 8.9173   LearningRate 0.0707   Epoch: 3   Global Step: 16080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:13,434-Speed 3445.66 samples/sec   Loss 8.9715   LearningRate 0.0707   Epoch: 3   Global Step: 16090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:16,493-Speed 3348.38 samples/sec   Loss 9.1316   LearningRate 0.0707   Epoch: 3   Global Step: 16100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:19,472-Speed 3439.08 samples/sec   Loss 9.1335   LearningRate 0.0707   Epoch: 3   Global Step: 16110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:22,432-Speed 3460.13 samples/sec   Loss 9.1764   LearningRate 0.0707   Epoch: 3   Global Step: 16120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:25,399-Speed 3453.29 samples/sec   Loss 9.2110   LearningRate 0.0707   Epoch: 3   Global Step: 16130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:28,354-Speed 3466.30 samples/sec   Loss 9.0278   LearningRate 0.0706   Epoch: 3   Global Step: 16140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:31,334-Speed 3436.48 samples/sec   Loss 9.1643   LearningRate 0.0706   Epoch: 3   Global Step: 16150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:34,272-Speed 3486.22 samples/sec   Loss 9.1909   LearningRate 0.0706   Epoch: 3   Global Step: 16160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:37,253-Speed 3436.75 samples/sec   Loss 9.0281   LearningRate 0.0706   Epoch: 3   Global Step: 16170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:40,214-Speed 3459.43 samples/sec   Loss 8.9820   LearningRate 0.0706   Epoch: 3   Global Step: 16180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:43,198-Speed 3432.90 samples/sec   Loss 9.1402   LearningRate 0.0706   Epoch: 3   Global Step: 16190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:46,159-Speed 3457.98 samples/sec   Loss 9.0885   LearningRate 0.0705   Epoch: 3   Global Step: 16200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:49,137-Speed 3440.94 samples/sec   Loss 9.1159   LearningRate 0.0705   Epoch: 3   Global Step: 16210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:52,088-Speed 3470.88 samples/sec   Loss 9.1206   LearningRate 0.0705   Epoch: 3   Global Step: 16220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:55,050-Speed 3457.66 samples/sec   Loss 9.0779   LearningRate 0.0705   Epoch: 3   Global Step: 16230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:53:58,069-Speed 3393.29 samples/sec   Loss 9.1311   LearningRate 0.0705   Epoch: 3   Global Step: 16240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:54:01,045-Speed 3441.91 samples/sec   Loss 9.1850   LearningRate 0.0705   Epoch: 3   Global Step: 16250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:04,021-Speed 3441.66 samples/sec   Loss 9.0682   LearningRate 0.0704   Epoch: 3   Global Step: 16260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:06,991-Speed 3449.14 samples/sec   Loss 9.1243   LearningRate 0.0704   Epoch: 3   Global Step: 16270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:09,964-Speed 3445.64 samples/sec   Loss 9.0400   LearningRate 0.0704   Epoch: 3   Global Step: 16280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:12,933-Speed 3449.39 samples/sec   Loss 9.0981   LearningRate 0.0704   Epoch: 3   Global Step: 16290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:15,910-Speed 3440.91 samples/sec   Loss 9.0383   LearningRate 0.0704   Epoch: 3   Global Step: 16300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:18,893-Speed 3433.42 samples/sec   Loss 9.1362   LearningRate 0.0704   Epoch: 3   Global Step: 16310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:21,869-Speed 3441.28 samples/sec   Loss 9.0091   LearningRate 0.0703   Epoch: 3   Global Step: 16320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:24,848-Speed 3438.54 samples/sec   Loss 9.2026   LearningRate 0.0703   Epoch: 3   Global Step: 16330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:27,838-Speed 3426.00 samples/sec   Loss 8.9861   LearningRate 0.0703   Epoch: 3   Global Step: 16340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:30,812-Speed 3444.68 samples/sec   Loss 9.2124   LearningRate 0.0703   Epoch: 3   Global Step: 16350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:54:33,771-Speed 3460.76 samples/sec   Loss 9.1494   LearningRate 0.0703   Epoch: 3   Global Step: 16360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:36,772-Speed 3413.62 samples/sec   Loss 9.2285   LearningRate 0.0703   Epoch: 3   Global Step: 16370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:39,751-Speed 3438.52 samples/sec   Loss 8.9151   LearningRate 0.0702   Epoch: 3   Global Step: 16380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:42,732-Speed 3435.90 samples/sec   Loss 9.1652   LearningRate 0.0702   Epoch: 3   Global Step: 16390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:45,712-Speed 3436.97 samples/sec   Loss 8.9324   LearningRate 0.0702   Epoch: 3   Global Step: 16400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:48,689-Speed 3441.38 samples/sec   Loss 9.0167   LearningRate 0.0702   Epoch: 3   Global Step: 16410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:51,692-Speed 3409.88 samples/sec   Loss 9.1214   LearningRate 0.0702   Epoch: 3   Global Step: 16420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:54,721-Speed 3382.19 samples/sec   Loss 9.1767   LearningRate 0.0702   Epoch: 3   Global Step: 16430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:54:57,667-Speed 3476.83 samples/sec   Loss 9.0319   LearningRate 0.0701   Epoch: 3   Global Step: 16440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:55:00,647-Speed 3437.31 samples/sec   Loss 8.9848   LearningRate 0.0701   Epoch: 3   Global Step: 16450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:55:03,635-Speed 3427.29 samples/sec   Loss 9.0639   LearningRate 0.0701   Epoch: 3   Global Step: 16460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:55:06,618-Speed 3434.31 samples/sec   Loss 9.0549   LearningRate 0.0701   Epoch: 3   Global Step: 16470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:55:09,589-Speed 3447.43 samples/sec   Loss 9.1362   LearningRate 0.0701   Epoch: 3   Global Step: 16480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:55:12,536-Speed 3475.91 samples/sec   Loss 8.9514   LearningRate 0.0701   Epoch: 3   Global Step: 16490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:55:15,499-Speed 3457.21 samples/sec   Loss 9.0937   LearningRate 0.0700   Epoch: 3   Global Step: 16500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:55:18,487-Speed 3427.59 samples/sec   Loss 9.2047   LearningRate 0.0700   Epoch: 3   Global Step: 16510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:55:21,452-Speed 3455.43 samples/sec   Loss 9.1047   LearningRate 0.0700   Epoch: 3   Global Step: 16520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:55:24,434-Speed 3435.25 samples/sec   Loss 9.1832   LearningRate 0.0700   Epoch: 3   Global Step: 16530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:55:27,438-Speed 3409.36 samples/sec   Loss 8.9555   LearningRate 0.0700   Epoch: 3   Global Step: 16540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:55:30,424-Speed 3429.21 samples/sec   Loss 9.3129   LearningRate 0.0700   Epoch: 3   Global Step: 16550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:55:33,416-Speed 3423.95 samples/sec   Loss 9.2515   LearningRate 0.0699   Epoch: 3   Global Step: 16560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:55:36,410-Speed 3420.99 samples/sec   Loss 9.1217   LearningRate 0.0699   Epoch: 3   Global Step: 16570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:55:39,374-Speed 3455.61 samples/sec   Loss 8.9313   LearningRate 0.0699   Epoch: 3   Global Step: 16580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:55:42,353-Speed 3438.63 samples/sec   Loss 9.0086   LearningRate 0.0699   Epoch: 3   Global Step: 16590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:55:45,325-Speed 3445.94 samples/sec   Loss 9.0601   LearningRate 0.0699   Epoch: 3   Global Step: 16600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:55:48,308-Speed 3433.64 samples/sec   Loss 9.2597   LearningRate 0.0699   Epoch: 3   Global Step: 16610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:55:51,305-Speed 3418.18 samples/sec   Loss 9.0229   LearningRate 0.0698   Epoch: 3   Global Step: 16620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:55:54,266-Speed 3459.35 samples/sec   Loss 9.0036   LearningRate 0.0698   Epoch: 3   Global Step: 16630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:55:57,254-Speed 3427.89 samples/sec   Loss 9.1638   LearningRate 0.0698   Epoch: 3   Global Step: 16640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:56:00,199-Speed 3477.65 samples/sec   Loss 8.9586   LearningRate 0.0698   Epoch: 3   Global Step: 16650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:56:03,170-Speed 3447.35 samples/sec   Loss 9.1081   LearningRate 0.0698   Epoch: 3   Global Step: 16660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:56:06,136-Speed 3453.41 samples/sec   Loss 9.0237   LearningRate 0.0698   Epoch: 3   Global Step: 16670   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 00:56:09,082-Speed 3477.38 samples/sec   Loss 9.1242   LearningRate 0.0697   Epoch: 3   Global Step: 16680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:56:12,045-Speed 3456.52 samples/sec   Loss 9.2884   LearningRate 0.0697   Epoch: 3   Global Step: 16690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:56:14,993-Speed 3475.09 samples/sec   Loss 9.0096   LearningRate 0.0697   Epoch: 3   Global Step: 16700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:56:17,975-Speed 3434.70 samples/sec   Loss 9.0843   LearningRate 0.0697   Epoch: 3   Global Step: 16710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:56:20,961-Speed 3429.97 samples/sec   Loss 9.1160   LearningRate 0.0697   Epoch: 3   Global Step: 16720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:56:23,964-Speed 3410.10 samples/sec   Loss 9.0858   LearningRate 0.0697   Epoch: 3   Global Step: 16730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:56:27,066-Speed 3302.21 samples/sec   Loss 9.2949   LearningRate 0.0696   Epoch: 3   Global Step: 16740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:56:30,122-Speed 3351.41 samples/sec   Loss 9.0445   LearningRate 0.0696   Epoch: 3   Global Step: 16750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:56:33,086-Speed 3456.04 samples/sec   Loss 8.9160   LearningRate 0.0696   Epoch: 3   Global Step: 16760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:56:36,080-Speed 3421.01 samples/sec   Loss 9.0163   LearningRate 0.0696   Epoch: 3   Global Step: 16770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:56:39,071-Speed 3425.36 samples/sec   Loss 9.1208   LearningRate 0.0696   Epoch: 3   Global Step: 16780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:56:42,082-Speed 3401.30 samples/sec   Loss 9.0468   LearningRate 0.0696   Epoch: 3   Global Step: 16790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:56:45,082-Speed 3414.77 samples/sec   Loss 9.1335   LearningRate 0.0695   Epoch: 3   Global Step: 16800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:56:48,046-Speed 3455.81 samples/sec   Loss 8.9640   LearningRate 0.0695   Epoch: 3   Global Step: 16810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:56:51,031-Speed 3430.57 samples/sec   Loss 9.0520   LearningRate 0.0695   Epoch: 3   Global Step: 16820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:56:54,015-Speed 3432.81 samples/sec   Loss 9.1075   LearningRate 0.0695   Epoch: 3   Global Step: 16830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:56:56,995-Speed 3437.90 samples/sec   Loss 9.0948   LearningRate 0.0695   Epoch: 3   Global Step: 16840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:56:59,983-Speed 3428.23 samples/sec   Loss 9.1700   LearningRate 0.0695   Epoch: 3   Global Step: 16850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:57:02,977-Speed 3420.67 samples/sec   Loss 9.1150   LearningRate 0.0694   Epoch: 3   Global Step: 16860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:57:05,951-Speed 3444.52 samples/sec   Loss 8.9959   LearningRate 0.0694   Epoch: 3   Global Step: 16870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:57:08,940-Speed 3426.51 samples/sec   Loss 9.1411   LearningRate 0.0694   Epoch: 3   Global Step: 16880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:57:11,936-Speed 3419.14 samples/sec   Loss 9.0144   LearningRate 0.0694   Epoch: 3   Global Step: 16890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:57:14,908-Speed 3445.74 samples/sec   Loss 9.0130   LearningRate 0.0694   Epoch: 3   Global Step: 16900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:57:17,908-Speed 3414.06 samples/sec   Loss 8.9562   LearningRate 0.0694   Epoch: 3   Global Step: 16910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 00:57:20,892-Speed 3433.26 samples/sec   Loss 9.2799   LearningRate 0.0693   Epoch: 3   Global Step: 16920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:57:23,880-Speed 3428.06 samples/sec   Loss 9.1079   LearningRate 0.0693   Epoch: 3   Global Step: 16930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:57:27,009-Speed 3273.35 samples/sec   Loss 9.1873   LearningRate 0.0693   Epoch: 3   Global Step: 16940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:57:29,961-Speed 3469.69 samples/sec   Loss 9.0273   LearningRate 0.0693   Epoch: 3   Global Step: 16950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:57:32,947-Speed 3430.17 samples/sec   Loss 9.1868   LearningRate 0.0693   Epoch: 3   Global Step: 16960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:57:35,942-Speed 3419.27 samples/sec   Loss 8.9775   LearningRate 0.0693   Epoch: 3   Global Step: 16970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:57:38,929-Speed 3429.68 samples/sec   Loss 9.0405   LearningRate 0.0692   Epoch: 3   Global Step: 16980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:57:41,926-Speed 3417.76 samples/sec   Loss 8.9929   LearningRate 0.0692   Epoch: 3   Global Step: 16990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:57:44,911-Speed 3431.86 samples/sec   Loss 8.9240   LearningRate 0.0692   Epoch: 3   Global Step: 17000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 00:57:47,896-Speed 3430.58 samples/sec   Loss 9.0052   LearningRate 0.0692   Epoch: 3   Global Step: 17010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:57:50,902-Speed 3407.03 samples/sec   Loss 9.0505   LearningRate 0.0692   Epoch: 3   Global Step: 17020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:57:53,914-Speed 3401.89 samples/sec   Loss 9.1428   LearningRate 0.0692   Epoch: 3   Global Step: 17030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:57:56,885-Speed 3447.35 samples/sec   Loss 8.9986   LearningRate 0.0691   Epoch: 3   Global Step: 17040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:57:59,891-Speed 3406.96 samples/sec   Loss 8.9888   LearningRate 0.0691   Epoch: 3   Global Step: 17050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:58:02,871-Speed 3437.00 samples/sec   Loss 9.0572   LearningRate 0.0691   Epoch: 3   Global Step: 17060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:58:05,863-Speed 3423.50 samples/sec   Loss 9.1276   LearningRate 0.0691   Epoch: 3   Global Step: 17070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:58:08,862-Speed 3415.84 samples/sec   Loss 9.3065   LearningRate 0.0691   Epoch: 3   Global Step: 17080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:58:11,861-Speed 3415.40 samples/sec   Loss 9.0047   LearningRate 0.0691   Epoch: 3   Global Step: 17090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:58:14,866-Speed 3409.02 samples/sec   Loss 9.0966   LearningRate 0.0690   Epoch: 3   Global Step: 17100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:58:17,847-Speed 3435.89 samples/sec   Loss 8.8553   LearningRate 0.0690   Epoch: 3   Global Step: 17110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:58:20,834-Speed 3429.16 samples/sec   Loss 9.2012   LearningRate 0.0690   Epoch: 3   Global Step: 17120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:58:23,835-Speed 3413.10 samples/sec   Loss 9.1500   LearningRate 0.0690   Epoch: 3   Global Step: 17130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:58:26,826-Speed 3424.83 samples/sec   Loss 9.0452   LearningRate 0.0690   Epoch: 3   Global Step: 17140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:58:29,827-Speed 3413.21 samples/sec   Loss 9.1370   LearningRate 0.0690   Epoch: 3   Global Step: 17150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:58:32,803-Speed 3441.09 samples/sec   Loss 9.0833   LearningRate 0.0690   Epoch: 3   Global Step: 17160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:58:35,787-Speed 3432.66 samples/sec   Loss 9.0475   LearningRate 0.0689   Epoch: 3   Global Step: 17170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:58:38,750-Speed 3457.00 samples/sec   Loss 9.0310   LearningRate 0.0689   Epoch: 3   Global Step: 17180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:58:41,732-Speed 3435.37 samples/sec   Loss 8.9994   LearningRate 0.0689   Epoch: 3   Global Step: 17190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:58:44,720-Speed 3428.57 samples/sec   Loss 8.9021   LearningRate 0.0689   Epoch: 3   Global Step: 17200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:58:47,699-Speed 3437.42 samples/sec   Loss 9.0955   LearningRate 0.0689   Epoch: 3   Global Step: 17210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:58:50,719-Speed 3391.09 samples/sec   Loss 9.0208   LearningRate 0.0689   Epoch: 3   Global Step: 17220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:58:53,707-Speed 3428.65 samples/sec   Loss 8.9463   LearningRate 0.0688   Epoch: 3   Global Step: 17230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:58:56,733-Speed 3384.96 samples/sec   Loss 8.9916   LearningRate 0.0688   Epoch: 3   Global Step: 17240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:58:59,740-Speed 3406.36 samples/sec   Loss 9.1441   LearningRate 0.0688   Epoch: 3   Global Step: 17250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:59:02,764-Speed 3387.05 samples/sec   Loss 8.9943   LearningRate 0.0688   Epoch: 3   Global Step: 17260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:59:05,752-Speed 3428.31 samples/sec   Loss 9.0257   LearningRate 0.0688   Epoch: 3   Global Step: 17270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:59:08,744-Speed 3423.48 samples/sec   Loss 9.0480   LearningRate 0.0688   Epoch: 3   Global Step: 17280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:59:11,716-Speed 3446.37 samples/sec   Loss 9.0147   LearningRate 0.0687   Epoch: 3   Global Step: 17290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:59:14,711-Speed 3420.42 samples/sec   Loss 8.8374   LearningRate 0.0687   Epoch: 3   Global Step: 17300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:59:17,694-Speed 3433.64 samples/sec   Loss 8.9573   LearningRate 0.0687   Epoch: 3   Global Step: 17310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:59:20,688-Speed 3420.94 samples/sec   Loss 9.0215   LearningRate 0.0687   Epoch: 3   Global Step: 17320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 00:59:23,674-Speed 3430.32 samples/sec   Loss 8.9377   LearningRate 0.0687   Epoch: 3   Global Step: 17330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:59:26,660-Speed 3430.34 samples/sec   Loss 8.8517   LearningRate 0.0687   Epoch: 3   Global Step: 17340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:59:29,659-Speed 3415.95 samples/sec   Loss 9.0739   LearningRate 0.0686   Epoch: 3   Global Step: 17350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:59:32,647-Speed 3427.86 samples/sec   Loss 8.9152   LearningRate 0.0686   Epoch: 3   Global Step: 17360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:59:35,648-Speed 3413.01 samples/sec   Loss 8.9737   LearningRate 0.0686   Epoch: 3   Global Step: 17370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:59:38,678-Speed 3381.19 samples/sec   Loss 9.1094   LearningRate 0.0686   Epoch: 3   Global Step: 17380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:59:41,734-Speed 3351.50 samples/sec   Loss 8.9966   LearningRate 0.0686   Epoch: 3   Global Step: 17390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:59:44,724-Speed 3425.46 samples/sec   Loss 9.0993   LearningRate 0.0686   Epoch: 3   Global Step: 17400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:59:47,732-Speed 3405.51 samples/sec   Loss 9.0864   LearningRate 0.0685   Epoch: 3   Global Step: 17410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:59:50,732-Speed 3413.38 samples/sec   Loss 9.0527   LearningRate 0.0685   Epoch: 3   Global Step: 17420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:59:53,719-Speed 3430.71 samples/sec   Loss 9.0029   LearningRate 0.0685   Epoch: 3   Global Step: 17430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:59:56,708-Speed 3426.00 samples/sec   Loss 8.9973   LearningRate 0.0685   Epoch: 3   Global Step: 17440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 00:59:59,688-Speed 3438.46 samples/sec   Loss 9.1688   LearningRate 0.0685   Epoch: 3   Global Step: 17450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:02,658-Speed 3448.06 samples/sec   Loss 8.9225   LearningRate 0.0685   Epoch: 3   Global Step: 17460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:05,640-Speed 3434.58 samples/sec   Loss 8.9811   LearningRate 0.0684   Epoch: 3   Global Step: 17470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:08,638-Speed 3416.94 samples/sec   Loss 9.0601   LearningRate 0.0684   Epoch: 3   Global Step: 17480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:11,643-Speed 3408.56 samples/sec   Loss 9.0804   LearningRate 0.0684   Epoch: 3   Global Step: 17490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:14,652-Speed 3404.21 samples/sec   Loss 9.0487   LearningRate 0.0684   Epoch: 3   Global Step: 17500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:17,653-Speed 3413.01 samples/sec   Loss 9.0912   LearningRate 0.0684   Epoch: 3   Global Step: 17510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:20,623-Speed 3449.16 samples/sec   Loss 8.7135   LearningRate 0.0684   Epoch: 3   Global Step: 17520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:23,585-Speed 3458.11 samples/sec   Loss 9.0592   LearningRate 0.0683   Epoch: 3   Global Step: 17530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:26,586-Speed 3412.99 samples/sec   Loss 8.9324   LearningRate 0.0683   Epoch: 3   Global Step: 17540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:29,574-Speed 3428.69 samples/sec   Loss 9.1435   LearningRate 0.0683   Epoch: 3   Global Step: 17550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:32,567-Speed 3421.36 samples/sec   Loss 9.0951   LearningRate 0.0683   Epoch: 3   Global Step: 17560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:35,579-Speed 3400.72 samples/sec   Loss 8.9493   LearningRate 0.0683   Epoch: 3   Global Step: 17570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:38,568-Speed 3427.89 samples/sec   Loss 8.8952   LearningRate 0.0683   Epoch: 3   Global Step: 17580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:41,554-Speed 3429.40 samples/sec   Loss 9.0218   LearningRate 0.0682   Epoch: 3   Global Step: 17590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:44,555-Speed 3413.51 samples/sec   Loss 8.8041   LearningRate 0.0682   Epoch: 3   Global Step: 17600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:47,564-Speed 3404.14 samples/sec   Loss 8.8750   LearningRate 0.0682   Epoch: 3   Global Step: 17610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:50,557-Speed 3422.29 samples/sec   Loss 8.8677   LearningRate 0.0682   Epoch: 3   Global Step: 17620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:53,548-Speed 3424.73 samples/sec   Loss 8.7401   LearningRate 0.0682   Epoch: 3   Global Step: 17630   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 01:00:56,520-Speed 3446.25 samples/sec   Loss 8.9814   LearningRate 0.0682   Epoch: 3   Global Step: 17640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:00:59,533-Speed 3399.00 samples/sec   Loss 8.9586   LearningRate 0.0681   Epoch: 3   Global Step: 17650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:01:02,576-Speed 3366.13 samples/sec   Loss 8.9060   LearningRate 0.0681   Epoch: 3   Global Step: 17660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:01:05,593-Speed 3395.54 samples/sec   Loss 9.0184   LearningRate 0.0681   Epoch: 3   Global Step: 17670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:01:08,588-Speed 3420.23 samples/sec   Loss 8.9150   LearningRate 0.0681   Epoch: 3   Global Step: 17680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:01:11,582-Speed 3420.70 samples/sec   Loss 8.8865   LearningRate 0.0681   Epoch: 3   Global Step: 17690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:01:14,585-Speed 3411.62 samples/sec   Loss 8.8681   LearningRate 0.0681   Epoch: 3   Global Step: 17700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:01:17,594-Speed 3403.82 samples/sec   Loss 8.9130   LearningRate 0.0681   Epoch: 3   Global Step: 17710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:01:20,595-Speed 3412.23 samples/sec   Loss 8.8698   LearningRate 0.0680   Epoch: 3   Global Step: 17720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:01:23,581-Speed 3430.57 samples/sec   Loss 8.8157   LearningRate 0.0680   Epoch: 3   Global Step: 17730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:01:26,586-Speed 3408.40 samples/sec   Loss 8.9498   LearningRate 0.0680   Epoch: 3   Global Step: 17740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:01:29,576-Speed 3425.57 samples/sec   Loss 8.8608   LearningRate 0.0680   Epoch: 3   Global Step: 17750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:01:32,565-Speed 3427.83 samples/sec   Loss 8.8063   LearningRate 0.0680   Epoch: 3   Global Step: 17760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:01:35,575-Speed 3402.39 samples/sec   Loss 8.9013   LearningRate 0.0680   Epoch: 3   Global Step: 17770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:01:38,574-Speed 3415.60 samples/sec   Loss 8.7488   LearningRate 0.0679   Epoch: 3   Global Step: 17780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:01:41,560-Speed 3430.07 samples/sec   Loss 8.8586   LearningRate 0.0679   Epoch: 3   Global Step: 17790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:01:44,563-Speed 3410.72 samples/sec   Loss 8.9031   LearningRate 0.0679   Epoch: 3   Global Step: 17800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:01:47,570-Speed 3405.83 samples/sec   Loss 8.9856   LearningRate 0.0679   Epoch: 3   Global Step: 17810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:01:50,578-Speed 3405.56 samples/sec   Loss 9.0899   LearningRate 0.0679   Epoch: 3   Global Step: 17820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:01:53,578-Speed 3415.04 samples/sec   Loss 8.9345   LearningRate 0.0679   Epoch: 3   Global Step: 17830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:01:56,578-Speed 3414.06 samples/sec   Loss 8.9206   LearningRate 0.0678   Epoch: 3   Global Step: 17840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:01:59,583-Speed 3408.07 samples/sec   Loss 8.9677   LearningRate 0.0678   Epoch: 3   Global Step: 17850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:02:02,573-Speed 3426.74 samples/sec   Loss 8.9122   LearningRate 0.0678   Epoch: 3   Global Step: 17860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:02:05,567-Speed 3421.44 samples/sec   Loss 8.8924   LearningRate 0.0678   Epoch: 3   Global Step: 17870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:02:08,577-Speed 3402.19 samples/sec   Loss 9.0935   LearningRate 0.0678   Epoch: 3   Global Step: 17880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:02:11,577-Speed 3414.10 samples/sec   Loss 8.9469   LearningRate 0.0678   Epoch: 3   Global Step: 17890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:02:14,555-Speed 3439.11 samples/sec   Loss 8.9367   LearningRate 0.0677   Epoch: 3   Global Step: 17900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:02:17,576-Speed 3390.52 samples/sec   Loss 8.9326   LearningRate 0.0677   Epoch: 3   Global Step: 17910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:02:20,570-Speed 3422.34 samples/sec   Loss 8.8337   LearningRate 0.0677   Epoch: 3   Global Step: 17920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:02:23,572-Speed 3411.77 samples/sec   Loss 8.9930   LearningRate 0.0677   Epoch: 3   Global Step: 17930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:02:26,585-Speed 3398.72 samples/sec   Loss 8.9308   LearningRate 0.0677   Epoch: 3   Global Step: 17940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:02:29,674-Speed 3315.89 samples/sec   Loss 8.9228   LearningRate 0.0677   Epoch: 3   Global Step: 17950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:02:32,724-Speed 3359.13 samples/sec   Loss 9.0076   LearningRate 0.0676   Epoch: 3   Global Step: 17960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:02:35,714-Speed 3425.05 samples/sec   Loss 8.9107   LearningRate 0.0676   Epoch: 3   Global Step: 17970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:02:38,714-Speed 3413.97 samples/sec   Loss 8.9100   LearningRate 0.0676   Epoch: 3   Global Step: 17980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:02:41,722-Speed 3405.44 samples/sec   Loss 9.0249   LearningRate 0.0676   Epoch: 3   Global Step: 17990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:02:44,700-Speed 3440.09 samples/sec   Loss 9.0672   LearningRate 0.0676   Epoch: 3   Global Step: 18000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:03:29,188-[lfw][18000]XNorm: 22.461839
Training: 2022-04-11 01:03:29,189-[lfw][18000]Accuracy-Flip: 0.99717+-0.00299
Training: 2022-04-11 01:03:29,189-[lfw][18000]Accuracy-Highest: 0.99717
Training: 2022-04-11 01:04:20,696-[cfp_fp][18000]XNorm: 19.853714
Training: 2022-04-11 01:04:20,697-[cfp_fp][18000]Accuracy-Flip: 0.95000+-0.01088
Training: 2022-04-11 01:04:20,697-[cfp_fp][18000]Accuracy-Highest: 0.95514
Training: 2022-04-11 01:05:04,838-[agedb_30][18000]XNorm: 21.988876
Training: 2022-04-11 01:05:04,838-[agedb_30][18000]Accuracy-Flip: 0.97050+-0.00946
Training: 2022-04-11 01:05:04,839-[agedb_30][18000]Accuracy-Highest: 0.97133
Training: 2022-04-11 01:05:07,827-Speed 71.55 samples/sec   Loss 8.8999   LearningRate 0.0676   Epoch: 3   Global Step: 18010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:05:10,795-Speed 3450.22 samples/sec   Loss 8.8670   LearningRate 0.0675   Epoch: 3   Global Step: 18020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:05:13,757-Speed 3458.86 samples/sec   Loss 8.8806   LearningRate 0.0675   Epoch: 3   Global Step: 18030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:05:16,726-Speed 3450.12 samples/sec   Loss 8.7271   LearningRate 0.0675   Epoch: 3   Global Step: 18040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:05:19,716-Speed 3425.55 samples/sec   Loss 9.1107   LearningRate 0.0675   Epoch: 3   Global Step: 18050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:05:22,704-Speed 3428.81 samples/sec   Loss 8.9628   LearningRate 0.0675   Epoch: 3   Global Step: 18060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:05:25,702-Speed 3416.19 samples/sec   Loss 8.8780   LearningRate 0.0675   Epoch: 3   Global Step: 18070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:05:28,681-Speed 3438.31 samples/sec   Loss 8.8771   LearningRate 0.0674   Epoch: 3   Global Step: 18080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:05:31,667-Speed 3430.83 samples/sec   Loss 9.0826   LearningRate 0.0674   Epoch: 3   Global Step: 18090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:05:34,646-Speed 3438.46 samples/sec   Loss 8.8487   LearningRate 0.0674   Epoch: 3   Global Step: 18100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:05:37,607-Speed 3459.10 samples/sec   Loss 8.7722   LearningRate 0.0674   Epoch: 3   Global Step: 18110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:05:40,563-Speed 3465.51 samples/sec   Loss 8.9420   LearningRate 0.0674   Epoch: 3   Global Step: 18120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:05:43,561-Speed 3415.50 samples/sec   Loss 8.9321   LearningRate 0.0674   Epoch: 3   Global Step: 18130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:05:46,549-Speed 3428.72 samples/sec   Loss 8.9457   LearningRate 0.0674   Epoch: 3   Global Step: 18140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:05:49,551-Speed 3411.87 samples/sec   Loss 8.8875   LearningRate 0.0673   Epoch: 3   Global Step: 18150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:05:52,555-Speed 3410.08 samples/sec   Loss 9.0006   LearningRate 0.0673   Epoch: 3   Global Step: 18160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:05:55,545-Speed 3425.76 samples/sec   Loss 8.8514   LearningRate 0.0673   Epoch: 3   Global Step: 18170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:05:58,526-Speed 3434.96 samples/sec   Loss 8.9087   LearningRate 0.0673   Epoch: 3   Global Step: 18180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:01,616-Speed 3314.94 samples/sec   Loss 8.9002   LearningRate 0.0673   Epoch: 3   Global Step: 18190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:04,597-Speed 3436.47 samples/sec   Loss 9.0205   LearningRate 0.0673   Epoch: 3   Global Step: 18200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:07,576-Speed 3438.18 samples/sec   Loss 8.8662   LearningRate 0.0672   Epoch: 3   Global Step: 18210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:10,562-Speed 3430.66 samples/sec   Loss 9.0302   LearningRate 0.0672   Epoch: 3   Global Step: 18220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:06:13,564-Speed 3412.40 samples/sec   Loss 9.0116   LearningRate 0.0672   Epoch: 3   Global Step: 18230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:06:16,533-Speed 3449.51 samples/sec   Loss 8.8629   LearningRate 0.0672   Epoch: 3   Global Step: 18240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:19,530-Speed 3417.59 samples/sec   Loss 8.6938   LearningRate 0.0672   Epoch: 3   Global Step: 18250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:22,525-Speed 3420.36 samples/sec   Loss 8.8230   LearningRate 0.0672   Epoch: 3   Global Step: 18260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:25,524-Speed 3415.43 samples/sec   Loss 8.6712   LearningRate 0.0671   Epoch: 3   Global Step: 18270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:28,503-Speed 3438.38 samples/sec   Loss 8.8571   LearningRate 0.0671   Epoch: 3   Global Step: 18280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:31,493-Speed 3426.53 samples/sec   Loss 8.7130   LearningRate 0.0671   Epoch: 3   Global Step: 18290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:34,484-Speed 3423.85 samples/sec   Loss 8.8119   LearningRate 0.0671   Epoch: 3   Global Step: 18300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:37,470-Speed 3431.23 samples/sec   Loss 8.8824   LearningRate 0.0671   Epoch: 3   Global Step: 18310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:40,469-Speed 3414.69 samples/sec   Loss 8.6971   LearningRate 0.0671   Epoch: 3   Global Step: 18320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:43,461-Speed 3423.49 samples/sec   Loss 8.9363   LearningRate 0.0670   Epoch: 3   Global Step: 18330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:46,457-Speed 3418.52 samples/sec   Loss 8.8736   LearningRate 0.0670   Epoch: 3   Global Step: 18340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:06:49,469-Speed 3401.44 samples/sec   Loss 8.6873   LearningRate 0.0670   Epoch: 3   Global Step: 18350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:06:52,463-Speed 3421.05 samples/sec   Loss 8.7357   LearningRate 0.0670   Epoch: 3   Global Step: 18360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:06:55,447-Speed 3431.50 samples/sec   Loss 8.6361   LearningRate 0.0670   Epoch: 3   Global Step: 18370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:06:58,452-Speed 3409.38 samples/sec   Loss 8.8752   LearningRate 0.0670   Epoch: 3   Global Step: 18380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:07:01,443-Speed 3424.80 samples/sec   Loss 8.8478   LearningRate 0.0669   Epoch: 3   Global Step: 18390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:07:04,429-Speed 3429.90 samples/sec   Loss 8.7619   LearningRate 0.0669   Epoch: 3   Global Step: 18400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:07:07,446-Speed 3396.23 samples/sec   Loss 8.7969   LearningRate 0.0669   Epoch: 3   Global Step: 18410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:07:10,437-Speed 3424.33 samples/sec   Loss 8.7560   LearningRate 0.0669   Epoch: 3   Global Step: 18420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:07:13,433-Speed 3418.20 samples/sec   Loss 8.9430   LearningRate 0.0669   Epoch: 3   Global Step: 18430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:07:16,414-Speed 3436.43 samples/sec   Loss 8.7389   LearningRate 0.0669   Epoch: 3   Global Step: 18440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:07:19,397-Speed 3434.58 samples/sec   Loss 8.7172   LearningRate 0.0668   Epoch: 3   Global Step: 18450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:07:22,377-Speed 3436.21 samples/sec   Loss 8.7412   LearningRate 0.0668   Epoch: 3   Global Step: 18460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:07:25,377-Speed 3414.61 samples/sec   Loss 8.6675   LearningRate 0.0668   Epoch: 3   Global Step: 18470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:07:28,358-Speed 3436.13 samples/sec   Loss 8.7077   LearningRate 0.0668   Epoch: 3   Global Step: 18480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:07:31,343-Speed 3431.52 samples/sec   Loss 8.8050   LearningRate 0.0668   Epoch: 3   Global Step: 18490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:07:34,334-Speed 3424.24 samples/sec   Loss 8.6779   LearningRate 0.0668   Epoch: 3   Global Step: 18500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:07:37,343-Speed 3404.89 samples/sec   Loss 8.7474   LearningRate 0.0668   Epoch: 3   Global Step: 18510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:07:40,350-Speed 3405.73 samples/sec   Loss 8.9763   LearningRate 0.0667   Epoch: 3   Global Step: 18520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:07:43,344-Speed 3421.11 samples/sec   Loss 8.9897   LearningRate 0.0667   Epoch: 3   Global Step: 18530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:07:46,334-Speed 3427.66 samples/sec   Loss 8.7172   LearningRate 0.0667   Epoch: 3   Global Step: 18540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:07:49,330-Speed 3418.24 samples/sec   Loss 8.7177   LearningRate 0.0667   Epoch: 3   Global Step: 18550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:07:52,324-Speed 3421.09 samples/sec   Loss 8.7839   LearningRate 0.0667   Epoch: 3   Global Step: 18560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:07:55,295-Speed 3447.83 samples/sec   Loss 8.8772   LearningRate 0.0667   Epoch: 3   Global Step: 18570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:07:58,280-Speed 3431.39 samples/sec   Loss 8.8002   LearningRate 0.0666   Epoch: 3   Global Step: 18580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:08:01,308-Speed 3382.61 samples/sec   Loss 8.9007   LearningRate 0.0666   Epoch: 3   Global Step: 18590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:08:04,336-Speed 3383.01 samples/sec   Loss 8.9260   LearningRate 0.0666   Epoch: 3   Global Step: 18600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:08:07,330-Speed 3421.31 samples/sec   Loss 9.0223   LearningRate 0.0666   Epoch: 3   Global Step: 18610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:08:10,323-Speed 3422.26 samples/sec   Loss 8.8349   LearningRate 0.0666   Epoch: 3   Global Step: 18620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:08:13,401-Speed 3326.86 samples/sec   Loss 8.8933   LearningRate 0.0666   Epoch: 3   Global Step: 18630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:08:16,401-Speed 3415.03 samples/sec   Loss 8.7761   LearningRate 0.0665   Epoch: 3   Global Step: 18640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:08:19,392-Speed 3423.56 samples/sec   Loss 8.7750   LearningRate 0.0665   Epoch: 3   Global Step: 18650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:08:22,377-Speed 3431.87 samples/sec   Loss 8.8522   LearningRate 0.0665   Epoch: 3   Global Step: 18660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:08:25,376-Speed 3415.67 samples/sec   Loss 8.7622   LearningRate 0.0665   Epoch: 3   Global Step: 18670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:08:28,376-Speed 3414.03 samples/sec   Loss 8.9232   LearningRate 0.0665   Epoch: 3   Global Step: 18680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:08:31,365-Speed 3427.12 samples/sec   Loss 8.8527   LearningRate 0.0665   Epoch: 3   Global Step: 18690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:08:34,357-Speed 3422.70 samples/sec   Loss 8.7982   LearningRate 0.0664   Epoch: 3   Global Step: 18700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:08:37,352-Speed 3421.14 samples/sec   Loss 8.8430   LearningRate 0.0664   Epoch: 3   Global Step: 18710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:08:40,355-Speed 3410.28 samples/sec   Loss 8.9512   LearningRate 0.0664   Epoch: 3   Global Step: 18720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:08:43,363-Speed 3404.51 samples/sec   Loss 8.7736   LearningRate 0.0664   Epoch: 3   Global Step: 18730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:08:46,367-Speed 3409.50 samples/sec   Loss 8.7112   LearningRate 0.0664   Epoch: 3   Global Step: 18740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:08:49,370-Speed 3411.31 samples/sec   Loss 8.9191   LearningRate 0.0664   Epoch: 3   Global Step: 18750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:08:52,411-Speed 3368.14 samples/sec   Loss 8.6506   LearningRate 0.0663   Epoch: 3   Global Step: 18760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:08:55,411-Speed 3414.80 samples/sec   Loss 8.7510   LearningRate 0.0663   Epoch: 3   Global Step: 18770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:08:58,400-Speed 3426.87 samples/sec   Loss 8.7333   LearningRate 0.0663   Epoch: 3   Global Step: 18780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:09:01,449-Speed 3358.82 samples/sec   Loss 8.7952   LearningRate 0.0663   Epoch: 3   Global Step: 18790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:09:04,500-Speed 3356.95 samples/sec   Loss 8.7669   LearningRate 0.0663   Epoch: 3   Global Step: 18800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:09:07,491-Speed 3425.13 samples/sec   Loss 8.6716   LearningRate 0.0663   Epoch: 3   Global Step: 18810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:09:10,469-Speed 3439.01 samples/sec   Loss 8.9788   LearningRate 0.0663   Epoch: 3   Global Step: 18820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:09:13,455-Speed 3430.38 samples/sec   Loss 8.6354   LearningRate 0.0662   Epoch: 3   Global Step: 18830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:09:16,448-Speed 3422.05 samples/sec   Loss 8.6929   LearningRate 0.0662   Epoch: 3   Global Step: 18840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:09:19,460-Speed 3400.26 samples/sec   Loss 8.6685   LearningRate 0.0662   Epoch: 3   Global Step: 18850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:09:22,465-Speed 3408.91 samples/sec   Loss 8.7209   LearningRate 0.0662   Epoch: 3   Global Step: 18860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:09:25,471-Speed 3407.83 samples/sec   Loss 8.8900   LearningRate 0.0662   Epoch: 3   Global Step: 18870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:09:28,475-Speed 3409.63 samples/sec   Loss 8.8479   LearningRate 0.0662   Epoch: 3   Global Step: 18880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:09:31,481-Speed 3407.02 samples/sec   Loss 8.7274   LearningRate 0.0661   Epoch: 3   Global Step: 18890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:09:34,473-Speed 3423.52 samples/sec   Loss 8.8031   LearningRate 0.0661   Epoch: 3   Global Step: 18900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:09:37,474-Speed 3413.24 samples/sec   Loss 8.7649   LearningRate 0.0661   Epoch: 3   Global Step: 18910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:09:40,472-Speed 3415.45 samples/sec   Loss 8.9432   LearningRate 0.0661   Epoch: 3   Global Step: 18920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:09:43,476-Speed 3409.89 samples/sec   Loss 8.7628   LearningRate 0.0661   Epoch: 3   Global Step: 18930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:09:46,478-Speed 3412.92 samples/sec   Loss 8.8025   LearningRate 0.0661   Epoch: 3   Global Step: 18940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:09:49,478-Speed 3414.09 samples/sec   Loss 8.5616   LearningRate 0.0660   Epoch: 3   Global Step: 18950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:09:52,485-Speed 3406.30 samples/sec   Loss 8.8637   LearningRate 0.0660   Epoch: 3   Global Step: 18960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:09:55,488-Speed 3410.52 samples/sec   Loss 8.8843   LearningRate 0.0660   Epoch: 3   Global Step: 18970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:09:58,495-Speed 3406.66 samples/sec   Loss 8.7057   LearningRate 0.0660   Epoch: 3   Global Step: 18980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:01,516-Speed 3390.81 samples/sec   Loss 8.5612   LearningRate 0.0660   Epoch: 3   Global Step: 18990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:04,517-Speed 3413.59 samples/sec   Loss 8.6371   LearningRate 0.0660   Epoch: 3   Global Step: 19000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:07,520-Speed 3410.41 samples/sec   Loss 8.6144   LearningRate 0.0659   Epoch: 3   Global Step: 19010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:10,538-Speed 3394.34 samples/sec   Loss 8.7578   LearningRate 0.0659   Epoch: 3   Global Step: 19020   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 01:10:13,527-Speed 3426.93 samples/sec   Loss 8.8130   LearningRate 0.0659   Epoch: 3   Global Step: 19030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:16,550-Speed 3388.20 samples/sec   Loss 8.8404   LearningRate 0.0659   Epoch: 3   Global Step: 19040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:19,560-Speed 3402.82 samples/sec   Loss 8.7137   LearningRate 0.0659   Epoch: 3   Global Step: 19050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:22,560-Speed 3414.58 samples/sec   Loss 8.8068   LearningRate 0.0659   Epoch: 3   Global Step: 19060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:25,574-Speed 3398.45 samples/sec   Loss 8.7159   LearningRate 0.0659   Epoch: 3   Global Step: 19070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:28,585-Speed 3400.91 samples/sec   Loss 8.7294   LearningRate 0.0658   Epoch: 3   Global Step: 19080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:31,621-Speed 3375.02 samples/sec   Loss 8.7640   LearningRate 0.0658   Epoch: 3   Global Step: 19090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:34,615-Speed 3420.12 samples/sec   Loss 8.7434   LearningRate 0.0658   Epoch: 3   Global Step: 19100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:37,624-Speed 3404.44 samples/sec   Loss 8.7383   LearningRate 0.0658   Epoch: 3   Global Step: 19110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:40,629-Speed 3408.36 samples/sec   Loss 8.7196   LearningRate 0.0658   Epoch: 3   Global Step: 19120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:43,618-Speed 3426.33 samples/sec   Loss 8.5937   LearningRate 0.0658   Epoch: 3   Global Step: 19130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:10:46,607-Speed 3427.40 samples/sec   Loss 8.7891   LearningRate 0.0657   Epoch: 3   Global Step: 19140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:10:49,593-Speed 3429.82 samples/sec   Loss 8.5750   LearningRate 0.0657   Epoch: 3   Global Step: 19150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:10:52,614-Speed 3391.42 samples/sec   Loss 8.5894   LearningRate 0.0657   Epoch: 3   Global Step: 19160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:10:55,620-Speed 3406.73 samples/sec   Loss 8.6399   LearningRate 0.0657   Epoch: 3   Global Step: 19170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:10:58,640-Speed 3391.14 samples/sec   Loss 8.7530   LearningRate 0.0657   Epoch: 3   Global Step: 19180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:11:01,664-Speed 3387.89 samples/sec   Loss 8.7735   LearningRate 0.0657   Epoch: 3   Global Step: 19190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:11:04,669-Speed 3407.76 samples/sec   Loss 8.8180   LearningRate 0.0656   Epoch: 3   Global Step: 19200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:11:07,683-Speed 3398.34 samples/sec   Loss 8.6980   LearningRate 0.0656   Epoch: 3   Global Step: 19210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:11:10,684-Speed 3413.86 samples/sec   Loss 8.7162   LearningRate 0.0656   Epoch: 3   Global Step: 19220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:11:13,702-Speed 3393.79 samples/sec   Loss 8.8165   LearningRate 0.0656   Epoch: 3   Global Step: 19230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:11:16,755-Speed 3354.61 samples/sec   Loss 8.8343   LearningRate 0.0656   Epoch: 3   Global Step: 19240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:11:19,762-Speed 3406.54 samples/sec   Loss 8.6947   LearningRate 0.0656   Epoch: 3   Global Step: 19250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:11:22,764-Speed 3411.93 samples/sec   Loss 8.6956   LearningRate 0.0655   Epoch: 3   Global Step: 19260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:11:25,845-Speed 3324.44 samples/sec   Loss 8.6983   LearningRate 0.0655   Epoch: 3   Global Step: 19270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:11:28,846-Speed 3412.60 samples/sec   Loss 8.5763   LearningRate 0.0655   Epoch: 3   Global Step: 19280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:11:31,857-Speed 3401.72 samples/sec   Loss 8.8091   LearningRate 0.0655   Epoch: 3   Global Step: 19290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:11:34,861-Speed 3410.09 samples/sec   Loss 8.6342   LearningRate 0.0655   Epoch: 3   Global Step: 19300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:11:37,896-Speed 3374.10 samples/sec   Loss 8.7413   LearningRate 0.0655   Epoch: 3   Global Step: 19310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:11:40,915-Speed 3393.17 samples/sec   Loss 8.7462   LearningRate 0.0655   Epoch: 3   Global Step: 19320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:11:43,917-Speed 3412.47 samples/sec   Loss 8.8068   LearningRate 0.0654   Epoch: 3   Global Step: 19330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:11:46,902-Speed 3430.55 samples/sec   Loss 8.8250   LearningRate 0.0654   Epoch: 3   Global Step: 19340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:11:50,000-Speed 3306.48 samples/sec   Loss 8.6592   LearningRate 0.0654   Epoch: 3   Global Step: 19350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:11:53,025-Speed 3385.92 samples/sec   Loss 8.8568   LearningRate 0.0654   Epoch: 3   Global Step: 19360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:11:56,030-Speed 3408.45 samples/sec   Loss 8.6718   LearningRate 0.0654   Epoch: 3   Global Step: 19370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:11:59,034-Speed 3409.61 samples/sec   Loss 8.5398   LearningRate 0.0654   Epoch: 3   Global Step: 19380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:12:02,040-Speed 3407.00 samples/sec   Loss 8.6428   LearningRate 0.0653   Epoch: 3   Global Step: 19390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:12:05,048-Speed 3404.90 samples/sec   Loss 8.6480   LearningRate 0.0653   Epoch: 3   Global Step: 19400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:12:08,048-Speed 3414.49 samples/sec   Loss 8.7641   LearningRate 0.0653   Epoch: 3   Global Step: 19410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:12:11,070-Speed 3389.79 samples/sec   Loss 8.7091   LearningRate 0.0653   Epoch: 3   Global Step: 19420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:12:14,078-Speed 3405.34 samples/sec   Loss 8.7386   LearningRate 0.0653   Epoch: 3   Global Step: 19430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:12:17,084-Speed 3407.57 samples/sec   Loss 8.5979   LearningRate 0.0653   Epoch: 3   Global Step: 19440   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 01:12:20,071-Speed 3428.73 samples/sec   Loss 8.5106   LearningRate 0.0652   Epoch: 3   Global Step: 19450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:12:23,113-Speed 3367.31 samples/sec   Loss 8.7614   LearningRate 0.0652   Epoch: 3   Global Step: 19460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:12:26,129-Speed 3395.56 samples/sec   Loss 8.8436   LearningRate 0.0652   Epoch: 3   Global Step: 19470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:12:29,162-Speed 3377.35 samples/sec   Loss 8.7638   LearningRate 0.0652   Epoch: 3   Global Step: 19480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:12:32,173-Speed 3401.26 samples/sec   Loss 8.6956   LearningRate 0.0652   Epoch: 3   Global Step: 19490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:12:35,181-Speed 3405.79 samples/sec   Loss 8.6757   LearningRate 0.0652   Epoch: 3   Global Step: 19500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:12:38,185-Speed 3410.12 samples/sec   Loss 8.7454   LearningRate 0.0651   Epoch: 3   Global Step: 19510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:12:41,191-Speed 3406.96 samples/sec   Loss 8.5709   LearningRate 0.0651   Epoch: 3   Global Step: 19520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:12:44,196-Speed 3408.18 samples/sec   Loss 8.5046   LearningRate 0.0651   Epoch: 3   Global Step: 19530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:12:47,213-Speed 3395.28 samples/sec   Loss 8.6968   LearningRate 0.0651   Epoch: 3   Global Step: 19540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:12:50,212-Speed 3415.52 samples/sec   Loss 8.5625   LearningRate 0.0651   Epoch: 3   Global Step: 19550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:12:53,236-Speed 3387.48 samples/sec   Loss 8.5954   LearningRate 0.0651   Epoch: 3   Global Step: 19560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:12:56,295-Speed 3347.61 samples/sec   Loss 8.5746   LearningRate 0.0651   Epoch: 3   Global Step: 19570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:12:59,350-Speed 3352.46 samples/sec   Loss 8.6435   LearningRate 0.0650   Epoch: 3   Global Step: 19580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:02,374-Speed 3387.84 samples/sec   Loss 8.6076   LearningRate 0.0650   Epoch: 3   Global Step: 19590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:05,393-Speed 3392.99 samples/sec   Loss 8.7069   LearningRate 0.0650   Epoch: 3   Global Step: 19600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:08,397-Speed 3409.53 samples/sec   Loss 8.7575   LearningRate 0.0650   Epoch: 3   Global Step: 19610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:11,394-Speed 3417.70 samples/sec   Loss 8.6430   LearningRate 0.0650   Epoch: 3   Global Step: 19620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:14,397-Speed 3410.61 samples/sec   Loss 8.7186   LearningRate 0.0650   Epoch: 3   Global Step: 19630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:17,406-Speed 3404.48 samples/sec   Loss 8.5570   LearningRate 0.0649   Epoch: 3   Global Step: 19640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:20,420-Speed 3397.62 samples/sec   Loss 8.6423   LearningRate 0.0649   Epoch: 3   Global Step: 19650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:23,435-Speed 3398.61 samples/sec   Loss 8.4995   LearningRate 0.0649   Epoch: 3   Global Step: 19660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:26,447-Speed 3400.20 samples/sec   Loss 8.7057   LearningRate 0.0649   Epoch: 3   Global Step: 19670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:29,451-Speed 3409.89 samples/sec   Loss 8.7664   LearningRate 0.0649   Epoch: 3   Global Step: 19680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:32,460-Speed 3403.66 samples/sec   Loss 8.5523   LearningRate 0.0649   Epoch: 3   Global Step: 19690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:35,468-Speed 3405.03 samples/sec   Loss 8.6150   LearningRate 0.0648   Epoch: 3   Global Step: 19700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:38,474-Speed 3408.01 samples/sec   Loss 8.6944   LearningRate 0.0648   Epoch: 3   Global Step: 19710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:41,474-Speed 3414.36 samples/sec   Loss 8.6300   LearningRate 0.0648   Epoch: 3   Global Step: 19720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:44,476-Speed 3411.92 samples/sec   Loss 8.6859   LearningRate 0.0648   Epoch: 3   Global Step: 19730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:47,480-Speed 3409.01 samples/sec   Loss 8.6237   LearningRate 0.0648   Epoch: 3   Global Step: 19740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:50,503-Speed 3389.09 samples/sec   Loss 8.5800   LearningRate 0.0648   Epoch: 3   Global Step: 19750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:53,513-Speed 3404.17 samples/sec   Loss 8.6411   LearningRate 0.0647   Epoch: 3   Global Step: 19760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:56,510-Speed 3416.59 samples/sec   Loss 8.7080   LearningRate 0.0647   Epoch: 3   Global Step: 19770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:13:59,495-Speed 3431.93 samples/sec   Loss 8.7329   LearningRate 0.0647   Epoch: 3   Global Step: 19780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:02,520-Speed 3386.15 samples/sec   Loss 8.5889   LearningRate 0.0647   Epoch: 3   Global Step: 19790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:05,520-Speed 3414.19 samples/sec   Loss 8.5219   LearningRate 0.0647   Epoch: 3   Global Step: 19800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:08,527-Speed 3407.23 samples/sec   Loss 8.5968   LearningRate 0.0647   Epoch: 3   Global Step: 19810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:11,529-Speed 3411.57 samples/sec   Loss 8.8441   LearningRate 0.0647   Epoch: 3   Global Step: 19820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:14,526-Speed 3417.74 samples/sec   Loss 8.6431   LearningRate 0.0646   Epoch: 3   Global Step: 19830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:17,529-Speed 3411.36 samples/sec   Loss 8.4534   LearningRate 0.0646   Epoch: 3   Global Step: 19840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:20,545-Speed 3395.55 samples/sec   Loss 8.4954   LearningRate 0.0646   Epoch: 3   Global Step: 19850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:23,561-Speed 3396.68 samples/sec   Loss 8.5594   LearningRate 0.0646   Epoch: 3   Global Step: 19860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:26,583-Speed 3389.48 samples/sec   Loss 8.5640   LearningRate 0.0646   Epoch: 3   Global Step: 19870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:29,581-Speed 3416.28 samples/sec   Loss 8.5773   LearningRate 0.0646   Epoch: 3   Global Step: 19880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:32,585-Speed 3409.11 samples/sec   Loss 8.6844   LearningRate 0.0645   Epoch: 3   Global Step: 19890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:35,588-Speed 3410.83 samples/sec   Loss 8.5049   LearningRate 0.0645   Epoch: 3   Global Step: 19900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:38,597-Speed 3404.50 samples/sec   Loss 8.6241   LearningRate 0.0645   Epoch: 3   Global Step: 19910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:41,602-Speed 3408.27 samples/sec   Loss 8.6316   LearningRate 0.0645   Epoch: 3   Global Step: 19920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:44,607-Speed 3408.53 samples/sec   Loss 8.6823   LearningRate 0.0645   Epoch: 3   Global Step: 19930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:14:47,591-Speed 3433.40 samples/sec   Loss 8.5173   LearningRate 0.0645   Epoch: 3   Global Step: 19940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:14:50,607-Speed 3396.15 samples/sec   Loss 8.5082   LearningRate 0.0644   Epoch: 3   Global Step: 19950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:14:53,633-Speed 3384.85 samples/sec   Loss 8.6103   LearningRate 0.0644   Epoch: 3   Global Step: 19960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:14:56,652-Speed 3393.01 samples/sec   Loss 8.4069   LearningRate 0.0644   Epoch: 3   Global Step: 19970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:14:59,681-Speed 3380.96 samples/sec   Loss 8.5895   LearningRate 0.0644   Epoch: 3   Global Step: 19980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:15:02,823-Speed 3260.20 samples/sec   Loss 8.6539   LearningRate 0.0644   Epoch: 3   Global Step: 19990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:15:05,847-Speed 3386.93 samples/sec   Loss 8.6190   LearningRate 0.0644   Epoch: 3   Global Step: 20000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:15:50,084-[lfw][20000]XNorm: 19.793800
Training: 2022-04-11 01:15:50,084-[lfw][20000]Accuracy-Flip: 0.99683+-0.00337
Training: 2022-04-11 01:15:50,085-[lfw][20000]Accuracy-Highest: 0.99717
Training: 2022-04-11 01:16:41,379-[cfp_fp][20000]XNorm: 17.406228
Training: 2022-04-11 01:16:41,380-[cfp_fp][20000]Accuracy-Flip: 0.95314+-0.01157
Training: 2022-04-11 01:16:41,380-[cfp_fp][20000]Accuracy-Highest: 0.95514
Training: 2022-04-11 01:17:25,749-[agedb_30][20000]XNorm: 19.689191
Training: 2022-04-11 01:17:25,750-[agedb_30][20000]Accuracy-Flip: 0.97383+-0.00727
Training: 2022-04-11 01:17:25,750-[agedb_30][20000]Accuracy-Highest: 0.97383
Training: 2022-04-11 01:17:28,754-Speed 71.66 samples/sec   Loss 8.5656   LearningRate 0.0644   Epoch: 3   Global Step: 20010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:17:31,743-Speed 3426.74 samples/sec   Loss 8.4944   LearningRate 0.0643   Epoch: 3   Global Step: 20020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:17:34,748-Speed 3407.90 samples/sec   Loss 8.6717   LearningRate 0.0643   Epoch: 3   Global Step: 20030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:17:37,737-Speed 3427.76 samples/sec   Loss 8.5959   LearningRate 0.0643   Epoch: 3   Global Step: 20040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:17:40,729-Speed 3423.02 samples/sec   Loss 8.4088   LearningRate 0.0643   Epoch: 3   Global Step: 20050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:17:43,727-Speed 3416.92 samples/sec   Loss 8.7080   LearningRate 0.0643   Epoch: 3   Global Step: 20060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:17:46,719-Speed 3422.87 samples/sec   Loss 8.6402   LearningRate 0.0643   Epoch: 3   Global Step: 20070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:17:49,707-Speed 3428.27 samples/sec   Loss 8.5998   LearningRate 0.0642   Epoch: 3   Global Step: 20080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:17:52,714-Speed 3406.50 samples/sec   Loss 8.5893   LearningRate 0.0642   Epoch: 3   Global Step: 20090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:17:55,694-Speed 3437.39 samples/sec   Loss 8.5082   LearningRate 0.0642   Epoch: 3   Global Step: 20100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:17:58,698-Speed 3409.63 samples/sec   Loss 8.5903   LearningRate 0.0642   Epoch: 3   Global Step: 20110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:18:01,673-Speed 3442.29 samples/sec   Loss 8.3820   LearningRate 0.0642   Epoch: 3   Global Step: 20120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:18:04,663-Speed 3426.03 samples/sec   Loss 8.6337   LearningRate 0.0642   Epoch: 3   Global Step: 20130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:18:07,664-Speed 3413.63 samples/sec   Loss 8.4736   LearningRate 0.0641   Epoch: 3   Global Step: 20140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:18:10,664-Speed 3414.05 samples/sec   Loss 8.6034   LearningRate 0.0641   Epoch: 3   Global Step: 20150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:18:13,659-Speed 3419.90 samples/sec   Loss 8.5430   LearningRate 0.0641   Epoch: 3   Global Step: 20160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:18:16,652-Speed 3422.10 samples/sec   Loss 8.6362   LearningRate 0.0641   Epoch: 3   Global Step: 20170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:18:19,649-Speed 3417.19 samples/sec   Loss 8.7580   LearningRate 0.0641   Epoch: 3   Global Step: 20180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:18:22,682-Speed 3377.93 samples/sec   Loss 8.5636   LearningRate 0.0641   Epoch: 3   Global Step: 20190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:18:25,697-Speed 3396.14 samples/sec   Loss 8.5825   LearningRate 0.0641   Epoch: 3   Global Step: 20200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:18:28,704-Speed 3407.31 samples/sec   Loss 8.4958   LearningRate 0.0640   Epoch: 3   Global Step: 20210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:18:31,711-Speed 3406.05 samples/sec   Loss 8.6632   LearningRate 0.0640   Epoch: 3   Global Step: 20220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:18:34,781-Speed 3336.51 samples/sec   Loss 8.4433   LearningRate 0.0640   Epoch: 3   Global Step: 20230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:18:48,132-Speed 767.02 samples/sec   Loss 7.9343   LearningRate 0.0640   Epoch: 4   Global Step: 20240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:18:51,350-Speed 3183.28 samples/sec   Loss 7.7936   LearningRate 0.0640   Epoch: 4   Global Step: 20250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:18:54,385-Speed 3375.51 samples/sec   Loss 7.7007   LearningRate 0.0640   Epoch: 4   Global Step: 20260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:18:57,519-Speed 3268.48 samples/sec   Loss 7.9811   LearningRate 0.0639   Epoch: 4   Global Step: 20270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:00,546-Speed 3384.09 samples/sec   Loss 7.8085   LearningRate 0.0639   Epoch: 4   Global Step: 20280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:03,613-Speed 3338.87 samples/sec   Loss 7.8509   LearningRate 0.0639   Epoch: 4   Global Step: 20290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:06,637-Speed 3387.15 samples/sec   Loss 7.7147   LearningRate 0.0639   Epoch: 4   Global Step: 20300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:09,637-Speed 3414.48 samples/sec   Loss 7.7122   LearningRate 0.0639   Epoch: 4   Global Step: 20310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:12,604-Speed 3453.25 samples/sec   Loss 7.8469   LearningRate 0.0639   Epoch: 4   Global Step: 20320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:15,610-Speed 3406.77 samples/sec   Loss 7.9488   LearningRate 0.0638   Epoch: 4   Global Step: 20330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:18,630-Speed 3392.38 samples/sec   Loss 7.8415   LearningRate 0.0638   Epoch: 4   Global Step: 20340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:21,642-Speed 3400.76 samples/sec   Loss 7.9803   LearningRate 0.0638   Epoch: 4   Global Step: 20350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:24,640-Speed 3416.02 samples/sec   Loss 8.0284   LearningRate 0.0638   Epoch: 4   Global Step: 20360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:27,658-Speed 3394.04 samples/sec   Loss 7.9587   LearningRate 0.0638   Epoch: 4   Global Step: 20370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:30,655-Speed 3417.40 samples/sec   Loss 7.9091   LearningRate 0.0638   Epoch: 4   Global Step: 20380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:33,652-Speed 3418.13 samples/sec   Loss 7.9865   LearningRate 0.0638   Epoch: 4   Global Step: 20390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:36,652-Speed 3414.90 samples/sec   Loss 7.8677   LearningRate 0.0637   Epoch: 4   Global Step: 20400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:39,662-Speed 3401.83 samples/sec   Loss 7.8742   LearningRate 0.0637   Epoch: 4   Global Step: 20410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:42,663-Speed 3413.22 samples/sec   Loss 8.0067   LearningRate 0.0637   Epoch: 4   Global Step: 20420   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 01:19:45,645-Speed 3435.33 samples/sec   Loss 8.0516   LearningRate 0.0637   Epoch: 4   Global Step: 20430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:48,644-Speed 3415.70 samples/sec   Loss 8.0048   LearningRate 0.0637   Epoch: 4   Global Step: 20440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:51,635-Speed 3423.94 samples/sec   Loss 8.0004   LearningRate 0.0637   Epoch: 4   Global Step: 20450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:19:54,640-Speed 3408.86 samples/sec   Loss 7.8946   LearningRate 0.0636   Epoch: 4   Global Step: 20460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:19:57,643-Speed 3411.22 samples/sec   Loss 7.9057   LearningRate 0.0636   Epoch: 4   Global Step: 20470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:20:00,642-Speed 3415.50 samples/sec   Loss 8.0310   LearningRate 0.0636   Epoch: 4   Global Step: 20480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:20:03,651-Speed 3403.36 samples/sec   Loss 8.0589   LearningRate 0.0636   Epoch: 4   Global Step: 20490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:20:06,651-Speed 3414.53 samples/sec   Loss 7.8693   LearningRate 0.0636   Epoch: 4   Global Step: 20500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:20:09,650-Speed 3415.01 samples/sec   Loss 8.0731   LearningRate 0.0636   Epoch: 4   Global Step: 20510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:20:12,644-Speed 3421.82 samples/sec   Loss 8.0470   LearningRate 0.0635   Epoch: 4   Global Step: 20520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:20:15,643-Speed 3415.23 samples/sec   Loss 8.1586   LearningRate 0.0635   Epoch: 4   Global Step: 20530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:20:18,641-Speed 3416.76 samples/sec   Loss 8.0575   LearningRate 0.0635   Epoch: 4   Global Step: 20540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:20:21,641-Speed 3413.64 samples/sec   Loss 8.0228   LearningRate 0.0635   Epoch: 4   Global Step: 20550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:20:24,676-Speed 3375.26 samples/sec   Loss 8.0619   LearningRate 0.0635   Epoch: 4   Global Step: 20560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:20:27,680-Speed 3409.12 samples/sec   Loss 8.0964   LearningRate 0.0635   Epoch: 4   Global Step: 20570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:20:30,678-Speed 3416.98 samples/sec   Loss 8.3191   LearningRate 0.0635   Epoch: 4   Global Step: 20580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:20:33,686-Speed 3405.35 samples/sec   Loss 8.1784   LearningRate 0.0634   Epoch: 4   Global Step: 20590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:20:36,698-Speed 3399.87 samples/sec   Loss 8.1069   LearningRate 0.0634   Epoch: 4   Global Step: 20600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:20:39,709-Speed 3401.44 samples/sec   Loss 8.1440   LearningRate 0.0634   Epoch: 4   Global Step: 20610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:20:42,713-Speed 3410.10 samples/sec   Loss 8.2036   LearningRate 0.0634   Epoch: 4   Global Step: 20620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:20:45,713-Speed 3414.35 samples/sec   Loss 8.0395   LearningRate 0.0634   Epoch: 4   Global Step: 20630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:20:48,714-Speed 3413.36 samples/sec   Loss 8.0591   LearningRate 0.0634   Epoch: 4   Global Step: 20640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:20:51,717-Speed 3410.60 samples/sec   Loss 8.1078   LearningRate 0.0633   Epoch: 4   Global Step: 20650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:20:54,695-Speed 3438.67 samples/sec   Loss 7.8944   LearningRate 0.0633   Epoch: 4   Global Step: 20660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:20:57,702-Speed 3406.63 samples/sec   Loss 8.2395   LearningRate 0.0633   Epoch: 4   Global Step: 20670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:00,715-Speed 3399.81 samples/sec   Loss 8.2620   LearningRate 0.0633   Epoch: 4   Global Step: 20680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:03,720-Speed 3408.74 samples/sec   Loss 8.2104   LearningRate 0.0633   Epoch: 4   Global Step: 20690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:06,722-Speed 3411.35 samples/sec   Loss 8.2013   LearningRate 0.0633   Epoch: 4   Global Step: 20700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:09,720-Speed 3416.20 samples/sec   Loss 8.1628   LearningRate 0.0632   Epoch: 4   Global Step: 20710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:12,719-Speed 3415.52 samples/sec   Loss 8.1601   LearningRate 0.0632   Epoch: 4   Global Step: 20720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:15,729-Speed 3403.05 samples/sec   Loss 7.9581   LearningRate 0.0632   Epoch: 4   Global Step: 20730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:18,736-Speed 3406.04 samples/sec   Loss 8.2598   LearningRate 0.0632   Epoch: 4   Global Step: 20740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:21,734-Speed 3416.39 samples/sec   Loss 8.1643   LearningRate 0.0632   Epoch: 4   Global Step: 20750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:24,728-Speed 3421.41 samples/sec   Loss 8.1644   LearningRate 0.0632   Epoch: 4   Global Step: 20760   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 01:21:27,834-Speed 3297.70 samples/sec   Loss 8.1745   LearningRate 0.0632   Epoch: 4   Global Step: 20770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:31,120-Speed 3117.06 samples/sec   Loss 8.2299   LearningRate 0.0631   Epoch: 4   Global Step: 20780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:34,123-Speed 3410.86 samples/sec   Loss 8.1230   LearningRate 0.0631   Epoch: 4   Global Step: 20790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:37,128-Speed 3407.81 samples/sec   Loss 8.1912   LearningRate 0.0631   Epoch: 4   Global Step: 20800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:40,145-Speed 3395.38 samples/sec   Loss 8.2189   LearningRate 0.0631   Epoch: 4   Global Step: 20810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:43,163-Speed 3394.21 samples/sec   Loss 8.0781   LearningRate 0.0631   Epoch: 4   Global Step: 20820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:46,161-Speed 3416.45 samples/sec   Loss 8.1212   LearningRate 0.0631   Epoch: 4   Global Step: 20830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:49,161-Speed 3414.42 samples/sec   Loss 8.1477   LearningRate 0.0630   Epoch: 4   Global Step: 20840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:52,164-Speed 3410.18 samples/sec   Loss 8.1470   LearningRate 0.0630   Epoch: 4   Global Step: 20850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:55,171-Speed 3406.60 samples/sec   Loss 8.1489   LearningRate 0.0630   Epoch: 4   Global Step: 20860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:21:58,150-Speed 3438.46 samples/sec   Loss 8.3649   LearningRate 0.0630   Epoch: 4   Global Step: 20870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:22:01,174-Speed 3386.97 samples/sec   Loss 8.2322   LearningRate 0.0630   Epoch: 4   Global Step: 20880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:22:04,174-Speed 3415.34 samples/sec   Loss 8.1957   LearningRate 0.0630   Epoch: 4   Global Step: 20890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:22:07,175-Speed 3413.18 samples/sec   Loss 8.2540   LearningRate 0.0629   Epoch: 4   Global Step: 20900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:22:10,176-Speed 3412.91 samples/sec   Loss 8.1195   LearningRate 0.0629   Epoch: 4   Global Step: 20910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:22:13,179-Speed 3410.12 samples/sec   Loss 8.0945   LearningRate 0.0629   Epoch: 4   Global Step: 20920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:22:16,184-Speed 3408.41 samples/sec   Loss 8.2974   LearningRate 0.0629   Epoch: 4   Global Step: 20930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:22:19,168-Speed 3433.17 samples/sec   Loss 8.2632   LearningRate 0.0629   Epoch: 4   Global Step: 20940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:22:22,170-Speed 3412.06 samples/sec   Loss 8.0493   LearningRate 0.0629   Epoch: 4   Global Step: 20950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:22:25,174-Speed 3409.48 samples/sec   Loss 8.2155   LearningRate 0.0629   Epoch: 4   Global Step: 20960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:22:28,195-Speed 3389.76 samples/sec   Loss 8.4436   LearningRate 0.0628   Epoch: 4   Global Step: 20970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:22:31,201-Speed 3407.86 samples/sec   Loss 8.1387   LearningRate 0.0628   Epoch: 4   Global Step: 20980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:22:34,203-Speed 3412.93 samples/sec   Loss 8.2299   LearningRate 0.0628   Epoch: 4   Global Step: 20990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:22:37,222-Speed 3392.41 samples/sec   Loss 8.1877   LearningRate 0.0628   Epoch: 4   Global Step: 21000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:22:40,224-Speed 3411.13 samples/sec   Loss 8.2844   LearningRate 0.0628   Epoch: 4   Global Step: 21010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:22:43,224-Speed 3414.69 samples/sec   Loss 8.2174   LearningRate 0.0628   Epoch: 4   Global Step: 21020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:22:46,227-Speed 3409.89 samples/sec   Loss 8.2709   LearningRate 0.0627   Epoch: 4   Global Step: 21030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:22:49,242-Speed 3397.70 samples/sec   Loss 8.1808   LearningRate 0.0627   Epoch: 4   Global Step: 21040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:22:52,267-Speed 3385.80 samples/sec   Loss 8.3889   LearningRate 0.0627   Epoch: 4   Global Step: 21050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:22:55,276-Speed 3404.39 samples/sec   Loss 8.2528   LearningRate 0.0627   Epoch: 4   Global Step: 21060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:22:58,292-Speed 3396.39 samples/sec   Loss 8.1934   LearningRate 0.0627   Epoch: 4   Global Step: 21070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:23:01,306-Speed 3398.62 samples/sec   Loss 8.3933   LearningRate 0.0627   Epoch: 4   Global Step: 21080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:23:04,360-Speed 3353.41 samples/sec   Loss 8.1675   LearningRate 0.0627   Epoch: 4   Global Step: 21090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:23:07,372-Speed 3400.18 samples/sec   Loss 8.4810   LearningRate 0.0626   Epoch: 4   Global Step: 21100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:23:10,386-Speed 3398.73 samples/sec   Loss 8.2008   LearningRate 0.0626   Epoch: 4   Global Step: 21110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:23:13,390-Speed 3409.36 samples/sec   Loss 8.3888   LearningRate 0.0626   Epoch: 4   Global Step: 21120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:23:16,400-Speed 3402.79 samples/sec   Loss 8.2012   LearningRate 0.0626   Epoch: 4   Global Step: 21130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:23:19,406-Speed 3406.98 samples/sec   Loss 8.2221   LearningRate 0.0626   Epoch: 4   Global Step: 21140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:23:22,423-Speed 3395.51 samples/sec   Loss 8.3858   LearningRate 0.0626   Epoch: 4   Global Step: 21150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:23:25,436-Speed 3399.78 samples/sec   Loss 8.3609   LearningRate 0.0625   Epoch: 4   Global Step: 21160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:23:28,436-Speed 3414.26 samples/sec   Loss 8.3393   LearningRate 0.0625   Epoch: 4   Global Step: 21170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:23:31,439-Speed 3410.39 samples/sec   Loss 8.2111   LearningRate 0.0625   Epoch: 4   Global Step: 21180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:23:34,444-Speed 3408.39 samples/sec   Loss 8.4739   LearningRate 0.0625   Epoch: 4   Global Step: 21190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:23:37,446-Speed 3411.63 samples/sec   Loss 8.2169   LearningRate 0.0625   Epoch: 4   Global Step: 21200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:23:40,450-Speed 3409.87 samples/sec   Loss 8.2932   LearningRate 0.0625   Epoch: 4   Global Step: 21210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:23:43,450-Speed 3414.23 samples/sec   Loss 8.2490   LearningRate 0.0624   Epoch: 4   Global Step: 21220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:23:46,436-Speed 3429.48 samples/sec   Loss 8.2745   LearningRate 0.0624   Epoch: 4   Global Step: 21230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:23:49,459-Speed 3388.52 samples/sec   Loss 8.4396   LearningRate 0.0624   Epoch: 4   Global Step: 21240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:23:52,483-Speed 3387.49 samples/sec   Loss 8.3301   LearningRate 0.0624   Epoch: 4   Global Step: 21250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:23:55,498-Speed 3396.95 samples/sec   Loss 8.3204   LearningRate 0.0624   Epoch: 4   Global Step: 21260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:23:58,512-Speed 3398.57 samples/sec   Loss 8.2095   LearningRate 0.0624   Epoch: 4   Global Step: 21270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:24:01,530-Speed 3393.47 samples/sec   Loss 8.2336   LearningRate 0.0624   Epoch: 4   Global Step: 21280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:24:04,548-Speed 3393.93 samples/sec   Loss 8.0777   LearningRate 0.0623   Epoch: 4   Global Step: 21290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:24:07,566-Speed 3394.03 samples/sec   Loss 8.2049   LearningRate 0.0623   Epoch: 4   Global Step: 21300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:24:10,572-Speed 3407.05 samples/sec   Loss 8.2864   LearningRate 0.0623   Epoch: 4   Global Step: 21310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:24:13,590-Speed 3394.41 samples/sec   Loss 8.4832   LearningRate 0.0623   Epoch: 4   Global Step: 21320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:24:16,596-Speed 3406.56 samples/sec   Loss 8.3607   LearningRate 0.0623   Epoch: 4   Global Step: 21330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:24:19,603-Speed 3406.52 samples/sec   Loss 8.3362   LearningRate 0.0623   Epoch: 4   Global Step: 21340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:24:22,604-Speed 3413.46 samples/sec   Loss 8.3772   LearningRate 0.0622   Epoch: 4   Global Step: 21350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:24:25,621-Speed 3394.59 samples/sec   Loss 8.2905   LearningRate 0.0622   Epoch: 4   Global Step: 21360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:24:28,626-Speed 3409.14 samples/sec   Loss 8.0992   LearningRate 0.0622   Epoch: 4   Global Step: 21370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:24:31,638-Speed 3399.84 samples/sec   Loss 8.3528   LearningRate 0.0622   Epoch: 4   Global Step: 21380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:24:34,644-Speed 3407.78 samples/sec   Loss 8.3298   LearningRate 0.0622   Epoch: 4   Global Step: 21390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:24:37,648-Speed 3409.93 samples/sec   Loss 8.2069   LearningRate 0.0622   Epoch: 4   Global Step: 21400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:24:40,641-Speed 3421.73 samples/sec   Loss 8.1928   LearningRate 0.0622   Epoch: 4   Global Step: 21410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:24:43,650-Speed 3404.99 samples/sec   Loss 8.3812   LearningRate 0.0621   Epoch: 4   Global Step: 21420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:24:46,654-Speed 3409.93 samples/sec   Loss 8.1377   LearningRate 0.0621   Epoch: 4   Global Step: 21430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:24:49,656-Speed 3411.85 samples/sec   Loss 8.1878   LearningRate 0.0621   Epoch: 4   Global Step: 21440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:24:52,676-Speed 3391.98 samples/sec   Loss 8.3613   LearningRate 0.0621   Epoch: 4   Global Step: 21450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:24:55,683-Speed 3405.79 samples/sec   Loss 8.3640   LearningRate 0.0621   Epoch: 4   Global Step: 21460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:24:58,698-Speed 3397.33 samples/sec   Loss 8.4452   LearningRate 0.0621   Epoch: 4   Global Step: 21470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:25:01,702-Speed 3409.38 samples/sec   Loss 8.1906   LearningRate 0.0620   Epoch: 4   Global Step: 21480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:25:04,718-Speed 3395.45 samples/sec   Loss 8.1506   LearningRate 0.0620   Epoch: 4   Global Step: 21490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:25:07,725-Speed 3406.65 samples/sec   Loss 8.3639   LearningRate 0.0620   Epoch: 4   Global Step: 21500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:25:10,844-Speed 3284.39 samples/sec   Loss 8.3483   LearningRate 0.0620   Epoch: 4   Global Step: 21510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:25:13,861-Speed 3395.17 samples/sec   Loss 8.2253   LearningRate 0.0620   Epoch: 4   Global Step: 21520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:25:16,866-Speed 3408.55 samples/sec   Loss 8.2404   LearningRate 0.0620   Epoch: 4   Global Step: 21530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:25:19,876-Speed 3402.80 samples/sec   Loss 8.0321   LearningRate 0.0619   Epoch: 4   Global Step: 21540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:25:22,879-Speed 3411.21 samples/sec   Loss 8.3055   LearningRate 0.0619   Epoch: 4   Global Step: 21550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:25:25,891-Speed 3400.59 samples/sec   Loss 8.3655   LearningRate 0.0619   Epoch: 4   Global Step: 21560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:25:28,896-Speed 3408.04 samples/sec   Loss 8.4950   LearningRate 0.0619   Epoch: 4   Global Step: 21570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:25:31,906-Speed 3402.73 samples/sec   Loss 8.2882   LearningRate 0.0619   Epoch: 4   Global Step: 21580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:25:34,913-Speed 3406.56 samples/sec   Loss 8.1481   LearningRate 0.0619   Epoch: 4   Global Step: 21590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:25:37,938-Speed 3386.26 samples/sec   Loss 8.3222   LearningRate 0.0619   Epoch: 4   Global Step: 21600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:25:40,949-Speed 3400.43 samples/sec   Loss 8.1510   LearningRate 0.0618   Epoch: 4   Global Step: 21610   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 01:25:43,959-Speed 3404.06 samples/sec   Loss 8.1721   LearningRate 0.0618   Epoch: 4   Global Step: 21620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:25:46,946-Speed 3428.84 samples/sec   Loss 8.2161   LearningRate 0.0618   Epoch: 4   Global Step: 21630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:25:49,957-Speed 3401.15 samples/sec   Loss 8.1645   LearningRate 0.0618   Epoch: 4   Global Step: 21640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:25:52,961-Speed 3410.73 samples/sec   Loss 8.2286   LearningRate 0.0618   Epoch: 4   Global Step: 21650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:25:55,968-Speed 3406.25 samples/sec   Loss 8.2025   LearningRate 0.0618   Epoch: 4   Global Step: 21660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:25:58,985-Speed 3394.79 samples/sec   Loss 8.3900   LearningRate 0.0617   Epoch: 4   Global Step: 21670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:26:01,999-Speed 3398.16 samples/sec   Loss 8.3408   LearningRate 0.0617   Epoch: 4   Global Step: 21680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:26:05,009-Speed 3403.20 samples/sec   Loss 8.3014   LearningRate 0.0617   Epoch: 4   Global Step: 21690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:26:08,018-Speed 3403.70 samples/sec   Loss 8.2647   LearningRate 0.0617   Epoch: 4   Global Step: 21700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:26:11,029-Speed 3402.12 samples/sec   Loss 8.2530   LearningRate 0.0617   Epoch: 4   Global Step: 21710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:26:14,037-Speed 3405.26 samples/sec   Loss 8.2628   LearningRate 0.0617   Epoch: 4   Global Step: 21720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:26:17,047-Speed 3402.57 samples/sec   Loss 8.3624   LearningRate 0.0617   Epoch: 4   Global Step: 21730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:26:20,058-Speed 3401.12 samples/sec   Loss 8.2826   LearningRate 0.0616   Epoch: 4   Global Step: 21740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:26:23,063-Speed 3409.30 samples/sec   Loss 8.1801   LearningRate 0.0616   Epoch: 4   Global Step: 21750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:26:26,069-Speed 3407.70 samples/sec   Loss 8.3768   LearningRate 0.0616   Epoch: 4   Global Step: 21760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:26:29,082-Speed 3399.48 samples/sec   Loss 8.4329   LearningRate 0.0616   Epoch: 4   Global Step: 21770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:26:32,092-Speed 3402.68 samples/sec   Loss 8.2338   LearningRate 0.0616   Epoch: 4   Global Step: 21780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:26:35,102-Speed 3402.35 samples/sec   Loss 8.2003   LearningRate 0.0616   Epoch: 4   Global Step: 21790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:26:38,115-Speed 3399.46 samples/sec   Loss 8.1915   LearningRate 0.0615   Epoch: 4   Global Step: 21800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:26:41,127-Speed 3401.41 samples/sec   Loss 8.3168   LearningRate 0.0615   Epoch: 4   Global Step: 21810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:26:44,140-Speed 3399.55 samples/sec   Loss 8.1880   LearningRate 0.0615   Epoch: 4   Global Step: 21820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:26:47,136-Speed 3418.86 samples/sec   Loss 8.3953   LearningRate 0.0615   Epoch: 4   Global Step: 21830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:26:50,146-Speed 3402.35 samples/sec   Loss 8.2251   LearningRate 0.0615   Epoch: 4   Global Step: 21840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:26:53,157-Speed 3402.17 samples/sec   Loss 8.1693   LearningRate 0.0615   Epoch: 4   Global Step: 21850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:26:56,182-Speed 3385.81 samples/sec   Loss 8.2804   LearningRate 0.0615   Epoch: 4   Global Step: 21860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:26:59,195-Speed 3399.76 samples/sec   Loss 8.3251   LearningRate 0.0614   Epoch: 4   Global Step: 21870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:27:02,218-Speed 3387.15 samples/sec   Loss 8.3297   LearningRate 0.0614   Epoch: 4   Global Step: 21880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:27:05,231-Speed 3399.86 samples/sec   Loss 8.1510   LearningRate 0.0614   Epoch: 4   Global Step: 21890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:27:08,251-Speed 3391.77 samples/sec   Loss 8.1682   LearningRate 0.0614   Epoch: 4   Global Step: 21900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:27:11,264-Speed 3399.44 samples/sec   Loss 8.2686   LearningRate 0.0614   Epoch: 4   Global Step: 21910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:27:14,270-Speed 3407.48 samples/sec   Loss 8.2927   LearningRate 0.0614   Epoch: 4   Global Step: 21920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:27:17,290-Speed 3391.60 samples/sec   Loss 8.3009   LearningRate 0.0613   Epoch: 4   Global Step: 21930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:27:20,300-Speed 3403.07 samples/sec   Loss 8.3254   LearningRate 0.0613   Epoch: 4   Global Step: 21940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:27:23,305-Speed 3408.07 samples/sec   Loss 8.1348   LearningRate 0.0613   Epoch: 4   Global Step: 21950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:27:26,315-Speed 3403.15 samples/sec   Loss 8.3036   LearningRate 0.0613   Epoch: 4   Global Step: 21960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:27:29,325-Speed 3403.64 samples/sec   Loss 8.1635   LearningRate 0.0613   Epoch: 4   Global Step: 21970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:27:32,334-Speed 3404.36 samples/sec   Loss 8.1914   LearningRate 0.0613   Epoch: 4   Global Step: 21980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:27:35,339-Speed 3407.77 samples/sec   Loss 8.2857   LearningRate 0.0612   Epoch: 4   Global Step: 21990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:27:38,348-Speed 3404.68 samples/sec   Loss 8.1695   LearningRate 0.0612   Epoch: 4   Global Step: 22000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:28:22,637-[lfw][22000]XNorm: 23.486292
Training: 2022-04-11 01:28:22,637-[lfw][22000]Accuracy-Flip: 0.99667+-0.00289
Training: 2022-04-11 01:28:22,638-[lfw][22000]Accuracy-Highest: 0.99717
Training: 2022-04-11 01:29:14,206-[cfp_fp][22000]XNorm: 20.841552
Training: 2022-04-11 01:29:14,206-[cfp_fp][22000]Accuracy-Flip: 0.95629+-0.01297
Training: 2022-04-11 01:29:14,207-[cfp_fp][22000]Accuracy-Highest: 0.95629
Training: 2022-04-11 01:29:58,255-[agedb_30][22000]XNorm: 23.208466
Training: 2022-04-11 01:29:58,255-[agedb_30][22000]Accuracy-Flip: 0.97200+-0.00963
Training: 2022-04-11 01:29:58,256-[agedb_30][22000]Accuracy-Highest: 0.97383
Training: 2022-04-11 01:30:01,257-Speed 71.65 samples/sec   Loss 8.3200   LearningRate 0.0612   Epoch: 4   Global Step: 22010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:30:04,259-Speed 3411.89 samples/sec   Loss 8.2228   LearningRate 0.0612   Epoch: 4   Global Step: 22020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:30:07,254-Speed 3420.19 samples/sec   Loss 8.2490   LearningRate 0.0612   Epoch: 4   Global Step: 22030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:30:10,872-Speed 2831.29 samples/sec   Loss 8.3996   LearningRate 0.0612   Epoch: 4   Global Step: 22040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:30:14,661-Speed 2702.80 samples/sec   Loss 8.2947   LearningRate 0.0612   Epoch: 4   Global Step: 22050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:30:17,646-Speed 3431.26 samples/sec   Loss 8.2320   LearningRate 0.0611   Epoch: 4   Global Step: 22060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:30:20,642-Speed 3418.80 samples/sec   Loss 8.2560   LearningRate 0.0611   Epoch: 4   Global Step: 22070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:30:23,656-Speed 3398.91 samples/sec   Loss 8.3180   LearningRate 0.0611   Epoch: 4   Global Step: 22080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:30:26,649-Speed 3421.22 samples/sec   Loss 8.4702   LearningRate 0.0611   Epoch: 4   Global Step: 22090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:30:29,635-Speed 3430.68 samples/sec   Loss 8.1963   LearningRate 0.0611   Epoch: 4   Global Step: 22100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:30:32,636-Speed 3413.14 samples/sec   Loss 8.2689   LearningRate 0.0611   Epoch: 4   Global Step: 22110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:30:35,645-Speed 3404.66 samples/sec   Loss 8.3522   LearningRate 0.0610   Epoch: 4   Global Step: 22120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:30:38,651-Speed 3407.05 samples/sec   Loss 8.2672   LearningRate 0.0610   Epoch: 4   Global Step: 22130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:30:41,648-Speed 3417.04 samples/sec   Loss 8.2812   LearningRate 0.0610   Epoch: 4   Global Step: 22140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:30:44,649-Speed 3414.08 samples/sec   Loss 8.1231   LearningRate 0.0610   Epoch: 4   Global Step: 22150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:30:47,644-Speed 3419.96 samples/sec   Loss 8.3632   LearningRate 0.0610   Epoch: 4   Global Step: 22160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:30:50,639-Speed 3420.18 samples/sec   Loss 8.2104   LearningRate 0.0610   Epoch: 4   Global Step: 22170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:30:53,694-Speed 3352.82 samples/sec   Loss 8.3058   LearningRate 0.0610   Epoch: 4   Global Step: 22180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:30:56,693-Speed 3414.42 samples/sec   Loss 8.1909   LearningRate 0.0609   Epoch: 4   Global Step: 22190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:30:59,696-Speed 3411.40 samples/sec   Loss 8.3534   LearningRate 0.0609   Epoch: 4   Global Step: 22200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:31:02,708-Speed 3400.02 samples/sec   Loss 8.3550   LearningRate 0.0609   Epoch: 4   Global Step: 22210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:31:05,702-Speed 3421.74 samples/sec   Loss 8.1106   LearningRate 0.0609   Epoch: 4   Global Step: 22220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:08,704-Speed 3411.39 samples/sec   Loss 8.3815   LearningRate 0.0609   Epoch: 4   Global Step: 22230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:11,719-Speed 3397.20 samples/sec   Loss 8.1367   LearningRate 0.0609   Epoch: 4   Global Step: 22240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:14,744-Speed 3386.49 samples/sec   Loss 8.0509   LearningRate 0.0608   Epoch: 4   Global Step: 22250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:17,751-Speed 3406.23 samples/sec   Loss 8.0390   LearningRate 0.0608   Epoch: 4   Global Step: 22260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:20,751-Speed 3413.89 samples/sec   Loss 8.2850   LearningRate 0.0608   Epoch: 4   Global Step: 22270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:23,768-Speed 3395.21 samples/sec   Loss 8.1429   LearningRate 0.0608   Epoch: 4   Global Step: 22280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:26,768-Speed 3414.54 samples/sec   Loss 8.1983   LearningRate 0.0608   Epoch: 4   Global Step: 22290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:29,786-Speed 3393.38 samples/sec   Loss 8.3444   LearningRate 0.0608   Epoch: 4   Global Step: 22300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:32,796-Speed 3402.81 samples/sec   Loss 8.1696   LearningRate 0.0608   Epoch: 4   Global Step: 22310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:35,806-Speed 3403.55 samples/sec   Loss 8.1886   LearningRate 0.0607   Epoch: 4   Global Step: 22320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:31:38,807-Speed 3412.25 samples/sec   Loss 8.2778   LearningRate 0.0607   Epoch: 4   Global Step: 22330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:31:41,805-Speed 3416.33 samples/sec   Loss 8.2510   LearningRate 0.0607   Epoch: 4   Global Step: 22340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:31:44,783-Speed 3439.65 samples/sec   Loss 8.2025   LearningRate 0.0607   Epoch: 4   Global Step: 22350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:47,784-Speed 3412.86 samples/sec   Loss 8.2331   LearningRate 0.0607   Epoch: 4   Global Step: 22360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:50,783-Speed 3416.36 samples/sec   Loss 8.1805   LearningRate 0.0607   Epoch: 4   Global Step: 22370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:53,781-Speed 3415.80 samples/sec   Loss 8.1194   LearningRate 0.0606   Epoch: 4   Global Step: 22380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:56,787-Speed 3407.77 samples/sec   Loss 8.4033   LearningRate 0.0606   Epoch: 4   Global Step: 22390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:31:59,784-Speed 3416.94 samples/sec   Loss 8.1784   LearningRate 0.0606   Epoch: 4   Global Step: 22400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:02,789-Speed 3408.82 samples/sec   Loss 8.2102   LearningRate 0.0606   Epoch: 4   Global Step: 22410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:05,785-Speed 3419.90 samples/sec   Loss 8.1689   LearningRate 0.0606   Epoch: 4   Global Step: 22420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:08,782-Speed 3417.38 samples/sec   Loss 8.1987   LearningRate 0.0606   Epoch: 4   Global Step: 22430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:11,781-Speed 3414.95 samples/sec   Loss 8.3401   LearningRate 0.0606   Epoch: 4   Global Step: 22440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:14,782-Speed 3414.52 samples/sec   Loss 7.9934   LearningRate 0.0605   Epoch: 4   Global Step: 22450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:32:17,781-Speed 3415.32 samples/sec   Loss 8.3744   LearningRate 0.0605   Epoch: 4   Global Step: 22460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:32:20,778-Speed 3417.58 samples/sec   Loss 8.3695   LearningRate 0.0605   Epoch: 4   Global Step: 22470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:32:23,777-Speed 3415.36 samples/sec   Loss 8.3371   LearningRate 0.0605   Epoch: 4   Global Step: 22480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:32:26,786-Speed 3403.80 samples/sec   Loss 8.3959   LearningRate 0.0605   Epoch: 4   Global Step: 22490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:32:29,778-Speed 3422.91 samples/sec   Loss 8.2073   LearningRate 0.0605   Epoch: 4   Global Step: 22500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:32,790-Speed 3400.05 samples/sec   Loss 8.1874   LearningRate 0.0604   Epoch: 4   Global Step: 22510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:35,789-Speed 3416.49 samples/sec   Loss 8.2731   LearningRate 0.0604   Epoch: 4   Global Step: 22520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:38,785-Speed 3418.29 samples/sec   Loss 8.2355   LearningRate 0.0604   Epoch: 4   Global Step: 22530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:41,789-Speed 3410.28 samples/sec   Loss 8.1561   LearningRate 0.0604   Epoch: 4   Global Step: 22540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:44,784-Speed 3420.23 samples/sec   Loss 8.1498   LearningRate 0.0604   Epoch: 4   Global Step: 22550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:47,782-Speed 3416.20 samples/sec   Loss 8.2323   LearningRate 0.0604   Epoch: 4   Global Step: 22560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:50,778-Speed 3418.31 samples/sec   Loss 8.2034   LearningRate 0.0604   Epoch: 4   Global Step: 22570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:53,783-Speed 3408.15 samples/sec   Loss 8.1938   LearningRate 0.0603   Epoch: 4   Global Step: 22580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:56,782-Speed 3415.80 samples/sec   Loss 8.1246   LearningRate 0.0603   Epoch: 4   Global Step: 22590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:32:59,780-Speed 3416.75 samples/sec   Loss 8.1567   LearningRate 0.0603   Epoch: 4   Global Step: 22600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:02,778-Speed 3416.71 samples/sec   Loss 8.2261   LearningRate 0.0603   Epoch: 4   Global Step: 22610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:05,815-Speed 3372.95 samples/sec   Loss 8.0918   LearningRate 0.0603   Epoch: 4   Global Step: 22620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:08,855-Speed 3368.47 samples/sec   Loss 8.2448   LearningRate 0.0603   Epoch: 4   Global Step: 22630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:11,851-Speed 3418.86 samples/sec   Loss 8.2412   LearningRate 0.0602   Epoch: 4   Global Step: 22640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:14,859-Speed 3405.61 samples/sec   Loss 8.2117   LearningRate 0.0602   Epoch: 4   Global Step: 22650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:17,864-Speed 3408.43 samples/sec   Loss 8.1741   LearningRate 0.0602   Epoch: 4   Global Step: 22660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:20,868-Speed 3410.48 samples/sec   Loss 8.2635   LearningRate 0.0602   Epoch: 4   Global Step: 22670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:23,869-Speed 3412.23 samples/sec   Loss 8.2326   LearningRate 0.0602   Epoch: 4   Global Step: 22680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:26,888-Speed 3393.31 samples/sec   Loss 8.1451   LearningRate 0.0602   Epoch: 4   Global Step: 22690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:29,878-Speed 3425.92 samples/sec   Loss 8.0802   LearningRate 0.0602   Epoch: 4   Global Step: 22700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:32,890-Speed 3400.65 samples/sec   Loss 8.2442   LearningRate 0.0601   Epoch: 4   Global Step: 22710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:35,888-Speed 3416.40 samples/sec   Loss 8.2694   LearningRate 0.0601   Epoch: 4   Global Step: 22720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:38,917-Speed 3381.08 samples/sec   Loss 8.3074   LearningRate 0.0601   Epoch: 4   Global Step: 22730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:41,915-Speed 3416.86 samples/sec   Loss 8.2417   LearningRate 0.0601   Epoch: 4   Global Step: 22740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:44,918-Speed 3411.07 samples/sec   Loss 8.2055   LearningRate 0.0601   Epoch: 4   Global Step: 22750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:47,915-Speed 3417.70 samples/sec   Loss 8.2032   LearningRate 0.0601   Epoch: 4   Global Step: 22760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:50,914-Speed 3415.10 samples/sec   Loss 8.1770   LearningRate 0.0600   Epoch: 4   Global Step: 22770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:53,912-Speed 3416.76 samples/sec   Loss 8.2619   LearningRate 0.0600   Epoch: 4   Global Step: 22780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:56,916-Speed 3409.56 samples/sec   Loss 8.2343   LearningRate 0.0600   Epoch: 4   Global Step: 22790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:33:59,897-Speed 3435.66 samples/sec   Loss 8.2389   LearningRate 0.0600   Epoch: 4   Global Step: 22800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:34:02,899-Speed 3412.70 samples/sec   Loss 8.1869   LearningRate 0.0600   Epoch: 4   Global Step: 22810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:34:05,915-Speed 3395.89 samples/sec   Loss 8.1551   LearningRate 0.0600   Epoch: 4   Global Step: 22820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:34:08,917-Speed 3411.54 samples/sec   Loss 8.1237   LearningRate 0.0600   Epoch: 4   Global Step: 22830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:34:11,916-Speed 3415.06 samples/sec   Loss 8.0531   LearningRate 0.0599   Epoch: 4   Global Step: 22840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:34:14,915-Speed 3416.13 samples/sec   Loss 8.1452   LearningRate 0.0599   Epoch: 4   Global Step: 22850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:34:17,919-Speed 3409.48 samples/sec   Loss 8.1046   LearningRate 0.0599   Epoch: 4   Global Step: 22860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:34:20,916-Speed 3417.23 samples/sec   Loss 8.2249   LearningRate 0.0599   Epoch: 4   Global Step: 22870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:34:23,922-Speed 3406.81 samples/sec   Loss 8.3966   LearningRate 0.0599   Epoch: 4   Global Step: 22880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:34:26,945-Speed 3389.13 samples/sec   Loss 8.0804   LearningRate 0.0599   Epoch: 4   Global Step: 22890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:34:29,929-Speed 3432.87 samples/sec   Loss 8.2382   LearningRate 0.0598   Epoch: 4   Global Step: 22900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:34:32,935-Speed 3407.22 samples/sec   Loss 8.0732   LearningRate 0.0598   Epoch: 4   Global Step: 22910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:34:35,942-Speed 3405.76 samples/sec   Loss 8.1616   LearningRate 0.0598   Epoch: 4   Global Step: 22920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:34:38,941-Speed 3415.18 samples/sec   Loss 8.0908   LearningRate 0.0598   Epoch: 4   Global Step: 22930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:34:41,923-Speed 3434.97 samples/sec   Loss 8.1425   LearningRate 0.0598   Epoch: 4   Global Step: 22940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:34:44,930-Speed 3406.25 samples/sec   Loss 8.2797   LearningRate 0.0598   Epoch: 4   Global Step: 22950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:34:47,938-Speed 3404.98 samples/sec   Loss 8.3595   LearningRate 0.0598   Epoch: 4   Global Step: 22960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:34:50,952-Speed 3398.52 samples/sec   Loss 8.1791   LearningRate 0.0597   Epoch: 4   Global Step: 22970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:34:53,954-Speed 3412.03 samples/sec   Loss 8.1469   LearningRate 0.0597   Epoch: 4   Global Step: 22980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:34:56,969-Speed 3397.16 samples/sec   Loss 8.0420   LearningRate 0.0597   Epoch: 4   Global Step: 22990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:34:59,994-Speed 3386.35 samples/sec   Loss 8.2114   LearningRate 0.0597   Epoch: 4   Global Step: 23000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:35:03,020-Speed 3385.02 samples/sec   Loss 8.0636   LearningRate 0.0597   Epoch: 4   Global Step: 23010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:35:06,025-Speed 3408.67 samples/sec   Loss 8.1125   LearningRate 0.0597   Epoch: 4   Global Step: 23020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:35:09,044-Speed 3392.92 samples/sec   Loss 8.1330   LearningRate 0.0597   Epoch: 4   Global Step: 23030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:35:12,046-Speed 3411.27 samples/sec   Loss 8.1459   LearningRate 0.0596   Epoch: 4   Global Step: 23040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:35:15,061-Speed 3397.01 samples/sec   Loss 8.2145   LearningRate 0.0596   Epoch: 4   Global Step: 23050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:35:18,068-Speed 3406.21 samples/sec   Loss 8.1602   LearningRate 0.0596   Epoch: 4   Global Step: 23060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:35:21,074-Speed 3407.95 samples/sec   Loss 8.0803   LearningRate 0.0596   Epoch: 4   Global Step: 23070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:35:24,073-Speed 3414.84 samples/sec   Loss 8.1580   LearningRate 0.0596   Epoch: 4   Global Step: 23080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:35:27,076-Speed 3411.59 samples/sec   Loss 8.0839   LearningRate 0.0596   Epoch: 4   Global Step: 23090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:35:30,077-Speed 3413.07 samples/sec   Loss 8.0778   LearningRate 0.0595   Epoch: 4   Global Step: 23100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:35:33,080-Speed 3409.92 samples/sec   Loss 8.1515   LearningRate 0.0595   Epoch: 4   Global Step: 23110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:35:36,093-Speed 3399.62 samples/sec   Loss 8.0621   LearningRate 0.0595   Epoch: 4   Global Step: 23120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:35:39,084-Speed 3424.83 samples/sec   Loss 8.2105   LearningRate 0.0595   Epoch: 4   Global Step: 23130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:35:42,088-Speed 3409.18 samples/sec   Loss 8.1254   LearningRate 0.0595   Epoch: 4   Global Step: 23140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:35:45,099-Speed 3402.39 samples/sec   Loss 8.0255   LearningRate 0.0595   Epoch: 4   Global Step: 23150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:35:48,107-Speed 3404.92 samples/sec   Loss 8.1329   LearningRate 0.0595   Epoch: 4   Global Step: 23160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:35:51,109-Speed 3411.79 samples/sec   Loss 8.2252   LearningRate 0.0594   Epoch: 4   Global Step: 23170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:35:54,113-Speed 3409.46 samples/sec   Loss 8.2022   LearningRate 0.0594   Epoch: 4   Global Step: 23180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:35:57,117-Speed 3410.40 samples/sec   Loss 8.0545   LearningRate 0.0594   Epoch: 4   Global Step: 23190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:36:00,122-Speed 3408.09 samples/sec   Loss 8.2053   LearningRate 0.0594   Epoch: 4   Global Step: 23200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:36:03,136-Speed 3398.75 samples/sec   Loss 8.1135   LearningRate 0.0594   Epoch: 4   Global Step: 23210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:36:06,147-Speed 3401.90 samples/sec   Loss 8.0220   LearningRate 0.0594   Epoch: 4   Global Step: 23220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:36:09,150-Speed 3410.07 samples/sec   Loss 8.0186   LearningRate 0.0593   Epoch: 4   Global Step: 23230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:12,160-Speed 3403.25 samples/sec   Loss 8.0450   LearningRate 0.0593   Epoch: 4   Global Step: 23240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:15,171-Speed 3401.32 samples/sec   Loss 8.0324   LearningRate 0.0593   Epoch: 4   Global Step: 23250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:18,186-Speed 3397.72 samples/sec   Loss 7.9976   LearningRate 0.0593   Epoch: 4   Global Step: 23260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:21,192-Speed 3407.66 samples/sec   Loss 8.2415   LearningRate 0.0593   Epoch: 4   Global Step: 23270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:24,220-Speed 3382.60 samples/sec   Loss 8.0015   LearningRate 0.0593   Epoch: 4   Global Step: 23280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:27,228-Speed 3404.25 samples/sec   Loss 8.0992   LearningRate 0.0593   Epoch: 4   Global Step: 23290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:30,241-Speed 3400.00 samples/sec   Loss 8.1633   LearningRate 0.0592   Epoch: 4   Global Step: 23300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:33,245-Speed 3410.35 samples/sec   Loss 8.1471   LearningRate 0.0592   Epoch: 4   Global Step: 23310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:36,245-Speed 3413.74 samples/sec   Loss 8.1440   LearningRate 0.0592   Epoch: 4   Global Step: 23320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:39,250-Speed 3408.91 samples/sec   Loss 8.1009   LearningRate 0.0592   Epoch: 4   Global Step: 23330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:42,254-Speed 3409.72 samples/sec   Loss 8.1107   LearningRate 0.0592   Epoch: 4   Global Step: 23340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:45,255-Speed 3412.91 samples/sec   Loss 8.1241   LearningRate 0.0592   Epoch: 4   Global Step: 23350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:48,263-Speed 3405.47 samples/sec   Loss 8.2195   LearningRate 0.0591   Epoch: 4   Global Step: 23360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:51,268-Speed 3409.12 samples/sec   Loss 8.1908   LearningRate 0.0591   Epoch: 4   Global Step: 23370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:54,275-Speed 3405.10 samples/sec   Loss 8.0809   LearningRate 0.0591   Epoch: 4   Global Step: 23380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:36:57,278-Speed 3411.59 samples/sec   Loss 8.1963   LearningRate 0.0591   Epoch: 4   Global Step: 23390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:37:00,284-Speed 3407.07 samples/sec   Loss 8.0470   LearningRate 0.0591   Epoch: 4   Global Step: 23400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:37:03,301-Speed 3395.63 samples/sec   Loss 8.0621   LearningRate 0.0591   Epoch: 4   Global Step: 23410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:37:06,313-Speed 3399.84 samples/sec   Loss 8.2121   LearningRate 0.0591   Epoch: 4   Global Step: 23420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:37:09,317-Speed 3410.16 samples/sec   Loss 8.0010   LearningRate 0.0590   Epoch: 4   Global Step: 23430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:37:12,321-Speed 3409.76 samples/sec   Loss 8.1549   LearningRate 0.0590   Epoch: 4   Global Step: 23440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:37:15,337-Speed 3395.77 samples/sec   Loss 8.1419   LearningRate 0.0590   Epoch: 4   Global Step: 23450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:37:18,345-Speed 3404.99 samples/sec   Loss 8.1074   LearningRate 0.0590   Epoch: 4   Global Step: 23460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:37:21,351-Speed 3407.96 samples/sec   Loss 8.2449   LearningRate 0.0590   Epoch: 4   Global Step: 23470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:37:24,360-Speed 3403.34 samples/sec   Loss 8.0465   LearningRate 0.0590   Epoch: 4   Global Step: 23480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:37:27,383-Speed 3388.81 samples/sec   Loss 8.2949   LearningRate 0.0590   Epoch: 4   Global Step: 23490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:37:30,387-Speed 3409.69 samples/sec   Loss 8.0493   LearningRate 0.0589   Epoch: 4   Global Step: 23500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:37:33,402-Speed 3397.23 samples/sec   Loss 8.0787   LearningRate 0.0589   Epoch: 4   Global Step: 23510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:37:36,402-Speed 3413.84 samples/sec   Loss 8.0224   LearningRate 0.0589   Epoch: 4   Global Step: 23520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:37:39,409-Speed 3406.34 samples/sec   Loss 8.0661   LearningRate 0.0589   Epoch: 4   Global Step: 23530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:37:42,415-Speed 3407.16 samples/sec   Loss 8.1630   LearningRate 0.0589   Epoch: 4   Global Step: 23540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:37:45,419-Speed 3409.36 samples/sec   Loss 8.2053   LearningRate 0.0589   Epoch: 4   Global Step: 23550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:37:48,423-Speed 3410.08 samples/sec   Loss 8.1354   LearningRate 0.0588   Epoch: 4   Global Step: 23560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:37:51,436-Speed 3399.34 samples/sec   Loss 7.8267   LearningRate 0.0588   Epoch: 4   Global Step: 23570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:37:54,444-Speed 3405.10 samples/sec   Loss 7.9867   LearningRate 0.0588   Epoch: 4   Global Step: 23580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:37:57,454-Speed 3403.21 samples/sec   Loss 8.0945   LearningRate 0.0588   Epoch: 4   Global Step: 23590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:38:00,466-Speed 3400.53 samples/sec   Loss 8.1085   LearningRate 0.0588   Epoch: 4   Global Step: 23600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:38:03,469-Speed 3410.61 samples/sec   Loss 8.2847   LearningRate 0.0588   Epoch: 4   Global Step: 23610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:38:06,470-Speed 3413.50 samples/sec   Loss 8.1574   LearningRate 0.0588   Epoch: 4   Global Step: 23620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:38:09,476-Speed 3407.78 samples/sec   Loss 8.2138   LearningRate 0.0587   Epoch: 4   Global Step: 23630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:38:12,484-Speed 3404.51 samples/sec   Loss 8.0572   LearningRate 0.0587   Epoch: 4   Global Step: 23640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:38:15,494-Speed 3403.59 samples/sec   Loss 7.9426   LearningRate 0.0587   Epoch: 4   Global Step: 23650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:38:18,509-Speed 3396.34 samples/sec   Loss 8.0106   LearningRate 0.0587   Epoch: 4   Global Step: 23660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:38:21,512-Speed 3412.00 samples/sec   Loss 7.9277   LearningRate 0.0587   Epoch: 4   Global Step: 23670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:38:24,517-Speed 3408.58 samples/sec   Loss 7.9842   LearningRate 0.0587   Epoch: 4   Global Step: 23680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:38:27,529-Speed 3400.05 samples/sec   Loss 8.2156   LearningRate 0.0586   Epoch: 4   Global Step: 23690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:38:30,531-Speed 3411.43 samples/sec   Loss 8.1039   LearningRate 0.0586   Epoch: 4   Global Step: 23700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:38:33,534-Speed 3411.52 samples/sec   Loss 8.1977   LearningRate 0.0586   Epoch: 4   Global Step: 23710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:38:36,549-Speed 3397.20 samples/sec   Loss 7.9335   LearningRate 0.0586   Epoch: 4   Global Step: 23720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:38:39,550-Speed 3412.21 samples/sec   Loss 8.1715   LearningRate 0.0586   Epoch: 4   Global Step: 23730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:38:42,557-Speed 3406.79 samples/sec   Loss 8.0161   LearningRate 0.0586   Epoch: 4   Global Step: 23740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:38:45,618-Speed 3345.95 samples/sec   Loss 7.9748   LearningRate 0.0586   Epoch: 4   Global Step: 23750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:38:48,608-Speed 3425.61 samples/sec   Loss 7.8902   LearningRate 0.0585   Epoch: 4   Global Step: 23760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:38:51,611-Speed 3411.06 samples/sec   Loss 8.1437   LearningRate 0.0585   Epoch: 4   Global Step: 23770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:38:54,608-Speed 3417.44 samples/sec   Loss 8.0447   LearningRate 0.0585   Epoch: 4   Global Step: 23780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:38:57,612-Speed 3410.42 samples/sec   Loss 7.9334   LearningRate 0.0585   Epoch: 4   Global Step: 23790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:00,617-Speed 3408.03 samples/sec   Loss 7.8058   LearningRate 0.0585   Epoch: 4   Global Step: 23800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:03,632-Speed 3397.85 samples/sec   Loss 8.1027   LearningRate 0.0585   Epoch: 4   Global Step: 23810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:06,644-Speed 3400.24 samples/sec   Loss 8.1372   LearningRate 0.0585   Epoch: 4   Global Step: 23820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:09,657-Speed 3398.91 samples/sec   Loss 8.2689   LearningRate 0.0584   Epoch: 4   Global Step: 23830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:12,671-Speed 3398.89 samples/sec   Loss 8.1387   LearningRate 0.0584   Epoch: 4   Global Step: 23840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:15,679-Speed 3405.88 samples/sec   Loss 7.9949   LearningRate 0.0584   Epoch: 4   Global Step: 23850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:18,687-Speed 3404.58 samples/sec   Loss 8.2649   LearningRate 0.0584   Epoch: 4   Global Step: 23860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:21,697-Speed 3403.69 samples/sec   Loss 7.9268   LearningRate 0.0584   Epoch: 4   Global Step: 23870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:24,700-Speed 3410.65 samples/sec   Loss 7.9764   LearningRate 0.0584   Epoch: 4   Global Step: 23880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:39:27,709-Speed 3402.91 samples/sec   Loss 8.1264   LearningRate 0.0583   Epoch: 4   Global Step: 23890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:39:30,692-Speed 3434.18 samples/sec   Loss 8.0581   LearningRate 0.0583   Epoch: 4   Global Step: 23900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:33,701-Speed 3403.55 samples/sec   Loss 8.0882   LearningRate 0.0583   Epoch: 4   Global Step: 23910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:36,719-Speed 3394.32 samples/sec   Loss 8.0243   LearningRate 0.0583   Epoch: 4   Global Step: 23920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:39,732-Speed 3400.51 samples/sec   Loss 8.1658   LearningRate 0.0583   Epoch: 4   Global Step: 23930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:42,739-Speed 3405.55 samples/sec   Loss 8.0538   LearningRate 0.0583   Epoch: 4   Global Step: 23940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:45,740-Speed 3414.09 samples/sec   Loss 7.9450   LearningRate 0.0583   Epoch: 4   Global Step: 23950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:48,743-Speed 3410.11 samples/sec   Loss 8.0545   LearningRate 0.0582   Epoch: 4   Global Step: 23960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:51,765-Speed 3389.65 samples/sec   Loss 8.0820   LearningRate 0.0582   Epoch: 4   Global Step: 23970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:54,768-Speed 3410.14 samples/sec   Loss 8.0087   LearningRate 0.0582   Epoch: 4   Global Step: 23980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:39:57,770-Speed 3411.93 samples/sec   Loss 8.0503   LearningRate 0.0582   Epoch: 4   Global Step: 23990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:40:00,772-Speed 3411.84 samples/sec   Loss 7.9623   LearningRate 0.0582   Epoch: 4   Global Step: 24000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:40:45,085-[lfw][24000]XNorm: 22.871274
Training: 2022-04-11 01:40:45,086-[lfw][24000]Accuracy-Flip: 0.99683+-0.00337
Training: 2022-04-11 01:40:45,086-[lfw][24000]Accuracy-Highest: 0.99717
Training: 2022-04-11 01:41:36,739-[cfp_fp][24000]XNorm: 19.961849
Training: 2022-04-11 01:41:36,740-[cfp_fp][24000]Accuracy-Flip: 0.94914+-0.00917
Training: 2022-04-11 01:41:36,740-[cfp_fp][24000]Accuracy-Highest: 0.95629
Training: 2022-04-11 01:42:20,873-[agedb_30][24000]XNorm: 22.410717
Training: 2022-04-11 01:42:20,874-[agedb_30][24000]Accuracy-Flip: 0.97450+-0.00723
Training: 2022-04-11 01:42:20,874-[agedb_30][24000]Accuracy-Highest: 0.97450
Training: 2022-04-11 01:42:23,882-Speed 71.55 samples/sec   Loss 7.9803   LearningRate 0.0582   Epoch: 4   Global Step: 24010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:42:26,862-Speed 3436.77 samples/sec   Loss 8.0887   LearningRate 0.0581   Epoch: 4   Global Step: 24020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:42:29,825-Speed 3456.36 samples/sec   Loss 7.9664   LearningRate 0.0581   Epoch: 4   Global Step: 24030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:42:32,821-Speed 3418.71 samples/sec   Loss 7.9786   LearningRate 0.0581   Epoch: 4   Global Step: 24040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:42:35,805-Speed 3432.90 samples/sec   Loss 8.1880   LearningRate 0.0581   Epoch: 4   Global Step: 24050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:42:38,793-Speed 3428.32 samples/sec   Loss 7.8214   LearningRate 0.0581   Epoch: 4   Global Step: 24060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:42:41,786-Speed 3420.91 samples/sec   Loss 7.9863   LearningRate 0.0581   Epoch: 4   Global Step: 24070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:42:44,773-Speed 3428.90 samples/sec   Loss 8.0939   LearningRate 0.0581   Epoch: 4   Global Step: 24080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:42:47,769-Speed 3419.90 samples/sec   Loss 7.9175   LearningRate 0.0580   Epoch: 4   Global Step: 24090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:42:50,761-Speed 3422.89 samples/sec   Loss 7.9904   LearningRate 0.0580   Epoch: 4   Global Step: 24100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:42:53,773-Speed 3401.57 samples/sec   Loss 8.0475   LearningRate 0.0580   Epoch: 4   Global Step: 24110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:42:56,771-Speed 3415.87 samples/sec   Loss 8.1432   LearningRate 0.0580   Epoch: 4   Global Step: 24120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:42:59,769-Speed 3415.94 samples/sec   Loss 7.9351   LearningRate 0.0580   Epoch: 4   Global Step: 24130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:43:02,766-Speed 3417.86 samples/sec   Loss 7.9011   LearningRate 0.0580   Epoch: 4   Global Step: 24140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:43:05,772-Speed 3407.18 samples/sec   Loss 8.0043   LearningRate 0.0580   Epoch: 4   Global Step: 24150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:43:08,768-Speed 3419.93 samples/sec   Loss 8.0722   LearningRate 0.0579   Epoch: 4   Global Step: 24160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:43:11,760-Speed 3422.52 samples/sec   Loss 8.1036   LearningRate 0.0579   Epoch: 4   Global Step: 24170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:43:14,755-Speed 3420.66 samples/sec   Loss 7.9229   LearningRate 0.0579   Epoch: 4   Global Step: 24180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:43:17,750-Speed 3419.18 samples/sec   Loss 7.9833   LearningRate 0.0579   Epoch: 4   Global Step: 24190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:43:20,747-Speed 3418.04 samples/sec   Loss 7.8508   LearningRate 0.0579   Epoch: 4   Global Step: 24200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:43:23,721-Speed 3444.33 samples/sec   Loss 8.0976   LearningRate 0.0579   Epoch: 4   Global Step: 24210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:43:26,718-Speed 3417.26 samples/sec   Loss 8.1750   LearningRate 0.0578   Epoch: 4   Global Step: 24220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:43:29,724-Speed 3407.09 samples/sec   Loss 7.9565   LearningRate 0.0578   Epoch: 4   Global Step: 24230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:43:32,718-Speed 3421.40 samples/sec   Loss 8.0038   LearningRate 0.0578   Epoch: 4   Global Step: 24240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:43:35,721-Speed 3410.24 samples/sec   Loss 8.1407   LearningRate 0.0578   Epoch: 4   Global Step: 24250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:43:38,786-Speed 3343.00 samples/sec   Loss 8.1029   LearningRate 0.0578   Epoch: 4   Global Step: 24260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:43:41,793-Speed 3406.64 samples/sec   Loss 8.0219   LearningRate 0.0578   Epoch: 4   Global Step: 24270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:43:44,785-Speed 3422.34 samples/sec   Loss 7.8985   LearningRate 0.0578   Epoch: 4   Global Step: 24280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:43:47,781-Speed 3419.88 samples/sec   Loss 8.0982   LearningRate 0.0577   Epoch: 4   Global Step: 24290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:43:50,778-Speed 3417.15 samples/sec   Loss 8.0760   LearningRate 0.0577   Epoch: 4   Global Step: 24300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:43:53,771-Speed 3422.00 samples/sec   Loss 8.0525   LearningRate 0.0577   Epoch: 4   Global Step: 24310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:43:56,745-Speed 3444.43 samples/sec   Loss 7.8486   LearningRate 0.0577   Epoch: 4   Global Step: 24320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:43:59,738-Speed 3422.10 samples/sec   Loss 8.2300   LearningRate 0.0577   Epoch: 4   Global Step: 24330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:02,753-Speed 3397.26 samples/sec   Loss 8.0383   LearningRate 0.0577   Epoch: 4   Global Step: 24340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:05,749-Speed 3419.16 samples/sec   Loss 8.0867   LearningRate 0.0577   Epoch: 4   Global Step: 24350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:08,742-Speed 3421.70 samples/sec   Loss 8.1538   LearningRate 0.0576   Epoch: 4   Global Step: 24360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:11,742-Speed 3414.83 samples/sec   Loss 7.9436   LearningRate 0.0576   Epoch: 4   Global Step: 24370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:14,737-Speed 3419.46 samples/sec   Loss 8.0565   LearningRate 0.0576   Epoch: 4   Global Step: 24380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:17,731-Speed 3421.11 samples/sec   Loss 7.9799   LearningRate 0.0576   Epoch: 4   Global Step: 24390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:20,727-Speed 3417.95 samples/sec   Loss 8.0079   LearningRate 0.0576   Epoch: 4   Global Step: 24400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:23,722-Speed 3420.95 samples/sec   Loss 8.0614   LearningRate 0.0576   Epoch: 4   Global Step: 24410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:26,741-Speed 3391.78 samples/sec   Loss 7.9493   LearningRate 0.0575   Epoch: 4   Global Step: 24420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:44:29,718-Speed 3440.88 samples/sec   Loss 7.9825   LearningRate 0.0575   Epoch: 4   Global Step: 24430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:32,716-Speed 3416.82 samples/sec   Loss 8.2419   LearningRate 0.0575   Epoch: 4   Global Step: 24440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:35,730-Speed 3398.14 samples/sec   Loss 8.0893   LearningRate 0.0575   Epoch: 4   Global Step: 24450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:38,726-Speed 3419.50 samples/sec   Loss 7.9135   LearningRate 0.0575   Epoch: 4   Global Step: 24460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:41,722-Speed 3418.20 samples/sec   Loss 8.1192   LearningRate 0.0575   Epoch: 4   Global Step: 24470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:44,720-Speed 3417.04 samples/sec   Loss 8.1430   LearningRate 0.0575   Epoch: 4   Global Step: 24480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:47,711-Speed 3424.44 samples/sec   Loss 7.8852   LearningRate 0.0574   Epoch: 4   Global Step: 24490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:50,710-Speed 3415.25 samples/sec   Loss 8.0522   LearningRate 0.0574   Epoch: 4   Global Step: 24500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:53,722-Speed 3400.39 samples/sec   Loss 8.0097   LearningRate 0.0574   Epoch: 4   Global Step: 24510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:56,729-Speed 3406.74 samples/sec   Loss 7.8832   LearningRate 0.0574   Epoch: 4   Global Step: 24520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:44:59,767-Speed 3371.11 samples/sec   Loss 7.9350   LearningRate 0.0574   Epoch: 4   Global Step: 24530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:45:02,772-Speed 3408.98 samples/sec   Loss 8.0160   LearningRate 0.0574   Epoch: 4   Global Step: 24540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:45:05,773-Speed 3411.98 samples/sec   Loss 7.9974   LearningRate 0.0574   Epoch: 4   Global Step: 24550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:45:08,778-Speed 3409.29 samples/sec   Loss 7.8817   LearningRate 0.0573   Epoch: 4   Global Step: 24560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:45:11,786-Speed 3405.03 samples/sec   Loss 7.9494   LearningRate 0.0573   Epoch: 4   Global Step: 24570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:45:14,784-Speed 3415.90 samples/sec   Loss 8.0212   LearningRate 0.0573   Epoch: 4   Global Step: 24580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:45:17,783-Speed 3415.40 samples/sec   Loss 8.1173   LearningRate 0.0573   Epoch: 4   Global Step: 24590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:45:20,791-Speed 3405.53 samples/sec   Loss 7.9745   LearningRate 0.0573   Epoch: 4   Global Step: 24600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:45:23,770-Speed 3439.16 samples/sec   Loss 8.0771   LearningRate 0.0573   Epoch: 4   Global Step: 24610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:45:26,764-Speed 3420.35 samples/sec   Loss 7.8896   LearningRate 0.0572   Epoch: 4   Global Step: 24620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:45:29,762-Speed 3416.92 samples/sec   Loss 8.0049   LearningRate 0.0572   Epoch: 4   Global Step: 24630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:45:32,756-Speed 3420.94 samples/sec   Loss 8.1722   LearningRate 0.0572   Epoch: 4   Global Step: 24640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:45:35,760-Speed 3409.00 samples/sec   Loss 7.9710   LearningRate 0.0572   Epoch: 4   Global Step: 24650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:45:38,770-Speed 3403.56 samples/sec   Loss 8.0750   LearningRate 0.0572   Epoch: 4   Global Step: 24660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:45:41,774-Speed 3409.46 samples/sec   Loss 8.0118   LearningRate 0.0572   Epoch: 4   Global Step: 24670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:45:44,778-Speed 3409.21 samples/sec   Loss 7.8834   LearningRate 0.0572   Epoch: 4   Global Step: 24680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:45:47,774-Speed 3419.36 samples/sec   Loss 8.1744   LearningRate 0.0571   Epoch: 4   Global Step: 24690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:45:50,772-Speed 3416.82 samples/sec   Loss 7.8624   LearningRate 0.0571   Epoch: 4   Global Step: 24700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:45:53,782-Speed 3403.24 samples/sec   Loss 7.9533   LearningRate 0.0571   Epoch: 4   Global Step: 24710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:45:56,778-Speed 3418.76 samples/sec   Loss 7.8043   LearningRate 0.0571   Epoch: 4   Global Step: 24720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:45:59,775-Speed 3416.76 samples/sec   Loss 8.0956   LearningRate 0.0571   Epoch: 4   Global Step: 24730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:46:02,771-Speed 3419.67 samples/sec   Loss 7.9805   LearningRate 0.0571   Epoch: 4   Global Step: 24740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:46:05,764-Speed 3421.27 samples/sec   Loss 7.9039   LearningRate 0.0571   Epoch: 4   Global Step: 24750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:46:08,769-Speed 3408.62 samples/sec   Loss 8.0059   LearningRate 0.0570   Epoch: 4   Global Step: 24760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:46:11,766-Speed 3417.80 samples/sec   Loss 7.9401   LearningRate 0.0570   Epoch: 4   Global Step: 24770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:46:14,745-Speed 3438.00 samples/sec   Loss 7.9900   LearningRate 0.0570   Epoch: 4   Global Step: 24780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:17,741-Speed 3419.08 samples/sec   Loss 8.0222   LearningRate 0.0570   Epoch: 4   Global Step: 24790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:20,738-Speed 3418.24 samples/sec   Loss 7.9707   LearningRate 0.0570   Epoch: 4   Global Step: 24800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:23,735-Speed 3417.54 samples/sec   Loss 7.8647   LearningRate 0.0570   Epoch: 4   Global Step: 24810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:26,733-Speed 3416.13 samples/sec   Loss 7.9280   LearningRate 0.0569   Epoch: 4   Global Step: 24820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:29,728-Speed 3419.77 samples/sec   Loss 7.9872   LearningRate 0.0569   Epoch: 4   Global Step: 24830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:32,739-Speed 3401.90 samples/sec   Loss 8.0460   LearningRate 0.0569   Epoch: 4   Global Step: 24840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:35,743-Speed 3409.90 samples/sec   Loss 8.0555   LearningRate 0.0569   Epoch: 4   Global Step: 24850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:38,760-Speed 3395.46 samples/sec   Loss 7.8577   LearningRate 0.0569   Epoch: 4   Global Step: 24860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:41,760-Speed 3413.99 samples/sec   Loss 7.9800   LearningRate 0.0569   Epoch: 4   Global Step: 24870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:44,746-Speed 3430.02 samples/sec   Loss 7.9375   LearningRate 0.0569   Epoch: 4   Global Step: 24880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:47,742-Speed 3418.32 samples/sec   Loss 8.0620   LearningRate 0.0568   Epoch: 4   Global Step: 24890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:50,743-Speed 3413.94 samples/sec   Loss 7.9962   LearningRate 0.0568   Epoch: 4   Global Step: 24900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:53,780-Speed 3372.48 samples/sec   Loss 7.8722   LearningRate 0.0568   Epoch: 4   Global Step: 24910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:56,890-Speed 3293.23 samples/sec   Loss 7.8915   LearningRate 0.0568   Epoch: 4   Global Step: 24920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:46:59,889-Speed 3414.56 samples/sec   Loss 7.9555   LearningRate 0.0568   Epoch: 4   Global Step: 24930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:02,891-Speed 3412.03 samples/sec   Loss 7.8563   LearningRate 0.0568   Epoch: 4   Global Step: 24940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:05,885-Speed 3421.70 samples/sec   Loss 7.8015   LearningRate 0.0568   Epoch: 4   Global Step: 24950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:08,901-Speed 3396.87 samples/sec   Loss 8.0361   LearningRate 0.0567   Epoch: 4   Global Step: 24960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:11,898-Speed 3416.75 samples/sec   Loss 7.8517   LearningRate 0.0567   Epoch: 4   Global Step: 24970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:14,908-Speed 3403.14 samples/sec   Loss 7.8080   LearningRate 0.0567   Epoch: 4   Global Step: 24980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:47:17,942-Speed 3375.63 samples/sec   Loss 7.8429   LearningRate 0.0567   Epoch: 4   Global Step: 24990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:47:20,942-Speed 3414.18 samples/sec   Loss 7.9784   LearningRate 0.0567   Epoch: 4   Global Step: 25000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:47:23,920-Speed 3439.88 samples/sec   Loss 8.1426   LearningRate 0.0567   Epoch: 4   Global Step: 25010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:26,916-Speed 3418.23 samples/sec   Loss 7.8636   LearningRate 0.0567   Epoch: 4   Global Step: 25020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:29,921-Speed 3408.87 samples/sec   Loss 7.7171   LearningRate 0.0566   Epoch: 4   Global Step: 25030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:32,956-Speed 3375.68 samples/sec   Loss 7.8285   LearningRate 0.0566   Epoch: 4   Global Step: 25040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:35,953-Speed 3416.93 samples/sec   Loss 7.9650   LearningRate 0.0566   Epoch: 4   Global Step: 25050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:38,953-Speed 3414.28 samples/sec   Loss 7.9996   LearningRate 0.0566   Epoch: 4   Global Step: 25060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:41,949-Speed 3419.00 samples/sec   Loss 8.0439   LearningRate 0.0566   Epoch: 4   Global Step: 25070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:44,947-Speed 3415.62 samples/sec   Loss 8.0714   LearningRate 0.0566   Epoch: 4   Global Step: 25080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:47,950-Speed 3411.29 samples/sec   Loss 7.8529   LearningRate 0.0565   Epoch: 4   Global Step: 25090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:50,954-Speed 3409.83 samples/sec   Loss 8.0884   LearningRate 0.0565   Epoch: 4   Global Step: 25100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:47:53,956-Speed 3412.31 samples/sec   Loss 7.9054   LearningRate 0.0565   Epoch: 4   Global Step: 25110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:47:56,955-Speed 3414.94 samples/sec   Loss 7.8863   LearningRate 0.0565   Epoch: 4   Global Step: 25120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:47:59,974-Speed 3392.81 samples/sec   Loss 7.8643   LearningRate 0.0565   Epoch: 4   Global Step: 25130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:48:02,959-Speed 3431.01 samples/sec   Loss 7.9486   LearningRate 0.0565   Epoch: 4   Global Step: 25140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:48:05,961-Speed 3412.18 samples/sec   Loss 7.7737   LearningRate 0.0565   Epoch: 4   Global Step: 25150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:48:08,961-Speed 3413.94 samples/sec   Loss 8.0480   LearningRate 0.0564   Epoch: 4   Global Step: 25160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:48:11,962-Speed 3412.94 samples/sec   Loss 8.1008   LearningRate 0.0564   Epoch: 4   Global Step: 25170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:48:14,972-Speed 3402.60 samples/sec   Loss 7.9399   LearningRate 0.0564   Epoch: 4   Global Step: 25180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:48:17,971-Speed 3415.96 samples/sec   Loss 7.8763   LearningRate 0.0564   Epoch: 4   Global Step: 25190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:48:20,967-Speed 3418.17 samples/sec   Loss 7.8744   LearningRate 0.0564   Epoch: 4   Global Step: 25200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:48:23,967-Speed 3414.83 samples/sec   Loss 7.7873   LearningRate 0.0564   Epoch: 4   Global Step: 25210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:48:26,980-Speed 3399.89 samples/sec   Loss 7.9565   LearningRate 0.0564   Epoch: 4   Global Step: 25220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:48:29,982-Speed 3412.24 samples/sec   Loss 8.0266   LearningRate 0.0563   Epoch: 4   Global Step: 25230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:48:32,980-Speed 3416.69 samples/sec   Loss 7.8999   LearningRate 0.0563   Epoch: 4   Global Step: 25240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:48:35,982-Speed 3410.82 samples/sec   Loss 7.8591   LearningRate 0.0563   Epoch: 4   Global Step: 25250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:48:38,984-Speed 3412.39 samples/sec   Loss 7.9845   LearningRate 0.0563   Epoch: 4   Global Step: 25260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:48:41,993-Speed 3404.22 samples/sec   Loss 7.7760   LearningRate 0.0563   Epoch: 4   Global Step: 25270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:48:45,150-Speed 3244.41 samples/sec   Loss 8.0723   LearningRate 0.0563   Epoch: 4   Global Step: 25280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:48:48,147-Speed 3417.59 samples/sec   Loss 8.0390   LearningRate 0.0563   Epoch: 4   Global Step: 25290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:49:00,566-Speed 824.62 samples/sec   Loss 7.1180   LearningRate 0.0562   Epoch: 5   Global Step: 25300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:49:03,602-Speed 3373.41 samples/sec   Loss 7.2235   LearningRate 0.0562   Epoch: 5   Global Step: 25310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:49:06,626-Speed 3387.55 samples/sec   Loss 7.2190   LearningRate 0.0562   Epoch: 5   Global Step: 25320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:49:09,644-Speed 3393.81 samples/sec   Loss 7.1779   LearningRate 0.0562   Epoch: 5   Global Step: 25330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:49:12,662-Speed 3393.89 samples/sec   Loss 7.1422   LearningRate 0.0562   Epoch: 5   Global Step: 25340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:49:15,659-Speed 3417.75 samples/sec   Loss 7.1941   LearningRate 0.0562   Epoch: 5   Global Step: 25350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:49:18,675-Speed 3395.99 samples/sec   Loss 7.2570   LearningRate 0.0561   Epoch: 5   Global Step: 25360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:49:21,677-Speed 3411.14 samples/sec   Loss 7.1732   LearningRate 0.0561   Epoch: 5   Global Step: 25370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:49:24,679-Speed 3412.26 samples/sec   Loss 7.3868   LearningRate 0.0561   Epoch: 5   Global Step: 25380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:49:27,694-Speed 3396.80 samples/sec   Loss 7.1631   LearningRate 0.0561   Epoch: 5   Global Step: 25390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:49:30,696-Speed 3412.38 samples/sec   Loss 7.2904   LearningRate 0.0561   Epoch: 5   Global Step: 25400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:49:33,706-Speed 3403.39 samples/sec   Loss 7.2792   LearningRate 0.0561   Epoch: 5   Global Step: 25410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:49:36,711-Speed 3408.20 samples/sec   Loss 7.1899   LearningRate 0.0561   Epoch: 5   Global Step: 25420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:49:39,709-Speed 3416.62 samples/sec   Loss 7.1550   LearningRate 0.0560   Epoch: 5   Global Step: 25430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:49:42,707-Speed 3416.16 samples/sec   Loss 7.1979   LearningRate 0.0560   Epoch: 5   Global Step: 25440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:49:45,712-Speed 3409.15 samples/sec   Loss 7.2293   LearningRate 0.0560   Epoch: 5   Global Step: 25450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:49:48,708-Speed 3418.40 samples/sec   Loss 7.4133   LearningRate 0.0560   Epoch: 5   Global Step: 25460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:49:51,716-Speed 3404.87 samples/sec   Loss 7.3540   LearningRate 0.0560   Epoch: 5   Global Step: 25470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:49:54,717-Speed 3413.41 samples/sec   Loss 7.3273   LearningRate 0.0560   Epoch: 5   Global Step: 25480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:49:57,715-Speed 3416.25 samples/sec   Loss 7.3367   LearningRate 0.0560   Epoch: 5   Global Step: 25490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:50:00,795-Speed 3325.39 samples/sec   Loss 7.2419   LearningRate 0.0559   Epoch: 5   Global Step: 25500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:50:03,798-Speed 3410.72 samples/sec   Loss 7.1822   LearningRate 0.0559   Epoch: 5   Global Step: 25510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:50:06,802-Speed 3410.36 samples/sec   Loss 7.2984   LearningRate 0.0559   Epoch: 5   Global Step: 25520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:50:09,798-Speed 3418.82 samples/sec   Loss 7.2403   LearningRate 0.0559   Epoch: 5   Global Step: 25530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:50:12,798-Speed 3413.35 samples/sec   Loss 7.3233   LearningRate 0.0559   Epoch: 5   Global Step: 25540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:50:15,809-Speed 3402.39 samples/sec   Loss 7.2374   LearningRate 0.0559   Epoch: 5   Global Step: 25550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:50:18,804-Speed 3419.03 samples/sec   Loss 7.1350   LearningRate 0.0559   Epoch: 5   Global Step: 25560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:50:21,802-Speed 3416.62 samples/sec   Loss 7.2129   LearningRate 0.0558   Epoch: 5   Global Step: 25570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:50:24,811-Speed 3404.48 samples/sec   Loss 7.3841   LearningRate 0.0558   Epoch: 5   Global Step: 25580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:50:27,812-Speed 3413.18 samples/sec   Loss 7.3386   LearningRate 0.0558   Epoch: 5   Global Step: 25590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:50:30,815-Speed 3411.62 samples/sec   Loss 7.3506   LearningRate 0.0558   Epoch: 5   Global Step: 25600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:50:33,824-Speed 3402.76 samples/sec   Loss 7.5393   LearningRate 0.0558   Epoch: 5   Global Step: 25610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:50:36,831-Speed 3407.06 samples/sec   Loss 7.3969   LearningRate 0.0558   Epoch: 5   Global Step: 25620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:50:39,853-Speed 3388.83 samples/sec   Loss 7.2992   LearningRate 0.0557   Epoch: 5   Global Step: 25630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:50:42,858-Speed 3408.09 samples/sec   Loss 7.3999   LearningRate 0.0557   Epoch: 5   Global Step: 25640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:50:45,846-Speed 3428.33 samples/sec   Loss 7.5072   LearningRate 0.0557   Epoch: 5   Global Step: 25650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:50:48,848-Speed 3412.75 samples/sec   Loss 7.4292   LearningRate 0.0557   Epoch: 5   Global Step: 25660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:50:51,847-Speed 3415.41 samples/sec   Loss 7.4997   LearningRate 0.0557   Epoch: 5   Global Step: 25670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:50:54,831-Speed 3431.65 samples/sec   Loss 7.4021   LearningRate 0.0557   Epoch: 5   Global Step: 25680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:50:57,853-Speed 3389.40 samples/sec   Loss 7.4545   LearningRate 0.0557   Epoch: 5   Global Step: 25690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:51:00,870-Speed 3395.17 samples/sec   Loss 7.5583   LearningRate 0.0556   Epoch: 5   Global Step: 25700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:51:03,897-Speed 3383.84 samples/sec   Loss 7.5954   LearningRate 0.0556   Epoch: 5   Global Step: 25710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:51:06,906-Speed 3403.37 samples/sec   Loss 7.3078   LearningRate 0.0556   Epoch: 5   Global Step: 25720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:51:09,916-Speed 3403.39 samples/sec   Loss 7.3772   LearningRate 0.0556   Epoch: 5   Global Step: 25730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:51:12,918-Speed 3412.12 samples/sec   Loss 7.4906   LearningRate 0.0556   Epoch: 5   Global Step: 25740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:51:15,927-Speed 3404.58 samples/sec   Loss 7.3786   LearningRate 0.0556   Epoch: 5   Global Step: 25750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:51:18,939-Speed 3400.34 samples/sec   Loss 7.5862   LearningRate 0.0556   Epoch: 5   Global Step: 25760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:51:21,943-Speed 3409.52 samples/sec   Loss 7.2858   LearningRate 0.0555   Epoch: 5   Global Step: 25770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:51:24,944-Speed 3413.36 samples/sec   Loss 7.4396   LearningRate 0.0555   Epoch: 5   Global Step: 25780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:51:27,954-Speed 3402.82 samples/sec   Loss 7.5536   LearningRate 0.0555   Epoch: 5   Global Step: 25790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:51:30,958-Speed 3408.97 samples/sec   Loss 7.5191   LearningRate 0.0555   Epoch: 5   Global Step: 25800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:51:33,959-Speed 3412.92 samples/sec   Loss 7.5207   LearningRate 0.0555   Epoch: 5   Global Step: 25810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:51:36,958-Speed 3416.16 samples/sec   Loss 7.5696   LearningRate 0.0555   Epoch: 5   Global Step: 25820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:51:39,957-Speed 3415.32 samples/sec   Loss 7.4559   LearningRate 0.0555   Epoch: 5   Global Step: 25830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:51:42,967-Speed 3402.70 samples/sec   Loss 7.5467   LearningRate 0.0554   Epoch: 5   Global Step: 25840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:51:45,969-Speed 3412.23 samples/sec   Loss 7.3428   LearningRate 0.0554   Epoch: 5   Global Step: 25850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:51:48,971-Speed 3411.85 samples/sec   Loss 7.5932   LearningRate 0.0554   Epoch: 5   Global Step: 25860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:51:51,973-Speed 3411.25 samples/sec   Loss 7.3833   LearningRate 0.0554   Epoch: 5   Global Step: 25870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:51:54,972-Speed 3416.18 samples/sec   Loss 7.6029   LearningRate 0.0554   Epoch: 5   Global Step: 25880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:51:57,973-Speed 3412.18 samples/sec   Loss 7.7389   LearningRate 0.0554   Epoch: 5   Global Step: 25890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:52:00,982-Speed 3405.18 samples/sec   Loss 7.4219   LearningRate 0.0553   Epoch: 5   Global Step: 25900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:52:03,984-Speed 3411.14 samples/sec   Loss 7.5718   LearningRate 0.0553   Epoch: 5   Global Step: 25910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:52:06,994-Speed 3403.46 samples/sec   Loss 7.6516   LearningRate 0.0553   Epoch: 5   Global Step: 25920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:52:09,998-Speed 3409.95 samples/sec   Loss 7.4581   LearningRate 0.0553   Epoch: 5   Global Step: 25930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:52:12,999-Speed 3412.79 samples/sec   Loss 7.4958   LearningRate 0.0553   Epoch: 5   Global Step: 25940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:52:15,999-Speed 3413.96 samples/sec   Loss 7.6096   LearningRate 0.0553   Epoch: 5   Global Step: 25950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:52:19,009-Speed 3403.19 samples/sec   Loss 7.5800   LearningRate 0.0553   Epoch: 5   Global Step: 25960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:52:22,009-Speed 3412.93 samples/sec   Loss 7.5428   LearningRate 0.0552   Epoch: 5   Global Step: 25970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:52:24,991-Speed 3435.88 samples/sec   Loss 7.4913   LearningRate 0.0552   Epoch: 5   Global Step: 25980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:52:28,001-Speed 3402.42 samples/sec   Loss 7.6043   LearningRate 0.0552   Epoch: 5   Global Step: 25990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:52:31,001-Speed 3414.17 samples/sec   Loss 7.5785   LearningRate 0.0552   Epoch: 5   Global Step: 26000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:53:15,416-[lfw][26000]XNorm: 23.011558
Training: 2022-04-11 01:53:15,417-[lfw][26000]Accuracy-Flip: 0.99683+-0.00241
Training: 2022-04-11 01:53:15,418-[lfw][26000]Accuracy-Highest: 0.99717
Training: 2022-04-11 01:54:07,014-[cfp_fp][26000]XNorm: 20.137223
Training: 2022-04-11 01:54:07,014-[cfp_fp][26000]Accuracy-Flip: 0.95486+-0.01184
Training: 2022-04-11 01:54:07,015-[cfp_fp][26000]Accuracy-Highest: 0.95629
Training: 2022-04-11 01:54:53,442-[agedb_30][26000]XNorm: 22.549544
Training: 2022-04-11 01:54:53,442-[agedb_30][26000]Accuracy-Flip: 0.97317+-0.00864
Training: 2022-04-11 01:54:53,443-[agedb_30][26000]Accuracy-Highest: 0.97450
Training: 2022-04-11 01:54:56,465-Speed 70.40 samples/sec   Loss 7.5950   LearningRate 0.0552   Epoch: 5   Global Step: 26010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:54:59,452-Speed 3429.81 samples/sec   Loss 7.4485   LearningRate 0.0552   Epoch: 5   Global Step: 26020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:55:02,438-Speed 3430.34 samples/sec   Loss 7.7481   LearningRate 0.0552   Epoch: 5   Global Step: 26030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:55:05,419-Speed 3436.26 samples/sec   Loss 7.6064   LearningRate 0.0551   Epoch: 5   Global Step: 26040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:55:08,411-Speed 3422.91 samples/sec   Loss 7.5792   LearningRate 0.0551   Epoch: 5   Global Step: 26050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:55:11,395-Speed 3432.45 samples/sec   Loss 7.6063   LearningRate 0.0551   Epoch: 5   Global Step: 26060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:55:14,375-Speed 3436.74 samples/sec   Loss 7.5860   LearningRate 0.0551   Epoch: 5   Global Step: 26070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:55:17,355-Speed 3436.52 samples/sec   Loss 7.5786   LearningRate 0.0551   Epoch: 5   Global Step: 26080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:55:20,342-Speed 3430.45 samples/sec   Loss 7.7716   LearningRate 0.0551   Epoch: 5   Global Step: 26090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:55:23,331-Speed 3426.34 samples/sec   Loss 7.5577   LearningRate 0.0551   Epoch: 5   Global Step: 26100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:55:26,319-Speed 3428.01 samples/sec   Loss 7.6029   LearningRate 0.0550   Epoch: 5   Global Step: 26110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:55:29,312-Speed 3422.81 samples/sec   Loss 7.6474   LearningRate 0.0550   Epoch: 5   Global Step: 26120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:55:32,267-Speed 3465.98 samples/sec   Loss 7.5822   LearningRate 0.0550   Epoch: 5   Global Step: 26130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:55:35,281-Speed 3397.81 samples/sec   Loss 7.5879   LearningRate 0.0550   Epoch: 5   Global Step: 26140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:55:38,271-Speed 3426.15 samples/sec   Loss 7.3217   LearningRate 0.0550   Epoch: 5   Global Step: 26150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:55:41,264-Speed 3422.18 samples/sec   Loss 7.5876   LearningRate 0.0550   Epoch: 5   Global Step: 26160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:55:44,259-Speed 3419.52 samples/sec   Loss 7.5292   LearningRate 0.0550   Epoch: 5   Global Step: 26170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:55:47,250-Speed 3423.84 samples/sec   Loss 7.7101   LearningRate 0.0549   Epoch: 5   Global Step: 26180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:55:50,255-Speed 3408.57 samples/sec   Loss 7.5154   LearningRate 0.0549   Epoch: 5   Global Step: 26190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:55:53,249-Speed 3421.76 samples/sec   Loss 7.6260   LearningRate 0.0549   Epoch: 5   Global Step: 26200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:55:56,241-Speed 3423.62 samples/sec   Loss 7.5149   LearningRate 0.0549   Epoch: 5   Global Step: 26210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:55:59,247-Speed 3407.34 samples/sec   Loss 7.5823   LearningRate 0.0549   Epoch: 5   Global Step: 26220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 01:56:02,242-Speed 3419.14 samples/sec   Loss 7.6630   LearningRate 0.0549   Epoch: 5   Global Step: 26230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:05,236-Speed 3422.09 samples/sec   Loss 7.5494   LearningRate 0.0549   Epoch: 5   Global Step: 26240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:08,277-Speed 3367.38 samples/sec   Loss 7.5700   LearningRate 0.0548   Epoch: 5   Global Step: 26250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:11,270-Speed 3422.99 samples/sec   Loss 7.7256   LearningRate 0.0548   Epoch: 5   Global Step: 26260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:14,277-Speed 3405.00 samples/sec   Loss 7.6175   LearningRate 0.0548   Epoch: 5   Global Step: 26270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:17,274-Speed 3417.66 samples/sec   Loss 7.6450   LearningRate 0.0548   Epoch: 5   Global Step: 26280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:20,277-Speed 3411.18 samples/sec   Loss 7.5899   LearningRate 0.0548   Epoch: 5   Global Step: 26290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:23,282-Speed 3408.86 samples/sec   Loss 7.6892   LearningRate 0.0548   Epoch: 5   Global Step: 26300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:26,278-Speed 3418.80 samples/sec   Loss 7.5807   LearningRate 0.0547   Epoch: 5   Global Step: 26310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:29,271-Speed 3422.99 samples/sec   Loss 7.5893   LearningRate 0.0547   Epoch: 5   Global Step: 26320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:32,262-Speed 3423.75 samples/sec   Loss 7.5949   LearningRate 0.0547   Epoch: 5   Global Step: 26330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:56:35,286-Speed 3386.86 samples/sec   Loss 7.6107   LearningRate 0.0547   Epoch: 5   Global Step: 26340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:56:38,329-Speed 3365.54 samples/sec   Loss 7.6067   LearningRate 0.0547   Epoch: 5   Global Step: 26350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:41,332-Speed 3411.87 samples/sec   Loss 7.8271   LearningRate 0.0547   Epoch: 5   Global Step: 26360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:44,321-Speed 3426.63 samples/sec   Loss 7.5850   LearningRate 0.0547   Epoch: 5   Global Step: 26370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:47,314-Speed 3421.36 samples/sec   Loss 7.6538   LearningRate 0.0546   Epoch: 5   Global Step: 26380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:50,303-Speed 3427.13 samples/sec   Loss 7.5386   LearningRate 0.0546   Epoch: 5   Global Step: 26390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:53,297-Speed 3420.89 samples/sec   Loss 7.4753   LearningRate 0.0546   Epoch: 5   Global Step: 26400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:56,289-Speed 3424.23 samples/sec   Loss 7.6863   LearningRate 0.0546   Epoch: 5   Global Step: 26410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:56:59,322-Speed 3375.95 samples/sec   Loss 7.5095   LearningRate 0.0546   Epoch: 5   Global Step: 26420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:57:02,438-Speed 3287.84 samples/sec   Loss 7.5814   LearningRate 0.0546   Epoch: 5   Global Step: 26430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:57:05,429-Speed 3423.93 samples/sec   Loss 7.5560   LearningRate 0.0546   Epoch: 5   Global Step: 26440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:57:08,420-Speed 3425.01 samples/sec   Loss 7.6255   LearningRate 0.0545   Epoch: 5   Global Step: 26450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:57:11,411-Speed 3424.07 samples/sec   Loss 7.6326   LearningRate 0.0545   Epoch: 5   Global Step: 26460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:57:14,403-Speed 3422.88 samples/sec   Loss 7.5170   LearningRate 0.0545   Epoch: 5   Global Step: 26470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:57:17,416-Speed 3400.24 samples/sec   Loss 7.6299   LearningRate 0.0545   Epoch: 5   Global Step: 26480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:57:20,412-Speed 3419.14 samples/sec   Loss 7.6883   LearningRate 0.0545   Epoch: 5   Global Step: 26490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:57:23,402-Speed 3424.87 samples/sec   Loss 7.7712   LearningRate 0.0545   Epoch: 5   Global Step: 26500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:57:26,396-Speed 3421.52 samples/sec   Loss 7.6825   LearningRate 0.0545   Epoch: 5   Global Step: 26510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:57:29,387-Speed 3424.10 samples/sec   Loss 7.6384   LearningRate 0.0544   Epoch: 5   Global Step: 26520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:57:32,379-Speed 3423.96 samples/sec   Loss 7.7346   LearningRate 0.0544   Epoch: 5   Global Step: 26530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:57:35,368-Speed 3427.00 samples/sec   Loss 7.5946   LearningRate 0.0544   Epoch: 5   Global Step: 26540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:57:38,341-Speed 3444.17 samples/sec   Loss 7.6649   LearningRate 0.0544   Epoch: 5   Global Step: 26550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:57:41,332-Speed 3425.26 samples/sec   Loss 7.4531   LearningRate 0.0544   Epoch: 5   Global Step: 26560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:57:44,328-Speed 3418.96 samples/sec   Loss 7.7380   LearningRate 0.0544   Epoch: 5   Global Step: 26570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:57:47,319-Speed 3424.30 samples/sec   Loss 7.6452   LearningRate 0.0544   Epoch: 5   Global Step: 26580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:57:50,301-Speed 3434.56 samples/sec   Loss 7.5763   LearningRate 0.0543   Epoch: 5   Global Step: 26590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:57:53,296-Speed 3420.11 samples/sec   Loss 7.7034   LearningRate 0.0543   Epoch: 5   Global Step: 26600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:57:56,290-Speed 3421.27 samples/sec   Loss 7.7500   LearningRate 0.0543   Epoch: 5   Global Step: 26610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:57:59,285-Speed 3419.66 samples/sec   Loss 7.7469   LearningRate 0.0543   Epoch: 5   Global Step: 26620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:02,280-Speed 3419.72 samples/sec   Loss 7.5980   LearningRate 0.0543   Epoch: 5   Global Step: 26630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:05,276-Speed 3418.69 samples/sec   Loss 7.7072   LearningRate 0.0543   Epoch: 5   Global Step: 26640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:08,267-Speed 3424.02 samples/sec   Loss 7.7464   LearningRate 0.0543   Epoch: 5   Global Step: 26650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:11,263-Speed 3419.61 samples/sec   Loss 7.6570   LearningRate 0.0542   Epoch: 5   Global Step: 26660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:14,255-Speed 3423.72 samples/sec   Loss 7.7021   LearningRate 0.0542   Epoch: 5   Global Step: 26670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:17,246-Speed 3424.43 samples/sec   Loss 7.7862   LearningRate 0.0542   Epoch: 5   Global Step: 26680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:20,236-Speed 3425.51 samples/sec   Loss 7.7220   LearningRate 0.0542   Epoch: 5   Global Step: 26690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:58:23,234-Speed 3416.72 samples/sec   Loss 7.5915   LearningRate 0.0542   Epoch: 5   Global Step: 26700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:58:26,225-Speed 3423.58 samples/sec   Loss 7.4586   LearningRate 0.0542   Epoch: 5   Global Step: 26710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:58:29,216-Speed 3424.36 samples/sec   Loss 7.5288   LearningRate 0.0541   Epoch: 5   Global Step: 26720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:58:32,197-Speed 3435.85 samples/sec   Loss 7.4810   LearningRate 0.0541   Epoch: 5   Global Step: 26730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:35,218-Speed 3391.21 samples/sec   Loss 7.5506   LearningRate 0.0541   Epoch: 5   Global Step: 26740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:38,213-Speed 3419.36 samples/sec   Loss 7.5843   LearningRate 0.0541   Epoch: 5   Global Step: 26750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:41,207-Speed 3421.91 samples/sec   Loss 7.5036   LearningRate 0.0541   Epoch: 5   Global Step: 26760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:44,205-Speed 3416.22 samples/sec   Loss 7.7655   LearningRate 0.0541   Epoch: 5   Global Step: 26770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:47,198-Speed 3422.02 samples/sec   Loss 7.5448   LearningRate 0.0541   Epoch: 5   Global Step: 26780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:50,197-Speed 3415.90 samples/sec   Loss 7.8315   LearningRate 0.0540   Epoch: 5   Global Step: 26790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:53,195-Speed 3415.41 samples/sec   Loss 7.5259   LearningRate 0.0540   Epoch: 5   Global Step: 26800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:56,204-Speed 3405.10 samples/sec   Loss 7.6723   LearningRate 0.0540   Epoch: 5   Global Step: 26810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:58:59,200-Speed 3417.50 samples/sec   Loss 7.6938   LearningRate 0.0540   Epoch: 5   Global Step: 26820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:59:02,208-Speed 3405.69 samples/sec   Loss 7.5911   LearningRate 0.0540   Epoch: 5   Global Step: 26830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:59:05,210-Speed 3411.44 samples/sec   Loss 7.5920   LearningRate 0.0540   Epoch: 5   Global Step: 26840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:59:08,218-Speed 3405.70 samples/sec   Loss 7.6419   LearningRate 0.0540   Epoch: 5   Global Step: 26850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:59:11,216-Speed 3416.89 samples/sec   Loss 7.5977   LearningRate 0.0539   Epoch: 5   Global Step: 26860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:59:14,207-Speed 3423.79 samples/sec   Loss 7.7385   LearningRate 0.0539   Epoch: 5   Global Step: 26870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:59:17,203-Speed 3418.97 samples/sec   Loss 7.7293   LearningRate 0.0539   Epoch: 5   Global Step: 26880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:59:20,201-Speed 3417.04 samples/sec   Loss 7.6516   LearningRate 0.0539   Epoch: 5   Global Step: 26890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:59:23,197-Speed 3418.74 samples/sec   Loss 7.6314   LearningRate 0.0539   Epoch: 5   Global Step: 26900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:59:26,201-Speed 3409.47 samples/sec   Loss 7.7037   LearningRate 0.0539   Epoch: 5   Global Step: 26910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:59:29,208-Speed 3406.47 samples/sec   Loss 7.6462   LearningRate 0.0539   Epoch: 5   Global Step: 26920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:59:32,194-Speed 3429.46 samples/sec   Loss 7.5068   LearningRate 0.0538   Epoch: 5   Global Step: 26930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:59:35,187-Speed 3422.72 samples/sec   Loss 7.6279   LearningRate 0.0538   Epoch: 5   Global Step: 26940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:59:38,189-Speed 3410.85 samples/sec   Loss 7.6775   LearningRate 0.0538   Epoch: 5   Global Step: 26950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 01:59:41,181-Speed 3424.13 samples/sec   Loss 7.5213   LearningRate 0.0538   Epoch: 5   Global Step: 26960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:59:44,177-Speed 3418.96 samples/sec   Loss 7.4821   LearningRate 0.0538   Epoch: 5   Global Step: 26970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:59:47,186-Speed 3403.53 samples/sec   Loss 7.5119   LearningRate 0.0538   Epoch: 5   Global Step: 26980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:59:50,180-Speed 3421.83 samples/sec   Loss 7.5132   LearningRate 0.0538   Epoch: 5   Global Step: 26990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:59:53,182-Speed 3411.59 samples/sec   Loss 7.6166   LearningRate 0.0537   Epoch: 5   Global Step: 27000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:59:56,178-Speed 3418.82 samples/sec   Loss 7.5429   LearningRate 0.0537   Epoch: 5   Global Step: 27010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 01:59:59,176-Speed 3416.48 samples/sec   Loss 7.5429   LearningRate 0.0537   Epoch: 5   Global Step: 27020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:00:02,173-Speed 3416.93 samples/sec   Loss 7.4819   LearningRate 0.0537   Epoch: 5   Global Step: 27030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:00:05,223-Speed 3358.80 samples/sec   Loss 7.6019   LearningRate 0.0537   Epoch: 5   Global Step: 27040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:00:08,220-Speed 3417.61 samples/sec   Loss 7.6951   LearningRate 0.0537   Epoch: 5   Global Step: 27050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:00:11,215-Speed 3419.65 samples/sec   Loss 7.6940   LearningRate 0.0537   Epoch: 5   Global Step: 27060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:00:14,218-Speed 3410.76 samples/sec   Loss 7.5173   LearningRate 0.0536   Epoch: 5   Global Step: 27070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:00:17,219-Speed 3413.05 samples/sec   Loss 7.6966   LearningRate 0.0536   Epoch: 5   Global Step: 27080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:00:20,218-Speed 3416.14 samples/sec   Loss 7.5447   LearningRate 0.0536   Epoch: 5   Global Step: 27090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:00:23,214-Speed 3417.90 samples/sec   Loss 7.5284   LearningRate 0.0536   Epoch: 5   Global Step: 27100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:00:26,196-Speed 3434.81 samples/sec   Loss 7.5718   LearningRate 0.0536   Epoch: 5   Global Step: 27110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:00:29,256-Speed 3347.47 samples/sec   Loss 7.8165   LearningRate 0.0536   Epoch: 5   Global Step: 27120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:00:32,264-Speed 3405.28 samples/sec   Loss 7.7398   LearningRate 0.0536   Epoch: 5   Global Step: 27130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:00:35,259-Speed 3420.34 samples/sec   Loss 7.7466   LearningRate 0.0535   Epoch: 5   Global Step: 27140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:00:38,258-Speed 3414.59 samples/sec   Loss 7.5838   LearningRate 0.0535   Epoch: 5   Global Step: 27150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:00:41,255-Speed 3418.32 samples/sec   Loss 7.8178   LearningRate 0.0535   Epoch: 5   Global Step: 27160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:00:44,255-Speed 3414.08 samples/sec   Loss 7.6197   LearningRate 0.0535   Epoch: 5   Global Step: 27170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:00:47,257-Speed 3411.25 samples/sec   Loss 7.7134   LearningRate 0.0535   Epoch: 5   Global Step: 27180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:00:50,266-Speed 3404.63 samples/sec   Loss 7.4917   LearningRate 0.0535   Epoch: 5   Global Step: 27190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:00:53,268-Speed 3411.08 samples/sec   Loss 7.5598   LearningRate 0.0535   Epoch: 5   Global Step: 27200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:00:56,272-Speed 3410.77 samples/sec   Loss 7.4395   LearningRate 0.0534   Epoch: 5   Global Step: 27210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:00:59,268-Speed 3418.01 samples/sec   Loss 7.6845   LearningRate 0.0534   Epoch: 5   Global Step: 27220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:02,278-Speed 3402.74 samples/sec   Loss 7.5102   LearningRate 0.0534   Epoch: 5   Global Step: 27230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:05,276-Speed 3417.25 samples/sec   Loss 7.6886   LearningRate 0.0534   Epoch: 5   Global Step: 27240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:08,282-Speed 3406.99 samples/sec   Loss 7.6868   LearningRate 0.0534   Epoch: 5   Global Step: 27250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:11,280-Speed 3417.37 samples/sec   Loss 7.8530   LearningRate 0.0534   Epoch: 5   Global Step: 27260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:14,286-Speed 3406.91 samples/sec   Loss 7.4379   LearningRate 0.0534   Epoch: 5   Global Step: 27270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:17,287-Speed 3412.29 samples/sec   Loss 7.5006   LearningRate 0.0533   Epoch: 5   Global Step: 27280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:20,286-Speed 3415.52 samples/sec   Loss 7.5749   LearningRate 0.0533   Epoch: 5   Global Step: 27290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:23,289-Speed 3410.26 samples/sec   Loss 7.8077   LearningRate 0.0533   Epoch: 5   Global Step: 27300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:26,271-Speed 3435.36 samples/sec   Loss 7.6745   LearningRate 0.0533   Epoch: 5   Global Step: 27310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:29,270-Speed 3415.16 samples/sec   Loss 7.6732   LearningRate 0.0533   Epoch: 5   Global Step: 27320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:32,267-Speed 3417.37 samples/sec   Loss 7.5390   LearningRate 0.0533   Epoch: 5   Global Step: 27330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:35,281-Speed 3399.43 samples/sec   Loss 7.5220   LearningRate 0.0533   Epoch: 5   Global Step: 27340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:38,281-Speed 3414.07 samples/sec   Loss 7.6005   LearningRate 0.0532   Epoch: 5   Global Step: 27350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:41,277-Speed 3418.85 samples/sec   Loss 7.6781   LearningRate 0.0532   Epoch: 5   Global Step: 27360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:01:44,262-Speed 3431.22 samples/sec   Loss 7.6209   LearningRate 0.0532   Epoch: 5   Global Step: 27370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:01:47,257-Speed 3419.57 samples/sec   Loss 7.5389   LearningRate 0.0532   Epoch: 5   Global Step: 27380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:01:50,256-Speed 3416.00 samples/sec   Loss 7.4676   LearningRate 0.0532   Epoch: 5   Global Step: 27390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:01:53,259-Speed 3410.38 samples/sec   Loss 7.6812   LearningRate 0.0532   Epoch: 5   Global Step: 27400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 02:01:56,262-Speed 3410.43 samples/sec   Loss 7.5013   LearningRate 0.0532   Epoch: 5   Global Step: 27410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:01:59,265-Speed 3410.59 samples/sec   Loss 7.5147   LearningRate 0.0531   Epoch: 5   Global Step: 27420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:02,264-Speed 3415.36 samples/sec   Loss 7.5921   LearningRate 0.0531   Epoch: 5   Global Step: 27430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:05,269-Speed 3409.88 samples/sec   Loss 7.5974   LearningRate 0.0531   Epoch: 5   Global Step: 27440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:08,268-Speed 3415.24 samples/sec   Loss 7.4281   LearningRate 0.0531   Epoch: 5   Global Step: 27450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:11,269-Speed 3413.19 samples/sec   Loss 7.5297   LearningRate 0.0531   Epoch: 5   Global Step: 27460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:14,270-Speed 3412.38 samples/sec   Loss 7.6231   LearningRate 0.0531   Epoch: 5   Global Step: 27470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:02:17,276-Speed 3407.55 samples/sec   Loss 7.5609   LearningRate 0.0530   Epoch: 5   Global Step: 27480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:02:20,314-Speed 3371.63 samples/sec   Loss 7.6311   LearningRate 0.0530   Epoch: 5   Global Step: 27490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:02:23,302-Speed 3427.57 samples/sec   Loss 7.3885   LearningRate 0.0530   Epoch: 5   Global Step: 27500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:26,308-Speed 3408.60 samples/sec   Loss 7.6001   LearningRate 0.0530   Epoch: 5   Global Step: 27510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:29,312-Speed 3409.81 samples/sec   Loss 7.6572   LearningRate 0.0530   Epoch: 5   Global Step: 27520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:32,316-Speed 3408.68 samples/sec   Loss 7.5806   LearningRate 0.0530   Epoch: 5   Global Step: 27530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:35,327-Speed 3402.86 samples/sec   Loss 7.4960   LearningRate 0.0530   Epoch: 5   Global Step: 27540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:38,331-Speed 3408.88 samples/sec   Loss 7.4848   LearningRate 0.0529   Epoch: 5   Global Step: 27550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:41,343-Speed 3400.92 samples/sec   Loss 7.5800   LearningRate 0.0529   Epoch: 5   Global Step: 27560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:44,343-Speed 3414.07 samples/sec   Loss 7.5389   LearningRate 0.0529   Epoch: 5   Global Step: 27570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:47,343-Speed 3413.95 samples/sec   Loss 7.3330   LearningRate 0.0529   Epoch: 5   Global Step: 27580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:50,343-Speed 3414.65 samples/sec   Loss 7.5281   LearningRate 0.0529   Epoch: 5   Global Step: 27590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:02:53,368-Speed 3384.99 samples/sec   Loss 7.6358   LearningRate 0.0529   Epoch: 5   Global Step: 27600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:02:56,378-Speed 3403.15 samples/sec   Loss 7.5665   LearningRate 0.0529   Epoch: 5   Global Step: 27610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:02:59,418-Speed 3369.40 samples/sec   Loss 7.5649   LearningRate 0.0528   Epoch: 5   Global Step: 27620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:03:02,482-Speed 3343.06 samples/sec   Loss 7.5445   LearningRate 0.0528   Epoch: 5   Global Step: 27630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:03:05,495-Speed 3400.05 samples/sec   Loss 7.7302   LearningRate 0.0528   Epoch: 5   Global Step: 27640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:03:08,498-Speed 3410.86 samples/sec   Loss 7.4120   LearningRate 0.0528   Epoch: 5   Global Step: 27650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:03:11,506-Speed 3405.36 samples/sec   Loss 7.5382   LearningRate 0.0528   Epoch: 5   Global Step: 27660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:03:14,507-Speed 3412.85 samples/sec   Loss 7.4046   LearningRate 0.0528   Epoch: 5   Global Step: 27670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:03:17,519-Speed 3399.36 samples/sec   Loss 7.5903   LearningRate 0.0528   Epoch: 5   Global Step: 27680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:03:20,502-Speed 3434.57 samples/sec   Loss 7.5748   LearningRate 0.0527   Epoch: 5   Global Step: 27690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:03:23,487-Speed 3431.19 samples/sec   Loss 7.7678   LearningRate 0.0527   Epoch: 5   Global Step: 27700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:03:26,492-Speed 3408.69 samples/sec   Loss 7.5558   LearningRate 0.0527   Epoch: 5   Global Step: 27710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:03:29,492-Speed 3414.34 samples/sec   Loss 7.5726   LearningRate 0.0527   Epoch: 5   Global Step: 27720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:03:32,495-Speed 3410.30 samples/sec   Loss 7.6581   LearningRate 0.0527   Epoch: 5   Global Step: 27730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:03:35,496-Speed 3413.63 samples/sec   Loss 7.6419   LearningRate 0.0527   Epoch: 5   Global Step: 27740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:03:38,500-Speed 3409.54 samples/sec   Loss 7.5809   LearningRate 0.0527   Epoch: 5   Global Step: 27750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:03:41,508-Speed 3404.85 samples/sec   Loss 7.5171   LearningRate 0.0526   Epoch: 5   Global Step: 27760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:03:44,505-Speed 3418.08 samples/sec   Loss 7.6495   LearningRate 0.0526   Epoch: 5   Global Step: 27770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:03:47,509-Speed 3409.53 samples/sec   Loss 7.5962   LearningRate 0.0526   Epoch: 5   Global Step: 27780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:03:50,508-Speed 3414.66 samples/sec   Loss 7.4522   LearningRate 0.0526   Epoch: 5   Global Step: 27790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:03:53,511-Speed 3411.90 samples/sec   Loss 7.4960   LearningRate 0.0526   Epoch: 5   Global Step: 27800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:03:56,509-Speed 3416.37 samples/sec   Loss 7.4926   LearningRate 0.0526   Epoch: 5   Global Step: 27810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:03:59,508-Speed 3415.57 samples/sec   Loss 7.6032   LearningRate 0.0526   Epoch: 5   Global Step: 27820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:04:02,516-Speed 3404.38 samples/sec   Loss 7.8116   LearningRate 0.0525   Epoch: 5   Global Step: 27830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:04:05,519-Speed 3411.35 samples/sec   Loss 7.6072   LearningRate 0.0525   Epoch: 5   Global Step: 27840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:04:08,524-Speed 3407.69 samples/sec   Loss 7.6506   LearningRate 0.0525   Epoch: 5   Global Step: 27850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:04:11,537-Speed 3399.70 samples/sec   Loss 7.6094   LearningRate 0.0525   Epoch: 5   Global Step: 27860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:04:14,548-Speed 3401.65 samples/sec   Loss 7.6403   LearningRate 0.0525   Epoch: 5   Global Step: 27870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:04:17,547-Speed 3415.11 samples/sec   Loss 7.5437   LearningRate 0.0525   Epoch: 5   Global Step: 27880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:04:20,547-Speed 3415.35 samples/sec   Loss 7.5309   LearningRate 0.0525   Epoch: 5   Global Step: 27890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:04:23,543-Speed 3417.73 samples/sec   Loss 7.6794   LearningRate 0.0524   Epoch: 5   Global Step: 27900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:04:26,544-Speed 3413.70 samples/sec   Loss 7.5285   LearningRate 0.0524   Epoch: 5   Global Step: 27910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:04:29,544-Speed 3413.45 samples/sec   Loss 7.5608   LearningRate 0.0524   Epoch: 5   Global Step: 27920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:04:32,545-Speed 3413.06 samples/sec   Loss 7.6424   LearningRate 0.0524   Epoch: 5   Global Step: 27930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:04:35,548-Speed 3411.64 samples/sec   Loss 7.5332   LearningRate 0.0524   Epoch: 5   Global Step: 27940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:04:38,550-Speed 3411.62 samples/sec   Loss 7.4442   LearningRate 0.0524   Epoch: 5   Global Step: 27950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:04:41,548-Speed 3416.10 samples/sec   Loss 7.8281   LearningRate 0.0524   Epoch: 5   Global Step: 27960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:04:44,562-Speed 3398.57 samples/sec   Loss 7.6979   LearningRate 0.0523   Epoch: 5   Global Step: 27970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:04:47,563-Speed 3413.27 samples/sec   Loss 7.5591   LearningRate 0.0523   Epoch: 5   Global Step: 27980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:04:50,561-Speed 3416.32 samples/sec   Loss 7.6775   LearningRate 0.0523   Epoch: 5   Global Step: 27990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:04:53,544-Speed 3433.31 samples/sec   Loss 7.5931   LearningRate 0.0523   Epoch: 5   Global Step: 28000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:05:37,910-[lfw][28000]XNorm: 21.480269
Training: 2022-04-11 02:05:37,910-[lfw][28000]Accuracy-Flip: 0.99717+-0.00211
Training: 2022-04-11 02:05:37,911-[lfw][28000]Accuracy-Highest: 0.99717
Training: 2022-04-11 02:06:29,598-[cfp_fp][28000]XNorm: 18.856448
Training: 2022-04-11 02:06:29,599-[cfp_fp][28000]Accuracy-Flip: 0.95957+-0.01236
Training: 2022-04-11 02:06:29,599-[cfp_fp][28000]Accuracy-Highest: 0.95957
Training: 2022-04-11 02:07:13,656-[agedb_30][28000]XNorm: 21.347683
Training: 2022-04-11 02:07:13,656-[agedb_30][28000]Accuracy-Flip: 0.97567+-0.00655
Training: 2022-04-11 02:07:13,657-[agedb_30][28000]Accuracy-Highest: 0.97567
Training: 2022-04-11 02:07:16,657-Speed 71.55 samples/sec   Loss 7.4783   LearningRate 0.0523   Epoch: 5   Global Step: 28010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:07:19,648-Speed 3424.19 samples/sec   Loss 7.6376   LearningRate 0.0523   Epoch: 5   Global Step: 28020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:07:22,628-Speed 3436.60 samples/sec   Loss 7.5862   LearningRate 0.0523   Epoch: 5   Global Step: 28030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:07:25,622-Speed 3421.57 samples/sec   Loss 7.6655   LearningRate 0.0522   Epoch: 5   Global Step: 28040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:07:28,608-Speed 3430.18 samples/sec   Loss 7.4957   LearningRate 0.0522   Epoch: 5   Global Step: 28050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:07:31,594-Speed 3430.90 samples/sec   Loss 7.4441   LearningRate 0.0522   Epoch: 5   Global Step: 28060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:07:34,594-Speed 3413.55 samples/sec   Loss 7.4630   LearningRate 0.0522   Epoch: 5   Global Step: 28070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:07:37,579-Speed 3431.47 samples/sec   Loss 7.4904   LearningRate 0.0522   Epoch: 5   Global Step: 28080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 02:07:40,569-Speed 3425.62 samples/sec   Loss 7.6452   LearningRate 0.0522   Epoch: 5   Global Step: 28090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:07:43,547-Speed 3439.49 samples/sec   Loss 7.4881   LearningRate 0.0522   Epoch: 5   Global Step: 28100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:07:46,550-Speed 3410.68 samples/sec   Loss 7.4578   LearningRate 0.0521   Epoch: 5   Global Step: 28110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:07:49,587-Speed 3372.52 samples/sec   Loss 7.5565   LearningRate 0.0521   Epoch: 5   Global Step: 28120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:07:52,583-Speed 3418.80 samples/sec   Loss 7.5536   LearningRate 0.0521   Epoch: 5   Global Step: 28130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:07:55,574-Speed 3424.16 samples/sec   Loss 7.5144   LearningRate 0.0521   Epoch: 5   Global Step: 28140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:07:58,567-Speed 3422.87 samples/sec   Loss 7.5238   LearningRate 0.0521   Epoch: 5   Global Step: 28150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:08:01,559-Speed 3423.16 samples/sec   Loss 7.4874   LearningRate 0.0521   Epoch: 5   Global Step: 28160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:08:04,582-Speed 3388.77 samples/sec   Loss 7.4048   LearningRate 0.0521   Epoch: 5   Global Step: 28170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:08:07,581-Speed 3414.13 samples/sec   Loss 7.5066   LearningRate 0.0520   Epoch: 5   Global Step: 28180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:08:10,537-Speed 3465.63 samples/sec   Loss 7.5309   LearningRate 0.0520   Epoch: 5   Global Step: 28190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:08:13,537-Speed 3413.75 samples/sec   Loss 7.6161   LearningRate 0.0520   Epoch: 5   Global Step: 28200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:08:16,576-Speed 3371.24 samples/sec   Loss 7.5343   LearningRate 0.0520   Epoch: 5   Global Step: 28210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:08:19,570-Speed 3421.07 samples/sec   Loss 7.5227   LearningRate 0.0520   Epoch: 5   Global Step: 28220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:08:22,561-Speed 3423.41 samples/sec   Loss 7.5855   LearningRate 0.0520   Epoch: 5   Global Step: 28230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:08:25,555-Speed 3422.41 samples/sec   Loss 7.5310   LearningRate 0.0520   Epoch: 5   Global Step: 28240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:08:28,551-Speed 3417.91 samples/sec   Loss 7.5416   LearningRate 0.0519   Epoch: 5   Global Step: 28250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:08:31,543-Speed 3423.94 samples/sec   Loss 7.7062   LearningRate 0.0519   Epoch: 5   Global Step: 28260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:08:34,539-Speed 3418.05 samples/sec   Loss 7.5359   LearningRate 0.0519   Epoch: 5   Global Step: 28270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:08:37,615-Speed 3330.07 samples/sec   Loss 7.4515   LearningRate 0.0519   Epoch: 5   Global Step: 28280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:08:40,611-Speed 3419.16 samples/sec   Loss 7.4714   LearningRate 0.0519   Epoch: 5   Global Step: 28290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:08:43,604-Speed 3422.31 samples/sec   Loss 7.4031   LearningRate 0.0519   Epoch: 5   Global Step: 28300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:08:46,603-Speed 3415.78 samples/sec   Loss 7.5017   LearningRate 0.0519   Epoch: 5   Global Step: 28310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:08:49,604-Speed 3411.93 samples/sec   Loss 7.3855   LearningRate 0.0518   Epoch: 5   Global Step: 28320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:08:52,626-Speed 3389.36 samples/sec   Loss 7.6031   LearningRate 0.0518   Epoch: 5   Global Step: 28330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:08:55,627-Speed 3413.52 samples/sec   Loss 7.6553   LearningRate 0.0518   Epoch: 5   Global Step: 28340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:08:58,623-Speed 3418.80 samples/sec   Loss 7.5509   LearningRate 0.0518   Epoch: 5   Global Step: 28350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:09:01,621-Speed 3416.74 samples/sec   Loss 7.3643   LearningRate 0.0518   Epoch: 5   Global Step: 28360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:09:04,614-Speed 3421.75 samples/sec   Loss 7.6499   LearningRate 0.0518   Epoch: 5   Global Step: 28370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:09:07,610-Speed 3418.55 samples/sec   Loss 7.4525   LearningRate 0.0518   Epoch: 5   Global Step: 28380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:09:10,607-Speed 3417.99 samples/sec   Loss 7.7112   LearningRate 0.0517   Epoch: 5   Global Step: 28390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:13,609-Speed 3411.63 samples/sec   Loss 7.4382   LearningRate 0.0517   Epoch: 5   Global Step: 28400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:16,612-Speed 3411.22 samples/sec   Loss 7.6049   LearningRate 0.0517   Epoch: 5   Global Step: 28410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:19,616-Speed 3409.87 samples/sec   Loss 7.3621   LearningRate 0.0517   Epoch: 5   Global Step: 28420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:22,617-Speed 3413.77 samples/sec   Loss 7.4725   LearningRate 0.0517   Epoch: 5   Global Step: 28430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:25,621-Speed 3410.00 samples/sec   Loss 7.4326   LearningRate 0.0517   Epoch: 5   Global Step: 28440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:28,625-Speed 3408.82 samples/sec   Loss 7.4709   LearningRate 0.0517   Epoch: 5   Global Step: 28450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:31,626-Speed 3413.78 samples/sec   Loss 7.6272   LearningRate 0.0516   Epoch: 5   Global Step: 28460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:34,623-Speed 3417.59 samples/sec   Loss 7.6563   LearningRate 0.0516   Epoch: 5   Global Step: 28470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:37,621-Speed 3415.73 samples/sec   Loss 7.3805   LearningRate 0.0516   Epoch: 5   Global Step: 28480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:40,612-Speed 3425.16 samples/sec   Loss 7.6108   LearningRate 0.0516   Epoch: 5   Global Step: 28490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:43,611-Speed 3415.32 samples/sec   Loss 7.4282   LearningRate 0.0516   Epoch: 5   Global Step: 28500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:46,610-Speed 3414.96 samples/sec   Loss 7.4202   LearningRate 0.0516   Epoch: 5   Global Step: 28510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:49,608-Speed 3417.00 samples/sec   Loss 7.6234   LearningRate 0.0516   Epoch: 5   Global Step: 28520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:52,605-Speed 3417.09 samples/sec   Loss 7.4859   LearningRate 0.0515   Epoch: 5   Global Step: 28530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:55,609-Speed 3410.33 samples/sec   Loss 7.4984   LearningRate 0.0515   Epoch: 5   Global Step: 28540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:09:58,604-Speed 3419.67 samples/sec   Loss 7.3700   LearningRate 0.0515   Epoch: 5   Global Step: 28550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:10:01,600-Speed 3418.31 samples/sec   Loss 7.4708   LearningRate 0.0515   Epoch: 5   Global Step: 28560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:04,618-Speed 3394.37 samples/sec   Loss 7.5726   LearningRate 0.0515   Epoch: 5   Global Step: 28570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:07,616-Speed 3415.72 samples/sec   Loss 7.4566   LearningRate 0.0515   Epoch: 5   Global Step: 28580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:10,615-Speed 3416.01 samples/sec   Loss 7.4888   LearningRate 0.0515   Epoch: 5   Global Step: 28590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:13,612-Speed 3417.81 samples/sec   Loss 7.5763   LearningRate 0.0514   Epoch: 5   Global Step: 28600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:16,611-Speed 3415.41 samples/sec   Loss 7.4584   LearningRate 0.0514   Epoch: 5   Global Step: 28610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:19,613-Speed 3412.02 samples/sec   Loss 7.5881   LearningRate 0.0514   Epoch: 5   Global Step: 28620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:22,609-Speed 3418.66 samples/sec   Loss 7.4796   LearningRate 0.0514   Epoch: 5   Global Step: 28630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:25,609-Speed 3413.65 samples/sec   Loss 7.4459   LearningRate 0.0514   Epoch: 5   Global Step: 28640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:28,618-Speed 3403.65 samples/sec   Loss 7.3396   LearningRate 0.0514   Epoch: 5   Global Step: 28650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:31,600-Speed 3434.99 samples/sec   Loss 7.5305   LearningRate 0.0514   Epoch: 5   Global Step: 28660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:34,602-Speed 3412.26 samples/sec   Loss 7.6061   LearningRate 0.0513   Epoch: 5   Global Step: 28670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:37,628-Speed 3384.80 samples/sec   Loss 7.6412   LearningRate 0.0513   Epoch: 5   Global Step: 28680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:40,634-Speed 3408.79 samples/sec   Loss 7.4627   LearningRate 0.0513   Epoch: 5   Global Step: 28690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:43,632-Speed 3416.57 samples/sec   Loss 7.3533   LearningRate 0.0513   Epoch: 5   Global Step: 28700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:46,630-Speed 3417.14 samples/sec   Loss 7.4333   LearningRate 0.0513   Epoch: 5   Global Step: 28710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:49,626-Speed 3418.02 samples/sec   Loss 7.4029   LearningRate 0.0513   Epoch: 5   Global Step: 28720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:52,623-Speed 3417.30 samples/sec   Loss 7.4877   LearningRate 0.0513   Epoch: 5   Global Step: 28730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:55,622-Speed 3416.19 samples/sec   Loss 7.4587   LearningRate 0.0513   Epoch: 5   Global Step: 28740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:10:58,631-Speed 3403.32 samples/sec   Loss 7.4498   LearningRate 0.0512   Epoch: 5   Global Step: 28750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:11:01,639-Speed 3405.53 samples/sec   Loss 7.4705   LearningRate 0.0512   Epoch: 5   Global Step: 28760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:11:04,649-Speed 3402.92 samples/sec   Loss 7.3652   LearningRate 0.0512   Epoch: 5   Global Step: 28770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:11:07,648-Speed 3415.82 samples/sec   Loss 7.5006   LearningRate 0.0512   Epoch: 5   Global Step: 28780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:11:10,645-Speed 3417.51 samples/sec   Loss 7.3653   LearningRate 0.0512   Epoch: 5   Global Step: 28790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:11:13,660-Speed 3397.57 samples/sec   Loss 7.4809   LearningRate 0.0512   Epoch: 5   Global Step: 28800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:11:16,698-Speed 3370.86 samples/sec   Loss 7.4762   LearningRate 0.0512   Epoch: 5   Global Step: 28810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:11:19,697-Speed 3415.98 samples/sec   Loss 7.4793   LearningRate 0.0511   Epoch: 5   Global Step: 28820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:11:22,698-Speed 3411.97 samples/sec   Loss 7.4180   LearningRate 0.0511   Epoch: 5   Global Step: 28830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:11:25,755-Speed 3351.64 samples/sec   Loss 7.5853   LearningRate 0.0511   Epoch: 5   Global Step: 28840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:11:28,828-Speed 3333.23 samples/sec   Loss 7.3584   LearningRate 0.0511   Epoch: 5   Global Step: 28850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:11:31,809-Speed 3435.32 samples/sec   Loss 7.5675   LearningRate 0.0511   Epoch: 5   Global Step: 28860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:11:34,788-Speed 3438.51 samples/sec   Loss 7.4006   LearningRate 0.0511   Epoch: 5   Global Step: 28870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:11:37,795-Speed 3406.32 samples/sec   Loss 7.6090   LearningRate 0.0511   Epoch: 5   Global Step: 28880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:11:40,796-Speed 3413.76 samples/sec   Loss 7.4839   LearningRate 0.0510   Epoch: 5   Global Step: 28890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:11:43,799-Speed 3410.43 samples/sec   Loss 7.4718   LearningRate 0.0510   Epoch: 5   Global Step: 28900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:11:46,801-Speed 3412.12 samples/sec   Loss 7.5304   LearningRate 0.0510   Epoch: 5   Global Step: 28910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:11:49,816-Speed 3397.11 samples/sec   Loss 7.4334   LearningRate 0.0510   Epoch: 5   Global Step: 28920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:11:52,822-Speed 3406.85 samples/sec   Loss 7.5007   LearningRate 0.0510   Epoch: 5   Global Step: 28930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:11:55,826-Speed 3409.92 samples/sec   Loss 7.3768   LearningRate 0.0510   Epoch: 5   Global Step: 28940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:11:58,837-Speed 3401.79 samples/sec   Loss 7.5722   LearningRate 0.0510   Epoch: 5   Global Step: 28950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:12:01,843-Speed 3407.63 samples/sec   Loss 7.5209   LearningRate 0.0509   Epoch: 5   Global Step: 28960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:12:04,847-Speed 3409.24 samples/sec   Loss 7.6181   LearningRate 0.0509   Epoch: 5   Global Step: 28970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:12:07,839-Speed 3423.23 samples/sec   Loss 7.3641   LearningRate 0.0509   Epoch: 5   Global Step: 28980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:12:10,877-Speed 3372.43 samples/sec   Loss 7.4174   LearningRate 0.0509   Epoch: 5   Global Step: 28990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:12:13,877-Speed 3413.89 samples/sec   Loss 7.3918   LearningRate 0.0509   Epoch: 5   Global Step: 29000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:12:16,883-Speed 3407.06 samples/sec   Loss 7.4871   LearningRate 0.0509   Epoch: 5   Global Step: 29010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:12:19,886-Speed 3410.14 samples/sec   Loss 7.4654   LearningRate 0.0509   Epoch: 5   Global Step: 29020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:12:22,894-Speed 3405.02 samples/sec   Loss 7.3066   LearningRate 0.0508   Epoch: 5   Global Step: 29030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:12:25,898-Speed 3410.87 samples/sec   Loss 7.4233   LearningRate 0.0508   Epoch: 5   Global Step: 29040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:12:28,904-Speed 3407.09 samples/sec   Loss 7.4627   LearningRate 0.0508   Epoch: 5   Global Step: 29050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:12:31,911-Speed 3406.33 samples/sec   Loss 7.4591   LearningRate 0.0508   Epoch: 5   Global Step: 29060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:12:34,916-Speed 3408.90 samples/sec   Loss 7.5005   LearningRate 0.0508   Epoch: 5   Global Step: 29070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:12:37,918-Speed 3411.32 samples/sec   Loss 7.3919   LearningRate 0.0508   Epoch: 5   Global Step: 29080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:12:40,924-Speed 3407.36 samples/sec   Loss 7.5376   LearningRate 0.0508   Epoch: 5   Global Step: 29090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:12:43,929-Speed 3408.15 samples/sec   Loss 7.5011   LearningRate 0.0507   Epoch: 5   Global Step: 29100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:12:46,934-Speed 3408.90 samples/sec   Loss 7.4278   LearningRate 0.0507   Epoch: 5   Global Step: 29110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:12:49,948-Speed 3398.25 samples/sec   Loss 7.3353   LearningRate 0.0507   Epoch: 5   Global Step: 29120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:12:52,949-Speed 3413.15 samples/sec   Loss 7.4788   LearningRate 0.0507   Epoch: 5   Global Step: 29130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:12:55,959-Speed 3403.69 samples/sec   Loss 7.4083   LearningRate 0.0507   Epoch: 5   Global Step: 29140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:12:58,969-Speed 3403.01 samples/sec   Loss 7.3395   LearningRate 0.0507   Epoch: 5   Global Step: 29150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:01,973-Speed 3409.08 samples/sec   Loss 7.3947   LearningRate 0.0507   Epoch: 5   Global Step: 29160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:04,978-Speed 3408.04 samples/sec   Loss 7.4151   LearningRate 0.0506   Epoch: 5   Global Step: 29170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:07,967-Speed 3427.52 samples/sec   Loss 7.5552   LearningRate 0.0506   Epoch: 5   Global Step: 29180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:10,965-Speed 3415.77 samples/sec   Loss 7.3256   LearningRate 0.0506   Epoch: 5   Global Step: 29190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:13,980-Speed 3397.05 samples/sec   Loss 7.3088   LearningRate 0.0506   Epoch: 5   Global Step: 29200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:16,990-Speed 3403.69 samples/sec   Loss 7.4179   LearningRate 0.0506   Epoch: 5   Global Step: 29210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:20,018-Speed 3382.80 samples/sec   Loss 7.5850   LearningRate 0.0506   Epoch: 5   Global Step: 29220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:23,023-Speed 3408.43 samples/sec   Loss 7.3046   LearningRate 0.0506   Epoch: 5   Global Step: 29230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:26,051-Speed 3382.40 samples/sec   Loss 7.5540   LearningRate 0.0505   Epoch: 5   Global Step: 29240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:29,056-Speed 3408.89 samples/sec   Loss 7.2499   LearningRate 0.0505   Epoch: 5   Global Step: 29250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:32,058-Speed 3411.97 samples/sec   Loss 7.4100   LearningRate 0.0505   Epoch: 5   Global Step: 29260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:35,060-Speed 3411.82 samples/sec   Loss 7.4107   LearningRate 0.0505   Epoch: 5   Global Step: 29270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:38,054-Speed 3420.75 samples/sec   Loss 7.4590   LearningRate 0.0505   Epoch: 5   Global Step: 29280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:41,077-Speed 3388.49 samples/sec   Loss 7.2975   LearningRate 0.0505   Epoch: 5   Global Step: 29290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:13:44,059-Speed 3434.60 samples/sec   Loss 7.3621   LearningRate 0.0505   Epoch: 5   Global Step: 29300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:13:47,066-Speed 3406.58 samples/sec   Loss 7.5010   LearningRate 0.0504   Epoch: 5   Global Step: 29310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:13:50,083-Speed 3394.91 samples/sec   Loss 7.3637   LearningRate 0.0504   Epoch: 5   Global Step: 29320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:13:53,095-Speed 3400.56 samples/sec   Loss 7.3657   LearningRate 0.0504   Epoch: 5   Global Step: 29330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:13:56,108-Speed 3400.18 samples/sec   Loss 7.3835   LearningRate 0.0504   Epoch: 5   Global Step: 29340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:13:59,111-Speed 3409.58 samples/sec   Loss 7.3412   LearningRate 0.0504   Epoch: 5   Global Step: 29350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:02,127-Speed 3397.24 samples/sec   Loss 7.3125   LearningRate 0.0504   Epoch: 5   Global Step: 29360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:05,143-Speed 3395.10 samples/sec   Loss 7.2720   LearningRate 0.0504   Epoch: 5   Global Step: 29370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:08,148-Speed 3409.65 samples/sec   Loss 7.5507   LearningRate 0.0503   Epoch: 5   Global Step: 29380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:11,147-Speed 3415.61 samples/sec   Loss 7.4692   LearningRate 0.0503   Epoch: 5   Global Step: 29390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:14,151-Speed 3409.74 samples/sec   Loss 7.3134   LearningRate 0.0503   Epoch: 5   Global Step: 29400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:14:17,149-Speed 3415.62 samples/sec   Loss 7.4980   LearningRate 0.0503   Epoch: 5   Global Step: 29410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:14:20,134-Speed 3431.82 samples/sec   Loss 7.5331   LearningRate 0.0503   Epoch: 5   Global Step: 29420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:23,138-Speed 3409.12 samples/sec   Loss 7.5266   LearningRate 0.0503   Epoch: 5   Global Step: 29430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:26,155-Speed 3395.11 samples/sec   Loss 7.2949   LearningRate 0.0503   Epoch: 5   Global Step: 29440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:29,171-Speed 3396.08 samples/sec   Loss 7.3715   LearningRate 0.0503   Epoch: 5   Global Step: 29450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:32,172-Speed 3413.00 samples/sec   Loss 7.3345   LearningRate 0.0502   Epoch: 5   Global Step: 29460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:35,193-Speed 3391.32 samples/sec   Loss 7.1863   LearningRate 0.0502   Epoch: 5   Global Step: 29470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:38,202-Speed 3403.41 samples/sec   Loss 7.4917   LearningRate 0.0502   Epoch: 5   Global Step: 29480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:41,205-Speed 3411.52 samples/sec   Loss 7.3533   LearningRate 0.0502   Epoch: 5   Global Step: 29490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:44,211-Speed 3406.78 samples/sec   Loss 7.6056   LearningRate 0.0502   Epoch: 5   Global Step: 29500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:47,219-Speed 3405.67 samples/sec   Loss 7.3902   LearningRate 0.0502   Epoch: 5   Global Step: 29510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:14:50,225-Speed 3407.20 samples/sec   Loss 7.4126   LearningRate 0.0502   Epoch: 5   Global Step: 29520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:14:53,238-Speed 3398.84 samples/sec   Loss 7.5247   LearningRate 0.0501   Epoch: 5   Global Step: 29530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:14:56,276-Speed 3371.71 samples/sec   Loss 7.5561   LearningRate 0.0501   Epoch: 5   Global Step: 29540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:14:59,296-Speed 3392.37 samples/sec   Loss 7.4625   LearningRate 0.0501   Epoch: 5   Global Step: 29550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:15:02,312-Speed 3395.80 samples/sec   Loss 7.3687   LearningRate 0.0501   Epoch: 5   Global Step: 29560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:15:05,317-Speed 3409.22 samples/sec   Loss 7.2313   LearningRate 0.0501   Epoch: 5   Global Step: 29570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:15:08,320-Speed 3409.70 samples/sec   Loss 7.2421   LearningRate 0.0501   Epoch: 5   Global Step: 29580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:15:11,323-Speed 3411.79 samples/sec   Loss 7.2869   LearningRate 0.0501   Epoch: 5   Global Step: 29590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:15:14,324-Speed 3411.93 samples/sec   Loss 7.3727   LearningRate 0.0500   Epoch: 5   Global Step: 29600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:15:17,331-Speed 3406.58 samples/sec   Loss 7.4531   LearningRate 0.0500   Epoch: 5   Global Step: 29610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:15:20,315-Speed 3432.98 samples/sec   Loss 7.4479   LearningRate 0.0500   Epoch: 5   Global Step: 29620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:15:23,318-Speed 3410.10 samples/sec   Loss 7.2815   LearningRate 0.0500   Epoch: 5   Global Step: 29630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:15:26,325-Speed 3407.14 samples/sec   Loss 7.4237   LearningRate 0.0500   Epoch: 5   Global Step: 29640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:15:29,333-Speed 3405.38 samples/sec   Loss 7.5176   LearningRate 0.0500   Epoch: 5   Global Step: 29650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:15:32,338-Speed 3408.79 samples/sec   Loss 7.5310   LearningRate 0.0500   Epoch: 5   Global Step: 29660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:15:35,340-Speed 3410.86 samples/sec   Loss 7.4518   LearningRate 0.0499   Epoch: 5   Global Step: 29670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:15:38,348-Speed 3405.81 samples/sec   Loss 7.5434   LearningRate 0.0499   Epoch: 5   Global Step: 29680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:15:41,347-Speed 3414.63 samples/sec   Loss 7.2663   LearningRate 0.0499   Epoch: 5   Global Step: 29690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:15:44,355-Speed 3406.00 samples/sec   Loss 7.2490   LearningRate 0.0499   Epoch: 5   Global Step: 29700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:15:47,361-Speed 3407.19 samples/sec   Loss 7.3063   LearningRate 0.0499   Epoch: 5   Global Step: 29710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:15:50,363-Speed 3411.73 samples/sec   Loss 7.4514   LearningRate 0.0499   Epoch: 5   Global Step: 29720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:15:53,369-Speed 3407.55 samples/sec   Loss 7.3148   LearningRate 0.0499   Epoch: 5   Global Step: 29730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:15:56,373-Speed 3410.50 samples/sec   Loss 7.3515   LearningRate 0.0498   Epoch: 5   Global Step: 29740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:15:59,370-Speed 3416.67 samples/sec   Loss 7.3692   LearningRate 0.0498   Epoch: 5   Global Step: 29750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:02,376-Speed 3407.53 samples/sec   Loss 7.2559   LearningRate 0.0498   Epoch: 5   Global Step: 29760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:05,382-Speed 3407.39 samples/sec   Loss 7.4442   LearningRate 0.0498   Epoch: 5   Global Step: 29770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:08,396-Speed 3398.11 samples/sec   Loss 7.5326   LearningRate 0.0498   Epoch: 5   Global Step: 29780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:11,410-Speed 3399.13 samples/sec   Loss 7.3113   LearningRate 0.0498   Epoch: 5   Global Step: 29790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:14,411-Speed 3412.00 samples/sec   Loss 7.4626   LearningRate 0.0498   Epoch: 5   Global Step: 29800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:17,428-Speed 3395.21 samples/sec   Loss 7.4158   LearningRate 0.0497   Epoch: 5   Global Step: 29810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:20,429-Speed 3413.87 samples/sec   Loss 7.2769   LearningRate 0.0497   Epoch: 5   Global Step: 29820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:23,435-Speed 3407.38 samples/sec   Loss 7.3121   LearningRate 0.0497   Epoch: 5   Global Step: 29830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:26,460-Speed 3386.53 samples/sec   Loss 7.2031   LearningRate 0.0497   Epoch: 5   Global Step: 29840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:29,465-Speed 3407.39 samples/sec   Loss 7.4176   LearningRate 0.0497   Epoch: 5   Global Step: 29850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:16:32,475-Speed 3403.84 samples/sec   Loss 7.2821   LearningRate 0.0497   Epoch: 5   Global Step: 29860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:16:35,474-Speed 3414.85 samples/sec   Loss 7.3697   LearningRate 0.0497   Epoch: 5   Global Step: 29870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:16:38,478-Speed 3409.83 samples/sec   Loss 7.3738   LearningRate 0.0496   Epoch: 5   Global Step: 29880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:16:41,468-Speed 3425.01 samples/sec   Loss 7.2573   LearningRate 0.0496   Epoch: 5   Global Step: 29890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:44,478-Speed 3403.43 samples/sec   Loss 7.2584   LearningRate 0.0496   Epoch: 5   Global Step: 29900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:47,488-Speed 3402.95 samples/sec   Loss 7.2797   LearningRate 0.0496   Epoch: 5   Global Step: 29910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:50,490-Speed 3411.98 samples/sec   Loss 7.3060   LearningRate 0.0496   Epoch: 5   Global Step: 29920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:53,504-Speed 3397.71 samples/sec   Loss 7.4019   LearningRate 0.0496   Epoch: 5   Global Step: 29930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:56,511-Speed 3406.97 samples/sec   Loss 7.5194   LearningRate 0.0496   Epoch: 5   Global Step: 29940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:16:59,519-Speed 3404.66 samples/sec   Loss 7.2247   LearningRate 0.0496   Epoch: 5   Global Step: 29950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:17:02,530-Speed 3401.83 samples/sec   Loss 7.3586   LearningRate 0.0495   Epoch: 5   Global Step: 29960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:17:05,543-Speed 3399.90 samples/sec   Loss 7.4073   LearningRate 0.0495   Epoch: 5   Global Step: 29970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:17:08,545-Speed 3411.34 samples/sec   Loss 7.3690   LearningRate 0.0495   Epoch: 5   Global Step: 29980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:17:11,554-Speed 3404.24 samples/sec   Loss 7.2202   LearningRate 0.0495   Epoch: 5   Global Step: 29990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:17:14,563-Speed 3404.35 samples/sec   Loss 7.3548   LearningRate 0.0495   Epoch: 5   Global Step: 30000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:17:58,978-[lfw][30000]XNorm: 24.189952
Training: 2022-04-11 02:17:58,979-[lfw][30000]Accuracy-Flip: 0.99717+-0.00325
Training: 2022-04-11 02:17:58,979-[lfw][30000]Accuracy-Highest: 0.99717
Training: 2022-04-11 02:18:50,448-[cfp_fp][30000]XNorm: 21.420047
Training: 2022-04-11 02:18:50,449-[cfp_fp][30000]Accuracy-Flip: 0.96443+-0.00839
Training: 2022-04-11 02:18:50,449-[cfp_fp][30000]Accuracy-Highest: 0.96443
Training: 2022-04-11 02:19:34,832-[agedb_30][30000]XNorm: 24.095577
Training: 2022-04-11 02:19:34,833-[agedb_30][30000]Accuracy-Flip: 0.97750+-0.00655
Training: 2022-04-11 02:19:34,833-[agedb_30][30000]Accuracy-Highest: 0.97750
Training: 2022-04-11 02:19:37,823-Speed 71.48 samples/sec   Loss 7.3324   LearningRate 0.0495   Epoch: 5   Global Step: 30010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:19:40,809-Speed 3430.38 samples/sec   Loss 7.2771   LearningRate 0.0495   Epoch: 5   Global Step: 30020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:19:43,799-Speed 3424.73 samples/sec   Loss 7.4157   LearningRate 0.0494   Epoch: 5   Global Step: 30030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:19:46,780-Speed 3437.24 samples/sec   Loss 7.4154   LearningRate 0.0494   Epoch: 5   Global Step: 30040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:19:49,768-Speed 3428.00 samples/sec   Loss 7.4594   LearningRate 0.0494   Epoch: 5   Global Step: 30050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:19:52,724-Speed 3465.02 samples/sec   Loss 7.5247   LearningRate 0.0494   Epoch: 5   Global Step: 30060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:19:55,715-Speed 3424.03 samples/sec   Loss 7.3614   LearningRate 0.0494   Epoch: 5   Global Step: 30070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:19:58,701-Speed 3429.43 samples/sec   Loss 7.5177   LearningRate 0.0494   Epoch: 5   Global Step: 30080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:20:01,692-Speed 3425.63 samples/sec   Loss 7.3512   LearningRate 0.0494   Epoch: 5   Global Step: 30090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:20:04,684-Speed 3422.82 samples/sec   Loss 7.4419   LearningRate 0.0493   Epoch: 5   Global Step: 30100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:20:07,680-Speed 3418.73 samples/sec   Loss 7.3610   LearningRate 0.0493   Epoch: 5   Global Step: 30110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:20:10,676-Speed 3419.04 samples/sec   Loss 7.4625   LearningRate 0.0493   Epoch: 5   Global Step: 30120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:20:13,672-Speed 3419.05 samples/sec   Loss 7.2953   LearningRate 0.0493   Epoch: 5   Global Step: 30130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:20:16,667-Speed 3421.10 samples/sec   Loss 7.4333   LearningRate 0.0493   Epoch: 5   Global Step: 30140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:20:19,658-Speed 3424.11 samples/sec   Loss 7.2979   LearningRate 0.0493   Epoch: 5   Global Step: 30150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:20:22,650-Speed 3423.64 samples/sec   Loss 7.2686   LearningRate 0.0493   Epoch: 5   Global Step: 30160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:20:25,646-Speed 3418.74 samples/sec   Loss 7.3315   LearningRate 0.0492   Epoch: 5   Global Step: 30170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:20:28,670-Speed 3386.41 samples/sec   Loss 7.1286   LearningRate 0.0492   Epoch: 5   Global Step: 30180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:20:31,666-Speed 3419.40 samples/sec   Loss 7.2953   LearningRate 0.0492   Epoch: 5   Global Step: 30190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:20:34,660-Speed 3421.10 samples/sec   Loss 7.4072   LearningRate 0.0492   Epoch: 5   Global Step: 30200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:20:37,656-Speed 3419.10 samples/sec   Loss 7.2211   LearningRate 0.0492   Epoch: 5   Global Step: 30210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:20:40,651-Speed 3419.19 samples/sec   Loss 7.3729   LearningRate 0.0492   Epoch: 5   Global Step: 30220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:20:43,653-Speed 3412.48 samples/sec   Loss 7.2379   LearningRate 0.0492   Epoch: 5   Global Step: 30230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:20:46,648-Speed 3419.89 samples/sec   Loss 7.3036   LearningRate 0.0491   Epoch: 5   Global Step: 30240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:20:49,641-Speed 3421.75 samples/sec   Loss 7.3215   LearningRate 0.0491   Epoch: 5   Global Step: 30250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:20:52,662-Speed 3390.76 samples/sec   Loss 7.4003   LearningRate 0.0491   Epoch: 5   Global Step: 30260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:20:55,655-Speed 3422.24 samples/sec   Loss 7.2752   LearningRate 0.0491   Epoch: 5   Global Step: 30270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:20:58,654-Speed 3415.08 samples/sec   Loss 7.2771   LearningRate 0.0491   Epoch: 5   Global Step: 30280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:21:01,656-Speed 3411.93 samples/sec   Loss 7.3047   LearningRate 0.0491   Epoch: 5   Global Step: 30290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:21:04,655-Speed 3415.55 samples/sec   Loss 7.2887   LearningRate 0.0491   Epoch: 5   Global Step: 30300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:21:07,652-Speed 3418.43 samples/sec   Loss 7.3836   LearningRate 0.0491   Epoch: 5   Global Step: 30310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:21:10,642-Speed 3424.93 samples/sec   Loss 7.2104   LearningRate 0.0490   Epoch: 5   Global Step: 30320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:21:13,633-Speed 3424.45 samples/sec   Loss 7.5638   LearningRate 0.0490   Epoch: 5   Global Step: 30330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:21:16,699-Speed 3341.06 samples/sec   Loss 7.5515   LearningRate 0.0490   Epoch: 5   Global Step: 30340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:21:29,244-Speed 816.31 samples/sec   Loss 7.2153   LearningRate 0.0490   Epoch: 6   Global Step: 30350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:21:32,260-Speed 3396.78 samples/sec   Loss 6.5246   LearningRate 0.0490   Epoch: 6   Global Step: 30360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:21:35,295-Speed 3375.42 samples/sec   Loss 6.4626   LearningRate 0.0490   Epoch: 6   Global Step: 30370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:21:38,308-Speed 3399.70 samples/sec   Loss 6.4595   LearningRate 0.0490   Epoch: 6   Global Step: 30380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:21:41,330-Speed 3388.83 samples/sec   Loss 6.4377   LearningRate 0.0489   Epoch: 6   Global Step: 30390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:21:44,348-Speed 3395.11 samples/sec   Loss 6.4218   LearningRate 0.0489   Epoch: 6   Global Step: 30400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:21:47,367-Speed 3393.76 samples/sec   Loss 6.5353   LearningRate 0.0489   Epoch: 6   Global Step: 30410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:21:50,385-Speed 3395.09 samples/sec   Loss 6.4843   LearningRate 0.0489   Epoch: 6   Global Step: 30420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:21:53,412-Speed 3383.11 samples/sec   Loss 6.5395   LearningRate 0.0489   Epoch: 6   Global Step: 30430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:21:56,415-Speed 3411.19 samples/sec   Loss 6.6336   LearningRate 0.0489   Epoch: 6   Global Step: 30440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:21:59,417-Speed 3413.24 samples/sec   Loss 6.6258   LearningRate 0.0489   Epoch: 6   Global Step: 30450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:22:02,440-Speed 3389.08 samples/sec   Loss 6.6702   LearningRate 0.0488   Epoch: 6   Global Step: 30460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:22:05,450-Speed 3402.46 samples/sec   Loss 6.6233   LearningRate 0.0488   Epoch: 6   Global Step: 30470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:22:08,511-Speed 3346.41 samples/sec   Loss 6.6723   LearningRate 0.0488   Epoch: 6   Global Step: 30480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:22:11,536-Speed 3386.85 samples/sec   Loss 6.6489   LearningRate 0.0488   Epoch: 6   Global Step: 30490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:22:14,551-Speed 3397.63 samples/sec   Loss 6.7220   LearningRate 0.0488   Epoch: 6   Global Step: 30500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:22:17,579-Speed 3382.49 samples/sec   Loss 6.7857   LearningRate 0.0488   Epoch: 6   Global Step: 30510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:22:20,590-Speed 3401.45 samples/sec   Loss 6.6230   LearningRate 0.0488   Epoch: 6   Global Step: 30520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:22:23,586-Speed 3418.56 samples/sec   Loss 6.7405   LearningRate 0.0487   Epoch: 6   Global Step: 30530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:22:26,594-Speed 3405.43 samples/sec   Loss 6.7609   LearningRate 0.0487   Epoch: 6   Global Step: 30540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:22:29,583-Speed 3426.66 samples/sec   Loss 6.5843   LearningRate 0.0487   Epoch: 6   Global Step: 30550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:22:32,605-Speed 3389.75 samples/sec   Loss 6.8000   LearningRate 0.0487   Epoch: 6   Global Step: 30560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:22:35,620-Speed 3397.61 samples/sec   Loss 6.7771   LearningRate 0.0487   Epoch: 6   Global Step: 30570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:22:38,627-Speed 3405.71 samples/sec   Loss 6.6901   LearningRate 0.0487   Epoch: 6   Global Step: 30580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:22:41,633-Speed 3407.62 samples/sec   Loss 6.7177   LearningRate 0.0487   Epoch: 6   Global Step: 30590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:22:44,642-Speed 3404.53 samples/sec   Loss 6.7054   LearningRate 0.0487   Epoch: 6   Global Step: 30600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:22:47,656-Speed 3398.67 samples/sec   Loss 6.6322   LearningRate 0.0486   Epoch: 6   Global Step: 30610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:22:50,763-Speed 3296.68 samples/sec   Loss 6.6386   LearningRate 0.0486   Epoch: 6   Global Step: 30620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:22:53,855-Speed 3312.22 samples/sec   Loss 6.7880   LearningRate 0.0486   Epoch: 6   Global Step: 30630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:22:56,869-Speed 3399.04 samples/sec   Loss 6.8539   LearningRate 0.0486   Epoch: 6   Global Step: 30640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:22:59,897-Speed 3382.50 samples/sec   Loss 6.7852   LearningRate 0.0486   Epoch: 6   Global Step: 30650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:02,907-Speed 3403.19 samples/sec   Loss 6.7805   LearningRate 0.0486   Epoch: 6   Global Step: 30660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:05,946-Speed 3370.38 samples/sec   Loss 6.7954   LearningRate 0.0486   Epoch: 6   Global Step: 30670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:08,952-Speed 3406.59 samples/sec   Loss 6.6738   LearningRate 0.0485   Epoch: 6   Global Step: 30680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:11,959-Speed 3406.83 samples/sec   Loss 6.8363   LearningRate 0.0485   Epoch: 6   Global Step: 30690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:14,960-Speed 3413.27 samples/sec   Loss 6.8920   LearningRate 0.0485   Epoch: 6   Global Step: 30700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:18,059-Speed 3305.51 samples/sec   Loss 6.6756   LearningRate 0.0485   Epoch: 6   Global Step: 30710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:21,073-Speed 3397.61 samples/sec   Loss 6.8646   LearningRate 0.0485   Epoch: 6   Global Step: 30720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:24,082-Speed 3405.01 samples/sec   Loss 6.7057   LearningRate 0.0485   Epoch: 6   Global Step: 30730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:27,083-Speed 3412.60 samples/sec   Loss 6.7691   LearningRate 0.0485   Epoch: 6   Global Step: 30740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:30,072-Speed 3426.87 samples/sec   Loss 6.7011   LearningRate 0.0484   Epoch: 6   Global Step: 30750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:33,072-Speed 3414.11 samples/sec   Loss 6.8118   LearningRate 0.0484   Epoch: 6   Global Step: 30760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:36,086-Speed 3398.78 samples/sec   Loss 6.9273   LearningRate 0.0484   Epoch: 6   Global Step: 30770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:39,111-Speed 3385.78 samples/sec   Loss 6.9024   LearningRate 0.0484   Epoch: 6   Global Step: 30780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:42,158-Speed 3361.78 samples/sec   Loss 6.8229   LearningRate 0.0484   Epoch: 6   Global Step: 30790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:45,162-Speed 3409.30 samples/sec   Loss 6.8808   LearningRate 0.0484   Epoch: 6   Global Step: 30800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:48,169-Speed 3406.63 samples/sec   Loss 6.9338   LearningRate 0.0484   Epoch: 6   Global Step: 30810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:51,224-Speed 3352.28 samples/sec   Loss 6.9000   LearningRate 0.0483   Epoch: 6   Global Step: 30820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:54,245-Speed 3390.79 samples/sec   Loss 6.8763   LearningRate 0.0483   Epoch: 6   Global Step: 30830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:23:57,250-Speed 3408.37 samples/sec   Loss 6.9689   LearningRate 0.0483   Epoch: 6   Global Step: 30840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:24:00,243-Speed 3422.35 samples/sec   Loss 6.9081   LearningRate 0.0483   Epoch: 6   Global Step: 30850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:24:03,245-Speed 3412.25 samples/sec   Loss 6.8830   LearningRate 0.0483   Epoch: 6   Global Step: 30860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:24:06,274-Speed 3381.23 samples/sec   Loss 6.8558   LearningRate 0.0483   Epoch: 6   Global Step: 30870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:24:09,277-Speed 3410.99 samples/sec   Loss 7.0297   LearningRate 0.0483   Epoch: 6   Global Step: 30880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:24:12,285-Speed 3405.28 samples/sec   Loss 7.0570   LearningRate 0.0483   Epoch: 6   Global Step: 30890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:24:15,271-Speed 3430.06 samples/sec   Loss 6.8291   LearningRate 0.0482   Epoch: 6   Global Step: 30900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:24:18,259-Speed 3427.47 samples/sec   Loss 6.8975   LearningRate 0.0482   Epoch: 6   Global Step: 30910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:24:21,272-Speed 3400.23 samples/sec   Loss 7.0064   LearningRate 0.0482   Epoch: 6   Global Step: 30920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:24:24,277-Speed 3408.42 samples/sec   Loss 6.7929   LearningRate 0.0482   Epoch: 6   Global Step: 30930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:24:27,294-Speed 3395.30 samples/sec   Loss 6.9881   LearningRate 0.0482   Epoch: 6   Global Step: 30940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:24:30,368-Speed 3332.29 samples/sec   Loss 6.9891   LearningRate 0.0482   Epoch: 6   Global Step: 30950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:24:33,394-Speed 3384.64 samples/sec   Loss 6.8881   LearningRate 0.0482   Epoch: 6   Global Step: 30960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:24:36,399-Speed 3408.10 samples/sec   Loss 6.8595   LearningRate 0.0481   Epoch: 6   Global Step: 30970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:24:39,402-Speed 3411.04 samples/sec   Loss 6.9285   LearningRate 0.0481   Epoch: 6   Global Step: 30980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:24:42,411-Speed 3404.53 samples/sec   Loss 6.8733   LearningRate 0.0481   Epoch: 6   Global Step: 30990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:24:45,419-Speed 3404.68 samples/sec   Loss 6.9214   LearningRate 0.0481   Epoch: 6   Global Step: 31000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:24:48,427-Speed 3405.33 samples/sec   Loss 6.9361   LearningRate 0.0481   Epoch: 6   Global Step: 31010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:24:51,440-Speed 3400.63 samples/sec   Loss 7.0905   LearningRate 0.0481   Epoch: 6   Global Step: 31020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:24:54,460-Speed 3390.51 samples/sec   Loss 6.9062   LearningRate 0.0481   Epoch: 6   Global Step: 31030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:24:57,496-Speed 3374.59 samples/sec   Loss 6.9328   LearningRate 0.0480   Epoch: 6   Global Step: 31040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:25:00,508-Speed 3401.15 samples/sec   Loss 6.8551   LearningRate 0.0480   Epoch: 6   Global Step: 31050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:25:03,527-Speed 3392.78 samples/sec   Loss 6.9707   LearningRate 0.0480   Epoch: 6   Global Step: 31060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:25:06,568-Speed 3367.70 samples/sec   Loss 6.9692   LearningRate 0.0480   Epoch: 6   Global Step: 31070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:25:09,578-Speed 3402.79 samples/sec   Loss 6.9204   LearningRate 0.0480   Epoch: 6   Global Step: 31080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:25:12,593-Speed 3397.46 samples/sec   Loss 6.8929   LearningRate 0.0480   Epoch: 6   Global Step: 31090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:25:15,690-Speed 3307.31 samples/sec   Loss 6.9155   LearningRate 0.0480   Epoch: 6   Global Step: 31100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:25:18,737-Speed 3362.25 samples/sec   Loss 6.9157   LearningRate 0.0480   Epoch: 6   Global Step: 31110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:25:21,789-Speed 3356.28 samples/sec   Loss 6.8728   LearningRate 0.0479   Epoch: 6   Global Step: 31120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:25:24,810-Speed 3389.65 samples/sec   Loss 7.0824   LearningRate 0.0479   Epoch: 6   Global Step: 31130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:25:27,822-Speed 3400.44 samples/sec   Loss 6.9417   LearningRate 0.0479   Epoch: 6   Global Step: 31140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:25:30,823-Speed 3412.82 samples/sec   Loss 6.8946   LearningRate 0.0479   Epoch: 6   Global Step: 31150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:25:33,833-Speed 3403.00 samples/sec   Loss 7.0804   LearningRate 0.0479   Epoch: 6   Global Step: 31160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:25:36,852-Speed 3393.80 samples/sec   Loss 7.1426   LearningRate 0.0479   Epoch: 6   Global Step: 31170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:25:39,865-Speed 3399.55 samples/sec   Loss 7.0206   LearningRate 0.0479   Epoch: 6   Global Step: 31180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:25:42,915-Speed 3358.06 samples/sec   Loss 7.1202   LearningRate 0.0478   Epoch: 6   Global Step: 31190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:25:45,923-Speed 3404.79 samples/sec   Loss 6.9653   LearningRate 0.0478   Epoch: 6   Global Step: 31200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:25:48,928-Speed 3408.25 samples/sec   Loss 6.9984   LearningRate 0.0478   Epoch: 6   Global Step: 31210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:25:51,940-Speed 3400.38 samples/sec   Loss 7.0721   LearningRate 0.0478   Epoch: 6   Global Step: 31220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:25:54,945-Speed 3408.51 samples/sec   Loss 7.0429   LearningRate 0.0478   Epoch: 6   Global Step: 31230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:25:57,953-Speed 3405.69 samples/sec   Loss 7.0528   LearningRate 0.0478   Epoch: 6   Global Step: 31240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:26:00,956-Speed 3410.45 samples/sec   Loss 7.0532   LearningRate 0.0478   Epoch: 6   Global Step: 31250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:26:03,967-Speed 3402.34 samples/sec   Loss 7.1095   LearningRate 0.0477   Epoch: 6   Global Step: 31260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:26:07,036-Speed 3337.48 samples/sec   Loss 7.0245   LearningRate 0.0477   Epoch: 6   Global Step: 31270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:26:10,044-Speed 3404.51 samples/sec   Loss 6.8883   LearningRate 0.0477   Epoch: 6   Global Step: 31280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:26:13,097-Speed 3355.47 samples/sec   Loss 7.0201   LearningRate 0.0477   Epoch: 6   Global Step: 31290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:26:16,103-Speed 3407.09 samples/sec   Loss 7.0214   LearningRate 0.0477   Epoch: 6   Global Step: 31300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:26:19,113-Speed 3402.77 samples/sec   Loss 6.9982   LearningRate 0.0477   Epoch: 6   Global Step: 31310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:26:22,120-Speed 3407.01 samples/sec   Loss 6.9790   LearningRate 0.0477   Epoch: 6   Global Step: 31320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:26:25,124-Speed 3409.54 samples/sec   Loss 7.1586   LearningRate 0.0477   Epoch: 6   Global Step: 31330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:26:28,164-Speed 3368.98 samples/sec   Loss 7.0158   LearningRate 0.0476   Epoch: 6   Global Step: 31340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:26:31,200-Speed 3373.88 samples/sec   Loss 7.1747   LearningRate 0.0476   Epoch: 6   Global Step: 31350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:26:34,210-Speed 3403.19 samples/sec   Loss 7.0833   LearningRate 0.0476   Epoch: 6   Global Step: 31360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:26:37,224-Speed 3398.87 samples/sec   Loss 6.9851   LearningRate 0.0476   Epoch: 6   Global Step: 31370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:26:40,237-Speed 3399.48 samples/sec   Loss 6.9664   LearningRate 0.0476   Epoch: 6   Global Step: 31380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:26:43,247-Speed 3402.13 samples/sec   Loss 7.0686   LearningRate 0.0476   Epoch: 6   Global Step: 31390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:26:46,252-Speed 3408.70 samples/sec   Loss 6.9184   LearningRate 0.0476   Epoch: 6   Global Step: 31400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:26:49,264-Speed 3401.45 samples/sec   Loss 6.8147   LearningRate 0.0475   Epoch: 6   Global Step: 31410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:26:52,284-Speed 3391.29 samples/sec   Loss 7.0982   LearningRate 0.0475   Epoch: 6   Global Step: 31420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:26:55,282-Speed 3416.79 samples/sec   Loss 6.9435   LearningRate 0.0475   Epoch: 6   Global Step: 31430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:26:58,295-Speed 3399.07 samples/sec   Loss 6.9404   LearningRate 0.0475   Epoch: 6   Global Step: 31440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:01,309-Speed 3398.34 samples/sec   Loss 6.9250   LearningRate 0.0475   Epoch: 6   Global Step: 31450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:04,318-Speed 3404.02 samples/sec   Loss 7.0484   LearningRate 0.0475   Epoch: 6   Global Step: 31460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:07,333-Speed 3396.80 samples/sec   Loss 7.0033   LearningRate 0.0475   Epoch: 6   Global Step: 31470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:10,346-Speed 3400.10 samples/sec   Loss 6.9191   LearningRate 0.0474   Epoch: 6   Global Step: 31480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:13,351-Speed 3408.38 samples/sec   Loss 7.0895   LearningRate 0.0474   Epoch: 6   Global Step: 31490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:16,381-Speed 3380.95 samples/sec   Loss 6.7666   LearningRate 0.0474   Epoch: 6   Global Step: 31500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:19,400-Speed 3391.88 samples/sec   Loss 7.2581   LearningRate 0.0474   Epoch: 6   Global Step: 31510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:22,419-Speed 3393.17 samples/sec   Loss 7.0292   LearningRate 0.0474   Epoch: 6   Global Step: 31520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:25,432-Speed 3400.12 samples/sec   Loss 7.0084   LearningRate 0.0474   Epoch: 6   Global Step: 31530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:27:28,447-Speed 3397.36 samples/sec   Loss 6.9645   LearningRate 0.0474   Epoch: 6   Global Step: 31540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:27:31,450-Speed 3410.28 samples/sec   Loss 7.0008   LearningRate 0.0474   Epoch: 6   Global Step: 31550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:27:34,461-Speed 3401.03 samples/sec   Loss 7.0542   LearningRate 0.0473   Epoch: 6   Global Step: 31560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:27:37,461-Speed 3414.96 samples/sec   Loss 7.1436   LearningRate 0.0473   Epoch: 6   Global Step: 31570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:40,472-Speed 3401.57 samples/sec   Loss 7.1965   LearningRate 0.0473   Epoch: 6   Global Step: 31580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:43,524-Speed 3356.43 samples/sec   Loss 7.0491   LearningRate 0.0473   Epoch: 6   Global Step: 31590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:46,544-Speed 3391.57 samples/sec   Loss 6.8491   LearningRate 0.0473   Epoch: 6   Global Step: 31600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:49,552-Speed 3404.72 samples/sec   Loss 6.9137   LearningRate 0.0473   Epoch: 6   Global Step: 31610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:52,558-Speed 3408.29 samples/sec   Loss 6.9426   LearningRate 0.0473   Epoch: 6   Global Step: 31620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:55,561-Speed 3410.44 samples/sec   Loss 7.0448   LearningRate 0.0472   Epoch: 6   Global Step: 31630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:27:58,571-Speed 3402.64 samples/sec   Loss 6.9637   LearningRate 0.0472   Epoch: 6   Global Step: 31640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:28:01,572-Speed 3414.04 samples/sec   Loss 6.9625   LearningRate 0.0472   Epoch: 6   Global Step: 31650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:28:04,600-Speed 3381.82 samples/sec   Loss 6.9797   LearningRate 0.0472   Epoch: 6   Global Step: 31660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:28:07,605-Speed 3408.69 samples/sec   Loss 7.0620   LearningRate 0.0472   Epoch: 6   Global Step: 31670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:28:10,606-Speed 3413.80 samples/sec   Loss 6.9900   LearningRate 0.0472   Epoch: 6   Global Step: 31680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:28:13,621-Speed 3397.29 samples/sec   Loss 6.9874   LearningRate 0.0472   Epoch: 6   Global Step: 31690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:28:16,629-Speed 3404.62 samples/sec   Loss 6.9195   LearningRate 0.0471   Epoch: 6   Global Step: 31700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:28:19,637-Speed 3405.16 samples/sec   Loss 6.9188   LearningRate 0.0471   Epoch: 6   Global Step: 31710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:28:22,624-Speed 3429.93 samples/sec   Loss 7.0118   LearningRate 0.0471   Epoch: 6   Global Step: 31720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:28:25,626-Speed 3411.03 samples/sec   Loss 6.9325   LearningRate 0.0471   Epoch: 6   Global Step: 31730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:28:28,631-Speed 3409.39 samples/sec   Loss 7.0854   LearningRate 0.0471   Epoch: 6   Global Step: 31740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:28:31,650-Speed 3392.15 samples/sec   Loss 6.9573   LearningRate 0.0471   Epoch: 6   Global Step: 31750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:28:34,673-Speed 3388.95 samples/sec   Loss 7.1172   LearningRate 0.0471   Epoch: 6   Global Step: 31760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:28:37,742-Speed 3337.16 samples/sec   Loss 7.0032   LearningRate 0.0471   Epoch: 6   Global Step: 31770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:28:40,782-Speed 3369.50 samples/sec   Loss 7.0370   LearningRate 0.0470   Epoch: 6   Global Step: 31780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:28:43,791-Speed 3403.54 samples/sec   Loss 7.1430   LearningRate 0.0470   Epoch: 6   Global Step: 31790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:28:46,795-Speed 3410.14 samples/sec   Loss 6.9084   LearningRate 0.0470   Epoch: 6   Global Step: 31800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:28:49,838-Speed 3365.59 samples/sec   Loss 7.1475   LearningRate 0.0470   Epoch: 6   Global Step: 31810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:28:52,866-Speed 3383.29 samples/sec   Loss 7.0754   LearningRate 0.0470   Epoch: 6   Global Step: 31820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:28:55,885-Speed 3392.20 samples/sec   Loss 6.9953   LearningRate 0.0470   Epoch: 6   Global Step: 31830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:28:58,887-Speed 3412.21 samples/sec   Loss 7.0445   LearningRate 0.0470   Epoch: 6   Global Step: 31840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:29:01,889-Speed 3412.21 samples/sec   Loss 6.9847   LearningRate 0.0469   Epoch: 6   Global Step: 31850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:29:04,919-Speed 3381.08 samples/sec   Loss 7.0800   LearningRate 0.0469   Epoch: 6   Global Step: 31860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:29:07,921-Speed 3411.98 samples/sec   Loss 7.1352   LearningRate 0.0469   Epoch: 6   Global Step: 31870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:29:10,927-Speed 3407.17 samples/sec   Loss 7.1510   LearningRate 0.0469   Epoch: 6   Global Step: 31880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:29:13,929-Speed 3411.73 samples/sec   Loss 7.0224   LearningRate 0.0469   Epoch: 6   Global Step: 31890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:29:16,944-Speed 3398.47 samples/sec   Loss 6.9635   LearningRate 0.0469   Epoch: 6   Global Step: 31900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:29:19,954-Speed 3402.75 samples/sec   Loss 7.0240   LearningRate 0.0469   Epoch: 6   Global Step: 31910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:29:22,974-Speed 3390.91 samples/sec   Loss 7.1650   LearningRate 0.0468   Epoch: 6   Global Step: 31920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:29:25,983-Speed 3403.95 samples/sec   Loss 7.0120   LearningRate 0.0468   Epoch: 6   Global Step: 31930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:29:28,991-Speed 3405.52 samples/sec   Loss 6.9862   LearningRate 0.0468   Epoch: 6   Global Step: 31940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:29:32,036-Speed 3364.05 samples/sec   Loss 7.1778   LearningRate 0.0468   Epoch: 6   Global Step: 31950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:29:35,054-Speed 3393.64 samples/sec   Loss 6.9573   LearningRate 0.0468   Epoch: 6   Global Step: 31960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:29:38,058-Speed 3409.91 samples/sec   Loss 7.0964   LearningRate 0.0468   Epoch: 6   Global Step: 31970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:29:41,061-Speed 3410.09 samples/sec   Loss 6.9642   LearningRate 0.0468   Epoch: 6   Global Step: 31980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:29:44,075-Speed 3398.29 samples/sec   Loss 6.9673   LearningRate 0.0468   Epoch: 6   Global Step: 31990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:29:47,077-Speed 3412.27 samples/sec   Loss 6.9834   LearningRate 0.0467   Epoch: 6   Global Step: 32000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:30:31,753-[lfw][32000]XNorm: 24.251113
Training: 2022-04-11 02:30:31,754-[lfw][32000]Accuracy-Flip: 0.99717+-0.00342
Training: 2022-04-11 02:30:31,754-[lfw][32000]Accuracy-Highest: 0.99717
Training: 2022-04-11 02:31:23,023-[cfp_fp][32000]XNorm: 21.559307
Training: 2022-04-11 02:31:23,024-[cfp_fp][32000]Accuracy-Flip: 0.97057+-0.01023
Training: 2022-04-11 02:31:23,025-[cfp_fp][32000]Accuracy-Highest: 0.97057
Training: 2022-04-11 02:32:07,986-[agedb_30][32000]XNorm: 23.669438
Training: 2022-04-11 02:32:07,986-[agedb_30][32000]Accuracy-Flip: 0.97567+-0.00797
Training: 2022-04-11 02:32:07,987-[agedb_30][32000]Accuracy-Highest: 0.97750
Training: 2022-04-11 02:32:10,976-Speed 71.16 samples/sec   Loss 7.0603   LearningRate 0.0467   Epoch: 6   Global Step: 32010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:32:13,958-Speed 3434.48 samples/sec   Loss 7.0901   LearningRate 0.0467   Epoch: 6   Global Step: 32020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:32:16,952-Speed 3421.46 samples/sec   Loss 6.9353   LearningRate 0.0467   Epoch: 6   Global Step: 32030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:32:19,941-Speed 3426.76 samples/sec   Loss 6.9643   LearningRate 0.0467   Epoch: 6   Global Step: 32040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:32:22,928-Speed 3429.63 samples/sec   Loss 7.0314   LearningRate 0.0467   Epoch: 6   Global Step: 32050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:32:25,926-Speed 3416.36 samples/sec   Loss 7.0558   LearningRate 0.0467   Epoch: 6   Global Step: 32060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:32:28,915-Speed 3426.73 samples/sec   Loss 7.2993   LearningRate 0.0466   Epoch: 6   Global Step: 32070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:32:31,902-Speed 3429.25 samples/sec   Loss 6.9969   LearningRate 0.0466   Epoch: 6   Global Step: 32080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:32:34,887-Speed 3431.62 samples/sec   Loss 6.9800   LearningRate 0.0466   Epoch: 6   Global Step: 32090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:32:37,882-Speed 3419.84 samples/sec   Loss 6.9485   LearningRate 0.0466   Epoch: 6   Global Step: 32100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:32:40,871-Speed 3427.11 samples/sec   Loss 7.0474   LearningRate 0.0466   Epoch: 6   Global Step: 32110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:32:43,876-Speed 3407.55 samples/sec   Loss 6.9531   LearningRate 0.0466   Epoch: 6   Global Step: 32120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:32:46,876-Speed 3414.97 samples/sec   Loss 6.9982   LearningRate 0.0466   Epoch: 6   Global Step: 32130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:32:49,869-Speed 3421.92 samples/sec   Loss 7.1346   LearningRate 0.0466   Epoch: 6   Global Step: 32140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:32:52,889-Speed 3391.74 samples/sec   Loss 7.0539   LearningRate 0.0465   Epoch: 6   Global Step: 32150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:32:55,893-Speed 3409.94 samples/sec   Loss 6.9007   LearningRate 0.0465   Epoch: 6   Global Step: 32160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:32:58,891-Speed 3416.58 samples/sec   Loss 7.0852   LearningRate 0.0465   Epoch: 6   Global Step: 32170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:01,888-Speed 3417.89 samples/sec   Loss 6.9931   LearningRate 0.0465   Epoch: 6   Global Step: 32180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:04,899-Speed 3401.65 samples/sec   Loss 7.1510   LearningRate 0.0465   Epoch: 6   Global Step: 32190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:07,901-Speed 3411.90 samples/sec   Loss 7.1407   LearningRate 0.0465   Epoch: 6   Global Step: 32200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:10,899-Speed 3416.35 samples/sec   Loss 7.0500   LearningRate 0.0465   Epoch: 6   Global Step: 32210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:13,901-Speed 3411.83 samples/sec   Loss 6.9931   LearningRate 0.0464   Epoch: 6   Global Step: 32220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:16,909-Speed 3406.03 samples/sec   Loss 7.0811   LearningRate 0.0464   Epoch: 6   Global Step: 32230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:19,917-Speed 3404.58 samples/sec   Loss 7.1109   LearningRate 0.0464   Epoch: 6   Global Step: 32240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:22,929-Speed 3400.72 samples/sec   Loss 7.1253   LearningRate 0.0464   Epoch: 6   Global Step: 32250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:25,980-Speed 3357.74 samples/sec   Loss 7.0478   LearningRate 0.0464   Epoch: 6   Global Step: 32260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:28,989-Speed 3404.04 samples/sec   Loss 6.8741   LearningRate 0.0464   Epoch: 6   Global Step: 32270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:31,992-Speed 3410.93 samples/sec   Loss 7.0184   LearningRate 0.0464   Epoch: 6   Global Step: 32280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:34,996-Speed 3409.59 samples/sec   Loss 7.0449   LearningRate 0.0463   Epoch: 6   Global Step: 32290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:37,997-Speed 3412.87 samples/sec   Loss 6.9015   LearningRate 0.0463   Epoch: 6   Global Step: 32300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:41,014-Speed 3395.57 samples/sec   Loss 7.1120   LearningRate 0.0463   Epoch: 6   Global Step: 32310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:44,014-Speed 3413.50 samples/sec   Loss 7.0569   LearningRate 0.0463   Epoch: 6   Global Step: 32320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:47,028-Speed 3399.56 samples/sec   Loss 7.0043   LearningRate 0.0463   Epoch: 6   Global Step: 32330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:50,050-Speed 3388.87 samples/sec   Loss 6.9778   LearningRate 0.0463   Epoch: 6   Global Step: 32340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:53,050-Speed 3414.29 samples/sec   Loss 7.0867   LearningRate 0.0463   Epoch: 6   Global Step: 32350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:33:56,054-Speed 3409.78 samples/sec   Loss 6.9946   LearningRate 0.0463   Epoch: 6   Global Step: 32360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:33:59,033-Speed 3438.57 samples/sec   Loss 7.0807   LearningRate 0.0462   Epoch: 6   Global Step: 32370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:02,049-Speed 3395.80 samples/sec   Loss 7.0877   LearningRate 0.0462   Epoch: 6   Global Step: 32380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:05,176-Speed 3276.53 samples/sec   Loss 7.0309   LearningRate 0.0462   Epoch: 6   Global Step: 32390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:08,198-Speed 3389.60 samples/sec   Loss 7.1399   LearningRate 0.0462   Epoch: 6   Global Step: 32400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:11,195-Speed 3417.71 samples/sec   Loss 6.9550   LearningRate 0.0462   Epoch: 6   Global Step: 32410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:14,193-Speed 3416.62 samples/sec   Loss 6.9338   LearningRate 0.0462   Epoch: 6   Global Step: 32420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:17,191-Speed 3416.29 samples/sec   Loss 7.2092   LearningRate 0.0462   Epoch: 6   Global Step: 32430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:20,190-Speed 3415.66 samples/sec   Loss 6.9784   LearningRate 0.0461   Epoch: 6   Global Step: 32440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:23,192-Speed 3411.28 samples/sec   Loss 7.0111   LearningRate 0.0461   Epoch: 6   Global Step: 32450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:26,189-Speed 3418.07 samples/sec   Loss 7.0865   LearningRate 0.0461   Epoch: 6   Global Step: 32460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:29,207-Speed 3394.25 samples/sec   Loss 7.0715   LearningRate 0.0461   Epoch: 6   Global Step: 32470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:34:32,206-Speed 3415.30 samples/sec   Loss 7.0103   LearningRate 0.0461   Epoch: 6   Global Step: 32480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:35,205-Speed 3415.08 samples/sec   Loss 7.1890   LearningRate 0.0461   Epoch: 6   Global Step: 32490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:38,210-Speed 3408.33 samples/sec   Loss 7.0172   LearningRate 0.0461   Epoch: 6   Global Step: 32500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:41,225-Speed 3398.50 samples/sec   Loss 7.0052   LearningRate 0.0461   Epoch: 6   Global Step: 32510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:44,221-Speed 3418.07 samples/sec   Loss 7.0221   LearningRate 0.0460   Epoch: 6   Global Step: 32520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:47,217-Speed 3418.89 samples/sec   Loss 7.1079   LearningRate 0.0460   Epoch: 6   Global Step: 32530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:50,213-Speed 3419.20 samples/sec   Loss 7.0597   LearningRate 0.0460   Epoch: 6   Global Step: 32540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:53,252-Speed 3370.30 samples/sec   Loss 6.9978   LearningRate 0.0460   Epoch: 6   Global Step: 32550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:56,338-Speed 3319.18 samples/sec   Loss 6.9725   LearningRate 0.0460   Epoch: 6   Global Step: 32560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:34:59,319-Speed 3435.97 samples/sec   Loss 6.9244   LearningRate 0.0460   Epoch: 6   Global Step: 32570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:35:02,317-Speed 3417.00 samples/sec   Loss 7.1396   LearningRate 0.0460   Epoch: 6   Global Step: 32580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:35:05,317-Speed 3414.12 samples/sec   Loss 6.8904   LearningRate 0.0459   Epoch: 6   Global Step: 32590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:35:08,317-Speed 3414.56 samples/sec   Loss 6.9766   LearningRate 0.0459   Epoch: 6   Global Step: 32600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:35:11,314-Speed 3417.04 samples/sec   Loss 7.0390   LearningRate 0.0459   Epoch: 6   Global Step: 32610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:35:14,312-Speed 3416.73 samples/sec   Loss 6.9422   LearningRate 0.0459   Epoch: 6   Global Step: 32620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:35:17,321-Speed 3403.71 samples/sec   Loss 6.8198   LearningRate 0.0459   Epoch: 6   Global Step: 32630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:35:20,319-Speed 3417.22 samples/sec   Loss 6.9699   LearningRate 0.0459   Epoch: 6   Global Step: 32640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:35:23,356-Speed 3373.08 samples/sec   Loss 7.0855   LearningRate 0.0459   Epoch: 6   Global Step: 32650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:35:26,376-Speed 3391.74 samples/sec   Loss 7.1178   LearningRate 0.0459   Epoch: 6   Global Step: 32660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:35:29,379-Speed 3410.74 samples/sec   Loss 7.0608   LearningRate 0.0458   Epoch: 6   Global Step: 32670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:35:32,391-Speed 3399.91 samples/sec   Loss 6.8310   LearningRate 0.0458   Epoch: 6   Global Step: 32680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:35:35,393-Speed 3412.93 samples/sec   Loss 7.1634   LearningRate 0.0458   Epoch: 6   Global Step: 32690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:35:38,395-Speed 3411.99 samples/sec   Loss 7.0867   LearningRate 0.0458   Epoch: 6   Global Step: 32700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:35:41,396-Speed 3413.14 samples/sec   Loss 7.0963   LearningRate 0.0458   Epoch: 6   Global Step: 32710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:35:44,396-Speed 3414.50 samples/sec   Loss 7.0106   LearningRate 0.0458   Epoch: 6   Global Step: 32720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:35:47,396-Speed 3413.70 samples/sec   Loss 7.0455   LearningRate 0.0458   Epoch: 6   Global Step: 32730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:35:50,409-Speed 3399.95 samples/sec   Loss 6.9080   LearningRate 0.0457   Epoch: 6   Global Step: 32740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:35:53,412-Speed 3410.37 samples/sec   Loss 7.0390   LearningRate 0.0457   Epoch: 6   Global Step: 32750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:35:56,413-Speed 3413.18 samples/sec   Loss 6.9751   LearningRate 0.0457   Epoch: 6   Global Step: 32760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:35:59,401-Speed 3427.80 samples/sec   Loss 7.0066   LearningRate 0.0457   Epoch: 6   Global Step: 32770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:02,404-Speed 3411.20 samples/sec   Loss 7.2255   LearningRate 0.0457   Epoch: 6   Global Step: 32780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:05,407-Speed 3411.60 samples/sec   Loss 6.9074   LearningRate 0.0457   Epoch: 6   Global Step: 32790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:08,405-Speed 3416.24 samples/sec   Loss 7.0669   LearningRate 0.0457   Epoch: 6   Global Step: 32800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:11,419-Speed 3397.46 samples/sec   Loss 6.9675   LearningRate 0.0457   Epoch: 6   Global Step: 32810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:14,424-Speed 3409.41 samples/sec   Loss 7.0120   LearningRate 0.0456   Epoch: 6   Global Step: 32820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:17,426-Speed 3412.26 samples/sec   Loss 7.1515   LearningRate 0.0456   Epoch: 6   Global Step: 32830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:20,423-Speed 3417.91 samples/sec   Loss 6.9263   LearningRate 0.0456   Epoch: 6   Global Step: 32840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:23,421-Speed 3415.63 samples/sec   Loss 6.9851   LearningRate 0.0456   Epoch: 6   Global Step: 32850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:26,429-Speed 3404.80 samples/sec   Loss 7.0193   LearningRate 0.0456   Epoch: 6   Global Step: 32860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:29,453-Speed 3388.50 samples/sec   Loss 7.1124   LearningRate 0.0456   Epoch: 6   Global Step: 32870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:36:32,461-Speed 3404.65 samples/sec   Loss 7.2273   LearningRate 0.0456   Epoch: 6   Global Step: 32880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:36:35,444-Speed 3434.32 samples/sec   Loss 7.0682   LearningRate 0.0455   Epoch: 6   Global Step: 32890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:38,445-Speed 3412.01 samples/sec   Loss 7.1304   LearningRate 0.0455   Epoch: 6   Global Step: 32900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:41,448-Speed 3411.56 samples/sec   Loss 6.8694   LearningRate 0.0455   Epoch: 6   Global Step: 32910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:44,449-Speed 3412.84 samples/sec   Loss 7.0216   LearningRate 0.0455   Epoch: 6   Global Step: 32920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:47,447-Speed 3416.93 samples/sec   Loss 6.9870   LearningRate 0.0455   Epoch: 6   Global Step: 32930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:50,448-Speed 3413.30 samples/sec   Loss 6.8657   LearningRate 0.0455   Epoch: 6   Global Step: 32940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:53,464-Speed 3395.12 samples/sec   Loss 6.9753   LearningRate 0.0455   Epoch: 6   Global Step: 32950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:56,469-Speed 3408.65 samples/sec   Loss 7.2357   LearningRate 0.0455   Epoch: 6   Global Step: 32960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:36:59,471-Speed 3412.51 samples/sec   Loss 6.9005   LearningRate 0.0454   Epoch: 6   Global Step: 32970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:37:02,501-Speed 3379.89 samples/sec   Loss 6.9470   LearningRate 0.0454   Epoch: 6   Global Step: 32980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:37:05,502-Speed 3413.25 samples/sec   Loss 6.9887   LearningRate 0.0454   Epoch: 6   Global Step: 32990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:37:08,511-Speed 3403.95 samples/sec   Loss 7.0278   LearningRate 0.0454   Epoch: 6   Global Step: 33000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:37:11,510-Speed 3415.90 samples/sec   Loss 6.8820   LearningRate 0.0454   Epoch: 6   Global Step: 33010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:37:14,509-Speed 3415.47 samples/sec   Loss 6.9603   LearningRate 0.0454   Epoch: 6   Global Step: 33020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:37:17,509-Speed 3413.70 samples/sec   Loss 6.7644   LearningRate 0.0454   Epoch: 6   Global Step: 33030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:37:20,515-Speed 3408.25 samples/sec   Loss 7.0835   LearningRate 0.0453   Epoch: 6   Global Step: 33040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:37:23,536-Speed 3389.95 samples/sec   Loss 6.9687   LearningRate 0.0453   Epoch: 6   Global Step: 33050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:37:26,648-Speed 3291.40 samples/sec   Loss 7.0444   LearningRate 0.0453   Epoch: 6   Global Step: 33060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:37:29,651-Speed 3410.95 samples/sec   Loss 7.0598   LearningRate 0.0453   Epoch: 6   Global Step: 33070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:37:32,650-Speed 3414.60 samples/sec   Loss 6.8764   LearningRate 0.0453   Epoch: 6   Global Step: 33080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:37:35,634-Speed 3433.17 samples/sec   Loss 7.0009   LearningRate 0.0453   Epoch: 6   Global Step: 33090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:37:38,638-Speed 3409.28 samples/sec   Loss 7.0919   LearningRate 0.0453   Epoch: 6   Global Step: 33100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:37:41,622-Speed 3432.88 samples/sec   Loss 7.0442   LearningRate 0.0453   Epoch: 6   Global Step: 33110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:37:44,618-Speed 3418.92 samples/sec   Loss 6.9576   LearningRate 0.0452   Epoch: 6   Global Step: 33120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:37:47,623-Speed 3408.51 samples/sec   Loss 7.0320   LearningRate 0.0452   Epoch: 6   Global Step: 33130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:37:50,623-Speed 3414.64 samples/sec   Loss 7.0364   LearningRate 0.0452   Epoch: 6   Global Step: 33140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:37:53,624-Speed 3412.42 samples/sec   Loss 7.0022   LearningRate 0.0452   Epoch: 6   Global Step: 33150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:37:56,630-Speed 3407.59 samples/sec   Loss 7.0403   LearningRate 0.0452   Epoch: 6   Global Step: 33160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:37:59,638-Speed 3405.15 samples/sec   Loss 7.1427   LearningRate 0.0452   Epoch: 6   Global Step: 33170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:38:02,647-Speed 3403.92 samples/sec   Loss 7.0347   LearningRate 0.0452   Epoch: 6   Global Step: 33180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:38:05,665-Speed 3394.04 samples/sec   Loss 7.0464   LearningRate 0.0451   Epoch: 6   Global Step: 33190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:38:08,670-Speed 3408.74 samples/sec   Loss 6.9957   LearningRate 0.0451   Epoch: 6   Global Step: 33200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:38:11,666-Speed 3417.91 samples/sec   Loss 7.1292   LearningRate 0.0451   Epoch: 6   Global Step: 33210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:14,667-Speed 3414.09 samples/sec   Loss 7.0895   LearningRate 0.0451   Epoch: 6   Global Step: 33220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:17,700-Speed 3375.94 samples/sec   Loss 7.0054   LearningRate 0.0451   Epoch: 6   Global Step: 33230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:20,701-Speed 3413.57 samples/sec   Loss 6.8486   LearningRate 0.0451   Epoch: 6   Global Step: 33240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:23,710-Speed 3404.76 samples/sec   Loss 6.9449   LearningRate 0.0451   Epoch: 6   Global Step: 33250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:26,716-Speed 3407.15 samples/sec   Loss 6.9350   LearningRate 0.0451   Epoch: 6   Global Step: 33260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:29,716-Speed 3414.49 samples/sec   Loss 7.1929   LearningRate 0.0450   Epoch: 6   Global Step: 33270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:32,730-Speed 3398.26 samples/sec   Loss 6.8320   LearningRate 0.0450   Epoch: 6   Global Step: 33280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:35,731-Speed 3413.14 samples/sec   Loss 7.0418   LearningRate 0.0450   Epoch: 6   Global Step: 33290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:38,731-Speed 3414.42 samples/sec   Loss 7.2367   LearningRate 0.0450   Epoch: 6   Global Step: 33300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:41,712-Speed 3435.21 samples/sec   Loss 7.0376   LearningRate 0.0450   Epoch: 6   Global Step: 33310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:44,718-Speed 3408.12 samples/sec   Loss 6.8260   LearningRate 0.0450   Epoch: 6   Global Step: 33320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:47,721-Speed 3410.33 samples/sec   Loss 6.9688   LearningRate 0.0450   Epoch: 6   Global Step: 33330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:50,718-Speed 3418.09 samples/sec   Loss 7.0408   LearningRate 0.0449   Epoch: 6   Global Step: 33340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:53,726-Speed 3405.80 samples/sec   Loss 7.1424   LearningRate 0.0449   Epoch: 6   Global Step: 33350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:56,723-Speed 3417.72 samples/sec   Loss 6.9759   LearningRate 0.0449   Epoch: 6   Global Step: 33360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:38:59,729-Speed 3406.87 samples/sec   Loss 7.1042   LearningRate 0.0449   Epoch: 6   Global Step: 33370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:39:02,766-Speed 3372.64 samples/sec   Loss 6.9330   LearningRate 0.0449   Epoch: 6   Global Step: 33380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:39:05,761-Speed 3420.81 samples/sec   Loss 6.7784   LearningRate 0.0449   Epoch: 6   Global Step: 33390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:08,763-Speed 3412.20 samples/sec   Loss 7.0634   LearningRate 0.0449   Epoch: 6   Global Step: 33400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:11,769-Speed 3407.10 samples/sec   Loss 7.0007   LearningRate 0.0449   Epoch: 6   Global Step: 33410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:14,766-Speed 3417.34 samples/sec   Loss 6.7962   LearningRate 0.0448   Epoch: 6   Global Step: 33420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:17,785-Speed 3392.28 samples/sec   Loss 6.9160   LearningRate 0.0448   Epoch: 6   Global Step: 33430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:20,787-Speed 3412.20 samples/sec   Loss 7.0928   LearningRate 0.0448   Epoch: 6   Global Step: 33440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:23,801-Speed 3399.16 samples/sec   Loss 7.0606   LearningRate 0.0448   Epoch: 6   Global Step: 33450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:26,805-Speed 3408.63 samples/sec   Loss 6.8696   LearningRate 0.0448   Epoch: 6   Global Step: 33460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:29,809-Speed 3409.67 samples/sec   Loss 7.0196   LearningRate 0.0448   Epoch: 6   Global Step: 33470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:32,816-Speed 3406.35 samples/sec   Loss 6.9083   LearningRate 0.0448   Epoch: 6   Global Step: 33480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:35,815-Speed 3415.42 samples/sec   Loss 6.9869   LearningRate 0.0447   Epoch: 6   Global Step: 33490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:39:38,821-Speed 3407.66 samples/sec   Loss 6.9363   LearningRate 0.0447   Epoch: 6   Global Step: 33500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:39:41,818-Speed 3416.81 samples/sec   Loss 6.6461   LearningRate 0.0447   Epoch: 6   Global Step: 33510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:44,829-Speed 3402.59 samples/sec   Loss 6.9412   LearningRate 0.0447   Epoch: 6   Global Step: 33520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:47,857-Speed 3382.71 samples/sec   Loss 6.8683   LearningRate 0.0447   Epoch: 6   Global Step: 33530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:50,888-Speed 3379.84 samples/sec   Loss 6.9104   LearningRate 0.0447   Epoch: 6   Global Step: 33540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:53,921-Speed 3375.78 samples/sec   Loss 6.8435   LearningRate 0.0447   Epoch: 6   Global Step: 33550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:56,926-Speed 3409.01 samples/sec   Loss 6.9069   LearningRate 0.0447   Epoch: 6   Global Step: 33560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:39:59,923-Speed 3417.37 samples/sec   Loss 6.9309   LearningRate 0.0446   Epoch: 6   Global Step: 33570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:40:02,924-Speed 3413.32 samples/sec   Loss 7.0541   LearningRate 0.0446   Epoch: 6   Global Step: 33580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:40:05,924-Speed 3414.56 samples/sec   Loss 7.0744   LearningRate 0.0446   Epoch: 6   Global Step: 33590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:40:08,936-Speed 3400.43 samples/sec   Loss 7.0243   LearningRate 0.0446   Epoch: 6   Global Step: 33600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:40:11,946-Speed 3402.92 samples/sec   Loss 6.9988   LearningRate 0.0446   Epoch: 6   Global Step: 33610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:40:14,945-Speed 3415.64 samples/sec   Loss 6.8876   LearningRate 0.0446   Epoch: 6   Global Step: 33620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:40:17,946-Speed 3412.65 samples/sec   Loss 6.9989   LearningRate 0.0446   Epoch: 6   Global Step: 33630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:40:20,982-Speed 3374.43 samples/sec   Loss 6.8537   LearningRate 0.0445   Epoch: 6   Global Step: 33640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:40:23,980-Speed 3416.18 samples/sec   Loss 6.9997   LearningRate 0.0445   Epoch: 6   Global Step: 33650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:40:26,985-Speed 3408.62 samples/sec   Loss 7.1859   LearningRate 0.0445   Epoch: 6   Global Step: 33660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:40:30,151-Speed 3235.81 samples/sec   Loss 6.9085   LearningRate 0.0445   Epoch: 6   Global Step: 33670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:40:33,192-Speed 3368.10 samples/sec   Loss 6.9794   LearningRate 0.0445   Epoch: 6   Global Step: 33680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:40:36,193-Speed 3413.53 samples/sec   Loss 6.8710   LearningRate 0.0445   Epoch: 6   Global Step: 33690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:40:39,195-Speed 3411.21 samples/sec   Loss 6.9430   LearningRate 0.0445   Epoch: 6   Global Step: 33700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:40:42,202-Speed 3406.48 samples/sec   Loss 6.9791   LearningRate 0.0445   Epoch: 6   Global Step: 33710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:40:45,183-Speed 3436.95 samples/sec   Loss 6.7964   LearningRate 0.0444   Epoch: 6   Global Step: 33720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:40:48,183-Speed 3413.56 samples/sec   Loss 7.0215   LearningRate 0.0444   Epoch: 6   Global Step: 33730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:40:51,197-Speed 3398.27 samples/sec   Loss 6.9212   LearningRate 0.0444   Epoch: 6   Global Step: 33740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:40:54,201-Speed 3410.77 samples/sec   Loss 6.9546   LearningRate 0.0444   Epoch: 6   Global Step: 33750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:40:57,201-Speed 3413.24 samples/sec   Loss 6.8991   LearningRate 0.0444   Epoch: 6   Global Step: 33760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:00,218-Speed 3395.53 samples/sec   Loss 6.9572   LearningRate 0.0444   Epoch: 6   Global Step: 33770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:03,241-Speed 3388.86 samples/sec   Loss 7.0678   LearningRate 0.0444   Epoch: 6   Global Step: 33780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:06,244-Speed 3410.18 samples/sec   Loss 6.9590   LearningRate 0.0444   Epoch: 6   Global Step: 33790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:09,250-Speed 3407.23 samples/sec   Loss 7.0040   LearningRate 0.0443   Epoch: 6   Global Step: 33800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:12,261-Speed 3402.64 samples/sec   Loss 6.9821   LearningRate 0.0443   Epoch: 6   Global Step: 33810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:15,270-Speed 3403.40 samples/sec   Loss 6.9963   LearningRate 0.0443   Epoch: 6   Global Step: 33820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:41:18,340-Speed 3336.69 samples/sec   Loss 6.7767   LearningRate 0.0443   Epoch: 6   Global Step: 33830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:21,369-Speed 3381.65 samples/sec   Loss 6.8697   LearningRate 0.0443   Epoch: 6   Global Step: 33840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:24,368-Speed 3415.14 samples/sec   Loss 6.7946   LearningRate 0.0443   Epoch: 6   Global Step: 33850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:27,373-Speed 3408.59 samples/sec   Loss 6.8765   LearningRate 0.0443   Epoch: 6   Global Step: 33860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:30,379-Speed 3407.88 samples/sec   Loss 6.9247   LearningRate 0.0442   Epoch: 6   Global Step: 33870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:33,418-Speed 3370.25 samples/sec   Loss 6.9150   LearningRate 0.0442   Epoch: 6   Global Step: 33880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:36,423-Speed 3409.03 samples/sec   Loss 6.9565   LearningRate 0.0442   Epoch: 6   Global Step: 33890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:39,430-Speed 3405.29 samples/sec   Loss 6.9089   LearningRate 0.0442   Epoch: 6   Global Step: 33900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:42,443-Speed 3400.18 samples/sec   Loss 6.9171   LearningRate 0.0442   Epoch: 6   Global Step: 33910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:45,443-Speed 3414.46 samples/sec   Loss 6.8557   LearningRate 0.0442   Epoch: 6   Global Step: 33920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:41:48,445-Speed 3412.01 samples/sec   Loss 6.9884   LearningRate 0.0442   Epoch: 6   Global Step: 33930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:41:51,444-Speed 3415.28 samples/sec   Loss 6.8366   LearningRate 0.0442   Epoch: 6   Global Step: 33940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:41:54,514-Speed 3336.71 samples/sec   Loss 6.9998   LearningRate 0.0441   Epoch: 6   Global Step: 33950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:41:57,513-Speed 3415.25 samples/sec   Loss 6.7315   LearningRate 0.0441   Epoch: 6   Global Step: 33960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:42:00,515-Speed 3412.37 samples/sec   Loss 6.8960   LearningRate 0.0441   Epoch: 6   Global Step: 33970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:42:03,540-Speed 3386.17 samples/sec   Loss 6.8718   LearningRate 0.0441   Epoch: 6   Global Step: 33980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:42:06,542-Speed 3412.02 samples/sec   Loss 7.0745   LearningRate 0.0441   Epoch: 6   Global Step: 33990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:42:09,544-Speed 3412.14 samples/sec   Loss 6.9817   LearningRate 0.0441   Epoch: 6   Global Step: 34000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:42:54,245-[lfw][34000]XNorm: 21.288355
Training: 2022-04-11 02:42:54,245-[lfw][34000]Accuracy-Flip: 0.99733+-0.00186
Training: 2022-04-11 02:42:54,246-[lfw][34000]Accuracy-Highest: 0.99733
Training: 2022-04-11 02:43:45,950-[cfp_fp][34000]XNorm: 19.032494
Training: 2022-04-11 02:43:45,950-[cfp_fp][34000]Accuracy-Flip: 0.96771+-0.00865
Training: 2022-04-11 02:43:45,951-[cfp_fp][34000]Accuracy-Highest: 0.97057
Training: 2022-04-11 02:44:30,339-[agedb_30][34000]XNorm: 21.578682
Training: 2022-04-11 02:44:30,340-[agedb_30][34000]Accuracy-Flip: 0.97650+-0.00740
Training: 2022-04-11 02:44:30,340-[agedb_30][34000]Accuracy-Highest: 0.97750
Training: 2022-04-11 02:44:33,359-Speed 71.20 samples/sec   Loss 6.8706   LearningRate 0.0441   Epoch: 6   Global Step: 34010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:44:36,346-Speed 3428.63 samples/sec   Loss 6.8656   LearningRate 0.0440   Epoch: 6   Global Step: 34020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:44:39,322-Speed 3442.17 samples/sec   Loss 6.9866   LearningRate 0.0440   Epoch: 6   Global Step: 34030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:44:42,315-Speed 3421.80 samples/sec   Loss 7.0167   LearningRate 0.0440   Epoch: 6   Global Step: 34040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:44:45,305-Speed 3425.56 samples/sec   Loss 6.9783   LearningRate 0.0440   Epoch: 6   Global Step: 34050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:44:48,299-Speed 3422.09 samples/sec   Loss 6.8672   LearningRate 0.0440   Epoch: 6   Global Step: 34060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:44:51,319-Speed 3390.49 samples/sec   Loss 7.0306   LearningRate 0.0440   Epoch: 6   Global Step: 34070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:44:54,437-Speed 3285.51 samples/sec   Loss 6.8644   LearningRate 0.0440   Epoch: 6   Global Step: 34080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:44:57,442-Speed 3408.73 samples/sec   Loss 7.0510   LearningRate 0.0440   Epoch: 6   Global Step: 34090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:45:00,451-Speed 3403.77 samples/sec   Loss 6.8768   LearningRate 0.0439   Epoch: 6   Global Step: 34100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:45:03,449-Speed 3416.83 samples/sec   Loss 6.8815   LearningRate 0.0439   Epoch: 6   Global Step: 34110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:45:06,443-Speed 3420.99 samples/sec   Loss 6.9854   LearningRate 0.0439   Epoch: 6   Global Step: 34120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:45:09,417-Speed 3444.28 samples/sec   Loss 6.9022   LearningRate 0.0439   Epoch: 6   Global Step: 34130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:45:12,414-Speed 3418.14 samples/sec   Loss 6.9303   LearningRate 0.0439   Epoch: 6   Global Step: 34140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:45:15,429-Speed 3396.38 samples/sec   Loss 6.8531   LearningRate 0.0439   Epoch: 6   Global Step: 34150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:45:18,432-Speed 3411.10 samples/sec   Loss 7.0344   LearningRate 0.0439   Epoch: 6   Global Step: 34160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:45:21,408-Speed 3442.31 samples/sec   Loss 6.9760   LearningRate 0.0439   Epoch: 6   Global Step: 34170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:45:24,384-Speed 3441.15 samples/sec   Loss 6.9399   LearningRate 0.0438   Epoch: 6   Global Step: 34180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:45:27,399-Speed 3397.37 samples/sec   Loss 6.9242   LearningRate 0.0438   Epoch: 6   Global Step: 34190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:45:30,394-Speed 3419.96 samples/sec   Loss 6.7526   LearningRate 0.0438   Epoch: 6   Global Step: 34200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:45:33,388-Speed 3421.74 samples/sec   Loss 6.8754   LearningRate 0.0438   Epoch: 6   Global Step: 34210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:45:36,381-Speed 3422.17 samples/sec   Loss 6.8086   LearningRate 0.0438   Epoch: 6   Global Step: 34220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:45:39,385-Speed 3409.16 samples/sec   Loss 6.8958   LearningRate 0.0438   Epoch: 6   Global Step: 34230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:45:42,383-Speed 3417.22 samples/sec   Loss 6.7048   LearningRate 0.0438   Epoch: 6   Global Step: 34240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:45:45,383-Speed 3413.76 samples/sec   Loss 6.8463   LearningRate 0.0437   Epoch: 6   Global Step: 34250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:45:48,471-Speed 3317.12 samples/sec   Loss 6.7791   LearningRate 0.0437   Epoch: 6   Global Step: 34260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:45:51,466-Speed 3420.58 samples/sec   Loss 6.9527   LearningRate 0.0437   Epoch: 6   Global Step: 34270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:45:54,467-Speed 3412.93 samples/sec   Loss 6.8384   LearningRate 0.0437   Epoch: 6   Global Step: 34280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:45:57,475-Speed 3405.62 samples/sec   Loss 6.9797   LearningRate 0.0437   Epoch: 6   Global Step: 34290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:00,471-Speed 3418.97 samples/sec   Loss 6.7500   LearningRate 0.0437   Epoch: 6   Global Step: 34300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:03,485-Speed 3398.36 samples/sec   Loss 6.8950   LearningRate 0.0437   Epoch: 6   Global Step: 34310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:06,501-Speed 3396.76 samples/sec   Loss 6.8918   LearningRate 0.0437   Epoch: 6   Global Step: 34320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:09,502-Speed 3412.16 samples/sec   Loss 7.0267   LearningRate 0.0436   Epoch: 6   Global Step: 34330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:12,503-Speed 3413.94 samples/sec   Loss 7.0018   LearningRate 0.0436   Epoch: 6   Global Step: 34340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:15,495-Speed 3423.02 samples/sec   Loss 6.9221   LearningRate 0.0436   Epoch: 6   Global Step: 34350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:18,508-Speed 3400.43 samples/sec   Loss 6.9588   LearningRate 0.0436   Epoch: 6   Global Step: 34360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:21,506-Speed 3416.13 samples/sec   Loss 6.8129   LearningRate 0.0436   Epoch: 6   Global Step: 34370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:24,506-Speed 3414.19 samples/sec   Loss 7.0657   LearningRate 0.0436   Epoch: 6   Global Step: 34380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:27,553-Speed 3362.41 samples/sec   Loss 6.9462   LearningRate 0.0436   Epoch: 6   Global Step: 34390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:30,562-Speed 3403.70 samples/sec   Loss 6.9997   LearningRate 0.0436   Epoch: 6   Global Step: 34400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:33,556-Speed 3420.58 samples/sec   Loss 7.0588   LearningRate 0.0435   Epoch: 6   Global Step: 34410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:36,553-Speed 3418.49 samples/sec   Loss 6.8210   LearningRate 0.0435   Epoch: 6   Global Step: 34420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:39,551-Speed 3416.01 samples/sec   Loss 6.9472   LearningRate 0.0435   Epoch: 6   Global Step: 34430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:42,548-Speed 3417.80 samples/sec   Loss 6.8983   LearningRate 0.0435   Epoch: 6   Global Step: 34440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:45,548-Speed 3414.15 samples/sec   Loss 6.8208   LearningRate 0.0435   Epoch: 6   Global Step: 34450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:48,555-Speed 3407.39 samples/sec   Loss 6.9031   LearningRate 0.0435   Epoch: 6   Global Step: 34460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:51,561-Speed 3407.37 samples/sec   Loss 6.8352   LearningRate 0.0435   Epoch: 6   Global Step: 34470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:46:54,571-Speed 3402.04 samples/sec   Loss 6.9446   LearningRate 0.0434   Epoch: 6   Global Step: 34480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:46:57,548-Speed 3441.03 samples/sec   Loss 6.7550   LearningRate 0.0434   Epoch: 6   Global Step: 34490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:00,544-Speed 3419.56 samples/sec   Loss 6.7764   LearningRate 0.0434   Epoch: 6   Global Step: 34500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:03,543-Speed 3414.97 samples/sec   Loss 6.7780   LearningRate 0.0434   Epoch: 6   Global Step: 34510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:06,550-Speed 3405.69 samples/sec   Loss 6.8201   LearningRate 0.0434   Epoch: 6   Global Step: 34520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:09,548-Speed 3416.98 samples/sec   Loss 6.8963   LearningRate 0.0434   Epoch: 6   Global Step: 34530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:12,553-Speed 3408.56 samples/sec   Loss 6.8698   LearningRate 0.0434   Epoch: 6   Global Step: 34540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:15,554-Speed 3412.78 samples/sec   Loss 6.8873   LearningRate 0.0434   Epoch: 6   Global Step: 34550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:18,609-Speed 3352.65 samples/sec   Loss 6.8364   LearningRate 0.0433   Epoch: 6   Global Step: 34560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:21,611-Speed 3412.36 samples/sec   Loss 6.7851   LearningRate 0.0433   Epoch: 6   Global Step: 34570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:24,613-Speed 3412.05 samples/sec   Loss 6.8270   LearningRate 0.0433   Epoch: 6   Global Step: 34580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:27,617-Speed 3410.59 samples/sec   Loss 6.8285   LearningRate 0.0433   Epoch: 6   Global Step: 34590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:47:30,599-Speed 3433.97 samples/sec   Loss 6.8552   LearningRate 0.0433   Epoch: 6   Global Step: 34600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:33,600-Speed 3413.78 samples/sec   Loss 6.8319   LearningRate 0.0433   Epoch: 6   Global Step: 34610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:36,602-Speed 3412.65 samples/sec   Loss 6.8491   LearningRate 0.0433   Epoch: 6   Global Step: 34620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:39,598-Speed 3418.05 samples/sec   Loss 6.8673   LearningRate 0.0433   Epoch: 6   Global Step: 34630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:42,599-Speed 3412.76 samples/sec   Loss 6.7977   LearningRate 0.0432   Epoch: 6   Global Step: 34640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:45,597-Speed 3416.89 samples/sec   Loss 6.8729   LearningRate 0.0432   Epoch: 6   Global Step: 34650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:48,600-Speed 3411.24 samples/sec   Loss 6.9714   LearningRate 0.0432   Epoch: 6   Global Step: 34660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:51,616-Speed 3395.62 samples/sec   Loss 6.9780   LearningRate 0.0432   Epoch: 6   Global Step: 34670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:54,615-Speed 3415.60 samples/sec   Loss 6.8577   LearningRate 0.0432   Epoch: 6   Global Step: 34680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:47:57,616-Speed 3412.69 samples/sec   Loss 6.9390   LearningRate 0.0432   Epoch: 6   Global Step: 34690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:48:00,616-Speed 3413.91 samples/sec   Loss 6.8856   LearningRate 0.0432   Epoch: 6   Global Step: 34700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:48:03,639-Speed 3388.83 samples/sec   Loss 6.7580   LearningRate 0.0431   Epoch: 6   Global Step: 34710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:48:06,642-Speed 3410.83 samples/sec   Loss 6.8326   LearningRate 0.0431   Epoch: 6   Global Step: 34720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:48:09,647-Speed 3408.96 samples/sec   Loss 6.9340   LearningRate 0.0431   Epoch: 6   Global Step: 34730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:48:12,659-Speed 3400.41 samples/sec   Loss 6.8034   LearningRate 0.0431   Epoch: 6   Global Step: 34740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:48:15,683-Speed 3387.43 samples/sec   Loss 6.9499   LearningRate 0.0431   Epoch: 6   Global Step: 34750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:48:18,704-Speed 3390.63 samples/sec   Loss 6.7653   LearningRate 0.0431   Epoch: 6   Global Step: 34760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:48:21,706-Speed 3412.31 samples/sec   Loss 6.7568   LearningRate 0.0431   Epoch: 6   Global Step: 34770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:48:24,705-Speed 3415.16 samples/sec   Loss 6.7396   LearningRate 0.0431   Epoch: 6   Global Step: 34780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:48:27,708-Speed 3410.38 samples/sec   Loss 6.9691   LearningRate 0.0430   Epoch: 6   Global Step: 34790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:48:30,716-Speed 3405.58 samples/sec   Loss 6.8111   LearningRate 0.0430   Epoch: 6   Global Step: 34800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:48:33,715-Speed 3415.32 samples/sec   Loss 6.9167   LearningRate 0.0430   Epoch: 6   Global Step: 34810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:48:36,734-Speed 3392.35 samples/sec   Loss 6.8547   LearningRate 0.0430   Epoch: 6   Global Step: 34820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:48:39,737-Speed 3410.84 samples/sec   Loss 6.8562   LearningRate 0.0430   Epoch: 6   Global Step: 34830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:48:42,809-Speed 3334.37 samples/sec   Loss 6.7185   LearningRate 0.0430   Epoch: 6   Global Step: 34840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:48:45,825-Speed 3395.86 samples/sec   Loss 6.8812   LearningRate 0.0430   Epoch: 6   Global Step: 34850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:48:48,793-Speed 3450.81 samples/sec   Loss 6.8887   LearningRate 0.0430   Epoch: 6   Global Step: 34860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:48:51,837-Speed 3365.76 samples/sec   Loss 6.8114   LearningRate 0.0429   Epoch: 6   Global Step: 34870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:48:55,011-Speed 3226.22 samples/sec   Loss 6.8815   LearningRate 0.0429   Epoch: 6   Global Step: 34880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:48:58,012-Speed 3413.30 samples/sec   Loss 6.7817   LearningRate 0.0429   Epoch: 6   Global Step: 34890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:49:01,027-Speed 3397.39 samples/sec   Loss 6.7677   LearningRate 0.0429   Epoch: 6   Global Step: 34900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:49:04,158-Speed 3270.83 samples/sec   Loss 6.8419   LearningRate 0.0429   Epoch: 6   Global Step: 34910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:49:07,178-Speed 3392.37 samples/sec   Loss 6.8547   LearningRate 0.0429   Epoch: 6   Global Step: 34920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:49:10,179-Speed 3412.71 samples/sec   Loss 6.8524   LearningRate 0.0429   Epoch: 6   Global Step: 34930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:49:13,179-Speed 3414.68 samples/sec   Loss 6.9294   LearningRate 0.0429   Epoch: 6   Global Step: 34940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:49:16,228-Speed 3359.08 samples/sec   Loss 6.9536   LearningRate 0.0428   Epoch: 6   Global Step: 34950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:49:19,234-Speed 3406.81 samples/sec   Loss 6.7245   LearningRate 0.0428   Epoch: 6   Global Step: 34960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:49:22,236-Speed 3412.29 samples/sec   Loss 6.9334   LearningRate 0.0428   Epoch: 6   Global Step: 34970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:49:25,245-Speed 3404.18 samples/sec   Loss 6.9125   LearningRate 0.0428   Epoch: 6   Global Step: 34980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:49:28,250-Speed 3408.91 samples/sec   Loss 6.7984   LearningRate 0.0428   Epoch: 6   Global Step: 34990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:49:31,253-Speed 3410.53 samples/sec   Loss 6.8732   LearningRate 0.0428   Epoch: 6   Global Step: 35000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:49:34,256-Speed 3410.09 samples/sec   Loss 6.9140   LearningRate 0.0428   Epoch: 6   Global Step: 35010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:49:37,270-Speed 3399.68 samples/sec   Loss 6.7049   LearningRate 0.0427   Epoch: 6   Global Step: 35020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:49:40,272-Speed 3411.50 samples/sec   Loss 6.6675   LearningRate 0.0427   Epoch: 6   Global Step: 35030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:49:43,279-Speed 3406.51 samples/sec   Loss 6.6535   LearningRate 0.0427   Epoch: 6   Global Step: 35040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:49:46,282-Speed 3410.16 samples/sec   Loss 6.9461   LearningRate 0.0427   Epoch: 6   Global Step: 35050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:49:49,316-Speed 3375.84 samples/sec   Loss 6.8942   LearningRate 0.0427   Epoch: 6   Global Step: 35060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:49:52,359-Speed 3366.27 samples/sec   Loss 6.8236   LearningRate 0.0427   Epoch: 6   Global Step: 35070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:49:55,364-Speed 3408.28 samples/sec   Loss 6.8984   LearningRate 0.0427   Epoch: 6   Global Step: 35080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:49:58,367-Speed 3410.64 samples/sec   Loss 6.7898   LearningRate 0.0427   Epoch: 6   Global Step: 35090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:50:01,393-Speed 3385.18 samples/sec   Loss 6.7952   LearningRate 0.0426   Epoch: 6   Global Step: 35100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:50:04,388-Speed 3420.45 samples/sec   Loss 6.8491   LearningRate 0.0426   Epoch: 6   Global Step: 35110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:07,393-Speed 3408.05 samples/sec   Loss 6.7908   LearningRate 0.0426   Epoch: 6   Global Step: 35120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:10,399-Speed 3407.87 samples/sec   Loss 6.7881   LearningRate 0.0426   Epoch: 6   Global Step: 35130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:13,402-Speed 3410.99 samples/sec   Loss 6.8028   LearningRate 0.0426   Epoch: 6   Global Step: 35140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:16,411-Speed 3403.19 samples/sec   Loss 6.7990   LearningRate 0.0426   Epoch: 6   Global Step: 35150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:19,415-Speed 3409.98 samples/sec   Loss 6.8962   LearningRate 0.0426   Epoch: 6   Global Step: 35160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:22,421-Speed 3406.51 samples/sec   Loss 6.7527   LearningRate 0.0426   Epoch: 6   Global Step: 35170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:25,430-Speed 3403.91 samples/sec   Loss 6.8950   LearningRate 0.0425   Epoch: 6   Global Step: 35180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:28,453-Speed 3389.27 samples/sec   Loss 6.9354   LearningRate 0.0425   Epoch: 6   Global Step: 35190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:31,457-Speed 3409.60 samples/sec   Loss 6.8088   LearningRate 0.0425   Epoch: 6   Global Step: 35200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:34,465-Speed 3404.20 samples/sec   Loss 6.8298   LearningRate 0.0425   Epoch: 6   Global Step: 35210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:50:37,459-Speed 3421.58 samples/sec   Loss 6.7806   LearningRate 0.0425   Epoch: 6   Global Step: 35220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:40,477-Speed 3394.16 samples/sec   Loss 6.8443   LearningRate 0.0425   Epoch: 6   Global Step: 35230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:43,479-Speed 3411.96 samples/sec   Loss 6.9284   LearningRate 0.0425   Epoch: 6   Global Step: 35240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:46,484-Speed 3407.68 samples/sec   Loss 6.7312   LearningRate 0.0425   Epoch: 6   Global Step: 35250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:49,493-Speed 3404.83 samples/sec   Loss 6.8391   LearningRate 0.0424   Epoch: 6   Global Step: 35260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:52,530-Speed 3372.82 samples/sec   Loss 6.9360   LearningRate 0.0424   Epoch: 6   Global Step: 35270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:55,542-Speed 3399.89 samples/sec   Loss 6.8916   LearningRate 0.0424   Epoch: 6   Global Step: 35280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:50:58,568-Speed 3385.79 samples/sec   Loss 6.9492   LearningRate 0.0424   Epoch: 6   Global Step: 35290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:51:01,587-Speed 3392.41 samples/sec   Loss 6.8832   LearningRate 0.0424   Epoch: 6   Global Step: 35300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:51:04,621-Speed 3375.84 samples/sec   Loss 6.6609   LearningRate 0.0424   Epoch: 6   Global Step: 35310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:51:07,633-Speed 3400.36 samples/sec   Loss 6.7736   LearningRate 0.0424   Epoch: 6   Global Step: 35320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:51:10,636-Speed 3411.46 samples/sec   Loss 6.6943   LearningRate 0.0423   Epoch: 6   Global Step: 35330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:51:13,639-Speed 3410.88 samples/sec   Loss 6.7729   LearningRate 0.0423   Epoch: 6   Global Step: 35340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:51:16,645-Speed 3407.58 samples/sec   Loss 6.8667   LearningRate 0.0423   Epoch: 6   Global Step: 35350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:51:19,654-Speed 3403.33 samples/sec   Loss 6.6782   LearningRate 0.0423   Epoch: 6   Global Step: 35360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:51:22,661-Speed 3406.30 samples/sec   Loss 6.6471   LearningRate 0.0423   Epoch: 6   Global Step: 35370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:51:25,668-Speed 3406.64 samples/sec   Loss 6.7487   LearningRate 0.0423   Epoch: 6   Global Step: 35380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:51:28,673-Speed 3408.84 samples/sec   Loss 6.7338   LearningRate 0.0423   Epoch: 6   Global Step: 35390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:51:31,765-Speed 3313.08 samples/sec   Loss 6.7722   LearningRate 0.0423   Epoch: 6   Global Step: 35400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:51:44,061-Speed 832.82 samples/sec   Loss 6.5069   LearningRate 0.0422   Epoch: 7   Global Step: 35410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:51:47,067-Speed 3408.22 samples/sec   Loss 6.0252   LearningRate 0.0422   Epoch: 7   Global Step: 35420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:51:50,109-Speed 3366.33 samples/sec   Loss 5.9228   LearningRate 0.0422   Epoch: 7   Global Step: 35430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:51:53,203-Speed 3311.15 samples/sec   Loss 6.1060   LearningRate 0.0422   Epoch: 7   Global Step: 35440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:51:56,287-Speed 3321.05 samples/sec   Loss 6.0186   LearningRate 0.0422   Epoch: 7   Global Step: 35450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:51:59,308-Speed 3390.36 samples/sec   Loss 6.0335   LearningRate 0.0422   Epoch: 7   Global Step: 35460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:52:02,372-Speed 3343.26 samples/sec   Loss 5.9502   LearningRate 0.0422   Epoch: 7   Global Step: 35470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:52:05,444-Speed 3334.82 samples/sec   Loss 6.0494   LearningRate 0.0422   Epoch: 7   Global Step: 35480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:52:08,462-Speed 3392.98 samples/sec   Loss 5.9199   LearningRate 0.0421   Epoch: 7   Global Step: 35490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:52:11,476-Speed 3398.69 samples/sec   Loss 6.0468   LearningRate 0.0421   Epoch: 7   Global Step: 35500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:52:14,493-Speed 3394.86 samples/sec   Loss 6.0031   LearningRate 0.0421   Epoch: 7   Global Step: 35510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:52:17,518-Speed 3387.06 samples/sec   Loss 6.1066   LearningRate 0.0421   Epoch: 7   Global Step: 35520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:52:20,519-Speed 3412.46 samples/sec   Loss 6.1669   LearningRate 0.0421   Epoch: 7   Global Step: 35530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:52:23,542-Speed 3388.28 samples/sec   Loss 6.0885   LearningRate 0.0421   Epoch: 7   Global Step: 35540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:52:26,562-Speed 3392.20 samples/sec   Loss 6.2100   LearningRate 0.0421   Epoch: 7   Global Step: 35550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:52:29,579-Speed 3395.67 samples/sec   Loss 6.1601   LearningRate 0.0421   Epoch: 7   Global Step: 35560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:52:32,611-Speed 3377.88 samples/sec   Loss 6.1312   LearningRate 0.0420   Epoch: 7   Global Step: 35570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:52:35,653-Speed 3367.28 samples/sec   Loss 5.9959   LearningRate 0.0420   Epoch: 7   Global Step: 35580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:52:38,811-Speed 3242.81 samples/sec   Loss 6.1512   LearningRate 0.0420   Epoch: 7   Global Step: 35590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:52:41,830-Speed 3393.47 samples/sec   Loss 6.1117   LearningRate 0.0420   Epoch: 7   Global Step: 35600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:52:44,837-Speed 3406.15 samples/sec   Loss 6.1031   LearningRate 0.0420   Epoch: 7   Global Step: 35610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:52:47,868-Speed 3379.15 samples/sec   Loss 6.0558   LearningRate 0.0420   Epoch: 7   Global Step: 35620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:52:50,881-Speed 3398.94 samples/sec   Loss 6.0205   LearningRate 0.0420   Epoch: 7   Global Step: 35630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:52:53,895-Speed 3399.00 samples/sec   Loss 6.1123   LearningRate 0.0419   Epoch: 7   Global Step: 35640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:52:56,946-Speed 3357.10 samples/sec   Loss 6.1856   LearningRate 0.0419   Epoch: 7   Global Step: 35650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:52:59,955-Speed 3404.44 samples/sec   Loss 6.2337   LearningRate 0.0419   Epoch: 7   Global Step: 35660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:53:02,969-Speed 3398.55 samples/sec   Loss 6.1321   LearningRate 0.0419   Epoch: 7   Global Step: 35670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:53:05,985-Speed 3395.57 samples/sec   Loss 6.2103   LearningRate 0.0419   Epoch: 7   Global Step: 35680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:53:08,990-Speed 3409.56 samples/sec   Loss 6.4112   LearningRate 0.0419   Epoch: 7   Global Step: 35690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:53:12,024-Speed 3375.94 samples/sec   Loss 6.2907   LearningRate 0.0419   Epoch: 7   Global Step: 35700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:53:15,081-Speed 3349.96 samples/sec   Loss 6.2468   LearningRate 0.0419   Epoch: 7   Global Step: 35710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:53:18,083-Speed 3412.20 samples/sec   Loss 6.1430   LearningRate 0.0418   Epoch: 7   Global Step: 35720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:53:21,119-Speed 3373.35 samples/sec   Loss 6.2654   LearningRate 0.0418   Epoch: 7   Global Step: 35730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:53:24,136-Speed 3396.28 samples/sec   Loss 6.2786   LearningRate 0.0418   Epoch: 7   Global Step: 35740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:53:27,170-Speed 3375.20 samples/sec   Loss 6.2086   LearningRate 0.0418   Epoch: 7   Global Step: 35750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:53:30,209-Speed 3370.71 samples/sec   Loss 6.1987   LearningRate 0.0418   Epoch: 7   Global Step: 35760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:53:33,231-Speed 3390.24 samples/sec   Loss 6.2323   LearningRate 0.0418   Epoch: 7   Global Step: 35770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:53:36,237-Speed 3407.26 samples/sec   Loss 6.3171   LearningRate 0.0418   Epoch: 7   Global Step: 35780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:53:39,246-Speed 3403.80 samples/sec   Loss 6.2057   LearningRate 0.0418   Epoch: 7   Global Step: 35790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:53:42,247-Speed 3413.66 samples/sec   Loss 6.3306   LearningRate 0.0417   Epoch: 7   Global Step: 35800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:53:45,260-Speed 3399.32 samples/sec   Loss 6.2759   LearningRate 0.0417   Epoch: 7   Global Step: 35810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:53:48,263-Speed 3411.26 samples/sec   Loss 6.1390   LearningRate 0.0417   Epoch: 7   Global Step: 35820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:53:51,281-Speed 3394.53 samples/sec   Loss 6.3033   LearningRate 0.0417   Epoch: 7   Global Step: 35830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:53:54,293-Speed 3400.31 samples/sec   Loss 6.3229   LearningRate 0.0417   Epoch: 7   Global Step: 35840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:53:57,302-Speed 3403.74 samples/sec   Loss 6.3489   LearningRate 0.0417   Epoch: 7   Global Step: 35850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:54:00,310-Speed 3405.44 samples/sec   Loss 6.3835   LearningRate 0.0417   Epoch: 7   Global Step: 35860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:54:03,315-Speed 3408.87 samples/sec   Loss 6.3590   LearningRate 0.0417   Epoch: 7   Global Step: 35870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:54:06,339-Speed 3386.54 samples/sec   Loss 6.3654   LearningRate 0.0416   Epoch: 7   Global Step: 35880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:54:09,357-Speed 3394.25 samples/sec   Loss 6.2944   LearningRate 0.0416   Epoch: 7   Global Step: 35890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:54:12,367-Speed 3402.58 samples/sec   Loss 6.3397   LearningRate 0.0416   Epoch: 7   Global Step: 35900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:54:15,377-Speed 3403.00 samples/sec   Loss 6.2565   LearningRate 0.0416   Epoch: 7   Global Step: 35910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:54:18,397-Speed 3391.48 samples/sec   Loss 6.3273   LearningRate 0.0416   Epoch: 7   Global Step: 35920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 02:54:21,419-Speed 3389.34 samples/sec   Loss 6.3876   LearningRate 0.0416   Epoch: 7   Global Step: 35930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:54:24,425-Speed 3406.84 samples/sec   Loss 6.3249   LearningRate 0.0416   Epoch: 7   Global Step: 35940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:54:27,458-Speed 3377.64 samples/sec   Loss 6.3180   LearningRate 0.0416   Epoch: 7   Global Step: 35950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:54:30,595-Speed 3265.08 samples/sec   Loss 6.2536   LearningRate 0.0415   Epoch: 7   Global Step: 35960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:54:33,620-Speed 3386.80 samples/sec   Loss 6.3569   LearningRate 0.0415   Epoch: 7   Global Step: 35970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:54:36,630-Speed 3401.86 samples/sec   Loss 6.3139   LearningRate 0.0415   Epoch: 7   Global Step: 35980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:54:39,639-Speed 3404.37 samples/sec   Loss 6.2115   LearningRate 0.0415   Epoch: 7   Global Step: 35990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:54:42,650-Speed 3402.25 samples/sec   Loss 6.3805   LearningRate 0.0415   Epoch: 7   Global Step: 36000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:55:27,985-[lfw][36000]XNorm: 21.069878
Training: 2022-04-11 02:55:27,986-[lfw][36000]Accuracy-Flip: 0.99767+-0.00238
Training: 2022-04-11 02:55:27,986-[lfw][36000]Accuracy-Highest: 0.99767
Training: 2022-04-11 02:56:19,929-[cfp_fp][36000]XNorm: 18.998192
Training: 2022-04-11 02:56:19,929-[cfp_fp][36000]Accuracy-Flip: 0.96900+-0.00783
Training: 2022-04-11 02:56:19,930-[cfp_fp][36000]Accuracy-Highest: 0.97057
Training: 2022-04-11 02:57:04,873-[agedb_30][36000]XNorm: 21.024379
Training: 2022-04-11 02:57:04,874-[agedb_30][36000]Accuracy-Flip: 0.97583+-0.00834
Training: 2022-04-11 02:57:04,875-[agedb_30][36000]Accuracy-Highest: 0.97750
Training: 2022-04-11 02:57:07,905-Speed 70.50 samples/sec   Loss 6.3683   LearningRate 0.0415   Epoch: 7   Global Step: 36010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:57:10,906-Speed 3413.09 samples/sec   Loss 6.2565   LearningRate 0.0415   Epoch: 7   Global Step: 36020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:57:13,898-Speed 3422.49 samples/sec   Loss 6.3893   LearningRate 0.0415   Epoch: 7   Global Step: 36030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:57:16,905-Speed 3406.57 samples/sec   Loss 6.3538   LearningRate 0.0414   Epoch: 7   Global Step: 36040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:57:19,891-Speed 3430.27 samples/sec   Loss 6.3588   LearningRate 0.0414   Epoch: 7   Global Step: 36050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:57:22,901-Speed 3402.66 samples/sec   Loss 6.3161   LearningRate 0.0414   Epoch: 7   Global Step: 36060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:57:26,030-Speed 3273.62 samples/sec   Loss 6.4180   LearningRate 0.0414   Epoch: 7   Global Step: 36070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:57:29,128-Speed 3306.84 samples/sec   Loss 6.3650   LearningRate 0.0414   Epoch: 7   Global Step: 36080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:57:32,125-Speed 3417.58 samples/sec   Loss 6.4551   LearningRate 0.0414   Epoch: 7   Global Step: 36090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:57:35,126-Speed 3412.44 samples/sec   Loss 6.2726   LearningRate 0.0414   Epoch: 7   Global Step: 36100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:57:38,185-Speed 3348.43 samples/sec   Loss 6.5275   LearningRate 0.0414   Epoch: 7   Global Step: 36110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:57:41,218-Speed 3377.16 samples/sec   Loss 6.4226   LearningRate 0.0413   Epoch: 7   Global Step: 36120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:57:44,214-Speed 3419.49 samples/sec   Loss 6.5557   LearningRate 0.0413   Epoch: 7   Global Step: 36130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:57:47,287-Speed 3332.55 samples/sec   Loss 6.4045   LearningRate 0.0413   Epoch: 7   Global Step: 36140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:57:50,288-Speed 3413.86 samples/sec   Loss 6.3963   LearningRate 0.0413   Epoch: 7   Global Step: 36150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:57:53,288-Speed 3413.81 samples/sec   Loss 6.3469   LearningRate 0.0413   Epoch: 7   Global Step: 36160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:57:56,283-Speed 3420.51 samples/sec   Loss 6.1915   LearningRate 0.0413   Epoch: 7   Global Step: 36170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:57:59,286-Speed 3410.96 samples/sec   Loss 6.6212   LearningRate 0.0413   Epoch: 7   Global Step: 36180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:02,294-Speed 3405.76 samples/sec   Loss 6.4738   LearningRate 0.0412   Epoch: 7   Global Step: 36190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:05,306-Speed 3400.01 samples/sec   Loss 6.3622   LearningRate 0.0412   Epoch: 7   Global Step: 36200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:08,304-Speed 3416.61 samples/sec   Loss 6.4286   LearningRate 0.0412   Epoch: 7   Global Step: 36210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:11,303-Speed 3415.69 samples/sec   Loss 6.3539   LearningRate 0.0412   Epoch: 7   Global Step: 36220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:14,303-Speed 3414.50 samples/sec   Loss 6.4488   LearningRate 0.0412   Epoch: 7   Global Step: 36230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:17,303-Speed 3414.42 samples/sec   Loss 6.4771   LearningRate 0.0412   Epoch: 7   Global Step: 36240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:20,306-Speed 3410.03 samples/sec   Loss 6.5489   LearningRate 0.0412   Epoch: 7   Global Step: 36250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:23,321-Speed 3397.61 samples/sec   Loss 6.5759   LearningRate 0.0412   Epoch: 7   Global Step: 36260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:26,586-Speed 3137.04 samples/sec   Loss 6.4558   LearningRate 0.0411   Epoch: 7   Global Step: 36270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:58:29,607-Speed 3390.27 samples/sec   Loss 6.4535   LearningRate 0.0411   Epoch: 7   Global Step: 36280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:58:32,611-Speed 3410.19 samples/sec   Loss 6.5383   LearningRate 0.0411   Epoch: 7   Global Step: 36290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:58:35,650-Speed 3370.74 samples/sec   Loss 6.4197   LearningRate 0.0411   Epoch: 7   Global Step: 36300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:58:38,639-Speed 3426.86 samples/sec   Loss 6.3989   LearningRate 0.0411   Epoch: 7   Global Step: 36310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:41,643-Speed 3409.26 samples/sec   Loss 6.3722   LearningRate 0.0411   Epoch: 7   Global Step: 36320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:44,695-Speed 3356.05 samples/sec   Loss 6.5769   LearningRate 0.0411   Epoch: 7   Global Step: 36330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:47,697-Speed 3412.52 samples/sec   Loss 6.3926   LearningRate 0.0411   Epoch: 7   Global Step: 36340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:50,783-Speed 3318.49 samples/sec   Loss 6.2735   LearningRate 0.0410   Epoch: 7   Global Step: 36350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:53,781-Speed 3417.16 samples/sec   Loss 6.5964   LearningRate 0.0410   Epoch: 7   Global Step: 36360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:56,792-Speed 3402.18 samples/sec   Loss 6.4750   LearningRate 0.0410   Epoch: 7   Global Step: 36370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:58:59,789-Speed 3418.00 samples/sec   Loss 6.4998   LearningRate 0.0410   Epoch: 7   Global Step: 36380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:59:02,793-Speed 3409.58 samples/sec   Loss 6.4848   LearningRate 0.0410   Epoch: 7   Global Step: 36390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:59:05,815-Speed 3389.24 samples/sec   Loss 6.3565   LearningRate 0.0410   Epoch: 7   Global Step: 36400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:59:08,814-Speed 3415.33 samples/sec   Loss 6.4543   LearningRate 0.0410   Epoch: 7   Global Step: 36410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:59:11,811-Speed 3417.49 samples/sec   Loss 6.4164   LearningRate 0.0410   Epoch: 7   Global Step: 36420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:59:14,807-Speed 3418.72 samples/sec   Loss 6.3286   LearningRate 0.0409   Epoch: 7   Global Step: 36430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:59:17,818-Speed 3402.01 samples/sec   Loss 6.4738   LearningRate 0.0409   Epoch: 7   Global Step: 36440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:59:20,826-Speed 3404.86 samples/sec   Loss 6.4288   LearningRate 0.0409   Epoch: 7   Global Step: 36450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:59:23,836-Speed 3403.38 samples/sec   Loss 6.2867   LearningRate 0.0409   Epoch: 7   Global Step: 36460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:59:26,843-Speed 3405.90 samples/sec   Loss 6.5425   LearningRate 0.0409   Epoch: 7   Global Step: 36470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:59:29,852-Speed 3404.48 samples/sec   Loss 6.4027   LearningRate 0.0409   Epoch: 7   Global Step: 36480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:59:32,853-Speed 3413.09 samples/sec   Loss 6.2872   LearningRate 0.0409   Epoch: 7   Global Step: 36490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:59:35,855-Speed 3412.10 samples/sec   Loss 6.4034   LearningRate 0.0409   Epoch: 7   Global Step: 36500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:59:38,853-Speed 3415.97 samples/sec   Loss 6.4762   LearningRate 0.0408   Epoch: 7   Global Step: 36510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:59:41,854-Speed 3413.69 samples/sec   Loss 6.5259   LearningRate 0.0408   Epoch: 7   Global Step: 36520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:59:44,861-Speed 3406.27 samples/sec   Loss 6.4211   LearningRate 0.0408   Epoch: 7   Global Step: 36530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 02:59:47,865-Speed 3409.81 samples/sec   Loss 6.5307   LearningRate 0.0408   Epoch: 7   Global Step: 36540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:59:50,864-Speed 3414.83 samples/sec   Loss 6.5906   LearningRate 0.0408   Epoch: 7   Global Step: 36550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:59:53,875-Speed 3401.64 samples/sec   Loss 6.4399   LearningRate 0.0408   Epoch: 7   Global Step: 36560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:59:56,879-Speed 3410.69 samples/sec   Loss 6.5541   LearningRate 0.0408   Epoch: 7   Global Step: 36570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 02:59:59,879-Speed 3414.02 samples/sec   Loss 6.5201   LearningRate 0.0408   Epoch: 7   Global Step: 36580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:00:02,891-Speed 3400.85 samples/sec   Loss 6.5007   LearningRate 0.0407   Epoch: 7   Global Step: 36590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:00:05,902-Speed 3401.59 samples/sec   Loss 6.4682   LearningRate 0.0407   Epoch: 7   Global Step: 36600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:00:08,883-Speed 3435.97 samples/sec   Loss 6.5755   LearningRate 0.0407   Epoch: 7   Global Step: 36610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 03:00:11,884-Speed 3413.10 samples/sec   Loss 6.4778   LearningRate 0.0407   Epoch: 7   Global Step: 36620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 03:00:14,888-Speed 3408.80 samples/sec   Loss 6.4565   LearningRate 0.0407   Epoch: 7   Global Step: 36630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 03:00:17,900-Speed 3400.85 samples/sec   Loss 6.5088   LearningRate 0.0407   Epoch: 7   Global Step: 36640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 03:00:20,907-Speed 3406.85 samples/sec   Loss 6.4400   LearningRate 0.0407   Epoch: 7   Global Step: 36650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 03:00:23,933-Speed 3384.75 samples/sec   Loss 6.3396   LearningRate 0.0407   Epoch: 7   Global Step: 36660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 03:00:26,938-Speed 3408.77 samples/sec   Loss 6.4586   LearningRate 0.0406   Epoch: 7   Global Step: 36670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 03:00:30,051-Speed 3290.23 samples/sec   Loss 6.5807   LearningRate 0.0406   Epoch: 7   Global Step: 36680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 03:00:33,081-Speed 3380.69 samples/sec   Loss 6.5113   LearningRate 0.0406   Epoch: 7   Global Step: 36690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 03:00:36,078-Speed 3417.20 samples/sec   Loss 6.5223   LearningRate 0.0406   Epoch: 7   Global Step: 36700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 03:00:39,082-Speed 3409.10 samples/sec   Loss 6.3503   LearningRate 0.0406   Epoch: 7   Global Step: 36710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:00:42,081-Speed 3416.34 samples/sec   Loss 6.4242   LearningRate 0.0406   Epoch: 7   Global Step: 36720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:00:45,094-Speed 3399.02 samples/sec   Loss 6.4939   LearningRate 0.0406   Epoch: 7   Global Step: 36730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:00:48,092-Speed 3417.49 samples/sec   Loss 6.5036   LearningRate 0.0406   Epoch: 7   Global Step: 36740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:00:51,101-Speed 3403.18 samples/sec   Loss 6.5186   LearningRate 0.0405   Epoch: 7   Global Step: 36750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:00:54,111-Speed 3402.95 samples/sec   Loss 6.5226   LearningRate 0.0405   Epoch: 7   Global Step: 36760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:00:57,115-Speed 3409.29 samples/sec   Loss 6.3608   LearningRate 0.0405   Epoch: 7   Global Step: 36770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:01:00,123-Speed 3404.90 samples/sec   Loss 6.4623   LearningRate 0.0405   Epoch: 7   Global Step: 36780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:01:03,166-Speed 3366.88 samples/sec   Loss 6.4239   LearningRate 0.0405   Epoch: 7   Global Step: 36790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:01:06,164-Speed 3415.99 samples/sec   Loss 6.4659   LearningRate 0.0405   Epoch: 7   Global Step: 36800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:01:09,162-Speed 3416.30 samples/sec   Loss 6.4925   LearningRate 0.0405   Epoch: 7   Global Step: 36810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:01:12,173-Speed 3402.65 samples/sec   Loss 6.5571   LearningRate 0.0405   Epoch: 7   Global Step: 36820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:01:15,173-Speed 3413.91 samples/sec   Loss 6.4387   LearningRate 0.0404   Epoch: 7   Global Step: 36830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:01:18,177-Speed 3410.18 samples/sec   Loss 6.4451   LearningRate 0.0404   Epoch: 7   Global Step: 36840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:01:21,162-Speed 3430.61 samples/sec   Loss 6.3836   LearningRate 0.0404   Epoch: 7   Global Step: 36850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:01:24,169-Speed 3406.04 samples/sec   Loss 6.4954   LearningRate 0.0404   Epoch: 7   Global Step: 36860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:01:27,169-Speed 3414.61 samples/sec   Loss 6.5284   LearningRate 0.0404   Epoch: 7   Global Step: 36870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:01:30,180-Speed 3402.20 samples/sec   Loss 6.5288   LearningRate 0.0404   Epoch: 7   Global Step: 36880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:01:33,183-Speed 3411.69 samples/sec   Loss 6.4443   LearningRate 0.0404   Epoch: 7   Global Step: 36890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:01:36,184-Speed 3412.16 samples/sec   Loss 6.4529   LearningRate 0.0404   Epoch: 7   Global Step: 36900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:01:39,247-Speed 3344.77 samples/sec   Loss 6.5640   LearningRate 0.0403   Epoch: 7   Global Step: 36910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:01:42,270-Speed 3387.69 samples/sec   Loss 6.4869   LearningRate 0.0403   Epoch: 7   Global Step: 36920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:01:45,271-Speed 3412.63 samples/sec   Loss 6.5658   LearningRate 0.0403   Epoch: 7   Global Step: 36930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:01:48,278-Speed 3407.34 samples/sec   Loss 6.6148   LearningRate 0.0403   Epoch: 7   Global Step: 36940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:01:51,293-Speed 3397.09 samples/sec   Loss 6.6142   LearningRate 0.0403   Epoch: 7   Global Step: 36950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:01:54,293-Speed 3414.16 samples/sec   Loss 6.5038   LearningRate 0.0403   Epoch: 7   Global Step: 36960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:01:57,293-Speed 3413.92 samples/sec   Loss 6.5614   LearningRate 0.0403   Epoch: 7   Global Step: 36970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:00,298-Speed 3409.00 samples/sec   Loss 6.4687   LearningRate 0.0403   Epoch: 7   Global Step: 36980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:03,300-Speed 3412.65 samples/sec   Loss 6.3481   LearningRate 0.0402   Epoch: 7   Global Step: 36990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:06,331-Speed 3378.95 samples/sec   Loss 6.4482   LearningRate 0.0402   Epoch: 7   Global Step: 37000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:09,338-Speed 3406.31 samples/sec   Loss 6.4743   LearningRate 0.0402   Epoch: 7   Global Step: 37010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:12,340-Speed 3412.42 samples/sec   Loss 6.6451   LearningRate 0.0402   Epoch: 7   Global Step: 37020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:15,364-Speed 3387.32 samples/sec   Loss 6.5407   LearningRate 0.0402   Epoch: 7   Global Step: 37030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:18,376-Speed 3400.58 samples/sec   Loss 6.4192   LearningRate 0.0402   Epoch: 7   Global Step: 37040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:21,371-Speed 3420.41 samples/sec   Loss 6.5411   LearningRate 0.0402   Epoch: 7   Global Step: 37050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:24,381-Speed 3402.82 samples/sec   Loss 6.5641   LearningRate 0.0402   Epoch: 7   Global Step: 37060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:27,385-Speed 3409.58 samples/sec   Loss 6.5832   LearningRate 0.0401   Epoch: 7   Global Step: 37070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:30,403-Speed 3394.04 samples/sec   Loss 6.5720   LearningRate 0.0401   Epoch: 7   Global Step: 37080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:33,405-Speed 3412.17 samples/sec   Loss 6.4838   LearningRate 0.0401   Epoch: 7   Global Step: 37090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:36,408-Speed 3410.58 samples/sec   Loss 6.4450   LearningRate 0.0401   Epoch: 7   Global Step: 37100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:39,409-Speed 3413.61 samples/sec   Loss 6.6765   LearningRate 0.0401   Epoch: 7   Global Step: 37110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:42,416-Speed 3406.09 samples/sec   Loss 6.5802   LearningRate 0.0401   Epoch: 7   Global Step: 37120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:45,421-Speed 3409.17 samples/sec   Loss 6.3836   LearningRate 0.0401   Epoch: 7   Global Step: 37130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:48,428-Speed 3406.37 samples/sec   Loss 6.5182   LearningRate 0.0401   Epoch: 7   Global Step: 37140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:51,416-Speed 3428.47 samples/sec   Loss 6.5014   LearningRate 0.0400   Epoch: 7   Global Step: 37150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:54,420-Speed 3409.20 samples/sec   Loss 6.5014   LearningRate 0.0400   Epoch: 7   Global Step: 37160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:02:57,421-Speed 3412.41 samples/sec   Loss 6.4974   LearningRate 0.0400   Epoch: 7   Global Step: 37170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:03:00,424-Speed 3411.75 samples/sec   Loss 6.4942   LearningRate 0.0400   Epoch: 7   Global Step: 37180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:03:03,435-Speed 3402.08 samples/sec   Loss 6.4140   LearningRate 0.0400   Epoch: 7   Global Step: 37190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:03:06,451-Speed 3395.60 samples/sec   Loss 6.5402   LearningRate 0.0400   Epoch: 7   Global Step: 37200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:03:09,449-Speed 3415.85 samples/sec   Loss 6.5556   LearningRate 0.0400   Epoch: 7   Global Step: 37210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:03:12,515-Speed 3340.86 samples/sec   Loss 6.4753   LearningRate 0.0400   Epoch: 7   Global Step: 37220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:03:15,518-Speed 3411.53 samples/sec   Loss 6.5770   LearningRate 0.0399   Epoch: 7   Global Step: 37230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:03:18,642-Speed 3277.77 samples/sec   Loss 6.5370   LearningRate 0.0399   Epoch: 7   Global Step: 37240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:03:21,642-Speed 3414.76 samples/sec   Loss 6.3518   LearningRate 0.0399   Epoch: 7   Global Step: 37250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:03:24,645-Speed 3411.21 samples/sec   Loss 6.3537   LearningRate 0.0399   Epoch: 7   Global Step: 37260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:03:27,649-Speed 3409.51 samples/sec   Loss 6.4583   LearningRate 0.0399   Epoch: 7   Global Step: 37270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:03:30,649-Speed 3414.17 samples/sec   Loss 6.4006   LearningRate 0.0399   Epoch: 7   Global Step: 37280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:03:33,651-Speed 3412.19 samples/sec   Loss 6.6359   LearningRate 0.0399   Epoch: 7   Global Step: 37290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:03:36,658-Speed 3407.20 samples/sec   Loss 6.4033   LearningRate 0.0399   Epoch: 7   Global Step: 37300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:03:39,663-Speed 3408.62 samples/sec   Loss 6.3766   LearningRate 0.0398   Epoch: 7   Global Step: 37310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:03:42,673-Speed 3402.41 samples/sec   Loss 6.4946   LearningRate 0.0398   Epoch: 7   Global Step: 37320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:03:45,687-Speed 3398.72 samples/sec   Loss 6.4498   LearningRate 0.0398   Epoch: 7   Global Step: 37330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:03:48,689-Speed 3411.71 samples/sec   Loss 6.5468   LearningRate 0.0398   Epoch: 7   Global Step: 37340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:03:51,695-Speed 3407.77 samples/sec   Loss 6.5356   LearningRate 0.0398   Epoch: 7   Global Step: 37350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:03:54,706-Speed 3401.72 samples/sec   Loss 6.5236   LearningRate 0.0398   Epoch: 7   Global Step: 37360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:03:57,708-Speed 3411.67 samples/sec   Loss 6.3313   LearningRate 0.0398   Epoch: 7   Global Step: 37370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:04:00,724-Speed 3396.82 samples/sec   Loss 6.3961   LearningRate 0.0398   Epoch: 7   Global Step: 37380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:04:03,694-Speed 3448.08 samples/sec   Loss 6.5427   LearningRate 0.0397   Epoch: 7   Global Step: 37390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:04:06,697-Speed 3410.86 samples/sec   Loss 6.5644   LearningRate 0.0397   Epoch: 7   Global Step: 37400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:04:09,697-Speed 3413.98 samples/sec   Loss 6.5133   LearningRate 0.0397   Epoch: 7   Global Step: 37410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:04:12,697-Speed 3414.59 samples/sec   Loss 6.3688   LearningRate 0.0397   Epoch: 7   Global Step: 37420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:04:15,703-Speed 3407.32 samples/sec   Loss 6.5047   LearningRate 0.0397   Epoch: 7   Global Step: 37430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:04:18,706-Speed 3411.12 samples/sec   Loss 6.5654   LearningRate 0.0397   Epoch: 7   Global Step: 37440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:04:21,706-Speed 3414.40 samples/sec   Loss 6.6054   LearningRate 0.0397   Epoch: 7   Global Step: 37450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:04:24,710-Speed 3409.63 samples/sec   Loss 6.4385   LearningRate 0.0397   Epoch: 7   Global Step: 37460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:04:27,712-Speed 3411.97 samples/sec   Loss 6.5077   LearningRate 0.0396   Epoch: 7   Global Step: 37470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:04:30,767-Speed 3353.07 samples/sec   Loss 6.6186   LearningRate 0.0396   Epoch: 7   Global Step: 37480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 03:04:33,767-Speed 3414.29 samples/sec   Loss 6.4183   LearningRate 0.0396   Epoch: 7   Global Step: 37490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:04:36,773-Speed 3407.95 samples/sec   Loss 6.5796   LearningRate 0.0396   Epoch: 7   Global Step: 37500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:04:39,779-Speed 3406.77 samples/sec   Loss 6.3822   LearningRate 0.0396   Epoch: 7   Global Step: 37510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:04:42,785-Speed 3408.88 samples/sec   Loss 6.3991   LearningRate 0.0396   Epoch: 7   Global Step: 37520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:04:45,791-Speed 3406.76 samples/sec   Loss 6.5092   LearningRate 0.0396   Epoch: 7   Global Step: 37530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 03:04:48,796-Speed 3408.21 samples/sec   Loss 6.5468   LearningRate 0.0396   Epoch: 7   Global Step: 37540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:04:51,813-Speed 3395.67 samples/sec   Loss 6.5230   LearningRate 0.0395   Epoch: 7   Global Step: 37550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:04:54,820-Speed 3405.87 samples/sec   Loss 6.5657   LearningRate 0.0395   Epoch: 7   Global Step: 37560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:04:57,828-Speed 3405.70 samples/sec   Loss 6.5707   LearningRate 0.0395   Epoch: 7   Global Step: 37570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:05:00,839-Speed 3401.74 samples/sec   Loss 6.5871   LearningRate 0.0395   Epoch: 7   Global Step: 37580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:05:03,851-Speed 3399.64 samples/sec   Loss 6.5502   LearningRate 0.0395   Epoch: 7   Global Step: 37590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:05:06,880-Speed 3383.10 samples/sec   Loss 6.4579   LearningRate 0.0395   Epoch: 7   Global Step: 37600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:05:09,882-Speed 3411.76 samples/sec   Loss 6.4837   LearningRate 0.0395   Epoch: 7   Global Step: 37610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:05:12,889-Speed 3406.78 samples/sec   Loss 6.5259   LearningRate 0.0395   Epoch: 7   Global Step: 37620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:05:15,902-Speed 3399.02 samples/sec   Loss 6.5243   LearningRate 0.0394   Epoch: 7   Global Step: 37630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:05:18,943-Speed 3368.98 samples/sec   Loss 6.4698   LearningRate 0.0394   Epoch: 7   Global Step: 37640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:05:21,945-Speed 3411.33 samples/sec   Loss 6.5778   LearningRate 0.0394   Epoch: 7   Global Step: 37650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:05:24,954-Speed 3404.27 samples/sec   Loss 6.5368   LearningRate 0.0394   Epoch: 7   Global Step: 37660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:05:28,012-Speed 3349.78 samples/sec   Loss 6.5242   LearningRate 0.0394   Epoch: 7   Global Step: 37670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:05:31,019-Speed 3406.52 samples/sec   Loss 6.5080   LearningRate 0.0394   Epoch: 7   Global Step: 37680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:05:34,022-Speed 3410.27 samples/sec   Loss 6.3997   LearningRate 0.0394   Epoch: 7   Global Step: 37690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:05:37,027-Speed 3409.24 samples/sec   Loss 6.5537   LearningRate 0.0394   Epoch: 7   Global Step: 37700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:05:40,044-Speed 3395.09 samples/sec   Loss 6.4523   LearningRate 0.0393   Epoch: 7   Global Step: 37710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:05:43,052-Speed 3405.73 samples/sec   Loss 6.3957   LearningRate 0.0393   Epoch: 7   Global Step: 37720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:05:46,059-Speed 3405.59 samples/sec   Loss 6.4617   LearningRate 0.0393   Epoch: 7   Global Step: 37730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:05:49,081-Speed 3389.01 samples/sec   Loss 6.4103   LearningRate 0.0393   Epoch: 7   Global Step: 37740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:05:52,120-Speed 3370.73 samples/sec   Loss 6.5583   LearningRate 0.0393   Epoch: 7   Global Step: 37750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:05:55,138-Speed 3394.05 samples/sec   Loss 6.4349   LearningRate 0.0393   Epoch: 7   Global Step: 37760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:05:58,147-Speed 3404.68 samples/sec   Loss 6.3687   LearningRate 0.0393   Epoch: 7   Global Step: 37770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:06:01,138-Speed 3423.95 samples/sec   Loss 6.4390   LearningRate 0.0393   Epoch: 7   Global Step: 37780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:04,161-Speed 3387.91 samples/sec   Loss 6.6255   LearningRate 0.0392   Epoch: 7   Global Step: 37790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:07,171-Speed 3403.78 samples/sec   Loss 6.4330   LearningRate 0.0392   Epoch: 7   Global Step: 37800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:10,180-Speed 3403.71 samples/sec   Loss 6.5508   LearningRate 0.0392   Epoch: 7   Global Step: 37810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:13,188-Speed 3404.80 samples/sec   Loss 6.5339   LearningRate 0.0392   Epoch: 7   Global Step: 37820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:16,213-Speed 3386.73 samples/sec   Loss 6.4561   LearningRate 0.0392   Epoch: 7   Global Step: 37830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:19,246-Speed 3377.03 samples/sec   Loss 6.4801   LearningRate 0.0392   Epoch: 7   Global Step: 37840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:22,250-Speed 3409.78 samples/sec   Loss 6.4362   LearningRate 0.0392   Epoch: 7   Global Step: 37850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:25,263-Speed 3399.38 samples/sec   Loss 6.4446   LearningRate 0.0392   Epoch: 7   Global Step: 37860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:28,274-Speed 3401.04 samples/sec   Loss 6.5212   LearningRate 0.0391   Epoch: 7   Global Step: 37870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:31,285-Speed 3401.91 samples/sec   Loss 6.4834   LearningRate 0.0391   Epoch: 7   Global Step: 37880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:06:34,281-Speed 3419.06 samples/sec   Loss 6.5670   LearningRate 0.0391   Epoch: 7   Global Step: 37890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:37,286-Speed 3408.91 samples/sec   Loss 6.2445   LearningRate 0.0391   Epoch: 7   Global Step: 37900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:40,291-Speed 3408.23 samples/sec   Loss 6.5860   LearningRate 0.0391   Epoch: 7   Global Step: 37910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:43,300-Speed 3404.16 samples/sec   Loss 6.5799   LearningRate 0.0391   Epoch: 7   Global Step: 37920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:46,303-Speed 3410.86 samples/sec   Loss 6.6128   LearningRate 0.0391   Epoch: 7   Global Step: 37930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:49,317-Speed 3397.85 samples/sec   Loss 6.5262   LearningRate 0.0391   Epoch: 7   Global Step: 37940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:52,345-Speed 3382.65 samples/sec   Loss 6.3664   LearningRate 0.0390   Epoch: 7   Global Step: 37950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:55,380-Speed 3374.27 samples/sec   Loss 6.4730   LearningRate 0.0390   Epoch: 7   Global Step: 37960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:06:58,383-Speed 3411.90 samples/sec   Loss 6.4578   LearningRate 0.0390   Epoch: 7   Global Step: 37970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:07:01,390-Speed 3406.41 samples/sec   Loss 6.5000   LearningRate 0.0390   Epoch: 7   Global Step: 37980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:07:04,419-Speed 3381.18 samples/sec   Loss 6.4209   LearningRate 0.0390   Epoch: 7   Global Step: 37990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:07:07,392-Speed 3444.89 samples/sec   Loss 6.3110   LearningRate 0.0390   Epoch: 7   Global Step: 38000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:07:51,398-[lfw][38000]XNorm: 25.049999
Training: 2022-04-11 03:07:51,399-[lfw][38000]Accuracy-Flip: 0.99800+-0.00233
Training: 2022-04-11 03:07:51,399-[lfw][38000]Accuracy-Highest: 0.99800
Training: 2022-04-11 03:08:43,032-[cfp_fp][38000]XNorm: 22.735850
Training: 2022-04-11 03:08:43,033-[cfp_fp][38000]Accuracy-Flip: 0.96857+-0.01141
Training: 2022-04-11 03:08:43,033-[cfp_fp][38000]Accuracy-Highest: 0.97057
Training: 2022-04-11 03:09:27,717-[agedb_30][38000]XNorm: 25.016551
Training: 2022-04-11 03:09:27,718-[agedb_30][38000]Accuracy-Flip: 0.97700+-0.00710
Training: 2022-04-11 03:09:27,718-[agedb_30][38000]Accuracy-Highest: 0.97750
Training: 2022-04-11 03:09:30,728-Speed 71.44 samples/sec   Loss 6.5550   LearningRate 0.0390   Epoch: 7   Global Step: 38010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 03:09:33,718-Speed 3425.54 samples/sec   Loss 6.4811   LearningRate 0.0390   Epoch: 7   Global Step: 38020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 03:09:36,712-Speed 3420.78 samples/sec   Loss 6.5204   LearningRate 0.0389   Epoch: 7   Global Step: 38030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:09:39,724-Speed 3401.88 samples/sec   Loss 6.5348   LearningRate 0.0389   Epoch: 7   Global Step: 38040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:09:44,248-Speed 2264.12 samples/sec   Loss 6.6622   LearningRate 0.0389   Epoch: 7   Global Step: 38050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:09:47,243-Speed 3419.30 samples/sec   Loss 6.3676   LearningRate 0.0389   Epoch: 7   Global Step: 38060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:09:50,252-Speed 3404.34 samples/sec   Loss 6.5471   LearningRate 0.0389   Epoch: 7   Global Step: 38070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:09:53,256-Speed 3409.90 samples/sec   Loss 6.2943   LearningRate 0.0389   Epoch: 7   Global Step: 38080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:09:56,254-Speed 3416.99 samples/sec   Loss 6.3620   LearningRate 0.0389   Epoch: 7   Global Step: 38090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:09:59,263-Speed 3403.48 samples/sec   Loss 6.5326   LearningRate 0.0389   Epoch: 7   Global Step: 38100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:10:02,279-Speed 3396.47 samples/sec   Loss 6.5535   LearningRate 0.0388   Epoch: 7   Global Step: 38110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:10:05,321-Speed 3367.96 samples/sec   Loss 6.4976   LearningRate 0.0388   Epoch: 7   Global Step: 38120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:10:08,318-Speed 3417.25 samples/sec   Loss 6.4575   LearningRate 0.0388   Epoch: 7   Global Step: 38130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:10:11,320-Speed 3411.67 samples/sec   Loss 6.5237   LearningRate 0.0388   Epoch: 7   Global Step: 38140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:10:14,332-Speed 3400.99 samples/sec   Loss 6.7567   LearningRate 0.0388   Epoch: 7   Global Step: 38150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:10:17,336-Speed 3409.80 samples/sec   Loss 6.5053   LearningRate 0.0388   Epoch: 7   Global Step: 38160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:10:20,337-Speed 3412.90 samples/sec   Loss 6.3025   LearningRate 0.0388   Epoch: 7   Global Step: 38170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:10:23,340-Speed 3410.47 samples/sec   Loss 6.4280   LearningRate 0.0388   Epoch: 7   Global Step: 38180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:10:26,365-Speed 3386.25 samples/sec   Loss 6.4061   LearningRate 0.0387   Epoch: 7   Global Step: 38190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:10:29,374-Speed 3404.29 samples/sec   Loss 6.5185   LearningRate 0.0387   Epoch: 7   Global Step: 38200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:10:32,410-Speed 3373.16 samples/sec   Loss 6.4507   LearningRate 0.0387   Epoch: 7   Global Step: 38210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:10:35,415-Speed 3408.70 samples/sec   Loss 6.3921   LearningRate 0.0387   Epoch: 7   Global Step: 38220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:10:38,398-Speed 3434.20 samples/sec   Loss 6.3872   LearningRate 0.0387   Epoch: 7   Global Step: 38230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:10:41,413-Speed 3396.39 samples/sec   Loss 6.2970   LearningRate 0.0387   Epoch: 7   Global Step: 38240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:10:44,425-Speed 3402.07 samples/sec   Loss 6.5112   LearningRate 0.0387   Epoch: 7   Global Step: 38250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:10:47,435-Speed 3401.85 samples/sec   Loss 6.4610   LearningRate 0.0387   Epoch: 7   Global Step: 38260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:10:50,441-Speed 3408.15 samples/sec   Loss 6.5011   LearningRate 0.0386   Epoch: 7   Global Step: 38270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:10:53,474-Speed 3377.63 samples/sec   Loss 6.4795   LearningRate 0.0386   Epoch: 7   Global Step: 38280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:10:56,474-Speed 3414.20 samples/sec   Loss 6.4318   LearningRate 0.0386   Epoch: 7   Global Step: 38290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:10:59,472-Speed 3416.28 samples/sec   Loss 6.4084   LearningRate 0.0386   Epoch: 7   Global Step: 38300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:11:02,563-Speed 3313.71 samples/sec   Loss 6.4499   LearningRate 0.0386   Epoch: 7   Global Step: 38310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:11:05,662-Speed 3305.40 samples/sec   Loss 6.5292   LearningRate 0.0386   Epoch: 7   Global Step: 38320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:11:08,684-Speed 3389.85 samples/sec   Loss 6.6118   LearningRate 0.0386   Epoch: 7   Global Step: 38330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:11:11,684-Speed 3414.04 samples/sec   Loss 6.4733   LearningRate 0.0386   Epoch: 7   Global Step: 38340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:11:14,683-Speed 3414.76 samples/sec   Loss 6.5770   LearningRate 0.0386   Epoch: 7   Global Step: 38350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:11:17,688-Speed 3408.59 samples/sec   Loss 6.3968   LearningRate 0.0385   Epoch: 7   Global Step: 38360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:11:20,688-Speed 3414.85 samples/sec   Loss 6.6122   LearningRate 0.0385   Epoch: 7   Global Step: 38370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:11:23,717-Speed 3381.59 samples/sec   Loss 6.5323   LearningRate 0.0385   Epoch: 7   Global Step: 38380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:11:26,718-Speed 3412.36 samples/sec   Loss 6.3626   LearningRate 0.0385   Epoch: 7   Global Step: 38390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:11:29,729-Speed 3402.21 samples/sec   Loss 6.4318   LearningRate 0.0385   Epoch: 7   Global Step: 38400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:11:32,727-Speed 3416.07 samples/sec   Loss 6.3649   LearningRate 0.0385   Epoch: 7   Global Step: 38410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:11:35,724-Speed 3418.44 samples/sec   Loss 6.4162   LearningRate 0.0385   Epoch: 7   Global Step: 38420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:11:38,738-Speed 3397.82 samples/sec   Loss 6.4196   LearningRate 0.0385   Epoch: 7   Global Step: 38430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:11:41,740-Speed 3411.83 samples/sec   Loss 6.5673   LearningRate 0.0384   Epoch: 7   Global Step: 38440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:11:44,754-Speed 3398.79 samples/sec   Loss 6.3651   LearningRate 0.0384   Epoch: 7   Global Step: 38450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:11:47,757-Speed 3410.51 samples/sec   Loss 6.2653   LearningRate 0.0384   Epoch: 7   Global Step: 38460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:11:50,765-Speed 3405.23 samples/sec   Loss 6.4185   LearningRate 0.0384   Epoch: 7   Global Step: 38470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:11:53,781-Speed 3396.58 samples/sec   Loss 6.5409   LearningRate 0.0384   Epoch: 7   Global Step: 38480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:11:56,783-Speed 3411.15 samples/sec   Loss 6.4308   LearningRate 0.0384   Epoch: 7   Global Step: 38490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:11:59,785-Speed 3411.76 samples/sec   Loss 6.4411   LearningRate 0.0384   Epoch: 7   Global Step: 38500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:12:02,795-Speed 3402.75 samples/sec   Loss 6.5246   LearningRate 0.0384   Epoch: 7   Global Step: 38510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:12:05,803-Speed 3405.86 samples/sec   Loss 6.4865   LearningRate 0.0383   Epoch: 7   Global Step: 38520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:12:08,792-Speed 3426.06 samples/sec   Loss 6.4261   LearningRate 0.0383   Epoch: 7   Global Step: 38530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:12:11,794-Speed 3411.88 samples/sec   Loss 6.3740   LearningRate 0.0383   Epoch: 7   Global Step: 38540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:12:14,791-Speed 3418.62 samples/sec   Loss 6.5059   LearningRate 0.0383   Epoch: 7   Global Step: 38550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:12:17,797-Speed 3406.47 samples/sec   Loss 6.3639   LearningRate 0.0383   Epoch: 7   Global Step: 38560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:12:20,801-Speed 3409.99 samples/sec   Loss 6.4756   LearningRate 0.0383   Epoch: 7   Global Step: 38570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:12:23,803-Speed 3411.80 samples/sec   Loss 6.4023   LearningRate 0.0383   Epoch: 7   Global Step: 38580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:12:26,803-Speed 3414.37 samples/sec   Loss 6.4705   LearningRate 0.0383   Epoch: 7   Global Step: 38590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:12:29,785-Speed 3434.78 samples/sec   Loss 6.4438   LearningRate 0.0382   Epoch: 7   Global Step: 38600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:12:32,784-Speed 3415.08 samples/sec   Loss 6.4446   LearningRate 0.0382   Epoch: 7   Global Step: 38610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:12:35,781-Speed 3418.33 samples/sec   Loss 6.5748   LearningRate 0.0382   Epoch: 7   Global Step: 38620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:12:38,780-Speed 3415.33 samples/sec   Loss 6.5557   LearningRate 0.0382   Epoch: 7   Global Step: 38630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:12:41,791-Speed 3401.45 samples/sec   Loss 6.6042   LearningRate 0.0382   Epoch: 7   Global Step: 38640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:12:44,795-Speed 3409.86 samples/sec   Loss 6.4653   LearningRate 0.0382   Epoch: 7   Global Step: 38650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:12:47,797-Speed 3412.39 samples/sec   Loss 6.3979   LearningRate 0.0382   Epoch: 7   Global Step: 38660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:12:50,797-Speed 3414.24 samples/sec   Loss 6.3773   LearningRate 0.0382   Epoch: 7   Global Step: 38670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:12:53,800-Speed 3410.00 samples/sec   Loss 6.5854   LearningRate 0.0381   Epoch: 7   Global Step: 38680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:12:56,804-Speed 3410.07 samples/sec   Loss 6.3091   LearningRate 0.0381   Epoch: 7   Global Step: 38690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:12:59,824-Speed 3391.05 samples/sec   Loss 6.3535   LearningRate 0.0381   Epoch: 7   Global Step: 38700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:13:02,817-Speed 3422.68 samples/sec   Loss 6.4601   LearningRate 0.0381   Epoch: 7   Global Step: 38710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:13:05,825-Speed 3404.51 samples/sec   Loss 6.5470   LearningRate 0.0381   Epoch: 7   Global Step: 38720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:13:08,811-Speed 3430.55 samples/sec   Loss 6.6095   LearningRate 0.0381   Epoch: 7   Global Step: 38730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:13:11,831-Speed 3391.95 samples/sec   Loss 6.5379   LearningRate 0.0381   Epoch: 7   Global Step: 38740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:13:14,829-Speed 3416.42 samples/sec   Loss 6.4614   LearningRate 0.0381   Epoch: 7   Global Step: 38750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:13:17,844-Speed 3396.46 samples/sec   Loss 6.4757   LearningRate 0.0380   Epoch: 7   Global Step: 38760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:13:20,853-Speed 3404.18 samples/sec   Loss 6.3834   LearningRate 0.0380   Epoch: 7   Global Step: 38770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:13:23,873-Speed 3391.13 samples/sec   Loss 6.4174   LearningRate 0.0380   Epoch: 7   Global Step: 38780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:13:26,899-Speed 3385.43 samples/sec   Loss 6.3748   LearningRate 0.0380   Epoch: 7   Global Step: 38790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:13:29,898-Speed 3415.35 samples/sec   Loss 6.4477   LearningRate 0.0380   Epoch: 7   Global Step: 38800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:13:32,904-Speed 3407.59 samples/sec   Loss 6.4690   LearningRate 0.0380   Epoch: 7   Global Step: 38810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:13:35,922-Speed 3394.64 samples/sec   Loss 6.2371   LearningRate 0.0380   Epoch: 7   Global Step: 38820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:13:38,997-Speed 3331.20 samples/sec   Loss 6.6379   LearningRate 0.0380   Epoch: 7   Global Step: 38830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:13:42,059-Speed 3344.91 samples/sec   Loss 6.3609   LearningRate 0.0380   Epoch: 7   Global Step: 38840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:13:45,074-Speed 3397.55 samples/sec   Loss 6.3998   LearningRate 0.0379   Epoch: 7   Global Step: 38850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:13:48,079-Speed 3408.17 samples/sec   Loss 6.3240   LearningRate 0.0379   Epoch: 7   Global Step: 38860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:13:51,083-Speed 3411.33 samples/sec   Loss 6.4077   LearningRate 0.0379   Epoch: 7   Global Step: 38870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:13:54,098-Speed 3396.51 samples/sec   Loss 6.5662   LearningRate 0.0379   Epoch: 7   Global Step: 38880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:13:57,098-Speed 3413.84 samples/sec   Loss 6.5148   LearningRate 0.0379   Epoch: 7   Global Step: 38890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:14:00,105-Speed 3406.65 samples/sec   Loss 6.2733   LearningRate 0.0379   Epoch: 7   Global Step: 38900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:14:03,140-Speed 3374.12 samples/sec   Loss 6.3302   LearningRate 0.0379   Epoch: 7   Global Step: 38910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:14:06,155-Speed 3398.45 samples/sec   Loss 6.2852   LearningRate 0.0379   Epoch: 7   Global Step: 38920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:14:09,155-Speed 3414.21 samples/sec   Loss 6.4796   LearningRate 0.0378   Epoch: 7   Global Step: 38930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:14:12,172-Speed 3394.42 samples/sec   Loss 6.2825   LearningRate 0.0378   Epoch: 7   Global Step: 38940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:14:15,177-Speed 3408.73 samples/sec   Loss 6.6915   LearningRate 0.0378   Epoch: 7   Global Step: 38950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:14:18,178-Speed 3412.44 samples/sec   Loss 6.4956   LearningRate 0.0378   Epoch: 7   Global Step: 38960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:14:21,184-Speed 3408.52 samples/sec   Loss 6.4135   LearningRate 0.0378   Epoch: 7   Global Step: 38970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:14:24,190-Speed 3407.91 samples/sec   Loss 6.4998   LearningRate 0.0378   Epoch: 7   Global Step: 38980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:14:27,232-Speed 3366.24 samples/sec   Loss 6.4231   LearningRate 0.0378   Epoch: 7   Global Step: 38990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:14:30,296-Speed 3343.08 samples/sec   Loss 6.4934   LearningRate 0.0378   Epoch: 7   Global Step: 39000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:14:33,302-Speed 3407.46 samples/sec   Loss 6.5588   LearningRate 0.0377   Epoch: 7   Global Step: 39010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:14:36,303-Speed 3413.61 samples/sec   Loss 6.4760   LearningRate 0.0377   Epoch: 7   Global Step: 39020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:14:39,320-Speed 3394.11 samples/sec   Loss 6.4131   LearningRate 0.0377   Epoch: 7   Global Step: 39030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:14:42,331-Speed 3401.80 samples/sec   Loss 6.5480   LearningRate 0.0377   Epoch: 7   Global Step: 39040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:14:45,338-Speed 3407.22 samples/sec   Loss 6.4396   LearningRate 0.0377   Epoch: 7   Global Step: 39050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:14:48,341-Speed 3410.40 samples/sec   Loss 6.3685   LearningRate 0.0377   Epoch: 7   Global Step: 39060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:14:51,354-Speed 3399.24 samples/sec   Loss 6.2895   LearningRate 0.0377   Epoch: 7   Global Step: 39070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:14:54,356-Speed 3411.79 samples/sec   Loss 6.3361   LearningRate 0.0377   Epoch: 7   Global Step: 39080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:14:57,342-Speed 3430.73 samples/sec   Loss 6.2570   LearningRate 0.0376   Epoch: 7   Global Step: 39090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:00,395-Speed 3354.41 samples/sec   Loss 6.4587   LearningRate 0.0376   Epoch: 7   Global Step: 39100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:03,422-Speed 3383.88 samples/sec   Loss 6.5282   LearningRate 0.0376   Epoch: 7   Global Step: 39110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:06,542-Speed 3282.82 samples/sec   Loss 6.4072   LearningRate 0.0376   Epoch: 7   Global Step: 39120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:09,591-Speed 3359.20 samples/sec   Loss 6.2459   LearningRate 0.0376   Epoch: 7   Global Step: 39130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:12,605-Speed 3398.67 samples/sec   Loss 6.4116   LearningRate 0.0376   Epoch: 7   Global Step: 39140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:15,643-Speed 3371.29 samples/sec   Loss 6.3729   LearningRate 0.0376   Epoch: 7   Global Step: 39150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:18,679-Speed 3373.59 samples/sec   Loss 6.3837   LearningRate 0.0376   Epoch: 7   Global Step: 39160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:21,689-Speed 3402.87 samples/sec   Loss 6.3344   LearningRate 0.0376   Epoch: 7   Global Step: 39170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:24,697-Speed 3405.57 samples/sec   Loss 6.3772   LearningRate 0.0375   Epoch: 7   Global Step: 39180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:27,692-Speed 3420.33 samples/sec   Loss 6.5403   LearningRate 0.0375   Epoch: 7   Global Step: 39190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:30,695-Speed 3411.12 samples/sec   Loss 6.3051   LearningRate 0.0375   Epoch: 7   Global Step: 39200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:33,700-Speed 3408.28 samples/sec   Loss 6.4125   LearningRate 0.0375   Epoch: 7   Global Step: 39210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:36,705-Speed 3408.33 samples/sec   Loss 6.3041   LearningRate 0.0375   Epoch: 7   Global Step: 39220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:39,742-Speed 3372.16 samples/sec   Loss 6.5720   LearningRate 0.0375   Epoch: 7   Global Step: 39230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:42,758-Speed 3396.52 samples/sec   Loss 6.4116   LearningRate 0.0375   Epoch: 7   Global Step: 39240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:45,761-Speed 3411.26 samples/sec   Loss 6.3945   LearningRate 0.0375   Epoch: 7   Global Step: 39250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:48,764-Speed 3410.79 samples/sec   Loss 6.5164   LearningRate 0.0374   Epoch: 7   Global Step: 39260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:51,778-Speed 3397.86 samples/sec   Loss 6.3041   LearningRate 0.0374   Epoch: 7   Global Step: 39270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:54,797-Speed 3393.63 samples/sec   Loss 6.3487   LearningRate 0.0374   Epoch: 7   Global Step: 39280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:15:57,798-Speed 3412.01 samples/sec   Loss 6.3880   LearningRate 0.0374   Epoch: 7   Global Step: 39290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:16:00,799-Speed 3413.39 samples/sec   Loss 6.4131   LearningRate 0.0374   Epoch: 7   Global Step: 39300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:16:03,806-Speed 3405.94 samples/sec   Loss 6.3308   LearningRate 0.0374   Epoch: 7   Global Step: 39310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:16:06,791-Speed 3432.10 samples/sec   Loss 6.4340   LearningRate 0.0374   Epoch: 7   Global Step: 39320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:16:09,793-Speed 3411.48 samples/sec   Loss 6.3805   LearningRate 0.0374   Epoch: 7   Global Step: 39330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:16:12,804-Speed 3402.27 samples/sec   Loss 6.4359   LearningRate 0.0373   Epoch: 7   Global Step: 39340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:16:15,809-Speed 3408.91 samples/sec   Loss 6.2848   LearningRate 0.0373   Epoch: 7   Global Step: 39350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:16:18,817-Speed 3404.88 samples/sec   Loss 6.3558   LearningRate 0.0373   Epoch: 7   Global Step: 39360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:16:21,822-Speed 3408.72 samples/sec   Loss 6.3922   LearningRate 0.0373   Epoch: 7   Global Step: 39370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:16:24,836-Speed 3399.23 samples/sec   Loss 6.3619   LearningRate 0.0373   Epoch: 7   Global Step: 39380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:16:27,839-Speed 3410.69 samples/sec   Loss 6.2652   LearningRate 0.0373   Epoch: 7   Global Step: 39390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:16:30,859-Speed 3391.86 samples/sec   Loss 6.2957   LearningRate 0.0373   Epoch: 7   Global Step: 39400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:16:33,862-Speed 3409.79 samples/sec   Loss 6.5135   LearningRate 0.0373   Epoch: 7   Global Step: 39410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:16:36,869-Speed 3406.65 samples/sec   Loss 6.4195   LearningRate 0.0372   Epoch: 7   Global Step: 39420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:16:39,883-Speed 3398.59 samples/sec   Loss 6.3131   LearningRate 0.0372   Epoch: 7   Global Step: 39430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:16:42,891-Speed 3405.58 samples/sec   Loss 6.3829   LearningRate 0.0372   Epoch: 7   Global Step: 39440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:16:45,872-Speed 3435.10 samples/sec   Loss 6.3969   LearningRate 0.0372   Epoch: 7   Global Step: 39450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:16:48,876-Speed 3410.00 samples/sec   Loss 6.3498   LearningRate 0.0372   Epoch: 7   Global Step: 39460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:16:51,883-Speed 3406.83 samples/sec   Loss 6.4230   LearningRate 0.0372   Epoch: 7   Global Step: 39470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:16:54,902-Speed 3392.54 samples/sec   Loss 6.3264   LearningRate 0.0372   Epoch: 7   Global Step: 39480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:16:57,909-Speed 3405.72 samples/sec   Loss 6.3296   LearningRate 0.0372   Epoch: 7   Global Step: 39490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:00,928-Speed 3393.80 samples/sec   Loss 6.4189   LearningRate 0.0372   Epoch: 7   Global Step: 39500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:03,935-Speed 3406.38 samples/sec   Loss 6.4264   LearningRate 0.0371   Epoch: 7   Global Step: 39510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:06,939-Speed 3409.16 samples/sec   Loss 6.3130   LearningRate 0.0371   Epoch: 7   Global Step: 39520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:09,950-Speed 3401.92 samples/sec   Loss 6.3387   LearningRate 0.0371   Epoch: 7   Global Step: 39530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:12,966-Speed 3395.38 samples/sec   Loss 6.3009   LearningRate 0.0371   Epoch: 7   Global Step: 39540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:15,972-Speed 3408.24 samples/sec   Loss 6.4534   LearningRate 0.0371   Epoch: 7   Global Step: 39550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:17:18,968-Speed 3418.17 samples/sec   Loss 6.4791   LearningRate 0.0371   Epoch: 7   Global Step: 39560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:21,973-Speed 3408.85 samples/sec   Loss 6.3999   LearningRate 0.0371   Epoch: 7   Global Step: 39570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:24,977-Speed 3409.53 samples/sec   Loss 6.3527   LearningRate 0.0371   Epoch: 7   Global Step: 39580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:27,991-Speed 3399.00 samples/sec   Loss 6.3305   LearningRate 0.0370   Epoch: 7   Global Step: 39590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:30,996-Speed 3408.13 samples/sec   Loss 6.2955   LearningRate 0.0370   Epoch: 7   Global Step: 39600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:34,006-Speed 3403.56 samples/sec   Loss 6.3154   LearningRate 0.0370   Epoch: 7   Global Step: 39610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:37,006-Speed 3414.05 samples/sec   Loss 6.3971   LearningRate 0.0370   Epoch: 7   Global Step: 39620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:40,031-Speed 3385.81 samples/sec   Loss 6.3080   LearningRate 0.0370   Epoch: 7   Global Step: 39630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:43,034-Speed 3410.90 samples/sec   Loss 6.3046   LearningRate 0.0370   Epoch: 7   Global Step: 39640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:46,034-Speed 3413.91 samples/sec   Loss 6.3126   LearningRate 0.0370   Epoch: 7   Global Step: 39650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:49,027-Speed 3422.07 samples/sec   Loss 6.3527   LearningRate 0.0370   Epoch: 7   Global Step: 39660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:52,062-Speed 3374.98 samples/sec   Loss 6.4425   LearningRate 0.0369   Epoch: 7   Global Step: 39670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:55,138-Speed 3330.46 samples/sec   Loss 6.3940   LearningRate 0.0369   Epoch: 7   Global Step: 39680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:17:58,140-Speed 3411.24 samples/sec   Loss 6.2312   LearningRate 0.0369   Epoch: 7   Global Step: 39690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:01,151-Speed 3402.62 samples/sec   Loss 6.4654   LearningRate 0.0369   Epoch: 7   Global Step: 39700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:04,164-Speed 3398.62 samples/sec   Loss 6.2496   LearningRate 0.0369   Epoch: 7   Global Step: 39710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:07,168-Speed 3409.46 samples/sec   Loss 6.3641   LearningRate 0.0369   Epoch: 7   Global Step: 39720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:10,173-Speed 3409.14 samples/sec   Loss 6.3104   LearningRate 0.0369   Epoch: 7   Global Step: 39730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:13,189-Speed 3395.68 samples/sec   Loss 6.3042   LearningRate 0.0369   Epoch: 7   Global Step: 39740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:16,199-Speed 3402.50 samples/sec   Loss 6.2964   LearningRate 0.0369   Epoch: 7   Global Step: 39750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:19,215-Speed 3395.85 samples/sec   Loss 6.3897   LearningRate 0.0368   Epoch: 7   Global Step: 39760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:18:22,216-Speed 3414.30 samples/sec   Loss 6.5010   LearningRate 0.0368   Epoch: 7   Global Step: 39770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:18:25,226-Speed 3402.81 samples/sec   Loss 6.2659   LearningRate 0.0368   Epoch: 7   Global Step: 39780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:18:28,236-Speed 3402.03 samples/sec   Loss 6.3305   LearningRate 0.0368   Epoch: 7   Global Step: 39790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:18:31,234-Speed 3417.43 samples/sec   Loss 6.4657   LearningRate 0.0368   Epoch: 7   Global Step: 39800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:34,242-Speed 3404.50 samples/sec   Loss 6.3856   LearningRate 0.0368   Epoch: 7   Global Step: 39810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:37,244-Speed 3412.13 samples/sec   Loss 6.4531   LearningRate 0.0368   Epoch: 7   Global Step: 39820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:40,262-Speed 3393.57 samples/sec   Loss 6.2549   LearningRate 0.0368   Epoch: 7   Global Step: 39830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:43,286-Speed 3388.02 samples/sec   Loss 6.4340   LearningRate 0.0367   Epoch: 7   Global Step: 39840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:46,288-Speed 3411.43 samples/sec   Loss 6.2078   LearningRate 0.0367   Epoch: 7   Global Step: 39850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:49,292-Speed 3409.78 samples/sec   Loss 6.3951   LearningRate 0.0367   Epoch: 7   Global Step: 39860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:52,301-Speed 3404.25 samples/sec   Loss 6.2680   LearningRate 0.0367   Epoch: 7   Global Step: 39870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:55,376-Speed 3330.65 samples/sec   Loss 6.3087   LearningRate 0.0367   Epoch: 7   Global Step: 39880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:18:58,385-Speed 3404.06 samples/sec   Loss 6.3964   LearningRate 0.0367   Epoch: 7   Global Step: 39890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:19:01,424-Speed 3370.66 samples/sec   Loss 6.3305   LearningRate 0.0367   Epoch: 7   Global Step: 39900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:19:04,494-Speed 3336.51 samples/sec   Loss 6.1879   LearningRate 0.0367   Epoch: 7   Global Step: 39910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:19:07,585-Speed 3313.78 samples/sec   Loss 6.1422   LearningRate 0.0366   Epoch: 7   Global Step: 39920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:19:10,589-Speed 3409.59 samples/sec   Loss 6.4092   LearningRate 0.0366   Epoch: 7   Global Step: 39930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:19:13,578-Speed 3426.72 samples/sec   Loss 6.2859   LearningRate 0.0366   Epoch: 7   Global Step: 39940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:19:16,590-Speed 3401.28 samples/sec   Loss 6.4721   LearningRate 0.0366   Epoch: 7   Global Step: 39950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:19:19,596-Speed 3406.37 samples/sec   Loss 6.3819   LearningRate 0.0366   Epoch: 7   Global Step: 39960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:19:22,602-Speed 3407.57 samples/sec   Loss 6.2279   LearningRate 0.0366   Epoch: 7   Global Step: 39970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:19:25,616-Speed 3398.45 samples/sec   Loss 6.2636   LearningRate 0.0366   Epoch: 7   Global Step: 39980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:19:28,619-Speed 3410.43 samples/sec   Loss 6.4423   LearningRate 0.0366   Epoch: 7   Global Step: 39990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:19:31,621-Speed 3412.27 samples/sec   Loss 6.3194   LearningRate 0.0366   Epoch: 7   Global Step: 40000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:20:15,913-[lfw][40000]XNorm: 21.937800
Training: 2022-04-11 03:20:15,913-[lfw][40000]Accuracy-Flip: 0.99750+-0.00239
Training: 2022-04-11 03:20:15,914-[lfw][40000]Accuracy-Highest: 0.99800
Training: 2022-04-11 03:21:07,447-[cfp_fp][40000]XNorm: 19.839884
Training: 2022-04-11 03:21:07,448-[cfp_fp][40000]Accuracy-Flip: 0.97471+-0.00553
Training: 2022-04-11 03:21:07,448-[cfp_fp][40000]Accuracy-Highest: 0.97471
Training: 2022-04-11 03:21:51,808-[agedb_30][40000]XNorm: 21.920209
Training: 2022-04-11 03:21:51,808-[agedb_30][40000]Accuracy-Flip: 0.97967+-0.00795
Training: 2022-04-11 03:21:51,809-[agedb_30][40000]Accuracy-Highest: 0.97967
Training: 2022-04-11 03:21:54,803-Speed 71.52 samples/sec   Loss 6.4074   LearningRate 0.0365   Epoch: 7   Global Step: 40010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:21:57,786-Speed 3433.28 samples/sec   Loss 6.3019   LearningRate 0.0365   Epoch: 7   Global Step: 40020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:00,775-Speed 3426.27 samples/sec   Loss 6.2963   LearningRate 0.0365   Epoch: 7   Global Step: 40030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:03,787-Speed 3400.95 samples/sec   Loss 6.3540   LearningRate 0.0365   Epoch: 7   Global Step: 40040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:22:06,775-Speed 3427.15 samples/sec   Loss 6.3858   LearningRate 0.0365   Epoch: 7   Global Step: 40050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:22:09,766-Speed 3425.65 samples/sec   Loss 6.3182   LearningRate 0.0365   Epoch: 7   Global Step: 40060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:22:12,745-Speed 3437.39 samples/sec   Loss 6.2947   LearningRate 0.0365   Epoch: 7   Global Step: 40070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:15,736-Speed 3425.19 samples/sec   Loss 6.2720   LearningRate 0.0365   Epoch: 7   Global Step: 40080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:18,743-Speed 3406.37 samples/sec   Loss 6.3699   LearningRate 0.0364   Epoch: 7   Global Step: 40090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:21,733-Speed 3425.29 samples/sec   Loss 6.2907   LearningRate 0.0364   Epoch: 7   Global Step: 40100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:24,724-Speed 3424.11 samples/sec   Loss 6.3289   LearningRate 0.0364   Epoch: 7   Global Step: 40110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:27,719-Speed 3419.74 samples/sec   Loss 6.4349   LearningRate 0.0364   Epoch: 7   Global Step: 40120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:30,711-Speed 3423.35 samples/sec   Loss 6.4235   LearningRate 0.0364   Epoch: 7   Global Step: 40130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:33,716-Speed 3408.68 samples/sec   Loss 6.3353   LearningRate 0.0364   Epoch: 7   Global Step: 40140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:36,708-Speed 3423.83 samples/sec   Loss 6.2066   LearningRate 0.0364   Epoch: 7   Global Step: 40150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:39,702-Speed 3421.56 samples/sec   Loss 6.3762   LearningRate 0.0364   Epoch: 7   Global Step: 40160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:42,695-Speed 3421.97 samples/sec   Loss 6.3634   LearningRate 0.0363   Epoch: 7   Global Step: 40170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:45,695-Speed 3413.73 samples/sec   Loss 6.2661   LearningRate 0.0363   Epoch: 7   Global Step: 40180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:48,690-Speed 3420.21 samples/sec   Loss 6.2241   LearningRate 0.0363   Epoch: 7   Global Step: 40190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:51,687-Speed 3417.49 samples/sec   Loss 6.1981   LearningRate 0.0363   Epoch: 7   Global Step: 40200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:22:54,685-Speed 3417.12 samples/sec   Loss 6.2660   LearningRate 0.0363   Epoch: 7   Global Step: 40210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:22:57,681-Speed 3418.83 samples/sec   Loss 6.2505   LearningRate 0.0363   Epoch: 7   Global Step: 40220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:23:00,736-Speed 3352.61 samples/sec   Loss 6.1688   LearningRate 0.0363   Epoch: 7   Global Step: 40230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:23:03,803-Speed 3340.21 samples/sec   Loss 6.3047   LearningRate 0.0363   Epoch: 7   Global Step: 40240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:23:06,826-Speed 3387.89 samples/sec   Loss 6.3046   LearningRate 0.0363   Epoch: 7   Global Step: 40250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:23:09,821-Speed 3419.63 samples/sec   Loss 6.3114   LearningRate 0.0362   Epoch: 7   Global Step: 40260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:23:12,820-Speed 3415.16 samples/sec   Loss 6.2667   LearningRate 0.0362   Epoch: 7   Global Step: 40270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:23:15,817-Speed 3418.21 samples/sec   Loss 6.3334   LearningRate 0.0362   Epoch: 7   Global Step: 40280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:23:18,820-Speed 3411.17 samples/sec   Loss 6.2126   LearningRate 0.0362   Epoch: 7   Global Step: 40290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:23:21,818-Speed 3416.01 samples/sec   Loss 6.2342   LearningRate 0.0362   Epoch: 7   Global Step: 40300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:23:24,850-Speed 3379.17 samples/sec   Loss 6.3084   LearningRate 0.0362   Epoch: 7   Global Step: 40310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:23:27,888-Speed 3371.04 samples/sec   Loss 6.2757   LearningRate 0.0362   Epoch: 7   Global Step: 40320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:23:30,884-Speed 3418.51 samples/sec   Loss 6.2049   LearningRate 0.0362   Epoch: 7   Global Step: 40330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:23:33,884-Speed 3414.29 samples/sec   Loss 6.3202   LearningRate 0.0361   Epoch: 7   Global Step: 40340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:23:36,878-Speed 3421.05 samples/sec   Loss 6.3485   LearningRate 0.0361   Epoch: 7   Global Step: 40350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:23:39,875-Speed 3417.42 samples/sec   Loss 6.1155   LearningRate 0.0361   Epoch: 7   Global Step: 40360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:23:42,888-Speed 3400.19 samples/sec   Loss 6.2767   LearningRate 0.0361   Epoch: 7   Global Step: 40370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:23:45,897-Speed 3403.51 samples/sec   Loss 6.2727   LearningRate 0.0361   Epoch: 7   Global Step: 40380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:23:48,893-Speed 3419.78 samples/sec   Loss 6.2816   LearningRate 0.0361   Epoch: 7   Global Step: 40390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:23:51,899-Speed 3407.06 samples/sec   Loss 6.0737   LearningRate 0.0361   Epoch: 7   Global Step: 40400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:23:54,878-Speed 3437.69 samples/sec   Loss 6.5187   LearningRate 0.0361   Epoch: 7   Global Step: 40410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:23:57,875-Speed 3417.51 samples/sec   Loss 6.2566   LearningRate 0.0361   Epoch: 7   Global Step: 40420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:24:00,871-Speed 3418.70 samples/sec   Loss 6.2567   LearningRate 0.0360   Epoch: 7   Global Step: 40430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:24:03,869-Speed 3417.60 samples/sec   Loss 6.2290   LearningRate 0.0360   Epoch: 7   Global Step: 40440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:24:06,867-Speed 3415.58 samples/sec   Loss 6.1114   LearningRate 0.0360   Epoch: 7   Global Step: 40450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:24:09,995-Speed 3274.60 samples/sec   Loss 6.3817   LearningRate 0.0360   Epoch: 7   Global Step: 40460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:24:22,652-Speed 809.15 samples/sec   Loss 5.7625   LearningRate 0.0360   Epoch: 8   Global Step: 40470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:24:25,812-Speed 3240.62 samples/sec   Loss 5.4454   LearningRate 0.0360   Epoch: 8   Global Step: 40480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:24:28,855-Speed 3367.64 samples/sec   Loss 5.5011   LearningRate 0.0360   Epoch: 8   Global Step: 40490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:24:31,867-Speed 3400.24 samples/sec   Loss 5.5135   LearningRate 0.0360   Epoch: 8   Global Step: 40500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:24:34,895-Speed 3382.67 samples/sec   Loss 5.4095   LearningRate 0.0359   Epoch: 8   Global Step: 40510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:24:37,926-Speed 3379.71 samples/sec   Loss 5.3218   LearningRate 0.0359   Epoch: 8   Global Step: 40520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:24:40,931-Speed 3408.34 samples/sec   Loss 5.5385   LearningRate 0.0359   Epoch: 8   Global Step: 40530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:24:43,960-Speed 3380.87 samples/sec   Loss 5.5487   LearningRate 0.0359   Epoch: 8   Global Step: 40540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:24:46,954-Speed 3420.99 samples/sec   Loss 5.5051   LearningRate 0.0359   Epoch: 8   Global Step: 40550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:24:49,964-Speed 3403.10 samples/sec   Loss 5.4694   LearningRate 0.0359   Epoch: 8   Global Step: 40560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:24:53,005-Speed 3368.81 samples/sec   Loss 5.6193   LearningRate 0.0359   Epoch: 8   Global Step: 40570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:24:56,048-Speed 3365.57 samples/sec   Loss 5.5189   LearningRate 0.0359   Epoch: 8   Global Step: 40580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:24:59,069-Speed 3391.07 samples/sec   Loss 5.5903   LearningRate 0.0359   Epoch: 8   Global Step: 40590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:25:02,097-Speed 3381.76 samples/sec   Loss 5.6260   LearningRate 0.0358   Epoch: 8   Global Step: 40600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:25:05,172-Speed 3331.35 samples/sec   Loss 5.5505   LearningRate 0.0358   Epoch: 8   Global Step: 40610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:25:08,185-Speed 3399.60 samples/sec   Loss 5.7949   LearningRate 0.0358   Epoch: 8   Global Step: 40620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:25:11,191-Speed 3407.75 samples/sec   Loss 5.5591   LearningRate 0.0358   Epoch: 8   Global Step: 40630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:25:14,195-Speed 3409.67 samples/sec   Loss 5.6104   LearningRate 0.0358   Epoch: 8   Global Step: 40640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:25:17,198-Speed 3410.71 samples/sec   Loss 5.6368   LearningRate 0.0358   Epoch: 8   Global Step: 40650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:25:20,200-Speed 3411.38 samples/sec   Loss 5.6638   LearningRate 0.0358   Epoch: 8   Global Step: 40660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:25:23,205-Speed 3409.29 samples/sec   Loss 5.7015   LearningRate 0.0358   Epoch: 8   Global Step: 40670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:25:26,230-Speed 3385.33 samples/sec   Loss 5.6992   LearningRate 0.0357   Epoch: 8   Global Step: 40680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:25:29,243-Speed 3399.50 samples/sec   Loss 5.4923   LearningRate 0.0357   Epoch: 8   Global Step: 40690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:25:32,242-Speed 3415.54 samples/sec   Loss 5.7359   LearningRate 0.0357   Epoch: 8   Global Step: 40700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:25:35,251-Speed 3404.32 samples/sec   Loss 5.8021   LearningRate 0.0357   Epoch: 8   Global Step: 40710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:25:38,256-Speed 3408.72 samples/sec   Loss 5.6890   LearningRate 0.0357   Epoch: 8   Global Step: 40720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:25:41,285-Speed 3381.64 samples/sec   Loss 5.6248   LearningRate 0.0357   Epoch: 8   Global Step: 40730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:25:44,316-Speed 3378.48 samples/sec   Loss 5.6021   LearningRate 0.0357   Epoch: 8   Global Step: 40740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:25:47,327-Speed 3402.14 samples/sec   Loss 5.6372   LearningRate 0.0357   Epoch: 8   Global Step: 40750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:25:50,341-Speed 3397.95 samples/sec   Loss 5.7582   LearningRate 0.0356   Epoch: 8   Global Step: 40760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:25:53,334-Speed 3422.67 samples/sec   Loss 5.8137   LearningRate 0.0356   Epoch: 8   Global Step: 40770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:25:56,339-Speed 3408.47 samples/sec   Loss 5.5699   LearningRate 0.0356   Epoch: 8   Global Step: 40780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:25:59,355-Speed 3395.37 samples/sec   Loss 5.7134   LearningRate 0.0356   Epoch: 8   Global Step: 40790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:02,411-Speed 3352.13 samples/sec   Loss 5.6183   LearningRate 0.0356   Epoch: 8   Global Step: 40800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:05,427-Speed 3395.90 samples/sec   Loss 5.5627   LearningRate 0.0356   Epoch: 8   Global Step: 40810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:08,444-Speed 3395.39 samples/sec   Loss 5.7311   LearningRate 0.0356   Epoch: 8   Global Step: 40820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:11,458-Speed 3398.36 samples/sec   Loss 5.6572   LearningRate 0.0356   Epoch: 8   Global Step: 40830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:14,475-Speed 3395.26 samples/sec   Loss 5.8720   LearningRate 0.0356   Epoch: 8   Global Step: 40840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:17,480-Speed 3407.37 samples/sec   Loss 5.7088   LearningRate 0.0355   Epoch: 8   Global Step: 40850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:20,495-Speed 3397.33 samples/sec   Loss 5.5921   LearningRate 0.0355   Epoch: 8   Global Step: 40860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:23,503-Speed 3404.96 samples/sec   Loss 5.7116   LearningRate 0.0355   Epoch: 8   Global Step: 40870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:26:26,490-Speed 3429.36 samples/sec   Loss 5.8606   LearningRate 0.0355   Epoch: 8   Global Step: 40880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:29,501-Speed 3402.11 samples/sec   Loss 5.6694   LearningRate 0.0355   Epoch: 8   Global Step: 40890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:32,510-Speed 3404.40 samples/sec   Loss 5.8211   LearningRate 0.0355   Epoch: 8   Global Step: 40900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:35,513-Speed 3410.73 samples/sec   Loss 5.8632   LearningRate 0.0355   Epoch: 8   Global Step: 40910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:38,512-Speed 3414.74 samples/sec   Loss 5.7095   LearningRate 0.0355   Epoch: 8   Global Step: 40920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:41,511-Speed 3414.95 samples/sec   Loss 5.7537   LearningRate 0.0354   Epoch: 8   Global Step: 40930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:44,512-Speed 3413.56 samples/sec   Loss 5.8027   LearningRate 0.0354   Epoch: 8   Global Step: 40940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:47,515-Speed 3410.74 samples/sec   Loss 5.7384   LearningRate 0.0354   Epoch: 8   Global Step: 40950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:50,521-Speed 3406.63 samples/sec   Loss 5.8452   LearningRate 0.0354   Epoch: 8   Global Step: 40960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:53,576-Speed 3352.81 samples/sec   Loss 5.7397   LearningRate 0.0354   Epoch: 8   Global Step: 40970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:56,562-Speed 3430.44 samples/sec   Loss 5.9071   LearningRate 0.0354   Epoch: 8   Global Step: 40980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:26:59,563-Speed 3413.24 samples/sec   Loss 5.8514   LearningRate 0.0354   Epoch: 8   Global Step: 40990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:02,578-Speed 3397.67 samples/sec   Loss 5.5363   LearningRate 0.0354   Epoch: 8   Global Step: 41000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:05,594-Speed 3396.39 samples/sec   Loss 5.9335   LearningRate 0.0354   Epoch: 8   Global Step: 41010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:08,596-Speed 3412.51 samples/sec   Loss 5.7626   LearningRate 0.0353   Epoch: 8   Global Step: 41020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:11,594-Speed 3416.60 samples/sec   Loss 5.6528   LearningRate 0.0353   Epoch: 8   Global Step: 41030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:14,599-Speed 3407.91 samples/sec   Loss 5.7697   LearningRate 0.0353   Epoch: 8   Global Step: 41040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:17,603-Speed 3409.31 samples/sec   Loss 5.6905   LearningRate 0.0353   Epoch: 8   Global Step: 41050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:20,605-Speed 3412.58 samples/sec   Loss 5.8931   LearningRate 0.0353   Epoch: 8   Global Step: 41060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:23,607-Speed 3411.16 samples/sec   Loss 5.7760   LearningRate 0.0353   Epoch: 8   Global Step: 41070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:26,625-Speed 3394.32 samples/sec   Loss 5.7511   LearningRate 0.0353   Epoch: 8   Global Step: 41080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:27:29,646-Speed 3391.14 samples/sec   Loss 6.0364   LearningRate 0.0353   Epoch: 8   Global Step: 41090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:27:32,648-Speed 3411.31 samples/sec   Loss 5.8389   LearningRate 0.0352   Epoch: 8   Global Step: 41100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:27:35,631-Speed 3433.99 samples/sec   Loss 5.9337   LearningRate 0.0352   Epoch: 8   Global Step: 41110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:38,633-Speed 3411.92 samples/sec   Loss 5.6821   LearningRate 0.0352   Epoch: 8   Global Step: 41120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:41,634-Speed 3412.62 samples/sec   Loss 5.7852   LearningRate 0.0352   Epoch: 8   Global Step: 41130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:44,636-Speed 3412.25 samples/sec   Loss 5.9023   LearningRate 0.0352   Epoch: 8   Global Step: 41140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:47,676-Speed 3368.97 samples/sec   Loss 5.7950   LearningRate 0.0352   Epoch: 8   Global Step: 41150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:50,695-Speed 3393.21 samples/sec   Loss 5.7248   LearningRate 0.0352   Epoch: 8   Global Step: 41160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:53,708-Speed 3399.09 samples/sec   Loss 5.7870   LearningRate 0.0352   Epoch: 8   Global Step: 41170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:56,721-Speed 3399.98 samples/sec   Loss 5.9561   LearningRate 0.0352   Epoch: 8   Global Step: 41180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:27:59,719-Speed 3416.03 samples/sec   Loss 5.9351   LearningRate 0.0351   Epoch: 8   Global Step: 41190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:02,730-Speed 3401.76 samples/sec   Loss 6.0214   LearningRate 0.0351   Epoch: 8   Global Step: 41200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:05,739-Speed 3404.54 samples/sec   Loss 5.7844   LearningRate 0.0351   Epoch: 8   Global Step: 41210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:28:08,762-Speed 3387.60 samples/sec   Loss 5.8440   LearningRate 0.0351   Epoch: 8   Global Step: 41220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:11,765-Speed 3411.06 samples/sec   Loss 5.8077   LearningRate 0.0351   Epoch: 8   Global Step: 41230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:14,845-Speed 3325.73 samples/sec   Loss 5.8336   LearningRate 0.0351   Epoch: 8   Global Step: 41240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:17,907-Speed 3344.80 samples/sec   Loss 5.8676   LearningRate 0.0351   Epoch: 8   Global Step: 41250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:20,908-Speed 3412.75 samples/sec   Loss 6.0014   LearningRate 0.0351   Epoch: 8   Global Step: 41260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:23,910-Speed 3412.64 samples/sec   Loss 5.8309   LearningRate 0.0351   Epoch: 8   Global Step: 41270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:26,910-Speed 3413.73 samples/sec   Loss 5.8745   LearningRate 0.0350   Epoch: 8   Global Step: 41280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:29,910-Speed 3414.69 samples/sec   Loss 5.9095   LearningRate 0.0350   Epoch: 8   Global Step: 41290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:32,912-Speed 3411.51 samples/sec   Loss 5.9929   LearningRate 0.0350   Epoch: 8   Global Step: 41300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:35,913-Speed 3413.08 samples/sec   Loss 5.9435   LearningRate 0.0350   Epoch: 8   Global Step: 41310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:38,959-Speed 3362.55 samples/sec   Loss 5.9976   LearningRate 0.0350   Epoch: 8   Global Step: 41320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:28:41,963-Speed 3409.23 samples/sec   Loss 5.9051   LearningRate 0.0350   Epoch: 8   Global Step: 41330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:28:44,968-Speed 3412.36 samples/sec   Loss 5.9944   LearningRate 0.0350   Epoch: 8   Global Step: 41340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:28:47,977-Speed 3403.43 samples/sec   Loss 5.9658   LearningRate 0.0350   Epoch: 8   Global Step: 41350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:28:50,964-Speed 3429.16 samples/sec   Loss 5.9035   LearningRate 0.0349   Epoch: 8   Global Step: 41360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:53,989-Speed 3386.61 samples/sec   Loss 5.9992   LearningRate 0.0349   Epoch: 8   Global Step: 41370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:56,988-Speed 3414.28 samples/sec   Loss 5.9166   LearningRate 0.0349   Epoch: 8   Global Step: 41380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:28:59,987-Speed 3415.93 samples/sec   Loss 6.0456   LearningRate 0.0349   Epoch: 8   Global Step: 41390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:02,992-Speed 3408.28 samples/sec   Loss 5.9424   LearningRate 0.0349   Epoch: 8   Global Step: 41400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:06,045-Speed 3354.57 samples/sec   Loss 6.0066   LearningRate 0.0349   Epoch: 8   Global Step: 41410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:09,046-Speed 3413.64 samples/sec   Loss 5.9531   LearningRate 0.0349   Epoch: 8   Global Step: 41420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:12,052-Speed 3407.47 samples/sec   Loss 5.8947   LearningRate 0.0349   Epoch: 8   Global Step: 41430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:15,059-Speed 3406.43 samples/sec   Loss 6.0509   LearningRate 0.0349   Epoch: 8   Global Step: 41440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:18,087-Speed 3382.61 samples/sec   Loss 6.0270   LearningRate 0.0348   Epoch: 8   Global Step: 41450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:21,089-Speed 3411.69 samples/sec   Loss 5.8494   LearningRate 0.0348   Epoch: 8   Global Step: 41460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:29:24,076-Speed 3429.13 samples/sec   Loss 5.8994   LearningRate 0.0348   Epoch: 8   Global Step: 41470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:27,081-Speed 3408.77 samples/sec   Loss 5.9369   LearningRate 0.0348   Epoch: 8   Global Step: 41480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:30,085-Speed 3409.64 samples/sec   Loss 5.7324   LearningRate 0.0348   Epoch: 8   Global Step: 41490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:33,085-Speed 3413.44 samples/sec   Loss 6.0750   LearningRate 0.0348   Epoch: 8   Global Step: 41500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:36,089-Speed 3410.15 samples/sec   Loss 5.8752   LearningRate 0.0348   Epoch: 8   Global Step: 41510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:39,093-Speed 3408.98 samples/sec   Loss 5.9805   LearningRate 0.0348   Epoch: 8   Global Step: 41520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:42,101-Speed 3405.63 samples/sec   Loss 5.8669   LearningRate 0.0347   Epoch: 8   Global Step: 41530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:45,102-Speed 3412.54 samples/sec   Loss 5.9490   LearningRate 0.0347   Epoch: 8   Global Step: 41540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:29:48,101-Speed 3414.99 samples/sec   Loss 5.9452   LearningRate 0.0347   Epoch: 8   Global Step: 41550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:29:51,102-Speed 3413.49 samples/sec   Loss 6.0106   LearningRate 0.0347   Epoch: 8   Global Step: 41560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:29:54,112-Speed 3403.54 samples/sec   Loss 5.7665   LearningRate 0.0347   Epoch: 8   Global Step: 41570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:29:57,122-Speed 3402.87 samples/sec   Loss 5.8955   LearningRate 0.0347   Epoch: 8   Global Step: 41580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:30:00,125-Speed 3410.57 samples/sec   Loss 5.9323   LearningRate 0.0347   Epoch: 8   Global Step: 41590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:30:03,133-Speed 3405.25 samples/sec   Loss 5.9004   LearningRate 0.0347   Epoch: 8   Global Step: 41600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:30:06,154-Speed 3390.15 samples/sec   Loss 5.9218   LearningRate 0.0347   Epoch: 8   Global Step: 41610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:30:09,189-Speed 3374.31 samples/sec   Loss 6.0716   LearningRate 0.0346   Epoch: 8   Global Step: 41620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:30:12,197-Speed 3405.28 samples/sec   Loss 6.0400   LearningRate 0.0346   Epoch: 8   Global Step: 41630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:30:15,204-Speed 3406.60 samples/sec   Loss 5.9347   LearningRate 0.0346   Epoch: 8   Global Step: 41640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:30:18,203-Speed 3414.57 samples/sec   Loss 5.9289   LearningRate 0.0346   Epoch: 8   Global Step: 41650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:30:21,213-Speed 3403.02 samples/sec   Loss 5.9821   LearningRate 0.0346   Epoch: 8   Global Step: 41660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:30:24,225-Speed 3400.97 samples/sec   Loss 5.8710   LearningRate 0.0346   Epoch: 8   Global Step: 41670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:30:27,242-Speed 3395.43 samples/sec   Loss 6.0259   LearningRate 0.0346   Epoch: 8   Global Step: 41680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:30:30,296-Speed 3353.44 samples/sec   Loss 5.8224   LearningRate 0.0346   Epoch: 8   Global Step: 41690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:30:33,308-Speed 3400.06 samples/sec   Loss 5.8695   LearningRate 0.0345   Epoch: 8   Global Step: 41700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:30:36,321-Speed 3399.17 samples/sec   Loss 6.0229   LearningRate 0.0345   Epoch: 8   Global Step: 41710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:30:39,344-Speed 3389.28 samples/sec   Loss 6.0596   LearningRate 0.0345   Epoch: 8   Global Step: 41720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:30:42,358-Speed 3398.16 samples/sec   Loss 5.9508   LearningRate 0.0345   Epoch: 8   Global Step: 41730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:30:45,378-Speed 3391.74 samples/sec   Loss 6.0830   LearningRate 0.0345   Epoch: 8   Global Step: 41740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:30:48,386-Speed 3405.51 samples/sec   Loss 6.0679   LearningRate 0.0345   Epoch: 8   Global Step: 41750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:30:51,375-Speed 3426.07 samples/sec   Loss 5.8426   LearningRate 0.0345   Epoch: 8   Global Step: 41760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:30:54,385-Speed 3402.73 samples/sec   Loss 6.2280   LearningRate 0.0345   Epoch: 8   Global Step: 41770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:30:57,388-Speed 3411.33 samples/sec   Loss 5.9797   LearningRate 0.0345   Epoch: 8   Global Step: 41780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:31:00,372-Speed 3431.90 samples/sec   Loss 6.0867   LearningRate 0.0344   Epoch: 8   Global Step: 41790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:31:03,398-Speed 3384.85 samples/sec   Loss 5.9858   LearningRate 0.0344   Epoch: 8   Global Step: 41800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:31:06,409-Speed 3402.18 samples/sec   Loss 5.9794   LearningRate 0.0344   Epoch: 8   Global Step: 41810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:31:09,453-Speed 3364.86 samples/sec   Loss 5.9048   LearningRate 0.0344   Epoch: 8   Global Step: 41820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:31:12,458-Speed 3407.93 samples/sec   Loss 5.9772   LearningRate 0.0344   Epoch: 8   Global Step: 41830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:31:15,466-Speed 3406.16 samples/sec   Loss 6.0586   LearningRate 0.0344   Epoch: 8   Global Step: 41840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:31:18,521-Speed 3352.76 samples/sec   Loss 5.8836   LearningRate 0.0344   Epoch: 8   Global Step: 41850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:31:21,527-Speed 3407.06 samples/sec   Loss 5.8953   LearningRate 0.0344   Epoch: 8   Global Step: 41860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:31:24,531-Speed 3409.13 samples/sec   Loss 6.0616   LearningRate 0.0344   Epoch: 8   Global Step: 41870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:31:27,538-Speed 3405.75 samples/sec   Loss 5.8699   LearningRate 0.0343   Epoch: 8   Global Step: 41880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:31:30,546-Speed 3405.95 samples/sec   Loss 5.8946   LearningRate 0.0343   Epoch: 8   Global Step: 41890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:31:33,551-Speed 3407.77 samples/sec   Loss 5.9371   LearningRate 0.0343   Epoch: 8   Global Step: 41900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:31:36,558-Speed 3406.32 samples/sec   Loss 5.9740   LearningRate 0.0343   Epoch: 8   Global Step: 41910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:31:39,564-Speed 3408.04 samples/sec   Loss 5.8919   LearningRate 0.0343   Epoch: 8   Global Step: 41920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:31:42,565-Speed 3413.39 samples/sec   Loss 6.0078   LearningRate 0.0343   Epoch: 8   Global Step: 41930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:31:45,583-Speed 3393.51 samples/sec   Loss 6.0437   LearningRate 0.0343   Epoch: 8   Global Step: 41940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:31:48,584-Speed 3412.72 samples/sec   Loss 5.9184   LearningRate 0.0343   Epoch: 8   Global Step: 41950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:31:51,588-Speed 3410.36 samples/sec   Loss 6.0530   LearningRate 0.0342   Epoch: 8   Global Step: 41960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:31:54,592-Speed 3408.83 samples/sec   Loss 6.0986   LearningRate 0.0342   Epoch: 8   Global Step: 41970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:31:57,597-Speed 3409.41 samples/sec   Loss 6.1015   LearningRate 0.0342   Epoch: 8   Global Step: 41980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:32:00,616-Speed 3392.70 samples/sec   Loss 6.1156   LearningRate 0.0342   Epoch: 8   Global Step: 41990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:32:03,628-Speed 3399.74 samples/sec   Loss 5.9940   LearningRate 0.0342   Epoch: 8   Global Step: 42000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:32:48,004-[lfw][42000]XNorm: 23.488167
Training: 2022-04-11 03:32:48,005-[lfw][42000]Accuracy-Flip: 0.99733+-0.00327
Training: 2022-04-11 03:32:48,006-[lfw][42000]Accuracy-Highest: 0.99800
Training: 2022-04-11 03:33:39,534-[cfp_fp][42000]XNorm: 20.888780
Training: 2022-04-11 03:33:39,535-[cfp_fp][42000]Accuracy-Flip: 0.97186+-0.00712
Training: 2022-04-11 03:33:39,536-[cfp_fp][42000]Accuracy-Highest: 0.97471
Training: 2022-04-11 03:34:23,543-[agedb_30][42000]XNorm: 23.363319
Training: 2022-04-11 03:34:23,543-[agedb_30][42000]Accuracy-Flip: 0.97867+-0.00714
Training: 2022-04-11 03:34:23,544-[agedb_30][42000]Accuracy-Highest: 0.97967
Training: 2022-04-11 03:34:26,549-Speed 71.65 samples/sec   Loss 5.8776   LearningRate 0.0342   Epoch: 8   Global Step: 42010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:34:29,546-Speed 3417.18 samples/sec   Loss 5.9734   LearningRate 0.0342   Epoch: 8   Global Step: 42020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:34:32,534-Speed 3427.76 samples/sec   Loss 5.9890   LearningRate 0.0342   Epoch: 8   Global Step: 42030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:34:35,528-Speed 3420.70 samples/sec   Loss 5.8902   LearningRate 0.0342   Epoch: 8   Global Step: 42040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:34:38,521-Speed 3422.22 samples/sec   Loss 6.0810   LearningRate 0.0341   Epoch: 8   Global Step: 42050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:34:41,517-Speed 3418.77 samples/sec   Loss 5.9304   LearningRate 0.0341   Epoch: 8   Global Step: 42060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:34:44,509-Speed 3423.58 samples/sec   Loss 6.0536   LearningRate 0.0341   Epoch: 8   Global Step: 42070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:34:47,503-Speed 3421.14 samples/sec   Loss 6.0425   LearningRate 0.0341   Epoch: 8   Global Step: 42080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:34:50,496-Speed 3421.85 samples/sec   Loss 6.0338   LearningRate 0.0341   Epoch: 8   Global Step: 42090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:34:53,491-Speed 3419.46 samples/sec   Loss 5.9242   LearningRate 0.0341   Epoch: 8   Global Step: 42100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:34:56,482-Speed 3426.82 samples/sec   Loss 6.0475   LearningRate 0.0341   Epoch: 8   Global Step: 42110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:34:59,475-Speed 3421.53 samples/sec   Loss 5.9205   LearningRate 0.0341   Epoch: 8   Global Step: 42120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:02,475-Speed 3414.45 samples/sec   Loss 6.0461   LearningRate 0.0341   Epoch: 8   Global Step: 42130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:05,476-Speed 3412.92 samples/sec   Loss 5.8921   LearningRate 0.0340   Epoch: 8   Global Step: 42140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:08,473-Speed 3417.13 samples/sec   Loss 5.9949   LearningRate 0.0340   Epoch: 8   Global Step: 42150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:35:11,454-Speed 3437.17 samples/sec   Loss 6.0059   LearningRate 0.0340   Epoch: 8   Global Step: 42160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:14,478-Speed 3386.05 samples/sec   Loss 5.7844   LearningRate 0.0340   Epoch: 8   Global Step: 42170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:17,532-Speed 3353.92 samples/sec   Loss 5.8089   LearningRate 0.0340   Epoch: 8   Global Step: 42180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:20,548-Speed 3397.11 samples/sec   Loss 5.9817   LearningRate 0.0340   Epoch: 8   Global Step: 42190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:23,546-Speed 3416.07 samples/sec   Loss 5.8698   LearningRate 0.0340   Epoch: 8   Global Step: 42200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:26,546-Speed 3413.97 samples/sec   Loss 6.0194   LearningRate 0.0340   Epoch: 8   Global Step: 42210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:29,562-Speed 3396.80 samples/sec   Loss 5.8735   LearningRate 0.0339   Epoch: 8   Global Step: 42220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:32,566-Speed 3408.78 samples/sec   Loss 6.0703   LearningRate 0.0339   Epoch: 8   Global Step: 42230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:35,572-Speed 3407.83 samples/sec   Loss 5.9217   LearningRate 0.0339   Epoch: 8   Global Step: 42240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:38,580-Speed 3404.22 samples/sec   Loss 5.9325   LearningRate 0.0339   Epoch: 8   Global Step: 42250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:41,590-Speed 3403.90 samples/sec   Loss 5.9450   LearningRate 0.0339   Epoch: 8   Global Step: 42260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:44,610-Speed 3391.04 samples/sec   Loss 6.0806   LearningRate 0.0339   Epoch: 8   Global Step: 42270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:47,620-Speed 3403.39 samples/sec   Loss 5.9764   LearningRate 0.0339   Epoch: 8   Global Step: 42280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:50,644-Speed 3387.21 samples/sec   Loss 6.0099   LearningRate 0.0339   Epoch: 8   Global Step: 42290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:53,657-Speed 3399.55 samples/sec   Loss 5.9946   LearningRate 0.0339   Epoch: 8   Global Step: 42300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:56,664-Speed 3406.42 samples/sec   Loss 6.0100   LearningRate 0.0338   Epoch: 8   Global Step: 42310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:35:59,674-Speed 3401.88 samples/sec   Loss 5.9456   LearningRate 0.0338   Epoch: 8   Global Step: 42320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:36:02,687-Speed 3399.54 samples/sec   Loss 5.8833   LearningRate 0.0338   Epoch: 8   Global Step: 42330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:36:05,762-Speed 3331.36 samples/sec   Loss 5.8406   LearningRate 0.0338   Epoch: 8   Global Step: 42340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:36:08,773-Speed 3401.58 samples/sec   Loss 6.0036   LearningRate 0.0338   Epoch: 8   Global Step: 42350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:36:11,787-Speed 3397.57 samples/sec   Loss 6.0770   LearningRate 0.0338   Epoch: 8   Global Step: 42360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:36:14,806-Speed 3393.19 samples/sec   Loss 6.0353   LearningRate 0.0338   Epoch: 8   Global Step: 42370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:36:17,828-Speed 3389.88 samples/sec   Loss 5.9290   LearningRate 0.0338   Epoch: 8   Global Step: 42380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:36:20,835-Speed 3405.75 samples/sec   Loss 5.9656   LearningRate 0.0338   Epoch: 8   Global Step: 42390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:36:23,853-Speed 3393.82 samples/sec   Loss 5.9640   LearningRate 0.0337   Epoch: 8   Global Step: 42400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:36:26,854-Speed 3412.80 samples/sec   Loss 6.0834   LearningRate 0.0337   Epoch: 8   Global Step: 42410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:36:29,865-Speed 3402.55 samples/sec   Loss 5.9115   LearningRate 0.0337   Epoch: 8   Global Step: 42420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:36:32,866-Speed 3412.19 samples/sec   Loss 6.0400   LearningRate 0.0337   Epoch: 8   Global Step: 42430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:36:35,873-Speed 3407.11 samples/sec   Loss 6.0427   LearningRate 0.0337   Epoch: 8   Global Step: 42440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:36:38,926-Speed 3354.50 samples/sec   Loss 6.0450   LearningRate 0.0337   Epoch: 8   Global Step: 42450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:36:41,933-Speed 3405.65 samples/sec   Loss 6.1675   LearningRate 0.0337   Epoch: 8   Global Step: 42460   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 03:36:44,900-Speed 3452.90 samples/sec   Loss 5.9129   LearningRate 0.0337   Epoch: 8   Global Step: 42470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:36:47,896-Speed 3418.58 samples/sec   Loss 6.1179   LearningRate 0.0336   Epoch: 8   Global Step: 42480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:36:50,900-Speed 3410.05 samples/sec   Loss 5.9347   LearningRate 0.0336   Epoch: 8   Global Step: 42490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:36:53,901-Speed 3412.81 samples/sec   Loss 5.9350   LearningRate 0.0336   Epoch: 8   Global Step: 42500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:36:56,906-Speed 3408.37 samples/sec   Loss 5.8786   LearningRate 0.0336   Epoch: 8   Global Step: 42510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:36:59,905-Speed 3416.08 samples/sec   Loss 5.9106   LearningRate 0.0336   Epoch: 8   Global Step: 42520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:02,933-Speed 3382.35 samples/sec   Loss 6.0665   LearningRate 0.0336   Epoch: 8   Global Step: 42530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:05,957-Speed 3387.23 samples/sec   Loss 5.9893   LearningRate 0.0336   Epoch: 8   Global Step: 42540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:08,958-Speed 3412.25 samples/sec   Loss 5.9520   LearningRate 0.0336   Epoch: 8   Global Step: 42550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:11,958-Speed 3414.82 samples/sec   Loss 5.9257   LearningRate 0.0336   Epoch: 8   Global Step: 42560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:14,977-Speed 3393.26 samples/sec   Loss 5.9511   LearningRate 0.0335   Epoch: 8   Global Step: 42570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:17,976-Speed 3414.85 samples/sec   Loss 5.9945   LearningRate 0.0335   Epoch: 8   Global Step: 42580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:20,974-Speed 3417.22 samples/sec   Loss 5.7909   LearningRate 0.0335   Epoch: 8   Global Step: 42590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:23,971-Speed 3416.82 samples/sec   Loss 5.8373   LearningRate 0.0335   Epoch: 8   Global Step: 42600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:26,977-Speed 3407.29 samples/sec   Loss 6.2056   LearningRate 0.0335   Epoch: 8   Global Step: 42610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:29,979-Speed 3412.53 samples/sec   Loss 5.9570   LearningRate 0.0335   Epoch: 8   Global Step: 42620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:32,981-Speed 3411.60 samples/sec   Loss 5.9350   LearningRate 0.0335   Epoch: 8   Global Step: 42630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:35,982-Speed 3412.89 samples/sec   Loss 5.8539   LearningRate 0.0335   Epoch: 8   Global Step: 42640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:38,979-Speed 3417.51 samples/sec   Loss 5.9356   LearningRate 0.0335   Epoch: 8   Global Step: 42650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:41,987-Speed 3404.64 samples/sec   Loss 5.9252   LearningRate 0.0334   Epoch: 8   Global Step: 42660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:44,988-Speed 3414.08 samples/sec   Loss 5.9856   LearningRate 0.0334   Epoch: 8   Global Step: 42670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:37:47,991-Speed 3410.53 samples/sec   Loss 6.0497   LearningRate 0.0334   Epoch: 8   Global Step: 42680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:37:50,971-Speed 3437.34 samples/sec   Loss 5.9264   LearningRate 0.0334   Epoch: 8   Global Step: 42690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:53,987-Speed 3396.21 samples/sec   Loss 5.8846   LearningRate 0.0334   Epoch: 8   Global Step: 42700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:56,997-Speed 3402.55 samples/sec   Loss 5.9293   LearningRate 0.0334   Epoch: 8   Global Step: 42710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:37:59,997-Speed 3414.00 samples/sec   Loss 6.0264   LearningRate 0.0334   Epoch: 8   Global Step: 42720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:03,003-Speed 3407.43 samples/sec   Loss 5.9904   LearningRate 0.0334   Epoch: 8   Global Step: 42730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:06,033-Speed 3381.19 samples/sec   Loss 6.0221   LearningRate 0.0334   Epoch: 8   Global Step: 42740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:09,036-Speed 3410.12 samples/sec   Loss 6.1055   LearningRate 0.0333   Epoch: 8   Global Step: 42750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:12,040-Speed 3409.27 samples/sec   Loss 5.8495   LearningRate 0.0333   Epoch: 8   Global Step: 42760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:15,048-Speed 3406.23 samples/sec   Loss 6.0045   LearningRate 0.0333   Epoch: 8   Global Step: 42770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:18,048-Speed 3413.96 samples/sec   Loss 5.9945   LearningRate 0.0333   Epoch: 8   Global Step: 42780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:21,051-Speed 3410.87 samples/sec   Loss 5.9317   LearningRate 0.0333   Epoch: 8   Global Step: 42790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:38:24,033-Speed 3434.94 samples/sec   Loss 5.9869   LearningRate 0.0333   Epoch: 8   Global Step: 42800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:27,047-Speed 3397.65 samples/sec   Loss 6.0274   LearningRate 0.0333   Epoch: 8   Global Step: 42810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:30,052-Speed 3408.72 samples/sec   Loss 6.0303   LearningRate 0.0333   Epoch: 8   Global Step: 42820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:33,053-Speed 3412.96 samples/sec   Loss 5.9642   LearningRate 0.0332   Epoch: 8   Global Step: 42830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:36,058-Speed 3408.94 samples/sec   Loss 5.9210   LearningRate 0.0332   Epoch: 8   Global Step: 42840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:39,067-Speed 3403.54 samples/sec   Loss 6.0126   LearningRate 0.0332   Epoch: 8   Global Step: 42850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:42,074-Speed 3406.54 samples/sec   Loss 5.9010   LearningRate 0.0332   Epoch: 8   Global Step: 42860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:45,076-Speed 3411.83 samples/sec   Loss 5.9566   LearningRate 0.0332   Epoch: 8   Global Step: 42870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:48,088-Speed 3401.09 samples/sec   Loss 6.0398   LearningRate 0.0332   Epoch: 8   Global Step: 42880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:51,091-Speed 3410.29 samples/sec   Loss 6.0631   LearningRate 0.0332   Epoch: 8   Global Step: 42890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:54,077-Speed 3430.48 samples/sec   Loss 5.9051   LearningRate 0.0332   Epoch: 8   Global Step: 42900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:38:57,079-Speed 3412.16 samples/sec   Loss 5.8920   LearningRate 0.0332   Epoch: 8   Global Step: 42910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:00,130-Speed 3357.26 samples/sec   Loss 6.0107   LearningRate 0.0331   Epoch: 8   Global Step: 42920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:03,143-Speed 3398.32 samples/sec   Loss 6.0762   LearningRate 0.0331   Epoch: 8   Global Step: 42930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:06,176-Speed 3377.45 samples/sec   Loss 5.9115   LearningRate 0.0331   Epoch: 8   Global Step: 42940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:09,182-Speed 3407.92 samples/sec   Loss 5.9412   LearningRate 0.0331   Epoch: 8   Global Step: 42950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:12,184-Speed 3411.92 samples/sec   Loss 5.9785   LearningRate 0.0331   Epoch: 8   Global Step: 42960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:15,219-Speed 3374.27 samples/sec   Loss 5.9530   LearningRate 0.0331   Epoch: 8   Global Step: 42970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:18,228-Speed 3404.71 samples/sec   Loss 5.9675   LearningRate 0.0331   Epoch: 8   Global Step: 42980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:21,249-Speed 3390.51 samples/sec   Loss 5.9724   LearningRate 0.0331   Epoch: 8   Global Step: 42990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:24,274-Speed 3385.79 samples/sec   Loss 5.9559   LearningRate 0.0331   Epoch: 8   Global Step: 43000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:39:27,287-Speed 3399.39 samples/sec   Loss 5.8840   LearningRate 0.0330   Epoch: 8   Global Step: 43010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:39:30,304-Speed 3394.85 samples/sec   Loss 5.9683   LearningRate 0.0330   Epoch: 8   Global Step: 43020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:39:33,294-Speed 3425.38 samples/sec   Loss 5.9916   LearningRate 0.0330   Epoch: 8   Global Step: 43030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:36,305-Speed 3401.75 samples/sec   Loss 5.9060   LearningRate 0.0330   Epoch: 8   Global Step: 43040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:39,310-Speed 3409.38 samples/sec   Loss 5.9818   LearningRate 0.0330   Epoch: 8   Global Step: 43050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:42,327-Speed 3394.89 samples/sec   Loss 5.8100   LearningRate 0.0330   Epoch: 8   Global Step: 43060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:45,332-Speed 3407.58 samples/sec   Loss 5.9545   LearningRate 0.0330   Epoch: 8   Global Step: 43070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:48,344-Speed 3400.94 samples/sec   Loss 5.7517   LearningRate 0.0330   Epoch: 8   Global Step: 43080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:51,369-Speed 3386.30 samples/sec   Loss 6.0863   LearningRate 0.0330   Epoch: 8   Global Step: 43090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:54,393-Speed 3387.13 samples/sec   Loss 6.1060   LearningRate 0.0329   Epoch: 8   Global Step: 43100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:39:57,400-Speed 3406.08 samples/sec   Loss 6.0384   LearningRate 0.0329   Epoch: 8   Global Step: 43110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:00,409-Speed 3404.13 samples/sec   Loss 5.9573   LearningRate 0.0329   Epoch: 8   Global Step: 43120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:03,418-Speed 3403.35 samples/sec   Loss 5.8255   LearningRate 0.0329   Epoch: 8   Global Step: 43130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:40:06,417-Speed 3415.47 samples/sec   Loss 5.9668   LearningRate 0.0329   Epoch: 8   Global Step: 43140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:09,434-Speed 3394.81 samples/sec   Loss 5.9565   LearningRate 0.0329   Epoch: 8   Global Step: 43150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:12,440-Speed 3408.07 samples/sec   Loss 5.9993   LearningRate 0.0329   Epoch: 8   Global Step: 43160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:15,440-Speed 3414.35 samples/sec   Loss 6.1465   LearningRate 0.0329   Epoch: 8   Global Step: 43170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:18,446-Speed 3407.38 samples/sec   Loss 5.9369   LearningRate 0.0329   Epoch: 8   Global Step: 43180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:21,451-Speed 3408.96 samples/sec   Loss 6.0378   LearningRate 0.0328   Epoch: 8   Global Step: 43190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:24,457-Speed 3406.63 samples/sec   Loss 5.9423   LearningRate 0.0328   Epoch: 8   Global Step: 43200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:27,462-Speed 3407.88 samples/sec   Loss 5.8940   LearningRate 0.0328   Epoch: 8   Global Step: 43210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:30,474-Speed 3400.66 samples/sec   Loss 5.8976   LearningRate 0.0328   Epoch: 8   Global Step: 43220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:33,487-Speed 3399.29 samples/sec   Loss 5.9330   LearningRate 0.0328   Epoch: 8   Global Step: 43230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:36,484-Speed 3417.71 samples/sec   Loss 5.8356   LearningRate 0.0328   Epoch: 8   Global Step: 43240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:39,497-Speed 3399.68 samples/sec   Loss 6.0882   LearningRate 0.0328   Epoch: 8   Global Step: 43250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:42,508-Speed 3402.64 samples/sec   Loss 5.9033   LearningRate 0.0328   Epoch: 8   Global Step: 43260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:45,512-Speed 3409.74 samples/sec   Loss 6.0392   LearningRate 0.0327   Epoch: 8   Global Step: 43270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:48,527-Speed 3396.93 samples/sec   Loss 5.9329   LearningRate 0.0327   Epoch: 8   Global Step: 43280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:51,557-Speed 3380.71 samples/sec   Loss 6.0404   LearningRate 0.0327   Epoch: 8   Global Step: 43290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:54,570-Speed 3399.04 samples/sec   Loss 5.9040   LearningRate 0.0327   Epoch: 8   Global Step: 43300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:40:57,579-Speed 3403.51 samples/sec   Loss 6.0133   LearningRate 0.0327   Epoch: 8   Global Step: 43310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:00,592-Speed 3400.60 samples/sec   Loss 5.9819   LearningRate 0.0327   Epoch: 8   Global Step: 43320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:03,649-Speed 3350.19 samples/sec   Loss 5.7361   LearningRate 0.0327   Epoch: 8   Global Step: 43330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:06,668-Speed 3393.20 samples/sec   Loss 6.0659   LearningRate 0.0327   Epoch: 8   Global Step: 43340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:41:09,675-Speed 3406.29 samples/sec   Loss 5.9487   LearningRate 0.0327   Epoch: 8   Global Step: 43350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:41:12,657-Speed 3434.42 samples/sec   Loss 6.1088   LearningRate 0.0326   Epoch: 8   Global Step: 43360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:15,675-Speed 3393.91 samples/sec   Loss 5.9386   LearningRate 0.0326   Epoch: 8   Global Step: 43370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:18,690-Speed 3396.74 samples/sec   Loss 5.7922   LearningRate 0.0326   Epoch: 8   Global Step: 43380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:21,699-Speed 3404.81 samples/sec   Loss 5.9623   LearningRate 0.0326   Epoch: 8   Global Step: 43390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:24,710-Speed 3400.65 samples/sec   Loss 5.9189   LearningRate 0.0326   Epoch: 8   Global Step: 43400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:27,716-Speed 3407.16 samples/sec   Loss 5.8901   LearningRate 0.0326   Epoch: 8   Global Step: 43410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:30,736-Speed 3392.08 samples/sec   Loss 5.9296   LearningRate 0.0326   Epoch: 8   Global Step: 43420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:33,742-Speed 3407.55 samples/sec   Loss 6.0451   LearningRate 0.0326   Epoch: 8   Global Step: 43430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:36,748-Speed 3407.55 samples/sec   Loss 5.9624   LearningRate 0.0326   Epoch: 8   Global Step: 43440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:39,757-Speed 3404.64 samples/sec   Loss 6.0571   LearningRate 0.0325   Epoch: 8   Global Step: 43450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:42,762-Speed 3408.31 samples/sec   Loss 5.8647   LearningRate 0.0325   Epoch: 8   Global Step: 43460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:41:45,746-Speed 3432.30 samples/sec   Loss 6.0138   LearningRate 0.0325   Epoch: 8   Global Step: 43470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:48,768-Speed 3389.65 samples/sec   Loss 6.0505   LearningRate 0.0325   Epoch: 8   Global Step: 43480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:51,777-Speed 3403.20 samples/sec   Loss 5.9549   LearningRate 0.0325   Epoch: 8   Global Step: 43490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:54,800-Speed 3388.81 samples/sec   Loss 5.9719   LearningRate 0.0325   Epoch: 8   Global Step: 43500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:41:57,802-Speed 3410.98 samples/sec   Loss 5.8734   LearningRate 0.0325   Epoch: 8   Global Step: 43510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:42:00,820-Speed 3394.88 samples/sec   Loss 5.8942   LearningRate 0.0325   Epoch: 8   Global Step: 43520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:42:03,842-Speed 3389.55 samples/sec   Loss 6.0033   LearningRate 0.0325   Epoch: 8   Global Step: 43530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:42:06,852-Speed 3403.03 samples/sec   Loss 5.9510   LearningRate 0.0324   Epoch: 8   Global Step: 43540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:42:09,857-Speed 3407.50 samples/sec   Loss 5.9753   LearningRate 0.0324   Epoch: 8   Global Step: 43550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:42:12,849-Speed 3423.88 samples/sec   Loss 6.0441   LearningRate 0.0324   Epoch: 8   Global Step: 43560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:42:15,863-Speed 3397.94 samples/sec   Loss 5.9076   LearningRate 0.0324   Epoch: 8   Global Step: 43570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:42:18,881-Speed 3393.72 samples/sec   Loss 5.8707   LearningRate 0.0324   Epoch: 8   Global Step: 43580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:42:21,890-Speed 3404.91 samples/sec   Loss 5.8946   LearningRate 0.0324   Epoch: 8   Global Step: 43590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:42:24,891-Speed 3412.89 samples/sec   Loss 5.9089   LearningRate 0.0324   Epoch: 8   Global Step: 43600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:42:27,917-Speed 3383.81 samples/sec   Loss 5.8755   LearningRate 0.0324   Epoch: 8   Global Step: 43610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:42:30,937-Speed 3391.63 samples/sec   Loss 5.8962   LearningRate 0.0324   Epoch: 8   Global Step: 43620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:42:33,959-Speed 3390.09 samples/sec   Loss 5.8921   LearningRate 0.0323   Epoch: 8   Global Step: 43630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:42:36,965-Speed 3408.05 samples/sec   Loss 5.9202   LearningRate 0.0323   Epoch: 8   Global Step: 43640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:42:39,971-Speed 3406.70 samples/sec   Loss 5.9880   LearningRate 0.0323   Epoch: 8   Global Step: 43650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:42:42,987-Speed 3396.11 samples/sec   Loss 5.9670   LearningRate 0.0323   Epoch: 8   Global Step: 43660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:42:45,991-Speed 3409.35 samples/sec   Loss 6.0920   LearningRate 0.0323   Epoch: 8   Global Step: 43670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:42:48,997-Speed 3407.48 samples/sec   Loss 5.9914   LearningRate 0.0323   Epoch: 8   Global Step: 43680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:42:52,004-Speed 3406.64 samples/sec   Loss 5.9390   LearningRate 0.0323   Epoch: 8   Global Step: 43690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:42:55,032-Speed 3382.60 samples/sec   Loss 6.0968   LearningRate 0.0323   Epoch: 8   Global Step: 43700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:42:58,042-Speed 3402.68 samples/sec   Loss 6.0484   LearningRate 0.0323   Epoch: 8   Global Step: 43710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:01,057-Speed 3397.20 samples/sec   Loss 5.9688   LearningRate 0.0322   Epoch: 8   Global Step: 43720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:04,063-Speed 3407.84 samples/sec   Loss 5.9363   LearningRate 0.0322   Epoch: 8   Global Step: 43730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:07,075-Speed 3400.02 samples/sec   Loss 6.0269   LearningRate 0.0322   Epoch: 8   Global Step: 43740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:10,090-Speed 3397.35 samples/sec   Loss 5.9640   LearningRate 0.0322   Epoch: 8   Global Step: 43750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:13,101-Speed 3401.80 samples/sec   Loss 6.0469   LearningRate 0.0322   Epoch: 8   Global Step: 43760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:43:16,110-Speed 3403.51 samples/sec   Loss 5.7705   LearningRate 0.0322   Epoch: 8   Global Step: 43770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:43:19,153-Speed 3366.70 samples/sec   Loss 6.1076   LearningRate 0.0322   Epoch: 8   Global Step: 43780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:43:22,159-Speed 3407.36 samples/sec   Loss 5.9616   LearningRate 0.0322   Epoch: 8   Global Step: 43790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:43:25,156-Speed 3417.23 samples/sec   Loss 6.0070   LearningRate 0.0322   Epoch: 8   Global Step: 43800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:28,163-Speed 3405.88 samples/sec   Loss 6.0166   LearningRate 0.0321   Epoch: 8   Global Step: 43810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:31,168-Speed 3409.75 samples/sec   Loss 5.7943   LearningRate 0.0321   Epoch: 8   Global Step: 43820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:34,174-Speed 3407.07 samples/sec   Loss 5.8695   LearningRate 0.0321   Epoch: 8   Global Step: 43830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:37,202-Speed 3382.34 samples/sec   Loss 5.8078   LearningRate 0.0321   Epoch: 8   Global Step: 43840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:40,208-Speed 3407.57 samples/sec   Loss 5.8550   LearningRate 0.0321   Epoch: 8   Global Step: 43850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:43,214-Speed 3407.10 samples/sec   Loss 5.8308   LearningRate 0.0321   Epoch: 8   Global Step: 43860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:46,218-Speed 3409.19 samples/sec   Loss 5.7649   LearningRate 0.0321   Epoch: 8   Global Step: 43870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:49,225-Speed 3406.21 samples/sec   Loss 5.8605   LearningRate 0.0321   Epoch: 8   Global Step: 43880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:52,231-Speed 3407.76 samples/sec   Loss 5.9459   LearningRate 0.0321   Epoch: 8   Global Step: 43890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:43:55,243-Speed 3400.77 samples/sec   Loss 5.8234   LearningRate 0.0320   Epoch: 8   Global Step: 43900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:43:58,229-Speed 3430.72 samples/sec   Loss 5.8996   LearningRate 0.0320   Epoch: 8   Global Step: 43910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:44:01,235-Speed 3407.20 samples/sec   Loss 5.8221   LearningRate 0.0320   Epoch: 8   Global Step: 43920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:44:04,244-Speed 3404.43 samples/sec   Loss 5.8495   LearningRate 0.0320   Epoch: 8   Global Step: 43930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:44:07,251-Speed 3405.70 samples/sec   Loss 6.0010   LearningRate 0.0320   Epoch: 8   Global Step: 43940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:44:10,258-Speed 3405.99 samples/sec   Loss 5.9504   LearningRate 0.0320   Epoch: 8   Global Step: 43950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:44:13,262-Speed 3409.25 samples/sec   Loss 5.8732   LearningRate 0.0320   Epoch: 8   Global Step: 43960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:44:16,281-Speed 3393.30 samples/sec   Loss 5.8202   LearningRate 0.0320   Epoch: 8   Global Step: 43970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:44:19,300-Speed 3392.16 samples/sec   Loss 5.9101   LearningRate 0.0319   Epoch: 8   Global Step: 43980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:44:22,313-Speed 3400.37 samples/sec   Loss 5.7898   LearningRate 0.0319   Epoch: 8   Global Step: 43990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:44:25,326-Speed 3399.66 samples/sec   Loss 6.0712   LearningRate 0.0319   Epoch: 8   Global Step: 44000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:45:09,601-[lfw][44000]XNorm: 22.249712
Training: 2022-04-11 03:45:09,602-[lfw][44000]Accuracy-Flip: 0.99667+-0.00325
Training: 2022-04-11 03:45:09,602-[lfw][44000]Accuracy-Highest: 0.99800
Training: 2022-04-11 03:46:00,795-[cfp_fp][44000]XNorm: 20.131060
Training: 2022-04-11 03:46:00,795-[cfp_fp][44000]Accuracy-Flip: 0.97543+-0.00800
Training: 2022-04-11 03:46:00,796-[cfp_fp][44000]Accuracy-Highest: 0.97543
Training: 2022-04-11 03:46:44,814-[agedb_30][44000]XNorm: 22.260320
Training: 2022-04-11 03:46:44,814-[agedb_30][44000]Accuracy-Flip: 0.98083+-0.00668
Training: 2022-04-11 03:46:44,815-[agedb_30][44000]Accuracy-Highest: 0.98083
Training: 2022-04-11 03:46:47,819-Speed 71.86 samples/sec   Loss 5.8642   LearningRate 0.0319   Epoch: 8   Global Step: 44010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:46:50,788-Speed 3448.83 samples/sec   Loss 6.0052   LearningRate 0.0319   Epoch: 8   Global Step: 44020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:46:53,773-Speed 3431.65 samples/sec   Loss 6.0919   LearningRate 0.0319   Epoch: 8   Global Step: 44030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:46:56,763-Speed 3426.34 samples/sec   Loss 6.0614   LearningRate 0.0319   Epoch: 8   Global Step: 44040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:46:59,796-Speed 3375.97 samples/sec   Loss 5.9309   LearningRate 0.0319   Epoch: 8   Global Step: 44050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:02,795-Speed 3416.48 samples/sec   Loss 5.9333   LearningRate 0.0319   Epoch: 8   Global Step: 44060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:05,796-Speed 3412.18 samples/sec   Loss 5.8921   LearningRate 0.0318   Epoch: 8   Global Step: 44070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:08,797-Speed 3413.09 samples/sec   Loss 5.9853   LearningRate 0.0318   Epoch: 8   Global Step: 44080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:11,802-Speed 3409.12 samples/sec   Loss 5.8835   LearningRate 0.0318   Epoch: 8   Global Step: 44090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:14,800-Speed 3416.41 samples/sec   Loss 5.9323   LearningRate 0.0318   Epoch: 8   Global Step: 44100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:17,883-Speed 3323.47 samples/sec   Loss 6.0009   LearningRate 0.0318   Epoch: 8   Global Step: 44110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:20,878-Speed 3419.54 samples/sec   Loss 5.8952   LearningRate 0.0318   Epoch: 8   Global Step: 44120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:47:23,861-Speed 3433.40 samples/sec   Loss 5.8691   LearningRate 0.0318   Epoch: 8   Global Step: 44130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:26,862-Speed 3413.93 samples/sec   Loss 5.9151   LearningRate 0.0318   Epoch: 8   Global Step: 44140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:29,870-Speed 3405.05 samples/sec   Loss 6.0304   LearningRate 0.0318   Epoch: 8   Global Step: 44150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:32,871-Speed 3412.92 samples/sec   Loss 5.8327   LearningRate 0.0317   Epoch: 8   Global Step: 44160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:35,908-Speed 3372.26 samples/sec   Loss 5.8607   LearningRate 0.0317   Epoch: 8   Global Step: 44170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:38,913-Speed 3408.33 samples/sec   Loss 5.9013   LearningRate 0.0317   Epoch: 8   Global Step: 44180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:41,921-Speed 3406.20 samples/sec   Loss 5.8321   LearningRate 0.0317   Epoch: 8   Global Step: 44190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:44,923-Speed 3411.74 samples/sec   Loss 5.8939   LearningRate 0.0317   Epoch: 8   Global Step: 44200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:47,934-Speed 3401.36 samples/sec   Loss 5.9134   LearningRate 0.0317   Epoch: 8   Global Step: 44210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:50,939-Speed 3408.78 samples/sec   Loss 5.9379   LearningRate 0.0317   Epoch: 8   Global Step: 44220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:47:53,939-Speed 3414.29 samples/sec   Loss 6.0638   LearningRate 0.0317   Epoch: 8   Global Step: 44230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:47:56,940-Speed 3413.15 samples/sec   Loss 5.9110   LearningRate 0.0317   Epoch: 8   Global Step: 44240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:47:59,949-Speed 3403.91 samples/sec   Loss 6.0394   LearningRate 0.0316   Epoch: 8   Global Step: 44250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:48:02,950-Speed 3412.16 samples/sec   Loss 5.8567   LearningRate 0.0316   Epoch: 8   Global Step: 44260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:48:05,978-Speed 3383.55 samples/sec   Loss 5.9306   LearningRate 0.0316   Epoch: 8   Global Step: 44270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:48:08,975-Speed 3417.43 samples/sec   Loss 5.9875   LearningRate 0.0316   Epoch: 8   Global Step: 44280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:48:11,972-Speed 3417.54 samples/sec   Loss 5.8967   LearningRate 0.0316   Epoch: 8   Global Step: 44290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:48:14,969-Speed 3417.76 samples/sec   Loss 5.8133   LearningRate 0.0316   Epoch: 8   Global Step: 44300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:48:17,985-Speed 3396.00 samples/sec   Loss 5.8309   LearningRate 0.0316   Epoch: 8   Global Step: 44310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:48:20,983-Speed 3416.97 samples/sec   Loss 5.9498   LearningRate 0.0316   Epoch: 8   Global Step: 44320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:48:23,984-Speed 3412.20 samples/sec   Loss 5.9930   LearningRate 0.0316   Epoch: 8   Global Step: 44330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:48:26,983-Speed 3415.66 samples/sec   Loss 6.1430   LearningRate 0.0315   Epoch: 8   Global Step: 44340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:48:29,985-Speed 3411.34 samples/sec   Loss 5.7792   LearningRate 0.0315   Epoch: 8   Global Step: 44350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:48:32,990-Speed 3409.19 samples/sec   Loss 5.7650   LearningRate 0.0315   Epoch: 8   Global Step: 44360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:48:35,989-Speed 3415.86 samples/sec   Loss 5.9152   LearningRate 0.0315   Epoch: 8   Global Step: 44370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:48:38,984-Speed 3420.41 samples/sec   Loss 5.8830   LearningRate 0.0315   Epoch: 8   Global Step: 44380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:48:41,987-Speed 3410.33 samples/sec   Loss 5.9630   LearningRate 0.0315   Epoch: 8   Global Step: 44390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:48:44,987-Speed 3414.54 samples/sec   Loss 5.7671   LearningRate 0.0315   Epoch: 8   Global Step: 44400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:48:47,988-Speed 3413.30 samples/sec   Loss 5.8966   LearningRate 0.0315   Epoch: 8   Global Step: 44410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:48:51,005-Speed 3394.80 samples/sec   Loss 5.8849   LearningRate 0.0315   Epoch: 8   Global Step: 44420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:48:53,985-Speed 3437.30 samples/sec   Loss 5.8746   LearningRate 0.0314   Epoch: 8   Global Step: 44430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:48:56,981-Speed 3418.09 samples/sec   Loss 5.8087   LearningRate 0.0314   Epoch: 8   Global Step: 44440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:49:00,005-Speed 3387.95 samples/sec   Loss 5.9259   LearningRate 0.0314   Epoch: 8   Global Step: 44450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:49:03,087-Speed 3323.47 samples/sec   Loss 5.8659   LearningRate 0.0314   Epoch: 8   Global Step: 44460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:49:06,106-Speed 3392.57 samples/sec   Loss 6.0094   LearningRate 0.0314   Epoch: 8   Global Step: 44470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:49:09,104-Speed 3416.94 samples/sec   Loss 5.9504   LearningRate 0.0314   Epoch: 8   Global Step: 44480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:49:12,104-Speed 3414.23 samples/sec   Loss 5.8929   LearningRate 0.0314   Epoch: 8   Global Step: 44490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:49:15,110-Speed 3407.33 samples/sec   Loss 5.8261   LearningRate 0.0314   Epoch: 8   Global Step: 44500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:49:18,112-Speed 3412.14 samples/sec   Loss 5.7655   LearningRate 0.0314   Epoch: 8   Global Step: 44510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:49:21,113-Speed 3412.86 samples/sec   Loss 5.8733   LearningRate 0.0313   Epoch: 8   Global Step: 44520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:49:24,123-Speed 3402.16 samples/sec   Loss 5.8406   LearningRate 0.0313   Epoch: 8   Global Step: 44530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:49:27,135-Speed 3400.82 samples/sec   Loss 5.9784   LearningRate 0.0313   Epoch: 8   Global Step: 44540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:49:30,134-Speed 3416.11 samples/sec   Loss 5.8085   LearningRate 0.0313   Epoch: 8   Global Step: 44550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:49:33,139-Speed 3408.89 samples/sec   Loss 5.9597   LearningRate 0.0313   Epoch: 8   Global Step: 44560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:49:36,138-Speed 3414.53 samples/sec   Loss 6.0057   LearningRate 0.0313   Epoch: 8   Global Step: 44570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:49:39,150-Speed 3400.40 samples/sec   Loss 5.9029   LearningRate 0.0313   Epoch: 8   Global Step: 44580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:49:42,152-Speed 3412.55 samples/sec   Loss 5.9865   LearningRate 0.0313   Epoch: 8   Global Step: 44590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:49:45,153-Speed 3412.82 samples/sec   Loss 5.8722   LearningRate 0.0313   Epoch: 8   Global Step: 44600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:49:48,155-Speed 3412.31 samples/sec   Loss 5.7369   LearningRate 0.0312   Epoch: 8   Global Step: 44610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:49:51,158-Speed 3411.39 samples/sec   Loss 5.8822   LearningRate 0.0312   Epoch: 8   Global Step: 44620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:49:54,177-Speed 3391.90 samples/sec   Loss 5.7651   LearningRate 0.0312   Epoch: 8   Global Step: 44630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:49:57,187-Speed 3404.01 samples/sec   Loss 5.8891   LearningRate 0.0312   Epoch: 8   Global Step: 44640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:00,186-Speed 3414.42 samples/sec   Loss 5.7525   LearningRate 0.0312   Epoch: 8   Global Step: 44650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:03,191-Speed 3408.52 samples/sec   Loss 5.8136   LearningRate 0.0312   Epoch: 8   Global Step: 44660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:06,198-Speed 3406.44 samples/sec   Loss 5.8173   LearningRate 0.0312   Epoch: 8   Global Step: 44670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:09,198-Speed 3414.54 samples/sec   Loss 5.9325   LearningRate 0.0312   Epoch: 8   Global Step: 44680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:12,204-Speed 3407.67 samples/sec   Loss 5.8249   LearningRate 0.0312   Epoch: 8   Global Step: 44690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:15,205-Speed 3412.53 samples/sec   Loss 5.8417   LearningRate 0.0312   Epoch: 8   Global Step: 44700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:18,208-Speed 3410.00 samples/sec   Loss 5.7462   LearningRate 0.0311   Epoch: 8   Global Step: 44710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:21,211-Speed 3410.98 samples/sec   Loss 5.8031   LearningRate 0.0311   Epoch: 8   Global Step: 44720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:50:24,207-Speed 3419.53 samples/sec   Loss 5.9268   LearningRate 0.0311   Epoch: 8   Global Step: 44730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:27,207-Speed 3413.91 samples/sec   Loss 5.8462   LearningRate 0.0311   Epoch: 8   Global Step: 44740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:30,211-Speed 3410.33 samples/sec   Loss 5.8793   LearningRate 0.0311   Epoch: 8   Global Step: 44750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:33,235-Speed 3386.92 samples/sec   Loss 5.8917   LearningRate 0.0311   Epoch: 8   Global Step: 44760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:36,260-Speed 3386.61 samples/sec   Loss 5.8884   LearningRate 0.0311   Epoch: 8   Global Step: 44770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:39,259-Speed 3414.56 samples/sec   Loss 5.9642   LearningRate 0.0311   Epoch: 8   Global Step: 44780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:42,258-Speed 3415.29 samples/sec   Loss 5.7843   LearningRate 0.0311   Epoch: 8   Global Step: 44790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:45,265-Speed 3405.82 samples/sec   Loss 5.8906   LearningRate 0.0310   Epoch: 8   Global Step: 44800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:48,272-Speed 3406.60 samples/sec   Loss 5.8185   LearningRate 0.0310   Epoch: 8   Global Step: 44810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:51,288-Speed 3396.72 samples/sec   Loss 5.7824   LearningRate 0.0310   Epoch: 8   Global Step: 44820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:50:54,288-Speed 3413.64 samples/sec   Loss 5.9317   LearningRate 0.0310   Epoch: 8   Global Step: 44830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:50:57,271-Speed 3434.70 samples/sec   Loss 5.8832   LearningRate 0.0310   Epoch: 8   Global Step: 44840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:00,271-Speed 3413.86 samples/sec   Loss 5.8933   LearningRate 0.0310   Epoch: 8   Global Step: 44850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:03,276-Speed 3407.76 samples/sec   Loss 5.7833   LearningRate 0.0310   Epoch: 8   Global Step: 44860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:06,287-Speed 3402.50 samples/sec   Loss 5.9044   LearningRate 0.0310   Epoch: 8   Global Step: 44870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:09,293-Speed 3406.97 samples/sec   Loss 5.8716   LearningRate 0.0310   Epoch: 8   Global Step: 44880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:12,306-Speed 3399.13 samples/sec   Loss 5.8888   LearningRate 0.0309   Epoch: 8   Global Step: 44890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:15,320-Speed 3399.12 samples/sec   Loss 5.7594   LearningRate 0.0309   Epoch: 8   Global Step: 44900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:18,351-Speed 3378.83 samples/sec   Loss 6.0182   LearningRate 0.0309   Epoch: 8   Global Step: 44910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:21,354-Speed 3410.84 samples/sec   Loss 5.8751   LearningRate 0.0309   Epoch: 8   Global Step: 44920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:24,367-Speed 3400.29 samples/sec   Loss 5.8265   LearningRate 0.0309   Epoch: 8   Global Step: 44930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:27,369-Speed 3411.96 samples/sec   Loss 5.9543   LearningRate 0.0309   Epoch: 8   Global Step: 44940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:51:30,351-Speed 3434.31 samples/sec   Loss 5.8189   LearningRate 0.0309   Epoch: 8   Global Step: 44950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:33,359-Speed 3405.20 samples/sec   Loss 5.8077   LearningRate 0.0309   Epoch: 8   Global Step: 44960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:36,360-Speed 3413.20 samples/sec   Loss 5.8557   LearningRate 0.0309   Epoch: 8   Global Step: 44970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:39,361-Speed 3412.96 samples/sec   Loss 5.9159   LearningRate 0.0308   Epoch: 8   Global Step: 44980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:42,369-Speed 3405.26 samples/sec   Loss 5.8881   LearningRate 0.0308   Epoch: 8   Global Step: 44990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:45,375-Speed 3407.57 samples/sec   Loss 5.7404   LearningRate 0.0308   Epoch: 8   Global Step: 45000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:48,376-Speed 3412.31 samples/sec   Loss 5.8521   LearningRate 0.0308   Epoch: 8   Global Step: 45010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:51,385-Speed 3405.21 samples/sec   Loss 5.8481   LearningRate 0.0308   Epoch: 8   Global Step: 45020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:54,388-Speed 3410.39 samples/sec   Loss 5.8319   LearningRate 0.0308   Epoch: 8   Global Step: 45030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:51:57,390-Speed 3411.56 samples/sec   Loss 5.9004   LearningRate 0.0308   Epoch: 8   Global Step: 45040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:00,393-Speed 3410.75 samples/sec   Loss 5.8730   LearningRate 0.0308   Epoch: 8   Global Step: 45050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:52:03,414-Speed 3390.20 samples/sec   Loss 5.6835   LearningRate 0.0308   Epoch: 8   Global Step: 45060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:52:06,408-Speed 3421.55 samples/sec   Loss 5.8523   LearningRate 0.0307   Epoch: 8   Global Step: 45070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:09,411-Speed 3410.28 samples/sec   Loss 5.7407   LearningRate 0.0307   Epoch: 8   Global Step: 45080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:12,421-Speed 3402.72 samples/sec   Loss 5.8076   LearningRate 0.0307   Epoch: 8   Global Step: 45090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:15,423-Speed 3412.41 samples/sec   Loss 5.8482   LearningRate 0.0307   Epoch: 8   Global Step: 45100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:18,431-Speed 3405.81 samples/sec   Loss 5.8633   LearningRate 0.0307   Epoch: 8   Global Step: 45110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:21,436-Speed 3408.45 samples/sec   Loss 5.7839   LearningRate 0.0307   Epoch: 8   Global Step: 45120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:24,437-Speed 3412.98 samples/sec   Loss 5.6228   LearningRate 0.0307   Epoch: 8   Global Step: 45130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:27,442-Speed 3408.67 samples/sec   Loss 5.7659   LearningRate 0.0307   Epoch: 8   Global Step: 45140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:30,450-Speed 3404.73 samples/sec   Loss 5.8436   LearningRate 0.0307   Epoch: 8   Global Step: 45150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:33,452-Speed 3411.34 samples/sec   Loss 5.7869   LearningRate 0.0306   Epoch: 8   Global Step: 45160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:36,460-Speed 3405.61 samples/sec   Loss 5.9243   LearningRate 0.0306   Epoch: 8   Global Step: 45170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:52:39,468-Speed 3404.65 samples/sec   Loss 5.7647   LearningRate 0.0306   Epoch: 8   Global Step: 45180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:52:42,479-Speed 3402.08 samples/sec   Loss 5.7988   LearningRate 0.0306   Epoch: 8   Global Step: 45190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:52:45,471-Speed 3423.70 samples/sec   Loss 5.7626   LearningRate 0.0306   Epoch: 8   Global Step: 45200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:48,490-Speed 3392.29 samples/sec   Loss 5.8395   LearningRate 0.0306   Epoch: 8   Global Step: 45210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:51,519-Speed 3381.85 samples/sec   Loss 5.8752   LearningRate 0.0306   Epoch: 8   Global Step: 45220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:54,526-Speed 3405.63 samples/sec   Loss 5.8552   LearningRate 0.0306   Epoch: 8   Global Step: 45230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:52:57,529-Speed 3411.81 samples/sec   Loss 5.9451   LearningRate 0.0306   Epoch: 8   Global Step: 45240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:00,549-Speed 3391.25 samples/sec   Loss 5.8765   LearningRate 0.0305   Epoch: 8   Global Step: 45250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:03,553-Speed 3408.84 samples/sec   Loss 5.8148   LearningRate 0.0305   Epoch: 8   Global Step: 45260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:06,607-Speed 3353.98 samples/sec   Loss 5.7817   LearningRate 0.0305   Epoch: 8   Global Step: 45270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:09,613-Speed 3407.25 samples/sec   Loss 5.8922   LearningRate 0.0305   Epoch: 8   Global Step: 45280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:12,617-Speed 3410.40 samples/sec   Loss 5.7686   LearningRate 0.0305   Epoch: 8   Global Step: 45290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:15,623-Speed 3407.49 samples/sec   Loss 5.8107   LearningRate 0.0305   Epoch: 8   Global Step: 45300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:18,633-Speed 3403.06 samples/sec   Loss 5.8570   LearningRate 0.0305   Epoch: 8   Global Step: 45310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:21,644-Speed 3401.77 samples/sec   Loss 5.7692   LearningRate 0.0305   Epoch: 8   Global Step: 45320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:24,664-Speed 3390.58 samples/sec   Loss 5.9350   LearningRate 0.0305   Epoch: 8   Global Step: 45330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:27,668-Speed 3410.84 samples/sec   Loss 5.7361   LearningRate 0.0304   Epoch: 8   Global Step: 45340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:30,674-Speed 3407.09 samples/sec   Loss 5.7389   LearningRate 0.0304   Epoch: 8   Global Step: 45350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:33,681-Speed 3405.56 samples/sec   Loss 5.7516   LearningRate 0.0304   Epoch: 8   Global Step: 45360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:36,685-Speed 3410.18 samples/sec   Loss 5.8500   LearningRate 0.0304   Epoch: 8   Global Step: 45370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:39,690-Speed 3408.92 samples/sec   Loss 5.8227   LearningRate 0.0304   Epoch: 8   Global Step: 45380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:42,708-Speed 3393.94 samples/sec   Loss 5.9617   LearningRate 0.0304   Epoch: 8   Global Step: 45390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:45,709-Speed 3412.72 samples/sec   Loss 5.8615   LearningRate 0.0304   Epoch: 8   Global Step: 45400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:48,713-Speed 3409.50 samples/sec   Loss 5.7304   LearningRate 0.0304   Epoch: 8   Global Step: 45410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:51,714-Speed 3412.96 samples/sec   Loss 5.7777   LearningRate 0.0304   Epoch: 8   Global Step: 45420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:54,721-Speed 3406.35 samples/sec   Loss 5.8931   LearningRate 0.0304   Epoch: 8   Global Step: 45430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:53:57,723-Speed 3412.27 samples/sec   Loss 5.8304   LearningRate 0.0303   Epoch: 8   Global Step: 45440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:00,732-Speed 3403.04 samples/sec   Loss 5.8099   LearningRate 0.0303   Epoch: 8   Global Step: 45450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:03,739-Speed 3406.16 samples/sec   Loss 5.8861   LearningRate 0.0303   Epoch: 8   Global Step: 45460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:06,745-Speed 3408.76 samples/sec   Loss 5.8137   LearningRate 0.0303   Epoch: 8   Global Step: 45470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:09,759-Speed 3397.28 samples/sec   Loss 5.8916   LearningRate 0.0303   Epoch: 8   Global Step: 45480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:12,772-Speed 3399.76 samples/sec   Loss 6.0053   LearningRate 0.0303   Epoch: 8   Global Step: 45490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:15,781-Speed 3404.09 samples/sec   Loss 5.8987   LearningRate 0.0303   Epoch: 8   Global Step: 45500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:54:18,768-Speed 3429.25 samples/sec   Loss 5.8117   LearningRate 0.0303   Epoch: 8   Global Step: 45510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:21,848-Speed 3325.05 samples/sec   Loss 5.7765   LearningRate 0.0303   Epoch: 8   Global Step: 45520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:34,232-Speed 826.95 samples/sec   Loss 5.2065   LearningRate 0.0302   Epoch: 9   Global Step: 45530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:37,252-Speed 3391.93 samples/sec   Loss 4.8917   LearningRate 0.0302   Epoch: 9   Global Step: 45540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:40,280-Speed 3382.56 samples/sec   Loss 5.0386   LearningRate 0.0302   Epoch: 9   Global Step: 45550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:43,304-Speed 3387.39 samples/sec   Loss 4.9489   LearningRate 0.0302   Epoch: 9   Global Step: 45560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:46,323-Speed 3392.42 samples/sec   Loss 5.1061   LearningRate 0.0302   Epoch: 9   Global Step: 45570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:49,403-Speed 3326.43 samples/sec   Loss 5.0126   LearningRate 0.0302   Epoch: 9   Global Step: 45580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:52,413-Speed 3402.54 samples/sec   Loss 5.0469   LearningRate 0.0302   Epoch: 9   Global Step: 45590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:55,423-Speed 3402.41 samples/sec   Loss 5.0692   LearningRate 0.0302   Epoch: 9   Global Step: 45600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:54:58,430-Speed 3406.84 samples/sec   Loss 5.1437   LearningRate 0.0302   Epoch: 9   Global Step: 45610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:55:01,431-Speed 3413.37 samples/sec   Loss 4.9158   LearningRate 0.0301   Epoch: 9   Global Step: 45620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:55:04,484-Speed 3355.24 samples/sec   Loss 4.9515   LearningRate 0.0301   Epoch: 9   Global Step: 45630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:55:07,485-Speed 3411.85 samples/sec   Loss 5.0875   LearningRate 0.0301   Epoch: 9   Global Step: 45640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:10,496-Speed 3402.47 samples/sec   Loss 5.0600   LearningRate 0.0301   Epoch: 9   Global Step: 45650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:13,506-Speed 3402.87 samples/sec   Loss 4.9215   LearningRate 0.0301   Epoch: 9   Global Step: 45660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:16,544-Speed 3370.90 samples/sec   Loss 5.1093   LearningRate 0.0301   Epoch: 9   Global Step: 45670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:19,563-Speed 3393.31 samples/sec   Loss 5.1684   LearningRate 0.0301   Epoch: 9   Global Step: 45680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:22,580-Speed 3395.15 samples/sec   Loss 5.0457   LearningRate 0.0301   Epoch: 9   Global Step: 45690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:25,592-Speed 3400.40 samples/sec   Loss 5.0505   LearningRate 0.0301   Epoch: 9   Global Step: 45700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:28,593-Speed 3413.50 samples/sec   Loss 4.9961   LearningRate 0.0300   Epoch: 9   Global Step: 45710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:31,601-Speed 3404.28 samples/sec   Loss 5.1279   LearningRate 0.0300   Epoch: 9   Global Step: 45720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:34,605-Speed 3410.62 samples/sec   Loss 4.9850   LearningRate 0.0300   Epoch: 9   Global Step: 45730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:37,611-Speed 3406.95 samples/sec   Loss 5.0851   LearningRate 0.0300   Epoch: 9   Global Step: 45740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:55:40,622-Speed 3401.30 samples/sec   Loss 5.0356   LearningRate 0.0300   Epoch: 9   Global Step: 45750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 03:55:43,616-Speed 3421.30 samples/sec   Loss 5.2160   LearningRate 0.0300   Epoch: 9   Global Step: 45760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:46,616-Speed 3414.05 samples/sec   Loss 5.1380   LearningRate 0.0300   Epoch: 9   Global Step: 45770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:49,638-Speed 3388.63 samples/sec   Loss 5.0740   LearningRate 0.0300   Epoch: 9   Global Step: 45780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:52,646-Speed 3405.32 samples/sec   Loss 5.3304   LearningRate 0.0300   Epoch: 9   Global Step: 45790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:55,656-Speed 3403.63 samples/sec   Loss 5.0899   LearningRate 0.0299   Epoch: 9   Global Step: 45800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:55:58,672-Speed 3395.92 samples/sec   Loss 4.9939   LearningRate 0.0299   Epoch: 9   Global Step: 45810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:56:01,680-Speed 3404.34 samples/sec   Loss 5.1931   LearningRate 0.0299   Epoch: 9   Global Step: 45820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:56:04,689-Speed 3404.87 samples/sec   Loss 5.1395   LearningRate 0.0299   Epoch: 9   Global Step: 45830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:56:07,697-Speed 3404.47 samples/sec   Loss 5.1148   LearningRate 0.0299   Epoch: 9   Global Step: 45840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:56:10,703-Speed 3408.14 samples/sec   Loss 5.1427   LearningRate 0.0299   Epoch: 9   Global Step: 45850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:56:13,686-Speed 3432.74 samples/sec   Loss 5.1997   LearningRate 0.0299   Epoch: 9   Global Step: 45860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:56:16,696-Speed 3403.60 samples/sec   Loss 5.2560   LearningRate 0.0299   Epoch: 9   Global Step: 45870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:56:19,683-Speed 3429.21 samples/sec   Loss 5.1218   LearningRate 0.0299   Epoch: 9   Global Step: 45880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:56:22,700-Speed 3394.30 samples/sec   Loss 5.1061   LearningRate 0.0299   Epoch: 9   Global Step: 45890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:56:25,704-Speed 3409.62 samples/sec   Loss 5.2268   LearningRate 0.0298   Epoch: 9   Global Step: 45900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:56:28,827-Speed 3280.24 samples/sec   Loss 5.2606   LearningRate 0.0298   Epoch: 9   Global Step: 45910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:56:31,835-Speed 3404.90 samples/sec   Loss 5.2577   LearningRate 0.0298   Epoch: 9   Global Step: 45920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:56:34,839-Speed 3409.62 samples/sec   Loss 5.1279   LearningRate 0.0298   Epoch: 9   Global Step: 45930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:56:37,853-Speed 3398.59 samples/sec   Loss 5.2932   LearningRate 0.0298   Epoch: 9   Global Step: 45940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:56:40,858-Speed 3408.65 samples/sec   Loss 5.3948   LearningRate 0.0298   Epoch: 9   Global Step: 45950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:56:43,865-Speed 3406.30 samples/sec   Loss 5.2828   LearningRate 0.0298   Epoch: 9   Global Step: 45960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:56:46,882-Speed 3393.90 samples/sec   Loss 5.1796   LearningRate 0.0298   Epoch: 9   Global Step: 45970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:56:49,892-Speed 3403.30 samples/sec   Loss 5.1869   LearningRate 0.0298   Epoch: 9   Global Step: 45980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:56:53,013-Speed 3282.15 samples/sec   Loss 5.3348   LearningRate 0.0297   Epoch: 9   Global Step: 45990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:56:56,024-Speed 3401.27 samples/sec   Loss 5.2796   LearningRate 0.0297   Epoch: 9   Global Step: 46000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:57:40,078-[lfw][46000]XNorm: 22.894358
Training: 2022-04-11 03:57:40,079-[lfw][46000]Accuracy-Flip: 0.99783+-0.00224
Training: 2022-04-11 03:57:40,079-[lfw][46000]Accuracy-Highest: 0.99800
Training: 2022-04-11 03:58:31,428-[cfp_fp][46000]XNorm: 20.601470
Training: 2022-04-11 03:58:31,428-[cfp_fp][46000]Accuracy-Flip: 0.97157+-0.00799
Training: 2022-04-11 03:58:31,429-[cfp_fp][46000]Accuracy-Highest: 0.97543
Training: 2022-04-11 03:59:15,673-[agedb_30][46000]XNorm: 22.624711
Training: 2022-04-11 03:59:15,674-[agedb_30][46000]Accuracy-Flip: 0.97967+-0.00653
Training: 2022-04-11 03:59:15,674-[agedb_30][46000]Accuracy-Highest: 0.98083
Training: 2022-04-11 03:59:18,671-Speed 71.79 samples/sec   Loss 5.2783   LearningRate 0.0297   Epoch: 9   Global Step: 46010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:59:21,657-Speed 3429.95 samples/sec   Loss 5.2581   LearningRate 0.0297   Epoch: 9   Global Step: 46020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:59:24,658-Speed 3412.14 samples/sec   Loss 5.3597   LearningRate 0.0297   Epoch: 9   Global Step: 46030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:59:27,650-Speed 3423.30 samples/sec   Loss 5.2082   LearningRate 0.0297   Epoch: 9   Global Step: 46040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:59:30,639-Speed 3426.96 samples/sec   Loss 5.3183   LearningRate 0.0297   Epoch: 9   Global Step: 46050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:59:33,626-Speed 3428.92 samples/sec   Loss 5.3331   LearningRate 0.0297   Epoch: 9   Global Step: 46060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:59:36,628-Speed 3411.84 samples/sec   Loss 5.3094   LearningRate 0.0297   Epoch: 9   Global Step: 46070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:59:39,629-Speed 3413.73 samples/sec   Loss 5.2876   LearningRate 0.0296   Epoch: 9   Global Step: 46080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:59:42,619-Speed 3425.75 samples/sec   Loss 5.3493   LearningRate 0.0296   Epoch: 9   Global Step: 46090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:59:45,609-Speed 3424.69 samples/sec   Loss 5.3447   LearningRate 0.0296   Epoch: 9   Global Step: 46100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 03:59:48,609-Speed 3414.78 samples/sec   Loss 5.3262   LearningRate 0.0296   Epoch: 9   Global Step: 46110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:59:51,599-Speed 3425.44 samples/sec   Loss 5.3824   LearningRate 0.0296   Epoch: 9   Global Step: 46120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:59:54,603-Speed 3409.09 samples/sec   Loss 5.3492   LearningRate 0.0296   Epoch: 9   Global Step: 46130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 03:59:57,598-Speed 3420.69 samples/sec   Loss 5.3327   LearningRate 0.0296   Epoch: 9   Global Step: 46140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:00:00,577-Speed 3438.29 samples/sec   Loss 5.1828   LearningRate 0.0296   Epoch: 9   Global Step: 46150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:00:03,582-Speed 3408.89 samples/sec   Loss 5.4298   LearningRate 0.0296   Epoch: 9   Global Step: 46160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:00:06,578-Speed 3418.12 samples/sec   Loss 5.2257   LearningRate 0.0295   Epoch: 9   Global Step: 46170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:00:09,573-Speed 3420.41 samples/sec   Loss 5.2835   LearningRate 0.0295   Epoch: 9   Global Step: 46180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:00:12,572-Speed 3414.99 samples/sec   Loss 5.3891   LearningRate 0.0295   Epoch: 9   Global Step: 46190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:00:15,571-Speed 3415.03 samples/sec   Loss 5.3824   LearningRate 0.0295   Epoch: 9   Global Step: 46200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:00:18,587-Speed 3396.55 samples/sec   Loss 5.3158   LearningRate 0.0295   Epoch: 9   Global Step: 46210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:00:21,587-Speed 3414.34 samples/sec   Loss 5.2907   LearningRate 0.0295   Epoch: 9   Global Step: 46220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:00:24,594-Speed 3405.75 samples/sec   Loss 5.3121   LearningRate 0.0295   Epoch: 9   Global Step: 46230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:00:27,618-Speed 3387.04 samples/sec   Loss 5.3665   LearningRate 0.0295   Epoch: 9   Global Step: 46240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:00:30,624-Speed 3407.74 samples/sec   Loss 5.3940   LearningRate 0.0295   Epoch: 9   Global Step: 46250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:00:33,629-Speed 3408.24 samples/sec   Loss 5.4352   LearningRate 0.0295   Epoch: 9   Global Step: 46260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:00:36,628-Speed 3415.73 samples/sec   Loss 5.3078   LearningRate 0.0294   Epoch: 9   Global Step: 46270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:00:39,629-Speed 3412.38 samples/sec   Loss 5.3276   LearningRate 0.0294   Epoch: 9   Global Step: 46280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:00:42,633-Speed 3410.04 samples/sec   Loss 5.5195   LearningRate 0.0294   Epoch: 9   Global Step: 46290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:00:45,634-Speed 3412.84 samples/sec   Loss 5.5237   LearningRate 0.0294   Epoch: 9   Global Step: 46300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:00:48,647-Speed 3399.67 samples/sec   Loss 5.3673   LearningRate 0.0294   Epoch: 9   Global Step: 46310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:00:51,648-Speed 3413.10 samples/sec   Loss 5.3944   LearningRate 0.0294   Epoch: 9   Global Step: 46320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:00:54,643-Speed 3419.62 samples/sec   Loss 5.2891   LearningRate 0.0294   Epoch: 9   Global Step: 46330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:00:57,649-Speed 3407.15 samples/sec   Loss 5.3847   LearningRate 0.0294   Epoch: 9   Global Step: 46340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:00,649-Speed 3414.60 samples/sec   Loss 5.4336   LearningRate 0.0294   Epoch: 9   Global Step: 46350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:01:03,651-Speed 3412.57 samples/sec   Loss 5.3958   LearningRate 0.0293   Epoch: 9   Global Step: 46360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:01:06,744-Speed 3310.51 samples/sec   Loss 5.5512   LearningRate 0.0293   Epoch: 9   Global Step: 46370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:09,739-Speed 3420.27 samples/sec   Loss 5.4958   LearningRate 0.0293   Epoch: 9   Global Step: 46380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:12,737-Speed 3417.08 samples/sec   Loss 5.5623   LearningRate 0.0293   Epoch: 9   Global Step: 46390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:15,742-Speed 3408.20 samples/sec   Loss 5.2804   LearningRate 0.0293   Epoch: 9   Global Step: 46400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:18,748-Speed 3407.67 samples/sec   Loss 5.2808   LearningRate 0.0293   Epoch: 9   Global Step: 46410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:21,756-Speed 3404.66 samples/sec   Loss 5.2629   LearningRate 0.0293   Epoch: 9   Global Step: 46420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:24,752-Speed 3418.57 samples/sec   Loss 5.3645   LearningRate 0.0293   Epoch: 9   Global Step: 46430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:27,754-Speed 3411.47 samples/sec   Loss 5.4124   LearningRate 0.0293   Epoch: 9   Global Step: 46440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:30,758-Speed 3409.74 samples/sec   Loss 5.3331   LearningRate 0.0292   Epoch: 9   Global Step: 46450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:33,753-Speed 3421.22 samples/sec   Loss 5.5081   LearningRate 0.0292   Epoch: 9   Global Step: 46460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:36,754-Speed 3411.90 samples/sec   Loss 5.3738   LearningRate 0.0292   Epoch: 9   Global Step: 46470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:01:39,755-Speed 3413.27 samples/sec   Loss 5.2747   LearningRate 0.0292   Epoch: 9   Global Step: 46480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:01:42,737-Speed 3435.57 samples/sec   Loss 5.4594   LearningRate 0.0292   Epoch: 9   Global Step: 46490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:45,734-Speed 3417.03 samples/sec   Loss 5.4510   LearningRate 0.0292   Epoch: 9   Global Step: 46500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:48,739-Speed 3408.40 samples/sec   Loss 5.4793   LearningRate 0.0292   Epoch: 9   Global Step: 46510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:51,744-Speed 3408.79 samples/sec   Loss 5.4671   LearningRate 0.0292   Epoch: 9   Global Step: 46520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:54,741-Speed 3416.94 samples/sec   Loss 5.4706   LearningRate 0.0292   Epoch: 9   Global Step: 46530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:01:57,739-Speed 3416.39 samples/sec   Loss 5.4302   LearningRate 0.0292   Epoch: 9   Global Step: 46540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:00,736-Speed 3417.74 samples/sec   Loss 5.4615   LearningRate 0.0291   Epoch: 9   Global Step: 46550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:03,739-Speed 3411.54 samples/sec   Loss 5.5114   LearningRate 0.0291   Epoch: 9   Global Step: 46560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:06,738-Speed 3414.54 samples/sec   Loss 5.5233   LearningRate 0.0291   Epoch: 9   Global Step: 46570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:09,749-Speed 3401.80 samples/sec   Loss 5.3292   LearningRate 0.0291   Epoch: 9   Global Step: 46580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:12,748-Speed 3415.53 samples/sec   Loss 5.4721   LearningRate 0.0291   Epoch: 9   Global Step: 46590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:02:15,752-Speed 3409.28 samples/sec   Loss 5.3204   LearningRate 0.0291   Epoch: 9   Global Step: 46600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:02:18,764-Speed 3400.81 samples/sec   Loss 5.5185   LearningRate 0.0291   Epoch: 9   Global Step: 46610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:02:21,761-Speed 3417.42 samples/sec   Loss 5.4185   LearningRate 0.0291   Epoch: 9   Global Step: 46620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:02:24,745-Speed 3432.47 samples/sec   Loss 5.4870   LearningRate 0.0291   Epoch: 9   Global Step: 46630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:27,757-Speed 3400.70 samples/sec   Loss 5.3830   LearningRate 0.0290   Epoch: 9   Global Step: 46640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:30,759-Speed 3412.60 samples/sec   Loss 5.6241   LearningRate 0.0290   Epoch: 9   Global Step: 46650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:33,756-Speed 3417.36 samples/sec   Loss 5.5085   LearningRate 0.0290   Epoch: 9   Global Step: 46660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:36,853-Speed 3307.67 samples/sec   Loss 5.3946   LearningRate 0.0290   Epoch: 9   Global Step: 46670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:39,893-Speed 3368.44 samples/sec   Loss 5.5283   LearningRate 0.0290   Epoch: 9   Global Step: 46680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:42,896-Speed 3411.39 samples/sec   Loss 5.3179   LearningRate 0.0290   Epoch: 9   Global Step: 46690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:45,899-Speed 3410.67 samples/sec   Loss 5.4737   LearningRate 0.0290   Epoch: 9   Global Step: 46700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:48,899-Speed 3414.45 samples/sec   Loss 5.6211   LearningRate 0.0290   Epoch: 9   Global Step: 46710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:51,915-Speed 3395.00 samples/sec   Loss 5.5703   LearningRate 0.0290   Epoch: 9   Global Step: 46720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:02:54,918-Speed 3410.89 samples/sec   Loss 5.4867   LearningRate 0.0290   Epoch: 9   Global Step: 46730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:02:57,925-Speed 3406.93 samples/sec   Loss 5.4721   LearningRate 0.0289   Epoch: 9   Global Step: 46740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:03:00,949-Speed 3386.51 samples/sec   Loss 5.4367   LearningRate 0.0289   Epoch: 9   Global Step: 46750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:03:03,960-Speed 3402.25 samples/sec   Loss 5.4975   LearningRate 0.0289   Epoch: 9   Global Step: 46760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:03:06,964-Speed 3409.28 samples/sec   Loss 5.4201   LearningRate 0.0289   Epoch: 9   Global Step: 46770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:03:09,965-Speed 3412.84 samples/sec   Loss 5.5425   LearningRate 0.0289   Epoch: 9   Global Step: 46780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:03:12,948-Speed 3434.99 samples/sec   Loss 5.5305   LearningRate 0.0289   Epoch: 9   Global Step: 46790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:15,951-Speed 3410.87 samples/sec   Loss 5.4381   LearningRate 0.0289   Epoch: 9   Global Step: 46800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:18,961-Speed 3403.14 samples/sec   Loss 5.3316   LearningRate 0.0289   Epoch: 9   Global Step: 46810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:21,961-Speed 3413.57 samples/sec   Loss 5.3470   LearningRate 0.0289   Epoch: 9   Global Step: 46820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:24,964-Speed 3411.23 samples/sec   Loss 5.3926   LearningRate 0.0288   Epoch: 9   Global Step: 46830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:27,964-Speed 3414.44 samples/sec   Loss 5.3832   LearningRate 0.0288   Epoch: 9   Global Step: 46840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:30,966-Speed 3411.61 samples/sec   Loss 5.4977   LearningRate 0.0288   Epoch: 9   Global Step: 46850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:33,972-Speed 3407.53 samples/sec   Loss 5.6001   LearningRate 0.0288   Epoch: 9   Global Step: 46860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:36,974-Speed 3411.93 samples/sec   Loss 5.4998   LearningRate 0.0288   Epoch: 9   Global Step: 46870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:39,975-Speed 3413.01 samples/sec   Loss 5.5329   LearningRate 0.0288   Epoch: 9   Global Step: 46880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:42,966-Speed 3424.59 samples/sec   Loss 5.3249   LearningRate 0.0288   Epoch: 9   Global Step: 46890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:45,968-Speed 3412.11 samples/sec   Loss 5.4111   LearningRate 0.0288   Epoch: 9   Global Step: 46900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:48,976-Speed 3405.48 samples/sec   Loss 5.3417   LearningRate 0.0288   Epoch: 9   Global Step: 46910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:51,995-Speed 3391.60 samples/sec   Loss 5.4493   LearningRate 0.0287   Epoch: 9   Global Step: 46920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:55,005-Speed 3402.91 samples/sec   Loss 5.2943   LearningRate 0.0287   Epoch: 9   Global Step: 46930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:03:58,008-Speed 3411.72 samples/sec   Loss 5.4319   LearningRate 0.0287   Epoch: 9   Global Step: 46940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:01,016-Speed 3404.98 samples/sec   Loss 5.4996   LearningRate 0.0287   Epoch: 9   Global Step: 46950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:04,078-Speed 3345.13 samples/sec   Loss 5.3327   LearningRate 0.0287   Epoch: 9   Global Step: 46960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:07,091-Speed 3399.26 samples/sec   Loss 5.4907   LearningRate 0.0287   Epoch: 9   Global Step: 46970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:10,094-Speed 3410.05 samples/sec   Loss 5.5476   LearningRate 0.0287   Epoch: 9   Global Step: 46980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:13,097-Speed 3411.80 samples/sec   Loss 5.4290   LearningRate 0.0287   Epoch: 9   Global Step: 46990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:04:16,090-Speed 3421.79 samples/sec   Loss 5.5041   LearningRate 0.0287   Epoch: 9   Global Step: 47000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:19,093-Speed 3411.14 samples/sec   Loss 5.4714   LearningRate 0.0287   Epoch: 9   Global Step: 47010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:22,145-Speed 3355.97 samples/sec   Loss 5.5294   LearningRate 0.0286   Epoch: 9   Global Step: 47020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:25,188-Speed 3366.11 samples/sec   Loss 5.2839   LearningRate 0.0286   Epoch: 9   Global Step: 47030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:28,196-Speed 3405.50 samples/sec   Loss 5.3787   LearningRate 0.0286   Epoch: 9   Global Step: 47040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:31,212-Speed 3395.54 samples/sec   Loss 5.3735   LearningRate 0.0286   Epoch: 9   Global Step: 47050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:34,214-Speed 3411.78 samples/sec   Loss 5.5117   LearningRate 0.0286   Epoch: 9   Global Step: 47060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:37,221-Speed 3405.81 samples/sec   Loss 5.5585   LearningRate 0.0286   Epoch: 9   Global Step: 47070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:40,229-Speed 3405.78 samples/sec   Loss 5.7448   LearningRate 0.0286   Epoch: 9   Global Step: 47080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:43,292-Speed 3343.70 samples/sec   Loss 5.6093   LearningRate 0.0286   Epoch: 9   Global Step: 47090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:04:46,280-Speed 3428.10 samples/sec   Loss 5.5032   LearningRate 0.0286   Epoch: 9   Global Step: 47100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:04:49,299-Speed 3391.82 samples/sec   Loss 5.3787   LearningRate 0.0285   Epoch: 9   Global Step: 47110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:04:52,309-Speed 3403.30 samples/sec   Loss 5.6982   LearningRate 0.0285   Epoch: 9   Global Step: 47120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:04:55,313-Speed 3410.04 samples/sec   Loss 5.4271   LearningRate 0.0285   Epoch: 9   Global Step: 47130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:04:58,317-Speed 3409.55 samples/sec   Loss 5.3607   LearningRate 0.0285   Epoch: 9   Global Step: 47140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:05:01,325-Speed 3405.62 samples/sec   Loss 5.5383   LearningRate 0.0285   Epoch: 9   Global Step: 47150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:05:04,337-Speed 3399.28 samples/sec   Loss 5.4425   LearningRate 0.0285   Epoch: 9   Global Step: 47160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:05:07,343-Speed 3408.33 samples/sec   Loss 5.4831   LearningRate 0.0285   Epoch: 9   Global Step: 47170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:05:10,348-Speed 3407.55 samples/sec   Loss 5.4661   LearningRate 0.0285   Epoch: 9   Global Step: 47180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:05:13,353-Speed 3409.31 samples/sec   Loss 5.5436   LearningRate 0.0285   Epoch: 9   Global Step: 47190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 04:05:16,407-Speed 3353.19 samples/sec   Loss 5.5051   LearningRate 0.0285   Epoch: 9   Global Step: 47200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:05:19,431-Speed 3387.17 samples/sec   Loss 5.4730   LearningRate 0.0284   Epoch: 9   Global Step: 47210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:05:22,444-Speed 3400.01 samples/sec   Loss 5.4450   LearningRate 0.0284   Epoch: 9   Global Step: 47220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:05:25,449-Speed 3408.95 samples/sec   Loss 5.4261   LearningRate 0.0284   Epoch: 9   Global Step: 47230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:05:28,468-Speed 3392.78 samples/sec   Loss 5.4290   LearningRate 0.0284   Epoch: 9   Global Step: 47240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:05:31,478-Speed 3401.84 samples/sec   Loss 5.5208   LearningRate 0.0284   Epoch: 9   Global Step: 47250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:05:34,485-Speed 3406.51 samples/sec   Loss 5.4529   LearningRate 0.0284   Epoch: 9   Global Step: 47260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:05:37,500-Speed 3396.82 samples/sec   Loss 5.4388   LearningRate 0.0284   Epoch: 9   Global Step: 47270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:05:40,509-Speed 3403.69 samples/sec   Loss 5.4476   LearningRate 0.0284   Epoch: 9   Global Step: 47280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:05:43,515-Speed 3408.37 samples/sec   Loss 5.5065   LearningRate 0.0284   Epoch: 9   Global Step: 47290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:05:46,533-Speed 3393.38 samples/sec   Loss 5.3900   LearningRate 0.0283   Epoch: 9   Global Step: 47300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 04:05:49,519-Speed 3429.76 samples/sec   Loss 5.5682   LearningRate 0.0283   Epoch: 9   Global Step: 47310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:05:52,525-Speed 3408.05 samples/sec   Loss 5.4730   LearningRate 0.0283   Epoch: 9   Global Step: 47320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:05:55,534-Speed 3403.72 samples/sec   Loss 5.4974   LearningRate 0.0283   Epoch: 9   Global Step: 47330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:05:58,546-Speed 3401.47 samples/sec   Loss 5.3747   LearningRate 0.0283   Epoch: 9   Global Step: 47340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:01,553-Speed 3405.89 samples/sec   Loss 5.5311   LearningRate 0.0283   Epoch: 9   Global Step: 47350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:04,561-Speed 3404.95 samples/sec   Loss 5.5149   LearningRate 0.0283   Epoch: 9   Global Step: 47360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:07,565-Speed 3409.29 samples/sec   Loss 5.4444   LearningRate 0.0283   Epoch: 9   Global Step: 47370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:10,577-Speed 3400.71 samples/sec   Loss 5.3772   LearningRate 0.0283   Epoch: 9   Global Step: 47380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:13,579-Speed 3412.14 samples/sec   Loss 5.4922   LearningRate 0.0283   Epoch: 9   Global Step: 47390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:16,584-Speed 3407.55 samples/sec   Loss 5.5477   LearningRate 0.0282   Epoch: 9   Global Step: 47400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:19,572-Speed 3428.56 samples/sec   Loss 5.4713   LearningRate 0.0282   Epoch: 9   Global Step: 47410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:22,577-Speed 3408.82 samples/sec   Loss 5.4986   LearningRate 0.0282   Epoch: 9   Global Step: 47420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:25,589-Speed 3400.50 samples/sec   Loss 5.4414   LearningRate 0.0282   Epoch: 9   Global Step: 47430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:28,599-Speed 3402.71 samples/sec   Loss 5.5150   LearningRate 0.0282   Epoch: 9   Global Step: 47440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:31,602-Speed 3411.01 samples/sec   Loss 5.5595   LearningRate 0.0282   Epoch: 9   Global Step: 47450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:34,606-Speed 3409.26 samples/sec   Loss 5.4664   LearningRate 0.0282   Epoch: 9   Global Step: 47460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:37,617-Speed 3401.53 samples/sec   Loss 5.6317   LearningRate 0.0282   Epoch: 9   Global Step: 47470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:40,623-Speed 3407.39 samples/sec   Loss 5.3990   LearningRate 0.0282   Epoch: 9   Global Step: 47480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:43,628-Speed 3408.44 samples/sec   Loss 5.5219   LearningRate 0.0281   Epoch: 9   Global Step: 47490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 04:06:46,641-Speed 3399.42 samples/sec   Loss 5.5370   LearningRate 0.0281   Epoch: 9   Global Step: 47500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:06:49,633-Speed 3423.26 samples/sec   Loss 5.4336   LearningRate 0.0281   Epoch: 9   Global Step: 47510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:06:52,787-Speed 3247.12 samples/sec   Loss 5.5320   LearningRate 0.0281   Epoch: 9   Global Step: 47520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:06:55,790-Speed 3411.42 samples/sec   Loss 5.4518   LearningRate 0.0281   Epoch: 9   Global Step: 47530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:06:58,793-Speed 3410.84 samples/sec   Loss 5.5277   LearningRate 0.0281   Epoch: 9   Global Step: 47540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:01,801-Speed 3404.65 samples/sec   Loss 5.5436   LearningRate 0.0281   Epoch: 9   Global Step: 47550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:04,817-Speed 3395.74 samples/sec   Loss 5.5305   LearningRate 0.0281   Epoch: 9   Global Step: 47560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:07,824-Speed 3406.48 samples/sec   Loss 5.4373   LearningRate 0.0281   Epoch: 9   Global Step: 47570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:10,838-Speed 3398.53 samples/sec   Loss 5.4256   LearningRate 0.0281   Epoch: 9   Global Step: 47580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:13,844-Speed 3407.36 samples/sec   Loss 5.3767   LearningRate 0.0280   Epoch: 9   Global Step: 47590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:16,856-Speed 3400.55 samples/sec   Loss 5.4998   LearningRate 0.0280   Epoch: 9   Global Step: 47600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:19,858-Speed 3411.57 samples/sec   Loss 5.5564   LearningRate 0.0280   Epoch: 9   Global Step: 47610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:07:22,849-Speed 3425.34 samples/sec   Loss 5.5421   LearningRate 0.0280   Epoch: 9   Global Step: 47620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:25,895-Speed 3361.98 samples/sec   Loss 5.5926   LearningRate 0.0280   Epoch: 9   Global Step: 47630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:28,907-Speed 3400.76 samples/sec   Loss 5.5777   LearningRate 0.0280   Epoch: 9   Global Step: 47640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:31,916-Speed 3404.66 samples/sec   Loss 5.5195   LearningRate 0.0280   Epoch: 9   Global Step: 47650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:34,927-Speed 3401.13 samples/sec   Loss 5.4675   LearningRate 0.0280   Epoch: 9   Global Step: 47660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:37,934-Speed 3406.15 samples/sec   Loss 5.3233   LearningRate 0.0280   Epoch: 9   Global Step: 47670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:40,944-Speed 3402.50 samples/sec   Loss 5.4438   LearningRate 0.0279   Epoch: 9   Global Step: 47680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:43,955-Speed 3401.88 samples/sec   Loss 5.4691   LearningRate 0.0279   Epoch: 9   Global Step: 47690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:46,959-Speed 3409.31 samples/sec   Loss 5.5525   LearningRate 0.0279   Epoch: 9   Global Step: 47700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:49,965-Speed 3407.77 samples/sec   Loss 5.5828   LearningRate 0.0279   Epoch: 9   Global Step: 47710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:52,977-Speed 3400.59 samples/sec   Loss 5.4482   LearningRate 0.0279   Epoch: 9   Global Step: 47720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:07:55,969-Speed 3423.19 samples/sec   Loss 5.4827   LearningRate 0.0279   Epoch: 9   Global Step: 47730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:07:58,973-Speed 3410.28 samples/sec   Loss 5.5970   LearningRate 0.0279   Epoch: 9   Global Step: 47740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:01,982-Speed 3403.31 samples/sec   Loss 5.4873   LearningRate 0.0279   Epoch: 9   Global Step: 47750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:04,985-Speed 3411.45 samples/sec   Loss 5.4661   LearningRate 0.0279   Epoch: 9   Global Step: 47760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:08,078-Speed 3311.59 samples/sec   Loss 5.3300   LearningRate 0.0279   Epoch: 9   Global Step: 47770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:11,091-Speed 3398.52 samples/sec   Loss 5.4351   LearningRate 0.0278   Epoch: 9   Global Step: 47780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:14,098-Speed 3406.81 samples/sec   Loss 5.3806   LearningRate 0.0278   Epoch: 9   Global Step: 47790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:17,106-Speed 3404.27 samples/sec   Loss 5.4983   LearningRate 0.0278   Epoch: 9   Global Step: 47800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:20,117-Speed 3401.63 samples/sec   Loss 5.4936   LearningRate 0.0278   Epoch: 9   Global Step: 47810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:23,131-Speed 3398.69 samples/sec   Loss 5.6095   LearningRate 0.0278   Epoch: 9   Global Step: 47820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:26,146-Speed 3398.02 samples/sec   Loss 5.3294   LearningRate 0.0278   Epoch: 9   Global Step: 47830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:08:29,144-Speed 3416.13 samples/sec   Loss 5.4933   LearningRate 0.0278   Epoch: 9   Global Step: 47840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:32,161-Speed 3394.63 samples/sec   Loss 5.5714   LearningRate 0.0278   Epoch: 9   Global Step: 47850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:35,181-Speed 3391.64 samples/sec   Loss 5.4169   LearningRate 0.0278   Epoch: 9   Global Step: 47860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:38,192-Speed 3401.32 samples/sec   Loss 5.5984   LearningRate 0.0278   Epoch: 9   Global Step: 47870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:41,254-Speed 3345.43 samples/sec   Loss 5.4013   LearningRate 0.0277   Epoch: 9   Global Step: 47880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:44,263-Speed 3403.57 samples/sec   Loss 5.4412   LearningRate 0.0277   Epoch: 9   Global Step: 47890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:47,275-Speed 3400.32 samples/sec   Loss 5.4591   LearningRate 0.0277   Epoch: 9   Global Step: 47900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:50,283-Speed 3405.13 samples/sec   Loss 5.3411   LearningRate 0.0277   Epoch: 9   Global Step: 47910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:53,370-Speed 3317.92 samples/sec   Loss 5.4917   LearningRate 0.0277   Epoch: 9   Global Step: 47920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:56,372-Speed 3413.20 samples/sec   Loss 5.5456   LearningRate 0.0277   Epoch: 9   Global Step: 47930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:08:59,362-Speed 3425.76 samples/sec   Loss 5.4907   LearningRate 0.0277   Epoch: 9   Global Step: 47940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:09:02,367-Speed 3407.75 samples/sec   Loss 5.4387   LearningRate 0.0277   Epoch: 9   Global Step: 47950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:09:05,375-Speed 3405.43 samples/sec   Loss 5.6267   LearningRate 0.0277   Epoch: 9   Global Step: 47960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:09:08,379-Speed 3409.10 samples/sec   Loss 5.3642   LearningRate 0.0276   Epoch: 9   Global Step: 47970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:09:11,389-Speed 3402.49 samples/sec   Loss 5.5626   LearningRate 0.0276   Epoch: 9   Global Step: 47980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:09:14,411-Speed 3390.01 samples/sec   Loss 5.5128   LearningRate 0.0276   Epoch: 9   Global Step: 47990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:09:17,418-Speed 3405.93 samples/sec   Loss 5.5355   LearningRate 0.0276   Epoch: 9   Global Step: 48000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:10:01,723-[lfw][48000]XNorm: 23.144163
Training: 2022-04-11 04:10:01,723-[lfw][48000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-04-11 04:10:01,724-[lfw][48000]Accuracy-Highest: 0.99817
Training: 2022-04-11 04:10:53,167-[cfp_fp][48000]XNorm: 21.078792
Training: 2022-04-11 04:10:53,168-[cfp_fp][48000]Accuracy-Flip: 0.97629+-0.00829
Training: 2022-04-11 04:10:53,168-[cfp_fp][48000]Accuracy-Highest: 0.97629
Training: 2022-04-11 04:11:37,562-[agedb_30][48000]XNorm: 22.959230
Training: 2022-04-11 04:11:37,563-[agedb_30][48000]Accuracy-Flip: 0.97883+-0.00671
Training: 2022-04-11 04:11:37,563-[agedb_30][48000]Accuracy-Highest: 0.98083
Training: 2022-04-11 04:11:40,593-Speed 71.52 samples/sec   Loss 5.6525   LearningRate 0.0276   Epoch: 9   Global Step: 48010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:11:43,585-Speed 3423.29 samples/sec   Loss 5.3786   LearningRate 0.0276   Epoch: 9   Global Step: 48020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:11:46,573-Speed 3427.94 samples/sec   Loss 5.4885   LearningRate 0.0276   Epoch: 9   Global Step: 48030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:11:49,577-Speed 3409.78 samples/sec   Loss 5.5358   LearningRate 0.0276   Epoch: 9   Global Step: 48040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:11:52,549-Speed 3446.92 samples/sec   Loss 5.4781   LearningRate 0.0276   Epoch: 9   Global Step: 48050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:11:55,542-Speed 3422.39 samples/sec   Loss 5.5342   LearningRate 0.0276   Epoch: 9   Global Step: 48060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:11:58,536-Speed 3420.90 samples/sec   Loss 5.3398   LearningRate 0.0275   Epoch: 9   Global Step: 48070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:12:01,533-Speed 3416.62 samples/sec   Loss 5.4674   LearningRate 0.0275   Epoch: 9   Global Step: 48080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:12:04,541-Speed 3406.19 samples/sec   Loss 5.5689   LearningRate 0.0275   Epoch: 9   Global Step: 48090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:12:07,548-Speed 3405.87 samples/sec   Loss 5.4827   LearningRate 0.0275   Epoch: 9   Global Step: 48100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:12:10,545-Speed 3417.27 samples/sec   Loss 5.4170   LearningRate 0.0275   Epoch: 9   Global Step: 48110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:12:13,542-Speed 3417.61 samples/sec   Loss 5.7087   LearningRate 0.0275   Epoch: 9   Global Step: 48120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:12:16,551-Speed 3403.64 samples/sec   Loss 5.4306   LearningRate 0.0275   Epoch: 9   Global Step: 48130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:12:19,550-Speed 3415.40 samples/sec   Loss 5.5830   LearningRate 0.0275   Epoch: 9   Global Step: 48140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:12:22,528-Speed 3439.40 samples/sec   Loss 5.6120   LearningRate 0.0275   Epoch: 9   Global Step: 48150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:12:25,585-Speed 3351.20 samples/sec   Loss 5.4231   LearningRate 0.0274   Epoch: 9   Global Step: 48160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:12:28,584-Speed 3415.07 samples/sec   Loss 5.4111   LearningRate 0.0274   Epoch: 9   Global Step: 48170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:12:31,582-Speed 3416.31 samples/sec   Loss 5.5693   LearningRate 0.0274   Epoch: 9   Global Step: 48180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:12:34,578-Speed 3418.86 samples/sec   Loss 5.5040   LearningRate 0.0274   Epoch: 9   Global Step: 48190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:12:37,554-Speed 3441.69 samples/sec   Loss 5.6332   LearningRate 0.0274   Epoch: 9   Global Step: 48200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:12:40,562-Speed 3405.29 samples/sec   Loss 5.4593   LearningRate 0.0274   Epoch: 9   Global Step: 48210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:12:43,555-Speed 3421.84 samples/sec   Loss 5.4716   LearningRate 0.0274   Epoch: 9   Global Step: 48220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:12:46,561-Speed 3407.71 samples/sec   Loss 5.2709   LearningRate 0.0274   Epoch: 9   Global Step: 48230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:12:49,558-Speed 3417.99 samples/sec   Loss 5.5409   LearningRate 0.0274   Epoch: 9   Global Step: 48240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:12:52,559-Speed 3412.64 samples/sec   Loss 5.3980   LearningRate 0.0274   Epoch: 9   Global Step: 48250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:12:55,560-Speed 3413.07 samples/sec   Loss 5.2875   LearningRate 0.0273   Epoch: 9   Global Step: 48260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:12:58,561-Speed 3413.29 samples/sec   Loss 5.3233   LearningRate 0.0273   Epoch: 9   Global Step: 48270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:13:01,570-Speed 3403.09 samples/sec   Loss 5.3303   LearningRate 0.0273   Epoch: 9   Global Step: 48280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:13:04,671-Speed 3303.00 samples/sec   Loss 5.4491   LearningRate 0.0273   Epoch: 9   Global Step: 48290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:13:07,663-Speed 3423.46 samples/sec   Loss 5.4425   LearningRate 0.0273   Epoch: 9   Global Step: 48300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:13:10,657-Speed 3421.81 samples/sec   Loss 5.6298   LearningRate 0.0273   Epoch: 9   Global Step: 48310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:13:13,667-Speed 3402.10 samples/sec   Loss 5.4931   LearningRate 0.0273   Epoch: 9   Global Step: 48320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:13:16,675-Speed 3405.90 samples/sec   Loss 5.4282   LearningRate 0.0273   Epoch: 9   Global Step: 48330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:13:19,671-Speed 3418.55 samples/sec   Loss 5.4110   LearningRate 0.0273   Epoch: 9   Global Step: 48340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:13:22,670-Speed 3414.81 samples/sec   Loss 5.5373   LearningRate 0.0273   Epoch: 9   Global Step: 48350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:13:25,691-Speed 3390.78 samples/sec   Loss 5.5117   LearningRate 0.0272   Epoch: 9   Global Step: 48360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:13:28,710-Speed 3393.03 samples/sec   Loss 5.4921   LearningRate 0.0272   Epoch: 9   Global Step: 48370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:13:31,703-Speed 3421.59 samples/sec   Loss 5.3574   LearningRate 0.0272   Epoch: 9   Global Step: 48380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:13:34,703-Speed 3414.69 samples/sec   Loss 5.4001   LearningRate 0.0272   Epoch: 9   Global Step: 48390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:13:37,715-Speed 3400.02 samples/sec   Loss 5.4773   LearningRate 0.0272   Epoch: 9   Global Step: 48400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:13:40,715-Speed 3415.54 samples/sec   Loss 5.4933   LearningRate 0.0272   Epoch: 9   Global Step: 48410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:13:43,708-Speed 3420.98 samples/sec   Loss 5.4454   LearningRate 0.0272   Epoch: 9   Global Step: 48420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:13:46,716-Speed 3405.68 samples/sec   Loss 5.3321   LearningRate 0.0272   Epoch: 9   Global Step: 48430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:13:49,713-Speed 3418.19 samples/sec   Loss 5.4678   LearningRate 0.0272   Epoch: 9   Global Step: 48440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:13:52,721-Speed 3405.26 samples/sec   Loss 5.4105   LearningRate 0.0271   Epoch: 9   Global Step: 48450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:13:55,719-Speed 3415.79 samples/sec   Loss 5.5086   LearningRate 0.0271   Epoch: 9   Global Step: 48460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:13:58,872-Speed 3248.56 samples/sec   Loss 5.4004   LearningRate 0.0271   Epoch: 9   Global Step: 48470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:14:01,877-Speed 3408.56 samples/sec   Loss 5.4598   LearningRate 0.0271   Epoch: 9   Global Step: 48480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:14:04,862-Speed 3430.63 samples/sec   Loss 5.5447   LearningRate 0.0271   Epoch: 9   Global Step: 48490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:14:07,872-Speed 3402.83 samples/sec   Loss 5.3768   LearningRate 0.0271   Epoch: 9   Global Step: 48500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:14:10,887-Speed 3397.74 samples/sec   Loss 5.4789   LearningRate 0.0271   Epoch: 9   Global Step: 48510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:14:13,893-Speed 3408.77 samples/sec   Loss 5.4834   LearningRate 0.0271   Epoch: 9   Global Step: 48520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:14:16,897-Speed 3409.42 samples/sec   Loss 5.5553   LearningRate 0.0271   Epoch: 9   Global Step: 48530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:14:19,897-Speed 3414.21 samples/sec   Loss 5.5330   LearningRate 0.0271   Epoch: 9   Global Step: 48540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:14:22,897-Speed 3413.69 samples/sec   Loss 5.3512   LearningRate 0.0270   Epoch: 9   Global Step: 48550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:14:25,901-Speed 3410.04 samples/sec   Loss 5.5464   LearningRate 0.0270   Epoch: 9   Global Step: 48560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:14:28,901-Speed 3414.08 samples/sec   Loss 5.4852   LearningRate 0.0270   Epoch: 9   Global Step: 48570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:14:31,899-Speed 3416.19 samples/sec   Loss 5.4487   LearningRate 0.0270   Epoch: 9   Global Step: 48580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:14:34,898-Speed 3416.17 samples/sec   Loss 5.3666   LearningRate 0.0270   Epoch: 9   Global Step: 48590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:14:37,897-Speed 3414.74 samples/sec   Loss 5.3951   LearningRate 0.0270   Epoch: 9   Global Step: 48600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:14:40,901-Speed 3409.31 samples/sec   Loss 5.3093   LearningRate 0.0270   Epoch: 9   Global Step: 48610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:14:43,905-Speed 3409.76 samples/sec   Loss 5.4230   LearningRate 0.0270   Epoch: 9   Global Step: 48620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:14:46,904-Speed 3415.16 samples/sec   Loss 5.4567   LearningRate 0.0270   Epoch: 9   Global Step: 48630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:14:49,908-Speed 3410.74 samples/sec   Loss 5.4672   LearningRate 0.0270   Epoch: 9   Global Step: 48640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:14:52,910-Speed 3411.49 samples/sec   Loss 5.4018   LearningRate 0.0269   Epoch: 9   Global Step: 48650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:14:55,923-Speed 3398.80 samples/sec   Loss 5.3945   LearningRate 0.0269   Epoch: 9   Global Step: 48660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:14:58,933-Speed 3402.92 samples/sec   Loss 5.4281   LearningRate 0.0269   Epoch: 9   Global Step: 48670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:15:01,952-Speed 3392.78 samples/sec   Loss 5.4769   LearningRate 0.0269   Epoch: 9   Global Step: 48680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:15:04,955-Speed 3410.73 samples/sec   Loss 5.3589   LearningRate 0.0269   Epoch: 9   Global Step: 48690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:15:07,944-Speed 3426.20 samples/sec   Loss 5.4838   LearningRate 0.0269   Epoch: 9   Global Step: 48700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:15:10,958-Speed 3398.75 samples/sec   Loss 5.4383   LearningRate 0.0269   Epoch: 9   Global Step: 48710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:15:13,976-Speed 3393.82 samples/sec   Loss 5.3955   LearningRate 0.0269   Epoch: 9   Global Step: 48720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:15:16,984-Speed 3405.43 samples/sec   Loss 5.4799   LearningRate 0.0269   Epoch: 9   Global Step: 48730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:15:20,000-Speed 3395.83 samples/sec   Loss 5.3599   LearningRate 0.0269   Epoch: 9   Global Step: 48740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:15:23,017-Speed 3394.82 samples/sec   Loss 5.5035   LearningRate 0.0268   Epoch: 9   Global Step: 48750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:15:26,113-Speed 3308.83 samples/sec   Loss 5.4125   LearningRate 0.0268   Epoch: 9   Global Step: 48760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:15:29,112-Speed 3414.57 samples/sec   Loss 5.4113   LearningRate 0.0268   Epoch: 9   Global Step: 48770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:15:32,147-Speed 3374.98 samples/sec   Loss 5.4653   LearningRate 0.0268   Epoch: 9   Global Step: 48780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:15:35,153-Speed 3407.29 samples/sec   Loss 5.5042   LearningRate 0.0268   Epoch: 9   Global Step: 48790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:15:38,164-Speed 3402.23 samples/sec   Loss 5.3874   LearningRate 0.0268   Epoch: 9   Global Step: 48800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:15:41,168-Speed 3409.61 samples/sec   Loss 5.4982   LearningRate 0.0268   Epoch: 9   Global Step: 48810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:15:44,180-Speed 3400.79 samples/sec   Loss 5.3680   LearningRate 0.0268   Epoch: 9   Global Step: 48820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:15:47,181-Speed 3412.99 samples/sec   Loss 5.6280   LearningRate 0.0268   Epoch: 9   Global Step: 48830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:15:50,186-Speed 3408.71 samples/sec   Loss 5.3405   LearningRate 0.0267   Epoch: 9   Global Step: 48840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:15:53,204-Speed 3394.02 samples/sec   Loss 5.5152   LearningRate 0.0267   Epoch: 9   Global Step: 48850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:15:56,206-Speed 3411.74 samples/sec   Loss 5.3615   LearningRate 0.0267   Epoch: 9   Global Step: 48860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:15:59,205-Speed 3414.59 samples/sec   Loss 5.3684   LearningRate 0.0267   Epoch: 9   Global Step: 48870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:16:02,210-Speed 3408.47 samples/sec   Loss 5.3377   LearningRate 0.0267   Epoch: 9   Global Step: 48880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:05,229-Speed 3393.48 samples/sec   Loss 5.4125   LearningRate 0.0267   Epoch: 9   Global Step: 48890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:08,240-Speed 3401.23 samples/sec   Loss 5.4213   LearningRate 0.0267   Epoch: 9   Global Step: 48900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:11,244-Speed 3410.42 samples/sec   Loss 5.3876   LearningRate 0.0267   Epoch: 9   Global Step: 48910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:14,249-Speed 3408.36 samples/sec   Loss 5.3732   LearningRate 0.0267   Epoch: 9   Global Step: 48920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:17,257-Speed 3404.74 samples/sec   Loss 5.3918   LearningRate 0.0267   Epoch: 9   Global Step: 48930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:20,265-Speed 3405.94 samples/sec   Loss 5.3551   LearningRate 0.0266   Epoch: 9   Global Step: 48940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:23,277-Speed 3399.53 samples/sec   Loss 5.4850   LearningRate 0.0266   Epoch: 9   Global Step: 48950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:26,278-Speed 3413.47 samples/sec   Loss 5.4487   LearningRate 0.0266   Epoch: 9   Global Step: 48960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:29,286-Speed 3404.63 samples/sec   Loss 5.5324   LearningRate 0.0266   Epoch: 9   Global Step: 48970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:32,290-Speed 3410.27 samples/sec   Loss 5.4465   LearningRate 0.0266   Epoch: 9   Global Step: 48980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:16:35,286-Speed 3419.11 samples/sec   Loss 5.3989   LearningRate 0.0266   Epoch: 9   Global Step: 48990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:38,291-Speed 3408.89 samples/sec   Loss 5.4697   LearningRate 0.0266   Epoch: 9   Global Step: 49000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:41,301-Speed 3402.49 samples/sec   Loss 5.4512   LearningRate 0.0266   Epoch: 9   Global Step: 49010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:44,305-Speed 3409.50 samples/sec   Loss 5.5436   LearningRate 0.0266   Epoch: 9   Global Step: 49020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:47,309-Speed 3409.42 samples/sec   Loss 5.4153   LearningRate 0.0266   Epoch: 9   Global Step: 49030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:50,313-Speed 3409.96 samples/sec   Loss 5.3618   LearningRate 0.0265   Epoch: 9   Global Step: 49040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:53,336-Speed 3388.16 samples/sec   Loss 5.3836   LearningRate 0.0265   Epoch: 9   Global Step: 49050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:56,340-Speed 3409.66 samples/sec   Loss 5.2974   LearningRate 0.0265   Epoch: 9   Global Step: 49060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:16:59,343-Speed 3410.78 samples/sec   Loss 5.5083   LearningRate 0.0265   Epoch: 9   Global Step: 49070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:02,360-Speed 3394.52 samples/sec   Loss 5.5412   LearningRate 0.0265   Epoch: 9   Global Step: 49080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:05,368-Speed 3405.76 samples/sec   Loss 5.2931   LearningRate 0.0265   Epoch: 9   Global Step: 49090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:17:08,363-Speed 3419.38 samples/sec   Loss 5.4434   LearningRate 0.0265   Epoch: 9   Global Step: 49100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:11,378-Speed 3397.72 samples/sec   Loss 5.4411   LearningRate 0.0265   Epoch: 9   Global Step: 49110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:14,387-Speed 3403.56 samples/sec   Loss 5.3979   LearningRate 0.0265   Epoch: 9   Global Step: 49120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:17,396-Speed 3404.72 samples/sec   Loss 5.3887   LearningRate 0.0265   Epoch: 9   Global Step: 49130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:20,405-Speed 3403.81 samples/sec   Loss 5.3730   LearningRate 0.0264   Epoch: 9   Global Step: 49140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:23,409-Speed 3409.60 samples/sec   Loss 5.5238   LearningRate 0.0264   Epoch: 9   Global Step: 49150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:26,435-Speed 3383.79 samples/sec   Loss 5.3550   LearningRate 0.0264   Epoch: 9   Global Step: 49160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:29,443-Speed 3405.43 samples/sec   Loss 5.5870   LearningRate 0.0264   Epoch: 9   Global Step: 49170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:32,458-Speed 3397.78 samples/sec   Loss 5.3961   LearningRate 0.0264   Epoch: 9   Global Step: 49180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:35,462-Speed 3409.81 samples/sec   Loss 5.3348   LearningRate 0.0264   Epoch: 9   Global Step: 49190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:38,513-Speed 3357.46 samples/sec   Loss 5.3482   LearningRate 0.0264   Epoch: 9   Global Step: 49200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:17:41,553-Speed 3369.24 samples/sec   Loss 5.4252   LearningRate 0.0264   Epoch: 9   Global Step: 49210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:44,559-Speed 3407.17 samples/sec   Loss 5.4053   LearningRate 0.0264   Epoch: 9   Global Step: 49220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:47,570-Speed 3401.15 samples/sec   Loss 5.4571   LearningRate 0.0264   Epoch: 9   Global Step: 49230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:50,575-Speed 3408.87 samples/sec   Loss 5.3647   LearningRate 0.0263   Epoch: 9   Global Step: 49240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:53,597-Speed 3389.12 samples/sec   Loss 5.2743   LearningRate 0.0263   Epoch: 9   Global Step: 49250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:56,614-Speed 3394.40 samples/sec   Loss 5.4077   LearningRate 0.0263   Epoch: 9   Global Step: 49260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:17:59,623-Speed 3405.04 samples/sec   Loss 5.4302   LearningRate 0.0263   Epoch: 9   Global Step: 49270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:02,631-Speed 3404.77 samples/sec   Loss 5.3595   LearningRate 0.0263   Epoch: 9   Global Step: 49280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:05,639-Speed 3405.51 samples/sec   Loss 5.4806   LearningRate 0.0263   Epoch: 9   Global Step: 49290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:08,641-Speed 3411.34 samples/sec   Loss 5.4977   LearningRate 0.0263   Epoch: 9   Global Step: 49300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:11,657-Speed 3395.92 samples/sec   Loss 5.4081   LearningRate 0.0263   Epoch: 9   Global Step: 49310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:18:14,643-Speed 3430.77 samples/sec   Loss 5.4343   LearningRate 0.0263   Epoch: 9   Global Step: 49320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:17,654-Speed 3401.45 samples/sec   Loss 5.3604   LearningRate 0.0263   Epoch: 9   Global Step: 49330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:20,658-Speed 3409.15 samples/sec   Loss 5.2398   LearningRate 0.0262   Epoch: 9   Global Step: 49340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:23,661-Speed 3411.63 samples/sec   Loss 5.4429   LearningRate 0.0262   Epoch: 9   Global Step: 49350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:26,667-Speed 3406.78 samples/sec   Loss 5.2699   LearningRate 0.0262   Epoch: 9   Global Step: 49360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:29,678-Speed 3402.17 samples/sec   Loss 5.4193   LearningRate 0.0262   Epoch: 9   Global Step: 49370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:32,682-Speed 3409.57 samples/sec   Loss 5.3354   LearningRate 0.0262   Epoch: 9   Global Step: 49380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:35,685-Speed 3410.70 samples/sec   Loss 5.3999   LearningRate 0.0262   Epoch: 9   Global Step: 49390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:38,696-Speed 3402.26 samples/sec   Loss 5.4904   LearningRate 0.0262   Epoch: 9   Global Step: 49400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:41,702-Speed 3406.27 samples/sec   Loss 5.4611   LearningRate 0.0262   Epoch: 9   Global Step: 49410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:44,695-Speed 3422.40 samples/sec   Loss 5.2563   LearningRate 0.0262   Epoch: 9   Global Step: 49420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:47,727-Speed 3378.11 samples/sec   Loss 5.2768   LearningRate 0.0261   Epoch: 9   Global Step: 49430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:50,747-Speed 3392.20 samples/sec   Loss 5.4804   LearningRate 0.0261   Epoch: 9   Global Step: 49440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:53,767-Speed 3391.95 samples/sec   Loss 5.5131   LearningRate 0.0261   Epoch: 9   Global Step: 49450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:56,774-Speed 3406.25 samples/sec   Loss 5.5220   LearningRate 0.0261   Epoch: 9   Global Step: 49460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:18:59,781-Speed 3405.71 samples/sec   Loss 5.4739   LearningRate 0.0261   Epoch: 9   Global Step: 49470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:02,795-Speed 3398.64 samples/sec   Loss 5.4169   LearningRate 0.0261   Epoch: 9   Global Step: 49480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:05,799-Speed 3409.39 samples/sec   Loss 5.3181   LearningRate 0.0261   Epoch: 9   Global Step: 49490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:08,806-Speed 3406.38 samples/sec   Loss 5.4682   LearningRate 0.0261   Epoch: 9   Global Step: 49500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:11,816-Speed 3402.58 samples/sec   Loss 5.3564   LearningRate 0.0261   Epoch: 9   Global Step: 49510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:14,862-Speed 3362.59 samples/sec   Loss 5.2977   LearningRate 0.0261   Epoch: 9   Global Step: 49520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:19:17,859-Speed 3417.86 samples/sec   Loss 5.4034   LearningRate 0.0260   Epoch: 9   Global Step: 49530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:20,862-Speed 3411.65 samples/sec   Loss 5.5335   LearningRate 0.0260   Epoch: 9   Global Step: 49540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:23,866-Speed 3408.74 samples/sec   Loss 5.4223   LearningRate 0.0260   Epoch: 9   Global Step: 49550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:26,872-Speed 3407.45 samples/sec   Loss 5.3081   LearningRate 0.0260   Epoch: 9   Global Step: 49560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:29,874-Speed 3412.56 samples/sec   Loss 5.4435   LearningRate 0.0260   Epoch: 9   Global Step: 49570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:32,880-Speed 3406.97 samples/sec   Loss 5.4551   LearningRate 0.0260   Epoch: 9   Global Step: 49580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:35,893-Speed 3399.48 samples/sec   Loss 5.4246   LearningRate 0.0260   Epoch: 9   Global Step: 49590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:38,906-Speed 3399.64 samples/sec   Loss 5.4431   LearningRate 0.0260   Epoch: 9   Global Step: 49600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:41,989-Speed 3321.54 samples/sec   Loss 5.3675   LearningRate 0.0260   Epoch: 9   Global Step: 49610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:45,006-Speed 3395.68 samples/sec   Loss 5.4586   LearningRate 0.0260   Epoch: 9   Global Step: 49620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:47,994-Speed 3427.87 samples/sec   Loss 5.3827   LearningRate 0.0259   Epoch: 9   Global Step: 49630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:51,000-Speed 3407.96 samples/sec   Loss 5.4098   LearningRate 0.0259   Epoch: 9   Global Step: 49640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:19:54,009-Speed 3403.98 samples/sec   Loss 5.4026   LearningRate 0.0259   Epoch: 9   Global Step: 49650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:19:57,015-Speed 3406.40 samples/sec   Loss 5.4204   LearningRate 0.0259   Epoch: 9   Global Step: 49660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:20:00,028-Speed 3399.72 samples/sec   Loss 5.4439   LearningRate 0.0259   Epoch: 9   Global Step: 49670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:20:03,048-Speed 3391.92 samples/sec   Loss 5.4220   LearningRate 0.0259   Epoch: 9   Global Step: 49680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:20:06,059-Speed 3401.12 samples/sec   Loss 5.4492   LearningRate 0.0259   Epoch: 9   Global Step: 49690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:20:09,081-Speed 3389.19 samples/sec   Loss 5.4378   LearningRate 0.0259   Epoch: 9   Global Step: 49700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:20:12,087-Speed 3407.83 samples/sec   Loss 5.3606   LearningRate 0.0259   Epoch: 9   Global Step: 49710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:20:15,092-Speed 3408.76 samples/sec   Loss 5.4811   LearningRate 0.0259   Epoch: 9   Global Step: 49720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:20:18,099-Speed 3406.18 samples/sec   Loss 5.4151   LearningRate 0.0258   Epoch: 9   Global Step: 49730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:20:21,106-Speed 3407.17 samples/sec   Loss 5.2602   LearningRate 0.0258   Epoch: 9   Global Step: 49740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:20:24,119-Speed 3398.82 samples/sec   Loss 5.3546   LearningRate 0.0258   Epoch: 9   Global Step: 49750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:20:27,134-Speed 3396.96 samples/sec   Loss 5.3579   LearningRate 0.0258   Epoch: 9   Global Step: 49760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:20:30,144-Speed 3402.80 samples/sec   Loss 5.4039   LearningRate 0.0258   Epoch: 9   Global Step: 49770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:20:33,151-Speed 3406.48 samples/sec   Loss 5.3951   LearningRate 0.0258   Epoch: 9   Global Step: 49780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:20:36,256-Speed 3298.56 samples/sec   Loss 5.4439   LearningRate 0.0258   Epoch: 9   Global Step: 49790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:20:39,270-Speed 3398.17 samples/sec   Loss 5.4422   LearningRate 0.0258   Epoch: 9   Global Step: 49800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:20:42,279-Speed 3404.01 samples/sec   Loss 5.3699   LearningRate 0.0258   Epoch: 9   Global Step: 49810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:20:45,282-Speed 3411.19 samples/sec   Loss 5.2513   LearningRate 0.0258   Epoch: 9   Global Step: 49820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:20:48,300-Speed 3394.22 samples/sec   Loss 5.4511   LearningRate 0.0257   Epoch: 9   Global Step: 49830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:20:51,303-Speed 3410.97 samples/sec   Loss 5.3459   LearningRate 0.0257   Epoch: 9   Global Step: 49840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:20:54,318-Speed 3396.94 samples/sec   Loss 5.4647   LearningRate 0.0257   Epoch: 9   Global Step: 49850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:20:57,306-Speed 3427.80 samples/sec   Loss 5.4937   LearningRate 0.0257   Epoch: 9   Global Step: 49860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:21:00,351-Speed 3363.40 samples/sec   Loss 5.4848   LearningRate 0.0257   Epoch: 9   Global Step: 49870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:21:03,367-Speed 3396.85 samples/sec   Loss 5.4467   LearningRate 0.0257   Epoch: 9   Global Step: 49880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:21:06,435-Speed 3337.76 samples/sec   Loss 5.4190   LearningRate 0.0257   Epoch: 9   Global Step: 49890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:21:09,473-Speed 3372.62 samples/sec   Loss 5.4680   LearningRate 0.0257   Epoch: 9   Global Step: 49900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:21:12,543-Speed 3335.84 samples/sec   Loss 5.3978   LearningRate 0.0257   Epoch: 9   Global Step: 49910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:21:15,702-Speed 3242.28 samples/sec   Loss 5.4148   LearningRate 0.0257   Epoch: 9   Global Step: 49920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:21:18,725-Speed 3388.14 samples/sec   Loss 5.4662   LearningRate 0.0256   Epoch: 9   Global Step: 49930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:21:21,729-Speed 3409.11 samples/sec   Loss 5.2778   LearningRate 0.0256   Epoch: 9   Global Step: 49940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:21:24,735-Speed 3407.90 samples/sec   Loss 5.2251   LearningRate 0.0256   Epoch: 9   Global Step: 49950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:21:27,742-Speed 3406.53 samples/sec   Loss 5.3529   LearningRate 0.0256   Epoch: 9   Global Step: 49960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:21:30,755-Speed 3399.55 samples/sec   Loss 5.4018   LearningRate 0.0256   Epoch: 9   Global Step: 49970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:21:33,741-Speed 3430.17 samples/sec   Loss 5.4115   LearningRate 0.0256   Epoch: 9   Global Step: 49980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:21:36,747-Speed 3407.66 samples/sec   Loss 5.2140   LearningRate 0.0256   Epoch: 9   Global Step: 49990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:21:39,774-Speed 3383.49 samples/sec   Loss 5.3956   LearningRate 0.0256   Epoch: 9   Global Step: 50000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:22:24,204-[lfw][50000]XNorm: 22.149314
Training: 2022-04-11 04:22:24,205-[lfw][50000]Accuracy-Flip: 0.99817+-0.00252
Training: 2022-04-11 04:22:24,205-[lfw][50000]Accuracy-Highest: 0.99817
Training: 2022-04-11 04:23:15,398-[cfp_fp][50000]XNorm: 20.303504
Training: 2022-04-11 04:23:15,399-[cfp_fp][50000]Accuracy-Flip: 0.97500+-0.00721
Training: 2022-04-11 04:23:15,399-[cfp_fp][50000]Accuracy-Highest: 0.97629
Training: 2022-04-11 04:23:59,372-[agedb_30][50000]XNorm: 22.360195
Training: 2022-04-11 04:23:59,373-[agedb_30][50000]Accuracy-Flip: 0.98000+-0.00683
Training: 2022-04-11 04:23:59,373-[agedb_30][50000]Accuracy-Highest: 0.98083
Training: 2022-04-11 04:24:02,373-Speed 71.81 samples/sec   Loss 5.2904   LearningRate 0.0256   Epoch: 9   Global Step: 50010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:05,366-Speed 3422.12 samples/sec   Loss 5.2919   LearningRate 0.0256   Epoch: 9   Global Step: 50020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:08,359-Speed 3422.14 samples/sec   Loss 5.4169   LearningRate 0.0255   Epoch: 9   Global Step: 50030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:11,349-Speed 3425.43 samples/sec   Loss 5.3682   LearningRate 0.0255   Epoch: 9   Global Step: 50040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:14,338-Speed 3426.17 samples/sec   Loss 5.3187   LearningRate 0.0255   Epoch: 9   Global Step: 50050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:17,326-Speed 3428.31 samples/sec   Loss 5.3778   LearningRate 0.0255   Epoch: 9   Global Step: 50060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:20,318-Speed 3423.48 samples/sec   Loss 5.2729   LearningRate 0.0255   Epoch: 9   Global Step: 50070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:23,309-Speed 3424.07 samples/sec   Loss 5.3535   LearningRate 0.0255   Epoch: 9   Global Step: 50080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:24:26,314-Speed 3409.13 samples/sec   Loss 5.3742   LearningRate 0.0255   Epoch: 9   Global Step: 50090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:24:29,292-Speed 3439.03 samples/sec   Loss 5.2005   LearningRate 0.0255   Epoch: 9   Global Step: 50100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:32,296-Speed 3410.23 samples/sec   Loss 5.3642   LearningRate 0.0255   Epoch: 9   Global Step: 50110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:35,315-Speed 3392.57 samples/sec   Loss 5.3935   LearningRate 0.0255   Epoch: 9   Global Step: 50120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:38,311-Speed 3418.68 samples/sec   Loss 5.1176   LearningRate 0.0254   Epoch: 9   Global Step: 50130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:41,312-Speed 3413.19 samples/sec   Loss 5.4613   LearningRate 0.0254   Epoch: 9   Global Step: 50140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:44,309-Speed 3417.30 samples/sec   Loss 5.4245   LearningRate 0.0254   Epoch: 9   Global Step: 50150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:47,307-Speed 3416.32 samples/sec   Loss 5.4173   LearningRate 0.0254   Epoch: 9   Global Step: 50160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:50,334-Speed 3383.69 samples/sec   Loss 5.3115   LearningRate 0.0254   Epoch: 9   Global Step: 50170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:24:53,333-Speed 3415.21 samples/sec   Loss 5.4011   LearningRate 0.0254   Epoch: 9   Global Step: 50180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:24:56,339-Speed 3407.66 samples/sec   Loss 5.3851   LearningRate 0.0254   Epoch: 9   Global Step: 50190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:24:59,347-Speed 3405.18 samples/sec   Loss 5.4700   LearningRate 0.0254   Epoch: 9   Global Step: 50200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:25:02,351-Speed 3409.92 samples/sec   Loss 5.4977   LearningRate 0.0254   Epoch: 9   Global Step: 50210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:25:05,357-Speed 3406.90 samples/sec   Loss 5.2777   LearningRate 0.0254   Epoch: 9   Global Step: 50220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:25:08,367-Speed 3403.00 samples/sec   Loss 5.3412   LearningRate 0.0253   Epoch: 9   Global Step: 50230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:25:11,372-Speed 3409.18 samples/sec   Loss 5.5678   LearningRate 0.0253   Epoch: 9   Global Step: 50240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:25:14,387-Speed 3396.29 samples/sec   Loss 5.4419   LearningRate 0.0253   Epoch: 9   Global Step: 50250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:25:17,395-Speed 3406.20 samples/sec   Loss 5.3541   LearningRate 0.0253   Epoch: 9   Global Step: 50260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:25:20,399-Speed 3409.48 samples/sec   Loss 5.1490   LearningRate 0.0253   Epoch: 9   Global Step: 50270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:25:23,400-Speed 3412.93 samples/sec   Loss 5.3456   LearningRate 0.0253   Epoch: 9   Global Step: 50280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:25:26,406-Speed 3407.63 samples/sec   Loss 5.2531   LearningRate 0.0253   Epoch: 9   Global Step: 50290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:25:29,408-Speed 3411.13 samples/sec   Loss 5.4998   LearningRate 0.0253   Epoch: 9   Global Step: 50300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:25:32,413-Speed 3409.47 samples/sec   Loss 5.4090   LearningRate 0.0253   Epoch: 9   Global Step: 50310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:25:35,425-Speed 3400.20 samples/sec   Loss 5.2976   LearningRate 0.0253   Epoch: 9   Global Step: 50320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:25:38,439-Speed 3397.48 samples/sec   Loss 5.4053   LearningRate 0.0252   Epoch: 9   Global Step: 50330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:25:41,441-Speed 3412.12 samples/sec   Loss 5.3057   LearningRate 0.0252   Epoch: 9   Global Step: 50340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:25:44,449-Speed 3405.63 samples/sec   Loss 5.4173   LearningRate 0.0252   Epoch: 9   Global Step: 50350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:25:47,456-Speed 3406.91 samples/sec   Loss 5.4147   LearningRate 0.0252   Epoch: 9   Global Step: 50360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:25:50,465-Speed 3403.38 samples/sec   Loss 5.3603   LearningRate 0.0252   Epoch: 9   Global Step: 50370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:25:53,446-Speed 3435.23 samples/sec   Loss 5.2670   LearningRate 0.0252   Epoch: 9   Global Step: 50380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:25:56,444-Speed 3417.73 samples/sec   Loss 5.5536   LearningRate 0.0252   Epoch: 9   Global Step: 50390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:25:59,450-Speed 3407.06 samples/sec   Loss 5.4204   LearningRate 0.0252   Epoch: 9   Global Step: 50400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:02,450-Speed 3413.98 samples/sec   Loss 5.2032   LearningRate 0.0252   Epoch: 9   Global Step: 50410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:05,456-Speed 3406.97 samples/sec   Loss 5.3780   LearningRate 0.0252   Epoch: 9   Global Step: 50420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:08,456-Speed 3413.86 samples/sec   Loss 5.3191   LearningRate 0.0251   Epoch: 9   Global Step: 50430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:11,469-Speed 3400.61 samples/sec   Loss 5.3982   LearningRate 0.0251   Epoch: 9   Global Step: 50440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:14,470-Speed 3413.17 samples/sec   Loss 5.3313   LearningRate 0.0251   Epoch: 9   Global Step: 50450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:17,500-Speed 3380.31 samples/sec   Loss 5.2509   LearningRate 0.0251   Epoch: 9   Global Step: 50460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:20,542-Speed 3366.47 samples/sec   Loss 5.4481   LearningRate 0.0251   Epoch: 9   Global Step: 50470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:23,544-Speed 3412.31 samples/sec   Loss 5.2691   LearningRate 0.0251   Epoch: 9   Global Step: 50480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:26,549-Speed 3408.63 samples/sec   Loss 5.2010   LearningRate 0.0251   Epoch: 9   Global Step: 50490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:29,566-Speed 3393.99 samples/sec   Loss 5.3184   LearningRate 0.0251   Epoch: 9   Global Step: 50500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:32,580-Speed 3399.00 samples/sec   Loss 5.2706   LearningRate 0.0251   Epoch: 9   Global Step: 50510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:35,583-Speed 3410.90 samples/sec   Loss 5.4198   LearningRate 0.0251   Epoch: 9   Global Step: 50520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:38,583-Speed 3414.04 samples/sec   Loss 5.2386   LearningRate 0.0250   Epoch: 9   Global Step: 50530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:41,588-Speed 3408.20 samples/sec   Loss 5.1735   LearningRate 0.0250   Epoch: 9   Global Step: 50540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:44,602-Speed 3398.61 samples/sec   Loss 5.3122   LearningRate 0.0250   Epoch: 9   Global Step: 50550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:47,618-Speed 3396.11 samples/sec   Loss 5.3725   LearningRate 0.0250   Epoch: 9   Global Step: 50560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:50,690-Speed 3333.99 samples/sec   Loss 5.3586   LearningRate 0.0250   Epoch: 9   Global Step: 50570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:26:53,668-Speed 3439.89 samples/sec   Loss 5.3492   LearningRate 0.0250   Epoch: 9   Global Step: 50580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:05,847-Speed 840.83 samples/sec   Loss 4.3911   LearningRate 0.0250   Epoch: 10   Global Step: 50590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:09,020-Speed 3228.86 samples/sec   Loss 4.5265   LearningRate 0.0250   Epoch: 10   Global Step: 50600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:12,026-Speed 3406.99 samples/sec   Loss 4.5089   LearningRate 0.0250   Epoch: 10   Global Step: 50610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:15,023-Speed 3417.98 samples/sec   Loss 4.5717   LearningRate 0.0250   Epoch: 10   Global Step: 50620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:18,027-Speed 3409.57 samples/sec   Loss 4.4469   LearningRate 0.0250   Epoch: 10   Global Step: 50630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:21,025-Speed 3416.21 samples/sec   Loss 4.4848   LearningRate 0.0249   Epoch: 10   Global Step: 50640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:24,031-Speed 3406.63 samples/sec   Loss 4.5594   LearningRate 0.0249   Epoch: 10   Global Step: 50650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:27,099-Speed 3338.77 samples/sec   Loss 4.5153   LearningRate 0.0249   Epoch: 10   Global Step: 50660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:30,143-Speed 3364.63 samples/sec   Loss 4.5759   LearningRate 0.0249   Epoch: 10   Global Step: 50670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:33,183-Speed 3369.96 samples/sec   Loss 4.5789   LearningRate 0.0249   Epoch: 10   Global Step: 50680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:27:36,178-Speed 3419.34 samples/sec   Loss 4.4890   LearningRate 0.0249   Epoch: 10   Global Step: 50690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:39,185-Speed 3406.58 samples/sec   Loss 4.4361   LearningRate 0.0249   Epoch: 10   Global Step: 50700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:42,192-Speed 3407.13 samples/sec   Loss 4.4783   LearningRate 0.0249   Epoch: 10   Global Step: 50710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:45,201-Speed 3403.09 samples/sec   Loss 4.6652   LearningRate 0.0249   Epoch: 10   Global Step: 50720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:48,202-Speed 3413.69 samples/sec   Loss 4.5655   LearningRate 0.0249   Epoch: 10   Global Step: 50730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:51,206-Speed 3409.77 samples/sec   Loss 4.6321   LearningRate 0.0248   Epoch: 10   Global Step: 50740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:54,211-Speed 3407.99 samples/sec   Loss 4.6734   LearningRate 0.0248   Epoch: 10   Global Step: 50750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:27:57,229-Speed 3394.42 samples/sec   Loss 4.5394   LearningRate 0.0248   Epoch: 10   Global Step: 50760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:00,237-Speed 3404.82 samples/sec   Loss 4.5541   LearningRate 0.0248   Epoch: 10   Global Step: 50770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:03,248-Speed 3402.07 samples/sec   Loss 4.5297   LearningRate 0.0248   Epoch: 10   Global Step: 50780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:06,249-Speed 3412.31 samples/sec   Loss 4.6819   LearningRate 0.0248   Epoch: 10   Global Step: 50790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:28:09,236-Speed 3429.03 samples/sec   Loss 4.7165   LearningRate 0.0248   Epoch: 10   Global Step: 50800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:12,241-Speed 3408.97 samples/sec   Loss 4.7414   LearningRate 0.0248   Epoch: 10   Global Step: 50810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:15,256-Speed 3397.70 samples/sec   Loss 4.8183   LearningRate 0.0248   Epoch: 10   Global Step: 50820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:18,259-Speed 3409.85 samples/sec   Loss 4.7285   LearningRate 0.0248   Epoch: 10   Global Step: 50830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:21,269-Speed 3403.01 samples/sec   Loss 4.6553   LearningRate 0.0247   Epoch: 10   Global Step: 50840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:24,269-Speed 3414.44 samples/sec   Loss 4.5583   LearningRate 0.0247   Epoch: 10   Global Step: 50850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:27,274-Speed 3408.77 samples/sec   Loss 4.5726   LearningRate 0.0247   Epoch: 10   Global Step: 50860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:30,282-Speed 3404.78 samples/sec   Loss 4.6446   LearningRate 0.0247   Epoch: 10   Global Step: 50870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:33,301-Speed 3392.19 samples/sec   Loss 4.7572   LearningRate 0.0247   Epoch: 10   Global Step: 50880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:36,400-Speed 3306.58 samples/sec   Loss 4.7753   LearningRate 0.0247   Epoch: 10   Global Step: 50890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:39,434-Speed 3374.94 samples/sec   Loss 4.7068   LearningRate 0.0247   Epoch: 10   Global Step: 50900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:42,577-Speed 3258.95 samples/sec   Loss 4.5844   LearningRate 0.0247   Epoch: 10   Global Step: 50910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:45,583-Speed 3407.58 samples/sec   Loss 4.6641   LearningRate 0.0247   Epoch: 10   Global Step: 50920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:48,587-Speed 3409.71 samples/sec   Loss 4.5873   LearningRate 0.0247   Epoch: 10   Global Step: 50930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:51,597-Speed 3403.17 samples/sec   Loss 4.7562   LearningRate 0.0246   Epoch: 10   Global Step: 50940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:54,604-Speed 3405.46 samples/sec   Loss 4.6922   LearningRate 0.0246   Epoch: 10   Global Step: 50950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:28:57,616-Speed 3400.47 samples/sec   Loss 4.7052   LearningRate 0.0246   Epoch: 10   Global Step: 50960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:29:00,634-Speed 3393.98 samples/sec   Loss 4.5666   LearningRate 0.0246   Epoch: 10   Global Step: 50970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:29:03,651-Speed 3395.09 samples/sec   Loss 4.6304   LearningRate 0.0246   Epoch: 10   Global Step: 50980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:29:06,659-Speed 3405.68 samples/sec   Loss 4.9220   LearningRate 0.0246   Epoch: 10   Global Step: 50990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:29:09,697-Speed 3371.31 samples/sec   Loss 4.6721   LearningRate 0.0246   Epoch: 10   Global Step: 51000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:29:12,699-Speed 3411.63 samples/sec   Loss 4.9397   LearningRate 0.0246   Epoch: 10   Global Step: 51010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:29:15,715-Speed 3396.36 samples/sec   Loss 4.7873   LearningRate 0.0246   Epoch: 10   Global Step: 51020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:29:18,725-Speed 3402.76 samples/sec   Loss 4.8450   LearningRate 0.0246   Epoch: 10   Global Step: 51030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:29:21,715-Speed 3425.78 samples/sec   Loss 4.7051   LearningRate 0.0245   Epoch: 10   Global Step: 51040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:29:24,717-Speed 3410.84 samples/sec   Loss 4.7002   LearningRate 0.0245   Epoch: 10   Global Step: 51050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:29:27,724-Speed 3406.49 samples/sec   Loss 4.8884   LearningRate 0.0245   Epoch: 10   Global Step: 51060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:29:30,740-Speed 3395.92 samples/sec   Loss 4.6426   LearningRate 0.0245   Epoch: 10   Global Step: 51070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:29:33,750-Speed 3403.21 samples/sec   Loss 4.7894   LearningRate 0.0245   Epoch: 10   Global Step: 51080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:29:36,753-Speed 3411.25 samples/sec   Loss 4.7163   LearningRate 0.0245   Epoch: 10   Global Step: 51090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:29:39,768-Speed 3396.65 samples/sec   Loss 4.7413   LearningRate 0.0245   Epoch: 10   Global Step: 51100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:29:42,800-Speed 3378.08 samples/sec   Loss 4.6514   LearningRate 0.0245   Epoch: 10   Global Step: 51110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:29:45,804-Speed 3410.12 samples/sec   Loss 4.6740   LearningRate 0.0245   Epoch: 10   Global Step: 51120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:29:48,813-Speed 3403.34 samples/sec   Loss 4.8036   LearningRate 0.0245   Epoch: 10   Global Step: 51130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:29:51,833-Speed 3392.39 samples/sec   Loss 4.9902   LearningRate 0.0244   Epoch: 10   Global Step: 51140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:29:54,851-Speed 3393.43 samples/sec   Loss 4.6933   LearningRate 0.0244   Epoch: 10   Global Step: 51150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:29:57,855-Speed 3409.66 samples/sec   Loss 4.8313   LearningRate 0.0244   Epoch: 10   Global Step: 51160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:30:00,854-Speed 3415.37 samples/sec   Loss 4.8594   LearningRate 0.0244   Epoch: 10   Global Step: 51170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:03,876-Speed 3389.60 samples/sec   Loss 4.8507   LearningRate 0.0244   Epoch: 10   Global Step: 51180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:06,889-Speed 3399.20 samples/sec   Loss 4.9297   LearningRate 0.0244   Epoch: 10   Global Step: 51190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:09,900-Speed 3402.26 samples/sec   Loss 4.9214   LearningRate 0.0244   Epoch: 10   Global Step: 51200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:12,925-Speed 3386.17 samples/sec   Loss 4.7442   LearningRate 0.0244   Epoch: 10   Global Step: 51210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:15,934-Speed 3403.40 samples/sec   Loss 4.9323   LearningRate 0.0244   Epoch: 10   Global Step: 51220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:18,943-Speed 3403.61 samples/sec   Loss 4.9237   LearningRate 0.0244   Epoch: 10   Global Step: 51230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:21,950-Speed 3406.63 samples/sec   Loss 5.0296   LearningRate 0.0244   Epoch: 10   Global Step: 51240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:24,962-Speed 3400.31 samples/sec   Loss 4.9743   LearningRate 0.0243   Epoch: 10   Global Step: 51250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:27,975-Speed 3400.29 samples/sec   Loss 4.9308   LearningRate 0.0243   Epoch: 10   Global Step: 51260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:30,999-Speed 3386.69 samples/sec   Loss 4.8398   LearningRate 0.0243   Epoch: 10   Global Step: 51270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:30:34,007-Speed 3404.76 samples/sec   Loss 4.7871   LearningRate 0.0243   Epoch: 10   Global Step: 51280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:30:36,996-Speed 3427.37 samples/sec   Loss 4.7228   LearningRate 0.0243   Epoch: 10   Global Step: 51290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:40,016-Speed 3390.86 samples/sec   Loss 4.7939   LearningRate 0.0243   Epoch: 10   Global Step: 51300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:43,025-Speed 3404.26 samples/sec   Loss 4.7165   LearningRate 0.0243   Epoch: 10   Global Step: 51310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:46,035-Speed 3403.67 samples/sec   Loss 4.7652   LearningRate 0.0243   Epoch: 10   Global Step: 51320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:49,055-Speed 3390.61 samples/sec   Loss 4.8407   LearningRate 0.0243   Epoch: 10   Global Step: 51330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:52,071-Speed 3397.49 samples/sec   Loss 4.8693   LearningRate 0.0243   Epoch: 10   Global Step: 51340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:30:55,060-Speed 3425.75 samples/sec   Loss 5.0067   LearningRate 0.0242   Epoch: 10   Global Step: 51350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:30:58,069-Speed 3404.43 samples/sec   Loss 4.8763   LearningRate 0.0242   Epoch: 10   Global Step: 51360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:31:01,075-Speed 3407.76 samples/sec   Loss 4.8710   LearningRate 0.0242   Epoch: 10   Global Step: 51370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:31:04,084-Speed 3403.99 samples/sec   Loss 4.8019   LearningRate 0.0242   Epoch: 10   Global Step: 51380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:31:07,092-Speed 3404.45 samples/sec   Loss 4.9777   LearningRate 0.0242   Epoch: 10   Global Step: 51390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:31:10,099-Speed 3406.92 samples/sec   Loss 4.8559   LearningRate 0.0242   Epoch: 10   Global Step: 51400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:31:13,109-Speed 3402.14 samples/sec   Loss 4.7787   LearningRate 0.0242   Epoch: 10   Global Step: 51410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:31:16,129-Speed 3392.05 samples/sec   Loss 4.8659   LearningRate 0.0242   Epoch: 10   Global Step: 51420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:31:19,133-Speed 3409.12 samples/sec   Loss 4.9659   LearningRate 0.0242   Epoch: 10   Global Step: 51430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:31:22,136-Speed 3410.89 samples/sec   Loss 4.8534   LearningRate 0.0242   Epoch: 10   Global Step: 51440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:31:25,147-Speed 3402.18 samples/sec   Loss 4.9107   LearningRate 0.0241   Epoch: 10   Global Step: 51450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:31:28,152-Speed 3408.29 samples/sec   Loss 4.9545   LearningRate 0.0241   Epoch: 10   Global Step: 51460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:31:31,168-Speed 3396.60 samples/sec   Loss 5.0224   LearningRate 0.0241   Epoch: 10   Global Step: 51470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:31:34,172-Speed 3409.06 samples/sec   Loss 4.9207   LearningRate 0.0241   Epoch: 10   Global Step: 51480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:31:37,186-Speed 3398.40 samples/sec   Loss 4.9317   LearningRate 0.0241   Epoch: 10   Global Step: 51490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:31:40,202-Speed 3395.84 samples/sec   Loss 4.8408   LearningRate 0.0241   Epoch: 10   Global Step: 51500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:31:43,221-Speed 3392.18 samples/sec   Loss 4.9211   LearningRate 0.0241   Epoch: 10   Global Step: 51510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:31:46,235-Speed 3399.37 samples/sec   Loss 4.9877   LearningRate 0.0241   Epoch: 10   Global Step: 51520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:31:49,249-Speed 3397.82 samples/sec   Loss 4.9022   LearningRate 0.0241   Epoch: 10   Global Step: 51530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:31:52,261-Speed 3401.61 samples/sec   Loss 4.8978   LearningRate 0.0241   Epoch: 10   Global Step: 51540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:31:55,266-Speed 3407.64 samples/sec   Loss 4.9418   LearningRate 0.0241   Epoch: 10   Global Step: 51550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:31:58,256-Speed 3425.22 samples/sec   Loss 4.8729   LearningRate 0.0240   Epoch: 10   Global Step: 51560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:32:01,267-Speed 3402.67 samples/sec   Loss 4.7624   LearningRate 0.0240   Epoch: 10   Global Step: 51570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:32:04,331-Speed 3342.57 samples/sec   Loss 4.9444   LearningRate 0.0240   Epoch: 10   Global Step: 51580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:32:07,326-Speed 3419.54 samples/sec   Loss 4.9435   LearningRate 0.0240   Epoch: 10   Global Step: 51590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:32:10,336-Speed 3402.62 samples/sec   Loss 4.8691   LearningRate 0.0240   Epoch: 10   Global Step: 51600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:32:13,353-Speed 3395.12 samples/sec   Loss 4.8096   LearningRate 0.0240   Epoch: 10   Global Step: 51610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:32:16,419-Speed 3340.76 samples/sec   Loss 4.8166   LearningRate 0.0240   Epoch: 10   Global Step: 51620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:32:19,430-Speed 3402.09 samples/sec   Loss 4.9253   LearningRate 0.0240   Epoch: 10   Global Step: 51630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:32:22,439-Speed 3404.18 samples/sec   Loss 4.9480   LearningRate 0.0240   Epoch: 10   Global Step: 51640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:32:25,445-Speed 3407.29 samples/sec   Loss 4.8409   LearningRate 0.0240   Epoch: 10   Global Step: 51650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:32:28,452-Speed 3406.01 samples/sec   Loss 4.7084   LearningRate 0.0239   Epoch: 10   Global Step: 51660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:32:31,464-Speed 3401.40 samples/sec   Loss 4.8450   LearningRate 0.0239   Epoch: 10   Global Step: 51670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:32:34,469-Speed 3407.72 samples/sec   Loss 5.0178   LearningRate 0.0239   Epoch: 10   Global Step: 51680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:32:37,479-Speed 3403.12 samples/sec   Loss 4.9346   LearningRate 0.0239   Epoch: 10   Global Step: 51690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:32:40,494-Speed 3397.21 samples/sec   Loss 4.9917   LearningRate 0.0239   Epoch: 10   Global Step: 51700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:32:43,509-Speed 3396.73 samples/sec   Loss 4.9674   LearningRate 0.0239   Epoch: 10   Global Step: 51710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:32:46,523-Speed 3399.24 samples/sec   Loss 4.9044   LearningRate 0.0239   Epoch: 10   Global Step: 51720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:32:49,533-Speed 3402.40 samples/sec   Loss 4.9659   LearningRate 0.0239   Epoch: 10   Global Step: 51730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:32:52,540-Speed 3407.04 samples/sec   Loss 4.9882   LearningRate 0.0239   Epoch: 10   Global Step: 51740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:32:55,555-Speed 3396.39 samples/sec   Loss 5.0071   LearningRate 0.0239   Epoch: 10   Global Step: 51750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:32:58,560-Speed 3408.69 samples/sec   Loss 5.0316   LearningRate 0.0238   Epoch: 10   Global Step: 51760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:01,568-Speed 3404.97 samples/sec   Loss 5.0152   LearningRate 0.0238   Epoch: 10   Global Step: 51770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:04,571-Speed 3410.91 samples/sec   Loss 4.9560   LearningRate 0.0238   Epoch: 10   Global Step: 51780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:07,582-Speed 3401.95 samples/sec   Loss 5.0500   LearningRate 0.0238   Epoch: 10   Global Step: 51790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:33:10,843-Speed 3140.59 samples/sec   Loss 4.9678   LearningRate 0.0238   Epoch: 10   Global Step: 51800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:33:13,838-Speed 3420.11 samples/sec   Loss 4.9112   LearningRate 0.0238   Epoch: 10   Global Step: 51810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:16,864-Speed 3385.15 samples/sec   Loss 4.9387   LearningRate 0.0238   Epoch: 10   Global Step: 51820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:19,873-Speed 3404.19 samples/sec   Loss 4.9505   LearningRate 0.0238   Epoch: 10   Global Step: 51830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:22,903-Speed 3380.37 samples/sec   Loss 4.8537   LearningRate 0.0238   Epoch: 10   Global Step: 51840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:25,939-Speed 3373.51 samples/sec   Loss 4.9053   LearningRate 0.0238   Epoch: 10   Global Step: 51850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:28,956-Speed 3395.08 samples/sec   Loss 5.0498   LearningRate 0.0238   Epoch: 10   Global Step: 51860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:31,963-Speed 3405.97 samples/sec   Loss 4.8918   LearningRate 0.0237   Epoch: 10   Global Step: 51870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:34,978-Speed 3397.64 samples/sec   Loss 5.0438   LearningRate 0.0237   Epoch: 10   Global Step: 51880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:37,990-Speed 3399.82 samples/sec   Loss 4.9349   LearningRate 0.0237   Epoch: 10   Global Step: 51890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:41,019-Speed 3381.18 samples/sec   Loss 4.9412   LearningRate 0.0237   Epoch: 10   Global Step: 51900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:44,031-Speed 3401.38 samples/sec   Loss 4.9088   LearningRate 0.0237   Epoch: 10   Global Step: 51910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:33:47,021-Speed 3425.95 samples/sec   Loss 4.9450   LearningRate 0.0237   Epoch: 10   Global Step: 51920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:50,027-Speed 3406.50 samples/sec   Loss 4.9074   LearningRate 0.0237   Epoch: 10   Global Step: 51930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:53,054-Speed 3384.11 samples/sec   Loss 4.9801   LearningRate 0.0237   Epoch: 10   Global Step: 51940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:56,079-Speed 3385.61 samples/sec   Loss 5.0356   LearningRate 0.0237   Epoch: 10   Global Step: 51950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:33:59,083-Speed 3410.03 samples/sec   Loss 5.1268   LearningRate 0.0237   Epoch: 10   Global Step: 51960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:34:02,095-Speed 3400.79 samples/sec   Loss 5.0340   LearningRate 0.0236   Epoch: 10   Global Step: 51970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:34:05,112-Speed 3394.80 samples/sec   Loss 5.0117   LearningRate 0.0236   Epoch: 10   Global Step: 51980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:34:08,136-Speed 3386.83 samples/sec   Loss 4.8604   LearningRate 0.0236   Epoch: 10   Global Step: 51990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:34:11,151-Speed 3397.51 samples/sec   Loss 4.8939   LearningRate 0.0236   Epoch: 10   Global Step: 52000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:34:55,241-[lfw][52000]XNorm: 20.781405
Training: 2022-04-11 04:34:55,242-[lfw][52000]Accuracy-Flip: 0.99767+-0.00249
Training: 2022-04-11 04:34:55,243-[lfw][52000]Accuracy-Highest: 0.99817
Training: 2022-04-11 04:35:46,284-[cfp_fp][52000]XNorm: 18.933418
Training: 2022-04-11 04:35:46,284-[cfp_fp][52000]Accuracy-Flip: 0.97357+-0.00788
Training: 2022-04-11 04:35:46,285-[cfp_fp][52000]Accuracy-Highest: 0.97629
Training: 2022-04-11 04:36:30,279-[agedb_30][52000]XNorm: 20.852038
Training: 2022-04-11 04:36:30,280-[agedb_30][52000]Accuracy-Flip: 0.97883+-0.00563
Training: 2022-04-11 04:36:30,280-[agedb_30][52000]Accuracy-Highest: 0.98083
Training: 2022-04-11 04:36:33,277-Speed 72.05 samples/sec   Loss 4.9141   LearningRate 0.0236   Epoch: 10   Global Step: 52010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:36:36,271-Speed 3420.57 samples/sec   Loss 5.0383   LearningRate 0.0236   Epoch: 10   Global Step: 52020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:36:39,250-Speed 3437.85 samples/sec   Loss 4.8852   LearningRate 0.0236   Epoch: 10   Global Step: 52030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:36:42,259-Speed 3405.08 samples/sec   Loss 4.9775   LearningRate 0.0236   Epoch: 10   Global Step: 52040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:36:45,249-Speed 3425.45 samples/sec   Loss 4.9375   LearningRate 0.0236   Epoch: 10   Global Step: 52050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:36:48,253-Speed 3409.77 samples/sec   Loss 5.0643   LearningRate 0.0236   Epoch: 10   Global Step: 52060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:36:51,248-Speed 3419.17 samples/sec   Loss 4.9804   LearningRate 0.0235   Epoch: 10   Global Step: 52070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:36:54,247-Speed 3415.10 samples/sec   Loss 5.0190   LearningRate 0.0235   Epoch: 10   Global Step: 52080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:36:57,239-Speed 3424.31 samples/sec   Loss 4.9211   LearningRate 0.0235   Epoch: 10   Global Step: 52090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:00,233-Speed 3421.04 samples/sec   Loss 4.9420   LearningRate 0.0235   Epoch: 10   Global Step: 52100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:03,229-Speed 3417.99 samples/sec   Loss 4.9644   LearningRate 0.0235   Epoch: 10   Global Step: 52110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:06,228-Speed 3416.03 samples/sec   Loss 4.8959   LearningRate 0.0235   Epoch: 10   Global Step: 52120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:09,230-Speed 3411.30 samples/sec   Loss 4.9467   LearningRate 0.0235   Epoch: 10   Global Step: 52130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:37:12,223-Speed 3422.16 samples/sec   Loss 5.0376   LearningRate 0.0235   Epoch: 10   Global Step: 52140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:37:15,210-Speed 3429.18 samples/sec   Loss 4.9469   LearningRate 0.0235   Epoch: 10   Global Step: 52150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:18,214-Speed 3410.59 samples/sec   Loss 4.8939   LearningRate 0.0235   Epoch: 10   Global Step: 52160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:21,214-Speed 3413.24 samples/sec   Loss 4.8986   LearningRate 0.0235   Epoch: 10   Global Step: 52170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:24,216-Speed 3412.08 samples/sec   Loss 4.8898   LearningRate 0.0234   Epoch: 10   Global Step: 52180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:27,215-Speed 3415.37 samples/sec   Loss 5.0414   LearningRate 0.0234   Epoch: 10   Global Step: 52190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:30,233-Speed 3393.20 samples/sec   Loss 5.0126   LearningRate 0.0234   Epoch: 10   Global Step: 52200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:33,234-Speed 3414.15 samples/sec   Loss 4.9036   LearningRate 0.0234   Epoch: 10   Global Step: 52210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:36,227-Speed 3421.34 samples/sec   Loss 4.9310   LearningRate 0.0234   Epoch: 10   Global Step: 52220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:39,252-Speed 3386.25 samples/sec   Loss 5.0805   LearningRate 0.0234   Epoch: 10   Global Step: 52230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:42,249-Speed 3418.00 samples/sec   Loss 4.9761   LearningRate 0.0234   Epoch: 10   Global Step: 52240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:45,229-Speed 3437.19 samples/sec   Loss 4.8789   LearningRate 0.0234   Epoch: 10   Global Step: 52250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:48,230-Speed 3413.41 samples/sec   Loss 4.8951   LearningRate 0.0234   Epoch: 10   Global Step: 52260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:51,231-Speed 3412.78 samples/sec   Loss 5.0235   LearningRate 0.0234   Epoch: 10   Global Step: 52270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:54,240-Speed 3403.33 samples/sec   Loss 4.9104   LearningRate 0.0233   Epoch: 10   Global Step: 52280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:37:57,245-Speed 3408.87 samples/sec   Loss 5.0145   LearningRate 0.0233   Epoch: 10   Global Step: 52290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:00,244-Speed 3415.10 samples/sec   Loss 5.0103   LearningRate 0.0233   Epoch: 10   Global Step: 52300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:03,268-Speed 3387.00 samples/sec   Loss 4.8769   LearningRate 0.0233   Epoch: 10   Global Step: 52310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:06,277-Speed 3403.98 samples/sec   Loss 4.9590   LearningRate 0.0233   Epoch: 10   Global Step: 52320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:09,276-Speed 3414.60 samples/sec   Loss 5.0087   LearningRate 0.0233   Epoch: 10   Global Step: 52330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:12,276-Speed 3415.30 samples/sec   Loss 4.9326   LearningRate 0.0233   Epoch: 10   Global Step: 52340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:15,260-Speed 3432.12 samples/sec   Loss 4.9625   LearningRate 0.0233   Epoch: 10   Global Step: 52350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:18,265-Speed 3409.22 samples/sec   Loss 5.0866   LearningRate 0.0233   Epoch: 10   Global Step: 52360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:21,261-Speed 3418.09 samples/sec   Loss 4.9350   LearningRate 0.0233   Epoch: 10   Global Step: 52370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:24,266-Speed 3408.54 samples/sec   Loss 4.9281   LearningRate 0.0233   Epoch: 10   Global Step: 52380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:27,266-Speed 3414.30 samples/sec   Loss 4.8073   LearningRate 0.0232   Epoch: 10   Global Step: 52390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:30,273-Speed 3405.57 samples/sec   Loss 4.9847   LearningRate 0.0232   Epoch: 10   Global Step: 52400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:33,273-Speed 3415.25 samples/sec   Loss 4.9745   LearningRate 0.0232   Epoch: 10   Global Step: 52410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:36,272-Speed 3414.46 samples/sec   Loss 5.0615   LearningRate 0.0232   Epoch: 10   Global Step: 52420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:39,273-Speed 3413.95 samples/sec   Loss 4.9935   LearningRate 0.0232   Epoch: 10   Global Step: 52430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:42,269-Speed 3418.75 samples/sec   Loss 4.9407   LearningRate 0.0232   Epoch: 10   Global Step: 52440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:45,247-Speed 3439.22 samples/sec   Loss 5.0673   LearningRate 0.0232   Epoch: 10   Global Step: 52450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:48,246-Speed 3415.12 samples/sec   Loss 5.0000   LearningRate 0.0232   Epoch: 10   Global Step: 52460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:51,244-Speed 3416.76 samples/sec   Loss 4.9511   LearningRate 0.0232   Epoch: 10   Global Step: 52470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:54,243-Speed 3414.90 samples/sec   Loss 4.9782   LearningRate 0.0232   Epoch: 10   Global Step: 52480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:38:57,247-Speed 3410.16 samples/sec   Loss 4.9547   LearningRate 0.0231   Epoch: 10   Global Step: 52490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:00,246-Speed 3414.49 samples/sec   Loss 4.8809   LearningRate 0.0231   Epoch: 10   Global Step: 52500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:03,307-Speed 3347.04 samples/sec   Loss 4.8960   LearningRate 0.0231   Epoch: 10   Global Step: 52510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:06,353-Speed 3361.79 samples/sec   Loss 5.0811   LearningRate 0.0231   Epoch: 10   Global Step: 52520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:09,357-Speed 3410.15 samples/sec   Loss 5.0577   LearningRate 0.0231   Epoch: 10   Global Step: 52530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:12,364-Speed 3406.62 samples/sec   Loss 5.0434   LearningRate 0.0231   Epoch: 10   Global Step: 52540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:15,365-Speed 3412.47 samples/sec   Loss 5.0207   LearningRate 0.0231   Epoch: 10   Global Step: 52550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:39:18,367-Speed 3411.82 samples/sec   Loss 4.9497   LearningRate 0.0231   Epoch: 10   Global Step: 52560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:39:21,348-Speed 3436.16 samples/sec   Loss 4.9837   LearningRate 0.0231   Epoch: 10   Global Step: 52570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:24,347-Speed 3415.17 samples/sec   Loss 4.9597   LearningRate 0.0231   Epoch: 10   Global Step: 52580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:27,347-Speed 3414.35 samples/sec   Loss 4.9031   LearningRate 0.0231   Epoch: 10   Global Step: 52590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:30,353-Speed 3406.80 samples/sec   Loss 4.9426   LearningRate 0.0230   Epoch: 10   Global Step: 52600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:33,358-Speed 3408.43 samples/sec   Loss 4.9936   LearningRate 0.0230   Epoch: 10   Global Step: 52610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:36,360-Speed 3412.56 samples/sec   Loss 4.9797   LearningRate 0.0230   Epoch: 10   Global Step: 52620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:39,360-Speed 3413.84 samples/sec   Loss 4.9241   LearningRate 0.0230   Epoch: 10   Global Step: 52630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:42,390-Speed 3380.50 samples/sec   Loss 5.0645   LearningRate 0.0230   Epoch: 10   Global Step: 52640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:45,395-Speed 3408.74 samples/sec   Loss 4.9337   LearningRate 0.0230   Epoch: 10   Global Step: 52650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:48,400-Speed 3408.04 samples/sec   Loss 4.9649   LearningRate 0.0230   Epoch: 10   Global Step: 52660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:51,397-Speed 3417.13 samples/sec   Loss 4.8893   LearningRate 0.0230   Epoch: 10   Global Step: 52670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:54,400-Speed 3410.91 samples/sec   Loss 5.1479   LearningRate 0.0230   Epoch: 10   Global Step: 52680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:39:57,401-Speed 3413.64 samples/sec   Loss 4.8527   LearningRate 0.0230   Epoch: 10   Global Step: 52690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:00,402-Speed 3412.30 samples/sec   Loss 5.0722   LearningRate 0.0229   Epoch: 10   Global Step: 52700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:03,405-Speed 3410.86 samples/sec   Loss 4.9488   LearningRate 0.0229   Epoch: 10   Global Step: 52710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:06,410-Speed 3408.37 samples/sec   Loss 4.8653   LearningRate 0.0229   Epoch: 10   Global Step: 52720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:09,408-Speed 3416.97 samples/sec   Loss 4.9467   LearningRate 0.0229   Epoch: 10   Global Step: 52730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:12,415-Speed 3406.62 samples/sec   Loss 4.9132   LearningRate 0.0229   Epoch: 10   Global Step: 52740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:15,426-Speed 3401.86 samples/sec   Loss 4.8547   LearningRate 0.0229   Epoch: 10   Global Step: 52750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:18,427-Speed 3412.37 samples/sec   Loss 5.0348   LearningRate 0.0229   Epoch: 10   Global Step: 52760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:21,428-Speed 3413.23 samples/sec   Loss 4.9302   LearningRate 0.0229   Epoch: 10   Global Step: 52770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:40:24,420-Speed 3422.78 samples/sec   Loss 4.8929   LearningRate 0.0229   Epoch: 10   Global Step: 52780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:27,442-Speed 3389.50 samples/sec   Loss 4.9828   LearningRate 0.0229   Epoch: 10   Global Step: 52790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:30,447-Speed 3408.93 samples/sec   Loss 5.0531   LearningRate 0.0229   Epoch: 10   Global Step: 52800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:33,460-Speed 3399.46 samples/sec   Loss 4.9461   LearningRate 0.0228   Epoch: 10   Global Step: 52810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:36,467-Speed 3405.81 samples/sec   Loss 5.0507   LearningRate 0.0228   Epoch: 10   Global Step: 52820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:39,476-Speed 3404.16 samples/sec   Loss 5.0430   LearningRate 0.0228   Epoch: 10   Global Step: 52830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:42,477-Speed 3413.51 samples/sec   Loss 5.0624   LearningRate 0.0228   Epoch: 10   Global Step: 52840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:45,478-Speed 3412.93 samples/sec   Loss 5.0590   LearningRate 0.0228   Epoch: 10   Global Step: 52850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:48,501-Speed 3388.27 samples/sec   Loss 5.0686   LearningRate 0.0228   Epoch: 10   Global Step: 52860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:51,727-Speed 3174.68 samples/sec   Loss 4.9287   LearningRate 0.0228   Epoch: 10   Global Step: 52870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:54,717-Speed 3425.62 samples/sec   Loss 5.0021   LearningRate 0.0228   Epoch: 10   Global Step: 52880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:40:57,721-Speed 3410.00 samples/sec   Loss 5.0470   LearningRate 0.0228   Epoch: 10   Global Step: 52890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:41:00,728-Speed 3405.50 samples/sec   Loss 5.1343   LearningRate 0.0228   Epoch: 10   Global Step: 52900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:41:03,737-Speed 3403.83 samples/sec   Loss 4.8304   LearningRate 0.0227   Epoch: 10   Global Step: 52910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:41:06,749-Speed 3401.30 samples/sec   Loss 4.9633   LearningRate 0.0227   Epoch: 10   Global Step: 52920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:41:09,768-Speed 3392.44 samples/sec   Loss 4.9725   LearningRate 0.0227   Epoch: 10   Global Step: 52930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:41:12,773-Speed 3409.24 samples/sec   Loss 4.9285   LearningRate 0.0227   Epoch: 10   Global Step: 52940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:41:15,787-Speed 3397.76 samples/sec   Loss 5.0767   LearningRate 0.0227   Epoch: 10   Global Step: 52950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:41:18,792-Speed 3407.86 samples/sec   Loss 5.0870   LearningRate 0.0227   Epoch: 10   Global Step: 52960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:41:21,793-Speed 3413.38 samples/sec   Loss 4.8788   LearningRate 0.0227   Epoch: 10   Global Step: 52970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:41:24,797-Speed 3410.00 samples/sec   Loss 5.0608   LearningRate 0.0227   Epoch: 10   Global Step: 52980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:41:27,782-Speed 3431.58 samples/sec   Loss 5.0850   LearningRate 0.0227   Epoch: 10   Global Step: 52990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:41:30,791-Speed 3403.75 samples/sec   Loss 4.9097   LearningRate 0.0227   Epoch: 10   Global Step: 53000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:41:33,798-Speed 3406.29 samples/sec   Loss 5.0726   LearningRate 0.0227   Epoch: 10   Global Step: 53010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:41:36,806-Speed 3404.87 samples/sec   Loss 5.0306   LearningRate 0.0226   Epoch: 10   Global Step: 53020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:41:39,802-Speed 3418.72 samples/sec   Loss 5.0932   LearningRate 0.0226   Epoch: 10   Global Step: 53030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:41:42,819-Speed 3395.49 samples/sec   Loss 4.9456   LearningRate 0.0226   Epoch: 10   Global Step: 53040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:41:45,819-Speed 3413.28 samples/sec   Loss 4.9806   LearningRate 0.0226   Epoch: 10   Global Step: 53050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:41:48,838-Speed 3392.59 samples/sec   Loss 4.9601   LearningRate 0.0226   Epoch: 10   Global Step: 53060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:41:51,847-Speed 3404.08 samples/sec   Loss 4.9745   LearningRate 0.0226   Epoch: 10   Global Step: 53070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:41:54,853-Speed 3407.21 samples/sec   Loss 5.0104   LearningRate 0.0226   Epoch: 10   Global Step: 53080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:41:57,857-Speed 3410.98 samples/sec   Loss 4.9820   LearningRate 0.0226   Epoch: 10   Global Step: 53090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:42:00,872-Speed 3397.00 samples/sec   Loss 4.9046   LearningRate 0.0226   Epoch: 10   Global Step: 53100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:42:03,954-Speed 3322.99 samples/sec   Loss 4.9482   LearningRate 0.0226   Epoch: 10   Global Step: 53110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:42:06,956-Speed 3411.87 samples/sec   Loss 5.0468   LearningRate 0.0226   Epoch: 10   Global Step: 53120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:42:09,964-Speed 3405.45 samples/sec   Loss 5.0075   LearningRate 0.0225   Epoch: 10   Global Step: 53130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:12,970-Speed 3407.88 samples/sec   Loss 4.9478   LearningRate 0.0225   Epoch: 10   Global Step: 53140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:15,975-Speed 3407.49 samples/sec   Loss 5.0380   LearningRate 0.0225   Epoch: 10   Global Step: 53150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:18,982-Speed 3406.23 samples/sec   Loss 4.9823   LearningRate 0.0225   Epoch: 10   Global Step: 53160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:21,990-Speed 3404.87 samples/sec   Loss 5.1045   LearningRate 0.0225   Epoch: 10   Global Step: 53170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:25,045-Speed 3352.46 samples/sec   Loss 4.8707   LearningRate 0.0225   Epoch: 10   Global Step: 53180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:28,061-Speed 3397.04 samples/sec   Loss 4.9446   LearningRate 0.0225   Epoch: 10   Global Step: 53190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:31,081-Speed 3391.11 samples/sec   Loss 4.9993   LearningRate 0.0225   Epoch: 10   Global Step: 53200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:34,082-Speed 3413.15 samples/sec   Loss 4.9400   LearningRate 0.0225   Epoch: 10   Global Step: 53210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:37,085-Speed 3411.00 samples/sec   Loss 5.0104   LearningRate 0.0225   Epoch: 10   Global Step: 53220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:40,110-Speed 3385.40 samples/sec   Loss 5.0056   LearningRate 0.0224   Epoch: 10   Global Step: 53230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:42:43,107-Speed 3418.12 samples/sec   Loss 4.9401   LearningRate 0.0224   Epoch: 10   Global Step: 53240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:46,121-Speed 3398.55 samples/sec   Loss 4.9972   LearningRate 0.0224   Epoch: 10   Global Step: 53250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:49,126-Speed 3408.16 samples/sec   Loss 4.9534   LearningRate 0.0224   Epoch: 10   Global Step: 53260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:52,134-Speed 3405.23 samples/sec   Loss 5.0239   LearningRate 0.0224   Epoch: 10   Global Step: 53270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:55,144-Speed 3402.09 samples/sec   Loss 4.9254   LearningRate 0.0224   Epoch: 10   Global Step: 53280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:42:58,161-Speed 3395.76 samples/sec   Loss 4.9456   LearningRate 0.0224   Epoch: 10   Global Step: 53290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:01,165-Speed 3409.21 samples/sec   Loss 4.9764   LearningRate 0.0224   Epoch: 10   Global Step: 53300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:04,176-Speed 3402.31 samples/sec   Loss 5.0811   LearningRate 0.0224   Epoch: 10   Global Step: 53310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:07,193-Speed 3394.85 samples/sec   Loss 4.7841   LearningRate 0.0224   Epoch: 10   Global Step: 53320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:10,194-Speed 3412.47 samples/sec   Loss 4.9930   LearningRate 0.0224   Epoch: 10   Global Step: 53330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:13,177-Speed 3433.71 samples/sec   Loss 4.9492   LearningRate 0.0223   Epoch: 10   Global Step: 53340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:16,180-Speed 3410.57 samples/sec   Loss 5.0410   LearningRate 0.0223   Epoch: 10   Global Step: 53350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:19,198-Speed 3393.66 samples/sec   Loss 4.9984   LearningRate 0.0223   Epoch: 10   Global Step: 53360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:22,201-Speed 3410.79 samples/sec   Loss 4.9225   LearningRate 0.0223   Epoch: 10   Global Step: 53370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:25,213-Speed 3401.17 samples/sec   Loss 4.9152   LearningRate 0.0223   Epoch: 10   Global Step: 53380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:28,248-Speed 3374.39 samples/sec   Loss 4.9206   LearningRate 0.0223   Epoch: 10   Global Step: 53390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:31,258-Speed 3402.69 samples/sec   Loss 4.9335   LearningRate 0.0223   Epoch: 10   Global Step: 53400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:34,292-Speed 3376.39 samples/sec   Loss 4.7935   LearningRate 0.0223   Epoch: 10   Global Step: 53410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:37,321-Speed 3382.34 samples/sec   Loss 5.0939   LearningRate 0.0223   Epoch: 10   Global Step: 53420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:40,329-Speed 3404.40 samples/sec   Loss 5.0486   LearningRate 0.0223   Epoch: 10   Global Step: 53430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:43,358-Speed 3381.65 samples/sec   Loss 5.0393   LearningRate 0.0223   Epoch: 10   Global Step: 53440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:46,365-Speed 3406.32 samples/sec   Loss 5.1093   LearningRate 0.0222   Epoch: 10   Global Step: 53450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:49,393-Speed 3382.61 samples/sec   Loss 4.8274   LearningRate 0.0222   Epoch: 10   Global Step: 53460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:52,411-Speed 3393.97 samples/sec   Loss 4.9452   LearningRate 0.0222   Epoch: 10   Global Step: 53470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:43:55,399-Speed 3427.73 samples/sec   Loss 5.0494   LearningRate 0.0222   Epoch: 10   Global Step: 53480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:43:58,409-Speed 3402.95 samples/sec   Loss 5.1245   LearningRate 0.0222   Epoch: 10   Global Step: 53490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:44:01,414-Speed 3408.67 samples/sec   Loss 5.0537   LearningRate 0.0222   Epoch: 10   Global Step: 53500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:44:04,455-Speed 3368.03 samples/sec   Loss 5.0036   LearningRate 0.0222   Epoch: 10   Global Step: 53510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:44:07,460-Speed 3408.35 samples/sec   Loss 4.9213   LearningRate 0.0222   Epoch: 10   Global Step: 53520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:44:10,461-Speed 3412.70 samples/sec   Loss 5.1556   LearningRate 0.0222   Epoch: 10   Global Step: 53530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:44:13,464-Speed 3410.81 samples/sec   Loss 4.8925   LearningRate 0.0222   Epoch: 10   Global Step: 53540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:44:16,468-Speed 3410.10 samples/sec   Loss 5.0564   LearningRate 0.0222   Epoch: 10   Global Step: 53550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:44:19,472-Speed 3409.07 samples/sec   Loss 4.9589   LearningRate 0.0221   Epoch: 10   Global Step: 53560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:44:22,480-Speed 3405.86 samples/sec   Loss 5.0470   LearningRate 0.0221   Epoch: 10   Global Step: 53570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:44:25,482-Speed 3410.90 samples/sec   Loss 4.9270   LearningRate 0.0221   Epoch: 10   Global Step: 53580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:44:28,493-Speed 3402.86 samples/sec   Loss 5.0261   LearningRate 0.0221   Epoch: 10   Global Step: 53590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:44:31,498-Speed 3408.04 samples/sec   Loss 4.9262   LearningRate 0.0221   Epoch: 10   Global Step: 53600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:44:34,514-Speed 3395.83 samples/sec   Loss 4.8647   LearningRate 0.0221   Epoch: 10   Global Step: 53610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:44:37,536-Speed 3389.43 samples/sec   Loss 4.9429   LearningRate 0.0221   Epoch: 10   Global Step: 53620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:44:40,565-Speed 3381.51 samples/sec   Loss 4.9172   LearningRate 0.0221   Epoch: 10   Global Step: 53630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:44:43,572-Speed 3405.80 samples/sec   Loss 5.0106   LearningRate 0.0221   Epoch: 10   Global Step: 53640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:44:46,560-Speed 3427.61 samples/sec   Loss 5.1227   LearningRate 0.0221   Epoch: 10   Global Step: 53650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:44:49,594-Speed 3375.72 samples/sec   Loss 4.9951   LearningRate 0.0220   Epoch: 10   Global Step: 53660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:44:52,604-Speed 3403.92 samples/sec   Loss 5.0481   LearningRate 0.0220   Epoch: 10   Global Step: 53670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:44:55,609-Speed 3407.93 samples/sec   Loss 5.0842   LearningRate 0.0220   Epoch: 10   Global Step: 53680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:44:58,621-Speed 3404.69 samples/sec   Loss 4.8997   LearningRate 0.0220   Epoch: 10   Global Step: 53690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:45:01,633-Speed 3400.75 samples/sec   Loss 4.9605   LearningRate 0.0220   Epoch: 10   Global Step: 53700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:45:04,644-Speed 3402.22 samples/sec   Loss 4.9400   LearningRate 0.0220   Epoch: 10   Global Step: 53710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:45:07,665-Speed 3390.02 samples/sec   Loss 4.8760   LearningRate 0.0220   Epoch: 10   Global Step: 53720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:45:10,667-Speed 3411.62 samples/sec   Loss 4.9400   LearningRate 0.0220   Epoch: 10   Global Step: 53730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:45:13,674-Speed 3406.10 samples/sec   Loss 4.8907   LearningRate 0.0220   Epoch: 10   Global Step: 53740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:45:16,680-Speed 3407.39 samples/sec   Loss 4.9564   LearningRate 0.0220   Epoch: 10   Global Step: 53750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:45:19,684-Speed 3409.12 samples/sec   Loss 4.9564   LearningRate 0.0220   Epoch: 10   Global Step: 53760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:45:22,692-Speed 3406.09 samples/sec   Loss 4.9281   LearningRate 0.0219   Epoch: 10   Global Step: 53770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:45:25,744-Speed 3355.31 samples/sec   Loss 5.0374   LearningRate 0.0219   Epoch: 10   Global Step: 53780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:45:28,790-Speed 3362.90 samples/sec   Loss 4.9916   LearningRate 0.0219   Epoch: 10   Global Step: 53790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:45:31,800-Speed 3403.41 samples/sec   Loss 4.9444   LearningRate 0.0219   Epoch: 10   Global Step: 53800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:45:34,804-Speed 3409.23 samples/sec   Loss 4.9298   LearningRate 0.0219   Epoch: 10   Global Step: 53810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:45:37,823-Speed 3392.94 samples/sec   Loss 4.8925   LearningRate 0.0219   Epoch: 10   Global Step: 53820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:45:40,840-Speed 3394.91 samples/sec   Loss 5.0406   LearningRate 0.0219   Epoch: 10   Global Step: 53830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:45:43,862-Speed 3388.78 samples/sec   Loss 4.9749   LearningRate 0.0219   Epoch: 10   Global Step: 53840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:45:46,854-Speed 3423.64 samples/sec   Loss 4.9372   LearningRate 0.0219   Epoch: 10   Global Step: 53850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:45:49,861-Speed 3406.43 samples/sec   Loss 5.0096   LearningRate 0.0219   Epoch: 10   Global Step: 53860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:45:52,867-Speed 3406.83 samples/sec   Loss 5.0402   LearningRate 0.0219   Epoch: 10   Global Step: 53870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:45:55,864-Speed 3417.83 samples/sec   Loss 4.9642   LearningRate 0.0218   Epoch: 10   Global Step: 53880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:45:58,867-Speed 3411.51 samples/sec   Loss 4.9921   LearningRate 0.0218   Epoch: 10   Global Step: 53890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:46:01,868-Speed 3413.14 samples/sec   Loss 5.1308   LearningRate 0.0218   Epoch: 10   Global Step: 53900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:46:04,879-Speed 3400.57 samples/sec   Loss 4.8895   LearningRate 0.0218   Epoch: 10   Global Step: 53910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:46:07,909-Speed 3380.78 samples/sec   Loss 5.0967   LearningRate 0.0218   Epoch: 10   Global Step: 53920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:46:10,913-Speed 3410.04 samples/sec   Loss 4.9876   LearningRate 0.0218   Epoch: 10   Global Step: 53930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:46:13,925-Speed 3400.84 samples/sec   Loss 5.0278   LearningRate 0.0218   Epoch: 10   Global Step: 53940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:46:16,941-Speed 3395.80 samples/sec   Loss 4.9184   LearningRate 0.0218   Epoch: 10   Global Step: 53950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:46:19,961-Speed 3391.15 samples/sec   Loss 5.1242   LearningRate 0.0218   Epoch: 10   Global Step: 53960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:46:22,965-Speed 3410.02 samples/sec   Loss 5.0101   LearningRate 0.0218   Epoch: 10   Global Step: 53970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:46:25,985-Speed 3390.92 samples/sec   Loss 4.8933   LearningRate 0.0218   Epoch: 10   Global Step: 53980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:46:29,001-Speed 3396.78 samples/sec   Loss 5.0171   LearningRate 0.0217   Epoch: 10   Global Step: 53990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:46:32,002-Speed 3412.62 samples/sec   Loss 5.0299   LearningRate 0.0217   Epoch: 10   Global Step: 54000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:47:16,183-[lfw][54000]XNorm: 21.819049
Training: 2022-04-11 04:47:16,184-[lfw][54000]Accuracy-Flip: 0.99783+-0.00248
Training: 2022-04-11 04:47:16,184-[lfw][54000]Accuracy-Highest: 0.99817
Training: 2022-04-11 04:48:07,548-[cfp_fp][54000]XNorm: 19.907261
Training: 2022-04-11 04:48:07,549-[cfp_fp][54000]Accuracy-Flip: 0.97757+-0.00836
Training: 2022-04-11 04:48:07,549-[cfp_fp][54000]Accuracy-Highest: 0.97757
Training: 2022-04-11 04:48:51,737-[agedb_30][54000]XNorm: 21.736796
Training: 2022-04-11 04:48:51,738-[agedb_30][54000]Accuracy-Flip: 0.97967+-0.00586
Training: 2022-04-11 04:48:51,739-[agedb_30][54000]Accuracy-Highest: 0.98083
Training: 2022-04-11 04:48:54,742-Speed 71.74 samples/sec   Loss 5.1063   LearningRate 0.0217   Epoch: 10   Global Step: 54010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:48:57,725-Speed 3433.97 samples/sec   Loss 4.9365   LearningRate 0.0217   Epoch: 10   Global Step: 54020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:49:00,692-Speed 3452.21 samples/sec   Loss 4.7887   LearningRate 0.0217   Epoch: 10   Global Step: 54030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:49:03,681-Speed 3426.88 samples/sec   Loss 4.9741   LearningRate 0.0217   Epoch: 10   Global Step: 54040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:49:07,976-Speed 2384.42 samples/sec   Loss 4.8590   LearningRate 0.0217   Epoch: 10   Global Step: 54050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:49:10,962-Speed 3430.46 samples/sec   Loss 5.0115   LearningRate 0.0217   Epoch: 10   Global Step: 54060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:49:13,960-Speed 3416.62 samples/sec   Loss 4.9356   LearningRate 0.0217   Epoch: 10   Global Step: 54070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:49:16,949-Speed 3426.61 samples/sec   Loss 4.8896   LearningRate 0.0217   Epoch: 10   Global Step: 54080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:49:19,948-Speed 3414.65 samples/sec   Loss 4.8757   LearningRate 0.0217   Epoch: 10   Global Step: 54090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:49:22,940-Speed 3423.71 samples/sec   Loss 5.0494   LearningRate 0.0216   Epoch: 10   Global Step: 54100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:49:25,964-Speed 3386.63 samples/sec   Loss 4.8276   LearningRate 0.0216   Epoch: 10   Global Step: 54110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:49:28,969-Speed 3408.57 samples/sec   Loss 4.9249   LearningRate 0.0216   Epoch: 10   Global Step: 54120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:49:31,982-Speed 3399.38 samples/sec   Loss 4.9601   LearningRate 0.0216   Epoch: 10   Global Step: 54130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:49:34,999-Speed 3395.23 samples/sec   Loss 4.8796   LearningRate 0.0216   Epoch: 10   Global Step: 54140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:49:38,031-Speed 3377.97 samples/sec   Loss 4.8898   LearningRate 0.0216   Epoch: 10   Global Step: 54150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:49:41,089-Speed 3350.13 samples/sec   Loss 4.8057   LearningRate 0.0216   Epoch: 10   Global Step: 54160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:49:44,090-Speed 3413.10 samples/sec   Loss 4.9586   LearningRate 0.0216   Epoch: 10   Global Step: 54170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:49:47,092-Speed 3411.75 samples/sec   Loss 4.9412   LearningRate 0.0216   Epoch: 10   Global Step: 54180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:49:50,097-Speed 3408.43 samples/sec   Loss 4.9345   LearningRate 0.0216   Epoch: 10   Global Step: 54190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:49:53,100-Speed 3410.52 samples/sec   Loss 4.9143   LearningRate 0.0215   Epoch: 10   Global Step: 54200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:49:56,101-Speed 3413.36 samples/sec   Loss 4.9091   LearningRate 0.0215   Epoch: 10   Global Step: 54210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:49:59,102-Speed 3413.03 samples/sec   Loss 4.9994   LearningRate 0.0215   Epoch: 10   Global Step: 54220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:02,120-Speed 3392.98 samples/sec   Loss 5.0859   LearningRate 0.0215   Epoch: 10   Global Step: 54230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:50:05,108-Speed 3428.56 samples/sec   Loss 4.8523   LearningRate 0.0215   Epoch: 10   Global Step: 54240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:08,119-Speed 3402.35 samples/sec   Loss 4.9021   LearningRate 0.0215   Epoch: 10   Global Step: 54250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:11,169-Speed 3358.09 samples/sec   Loss 4.9629   LearningRate 0.0215   Epoch: 10   Global Step: 54260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:14,325-Speed 3245.47 samples/sec   Loss 4.9102   LearningRate 0.0215   Epoch: 10   Global Step: 54270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:17,326-Speed 3412.75 samples/sec   Loss 5.0231   LearningRate 0.0215   Epoch: 10   Global Step: 54280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:20,337-Speed 3401.13 samples/sec   Loss 4.8584   LearningRate 0.0215   Epoch: 10   Global Step: 54290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:23,335-Speed 3417.06 samples/sec   Loss 4.9292   LearningRate 0.0215   Epoch: 10   Global Step: 54300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:26,348-Speed 3398.97 samples/sec   Loss 4.9293   LearningRate 0.0214   Epoch: 10   Global Step: 54310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:29,354-Speed 3407.38 samples/sec   Loss 5.0342   LearningRate 0.0214   Epoch: 10   Global Step: 54320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:32,351-Speed 3417.95 samples/sec   Loss 5.0755   LearningRate 0.0214   Epoch: 10   Global Step: 54330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:35,354-Speed 3410.51 samples/sec   Loss 4.8958   LearningRate 0.0214   Epoch: 10   Global Step: 54340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:38,373-Speed 3393.37 samples/sec   Loss 4.7450   LearningRate 0.0214   Epoch: 10   Global Step: 54350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:41,399-Speed 3385.23 samples/sec   Loss 5.0894   LearningRate 0.0214   Epoch: 10   Global Step: 54360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:44,396-Speed 3417.70 samples/sec   Loss 5.0690   LearningRate 0.0214   Epoch: 10   Global Step: 54370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:47,408-Speed 3400.41 samples/sec   Loss 4.9703   LearningRate 0.0214   Epoch: 10   Global Step: 54380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:50,408-Speed 3413.66 samples/sec   Loss 4.9072   LearningRate 0.0214   Epoch: 10   Global Step: 54390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:53,410-Speed 3411.90 samples/sec   Loss 5.0641   LearningRate 0.0214   Epoch: 10   Global Step: 54400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:50:56,413-Speed 3410.59 samples/sec   Loss 4.9589   LearningRate 0.0214   Epoch: 10   Global Step: 54410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:51:00,309-Speed 2628.97 samples/sec   Loss 4.9490   LearningRate 0.0213   Epoch: 10   Global Step: 54420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:51:03,310-Speed 3413.26 samples/sec   Loss 4.8673   LearningRate 0.0213   Epoch: 10   Global Step: 54430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:51:06,309-Speed 3416.07 samples/sec   Loss 4.9429   LearningRate 0.0213   Epoch: 10   Global Step: 54440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:51:09,289-Speed 3437.16 samples/sec   Loss 4.8971   LearningRate 0.0213   Epoch: 10   Global Step: 54450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:51:12,274-Speed 3430.55 samples/sec   Loss 4.8773   LearningRate 0.0213   Epoch: 10   Global Step: 54460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:51:15,267-Speed 3421.62 samples/sec   Loss 4.9648   LearningRate 0.0213   Epoch: 10   Global Step: 54470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:51:18,269-Speed 3412.94 samples/sec   Loss 4.9691   LearningRate 0.0213   Epoch: 10   Global Step: 54480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:51:21,266-Speed 3417.08 samples/sec   Loss 4.8448   LearningRate 0.0213   Epoch: 10   Global Step: 54490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:51:24,281-Speed 3397.21 samples/sec   Loss 4.9700   LearningRate 0.0213   Epoch: 10   Global Step: 54500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:51:27,279-Speed 3416.93 samples/sec   Loss 4.9958   LearningRate 0.0213   Epoch: 10   Global Step: 54510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:51:30,282-Speed 3410.84 samples/sec   Loss 4.8696   LearningRate 0.0213   Epoch: 10   Global Step: 54520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:51:33,279-Speed 3417.43 samples/sec   Loss 4.7803   LearningRate 0.0212   Epoch: 10   Global Step: 54530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:51:36,276-Speed 3417.82 samples/sec   Loss 4.9532   LearningRate 0.0212   Epoch: 10   Global Step: 54540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:51:39,275-Speed 3415.06 samples/sec   Loss 4.9365   LearningRate 0.0212   Epoch: 10   Global Step: 54550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:51:42,274-Speed 3415.46 samples/sec   Loss 4.8754   LearningRate 0.0212   Epoch: 10   Global Step: 54560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:51:45,273-Speed 3415.64 samples/sec   Loss 4.8800   LearningRate 0.0212   Epoch: 10   Global Step: 54570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:51:48,279-Speed 3407.40 samples/sec   Loss 5.0436   LearningRate 0.0212   Epoch: 10   Global Step: 54580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:51:51,283-Speed 3408.98 samples/sec   Loss 4.9230   LearningRate 0.0212   Epoch: 10   Global Step: 54590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:51:54,268-Speed 3431.75 samples/sec   Loss 4.9240   LearningRate 0.0212   Epoch: 10   Global Step: 54600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:51:57,285-Speed 3394.92 samples/sec   Loss 4.9866   LearningRate 0.0212   Epoch: 10   Global Step: 54610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:00,281-Speed 3418.20 samples/sec   Loss 4.8433   LearningRate 0.0212   Epoch: 10   Global Step: 54620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:03,295-Speed 3399.51 samples/sec   Loss 4.9637   LearningRate 0.0212   Epoch: 10   Global Step: 54630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:06,300-Speed 3407.59 samples/sec   Loss 4.8943   LearningRate 0.0211   Epoch: 10   Global Step: 54640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:09,317-Speed 3394.91 samples/sec   Loss 4.6922   LearningRate 0.0211   Epoch: 10   Global Step: 54650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:12,313-Speed 3418.30 samples/sec   Loss 4.9847   LearningRate 0.0211   Epoch: 10   Global Step: 54660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:15,308-Speed 3420.28 samples/sec   Loss 4.8643   LearningRate 0.0211   Epoch: 10   Global Step: 54670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:18,312-Speed 3409.73 samples/sec   Loss 5.0499   LearningRate 0.0211   Epoch: 10   Global Step: 54680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:21,325-Speed 3398.76 samples/sec   Loss 4.9068   LearningRate 0.0211   Epoch: 10   Global Step: 54690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:24,349-Speed 3388.01 samples/sec   Loss 4.7960   LearningRate 0.0211   Epoch: 10   Global Step: 54700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:52:27,333-Speed 3432.52 samples/sec   Loss 4.8608   LearningRate 0.0211   Epoch: 10   Global Step: 54710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:30,345-Speed 3400.90 samples/sec   Loss 5.0128   LearningRate 0.0211   Epoch: 10   Global Step: 54720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:33,343-Speed 3415.85 samples/sec   Loss 4.8979   LearningRate 0.0211   Epoch: 10   Global Step: 54730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:36,341-Speed 3416.68 samples/sec   Loss 4.8639   LearningRate 0.0211   Epoch: 10   Global Step: 54740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:39,362-Speed 3390.43 samples/sec   Loss 5.0897   LearningRate 0.0210   Epoch: 10   Global Step: 54750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:42,384-Speed 3388.90 samples/sec   Loss 4.9104   LearningRate 0.0210   Epoch: 10   Global Step: 54760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:45,388-Speed 3409.84 samples/sec   Loss 4.9123   LearningRate 0.0210   Epoch: 10   Global Step: 54770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:48,391-Speed 3410.60 samples/sec   Loss 4.9150   LearningRate 0.0210   Epoch: 10   Global Step: 54780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:51,404-Speed 3400.30 samples/sec   Loss 4.8086   LearningRate 0.0210   Epoch: 10   Global Step: 54790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:54,404-Speed 3413.98 samples/sec   Loss 4.9578   LearningRate 0.0210   Epoch: 10   Global Step: 54800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:52:57,402-Speed 3416.37 samples/sec   Loss 4.9343   LearningRate 0.0210   Epoch: 10   Global Step: 54810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:53:00,406-Speed 3409.80 samples/sec   Loss 4.9645   LearningRate 0.0210   Epoch: 10   Global Step: 54820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:53:03,429-Speed 3387.72 samples/sec   Loss 4.8923   LearningRate 0.0210   Epoch: 10   Global Step: 54830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:53:06,432-Speed 3410.87 samples/sec   Loss 4.9287   LearningRate 0.0210   Epoch: 10   Global Step: 54840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:53:09,416-Speed 3432.38 samples/sec   Loss 4.9575   LearningRate 0.0210   Epoch: 10   Global Step: 54850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:12,416-Speed 3414.09 samples/sec   Loss 5.0175   LearningRate 0.0209   Epoch: 10   Global Step: 54860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:15,420-Speed 3409.82 samples/sec   Loss 4.8565   LearningRate 0.0209   Epoch: 10   Global Step: 54870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:18,423-Speed 3410.92 samples/sec   Loss 4.9019   LearningRate 0.0209   Epoch: 10   Global Step: 54880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:21,420-Speed 3417.65 samples/sec   Loss 4.8946   LearningRate 0.0209   Epoch: 10   Global Step: 54890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:24,424-Speed 3410.15 samples/sec   Loss 4.9844   LearningRate 0.0209   Epoch: 10   Global Step: 54900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:27,469-Speed 3363.52 samples/sec   Loss 4.8330   LearningRate 0.0209   Epoch: 10   Global Step: 54910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:30,492-Speed 3387.83 samples/sec   Loss 4.8186   LearningRate 0.0209   Epoch: 10   Global Step: 54920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:33,496-Speed 3409.79 samples/sec   Loss 4.8968   LearningRate 0.0209   Epoch: 10   Global Step: 54930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:36,500-Speed 3409.57 samples/sec   Loss 4.8959   LearningRate 0.0209   Epoch: 10   Global Step: 54940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:39,506-Speed 3407.88 samples/sec   Loss 4.9282   LearningRate 0.0209   Epoch: 10   Global Step: 54950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:53:42,498-Speed 3422.33 samples/sec   Loss 4.9689   LearningRate 0.0209   Epoch: 10   Global Step: 54960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:45,504-Speed 3407.85 samples/sec   Loss 4.9189   LearningRate 0.0208   Epoch: 10   Global Step: 54970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:48,504-Speed 3414.83 samples/sec   Loss 4.9289   LearningRate 0.0208   Epoch: 10   Global Step: 54980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:51,632-Speed 3273.73 samples/sec   Loss 4.8481   LearningRate 0.0208   Epoch: 10   Global Step: 54990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:54,740-Speed 3295.66 samples/sec   Loss 4.9542   LearningRate 0.0208   Epoch: 10   Global Step: 55000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:53:57,740-Speed 3414.70 samples/sec   Loss 5.0042   LearningRate 0.0208   Epoch: 10   Global Step: 55010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:54:00,764-Speed 3386.61 samples/sec   Loss 4.8910   LearningRate 0.0208   Epoch: 10   Global Step: 55020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:54:03,803-Speed 3370.81 samples/sec   Loss 4.9466   LearningRate 0.0208   Epoch: 10   Global Step: 55030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:54:06,829-Speed 3384.43 samples/sec   Loss 4.8579   LearningRate 0.0208   Epoch: 10   Global Step: 55040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:54:09,835-Speed 3408.08 samples/sec   Loss 4.8612   LearningRate 0.0208   Epoch: 10   Global Step: 55050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:54:12,835-Speed 3413.21 samples/sec   Loss 4.8955   LearningRate 0.0208   Epoch: 10   Global Step: 55060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:15,842-Speed 3407.01 samples/sec   Loss 4.8117   LearningRate 0.0208   Epoch: 10   Global Step: 55070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:18,845-Speed 3410.60 samples/sec   Loss 4.9069   LearningRate 0.0207   Epoch: 10   Global Step: 55080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:21,852-Speed 3406.58 samples/sec   Loss 4.9205   LearningRate 0.0207   Epoch: 10   Global Step: 55090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:24,852-Speed 3414.61 samples/sec   Loss 4.8632   LearningRate 0.0207   Epoch: 10   Global Step: 55100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:27,853-Speed 3411.91 samples/sec   Loss 4.9245   LearningRate 0.0207   Epoch: 10   Global Step: 55110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:30,866-Speed 3399.91 samples/sec   Loss 4.8847   LearningRate 0.0207   Epoch: 10   Global Step: 55120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:33,865-Speed 3415.37 samples/sec   Loss 4.7136   LearningRate 0.0207   Epoch: 10   Global Step: 55130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:36,866-Speed 3413.24 samples/sec   Loss 4.8232   LearningRate 0.0207   Epoch: 10   Global Step: 55140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:39,866-Speed 3413.89 samples/sec   Loss 4.9712   LearningRate 0.0207   Epoch: 10   Global Step: 55150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:42,851-Speed 3431.26 samples/sec   Loss 4.8719   LearningRate 0.0207   Epoch: 10   Global Step: 55160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:45,855-Speed 3410.12 samples/sec   Loss 4.9338   LearningRate 0.0207   Epoch: 10   Global Step: 55170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:48,860-Speed 3407.78 samples/sec   Loss 4.7833   LearningRate 0.0207   Epoch: 10   Global Step: 55180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:51,901-Speed 3368.76 samples/sec   Loss 4.8900   LearningRate 0.0207   Epoch: 10   Global Step: 55190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:54,957-Speed 3351.86 samples/sec   Loss 4.8128   LearningRate 0.0206   Epoch: 10   Global Step: 55200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:54:57,960-Speed 3409.76 samples/sec   Loss 4.9896   LearningRate 0.0206   Epoch: 10   Global Step: 55210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:55:00,964-Speed 3409.75 samples/sec   Loss 4.8681   LearningRate 0.0206   Epoch: 10   Global Step: 55220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:55:03,969-Speed 3408.40 samples/sec   Loss 4.8019   LearningRate 0.0206   Epoch: 10   Global Step: 55230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:55:06,978-Speed 3403.81 samples/sec   Loss 4.7456   LearningRate 0.0206   Epoch: 10   Global Step: 55240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:55:09,985-Speed 3407.12 samples/sec   Loss 4.8916   LearningRate 0.0206   Epoch: 10   Global Step: 55250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:55:12,991-Speed 3407.09 samples/sec   Loss 4.8679   LearningRate 0.0206   Epoch: 10   Global Step: 55260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:55:15,996-Speed 3408.45 samples/sec   Loss 4.8985   LearningRate 0.0206   Epoch: 10   Global Step: 55270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:55:19,001-Speed 3409.11 samples/sec   Loss 4.8496   LearningRate 0.0206   Epoch: 10   Global Step: 55280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:55:22,011-Speed 3402.13 samples/sec   Loss 4.8295   LearningRate 0.0206   Epoch: 10   Global Step: 55290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:55:25,013-Speed 3411.93 samples/sec   Loss 4.9026   LearningRate 0.0206   Epoch: 10   Global Step: 55300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:55:28,026-Speed 3399.98 samples/sec   Loss 4.8760   LearningRate 0.0205   Epoch: 10   Global Step: 55310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:55:31,038-Speed 3399.98 samples/sec   Loss 4.7819   LearningRate 0.0205   Epoch: 10   Global Step: 55320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:55:34,048-Speed 3402.81 samples/sec   Loss 4.8855   LearningRate 0.0205   Epoch: 10   Global Step: 55330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:55:37,053-Speed 3408.80 samples/sec   Loss 4.6846   LearningRate 0.0205   Epoch: 10   Global Step: 55340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:55:40,068-Speed 3397.01 samples/sec   Loss 4.8291   LearningRate 0.0205   Epoch: 10   Global Step: 55350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:55:43,070-Speed 3411.81 samples/sec   Loss 4.8128   LearningRate 0.0205   Epoch: 10   Global Step: 55360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:55:46,075-Speed 3408.44 samples/sec   Loss 4.8288   LearningRate 0.0205   Epoch: 10   Global Step: 55370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:55:49,082-Speed 3406.81 samples/sec   Loss 4.8678   LearningRate 0.0205   Epoch: 10   Global Step: 55380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:55:52,083-Speed 3413.52 samples/sec   Loss 4.7364   LearningRate 0.0205   Epoch: 10   Global Step: 55390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:55:55,088-Speed 3407.75 samples/sec   Loss 4.9947   LearningRate 0.0205   Epoch: 10   Global Step: 55400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:55:58,105-Speed 3395.72 samples/sec   Loss 4.8960   LearningRate 0.0205   Epoch: 10   Global Step: 55410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:56:01,114-Speed 3402.93 samples/sec   Loss 4.7707   LearningRate 0.0204   Epoch: 10   Global Step: 55420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:56:04,124-Speed 3403.77 samples/sec   Loss 4.9292   LearningRate 0.0204   Epoch: 10   Global Step: 55430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:56:07,137-Speed 3399.34 samples/sec   Loss 4.8450   LearningRate 0.0204   Epoch: 10   Global Step: 55440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:56:10,137-Speed 3414.16 samples/sec   Loss 4.7418   LearningRate 0.0204   Epoch: 10   Global Step: 55450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:56:13,138-Speed 3413.50 samples/sec   Loss 5.0357   LearningRate 0.0204   Epoch: 10   Global Step: 55460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:56:16,167-Speed 3381.00 samples/sec   Loss 4.8518   LearningRate 0.0204   Epoch: 10   Global Step: 55470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:56:19,173-Speed 3407.62 samples/sec   Loss 4.7117   LearningRate 0.0204   Epoch: 10   Global Step: 55480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:56:22,174-Speed 3412.52 samples/sec   Loss 4.7688   LearningRate 0.0204   Epoch: 10   Global Step: 55490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:56:25,177-Speed 3410.62 samples/sec   Loss 4.7740   LearningRate 0.0204   Epoch: 10   Global Step: 55500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:56:28,182-Speed 3408.71 samples/sec   Loss 4.9218   LearningRate 0.0204   Epoch: 10   Global Step: 55510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:56:31,213-Speed 3379.57 samples/sec   Loss 4.9136   LearningRate 0.0204   Epoch: 10   Global Step: 55520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:56:34,233-Speed 3391.02 samples/sec   Loss 4.8606   LearningRate 0.0203   Epoch: 10   Global Step: 55530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:56:37,233-Speed 3415.16 samples/sec   Loss 4.8291   LearningRate 0.0203   Epoch: 10   Global Step: 55540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:56:40,238-Speed 3407.70 samples/sec   Loss 4.7792   LearningRate 0.0203   Epoch: 10   Global Step: 55550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:56:43,244-Speed 3407.85 samples/sec   Loss 4.8718   LearningRate 0.0203   Epoch: 10   Global Step: 55560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:56:46,240-Speed 3418.73 samples/sec   Loss 4.9328   LearningRate 0.0203   Epoch: 10   Global Step: 55570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:56:49,256-Speed 3396.61 samples/sec   Loss 4.8264   LearningRate 0.0203   Epoch: 10   Global Step: 55580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:56:52,278-Speed 3388.64 samples/sec   Loss 4.9036   LearningRate 0.0203   Epoch: 10   Global Step: 55590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:56:55,288-Speed 3403.57 samples/sec   Loss 4.7523   LearningRate 0.0203   Epoch: 10   Global Step: 55600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:56:58,299-Speed 3401.30 samples/sec   Loss 4.8458   LearningRate 0.0203   Epoch: 10   Global Step: 55610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:57:01,304-Speed 3407.68 samples/sec   Loss 4.6797   LearningRate 0.0203   Epoch: 10   Global Step: 55620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:57:04,423-Speed 3284.17 samples/sec   Loss 4.8438   LearningRate 0.0203   Epoch: 10   Global Step: 55630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:57:16,898-Speed 820.95 samples/sec   Loss 4.6885   LearningRate 0.0202   Epoch: 11   Global Step: 55640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:57:19,978-Speed 3325.46 samples/sec   Loss 4.1221   LearningRate 0.0202   Epoch: 11   Global Step: 55650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:57:22,989-Speed 3402.21 samples/sec   Loss 4.0773   LearningRate 0.0202   Epoch: 11   Global Step: 55660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 04:57:26,015-Speed 3384.65 samples/sec   Loss 4.0264   LearningRate 0.0202   Epoch: 11   Global Step: 55670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:57:29,046-Speed 3379.32 samples/sec   Loss 3.9786   LearningRate 0.0202   Epoch: 11   Global Step: 55680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:57:32,084-Speed 3371.17 samples/sec   Loss 4.1270   LearningRate 0.0202   Epoch: 11   Global Step: 55690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:57:35,086-Speed 3411.81 samples/sec   Loss 4.0034   LearningRate 0.0202   Epoch: 11   Global Step: 55700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:57:38,117-Speed 3379.07 samples/sec   Loss 4.0125   LearningRate 0.0202   Epoch: 11   Global Step: 55710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:57:41,168-Speed 3356.56 samples/sec   Loss 4.1319   LearningRate 0.0202   Epoch: 11   Global Step: 55720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:57:44,172-Speed 3410.32 samples/sec   Loss 3.9433   LearningRate 0.0202   Epoch: 11   Global Step: 55730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:57:47,179-Speed 3406.51 samples/sec   Loss 4.0760   LearningRate 0.0202   Epoch: 11   Global Step: 55740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:57:50,218-Speed 3371.03 samples/sec   Loss 4.1344   LearningRate 0.0202   Epoch: 11   Global Step: 55750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:57:53,258-Speed 3369.26 samples/sec   Loss 4.0797   LearningRate 0.0201   Epoch: 11   Global Step: 55760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:57:56,276-Speed 3393.66 samples/sec   Loss 4.0003   LearningRate 0.0201   Epoch: 11   Global Step: 55770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 04:57:59,268-Speed 3424.08 samples/sec   Loss 4.2533   LearningRate 0.0201   Epoch: 11   Global Step: 55780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:02,452-Speed 3216.41 samples/sec   Loss 4.1312   LearningRate 0.0201   Epoch: 11   Global Step: 55790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:05,461-Speed 3404.49 samples/sec   Loss 3.9752   LearningRate 0.0201   Epoch: 11   Global Step: 55800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:08,484-Speed 3388.08 samples/sec   Loss 4.0308   LearningRate 0.0201   Epoch: 11   Global Step: 55810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:11,502-Speed 3393.25 samples/sec   Loss 4.1331   LearningRate 0.0201   Epoch: 11   Global Step: 55820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:14,506-Speed 3410.57 samples/sec   Loss 4.1713   LearningRate 0.0201   Epoch: 11   Global Step: 55830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:17,519-Speed 3398.51 samples/sec   Loss 4.1685   LearningRate 0.0201   Epoch: 11   Global Step: 55840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:20,524-Speed 3409.36 samples/sec   Loss 4.0565   LearningRate 0.0201   Epoch: 11   Global Step: 55850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:23,552-Speed 3381.89 samples/sec   Loss 4.0859   LearningRate 0.0201   Epoch: 11   Global Step: 55860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:26,579-Speed 3384.47 samples/sec   Loss 4.1603   LearningRate 0.0200   Epoch: 11   Global Step: 55870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:29,568-Speed 3426.86 samples/sec   Loss 4.0539   LearningRate 0.0200   Epoch: 11   Global Step: 55880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:32,590-Speed 3388.90 samples/sec   Loss 4.1081   LearningRate 0.0200   Epoch: 11   Global Step: 55890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:35,603-Speed 3399.84 samples/sec   Loss 4.1405   LearningRate 0.0200   Epoch: 11   Global Step: 55900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:38,646-Speed 3366.19 samples/sec   Loss 4.1047   LearningRate 0.0200   Epoch: 11   Global Step: 55910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:41,672-Speed 3383.97 samples/sec   Loss 4.1438   LearningRate 0.0200   Epoch: 11   Global Step: 55920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:44,699-Speed 3384.28 samples/sec   Loss 4.1317   LearningRate 0.0200   Epoch: 11   Global Step: 55930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:47,712-Speed 3399.25 samples/sec   Loss 4.1339   LearningRate 0.0200   Epoch: 11   Global Step: 55940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:50,722-Speed 3403.30 samples/sec   Loss 4.1639   LearningRate 0.0200   Epoch: 11   Global Step: 55950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:53,732-Speed 3402.60 samples/sec   Loss 4.3655   LearningRate 0.0200   Epoch: 11   Global Step: 55960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:56,778-Speed 3362.81 samples/sec   Loss 4.1231   LearningRate 0.0200   Epoch: 11   Global Step: 55970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:58:59,776-Speed 3416.65 samples/sec   Loss 4.2203   LearningRate 0.0199   Epoch: 11   Global Step: 55980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:59:02,790-Speed 3397.20 samples/sec   Loss 4.2684   LearningRate 0.0199   Epoch: 11   Global Step: 55990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:59:05,798-Speed 3405.60 samples/sec   Loss 4.2431   LearningRate 0.0199   Epoch: 11   Global Step: 56000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 04:59:50,321-[lfw][56000]XNorm: 21.719909
Training: 2022-04-11 04:59:50,322-[lfw][56000]Accuracy-Flip: 0.99800+-0.00221
Training: 2022-04-11 04:59:50,322-[lfw][56000]Accuracy-Highest: 0.99817
Training: 2022-04-11 05:00:42,019-[cfp_fp][56000]XNorm: 20.101255
Training: 2022-04-11 05:00:42,020-[cfp_fp][56000]Accuracy-Flip: 0.97671+-0.00878
Training: 2022-04-11 05:00:42,020-[cfp_fp][56000]Accuracy-Highest: 0.97757
Training: 2022-04-11 05:01:26,358-[agedb_30][56000]XNorm: 21.878597
Training: 2022-04-11 05:01:26,358-[agedb_30][56000]Accuracy-Flip: 0.97833+-0.00745
Training: 2022-04-11 05:01:26,359-[agedb_30][56000]Accuracy-Highest: 0.98083
Training: 2022-04-11 05:01:29,379-Speed 71.32 samples/sec   Loss 4.2396   LearningRate 0.0199   Epoch: 11   Global Step: 56010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:01:32,393-Speed 3398.37 samples/sec   Loss 4.2381   LearningRate 0.0199   Epoch: 11   Global Step: 56020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:01:35,394-Speed 3412.82 samples/sec   Loss 4.1980   LearningRate 0.0199   Epoch: 11   Global Step: 56030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:01:38,477-Speed 3322.56 samples/sec   Loss 4.3040   LearningRate 0.0199   Epoch: 11   Global Step: 56040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:01:41,486-Speed 3403.73 samples/sec   Loss 4.2613   LearningRate 0.0199   Epoch: 11   Global Step: 56050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:01:44,477-Speed 3424.45 samples/sec   Loss 4.2558   LearningRate 0.0199   Epoch: 11   Global Step: 56060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:01:47,473-Speed 3418.56 samples/sec   Loss 4.2568   LearningRate 0.0199   Epoch: 11   Global Step: 56070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:01:50,481-Speed 3405.39 samples/sec   Loss 4.2845   LearningRate 0.0199   Epoch: 11   Global Step: 56080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 05:01:53,471-Speed 3425.59 samples/sec   Loss 4.1836   LearningRate 0.0198   Epoch: 11   Global Step: 56090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:01:56,470-Speed 3415.66 samples/sec   Loss 4.3173   LearningRate 0.0198   Epoch: 11   Global Step: 56100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:01:59,478-Speed 3404.68 samples/sec   Loss 4.2291   LearningRate 0.0198   Epoch: 11   Global Step: 56110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:02:02,489-Speed 3401.42 samples/sec   Loss 4.2700   LearningRate 0.0198   Epoch: 11   Global Step: 56120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:02:05,489-Speed 3414.45 samples/sec   Loss 4.2655   LearningRate 0.0198   Epoch: 11   Global Step: 56130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:02:08,535-Speed 3362.99 samples/sec   Loss 4.2271   LearningRate 0.0198   Epoch: 11   Global Step: 56140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:02:11,522-Speed 3428.87 samples/sec   Loss 4.1310   LearningRate 0.0198   Epoch: 11   Global Step: 56150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:02:14,533-Speed 3401.23 samples/sec   Loss 4.2395   LearningRate 0.0198   Epoch: 11   Global Step: 56160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:02:17,555-Speed 3389.49 samples/sec   Loss 4.2886   LearningRate 0.0198   Epoch: 11   Global Step: 56170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:02:20,560-Speed 3408.99 samples/sec   Loss 4.2615   LearningRate 0.0198   Epoch: 11   Global Step: 56180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:02:23,627-Speed 3339.70 samples/sec   Loss 4.2756   LearningRate 0.0198   Epoch: 11   Global Step: 56190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:02:26,667-Speed 3369.54 samples/sec   Loss 4.3767   LearningRate 0.0198   Epoch: 11   Global Step: 56200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:02:29,677-Speed 3402.31 samples/sec   Loss 4.3654   LearningRate 0.0197   Epoch: 11   Global Step: 56210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:02:32,687-Speed 3403.46 samples/sec   Loss 4.3217   LearningRate 0.0197   Epoch: 11   Global Step: 56220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:02:35,701-Speed 3398.18 samples/sec   Loss 4.2855   LearningRate 0.0197   Epoch: 11   Global Step: 56230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:02:38,710-Speed 3403.34 samples/sec   Loss 4.1744   LearningRate 0.0197   Epoch: 11   Global Step: 56240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:02:41,715-Speed 3408.19 samples/sec   Loss 4.1882   LearningRate 0.0197   Epoch: 11   Global Step: 56250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:02:44,723-Speed 3404.99 samples/sec   Loss 4.3331   LearningRate 0.0197   Epoch: 11   Global Step: 56260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:02:47,753-Speed 3381.76 samples/sec   Loss 4.3084   LearningRate 0.0197   Epoch: 11   Global Step: 56270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:02:50,765-Speed 3399.96 samples/sec   Loss 4.1891   LearningRate 0.0197   Epoch: 11   Global Step: 56280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:02:53,772-Speed 3407.08 samples/sec   Loss 4.2287   LearningRate 0.0197   Epoch: 11   Global Step: 56290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:02:56,777-Speed 3407.78 samples/sec   Loss 4.3983   LearningRate 0.0197   Epoch: 11   Global Step: 56300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:02:59,789-Speed 3400.08 samples/sec   Loss 4.4246   LearningRate 0.0197   Epoch: 11   Global Step: 56310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:03:02,811-Speed 3390.26 samples/sec   Loss 4.2697   LearningRate 0.0196   Epoch: 11   Global Step: 56320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:03:05,818-Speed 3405.25 samples/sec   Loss 4.2145   LearningRate 0.0196   Epoch: 11   Global Step: 56330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:03:08,810-Speed 3423.52 samples/sec   Loss 4.2014   LearningRate 0.0196   Epoch: 11   Global Step: 56340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:11,817-Speed 3406.07 samples/sec   Loss 4.2385   LearningRate 0.0196   Epoch: 11   Global Step: 56350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:14,823-Speed 3407.60 samples/sec   Loss 4.2950   LearningRate 0.0196   Epoch: 11   Global Step: 56360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:17,950-Speed 3275.69 samples/sec   Loss 4.3712   LearningRate 0.0196   Epoch: 11   Global Step: 56370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:20,953-Speed 3411.26 samples/sec   Loss 4.4364   LearningRate 0.0196   Epoch: 11   Global Step: 56380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:23,952-Speed 3415.15 samples/sec   Loss 4.3008   LearningRate 0.0196   Epoch: 11   Global Step: 56390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:26,963-Speed 3402.22 samples/sec   Loss 4.2950   LearningRate 0.0196   Epoch: 11   Global Step: 56400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:29,969-Speed 3407.28 samples/sec   Loss 4.1714   LearningRate 0.0196   Epoch: 11   Global Step: 56410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:32,966-Speed 3417.03 samples/sec   Loss 4.3942   LearningRate 0.0196   Epoch: 11   Global Step: 56420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:35,983-Speed 3395.09 samples/sec   Loss 4.2936   LearningRate 0.0196   Epoch: 11   Global Step: 56430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:39,006-Speed 3387.85 samples/sec   Loss 4.2772   LearningRate 0.0195   Epoch: 11   Global Step: 56440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:03:41,990-Speed 3432.27 samples/sec   Loss 4.3985   LearningRate 0.0195   Epoch: 11   Global Step: 56450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:44,988-Speed 3417.22 samples/sec   Loss 4.5634   LearningRate 0.0195   Epoch: 11   Global Step: 56460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:47,992-Speed 3410.26 samples/sec   Loss 4.3801   LearningRate 0.0195   Epoch: 11   Global Step: 56470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:51,005-Speed 3399.28 samples/sec   Loss 4.3480   LearningRate 0.0195   Epoch: 11   Global Step: 56480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:54,005-Speed 3414.47 samples/sec   Loss 4.2054   LearningRate 0.0195   Epoch: 11   Global Step: 56490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:03:57,008-Speed 3409.87 samples/sec   Loss 4.2949   LearningRate 0.0195   Epoch: 11   Global Step: 56500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:00,011-Speed 3411.13 samples/sec   Loss 4.4257   LearningRate 0.0195   Epoch: 11   Global Step: 56510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:03,073-Speed 3344.76 samples/sec   Loss 4.4419   LearningRate 0.0195   Epoch: 11   Global Step: 56520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:06,164-Speed 3313.95 samples/sec   Loss 4.3958   LearningRate 0.0195   Epoch: 11   Global Step: 56530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:09,169-Speed 3408.70 samples/sec   Loss 4.3965   LearningRate 0.0195   Epoch: 11   Global Step: 56540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:12,182-Speed 3399.94 samples/sec   Loss 4.3161   LearningRate 0.0194   Epoch: 11   Global Step: 56550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:04:15,197-Speed 3397.11 samples/sec   Loss 4.4084   LearningRate 0.0194   Epoch: 11   Global Step: 56560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:04:18,213-Speed 3396.34 samples/sec   Loss 4.4466   LearningRate 0.0194   Epoch: 11   Global Step: 56570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:04:21,216-Speed 3410.28 samples/sec   Loss 4.4210   LearningRate 0.0194   Epoch: 11   Global Step: 56580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:04:24,217-Speed 3413.21 samples/sec   Loss 4.5139   LearningRate 0.0194   Epoch: 11   Global Step: 56590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:04:27,211-Speed 3420.22 samples/sec   Loss 4.5054   LearningRate 0.0194   Epoch: 11   Global Step: 56600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:30,213-Speed 3412.57 samples/sec   Loss 4.3029   LearningRate 0.0194   Epoch: 11   Global Step: 56610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:33,227-Speed 3398.60 samples/sec   Loss 4.3584   LearningRate 0.0194   Epoch: 11   Global Step: 56620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:36,273-Speed 3362.58 samples/sec   Loss 4.3813   LearningRate 0.0194   Epoch: 11   Global Step: 56630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:39,295-Speed 3389.49 samples/sec   Loss 4.3952   LearningRate 0.0194   Epoch: 11   Global Step: 56640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:42,299-Speed 3409.49 samples/sec   Loss 4.4782   LearningRate 0.0194   Epoch: 11   Global Step: 56650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:45,304-Speed 3407.88 samples/sec   Loss 4.2554   LearningRate 0.0194   Epoch: 11   Global Step: 56660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:48,322-Speed 3394.38 samples/sec   Loss 4.4543   LearningRate 0.0193   Epoch: 11   Global Step: 56670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:51,335-Speed 3399.42 samples/sec   Loss 4.3623   LearningRate 0.0193   Epoch: 11   Global Step: 56680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:54,336-Speed 3413.71 samples/sec   Loss 4.3190   LearningRate 0.0193   Epoch: 11   Global Step: 56690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:04:57,347-Speed 3402.04 samples/sec   Loss 4.3282   LearningRate 0.0193   Epoch: 11   Global Step: 56700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:05:00,355-Speed 3404.99 samples/sec   Loss 4.3701   LearningRate 0.0193   Epoch: 11   Global Step: 56710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:05:03,359-Speed 3409.17 samples/sec   Loss 4.3732   LearningRate 0.0193   Epoch: 11   Global Step: 56720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:05:06,369-Speed 3403.25 samples/sec   Loss 4.5143   LearningRate 0.0193   Epoch: 11   Global Step: 56730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:05:09,375-Speed 3406.85 samples/sec   Loss 4.3155   LearningRate 0.0193   Epoch: 11   Global Step: 56740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:05:12,363-Speed 3428.42 samples/sec   Loss 4.3528   LearningRate 0.0193   Epoch: 11   Global Step: 56750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:05:15,364-Speed 3412.53 samples/sec   Loss 4.3621   LearningRate 0.0193   Epoch: 11   Global Step: 56760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:05:18,366-Speed 3412.40 samples/sec   Loss 4.4782   LearningRate 0.0193   Epoch: 11   Global Step: 56770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:05:21,365-Speed 3415.11 samples/sec   Loss 4.4053   LearningRate 0.0192   Epoch: 11   Global Step: 56780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:05:24,367-Speed 3412.35 samples/sec   Loss 4.3950   LearningRate 0.0192   Epoch: 11   Global Step: 56790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:05:27,374-Speed 3406.16 samples/sec   Loss 4.3841   LearningRate 0.0192   Epoch: 11   Global Step: 56800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:05:30,420-Speed 3363.03 samples/sec   Loss 4.5437   LearningRate 0.0192   Epoch: 11   Global Step: 56810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:05:33,439-Speed 3392.46 samples/sec   Loss 4.4345   LearningRate 0.0192   Epoch: 11   Global Step: 56820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:05:36,450-Speed 3401.17 samples/sec   Loss 4.4270   LearningRate 0.0192   Epoch: 11   Global Step: 56830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:05:39,467-Speed 3395.75 samples/sec   Loss 4.5089   LearningRate 0.0192   Epoch: 11   Global Step: 56840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:05:42,495-Speed 3382.27 samples/sec   Loss 4.3224   LearningRate 0.0192   Epoch: 11   Global Step: 56850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:05:45,488-Speed 3422.81 samples/sec   Loss 4.3907   LearningRate 0.0192   Epoch: 11   Global Step: 56860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:05:48,490-Speed 3412.15 samples/sec   Loss 4.4893   LearningRate 0.0192   Epoch: 11   Global Step: 56870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:05:51,499-Speed 3403.63 samples/sec   Loss 4.4472   LearningRate 0.0192   Epoch: 11   Global Step: 56880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:05:54,507-Speed 3404.54 samples/sec   Loss 4.3078   LearningRate 0.0192   Epoch: 11   Global Step: 56890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:05:57,512-Speed 3408.86 samples/sec   Loss 4.4905   LearningRate 0.0191   Epoch: 11   Global Step: 56900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:06:00,538-Speed 3384.69 samples/sec   Loss 4.4118   LearningRate 0.0191   Epoch: 11   Global Step: 56910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:06:03,544-Speed 3407.04 samples/sec   Loss 4.3694   LearningRate 0.0191   Epoch: 11   Global Step: 56920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:06:06,551-Speed 3406.31 samples/sec   Loss 4.4792   LearningRate 0.0191   Epoch: 11   Global Step: 56930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:06:09,553-Speed 3412.96 samples/sec   Loss 4.4489   LearningRate 0.0191   Epoch: 11   Global Step: 56940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:06:12,558-Speed 3407.90 samples/sec   Loss 4.3845   LearningRate 0.0191   Epoch: 11   Global Step: 56950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:06:15,569-Speed 3401.20 samples/sec   Loss 4.4541   LearningRate 0.0191   Epoch: 11   Global Step: 56960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:06:18,581-Speed 3401.34 samples/sec   Loss 4.2806   LearningRate 0.0191   Epoch: 11   Global Step: 56970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:06:21,595-Speed 3397.73 samples/sec   Loss 4.4246   LearningRate 0.0191   Epoch: 11   Global Step: 56980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:06:24,602-Speed 3406.54 samples/sec   Loss 4.3246   LearningRate 0.0191   Epoch: 11   Global Step: 56990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:06:27,604-Speed 3411.42 samples/sec   Loss 4.4539   LearningRate 0.0191   Epoch: 11   Global Step: 57000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:06:30,615-Speed 3401.65 samples/sec   Loss 4.5609   LearningRate 0.0190   Epoch: 11   Global Step: 57010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:06:33,620-Speed 3408.59 samples/sec   Loss 4.4812   LearningRate 0.0190   Epoch: 11   Global Step: 57020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:06:36,641-Speed 3391.26 samples/sec   Loss 4.4086   LearningRate 0.0190   Epoch: 11   Global Step: 57030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:06:39,649-Speed 3405.36 samples/sec   Loss 4.5055   LearningRate 0.0190   Epoch: 11   Global Step: 57040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:06:42,735-Speed 3318.18 samples/sec   Loss 4.4462   LearningRate 0.0190   Epoch: 11   Global Step: 57050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:06:45,722-Speed 3429.65 samples/sec   Loss 4.4254   LearningRate 0.0190   Epoch: 11   Global Step: 57060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:06:48,729-Speed 3406.47 samples/sec   Loss 4.4084   LearningRate 0.0190   Epoch: 11   Global Step: 57070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:06:51,734-Speed 3407.47 samples/sec   Loss 4.3905   LearningRate 0.0190   Epoch: 11   Global Step: 57080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:06:54,821-Speed 3319.01 samples/sec   Loss 4.4866   LearningRate 0.0190   Epoch: 11   Global Step: 57090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:06:57,828-Speed 3406.01 samples/sec   Loss 4.4860   LearningRate 0.0190   Epoch: 11   Global Step: 57100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:07:00,841-Speed 3399.06 samples/sec   Loss 4.3599   LearningRate 0.0190   Epoch: 11   Global Step: 57110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:07:03,848-Speed 3406.58 samples/sec   Loss 4.5689   LearningRate 0.0190   Epoch: 11   Global Step: 57120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:07:06,867-Speed 3392.52 samples/sec   Loss 4.3848   LearningRate 0.0189   Epoch: 11   Global Step: 57130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:07:09,855-Speed 3428.64 samples/sec   Loss 4.4509   LearningRate 0.0189   Epoch: 11   Global Step: 57140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:07:12,894-Speed 3369.62 samples/sec   Loss 4.3924   LearningRate 0.0189   Epoch: 11   Global Step: 57150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:07:15,899-Speed 3408.86 samples/sec   Loss 4.3734   LearningRate 0.0189   Epoch: 11   Global Step: 57160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:07:18,918-Speed 3393.17 samples/sec   Loss 4.5187   LearningRate 0.0189   Epoch: 11   Global Step: 57170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:07:21,925-Speed 3405.23 samples/sec   Loss 4.4403   LearningRate 0.0189   Epoch: 11   Global Step: 57180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:07:24,936-Speed 3401.78 samples/sec   Loss 4.4377   LearningRate 0.0189   Epoch: 11   Global Step: 57190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:07:27,960-Speed 3388.09 samples/sec   Loss 4.5336   LearningRate 0.0189   Epoch: 11   Global Step: 57200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:07:30,972-Speed 3399.58 samples/sec   Loss 4.4923   LearningRate 0.0189   Epoch: 11   Global Step: 57210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:07:33,978-Speed 3408.16 samples/sec   Loss 4.4126   LearningRate 0.0189   Epoch: 11   Global Step: 57220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:07:36,991-Speed 3399.80 samples/sec   Loss 4.4540   LearningRate 0.0189   Epoch: 11   Global Step: 57230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:07:40,004-Speed 3399.12 samples/sec   Loss 4.3353   LearningRate 0.0188   Epoch: 11   Global Step: 57240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:07:43,009-Speed 3408.22 samples/sec   Loss 4.5009   LearningRate 0.0188   Epoch: 11   Global Step: 57250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:07:46,026-Speed 3395.37 samples/sec   Loss 4.5063   LearningRate 0.0188   Epoch: 11   Global Step: 57260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:07:49,033-Speed 3406.72 samples/sec   Loss 4.4586   LearningRate 0.0188   Epoch: 11   Global Step: 57270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:07:52,061-Speed 3381.77 samples/sec   Loss 4.2834   LearningRate 0.0188   Epoch: 11   Global Step: 57280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:07:55,066-Speed 3408.60 samples/sec   Loss 4.4610   LearningRate 0.0188   Epoch: 11   Global Step: 57290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:07:58,076-Speed 3402.76 samples/sec   Loss 4.4843   LearningRate 0.0188   Epoch: 11   Global Step: 57300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 05:08:01,065-Speed 3426.95 samples/sec   Loss 4.3609   LearningRate 0.0188   Epoch: 11   Global Step: 57310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:08:04,072-Speed 3406.13 samples/sec   Loss 4.4516   LearningRate 0.0188   Epoch: 11   Global Step: 57320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:08:07,082-Speed 3403.80 samples/sec   Loss 4.4454   LearningRate 0.0188   Epoch: 11   Global Step: 57330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:08:10,086-Speed 3409.22 samples/sec   Loss 4.3890   LearningRate 0.0188   Epoch: 11   Global Step: 57340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:08:13,090-Speed 3409.31 samples/sec   Loss 4.6032   LearningRate 0.0188   Epoch: 11   Global Step: 57350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 05:08:16,093-Speed 3411.38 samples/sec   Loss 4.5290   LearningRate 0.0187   Epoch: 11   Global Step: 57360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:08:19,118-Speed 3385.39 samples/sec   Loss 4.5299   LearningRate 0.0187   Epoch: 11   Global Step: 57370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:08:22,124-Speed 3408.05 samples/sec   Loss 4.3301   LearningRate 0.0187   Epoch: 11   Global Step: 57380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:08:25,132-Speed 3404.88 samples/sec   Loss 4.5225   LearningRate 0.0187   Epoch: 11   Global Step: 57390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:08:28,161-Speed 3381.18 samples/sec   Loss 4.5029   LearningRate 0.0187   Epoch: 11   Global Step: 57400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:08:31,166-Speed 3407.92 samples/sec   Loss 4.4157   LearningRate 0.0187   Epoch: 11   Global Step: 57410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:08:34,168-Speed 3412.19 samples/sec   Loss 4.4551   LearningRate 0.0187   Epoch: 11   Global Step: 57420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:08:37,172-Speed 3410.07 samples/sec   Loss 4.4745   LearningRate 0.0187   Epoch: 11   Global Step: 57430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:08:40,175-Speed 3411.41 samples/sec   Loss 4.5707   LearningRate 0.0187   Epoch: 11   Global Step: 57440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:08:43,182-Speed 3406.21 samples/sec   Loss 4.4927   LearningRate 0.0187   Epoch: 11   Global Step: 57450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:08:46,187-Speed 3407.88 samples/sec   Loss 4.4812   LearningRate 0.0187   Epoch: 11   Global Step: 57460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:08:49,197-Speed 3402.67 samples/sec   Loss 4.4224   LearningRate 0.0187   Epoch: 11   Global Step: 57470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:08:52,201-Speed 3409.83 samples/sec   Loss 4.5773   LearningRate 0.0186   Epoch: 11   Global Step: 57480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:08:55,209-Speed 3404.60 samples/sec   Loss 4.4614   LearningRate 0.0186   Epoch: 11   Global Step: 57490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:08:58,214-Speed 3409.45 samples/sec   Loss 4.5686   LearningRate 0.0186   Epoch: 11   Global Step: 57500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:01,203-Speed 3426.27 samples/sec   Loss 4.4939   LearningRate 0.0186   Epoch: 11   Global Step: 57510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:04,208-Speed 3409.61 samples/sec   Loss 4.5030   LearningRate 0.0186   Epoch: 11   Global Step: 57520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:07,214-Speed 3406.48 samples/sec   Loss 4.5574   LearningRate 0.0186   Epoch: 11   Global Step: 57530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:10,216-Speed 3411.58 samples/sec   Loss 4.3043   LearningRate 0.0186   Epoch: 11   Global Step: 57540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:13,223-Speed 3406.78 samples/sec   Loss 4.4808   LearningRate 0.0186   Epoch: 11   Global Step: 57550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:16,235-Speed 3400.08 samples/sec   Loss 4.3531   LearningRate 0.0186   Epoch: 11   Global Step: 57560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:19,253-Speed 3394.22 samples/sec   Loss 4.4620   LearningRate 0.0186   Epoch: 11   Global Step: 57570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:22,258-Speed 3408.28 samples/sec   Loss 4.5113   LearningRate 0.0186   Epoch: 11   Global Step: 57580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:25,271-Speed 3399.66 samples/sec   Loss 4.5156   LearningRate 0.0186   Epoch: 11   Global Step: 57590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:28,371-Speed 3303.71 samples/sec   Loss 4.5323   LearningRate 0.0185   Epoch: 11   Global Step: 57600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:31,362-Speed 3424.62 samples/sec   Loss 4.3789   LearningRate 0.0185   Epoch: 11   Global Step: 57610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:34,377-Speed 3397.86 samples/sec   Loss 4.4469   LearningRate 0.0185   Epoch: 11   Global Step: 57620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:37,386-Speed 3404.07 samples/sec   Loss 4.5326   LearningRate 0.0185   Epoch: 11   Global Step: 57630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:40,394-Speed 3404.81 samples/sec   Loss 4.4514   LearningRate 0.0185   Epoch: 11   Global Step: 57640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:43,396-Speed 3412.56 samples/sec   Loss 4.5952   LearningRate 0.0185   Epoch: 11   Global Step: 57650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:46,412-Speed 3395.87 samples/sec   Loss 4.4392   LearningRate 0.0185   Epoch: 11   Global Step: 57660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:49,425-Speed 3399.43 samples/sec   Loss 4.5251   LearningRate 0.0185   Epoch: 11   Global Step: 57670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:52,430-Speed 3408.43 samples/sec   Loss 4.5268   LearningRate 0.0185   Epoch: 11   Global Step: 57680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:55,439-Speed 3403.42 samples/sec   Loss 4.4197   LearningRate 0.0185   Epoch: 11   Global Step: 57690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:09:58,468-Speed 3381.62 samples/sec   Loss 4.4531   LearningRate 0.0185   Epoch: 11   Global Step: 57700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:10:01,452-Speed 3433.19 samples/sec   Loss 4.5501   LearningRate 0.0184   Epoch: 11   Global Step: 57710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:10:04,469-Speed 3394.98 samples/sec   Loss 4.5284   LearningRate 0.0184   Epoch: 11   Global Step: 57720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:10:07,483-Speed 3397.71 samples/sec   Loss 4.5132   LearningRate 0.0184   Epoch: 11   Global Step: 57730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:10:10,488-Speed 3409.00 samples/sec   Loss 4.5968   LearningRate 0.0184   Epoch: 11   Global Step: 57740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:10:13,494-Speed 3407.21 samples/sec   Loss 4.4718   LearningRate 0.0184   Epoch: 11   Global Step: 57750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:10:16,501-Speed 3405.92 samples/sec   Loss 4.4806   LearningRate 0.0184   Epoch: 11   Global Step: 57760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:10:19,516-Speed 3397.12 samples/sec   Loss 4.5717   LearningRate 0.0184   Epoch: 11   Global Step: 57770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:10:22,502-Speed 3430.73 samples/sec   Loss 4.4682   LearningRate 0.0184   Epoch: 11   Global Step: 57780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:10:25,525-Speed 3387.94 samples/sec   Loss 4.4632   LearningRate 0.0184   Epoch: 11   Global Step: 57790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:10:28,536-Speed 3402.42 samples/sec   Loss 4.4963   LearningRate 0.0184   Epoch: 11   Global Step: 57800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:10:31,541-Speed 3407.73 samples/sec   Loss 4.4629   LearningRate 0.0184   Epoch: 11   Global Step: 57810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:10:34,549-Speed 3406.34 samples/sec   Loss 4.3940   LearningRate 0.0184   Epoch: 11   Global Step: 57820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:10:37,560-Speed 3400.49 samples/sec   Loss 4.4972   LearningRate 0.0183   Epoch: 11   Global Step: 57830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:10:40,576-Speed 3396.25 samples/sec   Loss 4.6557   LearningRate 0.0183   Epoch: 11   Global Step: 57840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:10:43,579-Speed 3411.09 samples/sec   Loss 4.4602   LearningRate 0.0183   Epoch: 11   Global Step: 57850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:10:46,584-Speed 3408.57 samples/sec   Loss 4.4146   LearningRate 0.0183   Epoch: 11   Global Step: 57860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:10:49,603-Speed 3392.20 samples/sec   Loss 4.4835   LearningRate 0.0183   Epoch: 11   Global Step: 57870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:10:52,621-Speed 3394.16 samples/sec   Loss 4.4803   LearningRate 0.0183   Epoch: 11   Global Step: 57880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:10:55,637-Speed 3396.31 samples/sec   Loss 4.4999   LearningRate 0.0183   Epoch: 11   Global Step: 57890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:10:58,642-Speed 3408.07 samples/sec   Loss 4.6006   LearningRate 0.0183   Epoch: 11   Global Step: 57900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:11:01,653-Speed 3401.72 samples/sec   Loss 4.4201   LearningRate 0.0183   Epoch: 11   Global Step: 57910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:11:04,683-Speed 3380.99 samples/sec   Loss 4.4859   LearningRate 0.0183   Epoch: 11   Global Step: 57920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:11:07,694-Speed 3401.69 samples/sec   Loss 4.5690   LearningRate 0.0183   Epoch: 11   Global Step: 57930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:11:10,699-Speed 3408.21 samples/sec   Loss 4.5675   LearningRate 0.0183   Epoch: 11   Global Step: 57940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:11:13,711-Speed 3400.42 samples/sec   Loss 4.5181   LearningRate 0.0182   Epoch: 11   Global Step: 57950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:11:16,807-Speed 3308.13 samples/sec   Loss 4.5300   LearningRate 0.0182   Epoch: 11   Global Step: 57960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:11:19,818-Speed 3402.06 samples/sec   Loss 4.5744   LearningRate 0.0182   Epoch: 11   Global Step: 57970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:11:22,825-Speed 3406.06 samples/sec   Loss 4.4946   LearningRate 0.0182   Epoch: 11   Global Step: 57980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 05:11:25,827-Speed 3411.72 samples/sec   Loss 4.5741   LearningRate 0.0182   Epoch: 11   Global Step: 57990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 05:11:28,818-Speed 3425.49 samples/sec   Loss 4.5162   LearningRate 0.0182   Epoch: 11   Global Step: 58000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:12:13,340-[lfw][58000]XNorm: 23.376417
Training: 2022-04-11 05:12:13,340-[lfw][58000]Accuracy-Flip: 0.99850+-0.00203
Training: 2022-04-11 05:12:13,341-[lfw][58000]Accuracy-Highest: 0.99850
Training: 2022-04-11 05:13:04,896-[cfp_fp][58000]XNorm: 21.504341
Training: 2022-04-11 05:13:04,897-[cfp_fp][58000]Accuracy-Flip: 0.97900+-0.00661
Training: 2022-04-11 05:13:04,897-[cfp_fp][58000]Accuracy-Highest: 0.97900
Training: 2022-04-11 05:13:49,111-[agedb_30][58000]XNorm: 23.291820
Training: 2022-04-11 05:13:49,112-[agedb_30][58000]Accuracy-Flip: 0.98083+-0.00534
Training: 2022-04-11 05:13:49,112-[agedb_30][58000]Accuracy-Highest: 0.98083
Training: 2022-04-11 05:13:52,127-Speed 71.45 samples/sec   Loss 4.4707   LearningRate 0.0182   Epoch: 11   Global Step: 58010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:13:55,115-Speed 3427.34 samples/sec   Loss 4.6467   LearningRate 0.0182   Epoch: 11   Global Step: 58020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:13:58,101-Speed 3429.72 samples/sec   Loss 4.5720   LearningRate 0.0182   Epoch: 11   Global Step: 58030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:01,090-Speed 3427.56 samples/sec   Loss 4.4535   LearningRate 0.0182   Epoch: 11   Global Step: 58040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:04,199-Speed 3294.29 samples/sec   Loss 4.4768   LearningRate 0.0182   Epoch: 11   Global Step: 58050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:07,196-Speed 3417.63 samples/sec   Loss 4.5642   LearningRate 0.0182   Epoch: 11   Global Step: 58060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:10,209-Speed 3399.06 samples/sec   Loss 4.5565   LearningRate 0.0181   Epoch: 11   Global Step: 58070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:13,202-Speed 3422.07 samples/sec   Loss 4.4774   LearningRate 0.0181   Epoch: 11   Global Step: 58080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:16,192-Speed 3426.37 samples/sec   Loss 4.6013   LearningRate 0.0181   Epoch: 11   Global Step: 58090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:19,200-Speed 3405.47 samples/sec   Loss 4.3911   LearningRate 0.0181   Epoch: 11   Global Step: 58100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:22,193-Speed 3421.46 samples/sec   Loss 4.4952   LearningRate 0.0181   Epoch: 11   Global Step: 58110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:14:25,191-Speed 3417.29 samples/sec   Loss 4.5046   LearningRate 0.0181   Epoch: 11   Global Step: 58120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:14:28,171-Speed 3436.79 samples/sec   Loss 4.6381   LearningRate 0.0181   Epoch: 11   Global Step: 58130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:31,178-Speed 3405.98 samples/sec   Loss 4.5160   LearningRate 0.0181   Epoch: 11   Global Step: 58140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:34,181-Speed 3410.17 samples/sec   Loss 4.4921   LearningRate 0.0181   Epoch: 11   Global Step: 58150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:37,187-Speed 3407.44 samples/sec   Loss 4.5084   LearningRate 0.0181   Epoch: 11   Global Step: 58160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:40,197-Speed 3402.73 samples/sec   Loss 4.4902   LearningRate 0.0181   Epoch: 11   Global Step: 58170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:43,212-Speed 3397.44 samples/sec   Loss 4.4080   LearningRate 0.0181   Epoch: 11   Global Step: 58180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:46,210-Speed 3417.24 samples/sec   Loss 4.4981   LearningRate 0.0180   Epoch: 11   Global Step: 58190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:49,267-Speed 3350.38 samples/sec   Loss 4.5575   LearningRate 0.0180   Epoch: 11   Global Step: 58200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:52,334-Speed 3339.19 samples/sec   Loss 4.5019   LearningRate 0.0180   Epoch: 11   Global Step: 58210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:55,328-Speed 3421.35 samples/sec   Loss 4.3233   LearningRate 0.0180   Epoch: 11   Global Step: 58220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:14:58,338-Speed 3402.81 samples/sec   Loss 4.4221   LearningRate 0.0180   Epoch: 11   Global Step: 58230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:15:01,331-Speed 3422.12 samples/sec   Loss 4.4388   LearningRate 0.0180   Epoch: 11   Global Step: 58240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:15:04,328-Speed 3417.75 samples/sec   Loss 4.3897   LearningRate 0.0180   Epoch: 11   Global Step: 58250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:15:07,334-Speed 3407.48 samples/sec   Loss 4.5543   LearningRate 0.0180   Epoch: 11   Global Step: 58260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:15:10,321-Speed 3428.89 samples/sec   Loss 4.5632   LearningRate 0.0180   Epoch: 11   Global Step: 58270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:15:13,397-Speed 3330.46 samples/sec   Loss 4.5405   LearningRate 0.0180   Epoch: 11   Global Step: 58280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:15:16,393-Speed 3418.23 samples/sec   Loss 4.4583   LearningRate 0.0180   Epoch: 11   Global Step: 58290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:15:19,390-Speed 3418.12 samples/sec   Loss 4.5360   LearningRate 0.0180   Epoch: 11   Global Step: 58300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:15:22,396-Speed 3407.31 samples/sec   Loss 4.3775   LearningRate 0.0179   Epoch: 11   Global Step: 58310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:15:25,396-Speed 3414.26 samples/sec   Loss 4.5017   LearningRate 0.0179   Epoch: 11   Global Step: 58320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:15:28,416-Speed 3391.06 samples/sec   Loss 4.5576   LearningRate 0.0179   Epoch: 11   Global Step: 58330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:15:31,422-Speed 3408.20 samples/sec   Loss 4.4149   LearningRate 0.0179   Epoch: 11   Global Step: 58340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:15:34,417-Speed 3419.77 samples/sec   Loss 4.5746   LearningRate 0.0179   Epoch: 11   Global Step: 58350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:15:37,420-Speed 3409.62 samples/sec   Loss 4.3813   LearningRate 0.0179   Epoch: 11   Global Step: 58360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:15:40,440-Speed 3392.29 samples/sec   Loss 4.5475   LearningRate 0.0179   Epoch: 11   Global Step: 58370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:15:43,438-Speed 3416.28 samples/sec   Loss 4.5386   LearningRate 0.0179   Epoch: 11   Global Step: 58380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:15:46,442-Speed 3410.55 samples/sec   Loss 4.4063   LearningRate 0.0179   Epoch: 11   Global Step: 58390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:15:49,480-Speed 3370.75 samples/sec   Loss 4.6105   LearningRate 0.0179   Epoch: 11   Global Step: 58400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:15:52,502-Speed 3389.40 samples/sec   Loss 4.5691   LearningRate 0.0179   Epoch: 11   Global Step: 58410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:15:55,503-Speed 3412.81 samples/sec   Loss 4.3332   LearningRate 0.0179   Epoch: 11   Global Step: 58420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:15:58,500-Speed 3417.91 samples/sec   Loss 4.3927   LearningRate 0.0178   Epoch: 11   Global Step: 58430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:01,502-Speed 3411.96 samples/sec   Loss 4.5007   LearningRate 0.0178   Epoch: 11   Global Step: 58440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:04,506-Speed 3409.21 samples/sec   Loss 4.5195   LearningRate 0.0178   Epoch: 11   Global Step: 58450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:07,512-Speed 3407.69 samples/sec   Loss 4.5083   LearningRate 0.0178   Epoch: 11   Global Step: 58460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:10,513-Speed 3413.10 samples/sec   Loss 4.5127   LearningRate 0.0178   Epoch: 11   Global Step: 58470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 05:16:13,497-Speed 3432.97 samples/sec   Loss 4.3756   LearningRate 0.0178   Epoch: 11   Global Step: 58480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:16,494-Speed 3417.53 samples/sec   Loss 4.3981   LearningRate 0.0178   Epoch: 11   Global Step: 58490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:19,500-Speed 3407.07 samples/sec   Loss 4.4992   LearningRate 0.0178   Epoch: 11   Global Step: 58500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:22,519-Speed 3392.92 samples/sec   Loss 4.5389   LearningRate 0.0178   Epoch: 11   Global Step: 58510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:25,565-Speed 3362.62 samples/sec   Loss 4.5459   LearningRate 0.0178   Epoch: 11   Global Step: 58520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:28,572-Speed 3405.94 samples/sec   Loss 4.4498   LearningRate 0.0178   Epoch: 11   Global Step: 58530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:31,589-Speed 3395.95 samples/sec   Loss 4.4682   LearningRate 0.0178   Epoch: 11   Global Step: 58540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:34,591-Speed 3411.71 samples/sec   Loss 4.4160   LearningRate 0.0177   Epoch: 11   Global Step: 58550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:37,589-Speed 3415.69 samples/sec   Loss 4.6078   LearningRate 0.0177   Epoch: 11   Global Step: 58560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:40,588-Speed 3416.10 samples/sec   Loss 4.4933   LearningRate 0.0177   Epoch: 11   Global Step: 58570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:43,572-Speed 3432.38 samples/sec   Loss 4.7138   LearningRate 0.0177   Epoch: 11   Global Step: 58580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:46,571-Speed 3415.64 samples/sec   Loss 4.4965   LearningRate 0.0177   Epoch: 11   Global Step: 58590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:49,575-Speed 3408.80 samples/sec   Loss 4.3255   LearningRate 0.0177   Epoch: 11   Global Step: 58600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:52,577-Speed 3412.79 samples/sec   Loss 4.5195   LearningRate 0.0177   Epoch: 11   Global Step: 58610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:55,576-Speed 3415.54 samples/sec   Loss 4.4067   LearningRate 0.0177   Epoch: 11   Global Step: 58620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:16:58,573-Speed 3416.85 samples/sec   Loss 4.5022   LearningRate 0.0177   Epoch: 11   Global Step: 58630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:01,586-Speed 3399.57 samples/sec   Loss 4.4430   LearningRate 0.0177   Epoch: 11   Global Step: 58640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:04,594-Speed 3405.55 samples/sec   Loss 4.5982   LearningRate 0.0177   Epoch: 11   Global Step: 58650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:07,599-Speed 3408.12 samples/sec   Loss 4.5647   LearningRate 0.0177   Epoch: 11   Global Step: 58660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:10,602-Speed 3410.74 samples/sec   Loss 4.4948   LearningRate 0.0176   Epoch: 11   Global Step: 58670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:13,605-Speed 3410.80 samples/sec   Loss 4.4416   LearningRate 0.0176   Epoch: 11   Global Step: 58680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:16,601-Speed 3419.07 samples/sec   Loss 4.5770   LearningRate 0.0176   Epoch: 11   Global Step: 58690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:19,628-Speed 3383.29 samples/sec   Loss 4.5141   LearningRate 0.0176   Epoch: 11   Global Step: 58700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:22,626-Speed 3416.87 samples/sec   Loss 4.4526   LearningRate 0.0176   Epoch: 11   Global Step: 58710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:25,639-Speed 3398.84 samples/sec   Loss 4.5660   LearningRate 0.0176   Epoch: 11   Global Step: 58720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:28,642-Speed 3411.00 samples/sec   Loss 4.3930   LearningRate 0.0176   Epoch: 11   Global Step: 58730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:31,653-Speed 3402.42 samples/sec   Loss 4.4280   LearningRate 0.0176   Epoch: 11   Global Step: 58740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:34,721-Speed 3338.19 samples/sec   Loss 4.3825   LearningRate 0.0176   Epoch: 11   Global Step: 58750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:37,732-Speed 3401.96 samples/sec   Loss 4.4308   LearningRate 0.0176   Epoch: 11   Global Step: 58760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:40,844-Speed 3291.76 samples/sec   Loss 4.4977   LearningRate 0.0176   Epoch: 11   Global Step: 58770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:43,844-Speed 3413.97 samples/sec   Loss 4.3774   LearningRate 0.0176   Epoch: 11   Global Step: 58780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 05:17:46,827-Speed 3433.76 samples/sec   Loss 4.4751   LearningRate 0.0175   Epoch: 11   Global Step: 58790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:49,827-Speed 3414.11 samples/sec   Loss 4.4497   LearningRate 0.0175   Epoch: 11   Global Step: 58800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:52,826-Speed 3414.95 samples/sec   Loss 4.4097   LearningRate 0.0175   Epoch: 11   Global Step: 58810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:55,831-Speed 3408.85 samples/sec   Loss 4.5385   LearningRate 0.0175   Epoch: 11   Global Step: 58820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:17:58,830-Speed 3415.25 samples/sec   Loss 4.4172   LearningRate 0.0175   Epoch: 11   Global Step: 58830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:01,869-Speed 3369.52 samples/sec   Loss 4.4490   LearningRate 0.0175   Epoch: 11   Global Step: 58840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:04,888-Speed 3393.14 samples/sec   Loss 4.3943   LearningRate 0.0175   Epoch: 11   Global Step: 58850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:07,890-Speed 3411.86 samples/sec   Loss 4.5378   LearningRate 0.0175   Epoch: 11   Global Step: 58860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:10,903-Speed 3399.69 samples/sec   Loss 4.4154   LearningRate 0.0175   Epoch: 11   Global Step: 58870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:13,904-Speed 3413.17 samples/sec   Loss 4.4237   LearningRate 0.0175   Epoch: 11   Global Step: 58880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:16,895-Speed 3424.48 samples/sec   Loss 4.5563   LearningRate 0.0175   Epoch: 11   Global Step: 58890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:19,902-Speed 3405.90 samples/sec   Loss 4.4540   LearningRate 0.0175   Epoch: 11   Global Step: 58900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:22,956-Speed 3354.42 samples/sec   Loss 4.4384   LearningRate 0.0174   Epoch: 11   Global Step: 58910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:25,971-Speed 3396.35 samples/sec   Loss 4.5232   LearningRate 0.0174   Epoch: 11   Global Step: 58920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:28,976-Speed 3408.66 samples/sec   Loss 4.3119   LearningRate 0.0174   Epoch: 11   Global Step: 58930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:31,976-Speed 3414.27 samples/sec   Loss 4.5456   LearningRate 0.0174   Epoch: 11   Global Step: 58940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:34,982-Speed 3407.60 samples/sec   Loss 4.3921   LearningRate 0.0174   Epoch: 11   Global Step: 58950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:37,983-Speed 3412.74 samples/sec   Loss 4.4589   LearningRate 0.0174   Epoch: 11   Global Step: 58960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:40,983-Speed 3415.18 samples/sec   Loss 4.4114   LearningRate 0.0174   Epoch: 11   Global Step: 58970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:44,007-Speed 3386.79 samples/sec   Loss 4.3394   LearningRate 0.0174   Epoch: 11   Global Step: 58980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:46,997-Speed 3425.13 samples/sec   Loss 4.3347   LearningRate 0.0174   Epoch: 11   Global Step: 58990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:50,030-Speed 3377.48 samples/sec   Loss 4.5346   LearningRate 0.0174   Epoch: 11   Global Step: 59000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:53,028-Speed 3416.05 samples/sec   Loss 4.4734   LearningRate 0.0174   Epoch: 11   Global Step: 59010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:56,025-Speed 3418.07 samples/sec   Loss 4.4943   LearningRate 0.0174   Epoch: 11   Global Step: 59020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:18:59,022-Speed 3417.49 samples/sec   Loss 4.5178   LearningRate 0.0173   Epoch: 11   Global Step: 59030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:19:02,039-Speed 3395.63 samples/sec   Loss 4.4141   LearningRate 0.0173   Epoch: 11   Global Step: 59040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:19:05,041-Speed 3411.18 samples/sec   Loss 4.5531   LearningRate 0.0173   Epoch: 11   Global Step: 59050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:19:08,042-Speed 3413.17 samples/sec   Loss 4.3943   LearningRate 0.0173   Epoch: 11   Global Step: 59060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:19:11,027-Speed 3432.00 samples/sec   Loss 4.6341   LearningRate 0.0173   Epoch: 11   Global Step: 59070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:19:14,023-Speed 3417.71 samples/sec   Loss 4.5414   LearningRate 0.0173   Epoch: 11   Global Step: 59080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:19:17,023-Speed 3415.35 samples/sec   Loss 4.3415   LearningRate 0.0173   Epoch: 11   Global Step: 59090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:19:20,024-Speed 3412.36 samples/sec   Loss 4.4553   LearningRate 0.0173   Epoch: 11   Global Step: 59100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:19:23,024-Speed 3413.42 samples/sec   Loss 4.5107   LearningRate 0.0173   Epoch: 11   Global Step: 59110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:19:26,031-Speed 3407.11 samples/sec   Loss 4.5862   LearningRate 0.0173   Epoch: 11   Global Step: 59120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:19:29,041-Speed 3402.00 samples/sec   Loss 4.2805   LearningRate 0.0173   Epoch: 11   Global Step: 59130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:19:32,055-Speed 3399.12 samples/sec   Loss 4.4640   LearningRate 0.0173   Epoch: 11   Global Step: 59140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:19:35,074-Speed 3393.15 samples/sec   Loss 4.4334   LearningRate 0.0172   Epoch: 11   Global Step: 59150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:19:38,078-Speed 3408.58 samples/sec   Loss 4.4291   LearningRate 0.0172   Epoch: 11   Global Step: 59160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:19:41,079-Speed 3413.41 samples/sec   Loss 4.4046   LearningRate 0.0172   Epoch: 11   Global Step: 59170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:19:44,091-Speed 3400.41 samples/sec   Loss 4.4467   LearningRate 0.0172   Epoch: 11   Global Step: 59180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:19:47,090-Speed 3415.93 samples/sec   Loss 4.3756   LearningRate 0.0172   Epoch: 11   Global Step: 59190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:19:50,098-Speed 3404.91 samples/sec   Loss 4.5401   LearningRate 0.0172   Epoch: 11   Global Step: 59200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:19:53,096-Speed 3416.63 samples/sec   Loss 4.3949   LearningRate 0.0172   Epoch: 11   Global Step: 59210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:19:56,101-Speed 3408.71 samples/sec   Loss 4.4940   LearningRate 0.0172   Epoch: 11   Global Step: 59220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:19:59,101-Speed 3413.79 samples/sec   Loss 4.5102   LearningRate 0.0172   Epoch: 11   Global Step: 59230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:20:02,099-Speed 3417.11 samples/sec   Loss 4.4333   LearningRate 0.0172   Epoch: 11   Global Step: 59240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:05,103-Speed 3409.68 samples/sec   Loss 4.4942   LearningRate 0.0172   Epoch: 11   Global Step: 59250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:08,106-Speed 3409.72 samples/sec   Loss 4.5075   LearningRate 0.0172   Epoch: 11   Global Step: 59260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:11,103-Speed 3418.09 samples/sec   Loss 4.4748   LearningRate 0.0171   Epoch: 11   Global Step: 59270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:14,109-Speed 3406.85 samples/sec   Loss 4.4951   LearningRate 0.0171   Epoch: 11   Global Step: 59280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:17,147-Speed 3372.15 samples/sec   Loss 4.3757   LearningRate 0.0171   Epoch: 11   Global Step: 59290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:20,153-Speed 3407.38 samples/sec   Loss 4.5438   LearningRate 0.0171   Epoch: 11   Global Step: 59300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:23,206-Speed 3354.34 samples/sec   Loss 4.5495   LearningRate 0.0171   Epoch: 11   Global Step: 59310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:26,243-Speed 3373.26 samples/sec   Loss 4.5701   LearningRate 0.0171   Epoch: 11   Global Step: 59320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:29,242-Speed 3414.87 samples/sec   Loss 4.5984   LearningRate 0.0171   Epoch: 11   Global Step: 59330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:32,245-Speed 3411.39 samples/sec   Loss 4.4442   LearningRate 0.0171   Epoch: 11   Global Step: 59340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:20:35,245-Speed 3413.90 samples/sec   Loss 4.4232   LearningRate 0.0171   Epoch: 11   Global Step: 59350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:20:38,227-Speed 3434.78 samples/sec   Loss 4.4811   LearningRate 0.0171   Epoch: 11   Global Step: 59360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:41,227-Speed 3413.84 samples/sec   Loss 4.6000   LearningRate 0.0171   Epoch: 11   Global Step: 59370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:44,228-Speed 3412.99 samples/sec   Loss 4.4548   LearningRate 0.0171   Epoch: 11   Global Step: 59380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:47,229-Speed 3413.86 samples/sec   Loss 4.5446   LearningRate 0.0170   Epoch: 11   Global Step: 59390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:50,227-Speed 3416.00 samples/sec   Loss 4.3092   LearningRate 0.0170   Epoch: 11   Global Step: 59400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:53,286-Speed 3348.96 samples/sec   Loss 4.3578   LearningRate 0.0170   Epoch: 11   Global Step: 59410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:56,283-Speed 3417.14 samples/sec   Loss 4.3979   LearningRate 0.0170   Epoch: 11   Global Step: 59420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:20:59,289-Speed 3407.68 samples/sec   Loss 4.4355   LearningRate 0.0170   Epoch: 11   Global Step: 59430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:21:02,307-Speed 3394.00 samples/sec   Loss 4.4040   LearningRate 0.0170   Epoch: 11   Global Step: 59440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:21:05,327-Speed 3391.49 samples/sec   Loss 4.4173   LearningRate 0.0170   Epoch: 11   Global Step: 59450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:21:08,332-Speed 3408.24 samples/sec   Loss 4.6200   LearningRate 0.0170   Epoch: 11   Global Step: 59460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:11,351-Speed 3393.23 samples/sec   Loss 4.3081   LearningRate 0.0170   Epoch: 11   Global Step: 59470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:14,367-Speed 3395.56 samples/sec   Loss 4.5016   LearningRate 0.0170   Epoch: 11   Global Step: 59480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:17,379-Speed 3400.82 samples/sec   Loss 4.2655   LearningRate 0.0170   Epoch: 11   Global Step: 59490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:20,383-Speed 3409.55 samples/sec   Loss 4.3938   LearningRate 0.0170   Epoch: 11   Global Step: 59500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:23,392-Speed 3403.64 samples/sec   Loss 4.3089   LearningRate 0.0170   Epoch: 11   Global Step: 59510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:26,408-Speed 3396.78 samples/sec   Loss 4.4687   LearningRate 0.0169   Epoch: 11   Global Step: 59520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:29,411-Speed 3411.00 samples/sec   Loss 4.3665   LearningRate 0.0169   Epoch: 11   Global Step: 59530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:32,417-Speed 3407.41 samples/sec   Loss 4.3784   LearningRate 0.0169   Epoch: 11   Global Step: 59540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:35,460-Speed 3365.45 samples/sec   Loss 4.4751   LearningRate 0.0169   Epoch: 11   Global Step: 59550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:38,446-Speed 3430.02 samples/sec   Loss 4.5359   LearningRate 0.0169   Epoch: 11   Global Step: 59560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:41,497-Speed 3357.31 samples/sec   Loss 4.4129   LearningRate 0.0169   Epoch: 11   Global Step: 59570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:44,501-Speed 3409.89 samples/sec   Loss 4.5288   LearningRate 0.0169   Epoch: 11   Global Step: 59580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:47,506-Speed 3408.88 samples/sec   Loss 4.3432   LearningRate 0.0169   Epoch: 11   Global Step: 59590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:50,508-Speed 3411.61 samples/sec   Loss 4.4513   LearningRate 0.0169   Epoch: 11   Global Step: 59600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:53,509-Speed 3412.67 samples/sec   Loss 4.4296   LearningRate 0.0169   Epoch: 11   Global Step: 59610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:56,506-Speed 3417.58 samples/sec   Loss 4.2895   LearningRate 0.0169   Epoch: 11   Global Step: 59620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:21:59,489-Speed 3433.30 samples/sec   Loss 4.3399   LearningRate 0.0169   Epoch: 11   Global Step: 59630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:22:02,492-Speed 3411.12 samples/sec   Loss 4.4542   LearningRate 0.0168   Epoch: 11   Global Step: 59640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:22:05,494-Speed 3412.11 samples/sec   Loss 4.4671   LearningRate 0.0168   Epoch: 11   Global Step: 59650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:22:08,500-Speed 3407.24 samples/sec   Loss 4.4021   LearningRate 0.0168   Epoch: 11   Global Step: 59660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:22:11,507-Speed 3405.97 samples/sec   Loss 4.6109   LearningRate 0.0168   Epoch: 11   Global Step: 59670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:22:14,531-Speed 3387.83 samples/sec   Loss 4.3292   LearningRate 0.0168   Epoch: 11   Global Step: 59680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:22:17,534-Speed 3410.58 samples/sec   Loss 4.3539   LearningRate 0.0168   Epoch: 11   Global Step: 59690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:22:20,539-Speed 3408.90 samples/sec   Loss 4.5298   LearningRate 0.0168   Epoch: 11   Global Step: 59700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:22:23,571-Speed 3378.11 samples/sec   Loss 4.3413   LearningRate 0.0168   Epoch: 11   Global Step: 59710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:22:26,575-Speed 3408.74 samples/sec   Loss 4.2632   LearningRate 0.0168   Epoch: 11   Global Step: 59720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:22:29,587-Speed 3400.71 samples/sec   Loss 4.3858   LearningRate 0.0168   Epoch: 11   Global Step: 59730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:22:32,587-Speed 3414.22 samples/sec   Loss 4.4137   LearningRate 0.0168   Epoch: 11   Global Step: 59740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:22:35,597-Speed 3402.57 samples/sec   Loss 4.4406   LearningRate 0.0168   Epoch: 11   Global Step: 59750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:22:38,613-Speed 3396.16 samples/sec   Loss 4.3914   LearningRate 0.0167   Epoch: 11   Global Step: 59760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:22:41,622-Speed 3404.90 samples/sec   Loss 4.3090   LearningRate 0.0167   Epoch: 11   Global Step: 59770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:22:44,629-Speed 3406.05 samples/sec   Loss 4.3177   LearningRate 0.0167   Epoch: 11   Global Step: 59780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:22:47,715-Speed 3318.99 samples/sec   Loss 4.5367   LearningRate 0.0167   Epoch: 11   Global Step: 59790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:22:50,719-Speed 3409.66 samples/sec   Loss 4.4334   LearningRate 0.0167   Epoch: 11   Global Step: 59800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:22:53,721-Speed 3411.58 samples/sec   Loss 4.4195   LearningRate 0.0167   Epoch: 11   Global Step: 59810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:22:56,723-Speed 3411.70 samples/sec   Loss 4.3412   LearningRate 0.0167   Epoch: 11   Global Step: 59820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:22:59,705-Speed 3435.53 samples/sec   Loss 4.3914   LearningRate 0.0167   Epoch: 11   Global Step: 59830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:23:02,713-Speed 3405.39 samples/sec   Loss 4.4925   LearningRate 0.0167   Epoch: 11   Global Step: 59840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:23:05,728-Speed 3396.16 samples/sec   Loss 4.3338   LearningRate 0.0167   Epoch: 11   Global Step: 59850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:23:08,743-Speed 3397.34 samples/sec   Loss 4.2903   LearningRate 0.0167   Epoch: 11   Global Step: 59860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:23:11,744-Speed 3414.51 samples/sec   Loss 4.3792   LearningRate 0.0167   Epoch: 11   Global Step: 59870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:23:14,743-Speed 3415.36 samples/sec   Loss 4.3852   LearningRate 0.0167   Epoch: 11   Global Step: 59880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:23:17,756-Speed 3399.58 samples/sec   Loss 4.3357   LearningRate 0.0166   Epoch: 11   Global Step: 59890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:23:20,759-Speed 3410.87 samples/sec   Loss 4.5066   LearningRate 0.0166   Epoch: 11   Global Step: 59900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:23:23,770-Speed 3400.83 samples/sec   Loss 4.4399   LearningRate 0.0166   Epoch: 11   Global Step: 59910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:23:26,773-Speed 3411.49 samples/sec   Loss 4.4372   LearningRate 0.0166   Epoch: 11   Global Step: 59920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:23:29,795-Speed 3389.25 samples/sec   Loss 4.2816   LearningRate 0.0166   Epoch: 11   Global Step: 59930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:23:32,849-Speed 3353.71 samples/sec   Loss 4.4249   LearningRate 0.0166   Epoch: 11   Global Step: 59940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:23:35,847-Speed 3416.54 samples/sec   Loss 4.4441   LearningRate 0.0166   Epoch: 11   Global Step: 59950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:23:38,851-Speed 3409.90 samples/sec   Loss 4.4958   LearningRate 0.0166   Epoch: 11   Global Step: 59960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:23:41,852-Speed 3412.54 samples/sec   Loss 4.4792   LearningRate 0.0166   Epoch: 11   Global Step: 59970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:23:44,856-Speed 3409.70 samples/sec   Loss 4.4931   LearningRate 0.0166   Epoch: 11   Global Step: 59980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:23:47,870-Speed 3398.94 samples/sec   Loss 4.5210   LearningRate 0.0166   Epoch: 11   Global Step: 59990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:23:50,873-Speed 3410.83 samples/sec   Loss 4.3455   LearningRate 0.0166   Epoch: 11   Global Step: 60000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:24:35,628-[lfw][60000]XNorm: 22.425556
Training: 2022-04-11 05:24:35,629-[lfw][60000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-11 05:24:35,629-[lfw][60000]Accuracy-Highest: 0.99850
Training: 2022-04-11 05:25:27,484-[cfp_fp][60000]XNorm: 21.019339
Training: 2022-04-11 05:25:27,485-[cfp_fp][60000]Accuracy-Flip: 0.97986+-0.00757
Training: 2022-04-11 05:25:27,485-[cfp_fp][60000]Accuracy-Highest: 0.97986
Training: 2022-04-11 05:26:11,817-[agedb_30][60000]XNorm: 22.608731
Training: 2022-04-11 05:26:11,818-[agedb_30][60000]Accuracy-Flip: 0.98083+-0.00720
Training: 2022-04-11 05:26:11,819-[agedb_30][60000]Accuracy-Highest: 0.98083
Training: 2022-04-11 05:26:14,817-Speed 71.14 samples/sec   Loss 4.3787   LearningRate 0.0165   Epoch: 11   Global Step: 60010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:26:17,801-Speed 3432.99 samples/sec   Loss 4.5122   LearningRate 0.0165   Epoch: 11   Global Step: 60020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:26:20,815-Speed 3397.27 samples/sec   Loss 4.4110   LearningRate 0.0165   Epoch: 11   Global Step: 60030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:26:23,844-Speed 3381.93 samples/sec   Loss 4.3699   LearningRate 0.0165   Epoch: 11   Global Step: 60040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:26:26,906-Speed 3344.93 samples/sec   Loss 4.4498   LearningRate 0.0165   Epoch: 11   Global Step: 60050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:26:30,047-Speed 3261.22 samples/sec   Loss 4.3965   LearningRate 0.0165   Epoch: 11   Global Step: 60060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:26:33,033-Speed 3430.84 samples/sec   Loss 4.3591   LearningRate 0.0165   Epoch: 11   Global Step: 60070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:26:36,057-Speed 3386.64 samples/sec   Loss 4.4501   LearningRate 0.0165   Epoch: 11   Global Step: 60080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:26:39,088-Speed 3378.73 samples/sec   Loss 4.3141   LearningRate 0.0165   Epoch: 11   Global Step: 60090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:26:42,087-Speed 3415.25 samples/sec   Loss 4.3010   LearningRate 0.0165   Epoch: 11   Global Step: 60100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:26:45,065-Speed 3439.74 samples/sec   Loss 4.6966   LearningRate 0.0165   Epoch: 11   Global Step: 60110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:26:48,131-Speed 3340.57 samples/sec   Loss 4.3839   LearningRate 0.0165   Epoch: 11   Global Step: 60120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:26:51,261-Speed 3272.39 samples/sec   Loss 4.4660   LearningRate 0.0165   Epoch: 11   Global Step: 60130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:26:54,253-Speed 3423.72 samples/sec   Loss 4.4459   LearningRate 0.0164   Epoch: 11   Global Step: 60140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:26:57,248-Speed 3420.35 samples/sec   Loss 4.4283   LearningRate 0.0164   Epoch: 11   Global Step: 60150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:27:00,247-Speed 3414.27 samples/sec   Loss 4.5005   LearningRate 0.0164   Epoch: 11   Global Step: 60160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:27:03,248-Speed 3413.08 samples/sec   Loss 4.3088   LearningRate 0.0164   Epoch: 11   Global Step: 60170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:27:06,249-Speed 3412.88 samples/sec   Loss 4.5157   LearningRate 0.0164   Epoch: 11   Global Step: 60180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:27:09,249-Speed 3414.69 samples/sec   Loss 4.3148   LearningRate 0.0164   Epoch: 11   Global Step: 60190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:27:12,269-Speed 3391.84 samples/sec   Loss 4.2456   LearningRate 0.0164   Epoch: 11   Global Step: 60200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:27:15,248-Speed 3437.38 samples/sec   Loss 4.3058   LearningRate 0.0164   Epoch: 11   Global Step: 60210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:27:18,255-Speed 3406.94 samples/sec   Loss 4.3517   LearningRate 0.0164   Epoch: 11   Global Step: 60220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:27:21,256-Speed 3413.31 samples/sec   Loss 4.4456   LearningRate 0.0164   Epoch: 11   Global Step: 60230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:27:24,279-Speed 3388.18 samples/sec   Loss 4.4129   LearningRate 0.0164   Epoch: 11   Global Step: 60240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:27:27,284-Speed 3408.56 samples/sec   Loss 4.3426   LearningRate 0.0164   Epoch: 11   Global Step: 60250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:27:30,295-Speed 3401.95 samples/sec   Loss 4.2259   LearningRate 0.0163   Epoch: 11   Global Step: 60260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:27:33,292-Speed 3417.24 samples/sec   Loss 4.3787   LearningRate 0.0163   Epoch: 11   Global Step: 60270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:27:36,318-Speed 3384.94 samples/sec   Loss 4.3963   LearningRate 0.0163   Epoch: 11   Global Step: 60280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:27:39,330-Speed 3401.20 samples/sec   Loss 4.5088   LearningRate 0.0163   Epoch: 11   Global Step: 60290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:27:42,339-Speed 3402.86 samples/sec   Loss 4.3729   LearningRate 0.0163   Epoch: 11   Global Step: 60300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:27:45,338-Speed 3415.50 samples/sec   Loss 4.3412   LearningRate 0.0163   Epoch: 11   Global Step: 60310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:27:48,415-Speed 3329.01 samples/sec   Loss 4.4547   LearningRate 0.0163   Epoch: 11   Global Step: 60320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:27:51,421-Speed 3407.19 samples/sec   Loss 4.4211   LearningRate 0.0163   Epoch: 11   Global Step: 60330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:27:54,420-Speed 3415.52 samples/sec   Loss 4.4116   LearningRate 0.0163   Epoch: 11   Global Step: 60340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:27:57,417-Speed 3417.11 samples/sec   Loss 4.4150   LearningRate 0.0163   Epoch: 11   Global Step: 60350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:28:00,440-Speed 3388.37 samples/sec   Loss 4.3406   LearningRate 0.0163   Epoch: 11   Global Step: 60360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:28:03,422-Speed 3435.17 samples/sec   Loss 4.4421   LearningRate 0.0163   Epoch: 11   Global Step: 60370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:28:06,431-Speed 3404.04 samples/sec   Loss 4.4574   LearningRate 0.0163   Epoch: 11   Global Step: 60380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:28:09,426-Speed 3420.05 samples/sec   Loss 4.3920   LearningRate 0.0162   Epoch: 11   Global Step: 60390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:28:12,429-Speed 3410.39 samples/sec   Loss 4.3053   LearningRate 0.0162   Epoch: 11   Global Step: 60400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:28:15,434-Speed 3408.82 samples/sec   Loss 4.4942   LearningRate 0.0162   Epoch: 11   Global Step: 60410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:28:18,430-Speed 3419.12 samples/sec   Loss 4.3946   LearningRate 0.0162   Epoch: 11   Global Step: 60420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:28:21,424-Speed 3420.56 samples/sec   Loss 4.5309   LearningRate 0.0162   Epoch: 11   Global Step: 60430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:28:24,418-Speed 3421.41 samples/sec   Loss 4.4029   LearningRate 0.0162   Epoch: 11   Global Step: 60440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:28:27,414-Speed 3418.69 samples/sec   Loss 4.3770   LearningRate 0.0162   Epoch: 11   Global Step: 60450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:28:30,410-Speed 3418.80 samples/sec   Loss 4.4691   LearningRate 0.0162   Epoch: 11   Global Step: 60460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:28:33,403-Speed 3421.96 samples/sec   Loss 4.4252   LearningRate 0.0162   Epoch: 11   Global Step: 60470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:28:36,406-Speed 3410.38 samples/sec   Loss 4.3839   LearningRate 0.0162   Epoch: 11   Global Step: 60480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:28:39,401-Speed 3420.41 samples/sec   Loss 4.2273   LearningRate 0.0162   Epoch: 11   Global Step: 60490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:28:42,398-Speed 3418.04 samples/sec   Loss 4.3997   LearningRate 0.0162   Epoch: 11   Global Step: 60500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:28:45,391-Speed 3421.34 samples/sec   Loss 4.3156   LearningRate 0.0161   Epoch: 11   Global Step: 60510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:28:48,390-Speed 3415.96 samples/sec   Loss 4.4161   LearningRate 0.0161   Epoch: 11   Global Step: 60520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:28:51,387-Speed 3417.27 samples/sec   Loss 4.2972   LearningRate 0.0161   Epoch: 11   Global Step: 60530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:28:54,385-Speed 3416.84 samples/sec   Loss 4.3370   LearningRate 0.0161   Epoch: 11   Global Step: 60540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:28:57,381-Speed 3418.29 samples/sec   Loss 4.3764   LearningRate 0.0161   Epoch: 11   Global Step: 60550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:00,377-Speed 3418.05 samples/sec   Loss 4.4280   LearningRate 0.0161   Epoch: 11   Global Step: 60560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:03,366-Speed 3427.63 samples/sec   Loss 4.3982   LearningRate 0.0161   Epoch: 11   Global Step: 60570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:06,375-Speed 3403.37 samples/sec   Loss 4.3734   LearningRate 0.0161   Epoch: 11   Global Step: 60580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:09,373-Speed 3417.73 samples/sec   Loss 4.4784   LearningRate 0.0161   Epoch: 11   Global Step: 60590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:12,392-Speed 3392.57 samples/sec   Loss 4.3059   LearningRate 0.0161   Epoch: 11   Global Step: 60600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:15,387-Speed 3419.47 samples/sec   Loss 4.3471   LearningRate 0.0161   Epoch: 11   Global Step: 60610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:18,388-Speed 3413.12 samples/sec   Loss 4.3727   LearningRate 0.0161   Epoch: 11   Global Step: 60620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:21,388-Speed 3413.54 samples/sec   Loss 4.4465   LearningRate 0.0161   Epoch: 11   Global Step: 60630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:24,469-Speed 3324.45 samples/sec   Loss 4.4764   LearningRate 0.0160   Epoch: 11   Global Step: 60640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:27,484-Speed 3397.48 samples/sec   Loss 4.3801   LearningRate 0.0160   Epoch: 11   Global Step: 60650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:30,492-Speed 3404.72 samples/sec   Loss 4.5286   LearningRate 0.0160   Epoch: 11   Global Step: 60660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:33,519-Speed 3383.66 samples/sec   Loss 4.3144   LearningRate 0.0160   Epoch: 11   Global Step: 60670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 05:29:36,538-Speed 3393.40 samples/sec   Loss 4.3455   LearningRate 0.0160   Epoch: 11   Global Step: 60680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:39,641-Speed 3301.31 samples/sec   Loss 4.4246   LearningRate 0.0160   Epoch: 11   Global Step: 60690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:51,928-Speed 833.45 samples/sec   Loss 4.0392   LearningRate 0.0160   Epoch: 12   Global Step: 60700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:54,966-Speed 3371.61 samples/sec   Loss 3.4326   LearningRate 0.0160   Epoch: 12   Global Step: 60710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:29:58,014-Speed 3360.81 samples/sec   Loss 3.5261   LearningRate 0.0160   Epoch: 12   Global Step: 60720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:01,116-Speed 3302.31 samples/sec   Loss 3.4776   LearningRate 0.0160   Epoch: 12   Global Step: 60730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:04,125-Speed 3403.57 samples/sec   Loss 3.5472   LearningRate 0.0160   Epoch: 12   Global Step: 60740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:07,173-Speed 3360.45 samples/sec   Loss 3.5525   LearningRate 0.0160   Epoch: 12   Global Step: 60750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:10,173-Speed 3414.13 samples/sec   Loss 3.4688   LearningRate 0.0159   Epoch: 12   Global Step: 60760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:13,194-Speed 3390.34 samples/sec   Loss 3.6518   LearningRate 0.0159   Epoch: 12   Global Step: 60770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:16,215-Speed 3390.45 samples/sec   Loss 3.5426   LearningRate 0.0159   Epoch: 12   Global Step: 60780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 05:30:19,217-Speed 3411.95 samples/sec   Loss 3.5861   LearningRate 0.0159   Epoch: 12   Global Step: 60790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:22,216-Speed 3415.17 samples/sec   Loss 3.6482   LearningRate 0.0159   Epoch: 12   Global Step: 60800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:25,214-Speed 3416.89 samples/sec   Loss 3.5596   LearningRate 0.0159   Epoch: 12   Global Step: 60810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:28,230-Speed 3396.25 samples/sec   Loss 3.5994   LearningRate 0.0159   Epoch: 12   Global Step: 60820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:31,231-Speed 3412.86 samples/sec   Loss 3.6085   LearningRate 0.0159   Epoch: 12   Global Step: 60830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:34,232-Speed 3413.10 samples/sec   Loss 3.5897   LearningRate 0.0159   Epoch: 12   Global Step: 60840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:37,248-Speed 3396.14 samples/sec   Loss 3.7131   LearningRate 0.0159   Epoch: 12   Global Step: 60850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:40,252-Speed 3409.58 samples/sec   Loss 3.6403   LearningRate 0.0159   Epoch: 12   Global Step: 60860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:43,258-Speed 3407.23 samples/sec   Loss 3.5657   LearningRate 0.0159   Epoch: 12   Global Step: 60870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:46,259-Speed 3413.24 samples/sec   Loss 3.5701   LearningRate 0.0159   Epoch: 12   Global Step: 60880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:49,282-Speed 3388.37 samples/sec   Loss 3.5349   LearningRate 0.0158   Epoch: 12   Global Step: 60890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:52,318-Speed 3373.38 samples/sec   Loss 3.6301   LearningRate 0.0158   Epoch: 12   Global Step: 60900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:55,376-Speed 3349.53 samples/sec   Loss 3.7512   LearningRate 0.0158   Epoch: 12   Global Step: 60910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:30:58,383-Speed 3406.27 samples/sec   Loss 3.5441   LearningRate 0.0158   Epoch: 12   Global Step: 60920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:31:01,388-Speed 3408.28 samples/sec   Loss 3.6582   LearningRate 0.0158   Epoch: 12   Global Step: 60930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:31:04,404-Speed 3396.08 samples/sec   Loss 3.5764   LearningRate 0.0158   Epoch: 12   Global Step: 60940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:31:07,414-Speed 3402.58 samples/sec   Loss 3.6904   LearningRate 0.0158   Epoch: 12   Global Step: 60950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:31:10,412-Speed 3416.60 samples/sec   Loss 3.6250   LearningRate 0.0158   Epoch: 12   Global Step: 60960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:31:13,413-Speed 3412.71 samples/sec   Loss 3.6510   LearningRate 0.0158   Epoch: 12   Global Step: 60970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:31:16,441-Speed 3382.62 samples/sec   Loss 3.7929   LearningRate 0.0158   Epoch: 12   Global Step: 60980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:31:19,425-Speed 3432.62 samples/sec   Loss 3.6643   LearningRate 0.0158   Epoch: 12   Global Step: 60990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:31:22,427-Speed 3412.36 samples/sec   Loss 3.6491   LearningRate 0.0158   Epoch: 12   Global Step: 61000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:31:25,493-Speed 3340.23 samples/sec   Loss 3.8409   LearningRate 0.0158   Epoch: 12   Global Step: 61010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:31:28,484-Speed 3424.48 samples/sec   Loss 3.7631   LearningRate 0.0157   Epoch: 12   Global Step: 61020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:31:31,488-Speed 3410.03 samples/sec   Loss 3.7253   LearningRate 0.0157   Epoch: 12   Global Step: 61030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:31:34,492-Speed 3409.14 samples/sec   Loss 3.7315   LearningRate 0.0157   Epoch: 12   Global Step: 61040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:31:37,531-Speed 3370.52 samples/sec   Loss 3.6460   LearningRate 0.0157   Epoch: 12   Global Step: 61050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:31:40,546-Speed 3396.64 samples/sec   Loss 3.6176   LearningRate 0.0157   Epoch: 12   Global Step: 61060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:31:43,565-Speed 3393.09 samples/sec   Loss 3.8004   LearningRate 0.0157   Epoch: 12   Global Step: 61070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:31:46,571-Speed 3407.69 samples/sec   Loss 3.6678   LearningRate 0.0157   Epoch: 12   Global Step: 61080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:31:49,586-Speed 3397.36 samples/sec   Loss 3.7940   LearningRate 0.0157   Epoch: 12   Global Step: 61090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:31:52,591-Speed 3408.57 samples/sec   Loss 3.7433   LearningRate 0.0157   Epoch: 12   Global Step: 61100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:31:55,614-Speed 3387.84 samples/sec   Loss 3.5928   LearningRate 0.0157   Epoch: 12   Global Step: 61110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:31:58,619-Speed 3408.72 samples/sec   Loss 3.6591   LearningRate 0.0157   Epoch: 12   Global Step: 61120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:32:01,627-Speed 3404.62 samples/sec   Loss 3.8275   LearningRate 0.0157   Epoch: 12   Global Step: 61130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:32:04,656-Speed 3382.37 samples/sec   Loss 3.7171   LearningRate 0.0157   Epoch: 12   Global Step: 61140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:32:07,683-Speed 3382.45 samples/sec   Loss 3.7106   LearningRate 0.0156   Epoch: 12   Global Step: 61150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:32:10,697-Speed 3398.95 samples/sec   Loss 3.7358   LearningRate 0.0156   Epoch: 12   Global Step: 61160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:32:13,718-Speed 3389.73 samples/sec   Loss 3.7729   LearningRate 0.0156   Epoch: 12   Global Step: 61170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:32:16,727-Speed 3404.13 samples/sec   Loss 3.8528   LearningRate 0.0156   Epoch: 12   Global Step: 61180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:32:19,730-Speed 3412.23 samples/sec   Loss 3.8132   LearningRate 0.0156   Epoch: 12   Global Step: 61190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:32:22,769-Speed 3370.47 samples/sec   Loss 3.8436   LearningRate 0.0156   Epoch: 12   Global Step: 61200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:32:25,795-Speed 3384.54 samples/sec   Loss 3.6131   LearningRate 0.0156   Epoch: 12   Global Step: 61210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:32:28,899-Speed 3300.26 samples/sec   Loss 3.8821   LearningRate 0.0156   Epoch: 12   Global Step: 61220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:32:31,896-Speed 3418.08 samples/sec   Loss 3.7111   LearningRate 0.0156   Epoch: 12   Global Step: 61230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:32:34,939-Speed 3366.19 samples/sec   Loss 3.5701   LearningRate 0.0156   Epoch: 12   Global Step: 61240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:32:37,970-Speed 3378.90 samples/sec   Loss 3.7766   LearningRate 0.0156   Epoch: 12   Global Step: 61250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:32:40,995-Speed 3384.97 samples/sec   Loss 3.6128   LearningRate 0.0156   Epoch: 12   Global Step: 61260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:32:43,999-Speed 3410.24 samples/sec   Loss 3.7962   LearningRate 0.0155   Epoch: 12   Global Step: 61270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:32:47,003-Speed 3409.99 samples/sec   Loss 3.8669   LearningRate 0.0155   Epoch: 12   Global Step: 61280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:32:50,031-Speed 3383.37 samples/sec   Loss 3.5976   LearningRate 0.0155   Epoch: 12   Global Step: 61290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:32:53,088-Speed 3350.15 samples/sec   Loss 3.8454   LearningRate 0.0155   Epoch: 12   Global Step: 61300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:32:56,091-Speed 3410.83 samples/sec   Loss 3.5678   LearningRate 0.0155   Epoch: 12   Global Step: 61310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:32:59,106-Speed 3396.35 samples/sec   Loss 3.8169   LearningRate 0.0155   Epoch: 12   Global Step: 61320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:33:02,134-Speed 3382.39 samples/sec   Loss 3.6619   LearningRate 0.0155   Epoch: 12   Global Step: 61330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:05,153-Speed 3393.88 samples/sec   Loss 3.8403   LearningRate 0.0155   Epoch: 12   Global Step: 61340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:08,161-Speed 3404.72 samples/sec   Loss 3.7045   LearningRate 0.0155   Epoch: 12   Global Step: 61350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:11,162-Speed 3412.73 samples/sec   Loss 3.8863   LearningRate 0.0155   Epoch: 12   Global Step: 61360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:14,182-Speed 3391.37 samples/sec   Loss 3.8972   LearningRate 0.0155   Epoch: 12   Global Step: 61370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:17,224-Speed 3367.58 samples/sec   Loss 3.7989   LearningRate 0.0155   Epoch: 12   Global Step: 61380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:20,242-Speed 3393.98 samples/sec   Loss 3.8825   LearningRate 0.0155   Epoch: 12   Global Step: 61390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:23,252-Speed 3402.65 samples/sec   Loss 3.9698   LearningRate 0.0154   Epoch: 12   Global Step: 61400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:26,262-Speed 3402.58 samples/sec   Loss 3.8601   LearningRate 0.0154   Epoch: 12   Global Step: 61410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:29,266-Speed 3409.47 samples/sec   Loss 3.6243   LearningRate 0.0154   Epoch: 12   Global Step: 61420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:32,247-Speed 3435.80 samples/sec   Loss 3.7172   LearningRate 0.0154   Epoch: 12   Global Step: 61430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:35,254-Speed 3406.08 samples/sec   Loss 3.9871   LearningRate 0.0154   Epoch: 12   Global Step: 61440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:38,257-Speed 3410.89 samples/sec   Loss 3.8878   LearningRate 0.0154   Epoch: 12   Global Step: 61450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:41,265-Speed 3404.53 samples/sec   Loss 3.9305   LearningRate 0.0154   Epoch: 12   Global Step: 61460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:44,277-Speed 3401.89 samples/sec   Loss 3.8025   LearningRate 0.0154   Epoch: 12   Global Step: 61470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:47,277-Speed 3413.77 samples/sec   Loss 3.7243   LearningRate 0.0154   Epoch: 12   Global Step: 61480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:50,285-Speed 3404.80 samples/sec   Loss 3.8265   LearningRate 0.0154   Epoch: 12   Global Step: 61490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:53,292-Speed 3406.00 samples/sec   Loss 3.7679   LearningRate 0.0154   Epoch: 12   Global Step: 61500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:33:56,277-Speed 3431.63 samples/sec   Loss 3.9831   LearningRate 0.0154   Epoch: 12   Global Step: 61510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:33:59,280-Speed 3410.64 samples/sec   Loss 3.8860   LearningRate 0.0154   Epoch: 12   Global Step: 61520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:34:02,439-Speed 3241.96 samples/sec   Loss 3.8320   LearningRate 0.0153   Epoch: 12   Global Step: 61530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:34:05,581-Speed 3260.37 samples/sec   Loss 3.9101   LearningRate 0.0153   Epoch: 12   Global Step: 61540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:34:08,588-Speed 3406.65 samples/sec   Loss 3.8860   LearningRate 0.0153   Epoch: 12   Global Step: 61550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:34:11,590-Speed 3411.93 samples/sec   Loss 3.7908   LearningRate 0.0153   Epoch: 12   Global Step: 61560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:34:14,600-Speed 3403.66 samples/sec   Loss 3.7880   LearningRate 0.0153   Epoch: 12   Global Step: 61570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:34:17,604-Speed 3408.58 samples/sec   Loss 3.8543   LearningRate 0.0153   Epoch: 12   Global Step: 61580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:34:20,604-Speed 3414.65 samples/sec   Loss 3.9137   LearningRate 0.0153   Epoch: 12   Global Step: 61590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:34:23,607-Speed 3410.50 samples/sec   Loss 3.9488   LearningRate 0.0153   Epoch: 12   Global Step: 61600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:34:26,641-Speed 3376.37 samples/sec   Loss 3.8442   LearningRate 0.0153   Epoch: 12   Global Step: 61610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:34:29,650-Speed 3403.39 samples/sec   Loss 3.7820   LearningRate 0.0153   Epoch: 12   Global Step: 61620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:34:32,653-Speed 3410.72 samples/sec   Loss 3.8942   LearningRate 0.0153   Epoch: 12   Global Step: 61630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:34:35,662-Speed 3404.00 samples/sec   Loss 3.7668   LearningRate 0.0153   Epoch: 12   Global Step: 61640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:34:38,716-Speed 3354.91 samples/sec   Loss 3.9193   LearningRate 0.0153   Epoch: 12   Global Step: 61650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:34:41,780-Speed 3342.11 samples/sec   Loss 3.9938   LearningRate 0.0152   Epoch: 12   Global Step: 61660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:34:44,785-Speed 3408.82 samples/sec   Loss 3.8533   LearningRate 0.0152   Epoch: 12   Global Step: 61670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:34:47,794-Speed 3403.92 samples/sec   Loss 4.0577   LearningRate 0.0152   Epoch: 12   Global Step: 61680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:34:50,819-Speed 3386.17 samples/sec   Loss 3.8607   LearningRate 0.0152   Epoch: 12   Global Step: 61690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:34:53,827-Speed 3405.11 samples/sec   Loss 3.9628   LearningRate 0.0152   Epoch: 12   Global Step: 61700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:34:56,839-Speed 3399.54 samples/sec   Loss 3.9268   LearningRate 0.0152   Epoch: 12   Global Step: 61710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:34:59,847-Speed 3405.69 samples/sec   Loss 3.8184   LearningRate 0.0152   Epoch: 12   Global Step: 61720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:02,857-Speed 3402.80 samples/sec   Loss 3.7811   LearningRate 0.0152   Epoch: 12   Global Step: 61730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:05,921-Speed 3342.80 samples/sec   Loss 3.9630   LearningRate 0.0152   Epoch: 12   Global Step: 61740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:08,924-Speed 3411.51 samples/sec   Loss 3.6831   LearningRate 0.0152   Epoch: 12   Global Step: 61750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:11,933-Speed 3404.22 samples/sec   Loss 3.8330   LearningRate 0.0152   Epoch: 12   Global Step: 61760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:14,950-Speed 3394.11 samples/sec   Loss 3.8880   LearningRate 0.0152   Epoch: 12   Global Step: 61770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:17,979-Speed 3381.82 samples/sec   Loss 3.7945   LearningRate 0.0152   Epoch: 12   Global Step: 61780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:35:20,984-Speed 3407.83 samples/sec   Loss 3.8911   LearningRate 0.0151   Epoch: 12   Global Step: 61790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:35:23,987-Speed 3411.42 samples/sec   Loss 3.8672   LearningRate 0.0151   Epoch: 12   Global Step: 61800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:35:26,974-Speed 3429.04 samples/sec   Loss 3.9665   LearningRate 0.0151   Epoch: 12   Global Step: 61810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:29,981-Speed 3405.89 samples/sec   Loss 3.9613   LearningRate 0.0151   Epoch: 12   Global Step: 61820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:32,987-Speed 3407.26 samples/sec   Loss 3.9231   LearningRate 0.0151   Epoch: 12   Global Step: 61830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:36,001-Speed 3398.92 samples/sec   Loss 3.8934   LearningRate 0.0151   Epoch: 12   Global Step: 61840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:39,007-Speed 3407.75 samples/sec   Loss 3.9672   LearningRate 0.0151   Epoch: 12   Global Step: 61850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:42,020-Speed 3399.55 samples/sec   Loss 3.8655   LearningRate 0.0151   Epoch: 12   Global Step: 61860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:45,028-Speed 3404.58 samples/sec   Loss 4.0376   LearningRate 0.0151   Epoch: 12   Global Step: 61870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:48,052-Speed 3387.23 samples/sec   Loss 3.9328   LearningRate 0.0151   Epoch: 12   Global Step: 61880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:51,077-Speed 3386.05 samples/sec   Loss 3.9669   LearningRate 0.0151   Epoch: 12   Global Step: 61890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:54,082-Speed 3408.65 samples/sec   Loss 4.0872   LearningRate 0.0151   Epoch: 12   Global Step: 61900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:35:57,081-Speed 3414.38 samples/sec   Loss 3.8983   LearningRate 0.0151   Epoch: 12   Global Step: 61910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:36:00,090-Speed 3404.07 samples/sec   Loss 3.9252   LearningRate 0.0150   Epoch: 12   Global Step: 61920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:36:03,128-Speed 3372.02 samples/sec   Loss 3.9725   LearningRate 0.0150   Epoch: 12   Global Step: 61930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:36:06,213-Speed 3319.23 samples/sec   Loss 4.0069   LearningRate 0.0150   Epoch: 12   Global Step: 61940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:36:09,225-Speed 3401.74 samples/sec   Loss 3.9786   LearningRate 0.0150   Epoch: 12   Global Step: 61950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:36:12,228-Speed 3410.15 samples/sec   Loss 3.9705   LearningRate 0.0150   Epoch: 12   Global Step: 61960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:36:15,256-Speed 3383.54 samples/sec   Loss 4.0081   LearningRate 0.0150   Epoch: 12   Global Step: 61970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:36:18,322-Speed 3340.59 samples/sec   Loss 3.8883   LearningRate 0.0150   Epoch: 12   Global Step: 61980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:36:21,327-Speed 3407.41 samples/sec   Loss 3.9159   LearningRate 0.0150   Epoch: 12   Global Step: 61990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:36:24,374-Speed 3361.65 samples/sec   Loss 3.9287   LearningRate 0.0150   Epoch: 12   Global Step: 62000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:37:08,681-[lfw][62000]XNorm: 22.459812
Training: 2022-04-11 05:37:08,682-[lfw][62000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-04-11 05:37:08,682-[lfw][62000]Accuracy-Highest: 0.99850
Training: 2022-04-11 05:38:00,155-[cfp_fp][62000]XNorm: 20.965085
Training: 2022-04-11 05:38:00,156-[cfp_fp][62000]Accuracy-Flip: 0.98100+-0.00658
Training: 2022-04-11 05:38:00,156-[cfp_fp][62000]Accuracy-Highest: 0.98100
Training: 2022-04-11 05:38:44,047-[agedb_30][62000]XNorm: 22.390025
Training: 2022-04-11 05:38:44,048-[agedb_30][62000]Accuracy-Flip: 0.98267+-0.00786
Training: 2022-04-11 05:38:44,048-[agedb_30][62000]Accuracy-Highest: 0.98267
Training: 2022-04-11 05:38:47,044-Speed 71.77 samples/sec   Loss 4.0041   LearningRate 0.0150   Epoch: 12   Global Step: 62010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:38:50,028-Speed 3432.12 samples/sec   Loss 3.9698   LearningRate 0.0150   Epoch: 12   Global Step: 62020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:38:53,016-Speed 3426.97 samples/sec   Loss 3.9531   LearningRate 0.0150   Epoch: 12   Global Step: 62030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:38:56,006-Speed 3426.68 samples/sec   Loss 3.9416   LearningRate 0.0150   Epoch: 12   Global Step: 62040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:38:59,013-Speed 3406.16 samples/sec   Loss 3.7979   LearningRate 0.0149   Epoch: 12   Global Step: 62050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:02,001-Speed 3427.61 samples/sec   Loss 3.7785   LearningRate 0.0149   Epoch: 12   Global Step: 62060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:04,992-Speed 3424.42 samples/sec   Loss 3.9109   LearningRate 0.0149   Epoch: 12   Global Step: 62070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:07,983-Speed 3424.48 samples/sec   Loss 4.0140   LearningRate 0.0149   Epoch: 12   Global Step: 62080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:10,983-Speed 3414.79 samples/sec   Loss 3.9645   LearningRate 0.0149   Epoch: 12   Global Step: 62090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:13,997-Speed 3397.75 samples/sec   Loss 3.8514   LearningRate 0.0149   Epoch: 12   Global Step: 62100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:16,993-Speed 3419.19 samples/sec   Loss 3.8779   LearningRate 0.0149   Epoch: 12   Global Step: 62110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:19,990-Speed 3417.49 samples/sec   Loss 3.9255   LearningRate 0.0149   Epoch: 12   Global Step: 62120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:22,964-Speed 3442.91 samples/sec   Loss 3.8906   LearningRate 0.0149   Epoch: 12   Global Step: 62130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:25,965-Speed 3414.02 samples/sec   Loss 3.9984   LearningRate 0.0149   Epoch: 12   Global Step: 62140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:28,973-Speed 3404.40 samples/sec   Loss 3.8942   LearningRate 0.0149   Epoch: 12   Global Step: 62150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:31,969-Speed 3419.74 samples/sec   Loss 3.8947   LearningRate 0.0149   Epoch: 12   Global Step: 62160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:34,958-Speed 3426.35 samples/sec   Loss 3.9240   LearningRate 0.0149   Epoch: 12   Global Step: 62170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:37,969-Speed 3401.59 samples/sec   Loss 3.8551   LearningRate 0.0148   Epoch: 12   Global Step: 62180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:40,971-Speed 3412.64 samples/sec   Loss 3.9235   LearningRate 0.0148   Epoch: 12   Global Step: 62190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:43,968-Speed 3416.95 samples/sec   Loss 3.9792   LearningRate 0.0148   Epoch: 12   Global Step: 62200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:46,961-Speed 3422.56 samples/sec   Loss 3.9609   LearningRate 0.0148   Epoch: 12   Global Step: 62210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:50,015-Speed 3353.97 samples/sec   Loss 3.8779   LearningRate 0.0148   Epoch: 12   Global Step: 62220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:53,093-Speed 3327.20 samples/sec   Loss 3.7856   LearningRate 0.0148   Epoch: 12   Global Step: 62230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 05:39:56,072-Speed 3438.99 samples/sec   Loss 4.0555   LearningRate 0.0148   Epoch: 12   Global Step: 62240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:39:59,069-Speed 3417.46 samples/sec   Loss 3.9135   LearningRate 0.0148   Epoch: 12   Global Step: 62250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:40:02,108-Speed 3371.19 samples/sec   Loss 4.1471   LearningRate 0.0148   Epoch: 12   Global Step: 62260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:40:05,095-Speed 3428.65 samples/sec   Loss 3.9050   LearningRate 0.0148   Epoch: 12   Global Step: 62270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:40:08,098-Speed 3410.06 samples/sec   Loss 3.9930   LearningRate 0.0148   Epoch: 12   Global Step: 62280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:40:11,106-Speed 3405.39 samples/sec   Loss 3.9619   LearningRate 0.0148   Epoch: 12   Global Step: 62290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:40:14,115-Speed 3404.17 samples/sec   Loss 3.9602   LearningRate 0.0148   Epoch: 12   Global Step: 62300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:40:17,113-Speed 3416.82 samples/sec   Loss 3.9471   LearningRate 0.0147   Epoch: 12   Global Step: 62310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:40:20,115-Speed 3411.53 samples/sec   Loss 3.8995   LearningRate 0.0147   Epoch: 12   Global Step: 62320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:40:23,130-Speed 3397.10 samples/sec   Loss 3.9397   LearningRate 0.0147   Epoch: 12   Global Step: 62330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:40:26,188-Speed 3349.14 samples/sec   Loss 4.0233   LearningRate 0.0147   Epoch: 12   Global Step: 62340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:40:29,205-Speed 3394.53 samples/sec   Loss 3.9925   LearningRate 0.0147   Epoch: 12   Global Step: 62350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:40:32,203-Speed 3416.79 samples/sec   Loss 3.8828   LearningRate 0.0147   Epoch: 12   Global Step: 62360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:40:35,200-Speed 3417.31 samples/sec   Loss 4.0434   LearningRate 0.0147   Epoch: 12   Global Step: 62370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:40:38,201-Speed 3413.78 samples/sec   Loss 3.9863   LearningRate 0.0147   Epoch: 12   Global Step: 62380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:40:41,199-Speed 3415.80 samples/sec   Loss 3.9690   LearningRate 0.0147   Epoch: 12   Global Step: 62390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:40:44,209-Speed 3402.83 samples/sec   Loss 3.8509   LearningRate 0.0147   Epoch: 12   Global Step: 62400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:40:47,207-Speed 3416.31 samples/sec   Loss 3.9224   LearningRate 0.0147   Epoch: 12   Global Step: 62410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:40:50,205-Speed 3416.10 samples/sec   Loss 3.9930   LearningRate 0.0147   Epoch: 12   Global Step: 62420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:40:53,210-Speed 3408.84 samples/sec   Loss 3.9107   LearningRate 0.0147   Epoch: 12   Global Step: 62430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:40:56,209-Speed 3415.53 samples/sec   Loss 3.9189   LearningRate 0.0147   Epoch: 12   Global Step: 62440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:40:59,216-Speed 3406.91 samples/sec   Loss 3.9906   LearningRate 0.0146   Epoch: 12   Global Step: 62450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:41:02,219-Speed 3410.60 samples/sec   Loss 3.9350   LearningRate 0.0146   Epoch: 12   Global Step: 62460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:41:05,205-Speed 3430.46 samples/sec   Loss 3.9748   LearningRate 0.0146   Epoch: 12   Global Step: 62470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:41:08,224-Speed 3392.19 samples/sec   Loss 4.0699   LearningRate 0.0146   Epoch: 12   Global Step: 62480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:41:11,251-Speed 3383.46 samples/sec   Loss 4.0096   LearningRate 0.0146   Epoch: 12   Global Step: 62490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:41:14,292-Speed 3368.63 samples/sec   Loss 4.0159   LearningRate 0.0146   Epoch: 12   Global Step: 62500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:41:17,448-Speed 3245.24 samples/sec   Loss 3.9487   LearningRate 0.0146   Epoch: 12   Global Step: 62510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:41:20,451-Speed 3410.82 samples/sec   Loss 3.9393   LearningRate 0.0146   Epoch: 12   Global Step: 62520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:41:23,472-Speed 3389.84 samples/sec   Loss 3.9987   LearningRate 0.0146   Epoch: 12   Global Step: 62530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:41:26,480-Speed 3406.57 samples/sec   Loss 3.9560   LearningRate 0.0146   Epoch: 12   Global Step: 62540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:41:29,483-Speed 3410.20 samples/sec   Loss 4.0276   LearningRate 0.0146   Epoch: 12   Global Step: 62550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:41:32,481-Speed 3416.37 samples/sec   Loss 3.9337   LearningRate 0.0146   Epoch: 12   Global Step: 62560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:41:35,484-Speed 3411.50 samples/sec   Loss 3.8466   LearningRate 0.0146   Epoch: 12   Global Step: 62570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 05:41:38,484-Speed 3413.55 samples/sec   Loss 4.0486   LearningRate 0.0145   Epoch: 12   Global Step: 62580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:41:41,494-Speed 3402.38 samples/sec   Loss 4.0194   LearningRate 0.0145   Epoch: 12   Global Step: 62590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:41:44,499-Speed 3409.31 samples/sec   Loss 4.0004   LearningRate 0.0145   Epoch: 12   Global Step: 62600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:41:47,512-Speed 3398.84 samples/sec   Loss 4.0664   LearningRate 0.0145   Epoch: 12   Global Step: 62610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:41:50,517-Speed 3408.75 samples/sec   Loss 3.9148   LearningRate 0.0145   Epoch: 12   Global Step: 62620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:41:53,527-Speed 3402.26 samples/sec   Loss 3.9220   LearningRate 0.0145   Epoch: 12   Global Step: 62630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:41:56,537-Speed 3404.28 samples/sec   Loss 3.9575   LearningRate 0.0145   Epoch: 12   Global Step: 62640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:41:59,539-Speed 3410.98 samples/sec   Loss 3.9950   LearningRate 0.0145   Epoch: 12   Global Step: 62650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:42:02,562-Speed 3388.56 samples/sec   Loss 3.9607   LearningRate 0.0145   Epoch: 12   Global Step: 62660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:42:05,562-Speed 3414.13 samples/sec   Loss 3.9737   LearningRate 0.0145   Epoch: 12   Global Step: 62670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:42:08,573-Speed 3401.13 samples/sec   Loss 3.8248   LearningRate 0.0145   Epoch: 12   Global Step: 62680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:42:11,576-Speed 3411.87 samples/sec   Loss 3.9361   LearningRate 0.0145   Epoch: 12   Global Step: 62690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:42:14,592-Speed 3395.25 samples/sec   Loss 3.9501   LearningRate 0.0145   Epoch: 12   Global Step: 62700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:42:17,615-Speed 3388.78 samples/sec   Loss 4.0859   LearningRate 0.0144   Epoch: 12   Global Step: 62710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:42:20,621-Speed 3406.48 samples/sec   Loss 4.0438   LearningRate 0.0144   Epoch: 12   Global Step: 62720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:42:23,671-Speed 3359.18 samples/sec   Loss 3.9731   LearningRate 0.0144   Epoch: 12   Global Step: 62730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:42:26,871-Speed 3200.69 samples/sec   Loss 4.0825   LearningRate 0.0144   Epoch: 12   Global Step: 62740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:42:29,908-Speed 3373.34 samples/sec   Loss 3.9150   LearningRate 0.0144   Epoch: 12   Global Step: 62750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:42:32,892-Speed 3431.76 samples/sec   Loss 4.1014   LearningRate 0.0144   Epoch: 12   Global Step: 62760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:42:35,896-Speed 3410.73 samples/sec   Loss 4.0458   LearningRate 0.0144   Epoch: 12   Global Step: 62770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:42:38,918-Speed 3388.20 samples/sec   Loss 4.0395   LearningRate 0.0144   Epoch: 12   Global Step: 62780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:42:41,924-Speed 3407.64 samples/sec   Loss 3.9814   LearningRate 0.0144   Epoch: 12   Global Step: 62790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:42:44,925-Speed 3413.35 samples/sec   Loss 3.9716   LearningRate 0.0144   Epoch: 12   Global Step: 62800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:42:47,933-Speed 3404.60 samples/sec   Loss 3.9494   LearningRate 0.0144   Epoch: 12   Global Step: 62810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:42:50,951-Speed 3394.11 samples/sec   Loss 4.0392   LearningRate 0.0144   Epoch: 12   Global Step: 62820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:42:54,030-Speed 3326.14 samples/sec   Loss 3.8946   LearningRate 0.0144   Epoch: 12   Global Step: 62830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:42:57,032-Speed 3413.33 samples/sec   Loss 3.9789   LearningRate 0.0143   Epoch: 12   Global Step: 62840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:43:00,038-Speed 3406.28 samples/sec   Loss 4.0212   LearningRate 0.0143   Epoch: 12   Global Step: 62850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:43:03,044-Speed 3407.33 samples/sec   Loss 3.8559   LearningRate 0.0143   Epoch: 12   Global Step: 62860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:06,061-Speed 3395.45 samples/sec   Loss 3.9656   LearningRate 0.0143   Epoch: 12   Global Step: 62870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:09,066-Speed 3408.42 samples/sec   Loss 3.9322   LearningRate 0.0143   Epoch: 12   Global Step: 62880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:12,078-Speed 3400.32 samples/sec   Loss 3.9391   LearningRate 0.0143   Epoch: 12   Global Step: 62890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:15,096-Speed 3393.97 samples/sec   Loss 3.9729   LearningRate 0.0143   Epoch: 12   Global Step: 62900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:18,121-Speed 3386.36 samples/sec   Loss 3.9599   LearningRate 0.0143   Epoch: 12   Global Step: 62910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:21,133-Speed 3400.22 samples/sec   Loss 4.0136   LearningRate 0.0143   Epoch: 12   Global Step: 62920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:24,142-Speed 3404.52 samples/sec   Loss 3.8409   LearningRate 0.0143   Epoch: 12   Global Step: 62930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:27,226-Speed 3321.59 samples/sec   Loss 3.9831   LearningRate 0.0143   Epoch: 12   Global Step: 62940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:30,320-Speed 3310.15 samples/sec   Loss 3.9881   LearningRate 0.0143   Epoch: 12   Global Step: 62950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:33,310-Speed 3425.23 samples/sec   Loss 4.0676   LearningRate 0.0143   Epoch: 12   Global Step: 62960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:36,317-Speed 3405.81 samples/sec   Loss 3.9477   LearningRate 0.0143   Epoch: 12   Global Step: 62970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:39,416-Speed 3305.57 samples/sec   Loss 3.9573   LearningRate 0.0142   Epoch: 12   Global Step: 62980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:42,435-Speed 3392.96 samples/sec   Loss 3.9724   LearningRate 0.0142   Epoch: 12   Global Step: 62990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:45,440-Speed 3407.54 samples/sec   Loss 3.9074   LearningRate 0.0142   Epoch: 12   Global Step: 63000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:48,444-Speed 3410.19 samples/sec   Loss 3.9417   LearningRate 0.0142   Epoch: 12   Global Step: 63010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:51,450-Speed 3408.02 samples/sec   Loss 4.0218   LearningRate 0.0142   Epoch: 12   Global Step: 63020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:54,456-Speed 3407.08 samples/sec   Loss 3.9034   LearningRate 0.0142   Epoch: 12   Global Step: 63030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:43:57,456-Speed 3414.11 samples/sec   Loss 4.0594   LearningRate 0.0142   Epoch: 12   Global Step: 63040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:44:00,528-Speed 3334.89 samples/sec   Loss 3.9471   LearningRate 0.0142   Epoch: 12   Global Step: 63050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:44:03,522-Speed 3420.66 samples/sec   Loss 3.9934   LearningRate 0.0142   Epoch: 12   Global Step: 63060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:44:06,549-Speed 3383.61 samples/sec   Loss 4.0095   LearningRate 0.0142   Epoch: 12   Global Step: 63070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:44:09,559-Speed 3403.05 samples/sec   Loss 3.9578   LearningRate 0.0142   Epoch: 12   Global Step: 63080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:44:12,591-Speed 3377.79 samples/sec   Loss 3.9436   LearningRate 0.0142   Epoch: 12   Global Step: 63090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:44:15,601-Speed 3403.86 samples/sec   Loss 4.0252   LearningRate 0.0142   Epoch: 12   Global Step: 63100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:44:18,619-Speed 3393.07 samples/sec   Loss 3.9276   LearningRate 0.0141   Epoch: 12   Global Step: 63110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:44:21,634-Speed 3397.23 samples/sec   Loss 4.0416   LearningRate 0.0141   Epoch: 12   Global Step: 63120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:44:24,670-Speed 3373.63 samples/sec   Loss 3.9034   LearningRate 0.0141   Epoch: 12   Global Step: 63130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:44:27,732-Speed 3345.02 samples/sec   Loss 3.9594   LearningRate 0.0141   Epoch: 12   Global Step: 63140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:44:30,763-Speed 3380.38 samples/sec   Loss 3.9974   LearningRate 0.0141   Epoch: 12   Global Step: 63150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:44:33,769-Speed 3407.27 samples/sec   Loss 3.9967   LearningRate 0.0141   Epoch: 12   Global Step: 63160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:44:36,774-Speed 3407.97 samples/sec   Loss 3.9989   LearningRate 0.0141   Epoch: 12   Global Step: 63170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:44:39,787-Speed 3399.45 samples/sec   Loss 3.9546   LearningRate 0.0141   Epoch: 12   Global Step: 63180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:44:42,799-Speed 3400.72 samples/sec   Loss 3.9716   LearningRate 0.0141   Epoch: 12   Global Step: 63190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:44:45,804-Speed 3408.18 samples/sec   Loss 4.0367   LearningRate 0.0141   Epoch: 12   Global Step: 63200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:44:48,811-Speed 3406.38 samples/sec   Loss 4.0875   LearningRate 0.0141   Epoch: 12   Global Step: 63210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:44:51,820-Speed 3404.14 samples/sec   Loss 3.9537   LearningRate 0.0141   Epoch: 12   Global Step: 63220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:44:54,843-Speed 3388.69 samples/sec   Loss 4.0505   LearningRate 0.0141   Epoch: 12   Global Step: 63230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:44:57,856-Speed 3399.45 samples/sec   Loss 3.9814   LearningRate 0.0141   Epoch: 12   Global Step: 63240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:45:00,879-Speed 3388.37 samples/sec   Loss 4.0445   LearningRate 0.0140   Epoch: 12   Global Step: 63250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:45:03,895-Speed 3395.99 samples/sec   Loss 3.8646   LearningRate 0.0140   Epoch: 12   Global Step: 63260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:45:06,915-Speed 3391.59 samples/sec   Loss 3.9549   LearningRate 0.0140   Epoch: 12   Global Step: 63270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:45:09,918-Speed 3410.25 samples/sec   Loss 3.9426   LearningRate 0.0140   Epoch: 12   Global Step: 63280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:45:12,937-Speed 3392.80 samples/sec   Loss 4.0667   LearningRate 0.0140   Epoch: 12   Global Step: 63290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:45:15,952-Speed 3397.31 samples/sec   Loss 3.9731   LearningRate 0.0140   Epoch: 12   Global Step: 63300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:45:18,980-Speed 3382.84 samples/sec   Loss 4.0127   LearningRate 0.0140   Epoch: 12   Global Step: 63310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:45:21,980-Speed 3413.92 samples/sec   Loss 3.9057   LearningRate 0.0140   Epoch: 12   Global Step: 63320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:45:24,984-Speed 3409.85 samples/sec   Loss 3.9284   LearningRate 0.0140   Epoch: 12   Global Step: 63330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:45:28,004-Speed 3391.76 samples/sec   Loss 3.9874   LearningRate 0.0140   Epoch: 12   Global Step: 63340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:45:31,053-Speed 3358.78 samples/sec   Loss 4.0564   LearningRate 0.0140   Epoch: 12   Global Step: 63350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:45:34,058-Speed 3409.40 samples/sec   Loss 4.0025   LearningRate 0.0140   Epoch: 12   Global Step: 63360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:45:37,102-Speed 3364.10 samples/sec   Loss 3.8962   LearningRate 0.0140   Epoch: 12   Global Step: 63370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:45:40,115-Speed 3399.41 samples/sec   Loss 3.9568   LearningRate 0.0139   Epoch: 12   Global Step: 63380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:45:43,129-Speed 3398.22 samples/sec   Loss 3.9321   LearningRate 0.0139   Epoch: 12   Global Step: 63390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:45:46,134-Speed 3409.16 samples/sec   Loss 4.0063   LearningRate 0.0139   Epoch: 12   Global Step: 63400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:45:49,146-Speed 3399.83 samples/sec   Loss 4.0085   LearningRate 0.0139   Epoch: 12   Global Step: 63410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:45:52,194-Speed 3361.14 samples/sec   Loss 4.1521   LearningRate 0.0139   Epoch: 12   Global Step: 63420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:45:55,201-Speed 3406.05 samples/sec   Loss 4.0194   LearningRate 0.0139   Epoch: 12   Global Step: 63430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:45:58,223-Speed 3389.61 samples/sec   Loss 3.9435   LearningRate 0.0139   Epoch: 12   Global Step: 63440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:46:01,191-Speed 3450.69 samples/sec   Loss 4.0538   LearningRate 0.0139   Epoch: 12   Global Step: 63450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:04,215-Speed 3387.10 samples/sec   Loss 4.0911   LearningRate 0.0139   Epoch: 12   Global Step: 63460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:07,223-Speed 3405.66 samples/sec   Loss 4.0019   LearningRate 0.0139   Epoch: 12   Global Step: 63470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:10,231-Speed 3404.53 samples/sec   Loss 3.9934   LearningRate 0.0139   Epoch: 12   Global Step: 63480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:13,251-Speed 3391.79 samples/sec   Loss 4.0913   LearningRate 0.0139   Epoch: 12   Global Step: 63490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:16,261-Speed 3402.42 samples/sec   Loss 3.8657   LearningRate 0.0139   Epoch: 12   Global Step: 63500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:19,274-Speed 3399.89 samples/sec   Loss 4.0023   LearningRate 0.0139   Epoch: 12   Global Step: 63510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:22,277-Speed 3410.81 samples/sec   Loss 3.9689   LearningRate 0.0138   Epoch: 12   Global Step: 63520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:25,382-Speed 3298.72 samples/sec   Loss 3.8671   LearningRate 0.0138   Epoch: 12   Global Step: 63530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:28,420-Speed 3371.45 samples/sec   Loss 4.0924   LearningRate 0.0138   Epoch: 12   Global Step: 63540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:31,446-Speed 3385.21 samples/sec   Loss 4.0277   LearningRate 0.0138   Epoch: 12   Global Step: 63550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:46:34,427-Speed 3435.27 samples/sec   Loss 4.0018   LearningRate 0.0138   Epoch: 12   Global Step: 63560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:37,444-Speed 3395.18 samples/sec   Loss 4.0151   LearningRate 0.0138   Epoch: 12   Global Step: 63570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:40,448-Speed 3409.58 samples/sec   Loss 3.9167   LearningRate 0.0138   Epoch: 12   Global Step: 63580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:43,451-Speed 3410.09 samples/sec   Loss 4.1203   LearningRate 0.0138   Epoch: 12   Global Step: 63590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:46,461-Speed 3403.19 samples/sec   Loss 3.9457   LearningRate 0.0138   Epoch: 12   Global Step: 63600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:49,482-Speed 3390.55 samples/sec   Loss 3.9988   LearningRate 0.0138   Epoch: 12   Global Step: 63610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:52,492-Speed 3403.95 samples/sec   Loss 3.8405   LearningRate 0.0138   Epoch: 12   Global Step: 63620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:55,595-Speed 3300.47 samples/sec   Loss 3.9045   LearningRate 0.0138   Epoch: 12   Global Step: 63630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:46:58,604-Speed 3403.28 samples/sec   Loss 4.0384   LearningRate 0.0138   Epoch: 12   Global Step: 63640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:47:01,659-Speed 3353.12 samples/sec   Loss 3.9898   LearningRate 0.0137   Epoch: 12   Global Step: 63650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:47:04,670-Speed 3401.16 samples/sec   Loss 3.8627   LearningRate 0.0137   Epoch: 12   Global Step: 63660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:07,674-Speed 3410.73 samples/sec   Loss 3.8747   LearningRate 0.0137   Epoch: 12   Global Step: 63670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:10,679-Speed 3408.44 samples/sec   Loss 3.9999   LearningRate 0.0137   Epoch: 12   Global Step: 63680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:13,688-Speed 3402.83 samples/sec   Loss 3.9317   LearningRate 0.0137   Epoch: 12   Global Step: 63690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:16,750-Speed 3345.89 samples/sec   Loss 4.1185   LearningRate 0.0137   Epoch: 12   Global Step: 63700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:19,755-Speed 3408.24 samples/sec   Loss 4.0185   LearningRate 0.0137   Epoch: 12   Global Step: 63710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:22,760-Speed 3408.53 samples/sec   Loss 3.9144   LearningRate 0.0137   Epoch: 12   Global Step: 63720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:25,765-Speed 3408.85 samples/sec   Loss 3.9255   LearningRate 0.0137   Epoch: 12   Global Step: 63730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:28,788-Speed 3387.97 samples/sec   Loss 3.8417   LearningRate 0.0137   Epoch: 12   Global Step: 63740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:31,795-Speed 3406.85 samples/sec   Loss 3.9421   LearningRate 0.0137   Epoch: 12   Global Step: 63750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:34,798-Speed 3409.81 samples/sec   Loss 3.9252   LearningRate 0.0137   Epoch: 12   Global Step: 63760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 05:47:37,784-Speed 3430.39 samples/sec   Loss 3.9034   LearningRate 0.0137   Epoch: 12   Global Step: 63770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:40,790-Speed 3407.07 samples/sec   Loss 3.8953   LearningRate 0.0137   Epoch: 12   Global Step: 63780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:43,799-Speed 3404.24 samples/sec   Loss 4.0462   LearningRate 0.0136   Epoch: 12   Global Step: 63790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:46,817-Speed 3394.07 samples/sec   Loss 4.1024   LearningRate 0.0136   Epoch: 12   Global Step: 63800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:49,823-Speed 3407.82 samples/sec   Loss 4.1194   LearningRate 0.0136   Epoch: 12   Global Step: 63810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:52,832-Speed 3403.92 samples/sec   Loss 3.9524   LearningRate 0.0136   Epoch: 12   Global Step: 63820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:55,839-Speed 3406.08 samples/sec   Loss 3.8803   LearningRate 0.0136   Epoch: 12   Global Step: 63830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:47:58,843-Speed 3409.88 samples/sec   Loss 3.9700   LearningRate 0.0136   Epoch: 12   Global Step: 63840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:48:01,851-Speed 3405.53 samples/sec   Loss 3.9549   LearningRate 0.0136   Epoch: 12   Global Step: 63850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:48:04,859-Speed 3404.15 samples/sec   Loss 4.0080   LearningRate 0.0136   Epoch: 12   Global Step: 63860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:48:07,850-Speed 3425.24 samples/sec   Loss 3.9891   LearningRate 0.0136   Epoch: 12   Global Step: 63870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:48:10,856-Speed 3406.92 samples/sec   Loss 4.0665   LearningRate 0.0136   Epoch: 12   Global Step: 63880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:48:13,861-Speed 3408.30 samples/sec   Loss 4.0028   LearningRate 0.0136   Epoch: 12   Global Step: 63890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:48:16,875-Speed 3398.13 samples/sec   Loss 4.0676   LearningRate 0.0136   Epoch: 12   Global Step: 63900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:48:19,882-Speed 3406.47 samples/sec   Loss 3.9594   LearningRate 0.0136   Epoch: 12   Global Step: 63910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:48:22,872-Speed 3425.55 samples/sec   Loss 3.9288   LearningRate 0.0136   Epoch: 12   Global Step: 63920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:48:25,898-Speed 3384.85 samples/sec   Loss 3.9050   LearningRate 0.0135   Epoch: 12   Global Step: 63930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:48:28,910-Speed 3400.74 samples/sec   Loss 3.9058   LearningRate 0.0135   Epoch: 12   Global Step: 63940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:48:31,992-Speed 3324.36 samples/sec   Loss 3.7861   LearningRate 0.0135   Epoch: 12   Global Step: 63950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:48:35,008-Speed 3395.69 samples/sec   Loss 4.0114   LearningRate 0.0135   Epoch: 12   Global Step: 63960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:48:38,022-Speed 3398.06 samples/sec   Loss 3.9950   LearningRate 0.0135   Epoch: 12   Global Step: 63970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:48:41,046-Speed 3387.26 samples/sec   Loss 3.8993   LearningRate 0.0135   Epoch: 12   Global Step: 63980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:48:44,048-Speed 3412.25 samples/sec   Loss 3.9958   LearningRate 0.0135   Epoch: 12   Global Step: 63990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:48:47,051-Speed 3411.34 samples/sec   Loss 3.8641   LearningRate 0.0135   Epoch: 12   Global Step: 64000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:49:31,327-[lfw][64000]XNorm: 24.006045
Training: 2022-04-11 05:49:31,328-[lfw][64000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 05:49:31,328-[lfw][64000]Accuracy-Highest: 0.99850
Training: 2022-04-11 05:50:22,728-[cfp_fp][64000]XNorm: 22.506030
Training: 2022-04-11 05:50:22,729-[cfp_fp][64000]Accuracy-Flip: 0.98300+-0.00618
Training: 2022-04-11 05:50:22,729-[cfp_fp][64000]Accuracy-Highest: 0.98300
Training: 2022-04-11 05:51:07,240-[agedb_30][64000]XNorm: 23.641323
Training: 2022-04-11 05:51:07,240-[agedb_30][64000]Accuracy-Flip: 0.98233+-0.00704
Training: 2022-04-11 05:51:07,241-[agedb_30][64000]Accuracy-Highest: 0.98267
Training: 2022-04-11 05:51:10,242-Speed 71.51 samples/sec   Loss 3.9861   LearningRate 0.0135   Epoch: 12   Global Step: 64010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:51:13,231-Speed 3426.83 samples/sec   Loss 3.9602   LearningRate 0.0135   Epoch: 12   Global Step: 64020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:51:16,253-Speed 3389.40 samples/sec   Loss 3.9343   LearningRate 0.0135   Epoch: 12   Global Step: 64030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:51:19,223-Speed 3448.66 samples/sec   Loss 4.0770   LearningRate 0.0135   Epoch: 12   Global Step: 64040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:51:22,213-Speed 3425.29 samples/sec   Loss 3.9784   LearningRate 0.0135   Epoch: 12   Global Step: 64050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:51:25,204-Speed 3425.57 samples/sec   Loss 3.9765   LearningRate 0.0135   Epoch: 12   Global Step: 64060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:51:28,201-Speed 3417.40 samples/sec   Loss 3.9607   LearningRate 0.0134   Epoch: 12   Global Step: 64070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:51:31,191-Speed 3424.60 samples/sec   Loss 3.8879   LearningRate 0.0134   Epoch: 12   Global Step: 64080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:51:34,184-Speed 3423.13 samples/sec   Loss 4.0579   LearningRate 0.0134   Epoch: 12   Global Step: 64090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:51:37,181-Speed 3416.80 samples/sec   Loss 3.9531   LearningRate 0.0134   Epoch: 12   Global Step: 64100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:51:40,176-Speed 3420.69 samples/sec   Loss 3.8770   LearningRate 0.0134   Epoch: 12   Global Step: 64110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:51:43,168-Speed 3423.56 samples/sec   Loss 3.8897   LearningRate 0.0134   Epoch: 12   Global Step: 64120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:51:46,162-Speed 3420.41 samples/sec   Loss 4.0155   LearningRate 0.0134   Epoch: 12   Global Step: 64130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:51:49,165-Speed 3410.95 samples/sec   Loss 4.0399   LearningRate 0.0134   Epoch: 12   Global Step: 64140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:51:52,175-Speed 3403.28 samples/sec   Loss 3.9364   LearningRate 0.0134   Epoch: 12   Global Step: 64150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:51:55,205-Speed 3379.87 samples/sec   Loss 3.9935   LearningRate 0.0134   Epoch: 12   Global Step: 64160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:51:58,206-Speed 3412.86 samples/sec   Loss 3.9885   LearningRate 0.0134   Epoch: 12   Global Step: 64170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:01,205-Speed 3415.18 samples/sec   Loss 3.9370   LearningRate 0.0134   Epoch: 12   Global Step: 64180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:04,204-Speed 3415.86 samples/sec   Loss 3.9893   LearningRate 0.0134   Epoch: 12   Global Step: 64190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:07,214-Speed 3402.34 samples/sec   Loss 3.8788   LearningRate 0.0133   Epoch: 12   Global Step: 64200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:10,225-Speed 3402.93 samples/sec   Loss 3.8480   LearningRate 0.0133   Epoch: 12   Global Step: 64210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:13,271-Speed 3362.24 samples/sec   Loss 4.0462   LearningRate 0.0133   Epoch: 12   Global Step: 64220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:16,311-Speed 3368.35 samples/sec   Loss 3.8693   LearningRate 0.0133   Epoch: 12   Global Step: 64230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:19,307-Speed 3419.32 samples/sec   Loss 3.9103   LearningRate 0.0133   Epoch: 12   Global Step: 64240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 05:52:22,286-Speed 3438.36 samples/sec   Loss 3.9475   LearningRate 0.0133   Epoch: 12   Global Step: 64250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:25,281-Speed 3419.57 samples/sec   Loss 3.9107   LearningRate 0.0133   Epoch: 12   Global Step: 64260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:28,284-Speed 3410.41 samples/sec   Loss 3.9630   LearningRate 0.0133   Epoch: 12   Global Step: 64270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:31,334-Speed 3358.11 samples/sec   Loss 3.9507   LearningRate 0.0133   Epoch: 12   Global Step: 64280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:34,340-Speed 3408.51 samples/sec   Loss 3.9407   LearningRate 0.0133   Epoch: 12   Global Step: 64290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:37,368-Speed 3382.18 samples/sec   Loss 4.0075   LearningRate 0.0133   Epoch: 12   Global Step: 64300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:40,365-Speed 3418.15 samples/sec   Loss 3.9597   LearningRate 0.0133   Epoch: 12   Global Step: 64310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:43,370-Speed 3407.88 samples/sec   Loss 3.9574   LearningRate 0.0133   Epoch: 12   Global Step: 64320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:46,368-Speed 3416.36 samples/sec   Loss 3.9122   LearningRate 0.0133   Epoch: 12   Global Step: 64330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:49,373-Speed 3409.08 samples/sec   Loss 3.9488   LearningRate 0.0132   Epoch: 12   Global Step: 64340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:52,400-Speed 3383.14 samples/sec   Loss 3.8536   LearningRate 0.0132   Epoch: 12   Global Step: 64350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:55,425-Speed 3386.34 samples/sec   Loss 4.1308   LearningRate 0.0132   Epoch: 12   Global Step: 64360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:52:58,451-Speed 3384.51 samples/sec   Loss 3.8905   LearningRate 0.0132   Epoch: 12   Global Step: 64370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:53:01,471-Speed 3392.37 samples/sec   Loss 3.9454   LearningRate 0.0132   Epoch: 12   Global Step: 64380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:53:04,469-Speed 3416.96 samples/sec   Loss 3.9598   LearningRate 0.0132   Epoch: 12   Global Step: 64390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:53:07,468-Speed 3414.30 samples/sec   Loss 4.0218   LearningRate 0.0132   Epoch: 12   Global Step: 64400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:53:10,462-Speed 3420.99 samples/sec   Loss 3.9244   LearningRate 0.0132   Epoch: 12   Global Step: 64410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:53:13,461-Speed 3415.65 samples/sec   Loss 4.0348   LearningRate 0.0132   Epoch: 12   Global Step: 64420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:53:16,458-Speed 3417.80 samples/sec   Loss 3.9597   LearningRate 0.0132   Epoch: 12   Global Step: 64430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:53:19,457-Speed 3415.74 samples/sec   Loss 3.9221   LearningRate 0.0132   Epoch: 12   Global Step: 64440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:53:22,439-Speed 3433.67 samples/sec   Loss 3.8072   LearningRate 0.0132   Epoch: 12   Global Step: 64450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:53:25,456-Speed 3395.78 samples/sec   Loss 3.8772   LearningRate 0.0132   Epoch: 12   Global Step: 64460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:53:28,449-Speed 3422.40 samples/sec   Loss 3.8984   LearningRate 0.0132   Epoch: 12   Global Step: 64470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:53:31,456-Speed 3406.13 samples/sec   Loss 3.9588   LearningRate 0.0131   Epoch: 12   Global Step: 64480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:53:34,454-Speed 3416.90 samples/sec   Loss 3.8253   LearningRate 0.0131   Epoch: 12   Global Step: 64490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:53:37,466-Speed 3400.04 samples/sec   Loss 3.9747   LearningRate 0.0131   Epoch: 12   Global Step: 64500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:53:40,463-Speed 3418.53 samples/sec   Loss 3.9744   LearningRate 0.0131   Epoch: 12   Global Step: 64510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:53:43,465-Speed 3411.58 samples/sec   Loss 3.9692   LearningRate 0.0131   Epoch: 12   Global Step: 64520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:53:46,520-Speed 3352.72 samples/sec   Loss 3.9086   LearningRate 0.0131   Epoch: 12   Global Step: 64530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:53:49,539-Speed 3392.93 samples/sec   Loss 4.0050   LearningRate 0.0131   Epoch: 12   Global Step: 64540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:53:52,631-Speed 3311.88 samples/sec   Loss 3.8266   LearningRate 0.0131   Epoch: 12   Global Step: 64550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:53:55,635-Speed 3409.38 samples/sec   Loss 3.9423   LearningRate 0.0131   Epoch: 12   Global Step: 64560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:53:58,642-Speed 3406.46 samples/sec   Loss 4.0024   LearningRate 0.0131   Epoch: 12   Global Step: 64570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:54:01,646-Speed 3409.90 samples/sec   Loss 3.9424   LearningRate 0.0131   Epoch: 12   Global Step: 64580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:54:04,655-Speed 3404.62 samples/sec   Loss 3.9463   LearningRate 0.0131   Epoch: 12   Global Step: 64590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:54:07,656-Speed 3413.18 samples/sec   Loss 3.9240   LearningRate 0.0131   Epoch: 12   Global Step: 64600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:54:10,652-Speed 3418.23 samples/sec   Loss 3.9768   LearningRate 0.0131   Epoch: 12   Global Step: 64610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:54:13,658-Speed 3407.07 samples/sec   Loss 3.8963   LearningRate 0.0130   Epoch: 12   Global Step: 64620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:54:16,659-Speed 3412.88 samples/sec   Loss 3.9267   LearningRate 0.0130   Epoch: 12   Global Step: 64630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:54:19,662-Speed 3413.36 samples/sec   Loss 3.9774   LearningRate 0.0130   Epoch: 12   Global Step: 64640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:54:22,671-Speed 3403.87 samples/sec   Loss 3.8473   LearningRate 0.0130   Epoch: 12   Global Step: 64650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:54:25,663-Speed 3422.85 samples/sec   Loss 3.9495   LearningRate 0.0130   Epoch: 12   Global Step: 64660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:54:28,738-Speed 3332.06 samples/sec   Loss 3.9436   LearningRate 0.0130   Epoch: 12   Global Step: 64670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:54:31,741-Speed 3410.51 samples/sec   Loss 3.9788   LearningRate 0.0130   Epoch: 12   Global Step: 64680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:54:34,743-Speed 3412.45 samples/sec   Loss 3.8203   LearningRate 0.0130   Epoch: 12   Global Step: 64690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:54:37,767-Speed 3387.07 samples/sec   Loss 3.8413   LearningRate 0.0130   Epoch: 12   Global Step: 64700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:54:40,770-Speed 3409.94 samples/sec   Loss 3.8659   LearningRate 0.0130   Epoch: 12   Global Step: 64710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:54:43,774-Speed 3410.06 samples/sec   Loss 3.9893   LearningRate 0.0130   Epoch: 12   Global Step: 64720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:54:46,774-Speed 3414.38 samples/sec   Loss 4.0429   LearningRate 0.0130   Epoch: 12   Global Step: 64730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:54:49,917-Speed 3258.91 samples/sec   Loss 3.9917   LearningRate 0.0130   Epoch: 12   Global Step: 64740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:54:52,947-Speed 3379.41 samples/sec   Loss 3.9693   LearningRate 0.0130   Epoch: 12   Global Step: 64750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:54:55,948-Speed 3413.56 samples/sec   Loss 3.9429   LearningRate 0.0129   Epoch: 12   Global Step: 64760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:54:58,935-Speed 3429.83 samples/sec   Loss 3.8662   LearningRate 0.0129   Epoch: 12   Global Step: 64770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:55:01,998-Speed 3343.22 samples/sec   Loss 3.9226   LearningRate 0.0129   Epoch: 12   Global Step: 64780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:55:05,031-Speed 3377.13 samples/sec   Loss 3.9818   LearningRate 0.0129   Epoch: 12   Global Step: 64790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:55:08,062-Speed 3379.04 samples/sec   Loss 3.8745   LearningRate 0.0129   Epoch: 12   Global Step: 64800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:55:11,062-Speed 3415.10 samples/sec   Loss 4.1030   LearningRate 0.0129   Epoch: 12   Global Step: 64810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:55:14,064-Speed 3411.56 samples/sec   Loss 4.0259   LearningRate 0.0129   Epoch: 12   Global Step: 64820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:55:17,078-Speed 3398.05 samples/sec   Loss 3.8765   LearningRate 0.0129   Epoch: 12   Global Step: 64830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:55:20,079-Speed 3413.12 samples/sec   Loss 3.9308   LearningRate 0.0129   Epoch: 12   Global Step: 64840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:55:23,084-Speed 3408.69 samples/sec   Loss 3.9281   LearningRate 0.0129   Epoch: 12   Global Step: 64850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:55:26,086-Speed 3411.75 samples/sec   Loss 3.8125   LearningRate 0.0129   Epoch: 12   Global Step: 64860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:55:29,094-Speed 3405.45 samples/sec   Loss 3.8414   LearningRate 0.0129   Epoch: 12   Global Step: 64870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:55:32,094-Speed 3413.88 samples/sec   Loss 3.8855   LearningRate 0.0129   Epoch: 12   Global Step: 64880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:55:35,095-Speed 3413.44 samples/sec   Loss 4.0092   LearningRate 0.0129   Epoch: 12   Global Step: 64890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:55:38,097-Speed 3411.25 samples/sec   Loss 4.0407   LearningRate 0.0128   Epoch: 12   Global Step: 64900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:55:41,107-Speed 3402.67 samples/sec   Loss 3.9402   LearningRate 0.0128   Epoch: 12   Global Step: 64910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:55:44,111-Speed 3410.12 samples/sec   Loss 3.8049   LearningRate 0.0128   Epoch: 12   Global Step: 64920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:55:47,111-Speed 3413.92 samples/sec   Loss 4.0509   LearningRate 0.0128   Epoch: 12   Global Step: 64930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:55:50,114-Speed 3411.16 samples/sec   Loss 3.8097   LearningRate 0.0128   Epoch: 12   Global Step: 64940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:55:53,132-Speed 3393.64 samples/sec   Loss 4.0106   LearningRate 0.0128   Epoch: 12   Global Step: 64950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:55:56,118-Speed 3430.77 samples/sec   Loss 3.8918   LearningRate 0.0128   Epoch: 12   Global Step: 64960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:55:59,118-Speed 3413.86 samples/sec   Loss 3.9805   LearningRate 0.0128   Epoch: 12   Global Step: 64970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:02,122-Speed 3409.25 samples/sec   Loss 3.9607   LearningRate 0.0128   Epoch: 12   Global Step: 64980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:05,131-Speed 3404.20 samples/sec   Loss 3.7986   LearningRate 0.0128   Epoch: 12   Global Step: 64990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:08,143-Speed 3401.39 samples/sec   Loss 3.8849   LearningRate 0.0128   Epoch: 12   Global Step: 65000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:11,164-Speed 3390.34 samples/sec   Loss 3.9974   LearningRate 0.0128   Epoch: 12   Global Step: 65010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:14,170-Speed 3407.31 samples/sec   Loss 3.9092   LearningRate 0.0128   Epoch: 12   Global Step: 65020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:17,188-Speed 3393.74 samples/sec   Loss 4.0186   LearningRate 0.0128   Epoch: 12   Global Step: 65030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:20,191-Speed 3411.15 samples/sec   Loss 3.9208   LearningRate 0.0127   Epoch: 12   Global Step: 65040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:23,192-Speed 3412.72 samples/sec   Loss 3.9788   LearningRate 0.0127   Epoch: 12   Global Step: 65050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:26,187-Speed 3419.99 samples/sec   Loss 3.9914   LearningRate 0.0127   Epoch: 12   Global Step: 65060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:29,341-Speed 3248.52 samples/sec   Loss 3.8893   LearningRate 0.0127   Epoch: 12   Global Step: 65070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:32,412-Speed 3335.27 samples/sec   Loss 3.9217   LearningRate 0.0127   Epoch: 12   Global Step: 65080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:35,421-Speed 3403.90 samples/sec   Loss 3.8362   LearningRate 0.0127   Epoch: 12   Global Step: 65090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:38,420-Speed 3415.55 samples/sec   Loss 3.8255   LearningRate 0.0127   Epoch: 12   Global Step: 65100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:41,432-Speed 3399.88 samples/sec   Loss 3.7621   LearningRate 0.0127   Epoch: 12   Global Step: 65110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:44,446-Speed 3398.84 samples/sec   Loss 3.8886   LearningRate 0.0127   Epoch: 12   Global Step: 65120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:47,457-Speed 3401.68 samples/sec   Loss 3.9307   LearningRate 0.0127   Epoch: 12   Global Step: 65130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:50,483-Speed 3385.05 samples/sec   Loss 3.8713   LearningRate 0.0127   Epoch: 12   Global Step: 65140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:53,488-Speed 3408.26 samples/sec   Loss 3.9582   LearningRate 0.0127   Epoch: 12   Global Step: 65150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:56:56,492-Speed 3409.62 samples/sec   Loss 3.9877   LearningRate 0.0127   Epoch: 12   Global Step: 65160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:56:59,496-Speed 3410.30 samples/sec   Loss 3.8082   LearningRate 0.0127   Epoch: 12   Global Step: 65170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:57:02,522-Speed 3384.08 samples/sec   Loss 3.8598   LearningRate 0.0127   Epoch: 12   Global Step: 65180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:57:05,575-Speed 3355.62 samples/sec   Loss 3.9169   LearningRate 0.0126   Epoch: 12   Global Step: 65190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:57:08,576-Speed 3412.27 samples/sec   Loss 3.9725   LearningRate 0.0126   Epoch: 12   Global Step: 65200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:57:11,617-Speed 3367.98 samples/sec   Loss 4.0219   LearningRate 0.0126   Epoch: 12   Global Step: 65210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:57:14,648-Speed 3380.56 samples/sec   Loss 3.9973   LearningRate 0.0126   Epoch: 12   Global Step: 65220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:57:17,666-Speed 3393.73 samples/sec   Loss 3.8419   LearningRate 0.0126   Epoch: 12   Global Step: 65230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:57:20,694-Speed 3382.01 samples/sec   Loss 3.8154   LearningRate 0.0126   Epoch: 12   Global Step: 65240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:57:23,682-Speed 3428.03 samples/sec   Loss 3.7737   LearningRate 0.0126   Epoch: 12   Global Step: 65250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:57:26,693-Speed 3401.50 samples/sec   Loss 3.8448   LearningRate 0.0126   Epoch: 12   Global Step: 65260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:57:29,710-Speed 3394.88 samples/sec   Loss 3.9300   LearningRate 0.0126   Epoch: 12   Global Step: 65270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:57:32,726-Speed 3396.78 samples/sec   Loss 4.0403   LearningRate 0.0126   Epoch: 12   Global Step: 65280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:57:35,727-Speed 3412.97 samples/sec   Loss 3.9559   LearningRate 0.0126   Epoch: 12   Global Step: 65290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:57:38,744-Speed 3395.03 samples/sec   Loss 3.9358   LearningRate 0.0126   Epoch: 12   Global Step: 65300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:57:41,751-Speed 3405.97 samples/sec   Loss 3.9728   LearningRate 0.0126   Epoch: 12   Global Step: 65310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:57:44,756-Speed 3409.19 samples/sec   Loss 3.9302   LearningRate 0.0126   Epoch: 12   Global Step: 65320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:57:47,764-Speed 3404.59 samples/sec   Loss 3.9985   LearningRate 0.0125   Epoch: 12   Global Step: 65330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:57:50,868-Speed 3299.70 samples/sec   Loss 3.9106   LearningRate 0.0125   Epoch: 12   Global Step: 65340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:57:53,874-Speed 3407.12 samples/sec   Loss 3.8860   LearningRate 0.0125   Epoch: 12   Global Step: 65350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:57:56,880-Speed 3408.09 samples/sec   Loss 3.8417   LearningRate 0.0125   Epoch: 12   Global Step: 65360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:57:59,884-Speed 3409.35 samples/sec   Loss 4.0387   LearningRate 0.0125   Epoch: 12   Global Step: 65370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:58:02,870-Speed 3430.07 samples/sec   Loss 3.8011   LearningRate 0.0125   Epoch: 12   Global Step: 65380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:05,885-Speed 3397.26 samples/sec   Loss 3.8571   LearningRate 0.0125   Epoch: 12   Global Step: 65390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:08,885-Speed 3414.07 samples/sec   Loss 3.8503   LearningRate 0.0125   Epoch: 12   Global Step: 65400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:11,902-Speed 3394.82 samples/sec   Loss 3.7738   LearningRate 0.0125   Epoch: 12   Global Step: 65410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:15,067-Speed 3236.85 samples/sec   Loss 3.9131   LearningRate 0.0125   Epoch: 12   Global Step: 65420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:18,072-Speed 3408.67 samples/sec   Loss 3.8025   LearningRate 0.0125   Epoch: 12   Global Step: 65430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:21,075-Speed 3409.81 samples/sec   Loss 3.7407   LearningRate 0.0125   Epoch: 12   Global Step: 65440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:24,093-Speed 3394.71 samples/sec   Loss 3.8333   LearningRate 0.0125   Epoch: 12   Global Step: 65450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:27,145-Speed 3355.84 samples/sec   Loss 3.7949   LearningRate 0.0125   Epoch: 12   Global Step: 65460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:30,156-Speed 3403.11 samples/sec   Loss 3.9143   LearningRate 0.0124   Epoch: 12   Global Step: 65470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:33,190-Speed 3375.92 samples/sec   Loss 3.9540   LearningRate 0.0124   Epoch: 12   Global Step: 65480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:58:36,173-Speed 3433.59 samples/sec   Loss 4.0330   LearningRate 0.0124   Epoch: 12   Global Step: 65490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:39,184-Speed 3402.41 samples/sec   Loss 3.7686   LearningRate 0.0124   Epoch: 12   Global Step: 65500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:42,188-Speed 3408.50 samples/sec   Loss 3.9160   LearningRate 0.0124   Epoch: 12   Global Step: 65510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:45,221-Speed 3377.24 samples/sec   Loss 3.9038   LearningRate 0.0124   Epoch: 12   Global Step: 65520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:48,247-Speed 3384.94 samples/sec   Loss 3.8229   LearningRate 0.0124   Epoch: 12   Global Step: 65530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:51,275-Speed 3382.34 samples/sec   Loss 4.0176   LearningRate 0.0124   Epoch: 12   Global Step: 65540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:54,281-Speed 3407.59 samples/sec   Loss 3.8087   LearningRate 0.0124   Epoch: 12   Global Step: 65550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:58:57,285-Speed 3409.35 samples/sec   Loss 3.9038   LearningRate 0.0124   Epoch: 12   Global Step: 65560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:59:00,295-Speed 3404.13 samples/sec   Loss 3.7572   LearningRate 0.0124   Epoch: 12   Global Step: 65570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:59:03,340-Speed 3363.19 samples/sec   Loss 3.8908   LearningRate 0.0124   Epoch: 12   Global Step: 65580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:59:06,343-Speed 3410.52 samples/sec   Loss 3.9039   LearningRate 0.0124   Epoch: 12   Global Step: 65590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:59:09,327-Speed 3433.56 samples/sec   Loss 3.8758   LearningRate 0.0124   Epoch: 12   Global Step: 65600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:59:12,328-Speed 3412.01 samples/sec   Loss 3.8193   LearningRate 0.0123   Epoch: 12   Global Step: 65610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:59:15,328-Speed 3414.15 samples/sec   Loss 3.8549   LearningRate 0.0123   Epoch: 12   Global Step: 65620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:59:18,335-Speed 3406.88 samples/sec   Loss 3.9392   LearningRate 0.0123   Epoch: 12   Global Step: 65630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:59:21,336-Speed 3412.05 samples/sec   Loss 3.9269   LearningRate 0.0123   Epoch: 12   Global Step: 65640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:59:24,342-Speed 3407.88 samples/sec   Loss 3.9068   LearningRate 0.0123   Epoch: 12   Global Step: 65650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:59:27,350-Speed 3404.65 samples/sec   Loss 3.7860   LearningRate 0.0123   Epoch: 12   Global Step: 65660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:59:30,374-Speed 3387.74 samples/sec   Loss 3.8953   LearningRate 0.0123   Epoch: 12   Global Step: 65670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:59:33,376-Speed 3411.88 samples/sec   Loss 4.0013   LearningRate 0.0123   Epoch: 12   Global Step: 65680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:59:36,381-Speed 3408.46 samples/sec   Loss 3.7870   LearningRate 0.0123   Epoch: 12   Global Step: 65690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 05:59:39,386-Speed 3408.81 samples/sec   Loss 3.8762   LearningRate 0.0123   Epoch: 12   Global Step: 65700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:59:42,405-Speed 3393.00 samples/sec   Loss 3.8091   LearningRate 0.0123   Epoch: 12   Global Step: 65710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:59:45,407-Speed 3411.29 samples/sec   Loss 3.7928   LearningRate 0.0123   Epoch: 12   Global Step: 65720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:59:48,416-Speed 3404.15 samples/sec   Loss 4.0441   LearningRate 0.0123   Epoch: 12   Global Step: 65730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:59:51,437-Speed 3390.70 samples/sec   Loss 3.8847   LearningRate 0.0123   Epoch: 12   Global Step: 65740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 05:59:54,532-Speed 3309.25 samples/sec   Loss 3.9555   LearningRate 0.0123   Epoch: 12   Global Step: 65750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:00:07,092-Speed 815.35 samples/sec   Loss 3.4283   LearningRate 0.0122   Epoch: 13   Global Step: 65760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:00:10,106-Speed 3398.93 samples/sec   Loss 3.0649   LearningRate 0.0122   Epoch: 13   Global Step: 65770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:00:13,142-Speed 3372.89 samples/sec   Loss 3.1461   LearningRate 0.0122   Epoch: 13   Global Step: 65780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:00:16,146-Speed 3410.25 samples/sec   Loss 2.9722   LearningRate 0.0122   Epoch: 13   Global Step: 65790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:00:19,152-Speed 3406.97 samples/sec   Loss 3.0527   LearningRate 0.0122   Epoch: 13   Global Step: 65800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:00:22,159-Speed 3406.67 samples/sec   Loss 3.0370   LearningRate 0.0122   Epoch: 13   Global Step: 65810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:00:25,190-Speed 3379.40 samples/sec   Loss 3.1143   LearningRate 0.0122   Epoch: 13   Global Step: 65820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:00:28,269-Speed 3325.88 samples/sec   Loss 2.9776   LearningRate 0.0122   Epoch: 13   Global Step: 65830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:00:31,518-Speed 3153.10 samples/sec   Loss 3.0850   LearningRate 0.0122   Epoch: 13   Global Step: 65840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:00:34,547-Speed 3381.06 samples/sec   Loss 3.0776   LearningRate 0.0122   Epoch: 13   Global Step: 65850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:00:37,578-Speed 3379.60 samples/sec   Loss 3.1088   LearningRate 0.0122   Epoch: 13   Global Step: 65860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:00:40,639-Speed 3345.82 samples/sec   Loss 3.1222   LearningRate 0.0122   Epoch: 13   Global Step: 65870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:00:43,660-Speed 3390.56 samples/sec   Loss 3.0332   LearningRate 0.0122   Epoch: 13   Global Step: 65880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:00:46,667-Speed 3406.30 samples/sec   Loss 3.0980   LearningRate 0.0122   Epoch: 13   Global Step: 65890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:00:49,683-Speed 3396.12 samples/sec   Loss 3.1500   LearningRate 0.0121   Epoch: 13   Global Step: 65900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:00:52,777-Speed 3310.87 samples/sec   Loss 3.1731   LearningRate 0.0121   Epoch: 13   Global Step: 65910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:00:55,800-Speed 3388.49 samples/sec   Loss 3.0360   LearningRate 0.0121   Epoch: 13   Global Step: 65920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:00:58,819-Speed 3392.04 samples/sec   Loss 3.2163   LearningRate 0.0121   Epoch: 13   Global Step: 65930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:01:01,977-Speed 3244.01 samples/sec   Loss 3.0755   LearningRate 0.0121   Epoch: 13   Global Step: 65940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:01:05,063-Speed 3319.29 samples/sec   Loss 3.1848   LearningRate 0.0121   Epoch: 13   Global Step: 65950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:01:08,076-Speed 3398.36 samples/sec   Loss 3.1903   LearningRate 0.0121   Epoch: 13   Global Step: 65960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:01:11,090-Speed 3399.56 samples/sec   Loss 3.0516   LearningRate 0.0121   Epoch: 13   Global Step: 65970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:01:14,103-Speed 3398.51 samples/sec   Loss 3.2382   LearningRate 0.0121   Epoch: 13   Global Step: 65980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:01:17,105-Speed 3412.16 samples/sec   Loss 3.1762   LearningRate 0.0121   Epoch: 13   Global Step: 65990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:01:20,122-Speed 3395.17 samples/sec   Loss 3.1396   LearningRate 0.0121   Epoch: 13   Global Step: 66000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:02:04,101-[lfw][66000]XNorm: 22.737340
Training: 2022-04-11 06:02:04,101-[lfw][66000]Accuracy-Flip: 0.99817+-0.00252
Training: 2022-04-11 06:02:04,102-[lfw][66000]Accuracy-Highest: 0.99850
Training: 2022-04-11 06:02:55,646-[cfp_fp][66000]XNorm: 21.792470
Training: 2022-04-11 06:02:55,646-[cfp_fp][66000]Accuracy-Flip: 0.98386+-0.00503
Training: 2022-04-11 06:02:55,647-[cfp_fp][66000]Accuracy-Highest: 0.98386
Training: 2022-04-11 06:03:39,633-[agedb_30][66000]XNorm: 23.119056
Training: 2022-04-11 06:03:39,634-[agedb_30][66000]Accuracy-Flip: 0.98317+-0.00677
Training: 2022-04-11 06:03:39,634-[agedb_30][66000]Accuracy-Highest: 0.98317
Training: 2022-04-11 06:03:42,653-Speed 71.84 samples/sec   Loss 3.1056   LearningRate 0.0121   Epoch: 13   Global Step: 66010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:03:45,665-Speed 3401.33 samples/sec   Loss 3.1219   LearningRate 0.0121   Epoch: 13   Global Step: 66020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:03:48,672-Speed 3405.65 samples/sec   Loss 3.1336   LearningRate 0.0121   Epoch: 13   Global Step: 66030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:03:51,714-Speed 3367.23 samples/sec   Loss 3.1528   LearningRate 0.0121   Epoch: 13   Global Step: 66040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:03:54,697-Speed 3433.47 samples/sec   Loss 3.2243   LearningRate 0.0120   Epoch: 13   Global Step: 66050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:03:57,689-Speed 3423.48 samples/sec   Loss 3.1341   LearningRate 0.0120   Epoch: 13   Global Step: 66060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:04:00,678-Speed 3426.38 samples/sec   Loss 2.9970   LearningRate 0.0120   Epoch: 13   Global Step: 66070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:04:03,681-Speed 3411.31 samples/sec   Loss 3.2161   LearningRate 0.0120   Epoch: 13   Global Step: 66080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:04:06,679-Speed 3416.15 samples/sec   Loss 3.0741   LearningRate 0.0120   Epoch: 13   Global Step: 66090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:04:09,678-Speed 3415.94 samples/sec   Loss 3.2473   LearningRate 0.0120   Epoch: 13   Global Step: 66100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:04:12,709-Speed 3378.90 samples/sec   Loss 3.1557   LearningRate 0.0120   Epoch: 13   Global Step: 66110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:04:15,702-Speed 3422.35 samples/sec   Loss 3.3548   LearningRate 0.0120   Epoch: 13   Global Step: 66120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:04:18,712-Speed 3403.30 samples/sec   Loss 3.1934   LearningRate 0.0120   Epoch: 13   Global Step: 66130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:04:21,713-Speed 3413.23 samples/sec   Loss 3.1966   LearningRate 0.0120   Epoch: 13   Global Step: 66140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:04:24,753-Speed 3369.01 samples/sec   Loss 3.1430   LearningRate 0.0120   Epoch: 13   Global Step: 66150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:04:27,888-Speed 3266.97 samples/sec   Loss 3.1748   LearningRate 0.0120   Epoch: 13   Global Step: 66160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:04:30,900-Speed 3400.28 samples/sec   Loss 3.2452   LearningRate 0.0120   Epoch: 13   Global Step: 66170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:04:33,910-Speed 3403.25 samples/sec   Loss 3.1407   LearningRate 0.0120   Epoch: 13   Global Step: 66180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:04:37,034-Speed 3278.90 samples/sec   Loss 3.1868   LearningRate 0.0120   Epoch: 13   Global Step: 66190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:04:40,039-Speed 3408.04 samples/sec   Loss 3.2852   LearningRate 0.0119   Epoch: 13   Global Step: 66200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:04:43,057-Speed 3393.98 samples/sec   Loss 3.2327   LearningRate 0.0119   Epoch: 13   Global Step: 66210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:04:46,079-Speed 3389.38 samples/sec   Loss 3.1051   LearningRate 0.0119   Epoch: 13   Global Step: 66220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:04:49,092-Speed 3399.31 samples/sec   Loss 3.2373   LearningRate 0.0119   Epoch: 13   Global Step: 66230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:04:52,094-Speed 3412.13 samples/sec   Loss 3.1890   LearningRate 0.0119   Epoch: 13   Global Step: 66240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:04:55,149-Speed 3352.84 samples/sec   Loss 3.1810   LearningRate 0.0119   Epoch: 13   Global Step: 66250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:04:58,160-Speed 3400.65 samples/sec   Loss 3.1655   LearningRate 0.0119   Epoch: 13   Global Step: 66260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:05:01,188-Speed 3383.65 samples/sec   Loss 3.3551   LearningRate 0.0119   Epoch: 13   Global Step: 66270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:05:04,192-Speed 3409.59 samples/sec   Loss 3.2920   LearningRate 0.0119   Epoch: 13   Global Step: 66280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:05:07,207-Speed 3397.35 samples/sec   Loss 3.1971   LearningRate 0.0119   Epoch: 13   Global Step: 66290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:05:10,216-Speed 3404.25 samples/sec   Loss 3.2080   LearningRate 0.0119   Epoch: 13   Global Step: 66300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:05:13,201-Speed 3431.16 samples/sec   Loss 3.3073   LearningRate 0.0119   Epoch: 13   Global Step: 66310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:05:16,216-Speed 3397.13 samples/sec   Loss 3.2149   LearningRate 0.0119   Epoch: 13   Global Step: 66320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:05:19,252-Speed 3373.91 samples/sec   Loss 3.3048   LearningRate 0.0119   Epoch: 13   Global Step: 66330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:05:22,259-Speed 3406.15 samples/sec   Loss 3.1872   LearningRate 0.0118   Epoch: 13   Global Step: 66340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:05:25,328-Speed 3336.97 samples/sec   Loss 3.1872   LearningRate 0.0118   Epoch: 13   Global Step: 66350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:05:28,346-Speed 3394.45 samples/sec   Loss 3.3036   LearningRate 0.0118   Epoch: 13   Global Step: 66360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:05:31,371-Speed 3387.19 samples/sec   Loss 3.2995   LearningRate 0.0118   Epoch: 13   Global Step: 66370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:05:34,367-Speed 3418.45 samples/sec   Loss 3.3356   LearningRate 0.0118   Epoch: 13   Global Step: 66380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:05:37,469-Speed 3302.08 samples/sec   Loss 3.3528   LearningRate 0.0118   Epoch: 13   Global Step: 66390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:05:40,530-Speed 3346.54 samples/sec   Loss 3.3228   LearningRate 0.0118   Epoch: 13   Global Step: 66400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:05:43,546-Speed 3395.03 samples/sec   Loss 3.2435   LearningRate 0.0118   Epoch: 13   Global Step: 66410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:05:46,547-Speed 3413.89 samples/sec   Loss 3.3007   LearningRate 0.0118   Epoch: 13   Global Step: 66420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:05:49,560-Speed 3398.35 samples/sec   Loss 3.1444   LearningRate 0.0118   Epoch: 13   Global Step: 66430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:05:52,564-Speed 3410.10 samples/sec   Loss 3.3043   LearningRate 0.0118   Epoch: 13   Global Step: 66440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:05:55,564-Speed 3413.69 samples/sec   Loss 3.2533   LearningRate 0.0118   Epoch: 13   Global Step: 66450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:05:58,577-Speed 3400.34 samples/sec   Loss 3.3193   LearningRate 0.0118   Epoch: 13   Global Step: 66460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:01,628-Speed 3357.48 samples/sec   Loss 3.2539   LearningRate 0.0118   Epoch: 13   Global Step: 66470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:04,691-Speed 3344.35 samples/sec   Loss 3.3169   LearningRate 0.0118   Epoch: 13   Global Step: 66480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:07,703-Speed 3399.79 samples/sec   Loss 3.2813   LearningRate 0.0117   Epoch: 13   Global Step: 66490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:10,727-Speed 3387.48 samples/sec   Loss 3.2131   LearningRate 0.0117   Epoch: 13   Global Step: 66500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:13,717-Speed 3425.43 samples/sec   Loss 3.3036   LearningRate 0.0117   Epoch: 13   Global Step: 66510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:16,723-Speed 3407.71 samples/sec   Loss 3.3434   LearningRate 0.0117   Epoch: 13   Global Step: 66520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:19,721-Speed 3415.90 samples/sec   Loss 3.2234   LearningRate 0.0117   Epoch: 13   Global Step: 66530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:22,720-Speed 3415.30 samples/sec   Loss 3.2096   LearningRate 0.0117   Epoch: 13   Global Step: 66540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:25,767-Speed 3361.65 samples/sec   Loss 3.2690   LearningRate 0.0117   Epoch: 13   Global Step: 66550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:28,763-Speed 3418.61 samples/sec   Loss 3.2815   LearningRate 0.0117   Epoch: 13   Global Step: 66560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:31,763-Speed 3414.93 samples/sec   Loss 3.3278   LearningRate 0.0117   Epoch: 13   Global Step: 66570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:34,761-Speed 3415.92 samples/sec   Loss 3.2908   LearningRate 0.0117   Epoch: 13   Global Step: 66580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:37,776-Speed 3397.23 samples/sec   Loss 3.2470   LearningRate 0.0117   Epoch: 13   Global Step: 66590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:40,774-Speed 3416.08 samples/sec   Loss 3.4171   LearningRate 0.0117   Epoch: 13   Global Step: 66600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:43,751-Speed 3440.79 samples/sec   Loss 3.4514   LearningRate 0.0117   Epoch: 13   Global Step: 66610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:46,754-Speed 3411.21 samples/sec   Loss 3.3741   LearningRate 0.0117   Epoch: 13   Global Step: 66620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:49,800-Speed 3362.44 samples/sec   Loss 3.3453   LearningRate 0.0117   Epoch: 13   Global Step: 66630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:52,805-Speed 3408.44 samples/sec   Loss 3.4533   LearningRate 0.0116   Epoch: 13   Global Step: 66640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:55,804-Speed 3415.50 samples/sec   Loss 3.2885   LearningRate 0.0116   Epoch: 13   Global Step: 66650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:06:58,849-Speed 3363.58 samples/sec   Loss 3.3049   LearningRate 0.0116   Epoch: 13   Global Step: 66660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:07:01,855-Speed 3408.07 samples/sec   Loss 3.1475   LearningRate 0.0116   Epoch: 13   Global Step: 66670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:07:04,861-Speed 3406.93 samples/sec   Loss 3.2636   LearningRate 0.0116   Epoch: 13   Global Step: 66680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:07:07,870-Speed 3404.49 samples/sec   Loss 3.3108   LearningRate 0.0116   Epoch: 13   Global Step: 66690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:07:10,877-Speed 3405.87 samples/sec   Loss 3.2780   LearningRate 0.0116   Epoch: 13   Global Step: 66700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:07:13,879-Speed 3411.33 samples/sec   Loss 3.4563   LearningRate 0.0116   Epoch: 13   Global Step: 66710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 06:07:16,872-Speed 3422.85 samples/sec   Loss 3.3771   LearningRate 0.0116   Epoch: 13   Global Step: 66720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:07:19,883-Speed 3401.68 samples/sec   Loss 3.3260   LearningRate 0.0116   Epoch: 13   Global Step: 66730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:07:22,888-Speed 3408.67 samples/sec   Loss 3.3029   LearningRate 0.0116   Epoch: 13   Global Step: 66740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:07:25,895-Speed 3405.92 samples/sec   Loss 3.3103   LearningRate 0.0116   Epoch: 13   Global Step: 66750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:07:28,900-Speed 3408.42 samples/sec   Loss 3.3359   LearningRate 0.0116   Epoch: 13   Global Step: 66760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:07:31,902-Speed 3412.57 samples/sec   Loss 3.3019   LearningRate 0.0116   Epoch: 13   Global Step: 66770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:07:34,985-Speed 3322.07 samples/sec   Loss 3.2611   LearningRate 0.0116   Epoch: 13   Global Step: 66780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:07:38,002-Speed 3395.20 samples/sec   Loss 3.3780   LearningRate 0.0115   Epoch: 13   Global Step: 66790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:07:41,176-Speed 3227.02 samples/sec   Loss 3.3457   LearningRate 0.0115   Epoch: 13   Global Step: 66800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:07:44,294-Speed 3284.04 samples/sec   Loss 3.3920   LearningRate 0.0115   Epoch: 13   Global Step: 66810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:07:47,294-Speed 3414.99 samples/sec   Loss 3.4095   LearningRate 0.0115   Epoch: 13   Global Step: 66820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:07:50,340-Speed 3362.15 samples/sec   Loss 3.3164   LearningRate 0.0115   Epoch: 13   Global Step: 66830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:07:53,360-Speed 3391.71 samples/sec   Loss 3.3692   LearningRate 0.0115   Epoch: 13   Global Step: 66840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:07:56,368-Speed 3405.25 samples/sec   Loss 3.4462   LearningRate 0.0115   Epoch: 13   Global Step: 66850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:07:59,370-Speed 3411.61 samples/sec   Loss 3.3892   LearningRate 0.0115   Epoch: 13   Global Step: 66860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:02,410-Speed 3370.16 samples/sec   Loss 3.4609   LearningRate 0.0115   Epoch: 13   Global Step: 66870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:05,414-Speed 3409.60 samples/sec   Loss 3.3078   LearningRate 0.0115   Epoch: 13   Global Step: 66880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:08,412-Speed 3415.40 samples/sec   Loss 3.3558   LearningRate 0.0115   Epoch: 13   Global Step: 66890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:11,431-Speed 3393.50 samples/sec   Loss 3.3197   LearningRate 0.0115   Epoch: 13   Global Step: 66900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:14,433-Speed 3411.06 samples/sec   Loss 3.2149   LearningRate 0.0115   Epoch: 13   Global Step: 66910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:17,450-Speed 3395.92 samples/sec   Loss 3.4654   LearningRate 0.0115   Epoch: 13   Global Step: 66920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:20,449-Speed 3414.95 samples/sec   Loss 3.3755   LearningRate 0.0114   Epoch: 13   Global Step: 66930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 06:08:23,431-Speed 3434.35 samples/sec   Loss 3.4891   LearningRate 0.0114   Epoch: 13   Global Step: 66940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:26,447-Speed 3396.94 samples/sec   Loss 3.3476   LearningRate 0.0114   Epoch: 13   Global Step: 66950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:29,452-Speed 3408.14 samples/sec   Loss 3.3487   LearningRate 0.0114   Epoch: 13   Global Step: 66960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:32,477-Speed 3385.74 samples/sec   Loss 3.2795   LearningRate 0.0114   Epoch: 13   Global Step: 66970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:35,491-Speed 3398.82 samples/sec   Loss 3.4362   LearningRate 0.0114   Epoch: 13   Global Step: 66980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:38,501-Speed 3402.68 samples/sec   Loss 3.4542   LearningRate 0.0114   Epoch: 13   Global Step: 66990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:41,504-Speed 3410.05 samples/sec   Loss 3.3175   LearningRate 0.0114   Epoch: 13   Global Step: 67000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:44,506-Speed 3412.68 samples/sec   Loss 3.3207   LearningRate 0.0114   Epoch: 13   Global Step: 67010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:08:47,495-Speed 3426.46 samples/sec   Loss 3.3979   LearningRate 0.0114   Epoch: 13   Global Step: 67020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:08:50,514-Speed 3393.01 samples/sec   Loss 3.3367   LearningRate 0.0114   Epoch: 13   Global Step: 67030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:08:53,564-Speed 3357.26 samples/sec   Loss 3.3253   LearningRate 0.0114   Epoch: 13   Global Step: 67040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:08:56,580-Speed 3396.83 samples/sec   Loss 3.4028   LearningRate 0.0114   Epoch: 13   Global Step: 67050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:08:59,596-Speed 3395.70 samples/sec   Loss 3.3582   LearningRate 0.0114   Epoch: 13   Global Step: 67060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:09:02,601-Speed 3408.89 samples/sec   Loss 3.3966   LearningRate 0.0114   Epoch: 13   Global Step: 67070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:09:05,607-Speed 3407.91 samples/sec   Loss 3.3909   LearningRate 0.0113   Epoch: 13   Global Step: 67080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:09:08,616-Speed 3403.77 samples/sec   Loss 3.5222   LearningRate 0.0113   Epoch: 13   Global Step: 67090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:09:11,617-Speed 3413.31 samples/sec   Loss 3.4249   LearningRate 0.0113   Epoch: 13   Global Step: 67100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:09:14,613-Speed 3417.81 samples/sec   Loss 3.3972   LearningRate 0.0113   Epoch: 13   Global Step: 67110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 06:09:17,635-Speed 3389.80 samples/sec   Loss 3.3231   LearningRate 0.0113   Epoch: 13   Global Step: 67120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:09:20,645-Speed 3403.17 samples/sec   Loss 3.3539   LearningRate 0.0113   Epoch: 13   Global Step: 67130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:09:23,643-Speed 3415.48 samples/sec   Loss 3.2996   LearningRate 0.0113   Epoch: 13   Global Step: 67140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:09:26,708-Speed 3342.04 samples/sec   Loss 3.3668   LearningRate 0.0113   Epoch: 13   Global Step: 67150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 06:09:29,724-Speed 3396.60 samples/sec   Loss 3.3769   LearningRate 0.0113   Epoch: 13   Global Step: 67160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:09:32,725-Speed 3412.83 samples/sec   Loss 3.4521   LearningRate 0.0113   Epoch: 13   Global Step: 67170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:09:35,731-Speed 3407.48 samples/sec   Loss 3.2845   LearningRate 0.0113   Epoch: 13   Global Step: 67180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:09:38,736-Speed 3408.32 samples/sec   Loss 3.4847   LearningRate 0.0113   Epoch: 13   Global Step: 67190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:09:41,737-Speed 3412.97 samples/sec   Loss 3.4117   LearningRate 0.0113   Epoch: 13   Global Step: 67200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:09:44,721-Speed 3433.07 samples/sec   Loss 3.4648   LearningRate 0.0113   Epoch: 13   Global Step: 67210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:09:47,756-Speed 3374.34 samples/sec   Loss 3.3908   LearningRate 0.0113   Epoch: 13   Global Step: 67220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:09:50,828-Speed 3334.51 samples/sec   Loss 3.3294   LearningRate 0.0112   Epoch: 13   Global Step: 67230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:09:53,828-Speed 3413.51 samples/sec   Loss 3.4712   LearningRate 0.0112   Epoch: 13   Global Step: 67240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:09:56,832-Speed 3410.52 samples/sec   Loss 3.3132   LearningRate 0.0112   Epoch: 13   Global Step: 67250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:09:59,859-Speed 3383.40 samples/sec   Loss 3.3348   LearningRate 0.0112   Epoch: 13   Global Step: 67260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:02,879-Speed 3392.18 samples/sec   Loss 3.4215   LearningRate 0.0112   Epoch: 13   Global Step: 67270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:05,991-Speed 3291.00 samples/sec   Loss 3.3604   LearningRate 0.0112   Epoch: 13   Global Step: 67280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:09,025-Speed 3376.18 samples/sec   Loss 3.2591   LearningRate 0.0112   Epoch: 13   Global Step: 67290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:12,042-Speed 3395.33 samples/sec   Loss 3.3414   LearningRate 0.0112   Epoch: 13   Global Step: 67300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:15,062-Speed 3390.53 samples/sec   Loss 3.5142   LearningRate 0.0112   Epoch: 13   Global Step: 67310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:10:18,055-Speed 3422.39 samples/sec   Loss 3.4329   LearningRate 0.0112   Epoch: 13   Global Step: 67320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:21,060-Speed 3408.88 samples/sec   Loss 3.5412   LearningRate 0.0112   Epoch: 13   Global Step: 67330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:24,089-Speed 3381.90 samples/sec   Loss 3.4405   LearningRate 0.0112   Epoch: 13   Global Step: 67340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:27,105-Speed 3395.43 samples/sec   Loss 3.4319   LearningRate 0.0112   Epoch: 13   Global Step: 67350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:30,113-Speed 3405.97 samples/sec   Loss 3.3538   LearningRate 0.0112   Epoch: 13   Global Step: 67360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:33,118-Speed 3407.77 samples/sec   Loss 3.5216   LearningRate 0.0112   Epoch: 13   Global Step: 67370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:36,125-Speed 3406.52 samples/sec   Loss 3.4123   LearningRate 0.0112   Epoch: 13   Global Step: 67380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:39,130-Speed 3408.02 samples/sec   Loss 3.4326   LearningRate 0.0111   Epoch: 13   Global Step: 67390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:42,133-Speed 3411.54 samples/sec   Loss 3.4658   LearningRate 0.0111   Epoch: 13   Global Step: 67400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:45,145-Speed 3399.87 samples/sec   Loss 3.4205   LearningRate 0.0111   Epoch: 13   Global Step: 67410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:10:48,147-Speed 3411.70 samples/sec   Loss 3.4354   LearningRate 0.0111   Epoch: 13   Global Step: 67420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:10:51,152-Speed 3408.93 samples/sec   Loss 3.4131   LearningRate 0.0111   Epoch: 13   Global Step: 67430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:10:54,153-Speed 3412.70 samples/sec   Loss 3.3534   LearningRate 0.0111   Epoch: 13   Global Step: 67440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:10:57,156-Speed 3411.37 samples/sec   Loss 3.4686   LearningRate 0.0111   Epoch: 13   Global Step: 67450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:11:00,164-Speed 3405.54 samples/sec   Loss 3.3796   LearningRate 0.0111   Epoch: 13   Global Step: 67460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:11:03,153-Speed 3426.68 samples/sec   Loss 3.5727   LearningRate 0.0111   Epoch: 13   Global Step: 67470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:11:06,157-Speed 3409.45 samples/sec   Loss 3.4242   LearningRate 0.0111   Epoch: 13   Global Step: 67480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:11:09,165-Speed 3404.47 samples/sec   Loss 3.3825   LearningRate 0.0111   Epoch: 13   Global Step: 67490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:11:12,168-Speed 3410.86 samples/sec   Loss 3.5662   LearningRate 0.0111   Epoch: 13   Global Step: 67500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:11:15,175-Speed 3406.86 samples/sec   Loss 3.3459   LearningRate 0.0111   Epoch: 13   Global Step: 67510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:11:18,187-Speed 3400.68 samples/sec   Loss 3.4980   LearningRate 0.0111   Epoch: 13   Global Step: 67520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:11:21,187-Speed 3413.61 samples/sec   Loss 3.4032   LearningRate 0.0111   Epoch: 13   Global Step: 67530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:11:24,190-Speed 3410.68 samples/sec   Loss 3.3705   LearningRate 0.0110   Epoch: 13   Global Step: 67540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:11:27,205-Speed 3398.12 samples/sec   Loss 3.3549   LearningRate 0.0110   Epoch: 13   Global Step: 67550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:11:30,210-Speed 3407.81 samples/sec   Loss 3.5523   LearningRate 0.0110   Epoch: 13   Global Step: 67560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:11:33,210-Speed 3414.48 samples/sec   Loss 3.4183   LearningRate 0.0110   Epoch: 13   Global Step: 67570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:11:36,210-Speed 3414.48 samples/sec   Loss 3.4265   LearningRate 0.0110   Epoch: 13   Global Step: 67580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:11:39,229-Speed 3391.56 samples/sec   Loss 3.3879   LearningRate 0.0110   Epoch: 13   Global Step: 67590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:11:42,243-Speed 3398.49 samples/sec   Loss 3.5470   LearningRate 0.0110   Epoch: 13   Global Step: 67600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:11:45,246-Speed 3411.13 samples/sec   Loss 3.4450   LearningRate 0.0110   Epoch: 13   Global Step: 67610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:11:48,250-Speed 3409.51 samples/sec   Loss 3.3831   LearningRate 0.0110   Epoch: 13   Global Step: 67620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:11:51,256-Speed 3408.22 samples/sec   Loss 3.3946   LearningRate 0.0110   Epoch: 13   Global Step: 67630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:11:54,257-Speed 3412.25 samples/sec   Loss 3.5376   LearningRate 0.0110   Epoch: 13   Global Step: 67640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:11:57,256-Speed 3416.13 samples/sec   Loss 3.5718   LearningRate 0.0110   Epoch: 13   Global Step: 67650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:12:00,278-Speed 3388.69 samples/sec   Loss 3.4813   LearningRate 0.0110   Epoch: 13   Global Step: 67660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:03,327-Speed 3359.96 samples/sec   Loss 3.3552   LearningRate 0.0110   Epoch: 13   Global Step: 67670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:06,371-Speed 3364.59 samples/sec   Loss 3.4897   LearningRate 0.0110   Epoch: 13   Global Step: 67680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:09,378-Speed 3406.03 samples/sec   Loss 3.4827   LearningRate 0.0109   Epoch: 13   Global Step: 67690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:12,400-Speed 3389.48 samples/sec   Loss 3.5704   LearningRate 0.0109   Epoch: 13   Global Step: 67700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:15,412-Speed 3400.38 samples/sec   Loss 3.5089   LearningRate 0.0109   Epoch: 13   Global Step: 67710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:18,426-Speed 3398.82 samples/sec   Loss 3.4249   LearningRate 0.0109   Epoch: 13   Global Step: 67720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:21,430-Speed 3409.60 samples/sec   Loss 3.4303   LearningRate 0.0109   Epoch: 13   Global Step: 67730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:24,434-Speed 3409.82 samples/sec   Loss 3.5960   LearningRate 0.0109   Epoch: 13   Global Step: 67740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:27,450-Speed 3396.26 samples/sec   Loss 3.4765   LearningRate 0.0109   Epoch: 13   Global Step: 67750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:30,459-Speed 3403.39 samples/sec   Loss 3.3333   LearningRate 0.0109   Epoch: 13   Global Step: 67760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:12:33,462-Speed 3410.56 samples/sec   Loss 3.4668   LearningRate 0.0109   Epoch: 13   Global Step: 67770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:12:36,488-Speed 3384.88 samples/sec   Loss 3.4244   LearningRate 0.0109   Epoch: 13   Global Step: 67780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:12:39,527-Speed 3370.24 samples/sec   Loss 3.4118   LearningRate 0.0109   Epoch: 13   Global Step: 67790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:42,546-Speed 3393.16 samples/sec   Loss 3.3178   LearningRate 0.0109   Epoch: 13   Global Step: 67800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:45,551-Speed 3407.53 samples/sec   Loss 3.4808   LearningRate 0.0109   Epoch: 13   Global Step: 67810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:48,555-Speed 3409.73 samples/sec   Loss 3.4180   LearningRate 0.0109   Epoch: 13   Global Step: 67820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:51,564-Speed 3404.49 samples/sec   Loss 3.4996   LearningRate 0.0109   Epoch: 13   Global Step: 67830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:54,607-Speed 3365.99 samples/sec   Loss 3.2613   LearningRate 0.0108   Epoch: 13   Global Step: 67840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:12:57,635-Speed 3383.33 samples/sec   Loss 3.4559   LearningRate 0.0108   Epoch: 13   Global Step: 67850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:13:00,647-Speed 3400.28 samples/sec   Loss 3.4190   LearningRate 0.0108   Epoch: 13   Global Step: 67860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:13:03,649-Speed 3411.19 samples/sec   Loss 3.4737   LearningRate 0.0108   Epoch: 13   Global Step: 67870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:13:06,664-Speed 3397.53 samples/sec   Loss 3.4267   LearningRate 0.0108   Epoch: 13   Global Step: 67880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:13:09,665-Speed 3412.88 samples/sec   Loss 3.5035   LearningRate 0.0108   Epoch: 13   Global Step: 67890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:13:12,668-Speed 3411.28 samples/sec   Loss 3.4506   LearningRate 0.0108   Epoch: 13   Global Step: 67900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:13:15,719-Speed 3357.42 samples/sec   Loss 3.3756   LearningRate 0.0108   Epoch: 13   Global Step: 67910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:13:18,785-Speed 3340.49 samples/sec   Loss 3.5643   LearningRate 0.0108   Epoch: 13   Global Step: 67920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:13:21,792-Speed 3406.36 samples/sec   Loss 3.4311   LearningRate 0.0108   Epoch: 13   Global Step: 67930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:13:24,810-Speed 3394.38 samples/sec   Loss 3.6493   LearningRate 0.0108   Epoch: 13   Global Step: 67940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:13:27,833-Speed 3387.77 samples/sec   Loss 3.5987   LearningRate 0.0108   Epoch: 13   Global Step: 67950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:13:30,871-Speed 3371.78 samples/sec   Loss 3.4208   LearningRate 0.0108   Epoch: 13   Global Step: 67960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:13:33,879-Speed 3404.86 samples/sec   Loss 3.3550   LearningRate 0.0108   Epoch: 13   Global Step: 67970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:13:36,920-Speed 3367.78 samples/sec   Loss 3.5627   LearningRate 0.0108   Epoch: 13   Global Step: 67980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:13:39,939-Speed 3392.77 samples/sec   Loss 3.4549   LearningRate 0.0108   Epoch: 13   Global Step: 67990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:13:42,953-Speed 3397.88 samples/sec   Loss 3.4341   LearningRate 0.0107   Epoch: 13   Global Step: 68000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:14:27,204-[lfw][68000]XNorm: 23.713765
Training: 2022-04-11 06:14:27,205-[lfw][68000]Accuracy-Flip: 0.99767+-0.00260
Training: 2022-04-11 06:14:27,205-[lfw][68000]Accuracy-Highest: 0.99850
Training: 2022-04-11 06:15:18,707-[cfp_fp][68000]XNorm: 22.193234
Training: 2022-04-11 06:15:18,707-[cfp_fp][68000]Accuracy-Flip: 0.98300+-0.00555
Training: 2022-04-11 06:15:18,708-[cfp_fp][68000]Accuracy-Highest: 0.98386
Training: 2022-04-11 06:16:03,243-[agedb_30][68000]XNorm: 23.814715
Training: 2022-04-11 06:16:03,244-[agedb_30][68000]Accuracy-Flip: 0.98350+-0.00681
Training: 2022-04-11 06:16:03,245-[agedb_30][68000]Accuracy-Highest: 0.98350
Training: 2022-04-11 06:16:06,256-Speed 71.46 samples/sec   Loss 3.4309   LearningRate 0.0107   Epoch: 13   Global Step: 68010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:16:09,242-Speed 3429.41 samples/sec   Loss 3.5168   LearningRate 0.0107   Epoch: 13   Global Step: 68020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:16:12,250-Speed 3404.97 samples/sec   Loss 3.3346   LearningRate 0.0107   Epoch: 13   Global Step: 68030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:16:15,244-Speed 3421.78 samples/sec   Loss 3.4914   LearningRate 0.0107   Epoch: 13   Global Step: 68040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:16:18,244-Speed 3413.98 samples/sec   Loss 3.5780   LearningRate 0.0107   Epoch: 13   Global Step: 68050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:16:21,237-Speed 3422.44 samples/sec   Loss 3.6364   LearningRate 0.0107   Epoch: 13   Global Step: 68060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:16:24,222-Speed 3431.55 samples/sec   Loss 3.4713   LearningRate 0.0107   Epoch: 13   Global Step: 68070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:16:27,226-Speed 3408.95 samples/sec   Loss 3.4850   LearningRate 0.0107   Epoch: 13   Global Step: 68080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:16:30,304-Speed 3327.36 samples/sec   Loss 3.4962   LearningRate 0.0107   Epoch: 13   Global Step: 68090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:16:33,299-Speed 3420.31 samples/sec   Loss 3.4609   LearningRate 0.0107   Epoch: 13   Global Step: 68100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:16:36,298-Speed 3416.03 samples/sec   Loss 3.4126   LearningRate 0.0107   Epoch: 13   Global Step: 68110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:16:39,340-Speed 3367.66 samples/sec   Loss 3.4961   LearningRate 0.0107   Epoch: 13   Global Step: 68120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:16:42,384-Speed 3364.47 samples/sec   Loss 3.4185   LearningRate 0.0107   Epoch: 13   Global Step: 68130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:16:45,391-Speed 3406.62 samples/sec   Loss 3.4429   LearningRate 0.0107   Epoch: 13   Global Step: 68140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:16:48,384-Speed 3422.36 samples/sec   Loss 3.4446   LearningRate 0.0106   Epoch: 13   Global Step: 68150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:16:51,385-Speed 3413.25 samples/sec   Loss 3.4552   LearningRate 0.0106   Epoch: 13   Global Step: 68160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:16:54,430-Speed 3363.95 samples/sec   Loss 3.4556   LearningRate 0.0106   Epoch: 13   Global Step: 68170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:16:57,433-Speed 3410.01 samples/sec   Loss 3.5487   LearningRate 0.0106   Epoch: 13   Global Step: 68180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:17:00,436-Speed 3411.49 samples/sec   Loss 3.4540   LearningRate 0.0106   Epoch: 13   Global Step: 68190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:17:03,511-Speed 3331.12 samples/sec   Loss 3.4846   LearningRate 0.0106   Epoch: 13   Global Step: 68200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:17:06,521-Speed 3401.82 samples/sec   Loss 3.3109   LearningRate 0.0106   Epoch: 13   Global Step: 68210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:17:09,518-Speed 3418.06 samples/sec   Loss 3.4166   LearningRate 0.0106   Epoch: 13   Global Step: 68220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:17:12,520-Speed 3411.60 samples/sec   Loss 3.4519   LearningRate 0.0106   Epoch: 13   Global Step: 68230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:17:15,524-Speed 3411.01 samples/sec   Loss 3.3921   LearningRate 0.0106   Epoch: 13   Global Step: 68240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:17:18,520-Speed 3418.56 samples/sec   Loss 3.3440   LearningRate 0.0106   Epoch: 13   Global Step: 68250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:17:21,520-Speed 3414.98 samples/sec   Loss 3.4051   LearningRate 0.0106   Epoch: 13   Global Step: 68260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:17:24,524-Speed 3409.34 samples/sec   Loss 3.5386   LearningRate 0.0106   Epoch: 13   Global Step: 68270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 06:17:27,535-Speed 3401.45 samples/sec   Loss 3.3283   LearningRate 0.0106   Epoch: 13   Global Step: 68280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:17:30,545-Speed 3402.51 samples/sec   Loss 3.4957   LearningRate 0.0106   Epoch: 13   Global Step: 68290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:17:33,542-Speed 3418.11 samples/sec   Loss 3.5025   LearningRate 0.0106   Epoch: 13   Global Step: 68300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:17:36,538-Speed 3418.30 samples/sec   Loss 3.4624   LearningRate 0.0105   Epoch: 13   Global Step: 68310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:17:39,524-Speed 3429.99 samples/sec   Loss 3.5511   LearningRate 0.0105   Epoch: 13   Global Step: 68320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:17:42,543-Speed 3393.64 samples/sec   Loss 3.3819   LearningRate 0.0105   Epoch: 13   Global Step: 68330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:17:45,546-Speed 3411.18 samples/sec   Loss 3.4719   LearningRate 0.0105   Epoch: 13   Global Step: 68340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:17:48,573-Speed 3383.86 samples/sec   Loss 3.4417   LearningRate 0.0105   Epoch: 13   Global Step: 68350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:17:51,578-Speed 3407.53 samples/sec   Loss 3.4622   LearningRate 0.0105   Epoch: 13   Global Step: 68360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:17:54,576-Speed 3417.07 samples/sec   Loss 3.4627   LearningRate 0.0105   Epoch: 13   Global Step: 68370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:17:57,587-Speed 3401.17 samples/sec   Loss 3.4251   LearningRate 0.0105   Epoch: 13   Global Step: 68380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:18:00,637-Speed 3358.97 samples/sec   Loss 3.4389   LearningRate 0.0105   Epoch: 13   Global Step: 68390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:18:03,657-Speed 3390.67 samples/sec   Loss 3.5272   LearningRate 0.0105   Epoch: 13   Global Step: 68400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:18:06,663-Speed 3407.21 samples/sec   Loss 3.3235   LearningRate 0.0105   Epoch: 13   Global Step: 68410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:18:09,664-Speed 3413.70 samples/sec   Loss 3.4557   LearningRate 0.0105   Epoch: 13   Global Step: 68420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:12,665-Speed 3413.79 samples/sec   Loss 3.4266   LearningRate 0.0105   Epoch: 13   Global Step: 68430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:15,670-Speed 3407.94 samples/sec   Loss 3.4928   LearningRate 0.0105   Epoch: 13   Global Step: 68440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:18,666-Speed 3419.34 samples/sec   Loss 3.4549   LearningRate 0.0105   Epoch: 13   Global Step: 68450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:21,664-Speed 3416.48 samples/sec   Loss 3.5328   LearningRate 0.0104   Epoch: 13   Global Step: 68460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:24,673-Speed 3403.06 samples/sec   Loss 3.4340   LearningRate 0.0104   Epoch: 13   Global Step: 68470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:27,671-Speed 3417.18 samples/sec   Loss 3.3031   LearningRate 0.0104   Epoch: 13   Global Step: 68480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:30,700-Speed 3381.51 samples/sec   Loss 3.3972   LearningRate 0.0104   Epoch: 13   Global Step: 68490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:33,705-Speed 3408.53 samples/sec   Loss 3.4793   LearningRate 0.0104   Epoch: 13   Global Step: 68500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:36,725-Speed 3390.61 samples/sec   Loss 3.3901   LearningRate 0.0104   Epoch: 13   Global Step: 68510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:39,713-Speed 3428.18 samples/sec   Loss 3.4433   LearningRate 0.0104   Epoch: 13   Global Step: 68520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:42,714-Speed 3413.78 samples/sec   Loss 3.3664   LearningRate 0.0104   Epoch: 13   Global Step: 68530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:45,717-Speed 3410.65 samples/sec   Loss 3.2703   LearningRate 0.0104   Epoch: 13   Global Step: 68540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:48,724-Speed 3406.51 samples/sec   Loss 3.4704   LearningRate 0.0104   Epoch: 13   Global Step: 68550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:51,770-Speed 3361.56 samples/sec   Loss 3.3875   LearningRate 0.0104   Epoch: 13   Global Step: 68560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:54,779-Speed 3405.14 samples/sec   Loss 3.4754   LearningRate 0.0104   Epoch: 13   Global Step: 68570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:18:57,786-Speed 3405.25 samples/sec   Loss 3.4191   LearningRate 0.0104   Epoch: 13   Global Step: 68580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:19:00,837-Speed 3357.95 samples/sec   Loss 3.4548   LearningRate 0.0104   Epoch: 13   Global Step: 68590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:19:03,850-Speed 3398.58 samples/sec   Loss 3.4067   LearningRate 0.0104   Epoch: 13   Global Step: 68600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:19:06,866-Speed 3395.90 samples/sec   Loss 3.4783   LearningRate 0.0104   Epoch: 13   Global Step: 68610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:19:09,848-Speed 3435.92 samples/sec   Loss 3.4644   LearningRate 0.0103   Epoch: 13   Global Step: 68620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:19:12,851-Speed 3410.36 samples/sec   Loss 3.4512   LearningRate 0.0103   Epoch: 13   Global Step: 68630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:19:15,856-Speed 3408.77 samples/sec   Loss 3.4645   LearningRate 0.0103   Epoch: 13   Global Step: 68640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:19:18,856-Speed 3414.40 samples/sec   Loss 3.5104   LearningRate 0.0103   Epoch: 13   Global Step: 68650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:19:21,863-Speed 3405.64 samples/sec   Loss 3.4283   LearningRate 0.0103   Epoch: 13   Global Step: 68660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:19:24,884-Speed 3390.24 samples/sec   Loss 3.6559   LearningRate 0.0103   Epoch: 13   Global Step: 68670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:19:27,882-Speed 3416.34 samples/sec   Loss 3.5924   LearningRate 0.0103   Epoch: 13   Global Step: 68680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:19:30,886-Speed 3410.65 samples/sec   Loss 3.4766   LearningRate 0.0103   Epoch: 13   Global Step: 68690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:19:33,890-Speed 3408.76 samples/sec   Loss 3.5354   LearningRate 0.0103   Epoch: 13   Global Step: 68700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:19:36,895-Speed 3408.93 samples/sec   Loss 3.5134   LearningRate 0.0103   Epoch: 13   Global Step: 68710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:19:39,924-Speed 3381.67 samples/sec   Loss 3.5661   LearningRate 0.0103   Epoch: 13   Global Step: 68720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:19:42,924-Speed 3413.71 samples/sec   Loss 3.3729   LearningRate 0.0103   Epoch: 13   Global Step: 68730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:19:45,936-Speed 3401.91 samples/sec   Loss 3.4449   LearningRate 0.0103   Epoch: 13   Global Step: 68740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:19:48,982-Speed 3362.13 samples/sec   Loss 3.4445   LearningRate 0.0103   Epoch: 13   Global Step: 68750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:19:51,984-Speed 3411.66 samples/sec   Loss 3.5933   LearningRate 0.0103   Epoch: 13   Global Step: 68760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:19:54,984-Speed 3414.27 samples/sec   Loss 3.4544   LearningRate 0.0103   Epoch: 13   Global Step: 68770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:19:57,985-Speed 3412.85 samples/sec   Loss 3.4143   LearningRate 0.0102   Epoch: 13   Global Step: 68780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:20:01,009-Speed 3386.70 samples/sec   Loss 3.3734   LearningRate 0.0102   Epoch: 13   Global Step: 68790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:20:04,016-Speed 3406.42 samples/sec   Loss 3.4071   LearningRate 0.0102   Epoch: 13   Global Step: 68800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:20:07,028-Speed 3399.94 samples/sec   Loss 3.5214   LearningRate 0.0102   Epoch: 13   Global Step: 68810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:20:10,016-Speed 3428.37 samples/sec   Loss 3.4332   LearningRate 0.0102   Epoch: 13   Global Step: 68820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:20:13,036-Speed 3392.38 samples/sec   Loss 3.4746   LearningRate 0.0102   Epoch: 13   Global Step: 68830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:20:16,174-Speed 3263.78 samples/sec   Loss 3.4487   LearningRate 0.0102   Epoch: 13   Global Step: 68840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:20:19,217-Speed 3366.21 samples/sec   Loss 3.4640   LearningRate 0.0102   Epoch: 13   Global Step: 68850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:20:22,224-Speed 3405.73 samples/sec   Loss 3.4624   LearningRate 0.0102   Epoch: 13   Global Step: 68860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:20:25,270-Speed 3363.07 samples/sec   Loss 3.5126   LearningRate 0.0102   Epoch: 13   Global Step: 68870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:20:28,323-Speed 3354.81 samples/sec   Loss 3.4325   LearningRate 0.0102   Epoch: 13   Global Step: 68880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:20:31,321-Speed 3416.59 samples/sec   Loss 3.4304   LearningRate 0.0102   Epoch: 13   Global Step: 68890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:20:34,322-Speed 3412.63 samples/sec   Loss 3.4955   LearningRate 0.0102   Epoch: 13   Global Step: 68900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:20:37,333-Speed 3400.95 samples/sec   Loss 3.4766   LearningRate 0.0102   Epoch: 13   Global Step: 68910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:20:40,355-Speed 3390.09 samples/sec   Loss 3.4854   LearningRate 0.0102   Epoch: 13   Global Step: 68920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:20:43,358-Speed 3410.92 samples/sec   Loss 3.4554   LearningRate 0.0102   Epoch: 13   Global Step: 68930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:20:46,366-Speed 3404.84 samples/sec   Loss 3.4338   LearningRate 0.0101   Epoch: 13   Global Step: 68940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:20:49,386-Speed 3391.41 samples/sec   Loss 3.4888   LearningRate 0.0101   Epoch: 13   Global Step: 68950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:20:52,392-Speed 3408.11 samples/sec   Loss 3.6025   LearningRate 0.0101   Epoch: 13   Global Step: 68960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:20:55,394-Speed 3411.26 samples/sec   Loss 3.3716   LearningRate 0.0101   Epoch: 13   Global Step: 68970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:20:58,397-Speed 3410.30 samples/sec   Loss 3.4979   LearningRate 0.0101   Epoch: 13   Global Step: 68980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:21:01,411-Speed 3399.33 samples/sec   Loss 3.5118   LearningRate 0.0101   Epoch: 13   Global Step: 68990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:21:04,468-Speed 3349.78 samples/sec   Loss 3.4955   LearningRate 0.0101   Epoch: 13   Global Step: 69000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:21:07,469-Speed 3413.08 samples/sec   Loss 3.4931   LearningRate 0.0101   Epoch: 13   Global Step: 69010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:10,471-Speed 3412.15 samples/sec   Loss 3.4881   LearningRate 0.0101   Epoch: 13   Global Step: 69020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:13,474-Speed 3411.24 samples/sec   Loss 3.5507   LearningRate 0.0101   Epoch: 13   Global Step: 69030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:16,491-Speed 3394.74 samples/sec   Loss 3.5700   LearningRate 0.0101   Epoch: 13   Global Step: 69040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:19,495-Speed 3409.16 samples/sec   Loss 3.4800   LearningRate 0.0101   Epoch: 13   Global Step: 69050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:22,497-Speed 3412.47 samples/sec   Loss 3.5402   LearningRate 0.0101   Epoch: 13   Global Step: 69060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:25,509-Speed 3400.37 samples/sec   Loss 3.3650   LearningRate 0.0101   Epoch: 13   Global Step: 69070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:28,546-Speed 3372.07 samples/sec   Loss 3.5011   LearningRate 0.0101   Epoch: 13   Global Step: 69080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:31,555-Speed 3404.53 samples/sec   Loss 3.3006   LearningRate 0.0101   Epoch: 13   Global Step: 69090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:34,562-Speed 3406.36 samples/sec   Loss 3.3804   LearningRate 0.0100   Epoch: 13   Global Step: 69100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:37,578-Speed 3395.59 samples/sec   Loss 3.5079   LearningRate 0.0100   Epoch: 13   Global Step: 69110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:21:40,625-Speed 3362.70 samples/sec   Loss 3.4490   LearningRate 0.0100   Epoch: 13   Global Step: 69120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:43,676-Speed 3356.78 samples/sec   Loss 3.4696   LearningRate 0.0100   Epoch: 13   Global Step: 69130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:46,682-Speed 3407.21 samples/sec   Loss 3.5002   LearningRate 0.0100   Epoch: 13   Global Step: 69140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:49,692-Speed 3402.84 samples/sec   Loss 3.5070   LearningRate 0.0100   Epoch: 13   Global Step: 69150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:52,709-Speed 3395.35 samples/sec   Loss 3.3686   LearningRate 0.0100   Epoch: 13   Global Step: 69160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:55,711-Speed 3411.41 samples/sec   Loss 3.3676   LearningRate 0.0100   Epoch: 13   Global Step: 69170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:21:58,716-Speed 3408.58 samples/sec   Loss 3.5355   LearningRate 0.0100   Epoch: 13   Global Step: 69180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:22:01,771-Speed 3353.61 samples/sec   Loss 3.4736   LearningRate 0.0100   Epoch: 13   Global Step: 69190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:22:05,001-Speed 3170.81 samples/sec   Loss 3.5262   LearningRate 0.0100   Epoch: 13   Global Step: 69200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:22:08,113-Speed 3291.33 samples/sec   Loss 3.4034   LearningRate 0.0100   Epoch: 13   Global Step: 69210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:22:11,124-Speed 3401.99 samples/sec   Loss 3.3300   LearningRate 0.0100   Epoch: 13   Global Step: 69220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:22:14,140-Speed 3395.97 samples/sec   Loss 3.5418   LearningRate 0.0100   Epoch: 13   Global Step: 69230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:22:17,146-Speed 3407.51 samples/sec   Loss 3.5063   LearningRate 0.0100   Epoch: 13   Global Step: 69240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:22:20,155-Speed 3403.26 samples/sec   Loss 3.4462   LearningRate 0.0100   Epoch: 13   Global Step: 69250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:22:23,158-Speed 3411.39 samples/sec   Loss 3.4578   LearningRate 0.0099   Epoch: 13   Global Step: 69260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:22:26,169-Speed 3401.12 samples/sec   Loss 3.4563   LearningRate 0.0099   Epoch: 13   Global Step: 69270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:22:29,188-Speed 3392.82 samples/sec   Loss 3.5275   LearningRate 0.0099   Epoch: 13   Global Step: 69280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:22:32,192-Speed 3410.60 samples/sec   Loss 3.5095   LearningRate 0.0099   Epoch: 13   Global Step: 69290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:22:35,197-Speed 3408.41 samples/sec   Loss 3.4664   LearningRate 0.0099   Epoch: 13   Global Step: 69300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:22:38,209-Speed 3400.66 samples/sec   Loss 3.4188   LearningRate 0.0099   Epoch: 13   Global Step: 69310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:22:41,214-Speed 3408.59 samples/sec   Loss 3.5255   LearningRate 0.0099   Epoch: 13   Global Step: 69320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:22:44,230-Speed 3395.84 samples/sec   Loss 3.4103   LearningRate 0.0099   Epoch: 13   Global Step: 69330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:22:47,236-Speed 3406.51 samples/sec   Loss 3.4744   LearningRate 0.0099   Epoch: 13   Global Step: 69340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:22:50,256-Speed 3391.82 samples/sec   Loss 3.4775   LearningRate 0.0099   Epoch: 13   Global Step: 69350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:22:53,326-Speed 3336.27 samples/sec   Loss 3.4439   LearningRate 0.0099   Epoch: 13   Global Step: 69360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:22:56,334-Speed 3406.15 samples/sec   Loss 3.5220   LearningRate 0.0099   Epoch: 13   Global Step: 69370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:22:59,333-Speed 3415.92 samples/sec   Loss 3.5189   LearningRate 0.0099   Epoch: 13   Global Step: 69380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:23:02,340-Speed 3406.03 samples/sec   Loss 3.4441   LearningRate 0.0099   Epoch: 13   Global Step: 69390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:23:05,347-Speed 3406.70 samples/sec   Loss 3.5249   LearningRate 0.0099   Epoch: 13   Global Step: 69400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:23:08,351-Speed 3409.64 samples/sec   Loss 3.5210   LearningRate 0.0099   Epoch: 13   Global Step: 69410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:23:11,370-Speed 3392.13 samples/sec   Loss 3.4849   LearningRate 0.0098   Epoch: 13   Global Step: 69420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:23:14,372-Speed 3411.45 samples/sec   Loss 3.4365   LearningRate 0.0098   Epoch: 13   Global Step: 69430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:23:17,393-Speed 3390.99 samples/sec   Loss 3.4325   LearningRate 0.0098   Epoch: 13   Global Step: 69440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:23:20,398-Speed 3408.87 samples/sec   Loss 3.3709   LearningRate 0.0098   Epoch: 13   Global Step: 69450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:23:23,421-Speed 3387.89 samples/sec   Loss 3.5669   LearningRate 0.0098   Epoch: 13   Global Step: 69460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:23:26,449-Speed 3382.81 samples/sec   Loss 3.3395   LearningRate 0.0098   Epoch: 13   Global Step: 69470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:23:29,465-Speed 3396.09 samples/sec   Loss 3.4528   LearningRate 0.0098   Epoch: 13   Global Step: 69480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:23:32,472-Speed 3405.57 samples/sec   Loss 3.5486   LearningRate 0.0098   Epoch: 13   Global Step: 69490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:23:35,476-Speed 3410.18 samples/sec   Loss 3.3818   LearningRate 0.0098   Epoch: 13   Global Step: 69500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:23:38,489-Speed 3399.06 samples/sec   Loss 3.4391   LearningRate 0.0098   Epoch: 13   Global Step: 69510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:23:41,551-Speed 3345.58 samples/sec   Loss 3.4655   LearningRate 0.0098   Epoch: 13   Global Step: 69520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:23:44,588-Speed 3372.49 samples/sec   Loss 3.3286   LearningRate 0.0098   Epoch: 13   Global Step: 69530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:23:47,772-Speed 3216.59 samples/sec   Loss 3.4416   LearningRate 0.0098   Epoch: 13   Global Step: 69540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:23:50,888-Speed 3288.07 samples/sec   Loss 3.4007   LearningRate 0.0098   Epoch: 13   Global Step: 69550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:23:53,904-Speed 3396.45 samples/sec   Loss 3.3555   LearningRate 0.0098   Epoch: 13   Global Step: 69560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:23:56,908-Speed 3409.49 samples/sec   Loss 3.5287   LearningRate 0.0098   Epoch: 13   Global Step: 69570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:23:59,895-Speed 3428.78 samples/sec   Loss 3.4804   LearningRate 0.0097   Epoch: 13   Global Step: 69580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:02,905-Speed 3402.89 samples/sec   Loss 3.5278   LearningRate 0.0097   Epoch: 13   Global Step: 69590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:05,908-Speed 3410.60 samples/sec   Loss 3.3987   LearningRate 0.0097   Epoch: 13   Global Step: 69600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:08,921-Speed 3398.88 samples/sec   Loss 3.6077   LearningRate 0.0097   Epoch: 13   Global Step: 69610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:11,939-Speed 3394.79 samples/sec   Loss 3.4617   LearningRate 0.0097   Epoch: 13   Global Step: 69620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:14,957-Speed 3393.80 samples/sec   Loss 3.4212   LearningRate 0.0097   Epoch: 13   Global Step: 69630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:17,997-Speed 3368.81 samples/sec   Loss 3.3929   LearningRate 0.0097   Epoch: 13   Global Step: 69640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:21,006-Speed 3404.85 samples/sec   Loss 3.3888   LearningRate 0.0097   Epoch: 13   Global Step: 69650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:24,013-Speed 3405.76 samples/sec   Loss 3.4761   LearningRate 0.0097   Epoch: 13   Global Step: 69660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:27,018-Speed 3408.65 samples/sec   Loss 3.5284   LearningRate 0.0097   Epoch: 13   Global Step: 69670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:30,034-Speed 3395.78 samples/sec   Loss 3.3308   LearningRate 0.0097   Epoch: 13   Global Step: 69680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:24:33,042-Speed 3405.21 samples/sec   Loss 3.4124   LearningRate 0.0097   Epoch: 13   Global Step: 69690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:24:36,052-Speed 3403.09 samples/sec   Loss 3.3987   LearningRate 0.0097   Epoch: 13   Global Step: 69700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:24:39,061-Speed 3403.22 samples/sec   Loss 3.3408   LearningRate 0.0097   Epoch: 13   Global Step: 69710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:24:42,052-Speed 3425.46 samples/sec   Loss 3.4647   LearningRate 0.0097   Epoch: 13   Global Step: 69720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:45,055-Speed 3410.74 samples/sec   Loss 3.2711   LearningRate 0.0097   Epoch: 13   Global Step: 69730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:48,068-Speed 3399.00 samples/sec   Loss 3.4836   LearningRate 0.0096   Epoch: 13   Global Step: 69740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:51,074-Speed 3407.46 samples/sec   Loss 3.4383   LearningRate 0.0096   Epoch: 13   Global Step: 69750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:54,082-Speed 3405.83 samples/sec   Loss 3.5949   LearningRate 0.0096   Epoch: 13   Global Step: 69760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:24:57,087-Speed 3408.18 samples/sec   Loss 3.4744   LearningRate 0.0096   Epoch: 13   Global Step: 69770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:00,111-Speed 3386.61 samples/sec   Loss 3.3422   LearningRate 0.0096   Epoch: 13   Global Step: 69780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:03,134-Speed 3388.79 samples/sec   Loss 3.2524   LearningRate 0.0096   Epoch: 13   Global Step: 69790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:06,157-Speed 3387.78 samples/sec   Loss 3.4285   LearningRate 0.0096   Epoch: 13   Global Step: 69800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:09,165-Speed 3405.73 samples/sec   Loss 3.4995   LearningRate 0.0096   Epoch: 13   Global Step: 69810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:12,176-Speed 3401.93 samples/sec   Loss 3.2805   LearningRate 0.0096   Epoch: 13   Global Step: 69820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:25:15,278-Speed 3302.18 samples/sec   Loss 3.4992   LearningRate 0.0096   Epoch: 13   Global Step: 69830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:25:18,299-Speed 3389.70 samples/sec   Loss 3.3235   LearningRate 0.0096   Epoch: 13   Global Step: 69840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:25:21,309-Speed 3403.16 samples/sec   Loss 3.5230   LearningRate 0.0096   Epoch: 13   Global Step: 69850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:25:24,297-Speed 3427.74 samples/sec   Loss 3.4040   LearningRate 0.0096   Epoch: 13   Global Step: 69860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:27,316-Speed 3393.11 samples/sec   Loss 3.5046   LearningRate 0.0096   Epoch: 13   Global Step: 69870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:30,341-Speed 3386.78 samples/sec   Loss 3.4744   LearningRate 0.0096   Epoch: 13   Global Step: 69880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:33,362-Speed 3389.68 samples/sec   Loss 3.4202   LearningRate 0.0096   Epoch: 13   Global Step: 69890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:36,376-Speed 3398.96 samples/sec   Loss 3.3703   LearningRate 0.0095   Epoch: 13   Global Step: 69900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:39,403-Speed 3384.63 samples/sec   Loss 3.2691   LearningRate 0.0095   Epoch: 13   Global Step: 69910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:42,431-Speed 3382.43 samples/sec   Loss 3.3287   LearningRate 0.0095   Epoch: 13   Global Step: 69920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:45,479-Speed 3359.81 samples/sec   Loss 3.3647   LearningRate 0.0095   Epoch: 13   Global Step: 69930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:48,498-Speed 3392.82 samples/sec   Loss 3.2957   LearningRate 0.0095   Epoch: 13   Global Step: 69940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:51,505-Speed 3406.32 samples/sec   Loss 3.4386   LearningRate 0.0095   Epoch: 13   Global Step: 69950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:25:54,528-Speed 3388.15 samples/sec   Loss 3.4355   LearningRate 0.0095   Epoch: 13   Global Step: 69960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:25:57,533-Speed 3407.95 samples/sec   Loss 3.3984   LearningRate 0.0095   Epoch: 13   Global Step: 69970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:26:00,542-Speed 3404.66 samples/sec   Loss 3.3940   LearningRate 0.0095   Epoch: 13   Global Step: 69980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:26:03,551-Speed 3403.90 samples/sec   Loss 3.4502   LearningRate 0.0095   Epoch: 13   Global Step: 69990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:26:06,574-Speed 3388.84 samples/sec   Loss 3.4327   LearningRate 0.0095   Epoch: 13   Global Step: 70000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:26:50,827-[lfw][70000]XNorm: 21.686980
Training: 2022-04-11 06:26:50,828-[lfw][70000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 06:26:50,828-[lfw][70000]Accuracy-Highest: 0.99850
Training: 2022-04-11 06:27:42,214-[cfp_fp][70000]XNorm: 20.941235
Training: 2022-04-11 06:27:42,215-[cfp_fp][70000]Accuracy-Flip: 0.98343+-0.00586
Training: 2022-04-11 06:27:42,215-[cfp_fp][70000]Accuracy-Highest: 0.98386
Training: 2022-04-11 06:28:26,489-[agedb_30][70000]XNorm: 21.943153
Training: 2022-04-11 06:28:26,490-[agedb_30][70000]Accuracy-Flip: 0.98417+-0.00684
Training: 2022-04-11 06:28:26,491-[agedb_30][70000]Accuracy-Highest: 0.98417
Training: 2022-04-11 06:28:29,509-Speed 71.64 samples/sec   Loss 3.3525   LearningRate 0.0095   Epoch: 13   Global Step: 70010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:28:32,494-Speed 3430.80 samples/sec   Loss 3.4274   LearningRate 0.0095   Epoch: 13   Global Step: 70020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:28:35,533-Speed 3370.16 samples/sec   Loss 3.4512   LearningRate 0.0095   Epoch: 13   Global Step: 70030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:28:38,511-Speed 3440.12 samples/sec   Loss 3.3062   LearningRate 0.0095   Epoch: 13   Global Step: 70040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:28:42,424-Speed 2616.75 samples/sec   Loss 3.3373   LearningRate 0.0095   Epoch: 13   Global Step: 70050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:28:45,995-Speed 2868.34 samples/sec   Loss 3.3845   LearningRate 0.0095   Epoch: 13   Global Step: 70060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:28:48,987-Speed 3423.05 samples/sec   Loss 3.4653   LearningRate 0.0094   Epoch: 13   Global Step: 70070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:28:51,986-Speed 3415.63 samples/sec   Loss 3.2355   LearningRate 0.0094   Epoch: 13   Global Step: 70080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:28:54,986-Speed 3414.53 samples/sec   Loss 3.5037   LearningRate 0.0094   Epoch: 13   Global Step: 70090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:28:58,004-Speed 3393.49 samples/sec   Loss 3.5106   LearningRate 0.0094   Epoch: 13   Global Step: 70100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:29:01,008-Speed 3409.61 samples/sec   Loss 3.4637   LearningRate 0.0094   Epoch: 13   Global Step: 70110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:29:04,009-Speed 3413.41 samples/sec   Loss 3.3564   LearningRate 0.0094   Epoch: 13   Global Step: 70120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:29:07,074-Speed 3341.90 samples/sec   Loss 3.3644   LearningRate 0.0094   Epoch: 13   Global Step: 70130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:29:10,075-Speed 3412.31 samples/sec   Loss 3.2700   LearningRate 0.0094   Epoch: 13   Global Step: 70140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:29:13,072-Speed 3418.05 samples/sec   Loss 3.4660   LearningRate 0.0094   Epoch: 13   Global Step: 70150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:29:16,121-Speed 3359.51 samples/sec   Loss 3.3484   LearningRate 0.0094   Epoch: 13   Global Step: 70160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:29:19,106-Speed 3431.88 samples/sec   Loss 3.5104   LearningRate 0.0094   Epoch: 13   Global Step: 70170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:29:22,117-Speed 3401.80 samples/sec   Loss 3.3693   LearningRate 0.0094   Epoch: 13   Global Step: 70180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:29:25,139-Speed 3388.64 samples/sec   Loss 3.4617   LearningRate 0.0094   Epoch: 13   Global Step: 70190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:29:28,238-Speed 3305.57 samples/sec   Loss 3.4122   LearningRate 0.0094   Epoch: 13   Global Step: 70200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:29:31,256-Speed 3394.33 samples/sec   Loss 3.3514   LearningRate 0.0094   Epoch: 13   Global Step: 70210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:29:34,257-Speed 3412.75 samples/sec   Loss 3.4202   LearningRate 0.0094   Epoch: 13   Global Step: 70220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:29:37,260-Speed 3410.99 samples/sec   Loss 3.3879   LearningRate 0.0093   Epoch: 13   Global Step: 70230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:29:40,265-Speed 3408.45 samples/sec   Loss 3.4821   LearningRate 0.0093   Epoch: 13   Global Step: 70240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:29:43,294-Speed 3381.22 samples/sec   Loss 3.4545   LearningRate 0.0093   Epoch: 13   Global Step: 70250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:29:46,326-Speed 3378.50 samples/sec   Loss 3.4588   LearningRate 0.0093   Epoch: 13   Global Step: 70260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:29:49,336-Speed 3402.65 samples/sec   Loss 3.4546   LearningRate 0.0093   Epoch: 13   Global Step: 70270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:29:52,350-Speed 3398.63 samples/sec   Loss 3.4021   LearningRate 0.0093   Epoch: 13   Global Step: 70280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:29:55,389-Speed 3370.44 samples/sec   Loss 3.2585   LearningRate 0.0093   Epoch: 13   Global Step: 70290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:29:58,418-Speed 3381.65 samples/sec   Loss 3.2698   LearningRate 0.0093   Epoch: 13   Global Step: 70300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:30:01,430-Speed 3401.00 samples/sec   Loss 3.4051   LearningRate 0.0093   Epoch: 13   Global Step: 70310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:30:04,454-Speed 3386.42 samples/sec   Loss 3.3824   LearningRate 0.0093   Epoch: 13   Global Step: 70320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:07,464-Speed 3403.72 samples/sec   Loss 3.3282   LearningRate 0.0093   Epoch: 13   Global Step: 70330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:10,473-Speed 3403.53 samples/sec   Loss 3.2593   LearningRate 0.0093   Epoch: 13   Global Step: 70340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:13,511-Speed 3371.45 samples/sec   Loss 3.4522   LearningRate 0.0093   Epoch: 13   Global Step: 70350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:16,536-Speed 3385.58 samples/sec   Loss 3.4700   LearningRate 0.0093   Epoch: 13   Global Step: 70360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:19,547-Speed 3401.83 samples/sec   Loss 3.3228   LearningRate 0.0093   Epoch: 13   Global Step: 70370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:22,557-Speed 3402.99 samples/sec   Loss 3.4221   LearningRate 0.0093   Epoch: 13   Global Step: 70380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:25,564-Speed 3406.21 samples/sec   Loss 3.4212   LearningRate 0.0093   Epoch: 13   Global Step: 70390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:28,590-Speed 3385.72 samples/sec   Loss 3.3742   LearningRate 0.0092   Epoch: 13   Global Step: 70400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:31,592-Speed 3411.32 samples/sec   Loss 3.4216   LearningRate 0.0092   Epoch: 13   Global Step: 70410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:34,625-Speed 3377.33 samples/sec   Loss 3.4541   LearningRate 0.0092   Epoch: 13   Global Step: 70420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:38,310-Speed 2779.39 samples/sec   Loss 3.4727   LearningRate 0.0092   Epoch: 13   Global Step: 70430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:41,373-Speed 3343.92 samples/sec   Loss 3.2822   LearningRate 0.0092   Epoch: 13   Global Step: 70440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:44,379-Speed 3406.96 samples/sec   Loss 3.2900   LearningRate 0.0092   Epoch: 13   Global Step: 70450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:47,411-Speed 3378.52 samples/sec   Loss 3.3883   LearningRate 0.0092   Epoch: 13   Global Step: 70460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:50,429-Speed 3393.03 samples/sec   Loss 3.4947   LearningRate 0.0092   Epoch: 13   Global Step: 70470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:53,449-Speed 3391.98 samples/sec   Loss 3.2689   LearningRate 0.0092   Epoch: 13   Global Step: 70480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:56,460-Speed 3401.90 samples/sec   Loss 3.2025   LearningRate 0.0092   Epoch: 13   Global Step: 70490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:30:59,468-Speed 3405.24 samples/sec   Loss 3.3504   LearningRate 0.0092   Epoch: 13   Global Step: 70500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:31:02,487-Speed 3393.08 samples/sec   Loss 3.4325   LearningRate 0.0092   Epoch: 13   Global Step: 70510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:31:05,497-Speed 3402.73 samples/sec   Loss 3.4555   LearningRate 0.0092   Epoch: 13   Global Step: 70520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:31:08,511-Speed 3398.68 samples/sec   Loss 3.1862   LearningRate 0.0092   Epoch: 13   Global Step: 70530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:31:11,531-Speed 3390.51 samples/sec   Loss 3.4441   LearningRate 0.0092   Epoch: 13   Global Step: 70540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:31:14,538-Speed 3407.05 samples/sec   Loss 3.3864   LearningRate 0.0092   Epoch: 13   Global Step: 70550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:31:17,613-Speed 3331.21 samples/sec   Loss 3.3682   LearningRate 0.0092   Epoch: 13   Global Step: 70560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:31:20,648-Speed 3375.03 samples/sec   Loss 3.4384   LearningRate 0.0091   Epoch: 13   Global Step: 70570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:31:23,659-Speed 3401.53 samples/sec   Loss 3.3434   LearningRate 0.0091   Epoch: 13   Global Step: 70580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:31:26,665-Speed 3407.39 samples/sec   Loss 3.2874   LearningRate 0.0091   Epoch: 13   Global Step: 70590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:31:29,676-Speed 3401.81 samples/sec   Loss 3.4066   LearningRate 0.0091   Epoch: 13   Global Step: 70600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:31:32,679-Speed 3410.78 samples/sec   Loss 3.4450   LearningRate 0.0091   Epoch: 13   Global Step: 70610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:31:35,683-Speed 3409.33 samples/sec   Loss 3.3985   LearningRate 0.0091   Epoch: 13   Global Step: 70620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:31:38,703-Speed 3392.05 samples/sec   Loss 3.2701   LearningRate 0.0091   Epoch: 13   Global Step: 70630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:31:41,711-Speed 3405.13 samples/sec   Loss 3.3746   LearningRate 0.0091   Epoch: 13   Global Step: 70640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:31:44,740-Speed 3380.83 samples/sec   Loss 3.3133   LearningRate 0.0091   Epoch: 13   Global Step: 70650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:31:47,746-Speed 3408.29 samples/sec   Loss 3.3705   LearningRate 0.0091   Epoch: 13   Global Step: 70660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:31:50,752-Speed 3406.55 samples/sec   Loss 3.4713   LearningRate 0.0091   Epoch: 13   Global Step: 70670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:31:53,762-Speed 3403.72 samples/sec   Loss 3.2927   LearningRate 0.0091   Epoch: 13   Global Step: 70680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:31:56,768-Speed 3407.18 samples/sec   Loss 3.3991   LearningRate 0.0091   Epoch: 13   Global Step: 70690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:31:59,785-Speed 3394.75 samples/sec   Loss 3.2460   LearningRate 0.0091   Epoch: 13   Global Step: 70700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:32:02,796-Speed 3402.25 samples/sec   Loss 3.3919   LearningRate 0.0091   Epoch: 13   Global Step: 70710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:32:05,803-Speed 3405.09 samples/sec   Loss 3.4439   LearningRate 0.0091   Epoch: 13   Global Step: 70720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:32:08,812-Speed 3405.25 samples/sec   Loss 3.5068   LearningRate 0.0090   Epoch: 13   Global Step: 70730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:32:11,810-Speed 3415.61 samples/sec   Loss 3.4229   LearningRate 0.0090   Epoch: 13   Global Step: 70740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:32:14,843-Speed 3377.03 samples/sec   Loss 3.3325   LearningRate 0.0090   Epoch: 13   Global Step: 70750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:32:17,856-Speed 3400.55 samples/sec   Loss 3.3361   LearningRate 0.0090   Epoch: 13   Global Step: 70760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:32:20,865-Speed 3404.43 samples/sec   Loss 3.5019   LearningRate 0.0090   Epoch: 13   Global Step: 70770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:32:23,870-Speed 3408.42 samples/sec   Loss 3.3333   LearningRate 0.0090   Epoch: 13   Global Step: 70780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:32:26,875-Speed 3408.47 samples/sec   Loss 3.3726   LearningRate 0.0090   Epoch: 13   Global Step: 70790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:32:29,890-Speed 3396.88 samples/sec   Loss 3.3918   LearningRate 0.0090   Epoch: 13   Global Step: 70800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:32:33,006-Speed 3287.45 samples/sec   Loss 3.3950   LearningRate 0.0090   Epoch: 13   Global Step: 70810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:32:45,080-Speed 848.21 samples/sec   Loss 2.7603   LearningRate 0.0090   Epoch: 14   Global Step: 70820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:32:48,117-Speed 3373.22 samples/sec   Loss 2.5704   LearningRate 0.0090   Epoch: 14   Global Step: 70830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:32:51,143-Speed 3384.14 samples/sec   Loss 2.5088   LearningRate 0.0090   Epoch: 14   Global Step: 70840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:32:54,153-Speed 3402.91 samples/sec   Loss 2.6288   LearningRate 0.0090   Epoch: 14   Global Step: 70850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:32:57,197-Speed 3365.66 samples/sec   Loss 2.5496   LearningRate 0.0090   Epoch: 14   Global Step: 70860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:33:00,208-Speed 3401.48 samples/sec   Loss 2.6205   LearningRate 0.0090   Epoch: 14   Global Step: 70870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:33:03,219-Speed 3401.74 samples/sec   Loss 2.6085   LearningRate 0.0090   Epoch: 14   Global Step: 70880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:33:06,227-Speed 3404.54 samples/sec   Loss 2.5965   LearningRate 0.0090   Epoch: 14   Global Step: 70890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:33:09,240-Speed 3400.22 samples/sec   Loss 2.5215   LearningRate 0.0089   Epoch: 14   Global Step: 70900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:33:12,234-Speed 3420.22 samples/sec   Loss 2.5292   LearningRate 0.0089   Epoch: 14   Global Step: 70910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:33:15,242-Speed 3405.20 samples/sec   Loss 2.7348   LearningRate 0.0089   Epoch: 14   Global Step: 70920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:33:18,270-Speed 3382.99 samples/sec   Loss 2.6194   LearningRate 0.0089   Epoch: 14   Global Step: 70930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:33:21,273-Speed 3410.36 samples/sec   Loss 2.6468   LearningRate 0.0089   Epoch: 14   Global Step: 70940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:33:24,298-Speed 3386.93 samples/sec   Loss 2.6727   LearningRate 0.0089   Epoch: 14   Global Step: 70950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:33:27,306-Speed 3404.53 samples/sec   Loss 2.5889   LearningRate 0.0089   Epoch: 14   Global Step: 70960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:33:30,312-Speed 3407.88 samples/sec   Loss 2.5864   LearningRate 0.0089   Epoch: 14   Global Step: 70970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:33:33,322-Speed 3403.15 samples/sec   Loss 2.5631   LearningRate 0.0089   Epoch: 14   Global Step: 70980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:33:36,335-Speed 3398.74 samples/sec   Loss 2.6274   LearningRate 0.0089   Epoch: 14   Global Step: 70990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:33:39,343-Speed 3405.52 samples/sec   Loss 2.5759   LearningRate 0.0089   Epoch: 14   Global Step: 71000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:33:42,350-Speed 3405.72 samples/sec   Loss 2.6457   LearningRate 0.0089   Epoch: 14   Global Step: 71010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:33:45,359-Speed 3404.91 samples/sec   Loss 2.7726   LearningRate 0.0089   Epoch: 14   Global Step: 71020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:33:48,369-Speed 3402.13 samples/sec   Loss 2.6460   LearningRate 0.0089   Epoch: 14   Global Step: 71030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:33:51,378-Speed 3403.88 samples/sec   Loss 2.6360   LearningRate 0.0089   Epoch: 14   Global Step: 71040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:33:54,395-Speed 3395.66 samples/sec   Loss 2.5696   LearningRate 0.0089   Epoch: 14   Global Step: 71050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:33:57,414-Speed 3392.58 samples/sec   Loss 2.6502   LearningRate 0.0089   Epoch: 14   Global Step: 71060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:00,424-Speed 3402.93 samples/sec   Loss 2.5750   LearningRate 0.0088   Epoch: 14   Global Step: 71070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:03,449-Speed 3386.26 samples/sec   Loss 2.5849   LearningRate 0.0088   Epoch: 14   Global Step: 71080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:06,463-Speed 3398.19 samples/sec   Loss 2.6675   LearningRate 0.0088   Epoch: 14   Global Step: 71090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:09,471-Speed 3405.14 samples/sec   Loss 2.7117   LearningRate 0.0088   Epoch: 14   Global Step: 71100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:12,480-Speed 3404.41 samples/sec   Loss 2.5391   LearningRate 0.0088   Epoch: 14   Global Step: 71110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:15,491-Speed 3401.81 samples/sec   Loss 2.6262   LearningRate 0.0088   Epoch: 14   Global Step: 71120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:18,501-Speed 3402.24 samples/sec   Loss 2.6352   LearningRate 0.0088   Epoch: 14   Global Step: 71130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:21,513-Speed 3401.18 samples/sec   Loss 2.7168   LearningRate 0.0088   Epoch: 14   Global Step: 71140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:24,523-Speed 3403.05 samples/sec   Loss 2.7600   LearningRate 0.0088   Epoch: 14   Global Step: 71150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:27,559-Speed 3373.69 samples/sec   Loss 2.7600   LearningRate 0.0088   Epoch: 14   Global Step: 71160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:30,616-Speed 3350.49 samples/sec   Loss 2.7235   LearningRate 0.0088   Epoch: 14   Global Step: 71170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:33,622-Speed 3407.21 samples/sec   Loss 2.6662   LearningRate 0.0088   Epoch: 14   Global Step: 71180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:36,630-Speed 3404.69 samples/sec   Loss 2.5145   LearningRate 0.0088   Epoch: 14   Global Step: 71190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:39,666-Speed 3373.50 samples/sec   Loss 2.6377   LearningRate 0.0088   Epoch: 14   Global Step: 71200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:42,666-Speed 3415.07 samples/sec   Loss 2.6860   LearningRate 0.0088   Epoch: 14   Global Step: 71210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:45,681-Speed 3397.07 samples/sec   Loss 2.7306   LearningRate 0.0088   Epoch: 14   Global Step: 71220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:48,687-Speed 3407.49 samples/sec   Loss 2.7035   LearningRate 0.0088   Epoch: 14   Global Step: 71230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:51,696-Speed 3403.82 samples/sec   Loss 2.7346   LearningRate 0.0087   Epoch: 14   Global Step: 71240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:54,707-Speed 3401.90 samples/sec   Loss 2.6800   LearningRate 0.0087   Epoch: 14   Global Step: 71250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:34:57,714-Speed 3406.06 samples/sec   Loss 2.6644   LearningRate 0.0087   Epoch: 14   Global Step: 71260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:00,728-Speed 3399.23 samples/sec   Loss 2.7357   LearningRate 0.0087   Epoch: 14   Global Step: 71270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:03,740-Speed 3400.27 samples/sec   Loss 2.6674   LearningRate 0.0087   Epoch: 14   Global Step: 71280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:06,752-Speed 3400.25 samples/sec   Loss 2.7218   LearningRate 0.0087   Epoch: 14   Global Step: 71290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:09,763-Speed 3402.32 samples/sec   Loss 2.7658   LearningRate 0.0087   Epoch: 14   Global Step: 71300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:12,792-Speed 3381.22 samples/sec   Loss 2.6193   LearningRate 0.0087   Epoch: 14   Global Step: 71310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 06:35:15,787-Speed 3420.45 samples/sec   Loss 2.6588   LearningRate 0.0087   Epoch: 14   Global Step: 71320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:18,813-Speed 3384.66 samples/sec   Loss 2.7199   LearningRate 0.0087   Epoch: 14   Global Step: 71330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:21,821-Speed 3404.96 samples/sec   Loss 2.6855   LearningRate 0.0087   Epoch: 14   Global Step: 71340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:24,847-Speed 3385.36 samples/sec   Loss 2.6532   LearningRate 0.0087   Epoch: 14   Global Step: 71350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:27,861-Speed 3398.23 samples/sec   Loss 2.7891   LearningRate 0.0087   Epoch: 14   Global Step: 71360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:30,877-Speed 3395.94 samples/sec   Loss 2.7851   LearningRate 0.0087   Epoch: 14   Global Step: 71370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:33,883-Speed 3407.06 samples/sec   Loss 2.8199   LearningRate 0.0087   Epoch: 14   Global Step: 71380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:36,900-Speed 3394.52 samples/sec   Loss 2.6052   LearningRate 0.0087   Epoch: 14   Global Step: 71390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:39,910-Speed 3403.22 samples/sec   Loss 2.7818   LearningRate 0.0087   Epoch: 14   Global Step: 71400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:42,922-Speed 3400.66 samples/sec   Loss 2.7191   LearningRate 0.0086   Epoch: 14   Global Step: 71410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:45,915-Speed 3423.22 samples/sec   Loss 2.7833   LearningRate 0.0086   Epoch: 14   Global Step: 71420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:48,925-Speed 3402.31 samples/sec   Loss 2.7678   LearningRate 0.0086   Epoch: 14   Global Step: 71430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:51,938-Speed 3399.71 samples/sec   Loss 2.7247   LearningRate 0.0086   Epoch: 14   Global Step: 71440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:54,976-Speed 3371.04 samples/sec   Loss 2.7324   LearningRate 0.0086   Epoch: 14   Global Step: 71450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:35:57,987-Speed 3401.42 samples/sec   Loss 2.7773   LearningRate 0.0086   Epoch: 14   Global Step: 71460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:36:00,998-Speed 3402.69 samples/sec   Loss 2.7051   LearningRate 0.0086   Epoch: 14   Global Step: 71470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:36:04,018-Speed 3391.36 samples/sec   Loss 2.7224   LearningRate 0.0086   Epoch: 14   Global Step: 71480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:36:07,033-Speed 3396.92 samples/sec   Loss 2.6998   LearningRate 0.0086   Epoch: 14   Global Step: 71490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:36:10,062-Speed 3380.86 samples/sec   Loss 2.7052   LearningRate 0.0086   Epoch: 14   Global Step: 71500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:36:13,072-Speed 3403.65 samples/sec   Loss 2.7679   LearningRate 0.0086   Epoch: 14   Global Step: 71510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:36:16,068-Speed 3418.51 samples/sec   Loss 2.7076   LearningRate 0.0086   Epoch: 14   Global Step: 71520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:36:19,078-Speed 3403.42 samples/sec   Loss 2.6315   LearningRate 0.0086   Epoch: 14   Global Step: 71530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:36:22,089-Speed 3401.38 samples/sec   Loss 2.6511   LearningRate 0.0086   Epoch: 14   Global Step: 71540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:36:25,107-Speed 3394.37 samples/sec   Loss 2.7836   LearningRate 0.0086   Epoch: 14   Global Step: 71550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:36:28,159-Speed 3355.31 samples/sec   Loss 2.8121   LearningRate 0.0086   Epoch: 14   Global Step: 71560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:36:31,149-Speed 3426.31 samples/sec   Loss 2.8380   LearningRate 0.0086   Epoch: 14   Global Step: 71570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:36:34,159-Speed 3402.09 samples/sec   Loss 2.7658   LearningRate 0.0086   Epoch: 14   Global Step: 71580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:36:37,168-Speed 3404.25 samples/sec   Loss 2.8937   LearningRate 0.0085   Epoch: 14   Global Step: 71590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:36:40,188-Speed 3391.56 samples/sec   Loss 2.8159   LearningRate 0.0085   Epoch: 14   Global Step: 71600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:36:43,199-Speed 3401.82 samples/sec   Loss 2.9052   LearningRate 0.0085   Epoch: 14   Global Step: 71610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:36:46,211-Speed 3400.56 samples/sec   Loss 2.7871   LearningRate 0.0085   Epoch: 14   Global Step: 71620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:36:49,221-Speed 3403.12 samples/sec   Loss 2.8111   LearningRate 0.0085   Epoch: 14   Global Step: 71630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:36:52,234-Speed 3399.29 samples/sec   Loss 2.7491   LearningRate 0.0085   Epoch: 14   Global Step: 71640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:36:55,246-Speed 3400.45 samples/sec   Loss 2.6691   LearningRate 0.0085   Epoch: 14   Global Step: 71650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:36:58,258-Speed 3400.78 samples/sec   Loss 2.8340   LearningRate 0.0085   Epoch: 14   Global Step: 71660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:37:01,274-Speed 3396.63 samples/sec   Loss 2.7804   LearningRate 0.0085   Epoch: 14   Global Step: 71670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:37:04,269-Speed 3419.97 samples/sec   Loss 2.8721   LearningRate 0.0085   Epoch: 14   Global Step: 71680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:37:07,297-Speed 3382.00 samples/sec   Loss 2.8942   LearningRate 0.0085   Epoch: 14   Global Step: 71690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:37:10,311-Speed 3398.68 samples/sec   Loss 2.7670   LearningRate 0.0085   Epoch: 14   Global Step: 71700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:37:13,329-Speed 3393.78 samples/sec   Loss 2.7238   LearningRate 0.0085   Epoch: 14   Global Step: 71710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:37:16,346-Speed 3396.19 samples/sec   Loss 2.8571   LearningRate 0.0085   Epoch: 14   Global Step: 71720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:37:19,359-Speed 3399.21 samples/sec   Loss 2.9309   LearningRate 0.0085   Epoch: 14   Global Step: 71730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:37:22,375-Speed 3395.12 samples/sec   Loss 2.8250   LearningRate 0.0085   Epoch: 14   Global Step: 71740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:37:25,383-Speed 3405.36 samples/sec   Loss 2.7545   LearningRate 0.0085   Epoch: 14   Global Step: 71750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:37:28,396-Speed 3399.36 samples/sec   Loss 2.9306   LearningRate 0.0084   Epoch: 14   Global Step: 71760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:37:31,416-Speed 3391.71 samples/sec   Loss 2.7978   LearningRate 0.0084   Epoch: 14   Global Step: 71770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:37:34,444-Speed 3382.53 samples/sec   Loss 2.7300   LearningRate 0.0084   Epoch: 14   Global Step: 71780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:37:37,458-Speed 3399.29 samples/sec   Loss 2.8398   LearningRate 0.0084   Epoch: 14   Global Step: 71790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:37:40,475-Speed 3394.69 samples/sec   Loss 2.8742   LearningRate 0.0084   Epoch: 14   Global Step: 71800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:37:43,500-Speed 3385.31 samples/sec   Loss 2.7591   LearningRate 0.0084   Epoch: 14   Global Step: 71810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:37:46,517-Speed 3395.84 samples/sec   Loss 2.8369   LearningRate 0.0084   Epoch: 14   Global Step: 71820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:37:49,533-Speed 3395.51 samples/sec   Loss 2.7794   LearningRate 0.0084   Epoch: 14   Global Step: 71830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:37:52,548-Speed 3397.12 samples/sec   Loss 2.8461   LearningRate 0.0084   Epoch: 14   Global Step: 71840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:37:55,561-Speed 3400.40 samples/sec   Loss 2.8838   LearningRate 0.0084   Epoch: 14   Global Step: 71850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:37:58,576-Speed 3396.58 samples/sec   Loss 2.9715   LearningRate 0.0084   Epoch: 14   Global Step: 71860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:38:01,576-Speed 3413.88 samples/sec   Loss 2.8153   LearningRate 0.0084   Epoch: 14   Global Step: 71870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:38:04,596-Speed 3392.45 samples/sec   Loss 2.9428   LearningRate 0.0084   Epoch: 14   Global Step: 71880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:38:07,610-Speed 3398.14 samples/sec   Loss 2.7397   LearningRate 0.0084   Epoch: 14   Global Step: 71890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:38:10,625-Speed 3397.28 samples/sec   Loss 2.8397   LearningRate 0.0084   Epoch: 14   Global Step: 71900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:38:13,639-Speed 3398.65 samples/sec   Loss 2.9608   LearningRate 0.0084   Epoch: 14   Global Step: 71910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:38:16,656-Speed 3394.88 samples/sec   Loss 2.7701   LearningRate 0.0084   Epoch: 14   Global Step: 71920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:38:19,672-Speed 3395.64 samples/sec   Loss 2.7998   LearningRate 0.0083   Epoch: 14   Global Step: 71930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:38:22,688-Speed 3396.23 samples/sec   Loss 2.8607   LearningRate 0.0083   Epoch: 14   Global Step: 71940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:38:25,706-Speed 3393.45 samples/sec   Loss 2.7260   LearningRate 0.0083   Epoch: 14   Global Step: 71950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:38:28,739-Speed 3377.72 samples/sec   Loss 2.9608   LearningRate 0.0083   Epoch: 14   Global Step: 71960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:38:31,757-Speed 3393.81 samples/sec   Loss 2.8359   LearningRate 0.0083   Epoch: 14   Global Step: 71970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:38:34,780-Speed 3388.25 samples/sec   Loss 2.7301   LearningRate 0.0083   Epoch: 14   Global Step: 71980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:38:37,828-Speed 3360.03 samples/sec   Loss 2.8618   LearningRate 0.0083   Epoch: 14   Global Step: 71990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:38:40,858-Speed 3381.43 samples/sec   Loss 2.8219   LearningRate 0.0083   Epoch: 14   Global Step: 72000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:39:24,919-[lfw][72000]XNorm: 22.353854
Training: 2022-04-11 06:39:24,920-[lfw][72000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 06:39:24,920-[lfw][72000]Accuracy-Highest: 0.99850
Training: 2022-04-11 06:40:16,000-[cfp_fp][72000]XNorm: 21.396975
Training: 2022-04-11 06:40:16,000-[cfp_fp][72000]Accuracy-Flip: 0.98414+-0.00608
Training: 2022-04-11 06:40:16,001-[cfp_fp][72000]Accuracy-Highest: 0.98414
Training: 2022-04-11 06:41:00,030-[agedb_30][72000]XNorm: 22.576923
Training: 2022-04-11 06:41:00,030-[agedb_30][72000]Accuracy-Flip: 0.98433+-0.00606
Training: 2022-04-11 06:41:00,031-[agedb_30][72000]Accuracy-Highest: 0.98433
Training: 2022-04-11 06:41:03,025-Speed 72.03 samples/sec   Loss 2.9244   LearningRate 0.0083   Epoch: 14   Global Step: 72010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:06,014-Speed 3426.27 samples/sec   Loss 2.8531   LearningRate 0.0083   Epoch: 14   Global Step: 72020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:09,006-Speed 3423.21 samples/sec   Loss 2.8063   LearningRate 0.0083   Epoch: 14   Global Step: 72030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:12,003-Speed 3417.86 samples/sec   Loss 2.8122   LearningRate 0.0083   Epoch: 14   Global Step: 72040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:14,994-Speed 3424.27 samples/sec   Loss 2.8469   LearningRate 0.0083   Epoch: 14   Global Step: 72050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:17,989-Speed 3419.80 samples/sec   Loss 2.7974   LearningRate 0.0083   Epoch: 14   Global Step: 72060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:20,966-Speed 3440.31 samples/sec   Loss 2.7933   LearningRate 0.0083   Epoch: 14   Global Step: 72070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:23,962-Speed 3418.97 samples/sec   Loss 2.7879   LearningRate 0.0083   Epoch: 14   Global Step: 72080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:26,979-Speed 3395.17 samples/sec   Loss 2.7830   LearningRate 0.0083   Epoch: 14   Global Step: 72090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:29,978-Speed 3416.52 samples/sec   Loss 2.7945   LearningRate 0.0083   Epoch: 14   Global Step: 72100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:32,991-Speed 3399.26 samples/sec   Loss 2.8608   LearningRate 0.0082   Epoch: 14   Global Step: 72110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:35,989-Speed 3416.61 samples/sec   Loss 2.8532   LearningRate 0.0082   Epoch: 14   Global Step: 72120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:38,989-Speed 3414.43 samples/sec   Loss 2.9175   LearningRate 0.0082   Epoch: 14   Global Step: 72130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:42,000-Speed 3400.96 samples/sec   Loss 2.7940   LearningRate 0.0082   Epoch: 14   Global Step: 72140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:45,003-Speed 3410.53 samples/sec   Loss 2.7678   LearningRate 0.0082   Epoch: 14   Global Step: 72150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:48,013-Speed 3403.47 samples/sec   Loss 2.8017   LearningRate 0.0082   Epoch: 14   Global Step: 72160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:51,018-Speed 3408.44 samples/sec   Loss 2.8494   LearningRate 0.0082   Epoch: 14   Global Step: 72170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 06:41:54,002-Speed 3433.32 samples/sec   Loss 2.9179   LearningRate 0.0082   Epoch: 14   Global Step: 72180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:41:57,008-Speed 3406.93 samples/sec   Loss 2.8197   LearningRate 0.0082   Epoch: 14   Global Step: 72190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:42:00,003-Speed 3419.82 samples/sec   Loss 2.8068   LearningRate 0.0082   Epoch: 14   Global Step: 72200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:42:03,015-Speed 3400.79 samples/sec   Loss 2.8078   LearningRate 0.0082   Epoch: 14   Global Step: 72210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:42:06,035-Speed 3391.57 samples/sec   Loss 2.8347   LearningRate 0.0082   Epoch: 14   Global Step: 72220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:42:09,045-Speed 3403.16 samples/sec   Loss 2.8844   LearningRate 0.0082   Epoch: 14   Global Step: 72230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:42:12,050-Speed 3408.11 samples/sec   Loss 2.8862   LearningRate 0.0082   Epoch: 14   Global Step: 72240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:42:15,066-Speed 3396.68 samples/sec   Loss 2.7560   LearningRate 0.0082   Epoch: 14   Global Step: 72250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:42:18,082-Speed 3395.46 samples/sec   Loss 2.9106   LearningRate 0.0082   Epoch: 14   Global Step: 72260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:42:21,096-Speed 3398.81 samples/sec   Loss 2.8782   LearningRate 0.0082   Epoch: 14   Global Step: 72270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:42:24,115-Speed 3392.27 samples/sec   Loss 2.9168   LearningRate 0.0082   Epoch: 14   Global Step: 72280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:42:27,129-Speed 3398.96 samples/sec   Loss 2.9630   LearningRate 0.0081   Epoch: 14   Global Step: 72290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:42:30,141-Speed 3400.74 samples/sec   Loss 2.8305   LearningRate 0.0081   Epoch: 14   Global Step: 72300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:42:33,165-Speed 3387.62 samples/sec   Loss 2.7549   LearningRate 0.0081   Epoch: 14   Global Step: 72310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:42:36,188-Speed 3388.10 samples/sec   Loss 2.7834   LearningRate 0.0081   Epoch: 14   Global Step: 72320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:42:39,243-Speed 3351.59 samples/sec   Loss 2.8455   LearningRate 0.0081   Epoch: 14   Global Step: 72330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:42:42,282-Speed 3370.91 samples/sec   Loss 2.8008   LearningRate 0.0081   Epoch: 14   Global Step: 72340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:42:45,289-Speed 3406.20 samples/sec   Loss 2.8710   LearningRate 0.0081   Epoch: 14   Global Step: 72350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:42:48,310-Speed 3390.69 samples/sec   Loss 2.8587   LearningRate 0.0081   Epoch: 14   Global Step: 72360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:42:51,320-Speed 3403.61 samples/sec   Loss 2.8208   LearningRate 0.0081   Epoch: 14   Global Step: 72370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:42:54,330-Speed 3401.97 samples/sec   Loss 2.9367   LearningRate 0.0081   Epoch: 14   Global Step: 72380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:42:57,337-Speed 3406.56 samples/sec   Loss 2.8852   LearningRate 0.0081   Epoch: 14   Global Step: 72390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:00,331-Speed 3421.74 samples/sec   Loss 2.7627   LearningRate 0.0081   Epoch: 14   Global Step: 72400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:03,342-Speed 3401.24 samples/sec   Loss 2.7745   LearningRate 0.0081   Epoch: 14   Global Step: 72410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:06,352-Speed 3403.10 samples/sec   Loss 2.9264   LearningRate 0.0081   Epoch: 14   Global Step: 72420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:09,362-Speed 3401.79 samples/sec   Loss 2.8712   LearningRate 0.0081   Epoch: 14   Global Step: 72430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:12,383-Speed 3390.91 samples/sec   Loss 2.9704   LearningRate 0.0081   Epoch: 14   Global Step: 72440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:15,402-Speed 3392.81 samples/sec   Loss 2.9426   LearningRate 0.0081   Epoch: 14   Global Step: 72450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:18,417-Speed 3397.65 samples/sec   Loss 2.8874   LearningRate 0.0080   Epoch: 14   Global Step: 72460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:21,431-Speed 3398.46 samples/sec   Loss 2.8405   LearningRate 0.0080   Epoch: 14   Global Step: 72470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:24,437-Speed 3406.57 samples/sec   Loss 2.8920   LearningRate 0.0080   Epoch: 14   Global Step: 72480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:27,448-Speed 3401.98 samples/sec   Loss 2.7864   LearningRate 0.0080   Epoch: 14   Global Step: 72490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:30,439-Speed 3424.81 samples/sec   Loss 2.7839   LearningRate 0.0080   Epoch: 14   Global Step: 72500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:33,450-Speed 3400.93 samples/sec   Loss 2.8947   LearningRate 0.0080   Epoch: 14   Global Step: 72510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:36,465-Speed 3397.80 samples/sec   Loss 2.8763   LearningRate 0.0080   Epoch: 14   Global Step: 72520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:39,473-Speed 3405.24 samples/sec   Loss 2.8356   LearningRate 0.0080   Epoch: 14   Global Step: 72530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:42,480-Speed 3405.92 samples/sec   Loss 2.8958   LearningRate 0.0080   Epoch: 14   Global Step: 72540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:43:45,470-Speed 3425.96 samples/sec   Loss 2.9140   LearningRate 0.0080   Epoch: 14   Global Step: 72550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:43:48,483-Speed 3399.05 samples/sec   Loss 2.8785   LearningRate 0.0080   Epoch: 14   Global Step: 72560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:43:51,515-Speed 3378.24 samples/sec   Loss 3.0178   LearningRate 0.0080   Epoch: 14   Global Step: 72570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:43:54,522-Speed 3406.38 samples/sec   Loss 2.7822   LearningRate 0.0080   Epoch: 14   Global Step: 72580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:43:57,529-Speed 3406.62 samples/sec   Loss 2.8011   LearningRate 0.0080   Epoch: 14   Global Step: 72590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:44:00,540-Speed 3401.46 samples/sec   Loss 3.0326   LearningRate 0.0080   Epoch: 14   Global Step: 72600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:44:03,545-Speed 3408.41 samples/sec   Loss 2.7411   LearningRate 0.0080   Epoch: 14   Global Step: 72610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:44:06,555-Speed 3402.97 samples/sec   Loss 2.9755   LearningRate 0.0080   Epoch: 14   Global Step: 72620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:44:09,582-Speed 3383.29 samples/sec   Loss 2.8251   LearningRate 0.0080   Epoch: 14   Global Step: 72630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:44:12,591-Speed 3404.48 samples/sec   Loss 2.8583   LearningRate 0.0079   Epoch: 14   Global Step: 72640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:44:15,627-Speed 3373.96 samples/sec   Loss 2.9547   LearningRate 0.0079   Epoch: 14   Global Step: 72650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:44:18,644-Speed 3395.14 samples/sec   Loss 2.8787   LearningRate 0.0079   Epoch: 14   Global Step: 72660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:44:21,653-Speed 3404.08 samples/sec   Loss 2.7908   LearningRate 0.0079   Epoch: 14   Global Step: 72670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:44:24,663-Speed 3402.62 samples/sec   Loss 2.9469   LearningRate 0.0079   Epoch: 14   Global Step: 72680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:44:27,678-Speed 3397.39 samples/sec   Loss 2.8864   LearningRate 0.0079   Epoch: 14   Global Step: 72690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:44:30,689-Speed 3401.15 samples/sec   Loss 2.8898   LearningRate 0.0079   Epoch: 14   Global Step: 72700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:44:33,701-Speed 3401.48 samples/sec   Loss 2.9290   LearningRate 0.0079   Epoch: 14   Global Step: 72710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:44:36,698-Speed 3417.54 samples/sec   Loss 2.9110   LearningRate 0.0079   Epoch: 14   Global Step: 72720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:44:39,713-Speed 3396.87 samples/sec   Loss 2.8880   LearningRate 0.0079   Epoch: 14   Global Step: 72730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:44:42,728-Speed 3397.79 samples/sec   Loss 2.8957   LearningRate 0.0079   Epoch: 14   Global Step: 72740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:44:45,741-Speed 3398.67 samples/sec   Loss 2.8715   LearningRate 0.0079   Epoch: 14   Global Step: 72750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:44:48,755-Speed 3398.85 samples/sec   Loss 2.8817   LearningRate 0.0079   Epoch: 14   Global Step: 72760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:44:51,769-Speed 3398.28 samples/sec   Loss 2.8674   LearningRate 0.0079   Epoch: 14   Global Step: 72770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:44:54,792-Speed 3388.78 samples/sec   Loss 2.8947   LearningRate 0.0079   Epoch: 14   Global Step: 72780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:44:57,811-Speed 3392.18 samples/sec   Loss 2.8589   LearningRate 0.0079   Epoch: 14   Global Step: 72790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:45:00,833-Speed 3389.70 samples/sec   Loss 2.8702   LearningRate 0.0079   Epoch: 14   Global Step: 72800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:45:03,848-Speed 3396.73 samples/sec   Loss 2.8667   LearningRate 0.0079   Epoch: 14   Global Step: 72810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:45:06,864-Speed 3395.95 samples/sec   Loss 2.8850   LearningRate 0.0078   Epoch: 14   Global Step: 72820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:09,877-Speed 3400.20 samples/sec   Loss 2.8735   LearningRate 0.0078   Epoch: 14   Global Step: 72830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:12,892-Speed 3397.29 samples/sec   Loss 2.8551   LearningRate 0.0078   Epoch: 14   Global Step: 72840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:15,912-Speed 3391.96 samples/sec   Loss 2.8474   LearningRate 0.0078   Epoch: 14   Global Step: 72850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:18,940-Speed 3381.89 samples/sec   Loss 2.8688   LearningRate 0.0078   Epoch: 14   Global Step: 72860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:21,968-Speed 3382.75 samples/sec   Loss 2.9351   LearningRate 0.0078   Epoch: 14   Global Step: 72870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:24,988-Speed 3392.43 samples/sec   Loss 2.9820   LearningRate 0.0078   Epoch: 14   Global Step: 72880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:28,000-Speed 3399.99 samples/sec   Loss 2.9940   LearningRate 0.0078   Epoch: 14   Global Step: 72890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:31,020-Speed 3391.63 samples/sec   Loss 2.8809   LearningRate 0.0078   Epoch: 14   Global Step: 72900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:34,039-Speed 3392.95 samples/sec   Loss 2.9479   LearningRate 0.0078   Epoch: 14   Global Step: 72910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:37,036-Speed 3417.52 samples/sec   Loss 2.9412   LearningRate 0.0078   Epoch: 14   Global Step: 72920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:40,053-Speed 3395.33 samples/sec   Loss 2.9110   LearningRate 0.0078   Epoch: 14   Global Step: 72930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:43,071-Speed 3393.45 samples/sec   Loss 2.9213   LearningRate 0.0078   Epoch: 14   Global Step: 72940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:46,095-Speed 3387.82 samples/sec   Loss 2.9554   LearningRate 0.0078   Epoch: 14   Global Step: 72950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:49,180-Speed 3320.25 samples/sec   Loss 2.8667   LearningRate 0.0078   Epoch: 14   Global Step: 72960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:52,202-Speed 3388.53 samples/sec   Loss 2.9779   LearningRate 0.0078   Epoch: 14   Global Step: 72970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:55,214-Speed 3401.10 samples/sec   Loss 2.9376   LearningRate 0.0078   Epoch: 14   Global Step: 72980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:45:58,230-Speed 3396.02 samples/sec   Loss 2.9594   LearningRate 0.0078   Epoch: 14   Global Step: 72990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:46:01,255-Speed 3386.39 samples/sec   Loss 2.9763   LearningRate 0.0077   Epoch: 14   Global Step: 73000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:46:04,270-Speed 3396.64 samples/sec   Loss 2.9815   LearningRate 0.0077   Epoch: 14   Global Step: 73010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:46:07,261-Speed 3424.93 samples/sec   Loss 2.9077   LearningRate 0.0077   Epoch: 14   Global Step: 73020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:46:10,279-Speed 3392.82 samples/sec   Loss 2.7500   LearningRate 0.0077   Epoch: 14   Global Step: 73030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:46:13,293-Speed 3399.12 samples/sec   Loss 2.8931   LearningRate 0.0077   Epoch: 14   Global Step: 73040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:46:16,319-Speed 3384.56 samples/sec   Loss 2.8959   LearningRate 0.0077   Epoch: 14   Global Step: 73050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:46:19,336-Speed 3395.20 samples/sec   Loss 2.9172   LearningRate 0.0077   Epoch: 14   Global Step: 73060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:46:22,329-Speed 3422.29 samples/sec   Loss 2.9129   LearningRate 0.0077   Epoch: 14   Global Step: 73070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:46:25,368-Speed 3370.78 samples/sec   Loss 2.9622   LearningRate 0.0077   Epoch: 14   Global Step: 73080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:46:28,421-Speed 3354.90 samples/sec   Loss 3.0007   LearningRate 0.0077   Epoch: 14   Global Step: 73090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:46:31,436-Speed 3397.19 samples/sec   Loss 2.9719   LearningRate 0.0077   Epoch: 14   Global Step: 73100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:46:34,453-Speed 3394.85 samples/sec   Loss 2.9555   LearningRate 0.0077   Epoch: 14   Global Step: 73110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:46:37,505-Speed 3355.24 samples/sec   Loss 2.9142   LearningRate 0.0077   Epoch: 14   Global Step: 73120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:46:40,519-Speed 3398.97 samples/sec   Loss 2.9630   LearningRate 0.0077   Epoch: 14   Global Step: 73130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:46:43,537-Speed 3393.42 samples/sec   Loss 2.9073   LearningRate 0.0077   Epoch: 14   Global Step: 73140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:46:46,553-Speed 3397.05 samples/sec   Loss 2.8642   LearningRate 0.0077   Epoch: 14   Global Step: 73150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:46:49,561-Speed 3404.09 samples/sec   Loss 2.8637   LearningRate 0.0077   Epoch: 14   Global Step: 73160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:46:52,572-Speed 3402.13 samples/sec   Loss 2.9695   LearningRate 0.0077   Epoch: 14   Global Step: 73170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:46:55,579-Speed 3405.82 samples/sec   Loss 2.9757   LearningRate 0.0077   Epoch: 14   Global Step: 73180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:46:58,594-Speed 3398.12 samples/sec   Loss 2.9082   LearningRate 0.0076   Epoch: 14   Global Step: 73190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:47:01,588-Speed 3420.44 samples/sec   Loss 2.8390   LearningRate 0.0076   Epoch: 14   Global Step: 73200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:04,599-Speed 3401.60 samples/sec   Loss 2.9668   LearningRate 0.0076   Epoch: 14   Global Step: 73210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:07,622-Speed 3388.22 samples/sec   Loss 2.8557   LearningRate 0.0076   Epoch: 14   Global Step: 73220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:10,634-Speed 3400.72 samples/sec   Loss 3.0347   LearningRate 0.0076   Epoch: 14   Global Step: 73230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:13,643-Speed 3403.82 samples/sec   Loss 2.9177   LearningRate 0.0076   Epoch: 14   Global Step: 73240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:16,679-Speed 3373.63 samples/sec   Loss 2.8446   LearningRate 0.0076   Epoch: 14   Global Step: 73250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:19,692-Speed 3400.00 samples/sec   Loss 2.8169   LearningRate 0.0076   Epoch: 14   Global Step: 73260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:22,707-Speed 3397.03 samples/sec   Loss 2.8907   LearningRate 0.0076   Epoch: 14   Global Step: 73270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:25,722-Speed 3396.77 samples/sec   Loss 2.9274   LearningRate 0.0076   Epoch: 14   Global Step: 73280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:28,746-Speed 3387.31 samples/sec   Loss 2.8532   LearningRate 0.0076   Epoch: 14   Global Step: 73290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:31,759-Speed 3400.03 samples/sec   Loss 2.9562   LearningRate 0.0076   Epoch: 14   Global Step: 73300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:47:34,781-Speed 3389.21 samples/sec   Loss 2.8580   LearningRate 0.0076   Epoch: 14   Global Step: 73310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:47:37,785-Speed 3409.07 samples/sec   Loss 2.9172   LearningRate 0.0076   Epoch: 14   Global Step: 73320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:40,799-Speed 3398.36 samples/sec   Loss 2.8860   LearningRate 0.0076   Epoch: 14   Global Step: 73330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:43,816-Speed 3394.98 samples/sec   Loss 2.8538   LearningRate 0.0076   Epoch: 14   Global Step: 73340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:46,830-Speed 3399.27 samples/sec   Loss 2.8753   LearningRate 0.0076   Epoch: 14   Global Step: 73350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:49,844-Speed 3398.11 samples/sec   Loss 2.8402   LearningRate 0.0076   Epoch: 14   Global Step: 73360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:52,864-Speed 3391.40 samples/sec   Loss 2.9042   LearningRate 0.0075   Epoch: 14   Global Step: 73370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:55,877-Speed 3399.39 samples/sec   Loss 2.9677   LearningRate 0.0075   Epoch: 14   Global Step: 73380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:47:58,885-Speed 3404.65 samples/sec   Loss 2.8588   LearningRate 0.0075   Epoch: 14   Global Step: 73390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:48:01,898-Speed 3400.26 samples/sec   Loss 2.9168   LearningRate 0.0075   Epoch: 14   Global Step: 73400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:48:04,924-Speed 3384.40 samples/sec   Loss 2.8963   LearningRate 0.0075   Epoch: 14   Global Step: 73410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:48:07,946-Speed 3389.08 samples/sec   Loss 2.9556   LearningRate 0.0075   Epoch: 14   Global Step: 73420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:10,957-Speed 3401.66 samples/sec   Loss 2.9405   LearningRate 0.0075   Epoch: 14   Global Step: 73430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:13,976-Speed 3393.27 samples/sec   Loss 2.8982   LearningRate 0.0075   Epoch: 14   Global Step: 73440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:16,989-Speed 3399.18 samples/sec   Loss 2.9033   LearningRate 0.0075   Epoch: 14   Global Step: 73450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:20,000-Speed 3401.43 samples/sec   Loss 2.9164   LearningRate 0.0075   Epoch: 14   Global Step: 73460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:23,017-Speed 3395.59 samples/sec   Loss 3.0018   LearningRate 0.0075   Epoch: 14   Global Step: 73470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:26,073-Speed 3351.17 samples/sec   Loss 2.9803   LearningRate 0.0075   Epoch: 14   Global Step: 73480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:29,089-Speed 3396.17 samples/sec   Loss 2.8193   LearningRate 0.0075   Epoch: 14   Global Step: 73490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:32,103-Speed 3398.97 samples/sec   Loss 2.8346   LearningRate 0.0075   Epoch: 14   Global Step: 73500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:35,121-Speed 3393.48 samples/sec   Loss 2.9596   LearningRate 0.0075   Epoch: 14   Global Step: 73510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:38,162-Speed 3368.69 samples/sec   Loss 2.9199   LearningRate 0.0075   Epoch: 14   Global Step: 73520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:41,210-Speed 3359.53 samples/sec   Loss 3.0005   LearningRate 0.0075   Epoch: 14   Global Step: 73530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:44,238-Speed 3382.42 samples/sec   Loss 2.8602   LearningRate 0.0075   Epoch: 14   Global Step: 73540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:47,266-Speed 3383.43 samples/sec   Loss 2.9996   LearningRate 0.0074   Epoch: 14   Global Step: 73550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:50,280-Speed 3398.68 samples/sec   Loss 2.9527   LearningRate 0.0074   Epoch: 14   Global Step: 73560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:53,297-Speed 3394.71 samples/sec   Loss 2.9052   LearningRate 0.0074   Epoch: 14   Global Step: 73570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:56,306-Speed 3404.01 samples/sec   Loss 2.9780   LearningRate 0.0074   Epoch: 14   Global Step: 73580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:48:59,317-Speed 3401.68 samples/sec   Loss 3.0063   LearningRate 0.0074   Epoch: 14   Global Step: 73590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:49:02,329-Speed 3400.86 samples/sec   Loss 2.8376   LearningRate 0.0074   Epoch: 14   Global Step: 73600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:49:05,322-Speed 3421.10 samples/sec   Loss 2.9247   LearningRate 0.0074   Epoch: 14   Global Step: 73610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:08,333-Speed 3402.19 samples/sec   Loss 2.7632   LearningRate 0.0074   Epoch: 14   Global Step: 73620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:11,345-Speed 3400.87 samples/sec   Loss 3.1004   LearningRate 0.0074   Epoch: 14   Global Step: 73630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:14,354-Speed 3404.32 samples/sec   Loss 2.9149   LearningRate 0.0074   Epoch: 14   Global Step: 73640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:17,374-Speed 3391.68 samples/sec   Loss 2.9591   LearningRate 0.0074   Epoch: 14   Global Step: 73650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:20,391-Speed 3394.04 samples/sec   Loss 2.9571   LearningRate 0.0074   Epoch: 14   Global Step: 73660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:23,411-Speed 3392.59 samples/sec   Loss 2.8295   LearningRate 0.0074   Epoch: 14   Global Step: 73670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:26,425-Speed 3397.33 samples/sec   Loss 2.9417   LearningRate 0.0074   Epoch: 14   Global Step: 73680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:29,437-Speed 3400.86 samples/sec   Loss 2.9018   LearningRate 0.0074   Epoch: 14   Global Step: 73690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:32,449-Speed 3400.38 samples/sec   Loss 2.9150   LearningRate 0.0074   Epoch: 14   Global Step: 73700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:35,469-Speed 3391.79 samples/sec   Loss 2.9020   LearningRate 0.0074   Epoch: 14   Global Step: 73710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:49:38,491-Speed 3389.13 samples/sec   Loss 2.9045   LearningRate 0.0074   Epoch: 14   Global Step: 73720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:49:41,504-Speed 3399.82 samples/sec   Loss 2.9823   LearningRate 0.0074   Epoch: 14   Global Step: 73730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:49:44,497-Speed 3422.70 samples/sec   Loss 2.9659   LearningRate 0.0073   Epoch: 14   Global Step: 73740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:47,518-Speed 3389.86 samples/sec   Loss 2.9963   LearningRate 0.0073   Epoch: 14   Global Step: 73750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:50,537-Speed 3392.90 samples/sec   Loss 2.8619   LearningRate 0.0073   Epoch: 14   Global Step: 73760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:53,556-Speed 3393.08 samples/sec   Loss 2.9852   LearningRate 0.0073   Epoch: 14   Global Step: 73770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:56,567-Speed 3401.91 samples/sec   Loss 3.0306   LearningRate 0.0073   Epoch: 14   Global Step: 73780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:49:59,579-Speed 3400.13 samples/sec   Loss 2.8977   LearningRate 0.0073   Epoch: 14   Global Step: 73790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:50:02,597-Speed 3394.31 samples/sec   Loss 2.8861   LearningRate 0.0073   Epoch: 14   Global Step: 73800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:50:05,606-Speed 3403.13 samples/sec   Loss 2.8214   LearningRate 0.0073   Epoch: 14   Global Step: 73810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:50:08,619-Speed 3399.61 samples/sec   Loss 2.8269   LearningRate 0.0073   Epoch: 14   Global Step: 73820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:50:11,644-Speed 3386.06 samples/sec   Loss 2.8292   LearningRate 0.0073   Epoch: 14   Global Step: 73830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:50:14,705-Speed 3346.04 samples/sec   Loss 3.0183   LearningRate 0.0073   Epoch: 14   Global Step: 73840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:17,722-Speed 3395.52 samples/sec   Loss 2.9888   LearningRate 0.0073   Epoch: 14   Global Step: 73850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:20,735-Speed 3399.01 samples/sec   Loss 2.8698   LearningRate 0.0073   Epoch: 14   Global Step: 73860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:23,745-Speed 3403.55 samples/sec   Loss 2.9168   LearningRate 0.0073   Epoch: 14   Global Step: 73870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:26,763-Speed 3393.55 samples/sec   Loss 2.9686   LearningRate 0.0073   Epoch: 14   Global Step: 73880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:29,779-Speed 3395.71 samples/sec   Loss 2.8405   LearningRate 0.0073   Epoch: 14   Global Step: 73890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:32,797-Speed 3394.72 samples/sec   Loss 2.9510   LearningRate 0.0073   Epoch: 14   Global Step: 73900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:35,809-Speed 3399.47 samples/sec   Loss 2.7893   LearningRate 0.0073   Epoch: 14   Global Step: 73910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:38,822-Speed 3400.24 samples/sec   Loss 2.9040   LearningRate 0.0073   Epoch: 14   Global Step: 73920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:41,834-Speed 3401.07 samples/sec   Loss 2.9824   LearningRate 0.0072   Epoch: 14   Global Step: 73930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:44,829-Speed 3418.85 samples/sec   Loss 2.9196   LearningRate 0.0072   Epoch: 14   Global Step: 73940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:47,838-Speed 3404.40 samples/sec   Loss 2.9419   LearningRate 0.0072   Epoch: 14   Global Step: 73950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:50,852-Speed 3398.77 samples/sec   Loss 2.8949   LearningRate 0.0072   Epoch: 14   Global Step: 73960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:53,867-Speed 3397.44 samples/sec   Loss 2.9218   LearningRate 0.0072   Epoch: 14   Global Step: 73970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:56,880-Speed 3399.01 samples/sec   Loss 2.8489   LearningRate 0.0072   Epoch: 14   Global Step: 73980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:50:59,922-Speed 3366.37 samples/sec   Loss 3.0105   LearningRate 0.0072   Epoch: 14   Global Step: 73990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:51:02,937-Speed 3397.21 samples/sec   Loss 2.8678   LearningRate 0.0072   Epoch: 14   Global Step: 74000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:51:47,251-[lfw][74000]XNorm: 21.648780
Training: 2022-04-11 06:51:47,252-[lfw][74000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 06:51:47,253-[lfw][74000]Accuracy-Highest: 0.99850
Training: 2022-04-11 06:52:38,720-[cfp_fp][74000]XNorm: 20.810559
Training: 2022-04-11 06:52:38,721-[cfp_fp][74000]Accuracy-Flip: 0.98186+-0.00582
Training: 2022-04-11 06:52:38,721-[cfp_fp][74000]Accuracy-Highest: 0.98414
Training: 2022-04-11 06:53:23,088-[agedb_30][74000]XNorm: 21.834063
Training: 2022-04-11 06:53:23,089-[agedb_30][74000]Accuracy-Flip: 0.98383+-0.00597
Training: 2022-04-11 06:53:23,090-[agedb_30][74000]Accuracy-Highest: 0.98433
Training: 2022-04-11 06:53:26,093-Speed 71.53 samples/sec   Loss 2.9331   LearningRate 0.0072   Epoch: 14   Global Step: 74010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:53:29,087-Speed 3421.97 samples/sec   Loss 2.9823   LearningRate 0.0072   Epoch: 14   Global Step: 74020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:53:32,075-Speed 3427.78 samples/sec   Loss 2.9226   LearningRate 0.0072   Epoch: 14   Global Step: 74030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:53:35,069-Speed 3421.50 samples/sec   Loss 2.9922   LearningRate 0.0072   Epoch: 14   Global Step: 74040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 06:53:38,041-Speed 3445.98 samples/sec   Loss 2.9290   LearningRate 0.0072   Epoch: 14   Global Step: 74050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:53:41,033-Speed 3423.25 samples/sec   Loss 2.9030   LearningRate 0.0072   Epoch: 14   Global Step: 74060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:53:44,009-Speed 3441.89 samples/sec   Loss 2.9249   LearningRate 0.0072   Epoch: 14   Global Step: 74070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:53:47,003-Speed 3420.91 samples/sec   Loss 3.0250   LearningRate 0.0072   Epoch: 14   Global Step: 74080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:53:50,046-Speed 3365.73 samples/sec   Loss 2.8446   LearningRate 0.0072   Epoch: 14   Global Step: 74090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:53:53,058-Speed 3400.80 samples/sec   Loss 2.9727   LearningRate 0.0072   Epoch: 14   Global Step: 74100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:53:56,056-Speed 3416.50 samples/sec   Loss 2.9037   LearningRate 0.0072   Epoch: 14   Global Step: 74110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:53:59,069-Speed 3398.91 samples/sec   Loss 2.8753   LearningRate 0.0071   Epoch: 14   Global Step: 74120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:02,109-Speed 3369.32 samples/sec   Loss 2.8705   LearningRate 0.0071   Epoch: 14   Global Step: 74130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:05,110-Speed 3413.21 samples/sec   Loss 2.9130   LearningRate 0.0071   Epoch: 14   Global Step: 74140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:08,113-Speed 3411.18 samples/sec   Loss 2.9099   LearningRate 0.0071   Epoch: 14   Global Step: 74150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:11,113-Speed 3414.07 samples/sec   Loss 2.8892   LearningRate 0.0071   Epoch: 14   Global Step: 74160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:14,114-Speed 3412.65 samples/sec   Loss 2.9778   LearningRate 0.0071   Epoch: 14   Global Step: 74170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:54:17,116-Speed 3412.06 samples/sec   Loss 2.8044   LearningRate 0.0071   Epoch: 14   Global Step: 74180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:54:20,104-Speed 3427.89 samples/sec   Loss 2.9155   LearningRate 0.0071   Epoch: 14   Global Step: 74190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:23,109-Speed 3408.36 samples/sec   Loss 2.9493   LearningRate 0.0071   Epoch: 14   Global Step: 74200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:26,108-Speed 3416.18 samples/sec   Loss 2.9475   LearningRate 0.0071   Epoch: 14   Global Step: 74210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:29,109-Speed 3413.13 samples/sec   Loss 2.8975   LearningRate 0.0071   Epoch: 14   Global Step: 74220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:32,113-Speed 3409.49 samples/sec   Loss 2.9692   LearningRate 0.0071   Epoch: 14   Global Step: 74230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:35,118-Speed 3408.69 samples/sec   Loss 2.8594   LearningRate 0.0071   Epoch: 14   Global Step: 74240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:38,118-Speed 3414.26 samples/sec   Loss 2.9143   LearningRate 0.0071   Epoch: 14   Global Step: 74250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:41,123-Speed 3408.11 samples/sec   Loss 2.9483   LearningRate 0.0071   Epoch: 14   Global Step: 74260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:44,125-Speed 3412.68 samples/sec   Loss 2.8865   LearningRate 0.0071   Epoch: 14   Global Step: 74270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:47,132-Speed 3405.87 samples/sec   Loss 2.9679   LearningRate 0.0071   Epoch: 14   Global Step: 74280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:54:50,136-Speed 3410.00 samples/sec   Loss 2.8530   LearningRate 0.0071   Epoch: 14   Global Step: 74290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:54:53,184-Speed 3360.03 samples/sec   Loss 2.9678   LearningRate 0.0071   Epoch: 14   Global Step: 74300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:54:56,184-Speed 3413.56 samples/sec   Loss 3.0412   LearningRate 0.0070   Epoch: 14   Global Step: 74310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:54:59,188-Speed 3409.92 samples/sec   Loss 2.8266   LearningRate 0.0070   Epoch: 14   Global Step: 74320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:02,195-Speed 3406.36 samples/sec   Loss 2.9828   LearningRate 0.0070   Epoch: 14   Global Step: 74330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:05,199-Speed 3409.85 samples/sec   Loss 2.8736   LearningRate 0.0070   Epoch: 14   Global Step: 74340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:08,201-Speed 3411.80 samples/sec   Loss 2.9047   LearningRate 0.0070   Epoch: 14   Global Step: 74350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:11,201-Speed 3414.20 samples/sec   Loss 2.9547   LearningRate 0.0070   Epoch: 14   Global Step: 74360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:14,210-Speed 3403.78 samples/sec   Loss 2.9393   LearningRate 0.0070   Epoch: 14   Global Step: 74370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:17,229-Speed 3393.44 samples/sec   Loss 2.8294   LearningRate 0.0070   Epoch: 14   Global Step: 74380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:20,207-Speed 3438.61 samples/sec   Loss 2.9467   LearningRate 0.0070   Epoch: 14   Global Step: 74390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:23,212-Speed 3408.84 samples/sec   Loss 2.8446   LearningRate 0.0070   Epoch: 14   Global Step: 74400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:26,223-Speed 3402.49 samples/sec   Loss 3.0385   LearningRate 0.0070   Epoch: 14   Global Step: 74410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:29,235-Speed 3400.37 samples/sec   Loss 2.8899   LearningRate 0.0070   Epoch: 14   Global Step: 74420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:32,235-Speed 3414.26 samples/sec   Loss 2.8707   LearningRate 0.0070   Epoch: 14   Global Step: 74430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:35,246-Speed 3401.04 samples/sec   Loss 2.8504   LearningRate 0.0070   Epoch: 14   Global Step: 74440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:38,257-Speed 3402.22 samples/sec   Loss 2.8410   LearningRate 0.0070   Epoch: 14   Global Step: 74450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:41,263-Speed 3406.87 samples/sec   Loss 2.8588   LearningRate 0.0070   Epoch: 14   Global Step: 74460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:44,269-Speed 3407.82 samples/sec   Loss 2.9460   LearningRate 0.0070   Epoch: 14   Global Step: 74470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:47,272-Speed 3410.21 samples/sec   Loss 2.9575   LearningRate 0.0070   Epoch: 14   Global Step: 74480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:50,250-Speed 3439.37 samples/sec   Loss 2.9365   LearningRate 0.0070   Epoch: 14   Global Step: 74490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:53,263-Speed 3400.32 samples/sec   Loss 2.8238   LearningRate 0.0069   Epoch: 14   Global Step: 74500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:56,266-Speed 3410.45 samples/sec   Loss 2.8252   LearningRate 0.0069   Epoch: 14   Global Step: 74510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:55:59,270-Speed 3409.94 samples/sec   Loss 2.9275   LearningRate 0.0069   Epoch: 14   Global Step: 74520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:56:02,273-Speed 3410.14 samples/sec   Loss 2.9011   LearningRate 0.0069   Epoch: 14   Global Step: 74530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:56:05,284-Speed 3402.81 samples/sec   Loss 2.8705   LearningRate 0.0069   Epoch: 14   Global Step: 74540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:56:08,288-Speed 3408.52 samples/sec   Loss 2.8002   LearningRate 0.0069   Epoch: 14   Global Step: 74550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:56:11,299-Speed 3401.84 samples/sec   Loss 2.9771   LearningRate 0.0069   Epoch: 14   Global Step: 74560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:56:14,313-Speed 3398.97 samples/sec   Loss 2.8863   LearningRate 0.0069   Epoch: 14   Global Step: 74570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:56:17,319-Speed 3407.29 samples/sec   Loss 2.8677   LearningRate 0.0069   Epoch: 14   Global Step: 74580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:56:20,304-Speed 3430.82 samples/sec   Loss 3.0011   LearningRate 0.0069   Epoch: 14   Global Step: 74590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:56:23,309-Speed 3408.65 samples/sec   Loss 2.9017   LearningRate 0.0069   Epoch: 14   Global Step: 74600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:56:26,311-Speed 3412.53 samples/sec   Loss 2.8405   LearningRate 0.0069   Epoch: 14   Global Step: 74610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:56:29,324-Speed 3399.58 samples/sec   Loss 2.8360   LearningRate 0.0069   Epoch: 14   Global Step: 74620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:56:32,332-Speed 3404.99 samples/sec   Loss 2.7976   LearningRate 0.0069   Epoch: 14   Global Step: 74630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:56:35,354-Speed 3389.69 samples/sec   Loss 2.8155   LearningRate 0.0069   Epoch: 14   Global Step: 74640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:56:38,348-Speed 3420.41 samples/sec   Loss 2.9649   LearningRate 0.0069   Epoch: 14   Global Step: 74650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:56:41,356-Speed 3405.13 samples/sec   Loss 2.9475   LearningRate 0.0069   Epoch: 14   Global Step: 74660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:56:44,361-Speed 3408.71 samples/sec   Loss 2.9425   LearningRate 0.0069   Epoch: 14   Global Step: 74670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:56:47,367-Speed 3407.07 samples/sec   Loss 2.8186   LearningRate 0.0069   Epoch: 14   Global Step: 74680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:56:50,373-Speed 3408.65 samples/sec   Loss 2.8245   LearningRate 0.0068   Epoch: 14   Global Step: 74690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:56:53,384-Speed 3400.60 samples/sec   Loss 2.8696   LearningRate 0.0068   Epoch: 14   Global Step: 74700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:56:56,390-Speed 3407.77 samples/sec   Loss 2.9915   LearningRate 0.0068   Epoch: 14   Global Step: 74710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:56:59,392-Speed 3412.47 samples/sec   Loss 2.9206   LearningRate 0.0068   Epoch: 14   Global Step: 74720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:57:02,403-Speed 3400.74 samples/sec   Loss 2.9234   LearningRate 0.0068   Epoch: 14   Global Step: 74730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:57:05,416-Speed 3400.36 samples/sec   Loss 2.8810   LearningRate 0.0068   Epoch: 14   Global Step: 74740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:57:08,423-Speed 3405.38 samples/sec   Loss 2.7871   LearningRate 0.0068   Epoch: 14   Global Step: 74750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:11,440-Speed 3395.39 samples/sec   Loss 2.7942   LearningRate 0.0068   Epoch: 14   Global Step: 74760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:14,444-Speed 3409.15 samples/sec   Loss 2.9021   LearningRate 0.0068   Epoch: 14   Global Step: 74770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:17,459-Speed 3397.62 samples/sec   Loss 2.9491   LearningRate 0.0068   Epoch: 14   Global Step: 74780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:20,470-Speed 3401.87 samples/sec   Loss 2.7418   LearningRate 0.0068   Epoch: 14   Global Step: 74790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:23,488-Speed 3393.72 samples/sec   Loss 2.9784   LearningRate 0.0068   Epoch: 14   Global Step: 74800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:26,499-Speed 3402.16 samples/sec   Loss 2.8002   LearningRate 0.0068   Epoch: 14   Global Step: 74810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:29,508-Speed 3403.54 samples/sec   Loss 3.0106   LearningRate 0.0068   Epoch: 14   Global Step: 74820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:32,515-Speed 3405.94 samples/sec   Loss 2.8174   LearningRate 0.0068   Epoch: 14   Global Step: 74830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:35,531-Speed 3396.91 samples/sec   Loss 2.8442   LearningRate 0.0068   Epoch: 14   Global Step: 74840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:38,538-Speed 3405.92 samples/sec   Loss 2.8295   LearningRate 0.0068   Epoch: 14   Global Step: 74850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 06:57:41,537-Speed 3414.84 samples/sec   Loss 2.9278   LearningRate 0.0068   Epoch: 14   Global Step: 74860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:44,544-Speed 3407.64 samples/sec   Loss 2.8721   LearningRate 0.0068   Epoch: 14   Global Step: 74870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:47,556-Speed 3399.88 samples/sec   Loss 2.9042   LearningRate 0.0067   Epoch: 14   Global Step: 74880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:50,561-Speed 3409.13 samples/sec   Loss 2.6974   LearningRate 0.0067   Epoch: 14   Global Step: 74890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:53,593-Speed 3377.49 samples/sec   Loss 2.8649   LearningRate 0.0067   Epoch: 14   Global Step: 74900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:56,601-Speed 3404.90 samples/sec   Loss 2.7739   LearningRate 0.0067   Epoch: 14   Global Step: 74910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:57:59,612-Speed 3402.32 samples/sec   Loss 2.8291   LearningRate 0.0067   Epoch: 14   Global Step: 74920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:58:02,684-Speed 3333.81 samples/sec   Loss 2.9279   LearningRate 0.0067   Epoch: 14   Global Step: 74930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:58:05,677-Speed 3422.09 samples/sec   Loss 3.0540   LearningRate 0.0067   Epoch: 14   Global Step: 74940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:58:08,689-Speed 3401.07 samples/sec   Loss 2.7891   LearningRate 0.0067   Epoch: 14   Global Step: 74950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:58:11,698-Speed 3403.81 samples/sec   Loss 2.8527   LearningRate 0.0067   Epoch: 14   Global Step: 74960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:58:14,707-Speed 3405.08 samples/sec   Loss 2.9161   LearningRate 0.0067   Epoch: 14   Global Step: 74970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:58:17,722-Speed 3397.66 samples/sec   Loss 2.9920   LearningRate 0.0067   Epoch: 14   Global Step: 74980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:58:20,730-Speed 3405.21 samples/sec   Loss 2.9278   LearningRate 0.0067   Epoch: 14   Global Step: 74990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:58:23,739-Speed 3403.32 samples/sec   Loss 2.8903   LearningRate 0.0067   Epoch: 14   Global Step: 75000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:58:26,763-Speed 3387.01 samples/sec   Loss 2.8610   LearningRate 0.0067   Epoch: 14   Global Step: 75010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:58:29,771-Speed 3405.52 samples/sec   Loss 2.7613   LearningRate 0.0067   Epoch: 14   Global Step: 75020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:58:32,785-Speed 3398.46 samples/sec   Loss 2.9182   LearningRate 0.0067   Epoch: 14   Global Step: 75030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:58:35,808-Speed 3387.70 samples/sec   Loss 2.8775   LearningRate 0.0067   Epoch: 14   Global Step: 75040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:58:38,833-Speed 3386.97 samples/sec   Loss 2.7964   LearningRate 0.0067   Epoch: 14   Global Step: 75050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:58:41,840-Speed 3405.17 samples/sec   Loss 2.8985   LearningRate 0.0067   Epoch: 14   Global Step: 75060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:58:44,865-Speed 3387.10 samples/sec   Loss 2.8866   LearningRate 0.0067   Epoch: 14   Global Step: 75070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:58:47,882-Speed 3394.89 samples/sec   Loss 2.8526   LearningRate 0.0066   Epoch: 14   Global Step: 75080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:58:50,893-Speed 3401.89 samples/sec   Loss 2.8234   LearningRate 0.0066   Epoch: 14   Global Step: 75090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:58:53,906-Speed 3398.80 samples/sec   Loss 2.9293   LearningRate 0.0066   Epoch: 14   Global Step: 75100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:58:56,923-Speed 3394.18 samples/sec   Loss 2.9077   LearningRate 0.0066   Epoch: 14   Global Step: 75110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:58:59,938-Speed 3397.59 samples/sec   Loss 2.8720   LearningRate 0.0066   Epoch: 14   Global Step: 75120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:59:02,982-Speed 3364.68 samples/sec   Loss 2.9363   LearningRate 0.0066   Epoch: 14   Global Step: 75130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:59:05,987-Speed 3409.18 samples/sec   Loss 2.9142   LearningRate 0.0066   Epoch: 14   Global Step: 75140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:59:09,013-Speed 3384.50 samples/sec   Loss 2.8866   LearningRate 0.0066   Epoch: 14   Global Step: 75150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:59:12,024-Speed 3401.54 samples/sec   Loss 2.8349   LearningRate 0.0066   Epoch: 14   Global Step: 75160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:59:15,038-Speed 3398.43 samples/sec   Loss 3.0091   LearningRate 0.0066   Epoch: 14   Global Step: 75170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:59:18,053-Speed 3397.35 samples/sec   Loss 2.8644   LearningRate 0.0066   Epoch: 14   Global Step: 75180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:59:21,064-Speed 3401.71 samples/sec   Loss 2.8872   LearningRate 0.0066   Epoch: 14   Global Step: 75190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:59:24,068-Speed 3410.01 samples/sec   Loss 2.8228   LearningRate 0.0066   Epoch: 14   Global Step: 75200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:59:27,076-Speed 3404.28 samples/sec   Loss 2.7496   LearningRate 0.0066   Epoch: 14   Global Step: 75210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:59:30,094-Speed 3394.25 samples/sec   Loss 2.7860   LearningRate 0.0066   Epoch: 14   Global Step: 75220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:59:33,110-Speed 3395.74 samples/sec   Loss 2.9371   LearningRate 0.0066   Epoch: 14   Global Step: 75230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:59:36,139-Speed 3382.34 samples/sec   Loss 2.8352   LearningRate 0.0066   Epoch: 14   Global Step: 75240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:59:39,167-Speed 3382.84 samples/sec   Loss 2.7633   LearningRate 0.0066   Epoch: 14   Global Step: 75250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:59:42,184-Speed 3394.58 samples/sec   Loss 2.9637   LearningRate 0.0066   Epoch: 14   Global Step: 75260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:59:45,211-Speed 3383.42 samples/sec   Loss 2.9351   LearningRate 0.0066   Epoch: 14   Global Step: 75270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:59:48,224-Speed 3399.74 samples/sec   Loss 2.8995   LearningRate 0.0065   Epoch: 14   Global Step: 75280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:59:51,237-Speed 3399.63 samples/sec   Loss 2.9237   LearningRate 0.0065   Epoch: 14   Global Step: 75290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 06:59:54,252-Speed 3397.11 samples/sec   Loss 2.9082   LearningRate 0.0065   Epoch: 14   Global Step: 75300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 06:59:57,264-Speed 3400.08 samples/sec   Loss 2.9735   LearningRate 0.0065   Epoch: 14   Global Step: 75310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:00,275-Speed 3402.32 samples/sec   Loss 2.8548   LearningRate 0.0065   Epoch: 14   Global Step: 75320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:03,289-Speed 3398.59 samples/sec   Loss 2.8586   LearningRate 0.0065   Epoch: 14   Global Step: 75330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:06,303-Speed 3398.34 samples/sec   Loss 2.9811   LearningRate 0.0065   Epoch: 14   Global Step: 75340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:09,316-Speed 3399.10 samples/sec   Loss 2.8751   LearningRate 0.0065   Epoch: 14   Global Step: 75350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:12,341-Speed 3386.40 samples/sec   Loss 2.9166   LearningRate 0.0065   Epoch: 14   Global Step: 75360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:15,355-Speed 3398.77 samples/sec   Loss 2.9808   LearningRate 0.0065   Epoch: 14   Global Step: 75370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:18,396-Speed 3367.29 samples/sec   Loss 2.7980   LearningRate 0.0065   Epoch: 14   Global Step: 75380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:21,406-Speed 3403.10 samples/sec   Loss 2.7875   LearningRate 0.0065   Epoch: 14   Global Step: 75390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:24,404-Speed 3416.30 samples/sec   Loss 2.9856   LearningRate 0.0065   Epoch: 14   Global Step: 75400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:27,423-Speed 3392.60 samples/sec   Loss 2.8523   LearningRate 0.0065   Epoch: 14   Global Step: 75410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:30,436-Speed 3400.10 samples/sec   Loss 2.8055   LearningRate 0.0065   Epoch: 14   Global Step: 75420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:33,446-Speed 3403.25 samples/sec   Loss 2.9352   LearningRate 0.0065   Epoch: 14   Global Step: 75430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:36,457-Speed 3401.83 samples/sec   Loss 2.8658   LearningRate 0.0065   Epoch: 14   Global Step: 75440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:39,543-Speed 3318.16 samples/sec   Loss 2.8422   LearningRate 0.0065   Epoch: 14   Global Step: 75450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:42,639-Speed 3307.89 samples/sec   Loss 2.9183   LearningRate 0.0065   Epoch: 14   Global Step: 75460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:45,673-Speed 3376.58 samples/sec   Loss 2.7708   LearningRate 0.0064   Epoch: 14   Global Step: 75470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:48,709-Speed 3373.66 samples/sec   Loss 2.8030   LearningRate 0.0064   Epoch: 14   Global Step: 75480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:51,728-Speed 3392.91 samples/sec   Loss 2.8962   LearningRate 0.0064   Epoch: 14   Global Step: 75490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:54,735-Speed 3406.37 samples/sec   Loss 2.9365   LearningRate 0.0064   Epoch: 14   Global Step: 75500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:00:57,751-Speed 3395.84 samples/sec   Loss 2.7947   LearningRate 0.0064   Epoch: 14   Global Step: 75510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:00,768-Speed 3395.73 samples/sec   Loss 2.8954   LearningRate 0.0064   Epoch: 14   Global Step: 75520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:03,785-Speed 3393.83 samples/sec   Loss 2.7685   LearningRate 0.0064   Epoch: 14   Global Step: 75530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:06,795-Speed 3403.86 samples/sec   Loss 2.8373   LearningRate 0.0064   Epoch: 14   Global Step: 75540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:09,806-Speed 3401.05 samples/sec   Loss 2.7867   LearningRate 0.0064   Epoch: 14   Global Step: 75550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:12,817-Speed 3401.82 samples/sec   Loss 2.8577   LearningRate 0.0064   Epoch: 14   Global Step: 75560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:15,851-Speed 3375.35 samples/sec   Loss 2.8552   LearningRate 0.0064   Epoch: 14   Global Step: 75570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:18,865-Speed 3398.93 samples/sec   Loss 2.9295   LearningRate 0.0064   Epoch: 14   Global Step: 75580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:21,880-Speed 3396.66 samples/sec   Loss 2.7502   LearningRate 0.0064   Epoch: 14   Global Step: 75590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:24,877-Speed 3418.30 samples/sec   Loss 2.8956   LearningRate 0.0064   Epoch: 14   Global Step: 75600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:27,894-Speed 3394.30 samples/sec   Loss 2.8682   LearningRate 0.0064   Epoch: 14   Global Step: 75610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:30,907-Speed 3399.72 samples/sec   Loss 2.8283   LearningRate 0.0064   Epoch: 14   Global Step: 75620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:33,919-Speed 3400.68 samples/sec   Loss 2.8846   LearningRate 0.0064   Epoch: 14   Global Step: 75630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:36,938-Speed 3392.88 samples/sec   Loss 2.8218   LearningRate 0.0064   Epoch: 14   Global Step: 75640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:39,965-Speed 3383.38 samples/sec   Loss 2.7970   LearningRate 0.0064   Epoch: 14   Global Step: 75650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:42,981-Speed 3396.29 samples/sec   Loss 2.7596   LearningRate 0.0064   Epoch: 14   Global Step: 75660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:45,993-Speed 3401.13 samples/sec   Loss 2.9084   LearningRate 0.0063   Epoch: 14   Global Step: 75670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:49,008-Speed 3396.90 samples/sec   Loss 2.9413   LearningRate 0.0063   Epoch: 14   Global Step: 75680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:52,027-Speed 3392.72 samples/sec   Loss 2.8878   LearningRate 0.0063   Epoch: 14   Global Step: 75690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:55,046-Speed 3392.60 samples/sec   Loss 2.8692   LearningRate 0.0063   Epoch: 14   Global Step: 75700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:01:58,062-Speed 3396.26 samples/sec   Loss 2.8906   LearningRate 0.0063   Epoch: 14   Global Step: 75710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:02:01,079-Speed 3395.49 samples/sec   Loss 2.8951   LearningRate 0.0063   Epoch: 14   Global Step: 75720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:02:04,098-Speed 3391.49 samples/sec   Loss 2.8284   LearningRate 0.0063   Epoch: 14   Global Step: 75730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:02:07,117-Speed 3392.44 samples/sec   Loss 2.9389   LearningRate 0.0063   Epoch: 14   Global Step: 75740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:02:10,160-Speed 3366.51 samples/sec   Loss 2.8537   LearningRate 0.0063   Epoch: 14   Global Step: 75750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:02:13,175-Speed 3396.86 samples/sec   Loss 2.8743   LearningRate 0.0063   Epoch: 14   Global Step: 75760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:02:16,206-Speed 3380.08 samples/sec   Loss 3.0263   LearningRate 0.0063   Epoch: 14   Global Step: 75770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:02:19,224-Speed 3393.09 samples/sec   Loss 2.8311   LearningRate 0.0063   Epoch: 14   Global Step: 75780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:02:22,233-Speed 3404.14 samples/sec   Loss 2.8246   LearningRate 0.0063   Epoch: 14   Global Step: 75790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:02:25,230-Speed 3418.55 samples/sec   Loss 2.9242   LearningRate 0.0063   Epoch: 14   Global Step: 75800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:02:28,244-Speed 3397.75 samples/sec   Loss 2.8649   LearningRate 0.0063   Epoch: 14   Global Step: 75810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:02:31,241-Speed 3418.14 samples/sec   Loss 2.8988   LearningRate 0.0063   Epoch: 14   Global Step: 75820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:02:34,251-Speed 3402.05 samples/sec   Loss 2.8399   LearningRate 0.0063   Epoch: 14   Global Step: 75830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:02:37,266-Speed 3397.04 samples/sec   Loss 2.7793   LearningRate 0.0063   Epoch: 14   Global Step: 75840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:02:40,288-Speed 3389.79 samples/sec   Loss 2.8571   LearningRate 0.0063   Epoch: 14   Global Step: 75850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:02:43,374-Speed 3319.09 samples/sec   Loss 2.7944   LearningRate 0.0063   Epoch: 14   Global Step: 75860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:02:46,386-Speed 3400.89 samples/sec   Loss 2.8666   LearningRate 0.0063   Epoch: 14   Global Step: 75870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:03:00,385-Speed 731.54 samples/sec   Loss 2.0833   LearningRate 0.0062   Epoch: 15   Global Step: 75880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:03:03,476-Speed 3314.40 samples/sec   Loss 2.1095   LearningRate 0.0062   Epoch: 15   Global Step: 75890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:03:06,659-Speed 3217.75 samples/sec   Loss 2.1342   LearningRate 0.0062   Epoch: 15   Global Step: 75900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:03:09,671-Speed 3400.54 samples/sec   Loss 2.1609   LearningRate 0.0062   Epoch: 15   Global Step: 75910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:03:12,696-Speed 3387.09 samples/sec   Loss 2.0920   LearningRate 0.0062   Epoch: 15   Global Step: 75920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:03:15,746-Speed 3357.65 samples/sec   Loss 2.0930   LearningRate 0.0062   Epoch: 15   Global Step: 75930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:03:18,793-Speed 3361.77 samples/sec   Loss 2.0624   LearningRate 0.0062   Epoch: 15   Global Step: 75940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:03:21,829-Speed 3373.82 samples/sec   Loss 1.9988   LearningRate 0.0062   Epoch: 15   Global Step: 75950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:03:24,859-Speed 3381.22 samples/sec   Loss 2.1097   LearningRate 0.0062   Epoch: 15   Global Step: 75960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:03:27,876-Speed 3394.65 samples/sec   Loss 2.2103   LearningRate 0.0062   Epoch: 15   Global Step: 75970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:03:30,896-Speed 3391.31 samples/sec   Loss 1.9270   LearningRate 0.0062   Epoch: 15   Global Step: 75980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:03:33,905-Speed 3404.02 samples/sec   Loss 2.1127   LearningRate 0.0062   Epoch: 15   Global Step: 75990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:03:36,929-Speed 3388.32 samples/sec   Loss 2.0209   LearningRate 0.0062   Epoch: 15   Global Step: 76000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:04:21,363-[lfw][76000]XNorm: 21.568427
Training: 2022-04-11 07:04:21,364-[lfw][76000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 07:04:21,364-[lfw][76000]Accuracy-Highest: 0.99850
Training: 2022-04-11 07:05:12,824-[cfp_fp][76000]XNorm: 20.738903
Training: 2022-04-11 07:05:12,825-[cfp_fp][76000]Accuracy-Flip: 0.98400+-0.00510
Training: 2022-04-11 07:05:12,826-[cfp_fp][76000]Accuracy-Highest: 0.98414
Training: 2022-04-11 07:05:56,846-[agedb_30][76000]XNorm: 21.965160
Training: 2022-04-11 07:05:56,847-[agedb_30][76000]Accuracy-Flip: 0.98467+-0.00686
Training: 2022-04-11 07:05:56,847-[agedb_30][76000]Accuracy-Highest: 0.98467
Training: 2022-04-11 07:05:59,856-Speed 71.65 samples/sec   Loss 2.0340   LearningRate 0.0062   Epoch: 15   Global Step: 76010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:06:02,851-Speed 3420.36 samples/sec   Loss 2.1386   LearningRate 0.0062   Epoch: 15   Global Step: 76020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:06:05,868-Speed 3394.30 samples/sec   Loss 2.1095   LearningRate 0.0062   Epoch: 15   Global Step: 76030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:06:08,912-Speed 3365.59 samples/sec   Loss 2.0769   LearningRate 0.0062   Epoch: 15   Global Step: 76040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:06:11,918-Speed 3407.03 samples/sec   Loss 2.1753   LearningRate 0.0062   Epoch: 15   Global Step: 76050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:06:14,947-Speed 3381.67 samples/sec   Loss 2.1641   LearningRate 0.0062   Epoch: 15   Global Step: 76060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:06:17,944-Speed 3418.02 samples/sec   Loss 2.0547   LearningRate 0.0062   Epoch: 15   Global Step: 76070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:06:20,946-Speed 3411.89 samples/sec   Loss 2.2085   LearningRate 0.0061   Epoch: 15   Global Step: 76080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:06:23,957-Speed 3402.19 samples/sec   Loss 2.1282   LearningRate 0.0061   Epoch: 15   Global Step: 76090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:06:26,992-Speed 3375.21 samples/sec   Loss 2.2945   LearningRate 0.0061   Epoch: 15   Global Step: 76100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:06:29,993-Speed 3412.61 samples/sec   Loss 2.2733   LearningRate 0.0061   Epoch: 15   Global Step: 76110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:06:32,984-Speed 3424.87 samples/sec   Loss 2.1702   LearningRate 0.0061   Epoch: 15   Global Step: 76120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:06:35,992-Speed 3404.65 samples/sec   Loss 2.0808   LearningRate 0.0061   Epoch: 15   Global Step: 76130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:06:39,001-Speed 3405.40 samples/sec   Loss 2.0937   LearningRate 0.0061   Epoch: 15   Global Step: 76140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:06:42,029-Speed 3382.16 samples/sec   Loss 2.1284   LearningRate 0.0061   Epoch: 15   Global Step: 76150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:06:45,036-Speed 3406.37 samples/sec   Loss 2.0839   LearningRate 0.0061   Epoch: 15   Global Step: 76160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:06:48,051-Speed 3397.46 samples/sec   Loss 2.1165   LearningRate 0.0061   Epoch: 15   Global Step: 76170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:06:51,057-Speed 3407.54 samples/sec   Loss 2.1820   LearningRate 0.0061   Epoch: 15   Global Step: 76180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:06:54,086-Speed 3381.49 samples/sec   Loss 2.2147   LearningRate 0.0061   Epoch: 15   Global Step: 76190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:06:57,096-Speed 3402.57 samples/sec   Loss 2.1303   LearningRate 0.0061   Epoch: 15   Global Step: 76200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:07:00,102-Speed 3406.90 samples/sec   Loss 2.2449   LearningRate 0.0061   Epoch: 15   Global Step: 76210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:07:03,114-Speed 3401.62 samples/sec   Loss 2.1456   LearningRate 0.0061   Epoch: 15   Global Step: 76220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:07:06,121-Speed 3406.15 samples/sec   Loss 2.1285   LearningRate 0.0061   Epoch: 15   Global Step: 76230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:07:09,135-Speed 3398.93 samples/sec   Loss 2.1925   LearningRate 0.0061   Epoch: 15   Global Step: 76240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:12,158-Speed 3388.33 samples/sec   Loss 2.1780   LearningRate 0.0061   Epoch: 15   Global Step: 76250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:15,167-Speed 3403.62 samples/sec   Loss 2.1291   LearningRate 0.0061   Epoch: 15   Global Step: 76260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:18,175-Speed 3405.33 samples/sec   Loss 2.0570   LearningRate 0.0061   Epoch: 15   Global Step: 76270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:21,185-Speed 3403.25 samples/sec   Loss 2.1734   LearningRate 0.0060   Epoch: 15   Global Step: 76280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:24,192-Speed 3407.14 samples/sec   Loss 2.1773   LearningRate 0.0060   Epoch: 15   Global Step: 76290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:27,204-Speed 3400.50 samples/sec   Loss 2.2078   LearningRate 0.0060   Epoch: 15   Global Step: 76300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:30,211-Speed 3405.53 samples/sec   Loss 2.1205   LearningRate 0.0060   Epoch: 15   Global Step: 76310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:33,215-Speed 3410.96 samples/sec   Loss 2.2474   LearningRate 0.0060   Epoch: 15   Global Step: 76320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:36,221-Speed 3406.88 samples/sec   Loss 2.2300   LearningRate 0.0060   Epoch: 15   Global Step: 76330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:39,229-Speed 3405.45 samples/sec   Loss 2.1565   LearningRate 0.0060   Epoch: 15   Global Step: 76340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:42,357-Speed 3274.75 samples/sec   Loss 2.1384   LearningRate 0.0060   Epoch: 15   Global Step: 76350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:45,363-Speed 3406.80 samples/sec   Loss 2.2640   LearningRate 0.0060   Epoch: 15   Global Step: 76360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:48,384-Speed 3390.72 samples/sec   Loss 2.1167   LearningRate 0.0060   Epoch: 15   Global Step: 76370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:51,413-Speed 3381.45 samples/sec   Loss 2.2159   LearningRate 0.0060   Epoch: 15   Global Step: 76380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:54,420-Speed 3406.44 samples/sec   Loss 2.1124   LearningRate 0.0060   Epoch: 15   Global Step: 76390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:07:57,427-Speed 3406.58 samples/sec   Loss 2.2328   LearningRate 0.0060   Epoch: 15   Global Step: 76400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:00,455-Speed 3382.20 samples/sec   Loss 2.2954   LearningRate 0.0060   Epoch: 15   Global Step: 76410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:03,497-Speed 3367.66 samples/sec   Loss 2.1584   LearningRate 0.0060   Epoch: 15   Global Step: 76420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:06,515-Speed 3393.19 samples/sec   Loss 2.2790   LearningRate 0.0060   Epoch: 15   Global Step: 76430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:09,500-Speed 3432.50 samples/sec   Loss 2.2063   LearningRate 0.0060   Epoch: 15   Global Step: 76440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:12,506-Speed 3407.74 samples/sec   Loss 2.2682   LearningRate 0.0060   Epoch: 15   Global Step: 76450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:15,535-Speed 3381.15 samples/sec   Loss 2.1822   LearningRate 0.0060   Epoch: 15   Global Step: 76460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:18,545-Speed 3402.81 samples/sec   Loss 2.2337   LearningRate 0.0060   Epoch: 15   Global Step: 76470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:21,559-Speed 3398.69 samples/sec   Loss 2.2166   LearningRate 0.0060   Epoch: 15   Global Step: 76480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:24,559-Speed 3414.81 samples/sec   Loss 2.2149   LearningRate 0.0059   Epoch: 15   Global Step: 76490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:27,567-Speed 3404.49 samples/sec   Loss 2.2466   LearningRate 0.0059   Epoch: 15   Global Step: 76500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:30,716-Speed 3252.90 samples/sec   Loss 2.2285   LearningRate 0.0059   Epoch: 15   Global Step: 76510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:33,717-Speed 3414.42 samples/sec   Loss 2.2537   LearningRate 0.0059   Epoch: 15   Global Step: 76520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:36,727-Speed 3402.65 samples/sec   Loss 2.2889   LearningRate 0.0059   Epoch: 15   Global Step: 76530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:39,728-Speed 3414.25 samples/sec   Loss 2.2384   LearningRate 0.0059   Epoch: 15   Global Step: 76540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 07:08:42,729-Speed 3412.33 samples/sec   Loss 2.2584   LearningRate 0.0059   Epoch: 15   Global Step: 76550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:45,733-Speed 3410.11 samples/sec   Loss 2.1739   LearningRate 0.0059   Epoch: 15   Global Step: 76560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:48,739-Speed 3406.93 samples/sec   Loss 2.1692   LearningRate 0.0059   Epoch: 15   Global Step: 76570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:51,748-Speed 3404.56 samples/sec   Loss 2.2389   LearningRate 0.0059   Epoch: 15   Global Step: 76580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:54,759-Speed 3401.96 samples/sec   Loss 2.2116   LearningRate 0.0059   Epoch: 15   Global Step: 76590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:08:57,782-Speed 3388.58 samples/sec   Loss 2.2567   LearningRate 0.0059   Epoch: 15   Global Step: 76600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:09:00,792-Speed 3402.56 samples/sec   Loss 2.2108   LearningRate 0.0059   Epoch: 15   Global Step: 76610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:09:03,805-Speed 3399.90 samples/sec   Loss 2.3144   LearningRate 0.0059   Epoch: 15   Global Step: 76620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:09:06,858-Speed 3354.51 samples/sec   Loss 2.2366   LearningRate 0.0059   Epoch: 15   Global Step: 76630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:09:09,884-Speed 3385.39 samples/sec   Loss 2.1827   LearningRate 0.0059   Epoch: 15   Global Step: 76640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:09:12,904-Speed 3392.06 samples/sec   Loss 2.2326   LearningRate 0.0059   Epoch: 15   Global Step: 76650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:09:15,927-Speed 3387.82 samples/sec   Loss 2.3014   LearningRate 0.0059   Epoch: 15   Global Step: 76660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:09:18,926-Speed 3416.61 samples/sec   Loss 2.1651   LearningRate 0.0059   Epoch: 15   Global Step: 76670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:09:21,930-Speed 3409.25 samples/sec   Loss 2.2054   LearningRate 0.0059   Epoch: 15   Global Step: 76680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:09:24,938-Speed 3404.97 samples/sec   Loss 2.2745   LearningRate 0.0059   Epoch: 15   Global Step: 76690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:09:27,950-Speed 3400.70 samples/sec   Loss 2.2693   LearningRate 0.0058   Epoch: 15   Global Step: 76700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:09:30,954-Speed 3409.64 samples/sec   Loss 2.2262   LearningRate 0.0058   Epoch: 15   Global Step: 76710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:09:33,964-Speed 3402.99 samples/sec   Loss 2.2862   LearningRate 0.0058   Epoch: 15   Global Step: 76720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:09:37,010-Speed 3362.61 samples/sec   Loss 2.1810   LearningRate 0.0058   Epoch: 15   Global Step: 76730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:09:40,021-Speed 3402.59 samples/sec   Loss 2.2868   LearningRate 0.0058   Epoch: 15   Global Step: 76740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:09:43,026-Speed 3408.35 samples/sec   Loss 2.2347   LearningRate 0.0058   Epoch: 15   Global Step: 76750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:09:46,033-Speed 3406.63 samples/sec   Loss 2.3627   LearningRate 0.0058   Epoch: 15   Global Step: 76760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 07:09:49,042-Speed 3404.12 samples/sec   Loss 2.2756   LearningRate 0.0058   Epoch: 15   Global Step: 76770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:09:52,048-Speed 3408.02 samples/sec   Loss 2.3267   LearningRate 0.0058   Epoch: 15   Global Step: 76780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:09:55,057-Speed 3404.54 samples/sec   Loss 2.2930   LearningRate 0.0058   Epoch: 15   Global Step: 76790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:09:58,062-Speed 3408.12 samples/sec   Loss 2.2742   LearningRate 0.0058   Epoch: 15   Global Step: 76800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:10:01,145-Speed 3322.35 samples/sec   Loss 2.2906   LearningRate 0.0058   Epoch: 15   Global Step: 76810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:10:04,156-Speed 3401.95 samples/sec   Loss 2.3226   LearningRate 0.0058   Epoch: 15   Global Step: 76820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:10:07,163-Speed 3406.17 samples/sec   Loss 2.1935   LearningRate 0.0058   Epoch: 15   Global Step: 76830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:10:10,169-Speed 3408.08 samples/sec   Loss 2.2873   LearningRate 0.0058   Epoch: 15   Global Step: 76840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:10:13,174-Speed 3407.92 samples/sec   Loss 2.2900   LearningRate 0.0058   Epoch: 15   Global Step: 76850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:10:16,229-Speed 3353.09 samples/sec   Loss 2.3007   LearningRate 0.0058   Epoch: 15   Global Step: 76860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:10:19,227-Speed 3416.41 samples/sec   Loss 2.1862   LearningRate 0.0058   Epoch: 15   Global Step: 76870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:10:22,234-Speed 3406.67 samples/sec   Loss 2.2880   LearningRate 0.0058   Epoch: 15   Global Step: 76880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:10:25,241-Speed 3406.85 samples/sec   Loss 2.2769   LearningRate 0.0058   Epoch: 15   Global Step: 76890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:10:28,252-Speed 3401.25 samples/sec   Loss 2.3336   LearningRate 0.0058   Epoch: 15   Global Step: 76900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:10:31,281-Speed 3381.89 samples/sec   Loss 2.2408   LearningRate 0.0057   Epoch: 15   Global Step: 76910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 07:10:34,308-Speed 3384.28 samples/sec   Loss 2.3494   LearningRate 0.0057   Epoch: 15   Global Step: 76920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:10:37,319-Speed 3401.83 samples/sec   Loss 2.2182   LearningRate 0.0057   Epoch: 15   Global Step: 76930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:10:40,327-Speed 3404.85 samples/sec   Loss 2.3595   LearningRate 0.0057   Epoch: 15   Global Step: 76940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:10:43,367-Speed 3369.73 samples/sec   Loss 2.2548   LearningRate 0.0057   Epoch: 15   Global Step: 76950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:10:46,373-Speed 3407.74 samples/sec   Loss 2.1869   LearningRate 0.0057   Epoch: 15   Global Step: 76960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:10:49,366-Speed 3421.89 samples/sec   Loss 2.2701   LearningRate 0.0057   Epoch: 15   Global Step: 76970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:10:52,383-Speed 3395.15 samples/sec   Loss 2.2621   LearningRate 0.0057   Epoch: 15   Global Step: 76980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:10:55,406-Speed 3388.18 samples/sec   Loss 2.3154   LearningRate 0.0057   Epoch: 15   Global Step: 76990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:10:58,410-Speed 3409.38 samples/sec   Loss 2.3535   LearningRate 0.0057   Epoch: 15   Global Step: 77000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:01,424-Speed 3398.54 samples/sec   Loss 2.3280   LearningRate 0.0057   Epoch: 15   Global Step: 77010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:04,456-Speed 3378.42 samples/sec   Loss 2.2923   LearningRate 0.0057   Epoch: 15   Global Step: 77020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:07,471-Speed 3397.83 samples/sec   Loss 2.2836   LearningRate 0.0057   Epoch: 15   Global Step: 77030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:10,478-Speed 3405.73 samples/sec   Loss 2.3525   LearningRate 0.0057   Epoch: 15   Global Step: 77040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:13,493-Speed 3397.42 samples/sec   Loss 2.4476   LearningRate 0.0057   Epoch: 15   Global Step: 77050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:16,528-Speed 3375.47 samples/sec   Loss 2.2550   LearningRate 0.0057   Epoch: 15   Global Step: 77060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:19,513-Speed 3431.03 samples/sec   Loss 2.3316   LearningRate 0.0057   Epoch: 15   Global Step: 77070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:22,520-Speed 3406.41 samples/sec   Loss 2.3185   LearningRate 0.0057   Epoch: 15   Global Step: 77080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:25,533-Speed 3399.81 samples/sec   Loss 2.2533   LearningRate 0.0057   Epoch: 15   Global Step: 77090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:28,544-Speed 3401.44 samples/sec   Loss 2.2326   LearningRate 0.0057   Epoch: 15   Global Step: 77100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:31,555-Speed 3402.49 samples/sec   Loss 2.3154   LearningRate 0.0057   Epoch: 15   Global Step: 77110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:34,564-Speed 3404.36 samples/sec   Loss 2.3499   LearningRate 0.0056   Epoch: 15   Global Step: 77120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:37,581-Speed 3394.47 samples/sec   Loss 2.3028   LearningRate 0.0056   Epoch: 15   Global Step: 77130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:40,590-Speed 3404.10 samples/sec   Loss 2.1977   LearningRate 0.0056   Epoch: 15   Global Step: 77140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:43,603-Speed 3399.39 samples/sec   Loss 2.3164   LearningRate 0.0056   Epoch: 15   Global Step: 77150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:46,610-Speed 3406.79 samples/sec   Loss 2.3504   LearningRate 0.0056   Epoch: 15   Global Step: 77160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:49,617-Speed 3406.19 samples/sec   Loss 2.3262   LearningRate 0.0056   Epoch: 15   Global Step: 77170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:52,628-Speed 3401.94 samples/sec   Loss 2.3169   LearningRate 0.0056   Epoch: 15   Global Step: 77180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:55,637-Speed 3403.40 samples/sec   Loss 2.2854   LearningRate 0.0056   Epoch: 15   Global Step: 77190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:11:58,630-Speed 3422.51 samples/sec   Loss 2.3057   LearningRate 0.0056   Epoch: 15   Global Step: 77200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:12:01,641-Speed 3401.27 samples/sec   Loss 2.3719   LearningRate 0.0056   Epoch: 15   Global Step: 77210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:12:04,650-Speed 3404.27 samples/sec   Loss 2.3397   LearningRate 0.0056   Epoch: 15   Global Step: 77220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:12:07,656-Speed 3408.07 samples/sec   Loss 2.3481   LearningRate 0.0056   Epoch: 15   Global Step: 77230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:12:10,675-Speed 3392.43 samples/sec   Loss 2.2948   LearningRate 0.0056   Epoch: 15   Global Step: 77240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:12:13,698-Speed 3388.07 samples/sec   Loss 2.3208   LearningRate 0.0056   Epoch: 15   Global Step: 77250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:12:16,714-Speed 3396.90 samples/sec   Loss 2.3662   LearningRate 0.0056   Epoch: 15   Global Step: 77260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:12:19,727-Speed 3399.88 samples/sec   Loss 2.3024   LearningRate 0.0056   Epoch: 15   Global Step: 77270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:12:22,739-Speed 3400.63 samples/sec   Loss 2.2655   LearningRate 0.0056   Epoch: 15   Global Step: 77280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:12:25,749-Speed 3402.38 samples/sec   Loss 2.2438   LearningRate 0.0056   Epoch: 15   Global Step: 77290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:12:28,769-Speed 3391.70 samples/sec   Loss 2.4208   LearningRate 0.0056   Epoch: 15   Global Step: 77300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:12:31,776-Speed 3407.05 samples/sec   Loss 2.2915   LearningRate 0.0056   Epoch: 15   Global Step: 77310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:12:34,789-Speed 3399.61 samples/sec   Loss 2.2442   LearningRate 0.0056   Epoch: 15   Global Step: 77320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:12:37,804-Speed 3396.63 samples/sec   Loss 2.4188   LearningRate 0.0055   Epoch: 15   Global Step: 77330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:12:40,839-Speed 3375.18 samples/sec   Loss 2.3442   LearningRate 0.0055   Epoch: 15   Global Step: 77340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:12:43,859-Speed 3391.27 samples/sec   Loss 2.4186   LearningRate 0.0055   Epoch: 15   Global Step: 77350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:12:46,866-Speed 3406.62 samples/sec   Loss 2.3365   LearningRate 0.0055   Epoch: 15   Global Step: 77360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:12:49,882-Speed 3396.18 samples/sec   Loss 2.2135   LearningRate 0.0055   Epoch: 15   Global Step: 77370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:12:52,891-Speed 3404.62 samples/sec   Loss 2.3371   LearningRate 0.0055   Epoch: 15   Global Step: 77380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:12:55,905-Speed 3397.88 samples/sec   Loss 2.3040   LearningRate 0.0055   Epoch: 15   Global Step: 77390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:12:58,898-Speed 3422.09 samples/sec   Loss 2.2043   LearningRate 0.0055   Epoch: 15   Global Step: 77400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:01,924-Speed 3385.29 samples/sec   Loss 2.2928   LearningRate 0.0055   Epoch: 15   Global Step: 77410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:04,931-Speed 3406.00 samples/sec   Loss 2.2837   LearningRate 0.0055   Epoch: 15   Global Step: 77420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:07,953-Speed 3389.90 samples/sec   Loss 2.3181   LearningRate 0.0055   Epoch: 15   Global Step: 77430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:10,958-Speed 3408.15 samples/sec   Loss 2.3692   LearningRate 0.0055   Epoch: 15   Global Step: 77440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:13,971-Speed 3400.44 samples/sec   Loss 2.3211   LearningRate 0.0055   Epoch: 15   Global Step: 77450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:16,981-Speed 3402.25 samples/sec   Loss 2.3139   LearningRate 0.0055   Epoch: 15   Global Step: 77460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:19,998-Speed 3395.65 samples/sec   Loss 2.1941   LearningRate 0.0055   Epoch: 15   Global Step: 77470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:23,012-Speed 3398.11 samples/sec   Loss 2.4409   LearningRate 0.0055   Epoch: 15   Global Step: 77480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:26,029-Speed 3394.99 samples/sec   Loss 2.3452   LearningRate 0.0055   Epoch: 15   Global Step: 77490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:29,029-Speed 3414.12 samples/sec   Loss 2.3243   LearningRate 0.0055   Epoch: 15   Global Step: 77500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:32,044-Speed 3397.76 samples/sec   Loss 2.3778   LearningRate 0.0055   Epoch: 15   Global Step: 77510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:35,065-Speed 3390.44 samples/sec   Loss 2.3411   LearningRate 0.0055   Epoch: 15   Global Step: 77520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:38,078-Speed 3400.18 samples/sec   Loss 2.3420   LearningRate 0.0055   Epoch: 15   Global Step: 77530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:41,090-Speed 3400.60 samples/sec   Loss 2.2320   LearningRate 0.0055   Epoch: 15   Global Step: 77540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:44,107-Speed 3394.63 samples/sec   Loss 2.2113   LearningRate 0.0054   Epoch: 15   Global Step: 77550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:47,154-Speed 3361.29 samples/sec   Loss 2.2931   LearningRate 0.0054   Epoch: 15   Global Step: 77560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:50,168-Speed 3398.72 samples/sec   Loss 2.3057   LearningRate 0.0054   Epoch: 15   Global Step: 77570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:53,187-Speed 3393.45 samples/sec   Loss 2.3154   LearningRate 0.0054   Epoch: 15   Global Step: 77580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:56,206-Speed 3392.26 samples/sec   Loss 2.2846   LearningRate 0.0054   Epoch: 15   Global Step: 77590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:13:59,225-Speed 3393.62 samples/sec   Loss 2.2721   LearningRate 0.0054   Epoch: 15   Global Step: 77600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:02,274-Speed 3359.29 samples/sec   Loss 2.2669   LearningRate 0.0054   Epoch: 15   Global Step: 77610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:05,288-Speed 3398.26 samples/sec   Loss 2.3996   LearningRate 0.0054   Epoch: 15   Global Step: 77620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:08,300-Speed 3400.13 samples/sec   Loss 2.3518   LearningRate 0.0054   Epoch: 15   Global Step: 77630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:11,315-Speed 3397.45 samples/sec   Loss 2.3398   LearningRate 0.0054   Epoch: 15   Global Step: 77640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:14,335-Speed 3392.11 samples/sec   Loss 2.3753   LearningRate 0.0054   Epoch: 15   Global Step: 77650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:17,351-Speed 3396.39 samples/sec   Loss 2.3266   LearningRate 0.0054   Epoch: 15   Global Step: 77660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:20,365-Speed 3398.20 samples/sec   Loss 2.2809   LearningRate 0.0054   Epoch: 15   Global Step: 77670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:23,376-Speed 3401.70 samples/sec   Loss 2.2976   LearningRate 0.0054   Epoch: 15   Global Step: 77680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:26,397-Speed 3390.25 samples/sec   Loss 2.2554   LearningRate 0.0054   Epoch: 15   Global Step: 77690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:29,395-Speed 3417.30 samples/sec   Loss 2.2410   LearningRate 0.0054   Epoch: 15   Global Step: 77700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:32,408-Speed 3399.06 samples/sec   Loss 2.2954   LearningRate 0.0054   Epoch: 15   Global Step: 77710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:35,419-Speed 3401.83 samples/sec   Loss 2.3513   LearningRate 0.0054   Epoch: 15   Global Step: 77720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:38,434-Speed 3397.54 samples/sec   Loss 2.3157   LearningRate 0.0054   Epoch: 15   Global Step: 77730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:41,452-Speed 3393.39 samples/sec   Loss 2.3595   LearningRate 0.0054   Epoch: 15   Global Step: 77740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:44,466-Speed 3398.58 samples/sec   Loss 2.3569   LearningRate 0.0054   Epoch: 15   Global Step: 77750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:47,480-Speed 3398.34 samples/sec   Loss 2.4002   LearningRate 0.0054   Epoch: 15   Global Step: 77760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:50,497-Speed 3396.13 samples/sec   Loss 2.3765   LearningRate 0.0053   Epoch: 15   Global Step: 77770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:53,510-Speed 3399.47 samples/sec   Loss 2.4157   LearningRate 0.0053   Epoch: 15   Global Step: 77780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:56,519-Speed 3403.28 samples/sec   Loss 2.2622   LearningRate 0.0053   Epoch: 15   Global Step: 77790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:14:59,518-Speed 3416.23 samples/sec   Loss 2.2984   LearningRate 0.0053   Epoch: 15   Global Step: 77800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:15:02,534-Speed 3396.04 samples/sec   Loss 2.2234   LearningRate 0.0053   Epoch: 15   Global Step: 77810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:15:05,550-Speed 3396.33 samples/sec   Loss 2.2612   LearningRate 0.0053   Epoch: 15   Global Step: 77820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:15:08,558-Speed 3404.21 samples/sec   Loss 2.4050   LearningRate 0.0053   Epoch: 15   Global Step: 77830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:15:11,572-Speed 3398.82 samples/sec   Loss 2.3616   LearningRate 0.0053   Epoch: 15   Global Step: 77840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:15:14,597-Speed 3385.52 samples/sec   Loss 2.3815   LearningRate 0.0053   Epoch: 15   Global Step: 77850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:15:17,621-Speed 3388.17 samples/sec   Loss 2.3450   LearningRate 0.0053   Epoch: 15   Global Step: 77860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:15:20,617-Speed 3418.29 samples/sec   Loss 2.2792   LearningRate 0.0053   Epoch: 15   Global Step: 77870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:15:23,638-Speed 3391.14 samples/sec   Loss 2.2639   LearningRate 0.0053   Epoch: 15   Global Step: 77880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:15:26,661-Speed 3386.98 samples/sec   Loss 2.2706   LearningRate 0.0053   Epoch: 15   Global Step: 77890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:15:29,682-Speed 3390.80 samples/sec   Loss 2.3376   LearningRate 0.0053   Epoch: 15   Global Step: 77900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:15:32,703-Speed 3390.69 samples/sec   Loss 2.3022   LearningRate 0.0053   Epoch: 15   Global Step: 77910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:15:35,727-Speed 3388.00 samples/sec   Loss 2.3078   LearningRate 0.0053   Epoch: 15   Global Step: 77920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:15:38,762-Speed 3374.55 samples/sec   Loss 2.3053   LearningRate 0.0053   Epoch: 15   Global Step: 77930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:15:41,853-Speed 3313.77 samples/sec   Loss 2.3511   LearningRate 0.0053   Epoch: 15   Global Step: 77940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:15:44,864-Speed 3402.79 samples/sec   Loss 2.4319   LearningRate 0.0053   Epoch: 15   Global Step: 77950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:15:47,905-Speed 3367.23 samples/sec   Loss 2.4210   LearningRate 0.0053   Epoch: 15   Global Step: 77960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:15:50,913-Speed 3405.96 samples/sec   Loss 2.3667   LearningRate 0.0053   Epoch: 15   Global Step: 77970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:15:53,928-Speed 3397.02 samples/sec   Loss 2.3259   LearningRate 0.0053   Epoch: 15   Global Step: 77980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:15:56,938-Speed 3403.00 samples/sec   Loss 2.3098   LearningRate 0.0052   Epoch: 15   Global Step: 77990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:15:59,949-Speed 3401.36 samples/sec   Loss 2.4057   LearningRate 0.0052   Epoch: 15   Global Step: 78000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:16:44,204-[lfw][78000]XNorm: 22.490540
Training: 2022-04-11 07:16:44,204-[lfw][78000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 07:16:44,205-[lfw][78000]Accuracy-Highest: 0.99850
Training: 2022-04-11 07:17:35,525-[cfp_fp][78000]XNorm: 21.857906
Training: 2022-04-11 07:17:35,526-[cfp_fp][78000]Accuracy-Flip: 0.98614+-0.00434
Training: 2022-04-11 07:17:35,526-[cfp_fp][78000]Accuracy-Highest: 0.98614
Training: 2022-04-11 07:18:19,862-[agedb_30][78000]XNorm: 22.593340
Training: 2022-04-11 07:18:19,863-[agedb_30][78000]Accuracy-Flip: 0.98367+-0.00726
Training: 2022-04-11 07:18:19,863-[agedb_30][78000]Accuracy-Highest: 0.98467
Training: 2022-04-11 07:18:22,868-Speed 71.65 samples/sec   Loss 2.3617   LearningRate 0.0052   Epoch: 15   Global Step: 78010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:18:25,870-Speed 3410.88 samples/sec   Loss 2.3265   LearningRate 0.0052   Epoch: 15   Global Step: 78020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:18:28,873-Speed 3411.23 samples/sec   Loss 2.3039   LearningRate 0.0052   Epoch: 15   Global Step: 78030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:18:31,864-Speed 3424.21 samples/sec   Loss 2.3074   LearningRate 0.0052   Epoch: 15   Global Step: 78040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:18:34,859-Speed 3420.90 samples/sec   Loss 2.4195   LearningRate 0.0052   Epoch: 15   Global Step: 78050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:18:37,856-Speed 3417.55 samples/sec   Loss 2.4355   LearningRate 0.0052   Epoch: 15   Global Step: 78060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:18:40,858-Speed 3411.79 samples/sec   Loss 2.3921   LearningRate 0.0052   Epoch: 15   Global Step: 78070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:18:43,875-Speed 3394.41 samples/sec   Loss 2.3244   LearningRate 0.0052   Epoch: 15   Global Step: 78080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:18:46,873-Speed 3417.09 samples/sec   Loss 2.3311   LearningRate 0.0052   Epoch: 15   Global Step: 78090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:18:49,872-Speed 3415.11 samples/sec   Loss 2.3781   LearningRate 0.0052   Epoch: 15   Global Step: 78100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:18:52,876-Speed 3409.93 samples/sec   Loss 2.2937   LearningRate 0.0052   Epoch: 15   Global Step: 78110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:18:55,882-Speed 3407.64 samples/sec   Loss 2.1519   LearningRate 0.0052   Epoch: 15   Global Step: 78120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:18:58,884-Speed 3411.69 samples/sec   Loss 2.4603   LearningRate 0.0052   Epoch: 15   Global Step: 78130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:19:01,886-Speed 3412.56 samples/sec   Loss 2.3543   LearningRate 0.0052   Epoch: 15   Global Step: 78140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:19:04,889-Speed 3410.84 samples/sec   Loss 2.4144   LearningRate 0.0052   Epoch: 15   Global Step: 78150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:19:07,893-Speed 3409.55 samples/sec   Loss 2.3419   LearningRate 0.0052   Epoch: 15   Global Step: 78160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:19:10,895-Speed 3412.05 samples/sec   Loss 2.3337   LearningRate 0.0052   Epoch: 15   Global Step: 78170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:19:13,929-Speed 3375.51 samples/sec   Loss 2.4799   LearningRate 0.0052   Epoch: 15   Global Step: 78180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:19:16,924-Speed 3420.62 samples/sec   Loss 2.3099   LearningRate 0.0052   Epoch: 15   Global Step: 78190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:19:19,936-Speed 3401.15 samples/sec   Loss 2.3419   LearningRate 0.0052   Epoch: 15   Global Step: 78200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:19:22,942-Speed 3407.64 samples/sec   Loss 2.3702   LearningRate 0.0051   Epoch: 15   Global Step: 78210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:19:25,978-Speed 3372.81 samples/sec   Loss 2.3470   LearningRate 0.0051   Epoch: 15   Global Step: 78220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:19:28,988-Speed 3403.20 samples/sec   Loss 2.4960   LearningRate 0.0051   Epoch: 15   Global Step: 78230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:19:31,996-Speed 3405.73 samples/sec   Loss 2.3791   LearningRate 0.0051   Epoch: 15   Global Step: 78240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:19:34,983-Speed 3428.72 samples/sec   Loss 2.3373   LearningRate 0.0051   Epoch: 15   Global Step: 78250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:19:37,998-Speed 3397.42 samples/sec   Loss 2.2751   LearningRate 0.0051   Epoch: 15   Global Step: 78260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:19:41,013-Speed 3397.83 samples/sec   Loss 2.4652   LearningRate 0.0051   Epoch: 15   Global Step: 78270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:19:44,059-Speed 3362.19 samples/sec   Loss 2.3801   LearningRate 0.0051   Epoch: 15   Global Step: 78280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:19:47,069-Speed 3403.57 samples/sec   Loss 2.2908   LearningRate 0.0051   Epoch: 15   Global Step: 78290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:19:50,082-Speed 3399.16 samples/sec   Loss 2.4145   LearningRate 0.0051   Epoch: 15   Global Step: 78300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:19:53,086-Speed 3410.11 samples/sec   Loss 2.4124   LearningRate 0.0051   Epoch: 15   Global Step: 78310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:19:56,094-Speed 3405.91 samples/sec   Loss 2.2981   LearningRate 0.0051   Epoch: 15   Global Step: 78320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:19:59,097-Speed 3410.06 samples/sec   Loss 2.4040   LearningRate 0.0051   Epoch: 15   Global Step: 78330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:20:02,109-Speed 3400.96 samples/sec   Loss 2.4144   LearningRate 0.0051   Epoch: 15   Global Step: 78340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:20:05,122-Speed 3399.88 samples/sec   Loss 2.3662   LearningRate 0.0051   Epoch: 15   Global Step: 78350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:20:08,125-Speed 3410.62 samples/sec   Loss 2.3432   LearningRate 0.0051   Epoch: 15   Global Step: 78360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:20:11,188-Speed 3344.45 samples/sec   Loss 2.3832   LearningRate 0.0051   Epoch: 15   Global Step: 78370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:20:14,200-Speed 3401.12 samples/sec   Loss 2.2267   LearningRate 0.0051   Epoch: 15   Global Step: 78380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:20:17,208-Speed 3404.39 samples/sec   Loss 2.5378   LearningRate 0.0051   Epoch: 15   Global Step: 78390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:20:20,221-Speed 3400.09 samples/sec   Loss 2.4372   LearningRate 0.0051   Epoch: 15   Global Step: 78400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:20:23,238-Speed 3394.53 samples/sec   Loss 2.3645   LearningRate 0.0051   Epoch: 15   Global Step: 78410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:20:26,255-Speed 3396.05 samples/sec   Loss 2.3365   LearningRate 0.0051   Epoch: 15   Global Step: 78420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:20:29,265-Speed 3402.03 samples/sec   Loss 2.2588   LearningRate 0.0050   Epoch: 15   Global Step: 78430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:20:32,276-Speed 3402.08 samples/sec   Loss 2.3928   LearningRate 0.0050   Epoch: 15   Global Step: 78440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:20:35,279-Speed 3411.24 samples/sec   Loss 2.3781   LearningRate 0.0050   Epoch: 15   Global Step: 78450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:20:38,309-Speed 3379.85 samples/sec   Loss 2.3331   LearningRate 0.0050   Epoch: 15   Global Step: 78460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:20:41,316-Speed 3406.37 samples/sec   Loss 2.3169   LearningRate 0.0050   Epoch: 15   Global Step: 78470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:20:44,306-Speed 3426.49 samples/sec   Loss 2.4280   LearningRate 0.0050   Epoch: 15   Global Step: 78480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:20:47,308-Speed 3411.54 samples/sec   Loss 2.3085   LearningRate 0.0050   Epoch: 15   Global Step: 78490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:20:50,312-Speed 3409.40 samples/sec   Loss 2.3778   LearningRate 0.0050   Epoch: 15   Global Step: 78500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:20:53,319-Speed 3406.18 samples/sec   Loss 2.2485   LearningRate 0.0050   Epoch: 15   Global Step: 78510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:20:56,326-Speed 3406.98 samples/sec   Loss 2.3092   LearningRate 0.0050   Epoch: 15   Global Step: 78520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:20:59,350-Speed 3387.08 samples/sec   Loss 2.3353   LearningRate 0.0050   Epoch: 15   Global Step: 78530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:21:02,379-Speed 3381.61 samples/sec   Loss 2.4053   LearningRate 0.0050   Epoch: 15   Global Step: 78540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:21:05,417-Speed 3372.34 samples/sec   Loss 2.2003   LearningRate 0.0050   Epoch: 15   Global Step: 78550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:21:08,425-Speed 3404.27 samples/sec   Loss 2.3757   LearningRate 0.0050   Epoch: 15   Global Step: 78560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:21:11,444-Speed 3393.93 samples/sec   Loss 2.4079   LearningRate 0.0050   Epoch: 15   Global Step: 78570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:21:14,457-Speed 3398.74 samples/sec   Loss 2.2141   LearningRate 0.0050   Epoch: 15   Global Step: 78580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:21:17,490-Speed 3377.66 samples/sec   Loss 2.2823   LearningRate 0.0050   Epoch: 15   Global Step: 78590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:21:20,502-Speed 3401.41 samples/sec   Loss 2.2706   LearningRate 0.0050   Epoch: 15   Global Step: 78600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:21:23,531-Speed 3381.36 samples/sec   Loss 2.2760   LearningRate 0.0050   Epoch: 15   Global Step: 78610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:21:26,592-Speed 3345.82 samples/sec   Loss 2.3584   LearningRate 0.0050   Epoch: 15   Global Step: 78620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:21:29,601-Speed 3404.18 samples/sec   Loss 2.3398   LearningRate 0.0050   Epoch: 15   Global Step: 78630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:21:32,614-Speed 3399.95 samples/sec   Loss 2.2616   LearningRate 0.0050   Epoch: 15   Global Step: 78640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:21:35,627-Speed 3399.46 samples/sec   Loss 2.3639   LearningRate 0.0050   Epoch: 15   Global Step: 78650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:21:38,637-Speed 3402.74 samples/sec   Loss 2.4195   LearningRate 0.0049   Epoch: 15   Global Step: 78660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:21:41,653-Speed 3396.35 samples/sec   Loss 2.3289   LearningRate 0.0049   Epoch: 15   Global Step: 78670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:21:44,666-Speed 3399.01 samples/sec   Loss 2.3754   LearningRate 0.0049   Epoch: 15   Global Step: 78680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 07:21:47,662-Speed 3419.69 samples/sec   Loss 2.3007   LearningRate 0.0049   Epoch: 15   Global Step: 78690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:21:50,683-Speed 3389.35 samples/sec   Loss 2.3837   LearningRate 0.0049   Epoch: 15   Global Step: 78700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:21:53,755-Speed 3334.60 samples/sec   Loss 2.4027   LearningRate 0.0049   Epoch: 15   Global Step: 78710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:21:56,776-Speed 3390.94 samples/sec   Loss 2.3666   LearningRate 0.0049   Epoch: 15   Global Step: 78720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:21:59,796-Speed 3391.37 samples/sec   Loss 2.3656   LearningRate 0.0049   Epoch: 15   Global Step: 78730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:02,811-Speed 3397.76 samples/sec   Loss 2.3343   LearningRate 0.0049   Epoch: 15   Global Step: 78740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:05,828-Speed 3394.89 samples/sec   Loss 2.2598   LearningRate 0.0049   Epoch: 15   Global Step: 78750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:08,843-Speed 3397.00 samples/sec   Loss 2.3033   LearningRate 0.0049   Epoch: 15   Global Step: 78760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:11,857-Speed 3397.89 samples/sec   Loss 2.2645   LearningRate 0.0049   Epoch: 15   Global Step: 78770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:14,879-Speed 3389.88 samples/sec   Loss 2.3440   LearningRate 0.0049   Epoch: 15   Global Step: 78780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:17,881-Speed 3412.29 samples/sec   Loss 2.3525   LearningRate 0.0049   Epoch: 15   Global Step: 78790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:20,898-Speed 3395.04 samples/sec   Loss 2.4096   LearningRate 0.0049   Epoch: 15   Global Step: 78800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:23,939-Speed 3368.01 samples/sec   Loss 2.3127   LearningRate 0.0049   Epoch: 15   Global Step: 78810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:26,962-Speed 3388.09 samples/sec   Loss 2.3165   LearningRate 0.0049   Epoch: 15   Global Step: 78820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:29,979-Speed 3394.66 samples/sec   Loss 2.3596   LearningRate 0.0049   Epoch: 15   Global Step: 78830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:32,999-Speed 3392.77 samples/sec   Loss 2.3426   LearningRate 0.0049   Epoch: 15   Global Step: 78840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:36,015-Speed 3395.86 samples/sec   Loss 2.2032   LearningRate 0.0049   Epoch: 15   Global Step: 78850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:39,031-Speed 3395.75 samples/sec   Loss 2.3318   LearningRate 0.0049   Epoch: 15   Global Step: 78860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:42,057-Speed 3385.40 samples/sec   Loss 2.4277   LearningRate 0.0049   Epoch: 15   Global Step: 78870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:45,076-Speed 3392.98 samples/sec   Loss 2.4021   LearningRate 0.0049   Epoch: 15   Global Step: 78880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:48,076-Speed 3414.18 samples/sec   Loss 2.3647   LearningRate 0.0048   Epoch: 15   Global Step: 78890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:51,093-Speed 3394.56 samples/sec   Loss 2.3197   LearningRate 0.0048   Epoch: 15   Global Step: 78900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:54,111-Speed 3393.52 samples/sec   Loss 2.2841   LearningRate 0.0048   Epoch: 15   Global Step: 78910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:22:57,127-Speed 3397.29 samples/sec   Loss 2.4123   LearningRate 0.0048   Epoch: 15   Global Step: 78920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:23:00,138-Speed 3401.74 samples/sec   Loss 2.4281   LearningRate 0.0048   Epoch: 15   Global Step: 78930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:23:03,167-Speed 3381.36 samples/sec   Loss 2.3851   LearningRate 0.0048   Epoch: 15   Global Step: 78940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:23:06,214-Speed 3361.18 samples/sec   Loss 2.3548   LearningRate 0.0048   Epoch: 15   Global Step: 78950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:23:09,215-Speed 3414.06 samples/sec   Loss 2.3873   LearningRate 0.0048   Epoch: 15   Global Step: 78960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:23:12,237-Speed 3389.42 samples/sec   Loss 2.4457   LearningRate 0.0048   Epoch: 15   Global Step: 78970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:23:15,278-Speed 3367.77 samples/sec   Loss 2.4270   LearningRate 0.0048   Epoch: 15   Global Step: 78980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:23:18,292-Speed 3399.05 samples/sec   Loss 2.3978   LearningRate 0.0048   Epoch: 15   Global Step: 78990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:23:21,306-Speed 3398.17 samples/sec   Loss 2.2488   LearningRate 0.0048   Epoch: 15   Global Step: 79000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:23:24,325-Speed 3392.95 samples/sec   Loss 2.4383   LearningRate 0.0048   Epoch: 15   Global Step: 79010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:23:27,339-Speed 3398.70 samples/sec   Loss 2.3153   LearningRate 0.0048   Epoch: 15   Global Step: 79020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:23:30,350-Speed 3401.41 samples/sec   Loss 2.3598   LearningRate 0.0048   Epoch: 15   Global Step: 79030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:23:33,361-Speed 3402.27 samples/sec   Loss 2.3348   LearningRate 0.0048   Epoch: 15   Global Step: 79040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:23:36,378-Speed 3395.14 samples/sec   Loss 2.3272   LearningRate 0.0048   Epoch: 15   Global Step: 79050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:23:39,394-Speed 3396.72 samples/sec   Loss 2.2101   LearningRate 0.0048   Epoch: 15   Global Step: 79060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:23:42,406-Speed 3400.90 samples/sec   Loss 2.3659   LearningRate 0.0048   Epoch: 15   Global Step: 79070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:23:45,419-Speed 3399.36 samples/sec   Loss 2.3735   LearningRate 0.0048   Epoch: 15   Global Step: 79080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:23:48,443-Speed 3386.99 samples/sec   Loss 2.3952   LearningRate 0.0048   Epoch: 15   Global Step: 79090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:23:51,469-Speed 3385.42 samples/sec   Loss 2.3191   LearningRate 0.0048   Epoch: 15   Global Step: 79100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:23:54,487-Speed 3393.70 samples/sec   Loss 2.4250   LearningRate 0.0048   Epoch: 15   Global Step: 79110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:23:57,502-Speed 3396.82 samples/sec   Loss 2.4527   LearningRate 0.0047   Epoch: 15   Global Step: 79120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:00,516-Speed 3399.50 samples/sec   Loss 2.5446   LearningRate 0.0047   Epoch: 15   Global Step: 79130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:03,529-Speed 3398.81 samples/sec   Loss 2.4598   LearningRate 0.0047   Epoch: 15   Global Step: 79140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:06,544-Speed 3397.18 samples/sec   Loss 2.3078   LearningRate 0.0047   Epoch: 15   Global Step: 79150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:09,542-Speed 3416.70 samples/sec   Loss 2.2904   LearningRate 0.0047   Epoch: 15   Global Step: 79160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:12,561-Speed 3393.58 samples/sec   Loss 2.4256   LearningRate 0.0047   Epoch: 15   Global Step: 79170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:15,620-Speed 3347.78 samples/sec   Loss 2.4384   LearningRate 0.0047   Epoch: 15   Global Step: 79180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:18,674-Speed 3353.87 samples/sec   Loss 2.3424   LearningRate 0.0047   Epoch: 15   Global Step: 79190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:21,695-Speed 3391.47 samples/sec   Loss 2.3432   LearningRate 0.0047   Epoch: 15   Global Step: 79200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:24,712-Speed 3394.10 samples/sec   Loss 2.5418   LearningRate 0.0047   Epoch: 15   Global Step: 79210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:27,745-Speed 3377.24 samples/sec   Loss 2.3372   LearningRate 0.0047   Epoch: 15   Global Step: 79220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:30,773-Speed 3383.18 samples/sec   Loss 2.3642   LearningRate 0.0047   Epoch: 15   Global Step: 79230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:33,793-Speed 3391.83 samples/sec   Loss 2.3581   LearningRate 0.0047   Epoch: 15   Global Step: 79240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:36,812-Speed 3392.24 samples/sec   Loss 2.3581   LearningRate 0.0047   Epoch: 15   Global Step: 79250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:39,830-Speed 3394.87 samples/sec   Loss 2.2282   LearningRate 0.0047   Epoch: 15   Global Step: 79260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 07:24:42,829-Speed 3415.28 samples/sec   Loss 2.3652   LearningRate 0.0047   Epoch: 15   Global Step: 79270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:45,848-Speed 3393.02 samples/sec   Loss 2.3337   LearningRate 0.0047   Epoch: 15   Global Step: 79280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:48,865-Speed 3394.33 samples/sec   Loss 2.3347   LearningRate 0.0047   Epoch: 15   Global Step: 79290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:24:51,869-Speed 3409.51 samples/sec   Loss 2.3460   LearningRate 0.0047   Epoch: 15   Global Step: 79300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:24:54,900-Speed 3380.02 samples/sec   Loss 2.3919   LearningRate 0.0047   Epoch: 15   Global Step: 79310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:24:57,914-Speed 3398.20 samples/sec   Loss 2.5114   LearningRate 0.0047   Epoch: 15   Global Step: 79320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:25:00,933-Speed 3392.43 samples/sec   Loss 2.3541   LearningRate 0.0047   Epoch: 15   Global Step: 79330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:25:04,024-Speed 3313.96 samples/sec   Loss 2.4443   LearningRate 0.0047   Epoch: 15   Global Step: 79340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:25:07,374-Speed 3058.05 samples/sec   Loss 2.4080   LearningRate 0.0046   Epoch: 15   Global Step: 79350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:25:10,389-Speed 3396.67 samples/sec   Loss 2.1968   LearningRate 0.0046   Epoch: 15   Global Step: 79360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:25:13,403-Speed 3398.50 samples/sec   Loss 2.4866   LearningRate 0.0046   Epoch: 15   Global Step: 79370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:25:16,444-Speed 3369.04 samples/sec   Loss 2.3945   LearningRate 0.0046   Epoch: 15   Global Step: 79380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:25:19,458-Speed 3398.14 samples/sec   Loss 2.2531   LearningRate 0.0046   Epoch: 15   Global Step: 79390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:25:22,471-Speed 3399.52 samples/sec   Loss 2.3343   LearningRate 0.0046   Epoch: 15   Global Step: 79400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:25:25,492-Speed 3389.76 samples/sec   Loss 2.3322   LearningRate 0.0046   Epoch: 15   Global Step: 79410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:25:28,508-Speed 3397.25 samples/sec   Loss 2.2873   LearningRate 0.0046   Epoch: 15   Global Step: 79420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:25:31,523-Speed 3396.67 samples/sec   Loss 2.3747   LearningRate 0.0046   Epoch: 15   Global Step: 79430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:25:34,546-Speed 3388.52 samples/sec   Loss 2.4241   LearningRate 0.0046   Epoch: 15   Global Step: 79440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:25:37,607-Speed 3346.53 samples/sec   Loss 2.3784   LearningRate 0.0046   Epoch: 15   Global Step: 79450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:25:40,669-Speed 3344.49 samples/sec   Loss 2.3016   LearningRate 0.0046   Epoch: 15   Global Step: 79460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:25:43,688-Speed 3392.62 samples/sec   Loss 2.3892   LearningRate 0.0046   Epoch: 15   Global Step: 79470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:25:46,699-Speed 3401.58 samples/sec   Loss 2.3379   LearningRate 0.0046   Epoch: 15   Global Step: 79480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:25:49,718-Speed 3393.17 samples/sec   Loss 2.3488   LearningRate 0.0046   Epoch: 15   Global Step: 79490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:25:52,735-Speed 3395.30 samples/sec   Loss 2.3295   LearningRate 0.0046   Epoch: 15   Global Step: 79500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:25:55,753-Speed 3393.05 samples/sec   Loss 2.3740   LearningRate 0.0046   Epoch: 15   Global Step: 79510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:25:58,768-Speed 3397.54 samples/sec   Loss 2.3085   LearningRate 0.0046   Epoch: 15   Global Step: 79520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:26:01,789-Speed 3390.79 samples/sec   Loss 2.4175   LearningRate 0.0046   Epoch: 15   Global Step: 79530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:26:04,819-Speed 3380.12 samples/sec   Loss 2.3642   LearningRate 0.0046   Epoch: 15   Global Step: 79540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:26:07,828-Speed 3404.65 samples/sec   Loss 2.4312   LearningRate 0.0046   Epoch: 15   Global Step: 79550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:26:10,848-Speed 3392.58 samples/sec   Loss 2.4109   LearningRate 0.0046   Epoch: 15   Global Step: 79560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:26:13,862-Speed 3397.66 samples/sec   Loss 2.3809   LearningRate 0.0046   Epoch: 15   Global Step: 79570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:26:16,908-Speed 3362.84 samples/sec   Loss 2.3035   LearningRate 0.0046   Epoch: 15   Global Step: 79580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:26:19,935-Speed 3383.97 samples/sec   Loss 2.2693   LearningRate 0.0045   Epoch: 15   Global Step: 79590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:26:22,952-Speed 3395.51 samples/sec   Loss 2.3002   LearningRate 0.0045   Epoch: 15   Global Step: 79600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:26:25,965-Speed 3399.13 samples/sec   Loss 2.4050   LearningRate 0.0045   Epoch: 15   Global Step: 79610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:26:28,976-Speed 3401.40 samples/sec   Loss 2.3774   LearningRate 0.0045   Epoch: 15   Global Step: 79620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:26:31,995-Speed 3393.13 samples/sec   Loss 2.2827   LearningRate 0.0045   Epoch: 15   Global Step: 79630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:26:35,009-Speed 3398.94 samples/sec   Loss 2.3521   LearningRate 0.0045   Epoch: 15   Global Step: 79640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:26:38,061-Speed 3356.54 samples/sec   Loss 2.3523   LearningRate 0.0045   Epoch: 15   Global Step: 79650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:26:41,077-Speed 3395.97 samples/sec   Loss 2.3780   LearningRate 0.0045   Epoch: 15   Global Step: 79660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:26:44,101-Speed 3387.23 samples/sec   Loss 2.4048   LearningRate 0.0045   Epoch: 15   Global Step: 79670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:26:47,124-Speed 3388.25 samples/sec   Loss 2.4890   LearningRate 0.0045   Epoch: 15   Global Step: 79680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:26:50,136-Speed 3400.32 samples/sec   Loss 2.4122   LearningRate 0.0045   Epoch: 15   Global Step: 79690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:26:53,164-Speed 3383.26 samples/sec   Loss 2.4114   LearningRate 0.0045   Epoch: 15   Global Step: 79700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:26:56,168-Speed 3408.92 samples/sec   Loss 2.4138   LearningRate 0.0045   Epoch: 15   Global Step: 79710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:26:59,183-Speed 3397.70 samples/sec   Loss 2.2760   LearningRate 0.0045   Epoch: 15   Global Step: 79720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:02,202-Speed 3392.39 samples/sec   Loss 2.3790   LearningRate 0.0045   Epoch: 15   Global Step: 79730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:05,218-Speed 3396.49 samples/sec   Loss 2.4293   LearningRate 0.0045   Epoch: 15   Global Step: 79740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:08,231-Speed 3399.13 samples/sec   Loss 2.3654   LearningRate 0.0045   Epoch: 15   Global Step: 79750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:11,244-Speed 3399.97 samples/sec   Loss 2.4503   LearningRate 0.0045   Epoch: 15   Global Step: 79760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:14,258-Speed 3398.36 samples/sec   Loss 2.2931   LearningRate 0.0045   Epoch: 15   Global Step: 79770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:17,291-Speed 3377.49 samples/sec   Loss 2.3231   LearningRate 0.0045   Epoch: 15   Global Step: 79780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:20,309-Speed 3393.15 samples/sec   Loss 2.3652   LearningRate 0.0045   Epoch: 15   Global Step: 79790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:23,329-Speed 3392.12 samples/sec   Loss 2.3536   LearningRate 0.0045   Epoch: 15   Global Step: 79800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:26,350-Speed 3389.86 samples/sec   Loss 2.4462   LearningRate 0.0045   Epoch: 15   Global Step: 79810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:27:29,369-Speed 3393.30 samples/sec   Loss 2.3780   LearningRate 0.0045   Epoch: 15   Global Step: 79820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:27:32,369-Speed 3414.36 samples/sec   Loss 2.3124   LearningRate 0.0044   Epoch: 15   Global Step: 79830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:35,383-Speed 3398.30 samples/sec   Loss 2.2544   LearningRate 0.0044   Epoch: 15   Global Step: 79840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:38,394-Speed 3402.20 samples/sec   Loss 2.4144   LearningRate 0.0044   Epoch: 15   Global Step: 79850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:41,407-Speed 3398.90 samples/sec   Loss 2.3746   LearningRate 0.0044   Epoch: 15   Global Step: 79860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:44,419-Speed 3400.81 samples/sec   Loss 2.2095   LearningRate 0.0044   Epoch: 15   Global Step: 79870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:47,430-Speed 3401.72 samples/sec   Loss 2.3694   LearningRate 0.0044   Epoch: 15   Global Step: 79880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:50,444-Speed 3398.36 samples/sec   Loss 2.3677   LearningRate 0.0044   Epoch: 15   Global Step: 79890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:53,468-Speed 3386.82 samples/sec   Loss 2.2829   LearningRate 0.0044   Epoch: 15   Global Step: 79900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:56,492-Speed 3387.54 samples/sec   Loss 2.2542   LearningRate 0.0044   Epoch: 15   Global Step: 79910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:27:59,506-Speed 3398.53 samples/sec   Loss 2.4698   LearningRate 0.0044   Epoch: 15   Global Step: 79920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:28:02,522-Speed 3396.23 samples/sec   Loss 2.3644   LearningRate 0.0044   Epoch: 15   Global Step: 79930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:28:05,541-Speed 3393.32 samples/sec   Loss 2.3864   LearningRate 0.0044   Epoch: 15   Global Step: 79940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:28:08,556-Speed 3397.04 samples/sec   Loss 2.4451   LearningRate 0.0044   Epoch: 15   Global Step: 79950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:28:11,592-Speed 3373.31 samples/sec   Loss 2.3564   LearningRate 0.0044   Epoch: 15   Global Step: 79960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:28:14,611-Speed 3393.09 samples/sec   Loss 2.2801   LearningRate 0.0044   Epoch: 15   Global Step: 79970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:28:17,628-Speed 3395.28 samples/sec   Loss 2.4942   LearningRate 0.0044   Epoch: 15   Global Step: 79980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:28:20,641-Speed 3399.41 samples/sec   Loss 2.4069   LearningRate 0.0044   Epoch: 15   Global Step: 79990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:28:23,650-Speed 3404.10 samples/sec   Loss 2.3614   LearningRate 0.0044   Epoch: 15   Global Step: 80000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:29:07,978-[lfw][80000]XNorm: 22.612949
Training: 2022-04-11 07:29:07,978-[lfw][80000]Accuracy-Flip: 0.99817+-0.00252
Training: 2022-04-11 07:29:07,979-[lfw][80000]Accuracy-Highest: 0.99850
Training: 2022-04-11 07:29:59,260-[cfp_fp][80000]XNorm: 22.071024
Training: 2022-04-11 07:29:59,261-[cfp_fp][80000]Accuracy-Flip: 0.98414+-0.00594
Training: 2022-04-11 07:29:59,261-[cfp_fp][80000]Accuracy-Highest: 0.98614
Training: 2022-04-11 07:30:43,337-[agedb_30][80000]XNorm: 22.798119
Training: 2022-04-11 07:30:43,337-[agedb_30][80000]Accuracy-Flip: 0.98550+-0.00582
Training: 2022-04-11 07:30:43,338-[agedb_30][80000]Accuracy-Highest: 0.98550
Training: 2022-04-11 07:30:46,350-Speed 71.76 samples/sec   Loss 2.3717   LearningRate 0.0044   Epoch: 15   Global Step: 80010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:30:49,339-Speed 3427.01 samples/sec   Loss 2.4076   LearningRate 0.0044   Epoch: 15   Global Step: 80020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:30:52,326-Speed 3428.93 samples/sec   Loss 2.4681   LearningRate 0.0044   Epoch: 15   Global Step: 80030   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 07:30:55,303-Speed 3441.52 samples/sec   Loss 2.3086   LearningRate 0.0044   Epoch: 15   Global Step: 80040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:30:58,295-Speed 3422.80 samples/sec   Loss 2.3055   LearningRate 0.0044   Epoch: 15   Global Step: 80050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:31:01,295-Speed 3414.29 samples/sec   Loss 2.3724   LearningRate 0.0044   Epoch: 15   Global Step: 80060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:31:04,288-Speed 3422.60 samples/sec   Loss 2.2990   LearningRate 0.0043   Epoch: 15   Global Step: 80070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:31:07,329-Speed 3367.95 samples/sec   Loss 2.2324   LearningRate 0.0043   Epoch: 15   Global Step: 80080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:10,327-Speed 3416.25 samples/sec   Loss 2.3661   LearningRate 0.0043   Epoch: 15   Global Step: 80090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:13,326-Speed 3415.06 samples/sec   Loss 2.3955   LearningRate 0.0043   Epoch: 15   Global Step: 80100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:16,343-Speed 3395.52 samples/sec   Loss 2.3705   LearningRate 0.0043   Epoch: 15   Global Step: 80110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:19,343-Speed 3415.02 samples/sec   Loss 2.4157   LearningRate 0.0043   Epoch: 15   Global Step: 80120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:22,346-Speed 3410.03 samples/sec   Loss 2.4449   LearningRate 0.0043   Epoch: 15   Global Step: 80130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:25,350-Speed 3409.61 samples/sec   Loss 2.3576   LearningRate 0.0043   Epoch: 15   Global Step: 80140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:28,357-Speed 3406.94 samples/sec   Loss 2.2548   LearningRate 0.0043   Epoch: 15   Global Step: 80150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:31,363-Speed 3406.56 samples/sec   Loss 2.2605   LearningRate 0.0043   Epoch: 15   Global Step: 80160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:34,365-Speed 3411.81 samples/sec   Loss 2.3838   LearningRate 0.0043   Epoch: 15   Global Step: 80170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:37,369-Speed 3410.70 samples/sec   Loss 2.3973   LearningRate 0.0043   Epoch: 15   Global Step: 80180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:31:40,374-Speed 3408.22 samples/sec   Loss 2.3001   LearningRate 0.0043   Epoch: 15   Global Step: 80190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:31:43,360-Speed 3430.11 samples/sec   Loss 2.4000   LearningRate 0.0043   Epoch: 15   Global Step: 80200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:46,366-Speed 3407.46 samples/sec   Loss 2.3743   LearningRate 0.0043   Epoch: 15   Global Step: 80210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:49,399-Speed 3377.03 samples/sec   Loss 2.3596   LearningRate 0.0043   Epoch: 15   Global Step: 80220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:52,433-Speed 3376.23 samples/sec   Loss 2.4137   LearningRate 0.0043   Epoch: 15   Global Step: 80230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:55,455-Speed 3389.73 samples/sec   Loss 2.3921   LearningRate 0.0043   Epoch: 15   Global Step: 80240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:31:58,460-Speed 3408.45 samples/sec   Loss 2.3257   LearningRate 0.0043   Epoch: 15   Global Step: 80250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:32:01,472-Speed 3399.99 samples/sec   Loss 2.3531   LearningRate 0.0043   Epoch: 15   Global Step: 80260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:32:04,480-Speed 3406.85 samples/sec   Loss 2.3081   LearningRate 0.0043   Epoch: 15   Global Step: 80270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:32:07,491-Speed 3401.67 samples/sec   Loss 2.3325   LearningRate 0.0043   Epoch: 15   Global Step: 80280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:32:10,497-Speed 3407.65 samples/sec   Loss 2.3832   LearningRate 0.0043   Epoch: 15   Global Step: 80290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:32:13,501-Speed 3409.66 samples/sec   Loss 2.3104   LearningRate 0.0043   Epoch: 15   Global Step: 80300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:32:16,514-Speed 3398.93 samples/sec   Loss 2.4144   LearningRate 0.0042   Epoch: 15   Global Step: 80310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:32:19,525-Speed 3402.29 samples/sec   Loss 2.4106   LearningRate 0.0042   Epoch: 15   Global Step: 80320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:32:22,554-Speed 3381.77 samples/sec   Loss 2.3792   LearningRate 0.0042   Epoch: 15   Global Step: 80330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:32:25,554-Speed 3414.33 samples/sec   Loss 2.3032   LearningRate 0.0042   Epoch: 15   Global Step: 80340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:32:28,575-Speed 3390.51 samples/sec   Loss 2.3676   LearningRate 0.0042   Epoch: 15   Global Step: 80350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:32:31,588-Speed 3400.04 samples/sec   Loss 2.3752   LearningRate 0.0042   Epoch: 15   Global Step: 80360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:32:34,591-Speed 3410.85 samples/sec   Loss 2.2940   LearningRate 0.0042   Epoch: 15   Global Step: 80370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:32:37,595-Speed 3409.21 samples/sec   Loss 2.2950   LearningRate 0.0042   Epoch: 15   Global Step: 80380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:32:40,617-Speed 3389.68 samples/sec   Loss 2.2658   LearningRate 0.0042   Epoch: 15   Global Step: 80390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:32:43,658-Speed 3368.51 samples/sec   Loss 2.3768   LearningRate 0.0042   Epoch: 15   Global Step: 80400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:32:46,657-Speed 3415.32 samples/sec   Loss 2.3815   LearningRate 0.0042   Epoch: 15   Global Step: 80410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:32:49,660-Speed 3410.01 samples/sec   Loss 2.3968   LearningRate 0.0042   Epoch: 15   Global Step: 80420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:32:52,673-Speed 3399.96 samples/sec   Loss 2.3478   LearningRate 0.0042   Epoch: 15   Global Step: 80430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:32:55,677-Speed 3411.77 samples/sec   Loss 2.2595   LearningRate 0.0042   Epoch: 15   Global Step: 80440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:32:58,685-Speed 3404.64 samples/sec   Loss 2.2209   LearningRate 0.0042   Epoch: 15   Global Step: 80450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:01,712-Speed 3383.76 samples/sec   Loss 2.3925   LearningRate 0.0042   Epoch: 15   Global Step: 80460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:04,713-Speed 3412.78 samples/sec   Loss 2.3747   LearningRate 0.0042   Epoch: 15   Global Step: 80470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:07,694-Speed 3436.45 samples/sec   Loss 2.3288   LearningRate 0.0042   Epoch: 15   Global Step: 80480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:10,730-Speed 3374.05 samples/sec   Loss 2.3716   LearningRate 0.0042   Epoch: 15   Global Step: 80490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:13,767-Speed 3372.26 samples/sec   Loss 2.3214   LearningRate 0.0042   Epoch: 15   Global Step: 80500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:16,772-Speed 3408.60 samples/sec   Loss 2.3770   LearningRate 0.0042   Epoch: 15   Global Step: 80510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:19,776-Speed 3409.26 samples/sec   Loss 2.5175   LearningRate 0.0042   Epoch: 15   Global Step: 80520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:22,791-Speed 3397.66 samples/sec   Loss 2.3617   LearningRate 0.0042   Epoch: 15   Global Step: 80530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:25,828-Speed 3372.55 samples/sec   Loss 2.3969   LearningRate 0.0042   Epoch: 15   Global Step: 80540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:28,843-Speed 3397.09 samples/sec   Loss 2.3864   LearningRate 0.0042   Epoch: 15   Global Step: 80550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:31,862-Speed 3392.90 samples/sec   Loss 2.3925   LearningRate 0.0041   Epoch: 15   Global Step: 80560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:34,879-Speed 3394.94 samples/sec   Loss 2.3972   LearningRate 0.0041   Epoch: 15   Global Step: 80570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:37,887-Speed 3405.66 samples/sec   Loss 2.2605   LearningRate 0.0041   Epoch: 15   Global Step: 80580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:33:40,887-Speed 3413.73 samples/sec   Loss 2.3352   LearningRate 0.0041   Epoch: 15   Global Step: 80590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:33:43,894-Speed 3406.39 samples/sec   Loss 2.3687   LearningRate 0.0041   Epoch: 15   Global Step: 80600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:33:46,897-Speed 3410.75 samples/sec   Loss 2.3407   LearningRate 0.0041   Epoch: 15   Global Step: 80610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:33:49,911-Speed 3398.37 samples/sec   Loss 2.3487   LearningRate 0.0041   Epoch: 15   Global Step: 80620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:33:52,898-Speed 3429.43 samples/sec   Loss 2.2629   LearningRate 0.0041   Epoch: 15   Global Step: 80630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:55,911-Speed 3399.65 samples/sec   Loss 2.3278   LearningRate 0.0041   Epoch: 15   Global Step: 80640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:33:58,928-Speed 3394.45 samples/sec   Loss 2.3345   LearningRate 0.0041   Epoch: 15   Global Step: 80650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:01,934-Speed 3408.14 samples/sec   Loss 2.3650   LearningRate 0.0041   Epoch: 15   Global Step: 80660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:04,941-Speed 3406.41 samples/sec   Loss 2.3356   LearningRate 0.0041   Epoch: 15   Global Step: 80670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:07,972-Speed 3378.51 samples/sec   Loss 2.3527   LearningRate 0.0041   Epoch: 15   Global Step: 80680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:10,974-Speed 3412.63 samples/sec   Loss 2.2742   LearningRate 0.0041   Epoch: 15   Global Step: 80690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:14,004-Speed 3380.47 samples/sec   Loss 2.2487   LearningRate 0.0041   Epoch: 15   Global Step: 80700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:17,016-Speed 3400.25 samples/sec   Loss 2.2711   LearningRate 0.0041   Epoch: 15   Global Step: 80710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:20,026-Speed 3404.13 samples/sec   Loss 2.3857   LearningRate 0.0041   Epoch: 15   Global Step: 80720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:23,031-Speed 3407.85 samples/sec   Loss 2.3791   LearningRate 0.0041   Epoch: 15   Global Step: 80730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:34:26,037-Speed 3407.82 samples/sec   Loss 2.3897   LearningRate 0.0041   Epoch: 15   Global Step: 80740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:34:29,026-Speed 3426.84 samples/sec   Loss 2.3662   LearningRate 0.0041   Epoch: 15   Global Step: 80750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:32,041-Speed 3396.87 samples/sec   Loss 2.3807   LearningRate 0.0041   Epoch: 15   Global Step: 80760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:35,062-Speed 3390.96 samples/sec   Loss 2.3639   LearningRate 0.0041   Epoch: 15   Global Step: 80770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:38,132-Speed 3336.25 samples/sec   Loss 2.3297   LearningRate 0.0041   Epoch: 15   Global Step: 80780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:41,149-Speed 3395.10 samples/sec   Loss 2.2747   LearningRate 0.0041   Epoch: 15   Global Step: 80790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:44,158-Speed 3404.25 samples/sec   Loss 2.3059   LearningRate 0.0041   Epoch: 15   Global Step: 80800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:47,168-Speed 3402.46 samples/sec   Loss 2.4015   LearningRate 0.0040   Epoch: 15   Global Step: 80810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:50,171-Speed 3411.07 samples/sec   Loss 2.2942   LearningRate 0.0040   Epoch: 15   Global Step: 80820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:53,199-Speed 3382.48 samples/sec   Loss 2.2543   LearningRate 0.0040   Epoch: 15   Global Step: 80830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:56,235-Speed 3374.07 samples/sec   Loss 2.3495   LearningRate 0.0040   Epoch: 15   Global Step: 80840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:34:59,255-Speed 3391.38 samples/sec   Loss 2.3359   LearningRate 0.0040   Epoch: 15   Global Step: 80850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:35:02,243-Speed 3427.39 samples/sec   Loss 2.3156   LearningRate 0.0040   Epoch: 15   Global Step: 80860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:35:05,254-Speed 3402.32 samples/sec   Loss 2.3179   LearningRate 0.0040   Epoch: 15   Global Step: 80870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:35:08,257-Speed 3410.01 samples/sec   Loss 2.4104   LearningRate 0.0040   Epoch: 15   Global Step: 80880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:35:11,276-Speed 3393.54 samples/sec   Loss 2.2435   LearningRate 0.0040   Epoch: 15   Global Step: 80890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:35:14,286-Speed 3402.76 samples/sec   Loss 2.3227   LearningRate 0.0040   Epoch: 15   Global Step: 80900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:35:17,312-Speed 3384.57 samples/sec   Loss 2.2904   LearningRate 0.0040   Epoch: 15   Global Step: 80910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:35:20,411-Speed 3305.41 samples/sec   Loss 2.3343   LearningRate 0.0040   Epoch: 15   Global Step: 80920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:35:33,466-Speed 784.46 samples/sec   Loss 2.1770   LearningRate 0.0040   Epoch: 16   Global Step: 80930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:35:36,491-Speed 3386.93 samples/sec   Loss 1.6201   LearningRate 0.0040   Epoch: 16   Global Step: 80940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:35:39,523-Speed 3377.47 samples/sec   Loss 1.7448   LearningRate 0.0040   Epoch: 16   Global Step: 80950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:35:42,539-Speed 3396.53 samples/sec   Loss 1.6103   LearningRate 0.0040   Epoch: 16   Global Step: 80960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:35:45,560-Speed 3390.92 samples/sec   Loss 1.6768   LearningRate 0.0040   Epoch: 16   Global Step: 80970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:35:48,585-Speed 3386.41 samples/sec   Loss 1.6491   LearningRate 0.0040   Epoch: 16   Global Step: 80980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:35:51,591-Speed 3406.80 samples/sec   Loss 1.7085   LearningRate 0.0040   Epoch: 16   Global Step: 80990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:35:54,600-Speed 3403.86 samples/sec   Loss 1.6586   LearningRate 0.0040   Epoch: 16   Global Step: 81000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:35:57,620-Speed 3391.48 samples/sec   Loss 1.7052   LearningRate 0.0040   Epoch: 16   Global Step: 81010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:36:00,631-Speed 3402.34 samples/sec   Loss 1.6895   LearningRate 0.0040   Epoch: 16   Global Step: 81020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:36:03,643-Speed 3400.46 samples/sec   Loss 1.6915   LearningRate 0.0040   Epoch: 16   Global Step: 81030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:36:06,637-Speed 3421.26 samples/sec   Loss 1.6344   LearningRate 0.0040   Epoch: 16   Global Step: 81040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:36:09,653-Speed 3396.05 samples/sec   Loss 1.6491   LearningRate 0.0040   Epoch: 16   Global Step: 81050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:36:12,674-Speed 3390.28 samples/sec   Loss 1.6585   LearningRate 0.0039   Epoch: 16   Global Step: 81060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:36:15,762-Speed 3317.54 samples/sec   Loss 1.6738   LearningRate 0.0039   Epoch: 16   Global Step: 81070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:36:18,814-Speed 3356.02 samples/sec   Loss 1.6946   LearningRate 0.0039   Epoch: 16   Global Step: 81080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:36:21,826-Speed 3401.28 samples/sec   Loss 1.6237   LearningRate 0.0039   Epoch: 16   Global Step: 81090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:36:24,830-Speed 3409.03 samples/sec   Loss 1.6493   LearningRate 0.0039   Epoch: 16   Global Step: 81100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:36:27,861-Speed 3379.03 samples/sec   Loss 1.7701   LearningRate 0.0039   Epoch: 16   Global Step: 81110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:36:30,911-Speed 3358.84 samples/sec   Loss 1.6793   LearningRate 0.0039   Epoch: 16   Global Step: 81120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:36:33,946-Speed 3375.08 samples/sec   Loss 1.6237   LearningRate 0.0039   Epoch: 16   Global Step: 81130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:36:36,952-Speed 3407.05 samples/sec   Loss 1.6982   LearningRate 0.0039   Epoch: 16   Global Step: 81140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:36:40,003-Speed 3356.77 samples/sec   Loss 1.6096   LearningRate 0.0039   Epoch: 16   Global Step: 81150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:36:43,018-Speed 3397.38 samples/sec   Loss 1.5395   LearningRate 0.0039   Epoch: 16   Global Step: 81160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:36:46,040-Speed 3390.34 samples/sec   Loss 1.6944   LearningRate 0.0039   Epoch: 16   Global Step: 81170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:36:49,068-Speed 3382.42 samples/sec   Loss 1.6874   LearningRate 0.0039   Epoch: 16   Global Step: 81180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:36:52,082-Speed 3397.52 samples/sec   Loss 1.6802   LearningRate 0.0039   Epoch: 16   Global Step: 81190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:36:55,139-Speed 3351.59 samples/sec   Loss 1.6430   LearningRate 0.0039   Epoch: 16   Global Step: 81200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:36:58,148-Speed 3403.74 samples/sec   Loss 1.7025   LearningRate 0.0039   Epoch: 16   Global Step: 81210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:37:01,160-Speed 3400.28 samples/sec   Loss 1.6272   LearningRate 0.0039   Epoch: 16   Global Step: 81220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:37:04,166-Speed 3406.96 samples/sec   Loss 1.6475   LearningRate 0.0039   Epoch: 16   Global Step: 81230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:37:07,157-Speed 3425.08 samples/sec   Loss 1.6376   LearningRate 0.0039   Epoch: 16   Global Step: 81240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:37:10,166-Speed 3404.06 samples/sec   Loss 1.7275   LearningRate 0.0039   Epoch: 16   Global Step: 81250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:37:13,189-Speed 3388.51 samples/sec   Loss 1.6570   LearningRate 0.0039   Epoch: 16   Global Step: 81260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:37:16,201-Speed 3400.32 samples/sec   Loss 1.7174   LearningRate 0.0039   Epoch: 16   Global Step: 81270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:37:19,209-Speed 3405.21 samples/sec   Loss 1.6847   LearningRate 0.0039   Epoch: 16   Global Step: 81280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:37:22,215-Speed 3407.11 samples/sec   Loss 1.7620   LearningRate 0.0039   Epoch: 16   Global Step: 81290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:37:25,231-Speed 3396.46 samples/sec   Loss 1.7026   LearningRate 0.0039   Epoch: 16   Global Step: 81300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:37:28,270-Speed 3370.55 samples/sec   Loss 1.6240   LearningRate 0.0039   Epoch: 16   Global Step: 81310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:37:31,266-Speed 3418.06 samples/sec   Loss 1.7849   LearningRate 0.0038   Epoch: 16   Global Step: 81320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:37:34,272-Speed 3407.74 samples/sec   Loss 1.7197   LearningRate 0.0038   Epoch: 16   Global Step: 81330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:37:37,299-Speed 3383.37 samples/sec   Loss 1.6556   LearningRate 0.0038   Epoch: 16   Global Step: 81340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:37:40,316-Speed 3395.63 samples/sec   Loss 1.7220   LearningRate 0.0038   Epoch: 16   Global Step: 81350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:37:43,352-Speed 3373.35 samples/sec   Loss 1.8375   LearningRate 0.0038   Epoch: 16   Global Step: 81360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:37:46,420-Speed 3339.83 samples/sec   Loss 1.7401   LearningRate 0.0038   Epoch: 16   Global Step: 81370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:37:49,459-Speed 3369.76 samples/sec   Loss 1.7394   LearningRate 0.0038   Epoch: 16   Global Step: 81380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:37:52,493-Speed 3375.38 samples/sec   Loss 1.6667   LearningRate 0.0038   Epoch: 16   Global Step: 81390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:37:55,520-Speed 3385.02 samples/sec   Loss 1.6769   LearningRate 0.0038   Epoch: 16   Global Step: 81400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:37:58,544-Speed 3387.34 samples/sec   Loss 1.6185   LearningRate 0.0038   Epoch: 16   Global Step: 81410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:38:01,557-Speed 3399.97 samples/sec   Loss 1.7172   LearningRate 0.0038   Epoch: 16   Global Step: 81420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:04,568-Speed 3401.36 samples/sec   Loss 1.6817   LearningRate 0.0038   Epoch: 16   Global Step: 81430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:07,579-Speed 3401.00 samples/sec   Loss 1.7199   LearningRate 0.0038   Epoch: 16   Global Step: 81440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:10,587-Speed 3405.75 samples/sec   Loss 1.7348   LearningRate 0.0038   Epoch: 16   Global Step: 81450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:13,595-Speed 3405.33 samples/sec   Loss 1.6691   LearningRate 0.0038   Epoch: 16   Global Step: 81460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:16,616-Speed 3390.16 samples/sec   Loss 1.7767   LearningRate 0.0038   Epoch: 16   Global Step: 81470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:19,635-Speed 3392.66 samples/sec   Loss 1.6486   LearningRate 0.0038   Epoch: 16   Global Step: 81480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:22,645-Speed 3402.64 samples/sec   Loss 1.7795   LearningRate 0.0038   Epoch: 16   Global Step: 81490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:25,669-Speed 3387.32 samples/sec   Loss 1.7690   LearningRate 0.0038   Epoch: 16   Global Step: 81500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:28,686-Speed 3396.74 samples/sec   Loss 1.6790   LearningRate 0.0038   Epoch: 16   Global Step: 81510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:31,681-Speed 3420.20 samples/sec   Loss 1.6958   LearningRate 0.0038   Epoch: 16   Global Step: 81520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:34,691-Speed 3402.48 samples/sec   Loss 1.6528   LearningRate 0.0038   Epoch: 16   Global Step: 81530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:37,756-Speed 3341.64 samples/sec   Loss 1.6936   LearningRate 0.0038   Epoch: 16   Global Step: 81540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:40,769-Speed 3399.84 samples/sec   Loss 1.7996   LearningRate 0.0038   Epoch: 16   Global Step: 81550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:43,780-Speed 3400.85 samples/sec   Loss 1.7488   LearningRate 0.0038   Epoch: 16   Global Step: 81560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:46,800-Speed 3391.88 samples/sec   Loss 1.7688   LearningRate 0.0038   Epoch: 16   Global Step: 81570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:49,818-Speed 3393.86 samples/sec   Loss 1.6950   LearningRate 0.0037   Epoch: 16   Global Step: 81580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:38:52,811-Speed 3422.04 samples/sec   Loss 1.7294   LearningRate 0.0037   Epoch: 16   Global Step: 81590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:38:55,824-Speed 3399.48 samples/sec   Loss 1.7688   LearningRate 0.0037   Epoch: 16   Global Step: 81600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:38:58,836-Speed 3401.01 samples/sec   Loss 1.7614   LearningRate 0.0037   Epoch: 16   Global Step: 81610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:01,851-Speed 3397.86 samples/sec   Loss 1.7379   LearningRate 0.0037   Epoch: 16   Global Step: 81620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:04,866-Speed 3396.49 samples/sec   Loss 1.6556   LearningRate 0.0037   Epoch: 16   Global Step: 81630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:07,911-Speed 3363.94 samples/sec   Loss 1.7744   LearningRate 0.0037   Epoch: 16   Global Step: 81640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:10,919-Speed 3404.52 samples/sec   Loss 1.7833   LearningRate 0.0037   Epoch: 16   Global Step: 81650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:13,931-Speed 3401.53 samples/sec   Loss 1.7596   LearningRate 0.0037   Epoch: 16   Global Step: 81660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:16,951-Speed 3392.52 samples/sec   Loss 1.7271   LearningRate 0.0037   Epoch: 16   Global Step: 81670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:19,960-Speed 3403.27 samples/sec   Loss 1.7956   LearningRate 0.0037   Epoch: 16   Global Step: 81680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:22,954-Speed 3421.66 samples/sec   Loss 1.8069   LearningRate 0.0037   Epoch: 16   Global Step: 81690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:25,986-Speed 3377.71 samples/sec   Loss 1.7464   LearningRate 0.0037   Epoch: 16   Global Step: 81700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:29,035-Speed 3359.97 samples/sec   Loss 1.6564   LearningRate 0.0037   Epoch: 16   Global Step: 81710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:32,049-Speed 3398.53 samples/sec   Loss 1.8433   LearningRate 0.0037   Epoch: 16   Global Step: 81720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:35,063-Speed 3397.79 samples/sec   Loss 1.7720   LearningRate 0.0037   Epoch: 16   Global Step: 81730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:38,077-Speed 3398.05 samples/sec   Loss 1.8142   LearningRate 0.0037   Epoch: 16   Global Step: 81740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:41,088-Speed 3401.42 samples/sec   Loss 1.7626   LearningRate 0.0037   Epoch: 16   Global Step: 81750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:44,104-Speed 3396.67 samples/sec   Loss 1.8372   LearningRate 0.0037   Epoch: 16   Global Step: 81760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:47,129-Speed 3385.90 samples/sec   Loss 1.8104   LearningRate 0.0037   Epoch: 16   Global Step: 81770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:50,142-Speed 3399.05 samples/sec   Loss 1.7463   LearningRate 0.0037   Epoch: 16   Global Step: 81780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:39:53,155-Speed 3400.09 samples/sec   Loss 1.7286   LearningRate 0.0037   Epoch: 16   Global Step: 81790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:39:56,171-Speed 3396.12 samples/sec   Loss 1.7636   LearningRate 0.0037   Epoch: 16   Global Step: 81800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:39:59,195-Speed 3387.47 samples/sec   Loss 1.6516   LearningRate 0.0037   Epoch: 16   Global Step: 81810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:02,223-Speed 3382.34 samples/sec   Loss 1.8102   LearningRate 0.0037   Epoch: 16   Global Step: 81820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:05,236-Speed 3399.34 samples/sec   Loss 1.7605   LearningRate 0.0037   Epoch: 16   Global Step: 81830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:08,252-Speed 3395.96 samples/sec   Loss 1.8115   LearningRate 0.0036   Epoch: 16   Global Step: 81840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:11,299-Speed 3361.92 samples/sec   Loss 1.7806   LearningRate 0.0036   Epoch: 16   Global Step: 81850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:14,325-Speed 3386.29 samples/sec   Loss 1.7200   LearningRate 0.0036   Epoch: 16   Global Step: 81860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:17,343-Speed 3393.55 samples/sec   Loss 1.7823   LearningRate 0.0036   Epoch: 16   Global Step: 81870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:20,356-Speed 3399.29 samples/sec   Loss 1.7494   LearningRate 0.0036   Epoch: 16   Global Step: 81880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:23,370-Speed 3398.44 samples/sec   Loss 1.7747   LearningRate 0.0036   Epoch: 16   Global Step: 81890   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 07:40:26,361-Speed 3424.81 samples/sec   Loss 1.7654   LearningRate 0.0036   Epoch: 16   Global Step: 81900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:29,377-Speed 3395.73 samples/sec   Loss 1.7598   LearningRate 0.0036   Epoch: 16   Global Step: 81910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:32,396-Speed 3392.95 samples/sec   Loss 1.8004   LearningRate 0.0036   Epoch: 16   Global Step: 81920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:35,413-Speed 3395.47 samples/sec   Loss 1.7880   LearningRate 0.0036   Epoch: 16   Global Step: 81930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:38,440-Speed 3383.24 samples/sec   Loss 1.7369   LearningRate 0.0036   Epoch: 16   Global Step: 81940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:41,456-Speed 3396.59 samples/sec   Loss 1.7408   LearningRate 0.0036   Epoch: 16   Global Step: 81950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:44,469-Speed 3399.97 samples/sec   Loss 1.8241   LearningRate 0.0036   Epoch: 16   Global Step: 81960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:47,483-Speed 3397.44 samples/sec   Loss 1.8706   LearningRate 0.0036   Epoch: 16   Global Step: 81970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:50,495-Speed 3400.86 samples/sec   Loss 1.8013   LearningRate 0.0036   Epoch: 16   Global Step: 81980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:53,507-Speed 3401.37 samples/sec   Loss 1.8026   LearningRate 0.0036   Epoch: 16   Global Step: 81990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:40:56,524-Speed 3394.89 samples/sec   Loss 1.8007   LearningRate 0.0036   Epoch: 16   Global Step: 82000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:41:40,886-[lfw][82000]XNorm: 22.328598
Training: 2022-04-11 07:41:40,887-[lfw][82000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-11 07:41:40,887-[lfw][82000]Accuracy-Highest: 0.99850
Training: 2022-04-11 07:42:32,182-[cfp_fp][82000]XNorm: 21.870537
Training: 2022-04-11 07:42:32,183-[cfp_fp][82000]Accuracy-Flip: 0.98686+-0.00578
Training: 2022-04-11 07:42:32,184-[cfp_fp][82000]Accuracy-Highest: 0.98686
Training: 2022-04-11 07:43:16,173-[agedb_30][82000]XNorm: 22.679104
Training: 2022-04-11 07:43:16,173-[agedb_30][82000]Accuracy-Flip: 0.98467+-0.00722
Training: 2022-04-11 07:43:16,174-[agedb_30][82000]Accuracy-Highest: 0.98550
Training: 2022-04-11 07:43:19,177-Speed 71.78 samples/sec   Loss 1.7585   LearningRate 0.0036   Epoch: 16   Global Step: 82010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:43:22,172-Speed 3419.74 samples/sec   Loss 1.7360   LearningRate 0.0036   Epoch: 16   Global Step: 82020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:43:25,167-Speed 3420.18 samples/sec   Loss 1.8161   LearningRate 0.0036   Epoch: 16   Global Step: 82030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:43:28,164-Speed 3417.31 samples/sec   Loss 1.8582   LearningRate 0.0036   Epoch: 16   Global Step: 82040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:43:31,171-Speed 3406.12 samples/sec   Loss 1.6539   LearningRate 0.0036   Epoch: 16   Global Step: 82050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:43:34,159-Speed 3428.81 samples/sec   Loss 1.7576   LearningRate 0.0036   Epoch: 16   Global Step: 82060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:43:37,154-Speed 3419.98 samples/sec   Loss 1.8039   LearningRate 0.0036   Epoch: 16   Global Step: 82070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:43:40,152-Speed 3416.50 samples/sec   Loss 1.9447   LearningRate 0.0036   Epoch: 16   Global Step: 82080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:43:43,146-Speed 3420.78 samples/sec   Loss 1.7864   LearningRate 0.0036   Epoch: 16   Global Step: 82090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:43:46,122-Speed 3442.13 samples/sec   Loss 1.7777   LearningRate 0.0035   Epoch: 16   Global Step: 82100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:43:49,120-Speed 3416.95 samples/sec   Loss 1.7604   LearningRate 0.0035   Epoch: 16   Global Step: 82110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:43:52,117-Speed 3417.39 samples/sec   Loss 1.6786   LearningRate 0.0035   Epoch: 16   Global Step: 82120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:43:55,127-Speed 3402.20 samples/sec   Loss 1.8197   LearningRate 0.0035   Epoch: 16   Global Step: 82130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:43:58,121-Speed 3420.74 samples/sec   Loss 1.7635   LearningRate 0.0035   Epoch: 16   Global Step: 82140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:01,133-Speed 3401.18 samples/sec   Loss 1.7998   LearningRate 0.0035   Epoch: 16   Global Step: 82150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:04,133-Speed 3414.43 samples/sec   Loss 1.8390   LearningRate 0.0035   Epoch: 16   Global Step: 82160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:07,132-Speed 3415.17 samples/sec   Loss 1.7890   LearningRate 0.0035   Epoch: 16   Global Step: 82170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:10,133-Speed 3413.17 samples/sec   Loss 1.8282   LearningRate 0.0035   Epoch: 16   Global Step: 82180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:13,144-Speed 3401.36 samples/sec   Loss 1.8477   LearningRate 0.0035   Epoch: 16   Global Step: 82190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:16,139-Speed 3419.94 samples/sec   Loss 1.8890   LearningRate 0.0035   Epoch: 16   Global Step: 82200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:19,139-Speed 3414.91 samples/sec   Loss 1.7553   LearningRate 0.0035   Epoch: 16   Global Step: 82210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:22,140-Speed 3412.07 samples/sec   Loss 1.8087   LearningRate 0.0035   Epoch: 16   Global Step: 82220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:25,146-Speed 3407.37 samples/sec   Loss 1.6612   LearningRate 0.0035   Epoch: 16   Global Step: 82230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:28,164-Speed 3394.13 samples/sec   Loss 1.8042   LearningRate 0.0035   Epoch: 16   Global Step: 82240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:31,165-Speed 3412.82 samples/sec   Loss 1.8162   LearningRate 0.0035   Epoch: 16   Global Step: 82250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:34,177-Speed 3401.77 samples/sec   Loss 1.8858   LearningRate 0.0035   Epoch: 16   Global Step: 82260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:37,183-Speed 3406.76 samples/sec   Loss 1.8627   LearningRate 0.0035   Epoch: 16   Global Step: 82270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:40,197-Speed 3398.34 samples/sec   Loss 1.8039   LearningRate 0.0035   Epoch: 16   Global Step: 82280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:43,204-Speed 3405.71 samples/sec   Loss 1.7635   LearningRate 0.0035   Epoch: 16   Global Step: 82290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:46,193-Speed 3427.26 samples/sec   Loss 1.8557   LearningRate 0.0035   Epoch: 16   Global Step: 82300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:49,205-Speed 3401.00 samples/sec   Loss 1.8199   LearningRate 0.0035   Epoch: 16   Global Step: 82310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:52,227-Speed 3388.33 samples/sec   Loss 1.9413   LearningRate 0.0035   Epoch: 16   Global Step: 82320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:55,226-Speed 3415.45 samples/sec   Loss 1.9084   LearningRate 0.0035   Epoch: 16   Global Step: 82330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:44:58,232-Speed 3407.82 samples/sec   Loss 1.7817   LearningRate 0.0035   Epoch: 16   Global Step: 82340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:01,244-Speed 3400.58 samples/sec   Loss 1.7572   LearningRate 0.0035   Epoch: 16   Global Step: 82350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:04,252-Speed 3405.73 samples/sec   Loss 1.7822   LearningRate 0.0035   Epoch: 16   Global Step: 82360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:07,260-Speed 3404.95 samples/sec   Loss 1.7609   LearningRate 0.0035   Epoch: 16   Global Step: 82370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:10,261-Speed 3412.25 samples/sec   Loss 1.8191   LearningRate 0.0034   Epoch: 16   Global Step: 82380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:13,275-Speed 3399.76 samples/sec   Loss 1.8091   LearningRate 0.0034   Epoch: 16   Global Step: 82390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:16,264-Speed 3426.29 samples/sec   Loss 1.7860   LearningRate 0.0034   Epoch: 16   Global Step: 82400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:19,274-Speed 3402.63 samples/sec   Loss 1.7446   LearningRate 0.0034   Epoch: 16   Global Step: 82410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:22,273-Speed 3415.04 samples/sec   Loss 1.7747   LearningRate 0.0034   Epoch: 16   Global Step: 82420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:25,286-Speed 3400.05 samples/sec   Loss 1.6929   LearningRate 0.0034   Epoch: 16   Global Step: 82430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:28,302-Speed 3396.44 samples/sec   Loss 1.8072   LearningRate 0.0034   Epoch: 16   Global Step: 82440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:31,311-Speed 3404.35 samples/sec   Loss 1.7533   LearningRate 0.0034   Epoch: 16   Global Step: 82450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:34,315-Speed 3409.11 samples/sec   Loss 1.7294   LearningRate 0.0034   Epoch: 16   Global Step: 82460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:37,321-Speed 3407.14 samples/sec   Loss 1.7684   LearningRate 0.0034   Epoch: 16   Global Step: 82470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:40,333-Speed 3400.73 samples/sec   Loss 1.8226   LearningRate 0.0034   Epoch: 16   Global Step: 82480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:43,338-Speed 3408.57 samples/sec   Loss 1.7911   LearningRate 0.0034   Epoch: 16   Global Step: 82490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:46,331-Speed 3422.50 samples/sec   Loss 1.8782   LearningRate 0.0034   Epoch: 16   Global Step: 82500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:49,356-Speed 3385.98 samples/sec   Loss 1.7903   LearningRate 0.0034   Epoch: 16   Global Step: 82510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:52,366-Speed 3402.90 samples/sec   Loss 1.8631   LearningRate 0.0034   Epoch: 16   Global Step: 82520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:45:55,363-Speed 3417.94 samples/sec   Loss 1.8092   LearningRate 0.0034   Epoch: 16   Global Step: 82530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:45:58,366-Speed 3411.19 samples/sec   Loss 1.8587   LearningRate 0.0034   Epoch: 16   Global Step: 82540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:01,371-Speed 3407.84 samples/sec   Loss 1.7897   LearningRate 0.0034   Epoch: 16   Global Step: 82550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:04,394-Speed 3387.95 samples/sec   Loss 1.8537   LearningRate 0.0034   Epoch: 16   Global Step: 82560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:07,421-Speed 3383.81 samples/sec   Loss 1.7736   LearningRate 0.0034   Epoch: 16   Global Step: 82570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:10,431-Speed 3402.55 samples/sec   Loss 1.8353   LearningRate 0.0034   Epoch: 16   Global Step: 82580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:13,451-Speed 3392.60 samples/sec   Loss 1.8624   LearningRate 0.0034   Epoch: 16   Global Step: 82590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:16,458-Speed 3405.95 samples/sec   Loss 1.8037   LearningRate 0.0034   Epoch: 16   Global Step: 82600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:19,465-Speed 3406.76 samples/sec   Loss 1.8759   LearningRate 0.0034   Epoch: 16   Global Step: 82610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:22,469-Speed 3409.29 samples/sec   Loss 1.7606   LearningRate 0.0034   Epoch: 16   Global Step: 82620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:25,475-Speed 3407.59 samples/sec   Loss 1.8530   LearningRate 0.0034   Epoch: 16   Global Step: 82630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:46:28,497-Speed 3390.08 samples/sec   Loss 1.7709   LearningRate 0.0034   Epoch: 16   Global Step: 82640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:46:31,490-Speed 3421.74 samples/sec   Loss 1.8769   LearningRate 0.0033   Epoch: 16   Global Step: 82650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:34,494-Speed 3409.83 samples/sec   Loss 1.8508   LearningRate 0.0033   Epoch: 16   Global Step: 82660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:37,512-Speed 3393.68 samples/sec   Loss 1.8392   LearningRate 0.0033   Epoch: 16   Global Step: 82670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:40,519-Speed 3405.87 samples/sec   Loss 1.6784   LearningRate 0.0033   Epoch: 16   Global Step: 82680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:43,532-Speed 3399.81 samples/sec   Loss 1.8262   LearningRate 0.0033   Epoch: 16   Global Step: 82690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:46,584-Speed 3356.14 samples/sec   Loss 1.8355   LearningRate 0.0033   Epoch: 16   Global Step: 82700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:49,592-Speed 3404.94 samples/sec   Loss 1.8094   LearningRate 0.0033   Epoch: 16   Global Step: 82710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:52,618-Speed 3384.93 samples/sec   Loss 1.8496   LearningRate 0.0033   Epoch: 16   Global Step: 82720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:55,645-Speed 3383.26 samples/sec   Loss 1.8344   LearningRate 0.0033   Epoch: 16   Global Step: 82730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:46:58,656-Speed 3402.34 samples/sec   Loss 1.8446   LearningRate 0.0033   Epoch: 16   Global Step: 82740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:47:01,661-Speed 3408.03 samples/sec   Loss 1.7235   LearningRate 0.0033   Epoch: 16   Global Step: 82750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:47:04,670-Speed 3403.97 samples/sec   Loss 1.7450   LearningRate 0.0033   Epoch: 16   Global Step: 82760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:47:07,694-Speed 3387.67 samples/sec   Loss 1.8266   LearningRate 0.0033   Epoch: 16   Global Step: 82770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:47:10,713-Speed 3392.41 samples/sec   Loss 1.7767   LearningRate 0.0033   Epoch: 16   Global Step: 82780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:47:13,708-Speed 3420.75 samples/sec   Loss 1.8384   LearningRate 0.0033   Epoch: 16   Global Step: 82790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:47:16,717-Speed 3403.91 samples/sec   Loss 1.7926   LearningRate 0.0033   Epoch: 16   Global Step: 82800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:47:19,723-Speed 3406.61 samples/sec   Loss 1.7828   LearningRate 0.0033   Epoch: 16   Global Step: 82810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:47:22,738-Speed 3398.18 samples/sec   Loss 1.8123   LearningRate 0.0033   Epoch: 16   Global Step: 82820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:47:25,757-Speed 3391.76 samples/sec   Loss 1.8253   LearningRate 0.0033   Epoch: 16   Global Step: 82830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:47:28,773-Speed 3396.41 samples/sec   Loss 1.8416   LearningRate 0.0033   Epoch: 16   Global Step: 82840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:47:31,792-Speed 3392.24 samples/sec   Loss 1.7914   LearningRate 0.0033   Epoch: 16   Global Step: 82850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:47:34,809-Speed 3395.62 samples/sec   Loss 1.8519   LearningRate 0.0033   Epoch: 16   Global Step: 82860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:47:37,824-Speed 3396.94 samples/sec   Loss 1.8917   LearningRate 0.0033   Epoch: 16   Global Step: 82870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:47:40,834-Speed 3402.70 samples/sec   Loss 1.8566   LearningRate 0.0033   Epoch: 16   Global Step: 82880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:47:43,841-Speed 3406.42 samples/sec   Loss 1.8218   LearningRate 0.0033   Epoch: 16   Global Step: 82890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:47:46,864-Speed 3388.68 samples/sec   Loss 1.7862   LearningRate 0.0033   Epoch: 16   Global Step: 82900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:47:49,872-Speed 3404.46 samples/sec   Loss 1.8065   LearningRate 0.0033   Epoch: 16   Global Step: 82910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:47:52,880-Speed 3405.47 samples/sec   Loss 1.8516   LearningRate 0.0033   Epoch: 16   Global Step: 82920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:47:55,892-Speed 3400.25 samples/sec   Loss 1.7356   LearningRate 0.0032   Epoch: 16   Global Step: 82930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:47:58,905-Speed 3399.88 samples/sec   Loss 1.8642   LearningRate 0.0032   Epoch: 16   Global Step: 82940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:48:01,925-Speed 3390.91 samples/sec   Loss 1.8023   LearningRate 0.0032   Epoch: 16   Global Step: 82950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:48:04,968-Speed 3366.71 samples/sec   Loss 1.7732   LearningRate 0.0032   Epoch: 16   Global Step: 82960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:48:07,978-Speed 3403.12 samples/sec   Loss 1.8509   LearningRate 0.0032   Epoch: 16   Global Step: 82970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:48:11,015-Speed 3373.65 samples/sec   Loss 1.9480   LearningRate 0.0032   Epoch: 16   Global Step: 82980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:48:14,039-Speed 3387.28 samples/sec   Loss 1.8860   LearningRate 0.0032   Epoch: 16   Global Step: 82990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:48:17,050-Speed 3401.86 samples/sec   Loss 1.7968   LearningRate 0.0032   Epoch: 16   Global Step: 83000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:48:20,056-Speed 3406.54 samples/sec   Loss 1.7667   LearningRate 0.0032   Epoch: 16   Global Step: 83010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:48:23,064-Speed 3405.23 samples/sec   Loss 1.9236   LearningRate 0.0032   Epoch: 16   Global Step: 83020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:48:26,074-Speed 3402.67 samples/sec   Loss 1.8357   LearningRate 0.0032   Epoch: 16   Global Step: 83030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:48:29,130-Speed 3351.41 samples/sec   Loss 1.7920   LearningRate 0.0032   Epoch: 16   Global Step: 83040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:48:32,144-Speed 3398.78 samples/sec   Loss 1.8252   LearningRate 0.0032   Epoch: 16   Global Step: 83050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:48:35,157-Speed 3399.53 samples/sec   Loss 1.8443   LearningRate 0.0032   Epoch: 16   Global Step: 83060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:48:38,181-Speed 3387.74 samples/sec   Loss 1.7747   LearningRate 0.0032   Epoch: 16   Global Step: 83070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:48:41,242-Speed 3345.47 samples/sec   Loss 1.8081   LearningRate 0.0032   Epoch: 16   Global Step: 83080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:48:44,271-Speed 3382.03 samples/sec   Loss 1.8399   LearningRate 0.0032   Epoch: 16   Global Step: 83090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:48:47,286-Speed 3397.35 samples/sec   Loss 1.8741   LearningRate 0.0032   Epoch: 16   Global Step: 83100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:48:50,300-Speed 3398.15 samples/sec   Loss 1.8174   LearningRate 0.0032   Epoch: 16   Global Step: 83110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:48:53,294-Speed 3421.63 samples/sec   Loss 1.8485   LearningRate 0.0032   Epoch: 16   Global Step: 83120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:48:56,307-Speed 3399.04 samples/sec   Loss 1.7908   LearningRate 0.0032   Epoch: 16   Global Step: 83130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:48:59,315-Speed 3405.44 samples/sec   Loss 1.7907   LearningRate 0.0032   Epoch: 16   Global Step: 83140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:02,339-Speed 3386.39 samples/sec   Loss 1.7875   LearningRate 0.0032   Epoch: 16   Global Step: 83150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:05,426-Speed 3318.16 samples/sec   Loss 1.8032   LearningRate 0.0032   Epoch: 16   Global Step: 83160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:08,443-Speed 3394.80 samples/sec   Loss 1.8322   LearningRate 0.0032   Epoch: 16   Global Step: 83170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:11,450-Speed 3406.81 samples/sec   Loss 1.8332   LearningRate 0.0032   Epoch: 16   Global Step: 83180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:14,491-Speed 3368.73 samples/sec   Loss 1.8661   LearningRate 0.0032   Epoch: 16   Global Step: 83190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:17,504-Speed 3399.14 samples/sec   Loss 1.7960   LearningRate 0.0032   Epoch: 16   Global Step: 83200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:20,513-Speed 3403.54 samples/sec   Loss 1.8648   LearningRate 0.0031   Epoch: 16   Global Step: 83210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:23,532-Speed 3392.35 samples/sec   Loss 1.9032   LearningRate 0.0031   Epoch: 16   Global Step: 83220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:49:26,540-Speed 3405.31 samples/sec   Loss 1.8311   LearningRate 0.0031   Epoch: 16   Global Step: 83230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:29,563-Speed 3388.14 samples/sec   Loss 1.7905   LearningRate 0.0031   Epoch: 16   Global Step: 83240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:32,572-Speed 3404.52 samples/sec   Loss 1.7233   LearningRate 0.0031   Epoch: 16   Global Step: 83250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:35,599-Speed 3383.11 samples/sec   Loss 1.8608   LearningRate 0.0031   Epoch: 16   Global Step: 83260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:38,618-Speed 3393.44 samples/sec   Loss 1.8108   LearningRate 0.0031   Epoch: 16   Global Step: 83270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:41,629-Speed 3402.32 samples/sec   Loss 1.8218   LearningRate 0.0031   Epoch: 16   Global Step: 83280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:44,653-Speed 3387.63 samples/sec   Loss 1.7984   LearningRate 0.0031   Epoch: 16   Global Step: 83290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:47,666-Speed 3399.54 samples/sec   Loss 1.8287   LearningRate 0.0031   Epoch: 16   Global Step: 83300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:50,798-Speed 3269.80 samples/sec   Loss 1.8882   LearningRate 0.0031   Epoch: 16   Global Step: 83310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:53,847-Speed 3359.47 samples/sec   Loss 1.8200   LearningRate 0.0031   Epoch: 16   Global Step: 83320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:49:56,863-Speed 3395.78 samples/sec   Loss 1.8237   LearningRate 0.0031   Epoch: 16   Global Step: 83330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:49:59,860-Speed 3417.71 samples/sec   Loss 1.8277   LearningRate 0.0031   Epoch: 16   Global Step: 83340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:02,882-Speed 3390.18 samples/sec   Loss 1.8861   LearningRate 0.0031   Epoch: 16   Global Step: 83350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:05,903-Speed 3389.73 samples/sec   Loss 1.7787   LearningRate 0.0031   Epoch: 16   Global Step: 83360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:08,915-Speed 3401.89 samples/sec   Loss 1.8589   LearningRate 0.0031   Epoch: 16   Global Step: 83370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:11,949-Speed 3375.74 samples/sec   Loss 1.7561   LearningRate 0.0031   Epoch: 16   Global Step: 83380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:15,044-Speed 3308.74 samples/sec   Loss 1.7359   LearningRate 0.0031   Epoch: 16   Global Step: 83390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:18,062-Speed 3393.90 samples/sec   Loss 1.8378   LearningRate 0.0031   Epoch: 16   Global Step: 83400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:21,071-Speed 3403.93 samples/sec   Loss 1.8672   LearningRate 0.0031   Epoch: 16   Global Step: 83410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:24,078-Speed 3406.50 samples/sec   Loss 1.7899   LearningRate 0.0031   Epoch: 16   Global Step: 83420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:27,090-Speed 3400.17 samples/sec   Loss 1.8926   LearningRate 0.0031   Epoch: 16   Global Step: 83430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:30,111-Speed 3390.52 samples/sec   Loss 1.9153   LearningRate 0.0031   Epoch: 16   Global Step: 83440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:50:33,108-Speed 3417.40 samples/sec   Loss 1.7245   LearningRate 0.0031   Epoch: 16   Global Step: 83450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:36,125-Speed 3395.20 samples/sec   Loss 1.8441   LearningRate 0.0031   Epoch: 16   Global Step: 83460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:39,145-Speed 3391.97 samples/sec   Loss 1.8335   LearningRate 0.0031   Epoch: 16   Global Step: 83470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:42,180-Speed 3375.36 samples/sec   Loss 1.8735   LearningRate 0.0031   Epoch: 16   Global Step: 83480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:45,188-Speed 3404.92 samples/sec   Loss 1.7881   LearningRate 0.0031   Epoch: 16   Global Step: 83490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:48,194-Speed 3407.31 samples/sec   Loss 1.8236   LearningRate 0.0030   Epoch: 16   Global Step: 83500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:51,204-Speed 3402.45 samples/sec   Loss 1.8562   LearningRate 0.0030   Epoch: 16   Global Step: 83510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:54,221-Speed 3396.18 samples/sec   Loss 1.7709   LearningRate 0.0030   Epoch: 16   Global Step: 83520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:50:57,253-Speed 3378.64 samples/sec   Loss 1.7723   LearningRate 0.0030   Epoch: 16   Global Step: 83530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:51:00,266-Speed 3398.65 samples/sec   Loss 1.8157   LearningRate 0.0030   Epoch: 16   Global Step: 83540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:51:03,279-Speed 3400.27 samples/sec   Loss 1.7362   LearningRate 0.0030   Epoch: 16   Global Step: 83550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:06,292-Speed 3399.89 samples/sec   Loss 1.8805   LearningRate 0.0030   Epoch: 16   Global Step: 83560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:09,302-Speed 3402.51 samples/sec   Loss 1.8455   LearningRate 0.0030   Epoch: 16   Global Step: 83570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:12,321-Speed 3392.37 samples/sec   Loss 1.9228   LearningRate 0.0030   Epoch: 16   Global Step: 83580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:15,337-Speed 3396.34 samples/sec   Loss 1.8656   LearningRate 0.0030   Epoch: 16   Global Step: 83590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:18,357-Speed 3392.79 samples/sec   Loss 1.9530   LearningRate 0.0030   Epoch: 16   Global Step: 83600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:21,387-Speed 3380.18 samples/sec   Loss 1.7501   LearningRate 0.0030   Epoch: 16   Global Step: 83610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:24,417-Speed 3380.58 samples/sec   Loss 1.7739   LearningRate 0.0030   Epoch: 16   Global Step: 83620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:27,436-Speed 3393.21 samples/sec   Loss 1.8189   LearningRate 0.0030   Epoch: 16   Global Step: 83630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:30,563-Speed 3275.15 samples/sec   Loss 1.8578   LearningRate 0.0030   Epoch: 16   Global Step: 83640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:33,557-Speed 3421.43 samples/sec   Loss 1.8495   LearningRate 0.0030   Epoch: 16   Global Step: 83650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:36,568-Speed 3401.50 samples/sec   Loss 1.8855   LearningRate 0.0030   Epoch: 16   Global Step: 83660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:39,591-Speed 3389.28 samples/sec   Loss 1.8487   LearningRate 0.0030   Epoch: 16   Global Step: 83670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:42,602-Speed 3401.29 samples/sec   Loss 1.8005   LearningRate 0.0030   Epoch: 16   Global Step: 83680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:45,611-Speed 3403.58 samples/sec   Loss 1.7680   LearningRate 0.0030   Epoch: 16   Global Step: 83690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:48,630-Speed 3393.53 samples/sec   Loss 1.6806   LearningRate 0.0030   Epoch: 16   Global Step: 83700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:51,646-Speed 3395.24 samples/sec   Loss 1.8730   LearningRate 0.0030   Epoch: 16   Global Step: 83710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:54,665-Speed 3393.61 samples/sec   Loss 1.8398   LearningRate 0.0030   Epoch: 16   Global Step: 83720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:51:57,676-Speed 3400.92 samples/sec   Loss 1.7808   LearningRate 0.0030   Epoch: 16   Global Step: 83730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:52:00,688-Speed 3400.19 samples/sec   Loss 1.8100   LearningRate 0.0030   Epoch: 16   Global Step: 83740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:52:03,683-Speed 3420.71 samples/sec   Loss 1.8586   LearningRate 0.0030   Epoch: 16   Global Step: 83750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:52:06,697-Speed 3399.51 samples/sec   Loss 1.7905   LearningRate 0.0030   Epoch: 16   Global Step: 83760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:52:09,711-Speed 3398.58 samples/sec   Loss 1.8553   LearningRate 0.0030   Epoch: 16   Global Step: 83770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:52:12,736-Speed 3385.96 samples/sec   Loss 1.8889   LearningRate 0.0030   Epoch: 16   Global Step: 83780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:52:15,754-Speed 3393.37 samples/sec   Loss 1.9040   LearningRate 0.0029   Epoch: 16   Global Step: 83790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:52:18,771-Speed 3395.59 samples/sec   Loss 1.8695   LearningRate 0.0029   Epoch: 16   Global Step: 83800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:52:21,780-Speed 3403.97 samples/sec   Loss 1.7880   LearningRate 0.0029   Epoch: 16   Global Step: 83810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:52:24,796-Speed 3396.86 samples/sec   Loss 1.8086   LearningRate 0.0029   Epoch: 16   Global Step: 83820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:52:27,799-Speed 3410.61 samples/sec   Loss 1.9282   LearningRate 0.0029   Epoch: 16   Global Step: 83830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:52:30,813-Speed 3398.88 samples/sec   Loss 1.7861   LearningRate 0.0029   Epoch: 16   Global Step: 83840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:52:33,832-Speed 3392.94 samples/sec   Loss 1.7643   LearningRate 0.0029   Epoch: 16   Global Step: 83850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:52:36,854-Speed 3389.59 samples/sec   Loss 1.8074   LearningRate 0.0029   Epoch: 16   Global Step: 83860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:52:39,864-Speed 3402.38 samples/sec   Loss 1.8058   LearningRate 0.0029   Epoch: 16   Global Step: 83870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:52:42,875-Speed 3402.53 samples/sec   Loss 1.8769   LearningRate 0.0029   Epoch: 16   Global Step: 83880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:52:45,888-Speed 3399.09 samples/sec   Loss 1.9446   LearningRate 0.0029   Epoch: 16   Global Step: 83890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:52:48,901-Speed 3399.69 samples/sec   Loss 1.8490   LearningRate 0.0029   Epoch: 16   Global Step: 83900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:52:51,910-Speed 3403.53 samples/sec   Loss 1.7801   LearningRate 0.0029   Epoch: 16   Global Step: 83910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:52:54,933-Speed 3388.87 samples/sec   Loss 1.8804   LearningRate 0.0029   Epoch: 16   Global Step: 83920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:52:57,941-Speed 3404.22 samples/sec   Loss 1.8770   LearningRate 0.0029   Epoch: 16   Global Step: 83930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:53:00,974-Speed 3377.44 samples/sec   Loss 1.8208   LearningRate 0.0029   Epoch: 16   Global Step: 83940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:53:04,021-Speed 3361.31 samples/sec   Loss 1.8107   LearningRate 0.0029   Epoch: 16   Global Step: 83950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:53:07,047-Speed 3385.33 samples/sec   Loss 1.8254   LearningRate 0.0029   Epoch: 16   Global Step: 83960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:53:10,057-Speed 3403.67 samples/sec   Loss 1.8543   LearningRate 0.0029   Epoch: 16   Global Step: 83970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:53:13,072-Speed 3396.25 samples/sec   Loss 1.9197   LearningRate 0.0029   Epoch: 16   Global Step: 83980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:53:16,085-Speed 3399.40 samples/sec   Loss 1.8200   LearningRate 0.0029   Epoch: 16   Global Step: 83990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:53:19,282-Speed 3204.35 samples/sec   Loss 1.8953   LearningRate 0.0029   Epoch: 16   Global Step: 84000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:54:03,723-[lfw][84000]XNorm: 22.611818
Training: 2022-04-11 07:54:03,724-[lfw][84000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-11 07:54:03,724-[lfw][84000]Accuracy-Highest: 0.99850
Training: 2022-04-11 07:54:55,388-[cfp_fp][84000]XNorm: 21.961186
Training: 2022-04-11 07:54:55,389-[cfp_fp][84000]Accuracy-Flip: 0.98700+-0.00467
Training: 2022-04-11 07:54:55,390-[cfp_fp][84000]Accuracy-Highest: 0.98700
Training: 2022-04-11 07:55:39,864-[agedb_30][84000]XNorm: 22.608432
Training: 2022-04-11 07:55:39,865-[agedb_30][84000]Accuracy-Flip: 0.98317+-0.00701
Training: 2022-04-11 07:55:39,865-[agedb_30][84000]Accuracy-Highest: 0.98550
Training: 2022-04-11 07:55:42,874-Speed 71.31 samples/sec   Loss 1.9562   LearningRate 0.0029   Epoch: 16   Global Step: 84010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:55:45,847-Speed 3445.79 samples/sec   Loss 1.8300   LearningRate 0.0029   Epoch: 16   Global Step: 84020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:55:48,837-Speed 3424.86 samples/sec   Loss 1.8387   LearningRate 0.0029   Epoch: 16   Global Step: 84030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:55:51,865-Speed 3382.43 samples/sec   Loss 1.8359   LearningRate 0.0029   Epoch: 16   Global Step: 84040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:55:54,858-Speed 3422.91 samples/sec   Loss 1.9368   LearningRate 0.0029   Epoch: 16   Global Step: 84050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:55:57,852-Speed 3420.78 samples/sec   Loss 1.8980   LearningRate 0.0029   Epoch: 16   Global Step: 84060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:56:00,848-Speed 3419.59 samples/sec   Loss 1.8513   LearningRate 0.0029   Epoch: 16   Global Step: 84070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:56:03,846-Speed 3416.15 samples/sec   Loss 1.7621   LearningRate 0.0029   Epoch: 16   Global Step: 84080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:56:06,847-Speed 3412.65 samples/sec   Loss 1.8263   LearningRate 0.0028   Epoch: 16   Global Step: 84090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:56:09,846-Speed 3415.79 samples/sec   Loss 1.9068   LearningRate 0.0028   Epoch: 16   Global Step: 84100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:56:12,877-Speed 3379.82 samples/sec   Loss 1.9127   LearningRate 0.0028   Epoch: 16   Global Step: 84110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:56:15,922-Speed 3363.45 samples/sec   Loss 1.8204   LearningRate 0.0028   Epoch: 16   Global Step: 84120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:56:18,929-Speed 3405.68 samples/sec   Loss 1.7560   LearningRate 0.0028   Epoch: 16   Global Step: 84130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:56:21,925-Speed 3419.83 samples/sec   Loss 1.8469   LearningRate 0.0028   Epoch: 16   Global Step: 84140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:56:24,935-Speed 3402.85 samples/sec   Loss 1.8663   LearningRate 0.0028   Epoch: 16   Global Step: 84150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:56:27,931-Speed 3418.47 samples/sec   Loss 1.8293   LearningRate 0.0028   Epoch: 16   Global Step: 84160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:56:30,931-Speed 3413.97 samples/sec   Loss 1.8527   LearningRate 0.0028   Epoch: 16   Global Step: 84170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:56:33,945-Speed 3397.84 samples/sec   Loss 1.7530   LearningRate 0.0028   Epoch: 16   Global Step: 84180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:56:36,975-Speed 3381.05 samples/sec   Loss 1.8990   LearningRate 0.0028   Epoch: 16   Global Step: 84190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:56:39,985-Speed 3402.72 samples/sec   Loss 1.8801   LearningRate 0.0028   Epoch: 16   Global Step: 84200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:56:42,992-Speed 3405.85 samples/sec   Loss 1.8532   LearningRate 0.0028   Epoch: 16   Global Step: 84210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:56:45,985-Speed 3422.21 samples/sec   Loss 1.7954   LearningRate 0.0028   Epoch: 16   Global Step: 84220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:56:48,985-Speed 3414.78 samples/sec   Loss 1.8956   LearningRate 0.0028   Epoch: 16   Global Step: 84230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:56:51,968-Speed 3434.99 samples/sec   Loss 1.8938   LearningRate 0.0028   Epoch: 16   Global Step: 84240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:56:54,978-Speed 3402.89 samples/sec   Loss 1.7867   LearningRate 0.0028   Epoch: 16   Global Step: 84250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:56:58,022-Speed 3364.37 samples/sec   Loss 1.8663   LearningRate 0.0028   Epoch: 16   Global Step: 84260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:57:01,042-Speed 3392.00 samples/sec   Loss 1.8337   LearningRate 0.0028   Epoch: 16   Global Step: 84270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:57:04,067-Speed 3385.89 samples/sec   Loss 1.7923   LearningRate 0.0028   Epoch: 16   Global Step: 84280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:57:07,065-Speed 3417.03 samples/sec   Loss 1.8043   LearningRate 0.0028   Epoch: 16   Global Step: 84290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:57:10,069-Speed 3408.83 samples/sec   Loss 1.8107   LearningRate 0.0028   Epoch: 16   Global Step: 84300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:57:13,085-Speed 3396.37 samples/sec   Loss 1.8195   LearningRate 0.0028   Epoch: 16   Global Step: 84310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:57:16,085-Speed 3413.82 samples/sec   Loss 1.8392   LearningRate 0.0028   Epoch: 16   Global Step: 84320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:57:19,147-Speed 3345.56 samples/sec   Loss 1.9164   LearningRate 0.0028   Epoch: 16   Global Step: 84330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:57:22,150-Speed 3410.84 samples/sec   Loss 1.9111   LearningRate 0.0028   Epoch: 16   Global Step: 84340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:57:25,153-Speed 3411.55 samples/sec   Loss 1.8107   LearningRate 0.0028   Epoch: 16   Global Step: 84350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:57:28,177-Speed 3386.55 samples/sec   Loss 1.7837   LearningRate 0.0028   Epoch: 16   Global Step: 84360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:57:31,177-Speed 3414.22 samples/sec   Loss 1.8463   LearningRate 0.0028   Epoch: 16   Global Step: 84370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:57:34,177-Speed 3414.36 samples/sec   Loss 1.8686   LearningRate 0.0028   Epoch: 16   Global Step: 84380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:57:37,184-Speed 3406.64 samples/sec   Loss 1.8089   LearningRate 0.0027   Epoch: 16   Global Step: 84390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:57:40,190-Speed 3407.15 samples/sec   Loss 1.7618   LearningRate 0.0027   Epoch: 16   Global Step: 84400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:57:43,191-Speed 3412.60 samples/sec   Loss 1.8337   LearningRate 0.0027   Epoch: 16   Global Step: 84410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:57:46,191-Speed 3414.55 samples/sec   Loss 1.8730   LearningRate 0.0027   Epoch: 16   Global Step: 84420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:57:49,174-Speed 3433.52 samples/sec   Loss 1.8896   LearningRate 0.0027   Epoch: 16   Global Step: 84430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:57:52,240-Speed 3340.67 samples/sec   Loss 1.8221   LearningRate 0.0027   Epoch: 16   Global Step: 84440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:57:55,244-Speed 3410.10 samples/sec   Loss 1.8993   LearningRate 0.0027   Epoch: 16   Global Step: 84450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:57:58,246-Speed 3411.43 samples/sec   Loss 1.8734   LearningRate 0.0027   Epoch: 16   Global Step: 84460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:58:01,258-Speed 3400.43 samples/sec   Loss 1.8239   LearningRate 0.0027   Epoch: 16   Global Step: 84470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:58:04,259-Speed 3413.56 samples/sec   Loss 1.8692   LearningRate 0.0027   Epoch: 16   Global Step: 84480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:58:07,264-Speed 3408.76 samples/sec   Loss 1.7531   LearningRate 0.0027   Epoch: 16   Global Step: 84490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:58:10,295-Speed 3378.36 samples/sec   Loss 1.8293   LearningRate 0.0027   Epoch: 16   Global Step: 84500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:58:13,315-Speed 3392.35 samples/sec   Loss 1.9230   LearningRate 0.0027   Epoch: 16   Global Step: 84510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:58:16,331-Speed 3396.14 samples/sec   Loss 1.7646   LearningRate 0.0027   Epoch: 16   Global Step: 84520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 07:58:19,333-Speed 3411.88 samples/sec   Loss 1.8452   LearningRate 0.0027   Epoch: 16   Global Step: 84530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:58:22,335-Speed 3412.14 samples/sec   Loss 1.9242   LearningRate 0.0027   Epoch: 16   Global Step: 84540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:58:25,341-Speed 3407.13 samples/sec   Loss 1.7783   LearningRate 0.0027   Epoch: 16   Global Step: 84550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:58:28,347-Speed 3406.92 samples/sec   Loss 1.8913   LearningRate 0.0027   Epoch: 16   Global Step: 84560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:58:31,350-Speed 3411.24 samples/sec   Loss 1.8228   LearningRate 0.0027   Epoch: 16   Global Step: 84570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:58:34,355-Speed 3408.38 samples/sec   Loss 1.8089   LearningRate 0.0027   Epoch: 16   Global Step: 84580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:58:37,359-Speed 3409.62 samples/sec   Loss 1.8070   LearningRate 0.0027   Epoch: 16   Global Step: 84590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:58:40,375-Speed 3396.54 samples/sec   Loss 1.9137   LearningRate 0.0027   Epoch: 16   Global Step: 84600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:58:43,382-Speed 3405.41 samples/sec   Loss 1.7663   LearningRate 0.0027   Epoch: 16   Global Step: 84610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:58:46,390-Speed 3406.40 samples/sec   Loss 1.9247   LearningRate 0.0027   Epoch: 16   Global Step: 84620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:58:49,384-Speed 3420.44 samples/sec   Loss 1.8444   LearningRate 0.0027   Epoch: 16   Global Step: 84630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:58:52,390-Speed 3407.90 samples/sec   Loss 1.9531   LearningRate 0.0027   Epoch: 16   Global Step: 84640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:58:55,401-Speed 3402.08 samples/sec   Loss 1.8201   LearningRate 0.0027   Epoch: 16   Global Step: 84650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:58:58,409-Speed 3404.26 samples/sec   Loss 1.8310   LearningRate 0.0027   Epoch: 16   Global Step: 84660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:01,416-Speed 3406.31 samples/sec   Loss 1.8178   LearningRate 0.0027   Epoch: 16   Global Step: 84670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:04,526-Speed 3293.57 samples/sec   Loss 1.8038   LearningRate 0.0027   Epoch: 16   Global Step: 84680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:07,542-Speed 3396.26 samples/sec   Loss 1.8189   LearningRate 0.0027   Epoch: 16   Global Step: 84690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:10,559-Speed 3394.82 samples/sec   Loss 1.7945   LearningRate 0.0026   Epoch: 16   Global Step: 84700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:13,565-Speed 3407.58 samples/sec   Loss 1.7822   LearningRate 0.0026   Epoch: 16   Global Step: 84710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:16,571-Speed 3407.59 samples/sec   Loss 1.8096   LearningRate 0.0026   Epoch: 16   Global Step: 84720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:19,579-Speed 3404.59 samples/sec   Loss 1.8141   LearningRate 0.0026   Epoch: 16   Global Step: 84730   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 07:59:22,568-Speed 3427.48 samples/sec   Loss 1.7775   LearningRate 0.0026   Epoch: 16   Global Step: 84740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:25,585-Speed 3394.31 samples/sec   Loss 1.7637   LearningRate 0.0026   Epoch: 16   Global Step: 84750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:28,590-Speed 3409.21 samples/sec   Loss 1.7850   LearningRate 0.0026   Epoch: 16   Global Step: 84760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:31,597-Speed 3405.93 samples/sec   Loss 1.8710   LearningRate 0.0026   Epoch: 16   Global Step: 84770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:34,605-Speed 3405.49 samples/sec   Loss 1.8651   LearningRate 0.0026   Epoch: 16   Global Step: 84780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:37,609-Speed 3409.07 samples/sec   Loss 1.8176   LearningRate 0.0026   Epoch: 16   Global Step: 84790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:40,621-Speed 3401.11 samples/sec   Loss 1.7747   LearningRate 0.0026   Epoch: 16   Global Step: 84800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:43,638-Speed 3394.77 samples/sec   Loss 1.9011   LearningRate 0.0026   Epoch: 16   Global Step: 84810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:46,646-Speed 3405.23 samples/sec   Loss 1.8098   LearningRate 0.0026   Epoch: 16   Global Step: 84820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:49,655-Speed 3404.58 samples/sec   Loss 1.8826   LearningRate 0.0026   Epoch: 16   Global Step: 84830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:52,652-Speed 3417.41 samples/sec   Loss 1.7935   LearningRate 0.0026   Epoch: 16   Global Step: 84840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:55,666-Speed 3398.18 samples/sec   Loss 1.8022   LearningRate 0.0026   Epoch: 16   Global Step: 84850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 07:59:58,678-Speed 3399.97 samples/sec   Loss 1.8988   LearningRate 0.0026   Epoch: 16   Global Step: 84860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:00:01,697-Speed 3393.24 samples/sec   Loss 1.9091   LearningRate 0.0026   Epoch: 16   Global Step: 84870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:00:04,703-Speed 3406.81 samples/sec   Loss 1.8506   LearningRate 0.0026   Epoch: 16   Global Step: 84880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:00:07,719-Speed 3396.96 samples/sec   Loss 1.7569   LearningRate 0.0026   Epoch: 16   Global Step: 84890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:00:10,746-Speed 3383.47 samples/sec   Loss 1.8090   LearningRate 0.0026   Epoch: 16   Global Step: 84900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:00:13,765-Speed 3392.96 samples/sec   Loss 1.8228   LearningRate 0.0026   Epoch: 16   Global Step: 84910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:00:16,778-Speed 3400.25 samples/sec   Loss 1.9219   LearningRate 0.0026   Epoch: 16   Global Step: 84920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:00:19,807-Speed 3381.27 samples/sec   Loss 1.9767   LearningRate 0.0026   Epoch: 16   Global Step: 84930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:00:22,820-Speed 3399.67 samples/sec   Loss 1.7592   LearningRate 0.0026   Epoch: 16   Global Step: 84940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:00:25,826-Speed 3407.21 samples/sec   Loss 1.7936   LearningRate 0.0026   Epoch: 16   Global Step: 84950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:00:28,832-Speed 3407.59 samples/sec   Loss 1.8628   LearningRate 0.0026   Epoch: 16   Global Step: 84960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:00:31,843-Speed 3401.99 samples/sec   Loss 1.8073   LearningRate 0.0026   Epoch: 16   Global Step: 84970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:00:34,852-Speed 3403.40 samples/sec   Loss 1.9060   LearningRate 0.0026   Epoch: 16   Global Step: 84980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:00:37,881-Speed 3382.08 samples/sec   Loss 1.8312   LearningRate 0.0026   Epoch: 16   Global Step: 84990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:00:40,904-Speed 3387.65 samples/sec   Loss 1.8380   LearningRate 0.0026   Epoch: 16   Global Step: 85000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:00:43,916-Speed 3401.06 samples/sec   Loss 1.8099   LearningRate 0.0025   Epoch: 16   Global Step: 85010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:00:46,922-Speed 3407.50 samples/sec   Loss 1.8915   LearningRate 0.0025   Epoch: 16   Global Step: 85020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:00:49,937-Speed 3397.47 samples/sec   Loss 1.8006   LearningRate 0.0025   Epoch: 16   Global Step: 85030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:00:52,955-Speed 3393.91 samples/sec   Loss 1.7782   LearningRate 0.0025   Epoch: 16   Global Step: 85040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:00:55,947-Speed 3422.51 samples/sec   Loss 1.7561   LearningRate 0.0025   Epoch: 16   Global Step: 85050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:00:58,992-Speed 3364.48 samples/sec   Loss 1.7097   LearningRate 0.0025   Epoch: 16   Global Step: 85060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:01:02,039-Speed 3361.04 samples/sec   Loss 1.7620   LearningRate 0.0025   Epoch: 16   Global Step: 85070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:01:05,055-Speed 3396.55 samples/sec   Loss 1.8325   LearningRate 0.0025   Epoch: 16   Global Step: 85080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:01:08,075-Speed 3390.92 samples/sec   Loss 1.8693   LearningRate 0.0025   Epoch: 16   Global Step: 85090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:01:11,084-Speed 3404.13 samples/sec   Loss 1.8309   LearningRate 0.0025   Epoch: 16   Global Step: 85100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:01:14,102-Speed 3393.92 samples/sec   Loss 1.9143   LearningRate 0.0025   Epoch: 16   Global Step: 85110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:01:17,114-Speed 3401.18 samples/sec   Loss 1.7583   LearningRate 0.0025   Epoch: 16   Global Step: 85120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:01:20,124-Speed 3402.67 samples/sec   Loss 1.9557   LearningRate 0.0025   Epoch: 16   Global Step: 85130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:01:23,134-Speed 3403.00 samples/sec   Loss 1.7685   LearningRate 0.0025   Epoch: 16   Global Step: 85140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:01:26,153-Speed 3392.97 samples/sec   Loss 1.8198   LearningRate 0.0025   Epoch: 16   Global Step: 85150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:01:29,171-Speed 3393.19 samples/sec   Loss 1.9153   LearningRate 0.0025   Epoch: 16   Global Step: 85160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:01:32,262-Speed 3314.17 samples/sec   Loss 1.8861   LearningRate 0.0025   Epoch: 16   Global Step: 85170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:01:35,272-Speed 3403.00 samples/sec   Loss 1.8891   LearningRate 0.0025   Epoch: 16   Global Step: 85180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:01:38,306-Speed 3375.94 samples/sec   Loss 1.7173   LearningRate 0.0025   Epoch: 16   Global Step: 85190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:01:41,332-Speed 3384.90 samples/sec   Loss 1.8140   LearningRate 0.0025   Epoch: 16   Global Step: 85200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:01:44,348-Speed 3396.11 samples/sec   Loss 1.6706   LearningRate 0.0025   Epoch: 16   Global Step: 85210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:01:47,361-Speed 3399.82 samples/sec   Loss 1.8195   LearningRate 0.0025   Epoch: 16   Global Step: 85220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:01:50,377-Speed 3395.88 samples/sec   Loss 1.8898   LearningRate 0.0025   Epoch: 16   Global Step: 85230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:01:53,389-Speed 3400.35 samples/sec   Loss 1.7853   LearningRate 0.0025   Epoch: 16   Global Step: 85240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:01:56,378-Speed 3426.83 samples/sec   Loss 1.8237   LearningRate 0.0025   Epoch: 16   Global Step: 85250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:01:59,387-Speed 3403.78 samples/sec   Loss 1.8837   LearningRate 0.0025   Epoch: 16   Global Step: 85260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:02:02,428-Speed 3368.47 samples/sec   Loss 1.8539   LearningRate 0.0025   Epoch: 16   Global Step: 85270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:02:05,436-Speed 3405.13 samples/sec   Loss 1.7916   LearningRate 0.0025   Epoch: 16   Global Step: 85280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:02:08,444-Speed 3406.69 samples/sec   Loss 1.7877   LearningRate 0.0025   Epoch: 16   Global Step: 85290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:11,458-Speed 3397.76 samples/sec   Loss 1.8174   LearningRate 0.0025   Epoch: 16   Global Step: 85300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:14,483-Speed 3386.83 samples/sec   Loss 1.8287   LearningRate 0.0025   Epoch: 16   Global Step: 85310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:17,496-Speed 3399.73 samples/sec   Loss 1.7634   LearningRate 0.0025   Epoch: 16   Global Step: 85320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:20,527-Speed 3379.78 samples/sec   Loss 1.7581   LearningRate 0.0024   Epoch: 16   Global Step: 85330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:23,549-Speed 3389.27 samples/sec   Loss 1.7792   LearningRate 0.0024   Epoch: 16   Global Step: 85340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:26,568-Speed 3393.37 samples/sec   Loss 1.8527   LearningRate 0.0024   Epoch: 16   Global Step: 85350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:29,585-Speed 3394.76 samples/sec   Loss 1.7556   LearningRate 0.0024   Epoch: 16   Global Step: 85360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:32,604-Speed 3393.81 samples/sec   Loss 1.8478   LearningRate 0.0024   Epoch: 16   Global Step: 85370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:35,621-Speed 3394.24 samples/sec   Loss 1.8087   LearningRate 0.0024   Epoch: 16   Global Step: 85380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:38,635-Speed 3398.60 samples/sec   Loss 1.8817   LearningRate 0.0024   Epoch: 16   Global Step: 85390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:02:41,668-Speed 3377.63 samples/sec   Loss 1.8493   LearningRate 0.0024   Epoch: 16   Global Step: 85400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:02:44,660-Speed 3423.17 samples/sec   Loss 1.7755   LearningRate 0.0024   Epoch: 16   Global Step: 85410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:47,717-Speed 3350.45 samples/sec   Loss 1.8410   LearningRate 0.0024   Epoch: 16   Global Step: 85420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:50,741-Speed 3387.56 samples/sec   Loss 1.8436   LearningRate 0.0024   Epoch: 16   Global Step: 85430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:53,765-Speed 3386.29 samples/sec   Loss 1.9278   LearningRate 0.0024   Epoch: 16   Global Step: 85440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:56,797-Speed 3378.18 samples/sec   Loss 1.8696   LearningRate 0.0024   Epoch: 16   Global Step: 85450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:02:59,807-Speed 3402.81 samples/sec   Loss 1.8607   LearningRate 0.0024   Epoch: 16   Global Step: 85460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:03:02,819-Speed 3400.87 samples/sec   Loss 1.8462   LearningRate 0.0024   Epoch: 16   Global Step: 85470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:03:05,836-Speed 3396.14 samples/sec   Loss 1.9611   LearningRate 0.0024   Epoch: 16   Global Step: 85480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:03:08,846-Speed 3402.54 samples/sec   Loss 1.9216   LearningRate 0.0024   Epoch: 16   Global Step: 85490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:03:11,856-Speed 3403.68 samples/sec   Loss 1.8765   LearningRate 0.0024   Epoch: 16   Global Step: 85500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:03:14,879-Speed 3388.09 samples/sec   Loss 1.8258   LearningRate 0.0024   Epoch: 16   Global Step: 85510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:03:17,889-Speed 3402.44 samples/sec   Loss 1.8765   LearningRate 0.0024   Epoch: 16   Global Step: 85520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:03:20,914-Speed 3386.33 samples/sec   Loss 1.8197   LearningRate 0.0024   Epoch: 16   Global Step: 85530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:03:23,931-Speed 3394.90 samples/sec   Loss 1.8476   LearningRate 0.0024   Epoch: 16   Global Step: 85540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:03:26,945-Speed 3398.26 samples/sec   Loss 1.8278   LearningRate 0.0024   Epoch: 16   Global Step: 85550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:03:29,987-Speed 3367.65 samples/sec   Loss 1.8911   LearningRate 0.0024   Epoch: 16   Global Step: 85560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:03:33,015-Speed 3383.06 samples/sec   Loss 1.7732   LearningRate 0.0024   Epoch: 16   Global Step: 85570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:03:36,030-Speed 3396.27 samples/sec   Loss 1.8487   LearningRate 0.0024   Epoch: 16   Global Step: 85580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:03:39,046-Speed 3396.78 samples/sec   Loss 1.7677   LearningRate 0.0024   Epoch: 16   Global Step: 85590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:03:42,098-Speed 3355.78 samples/sec   Loss 1.7714   LearningRate 0.0024   Epoch: 16   Global Step: 85600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:03:45,080-Speed 3435.06 samples/sec   Loss 1.7634   LearningRate 0.0024   Epoch: 16   Global Step: 85610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:03:48,097-Speed 3394.73 samples/sec   Loss 1.8270   LearningRate 0.0024   Epoch: 16   Global Step: 85620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:03:51,133-Speed 3373.43 samples/sec   Loss 1.9146   LearningRate 0.0024   Epoch: 16   Global Step: 85630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:03:54,151-Speed 3394.01 samples/sec   Loss 1.7655   LearningRate 0.0024   Epoch: 16   Global Step: 85640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:03:57,162-Speed 3401.80 samples/sec   Loss 1.8461   LearningRate 0.0024   Epoch: 16   Global Step: 85650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:00,176-Speed 3398.26 samples/sec   Loss 1.8652   LearningRate 0.0023   Epoch: 16   Global Step: 85660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:03,187-Speed 3401.83 samples/sec   Loss 1.8099   LearningRate 0.0023   Epoch: 16   Global Step: 85670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:06,207-Speed 3391.84 samples/sec   Loss 1.6763   LearningRate 0.0023   Epoch: 16   Global Step: 85680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:09,227-Speed 3391.23 samples/sec   Loss 1.7919   LearningRate 0.0023   Epoch: 16   Global Step: 85690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:12,286-Speed 3349.35 samples/sec   Loss 1.6663   LearningRate 0.0023   Epoch: 16   Global Step: 85700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:15,310-Speed 3386.87 samples/sec   Loss 1.8255   LearningRate 0.0023   Epoch: 16   Global Step: 85710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:04:18,328-Speed 3393.94 samples/sec   Loss 1.8746   LearningRate 0.0023   Epoch: 16   Global Step: 85720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:04:21,349-Speed 3390.38 samples/sec   Loss 1.8211   LearningRate 0.0023   Epoch: 16   Global Step: 85730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:04:24,342-Speed 3422.26 samples/sec   Loss 1.7989   LearningRate 0.0023   Epoch: 16   Global Step: 85740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:27,352-Speed 3403.27 samples/sec   Loss 1.7329   LearningRate 0.0023   Epoch: 16   Global Step: 85750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:30,358-Speed 3406.74 samples/sec   Loss 1.8250   LearningRate 0.0023   Epoch: 16   Global Step: 85760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:33,371-Speed 3399.61 samples/sec   Loss 1.8176   LearningRate 0.0023   Epoch: 16   Global Step: 85770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:36,409-Speed 3372.00 samples/sec   Loss 1.7029   LearningRate 0.0023   Epoch: 16   Global Step: 85780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:39,454-Speed 3362.87 samples/sec   Loss 1.7959   LearningRate 0.0023   Epoch: 16   Global Step: 85790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:42,470-Speed 3396.66 samples/sec   Loss 1.8635   LearningRate 0.0023   Epoch: 16   Global Step: 85800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:45,489-Speed 3392.39 samples/sec   Loss 1.7841   LearningRate 0.0023   Epoch: 16   Global Step: 85810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:48,498-Speed 3404.43 samples/sec   Loss 1.8446   LearningRate 0.0023   Epoch: 16   Global Step: 85820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:51,508-Speed 3402.76 samples/sec   Loss 1.8712   LearningRate 0.0023   Epoch: 16   Global Step: 85830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:04:54,529-Speed 3391.25 samples/sec   Loss 1.9147   LearningRate 0.0023   Epoch: 16   Global Step: 85840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:04:57,545-Speed 3395.51 samples/sec   Loss 1.8327   LearningRate 0.0023   Epoch: 16   Global Step: 85850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:05:00,542-Speed 3417.63 samples/sec   Loss 1.8997   LearningRate 0.0023   Epoch: 16   Global Step: 85860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:05:03,560-Speed 3393.70 samples/sec   Loss 1.7949   LearningRate 0.0023   Epoch: 16   Global Step: 85870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:05:06,582-Speed 3389.97 samples/sec   Loss 1.7999   LearningRate 0.0023   Epoch: 16   Global Step: 85880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:05:09,624-Speed 3366.84 samples/sec   Loss 1.8072   LearningRate 0.0023   Epoch: 16   Global Step: 85890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:05:12,638-Speed 3398.87 samples/sec   Loss 1.8078   LearningRate 0.0023   Epoch: 16   Global Step: 85900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:05:15,669-Speed 3378.31 samples/sec   Loss 1.9557   LearningRate 0.0023   Epoch: 16   Global Step: 85910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:05:18,688-Speed 3393.93 samples/sec   Loss 1.8154   LearningRate 0.0023   Epoch: 16   Global Step: 85920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:05:21,704-Speed 3396.15 samples/sec   Loss 1.8146   LearningRate 0.0023   Epoch: 16   Global Step: 85930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:05:24,739-Speed 3374.82 samples/sec   Loss 1.7767   LearningRate 0.0023   Epoch: 16   Global Step: 85940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:05:27,760-Speed 3390.44 samples/sec   Loss 1.8342   LearningRate 0.0023   Epoch: 16   Global Step: 85950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:05:30,777-Speed 3393.98 samples/sec   Loss 1.8229   LearningRate 0.0023   Epoch: 16   Global Step: 85960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:05:33,791-Speed 3398.71 samples/sec   Loss 1.7763   LearningRate 0.0023   Epoch: 16   Global Step: 85970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:05:36,883-Speed 3312.55 samples/sec   Loss 1.9842   LearningRate 0.0023   Epoch: 16   Global Step: 85980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:05:50,056-Speed 777.46 samples/sec   Loss 1.6229   LearningRate 0.0022   Epoch: 17   Global Step: 85990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:05:53,147-Speed 3313.65 samples/sec   Loss 1.3550   LearningRate 0.0022   Epoch: 17   Global Step: 86000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:06:37,323-[lfw][86000]XNorm: 22.082740
Training: 2022-04-11 08:06:37,324-[lfw][86000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 08:06:37,324-[lfw][86000]Accuracy-Highest: 0.99850
Training: 2022-04-11 08:07:28,876-[cfp_fp][86000]XNorm: 21.868317
Training: 2022-04-11 08:07:28,877-[cfp_fp][86000]Accuracy-Flip: 0.98786+-0.00496
Training: 2022-04-11 08:07:28,878-[cfp_fp][86000]Accuracy-Highest: 0.98786
Training: 2022-04-11 08:08:13,287-[agedb_30][86000]XNorm: 22.396077
Training: 2022-04-11 08:08:13,288-[agedb_30][86000]Accuracy-Flip: 0.98450+-0.00753
Training: 2022-04-11 08:08:13,289-[agedb_30][86000]Accuracy-Highest: 0.98550
Training: 2022-04-11 08:08:16,371-Speed 71.50 samples/sec   Loss 1.3132   LearningRate 0.0022   Epoch: 17   Global Step: 86010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:08:19,376-Speed 3408.50 samples/sec   Loss 1.2478   LearningRate 0.0022   Epoch: 17   Global Step: 86020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:08:22,366-Speed 3426.75 samples/sec   Loss 1.2685   LearningRate 0.0022   Epoch: 17   Global Step: 86030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:08:25,389-Speed 3387.53 samples/sec   Loss 1.1868   LearningRate 0.0022   Epoch: 17   Global Step: 86040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:08:29,001-Speed 2835.86 samples/sec   Loss 1.3474   LearningRate 0.0022   Epoch: 17   Global Step: 86050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:08:32,011-Speed 3403.46 samples/sec   Loss 1.2943   LearningRate 0.0022   Epoch: 17   Global Step: 86060   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 08:08:35,526-Speed 2914.07 samples/sec   Loss 1.2886   LearningRate 0.0022   Epoch: 17   Global Step: 86070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:08:38,570-Speed 3364.11 samples/sec   Loss 1.3543   LearningRate 0.0022   Epoch: 17   Global Step: 86080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:08:41,580-Speed 3404.19 samples/sec   Loss 1.3051   LearningRate 0.0022   Epoch: 17   Global Step: 86090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:08:44,573-Speed 3422.83 samples/sec   Loss 1.2681   LearningRate 0.0022   Epoch: 17   Global Step: 86100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:08:47,572-Speed 3415.72 samples/sec   Loss 1.2277   LearningRate 0.0022   Epoch: 17   Global Step: 86110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:08:50,588-Speed 3397.04 samples/sec   Loss 1.2770   LearningRate 0.0022   Epoch: 17   Global Step: 86120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:08:53,593-Speed 3408.29 samples/sec   Loss 1.3078   LearningRate 0.0022   Epoch: 17   Global Step: 86130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:08:56,595-Speed 3411.92 samples/sec   Loss 1.3292   LearningRate 0.0022   Epoch: 17   Global Step: 86140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:08:59,614-Speed 3393.30 samples/sec   Loss 1.2963   LearningRate 0.0022   Epoch: 17   Global Step: 86150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:09:02,661-Speed 3362.30 samples/sec   Loss 1.2971   LearningRate 0.0022   Epoch: 17   Global Step: 86160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:09:05,673-Speed 3401.93 samples/sec   Loss 1.2171   LearningRate 0.0022   Epoch: 17   Global Step: 86170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:09:08,723-Speed 3358.48 samples/sec   Loss 1.2775   LearningRate 0.0022   Epoch: 17   Global Step: 86180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:09:11,729-Speed 3407.95 samples/sec   Loss 1.3205   LearningRate 0.0022   Epoch: 17   Global Step: 86190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:09:14,736-Speed 3406.38 samples/sec   Loss 1.3024   LearningRate 0.0022   Epoch: 17   Global Step: 86200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:09:17,745-Speed 3404.09 samples/sec   Loss 1.2389   LearningRate 0.0022   Epoch: 17   Global Step: 86210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:09:20,757-Speed 3401.70 samples/sec   Loss 1.2635   LearningRate 0.0022   Epoch: 17   Global Step: 86220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:09:23,771-Speed 3398.31 samples/sec   Loss 1.3030   LearningRate 0.0022   Epoch: 17   Global Step: 86230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:09:26,808-Speed 3372.51 samples/sec   Loss 1.3868   LearningRate 0.0022   Epoch: 17   Global Step: 86240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:09:29,834-Speed 3384.73 samples/sec   Loss 1.4039   LearningRate 0.0022   Epoch: 17   Global Step: 86250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:09:32,845-Speed 3402.47 samples/sec   Loss 1.3311   LearningRate 0.0022   Epoch: 17   Global Step: 86260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:09:35,848-Speed 3410.97 samples/sec   Loss 1.2779   LearningRate 0.0022   Epoch: 17   Global Step: 86270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:09:38,894-Speed 3361.67 samples/sec   Loss 1.2922   LearningRate 0.0022   Epoch: 17   Global Step: 86280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:09:41,964-Speed 3336.71 samples/sec   Loss 1.3333   LearningRate 0.0022   Epoch: 17   Global Step: 86290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:09:44,967-Speed 3411.70 samples/sec   Loss 1.3580   LearningRate 0.0022   Epoch: 17   Global Step: 86300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:09:47,985-Speed 3393.99 samples/sec   Loss 1.3322   LearningRate 0.0022   Epoch: 17   Global Step: 86310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:09:51,037-Speed 3356.64 samples/sec   Loss 1.3342   LearningRate 0.0022   Epoch: 17   Global Step: 86320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:09:54,042-Speed 3408.71 samples/sec   Loss 1.3074   LearningRate 0.0021   Epoch: 17   Global Step: 86330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:09:57,053-Speed 3401.51 samples/sec   Loss 1.2837   LearningRate 0.0021   Epoch: 17   Global Step: 86340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:10:00,056-Speed 3411.53 samples/sec   Loss 1.2803   LearningRate 0.0021   Epoch: 17   Global Step: 86350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:10:03,058-Speed 3411.66 samples/sec   Loss 1.3291   LearningRate 0.0021   Epoch: 17   Global Step: 86360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:10:06,065-Speed 3406.74 samples/sec   Loss 1.2802   LearningRate 0.0021   Epoch: 17   Global Step: 86370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:10:09,093-Speed 3382.76 samples/sec   Loss 1.2803   LearningRate 0.0021   Epoch: 17   Global Step: 86380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:10:12,077-Speed 3432.08 samples/sec   Loss 1.3213   LearningRate 0.0021   Epoch: 17   Global Step: 86390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:10:15,097-Speed 3391.66 samples/sec   Loss 1.3111   LearningRate 0.0021   Epoch: 17   Global Step: 86400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:10:18,116-Speed 3393.72 samples/sec   Loss 1.2683   LearningRate 0.0021   Epoch: 17   Global Step: 86410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:10:21,127-Speed 3402.34 samples/sec   Loss 1.3173   LearningRate 0.0021   Epoch: 17   Global Step: 86420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:10:24,137-Speed 3402.65 samples/sec   Loss 1.3268   LearningRate 0.0021   Epoch: 17   Global Step: 86430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:10:27,169-Speed 3377.81 samples/sec   Loss 1.2506   LearningRate 0.0021   Epoch: 17   Global Step: 86440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:10:30,550-Speed 3030.17 samples/sec   Loss 1.3263   LearningRate 0.0021   Epoch: 17   Global Step: 86450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:10:33,553-Speed 3410.51 samples/sec   Loss 1.4059   LearningRate 0.0021   Epoch: 17   Global Step: 86460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:10:36,559-Speed 3407.85 samples/sec   Loss 1.2933   LearningRate 0.0021   Epoch: 17   Global Step: 86470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:10:39,561-Speed 3411.98 samples/sec   Loss 1.3506   LearningRate 0.0021   Epoch: 17   Global Step: 86480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:10:42,568-Speed 3405.60 samples/sec   Loss 1.3429   LearningRate 0.0021   Epoch: 17   Global Step: 86490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:10:45,588-Speed 3392.95 samples/sec   Loss 1.3898   LearningRate 0.0021   Epoch: 17   Global Step: 86500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:10:48,589-Speed 3412.74 samples/sec   Loss 1.3099   LearningRate 0.0021   Epoch: 17   Global Step: 86510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:10:51,600-Speed 3400.81 samples/sec   Loss 1.4323   LearningRate 0.0021   Epoch: 17   Global Step: 86520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:10:54,691-Speed 3314.42 samples/sec   Loss 1.3981   LearningRate 0.0021   Epoch: 17   Global Step: 86530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 08:10:57,679-Speed 3428.19 samples/sec   Loss 1.3257   LearningRate 0.0021   Epoch: 17   Global Step: 86540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:11:00,755-Speed 3330.40 samples/sec   Loss 1.3525   LearningRate 0.0021   Epoch: 17   Global Step: 86550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:11:03,784-Speed 3381.57 samples/sec   Loss 1.3742   LearningRate 0.0021   Epoch: 17   Global Step: 86560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:11:06,846-Speed 3344.76 samples/sec   Loss 1.3812   LearningRate 0.0021   Epoch: 17   Global Step: 86570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:11:09,857-Speed 3402.52 samples/sec   Loss 1.4278   LearningRate 0.0021   Epoch: 17   Global Step: 86580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:11:12,864-Speed 3406.40 samples/sec   Loss 1.2716   LearningRate 0.0021   Epoch: 17   Global Step: 86590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:11:15,882-Speed 3393.26 samples/sec   Loss 1.3500   LearningRate 0.0021   Epoch: 17   Global Step: 86600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:11:18,911-Speed 3381.95 samples/sec   Loss 1.3516   LearningRate 0.0021   Epoch: 17   Global Step: 86610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:11:21,921-Speed 3402.16 samples/sec   Loss 1.3884   LearningRate 0.0021   Epoch: 17   Global Step: 86620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:11:24,947-Speed 3385.63 samples/sec   Loss 1.3564   LearningRate 0.0021   Epoch: 17   Global Step: 86630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 08:11:27,956-Speed 3403.58 samples/sec   Loss 1.2876   LearningRate 0.0021   Epoch: 17   Global Step: 86640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:11:30,984-Speed 3383.00 samples/sec   Loss 1.2810   LearningRate 0.0021   Epoch: 17   Global Step: 86650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:11:33,997-Speed 3400.15 samples/sec   Loss 1.3444   LearningRate 0.0021   Epoch: 17   Global Step: 86660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:11:37,014-Speed 3395.15 samples/sec   Loss 1.3981   LearningRate 0.0021   Epoch: 17   Global Step: 86670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:11:40,027-Speed 3398.81 samples/sec   Loss 1.4401   LearningRate 0.0020   Epoch: 17   Global Step: 86680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:11:43,012-Speed 3432.14 samples/sec   Loss 1.3868   LearningRate 0.0020   Epoch: 17   Global Step: 86690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:11:46,051-Speed 3370.26 samples/sec   Loss 1.3595   LearningRate 0.0020   Epoch: 17   Global Step: 86700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:11:49,057-Speed 3407.29 samples/sec   Loss 1.3439   LearningRate 0.0020   Epoch: 17   Global Step: 86710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:11:52,072-Speed 3397.48 samples/sec   Loss 1.4099   LearningRate 0.0020   Epoch: 17   Global Step: 86720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:11:55,078-Speed 3407.86 samples/sec   Loss 1.4140   LearningRate 0.0020   Epoch: 17   Global Step: 86730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:11:58,094-Speed 3396.01 samples/sec   Loss 1.3309   LearningRate 0.0020   Epoch: 17   Global Step: 86740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:12:01,099-Speed 3408.50 samples/sec   Loss 1.3015   LearningRate 0.0020   Epoch: 17   Global Step: 86750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:12:04,113-Speed 3397.59 samples/sec   Loss 1.4280   LearningRate 0.0020   Epoch: 17   Global Step: 86760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:12:07,119-Speed 3408.14 samples/sec   Loss 1.3904   LearningRate 0.0020   Epoch: 17   Global Step: 86770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:12:10,124-Speed 3409.13 samples/sec   Loss 1.3533   LearningRate 0.0020   Epoch: 17   Global Step: 86780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:12:13,140-Speed 3396.30 samples/sec   Loss 1.3416   LearningRate 0.0020   Epoch: 17   Global Step: 86790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:12:16,187-Speed 3361.00 samples/sec   Loss 1.3466   LearningRate 0.0020   Epoch: 17   Global Step: 86800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:12:19,241-Speed 3353.70 samples/sec   Loss 1.3433   LearningRate 0.0020   Epoch: 17   Global Step: 86810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:12:22,251-Speed 3403.05 samples/sec   Loss 1.3354   LearningRate 0.0020   Epoch: 17   Global Step: 86820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:12:25,258-Speed 3406.73 samples/sec   Loss 1.4095   LearningRate 0.0020   Epoch: 17   Global Step: 86830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:12:28,274-Speed 3395.40 samples/sec   Loss 1.3564   LearningRate 0.0020   Epoch: 17   Global Step: 86840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:12:31,292-Speed 3394.71 samples/sec   Loss 1.4004   LearningRate 0.0020   Epoch: 17   Global Step: 86850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:12:34,279-Speed 3429.39 samples/sec   Loss 1.3173   LearningRate 0.0020   Epoch: 17   Global Step: 86860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:12:37,294-Speed 3396.14 samples/sec   Loss 1.3391   LearningRate 0.0020   Epoch: 17   Global Step: 86870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:12:40,306-Speed 3400.93 samples/sec   Loss 1.4048   LearningRate 0.0020   Epoch: 17   Global Step: 86880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:12:43,312-Speed 3407.75 samples/sec   Loss 1.3410   LearningRate 0.0020   Epoch: 17   Global Step: 86890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:12:46,319-Speed 3407.04 samples/sec   Loss 1.2538   LearningRate 0.0020   Epoch: 17   Global Step: 86900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:12:49,327-Speed 3404.24 samples/sec   Loss 1.2821   LearningRate 0.0020   Epoch: 17   Global Step: 86910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:12:52,344-Speed 3395.87 samples/sec   Loss 1.3086   LearningRate 0.0020   Epoch: 17   Global Step: 86920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:12:55,353-Speed 3403.80 samples/sec   Loss 1.3632   LearningRate 0.0020   Epoch: 17   Global Step: 86930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:12:58,373-Speed 3391.52 samples/sec   Loss 1.2974   LearningRate 0.0020   Epoch: 17   Global Step: 86940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:13:01,435-Speed 3344.94 samples/sec   Loss 1.3711   LearningRate 0.0020   Epoch: 17   Global Step: 86950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:13:04,480-Speed 3363.83 samples/sec   Loss 1.3655   LearningRate 0.0020   Epoch: 17   Global Step: 86960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:07,495-Speed 3397.20 samples/sec   Loss 1.3561   LearningRate 0.0020   Epoch: 17   Global Step: 86970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:10,497-Speed 3412.24 samples/sec   Loss 1.4256   LearningRate 0.0020   Epoch: 17   Global Step: 86980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:13,510-Speed 3398.76 samples/sec   Loss 1.3559   LearningRate 0.0020   Epoch: 17   Global Step: 86990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:16,522-Speed 3401.63 samples/sec   Loss 1.2922   LearningRate 0.0020   Epoch: 17   Global Step: 87000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:19,563-Speed 3368.65 samples/sec   Loss 1.3925   LearningRate 0.0020   Epoch: 17   Global Step: 87010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:22,570-Speed 3405.92 samples/sec   Loss 1.3844   LearningRate 0.0020   Epoch: 17   Global Step: 87020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:25,623-Speed 3355.13 samples/sec   Loss 1.3194   LearningRate 0.0020   Epoch: 17   Global Step: 87030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:28,695-Speed 3334.07 samples/sec   Loss 1.3376   LearningRate 0.0019   Epoch: 17   Global Step: 87040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:31,701-Speed 3407.96 samples/sec   Loss 1.4134   LearningRate 0.0019   Epoch: 17   Global Step: 87050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:34,686-Speed 3430.90 samples/sec   Loss 1.3840   LearningRate 0.0019   Epoch: 17   Global Step: 87060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:37,707-Speed 3391.25 samples/sec   Loss 1.3851   LearningRate 0.0019   Epoch: 17   Global Step: 87070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:40,714-Speed 3406.37 samples/sec   Loss 1.3382   LearningRate 0.0019   Epoch: 17   Global Step: 87080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:43,721-Speed 3405.84 samples/sec   Loss 1.3619   LearningRate 0.0019   Epoch: 17   Global Step: 87090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:46,735-Speed 3398.56 samples/sec   Loss 1.3028   LearningRate 0.0019   Epoch: 17   Global Step: 87100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:49,747-Speed 3401.17 samples/sec   Loss 1.2934   LearningRate 0.0019   Epoch: 17   Global Step: 87110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:52,816-Speed 3336.83 samples/sec   Loss 1.3699   LearningRate 0.0019   Epoch: 17   Global Step: 87120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:55,825-Speed 3404.29 samples/sec   Loss 1.4354   LearningRate 0.0019   Epoch: 17   Global Step: 87130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:13:58,834-Speed 3403.31 samples/sec   Loss 1.3166   LearningRate 0.0019   Epoch: 17   Global Step: 87140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:14:01,842-Speed 3405.51 samples/sec   Loss 1.3472   LearningRate 0.0019   Epoch: 17   Global Step: 87150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:14:04,846-Speed 3409.97 samples/sec   Loss 1.3641   LearningRate 0.0019   Epoch: 17   Global Step: 87160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:14:07,855-Speed 3404.52 samples/sec   Loss 1.3667   LearningRate 0.0019   Epoch: 17   Global Step: 87170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:14:10,868-Speed 3399.71 samples/sec   Loss 1.3997   LearningRate 0.0019   Epoch: 17   Global Step: 87180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:14:13,878-Speed 3402.66 samples/sec   Loss 1.2850   LearningRate 0.0019   Epoch: 17   Global Step: 87190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:14:16,887-Speed 3403.46 samples/sec   Loss 1.3666   LearningRate 0.0019   Epoch: 17   Global Step: 87200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:14:19,958-Speed 3336.85 samples/sec   Loss 1.3467   LearningRate 0.0019   Epoch: 17   Global Step: 87210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:14:22,951-Speed 3421.76 samples/sec   Loss 1.3833   LearningRate 0.0019   Epoch: 17   Global Step: 87220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:14:25,959-Speed 3405.51 samples/sec   Loss 1.3650   LearningRate 0.0019   Epoch: 17   Global Step: 87230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:14:28,972-Speed 3399.66 samples/sec   Loss 1.3454   LearningRate 0.0019   Epoch: 17   Global Step: 87240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:14:31,983-Speed 3401.38 samples/sec   Loss 1.2597   LearningRate 0.0019   Epoch: 17   Global Step: 87250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:14:34,990-Speed 3407.12 samples/sec   Loss 1.4138   LearningRate 0.0019   Epoch: 17   Global Step: 87260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:14:38,002-Speed 3400.62 samples/sec   Loss 1.3497   LearningRate 0.0019   Epoch: 17   Global Step: 87270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:14:41,038-Speed 3374.35 samples/sec   Loss 1.3620   LearningRate 0.0019   Epoch: 17   Global Step: 87280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:14:44,065-Speed 3383.71 samples/sec   Loss 1.3031   LearningRate 0.0019   Epoch: 17   Global Step: 87290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:14:47,088-Speed 3387.77 samples/sec   Loss 1.5053   LearningRate 0.0019   Epoch: 17   Global Step: 87300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:14:50,107-Speed 3393.87 samples/sec   Loss 1.3084   LearningRate 0.0019   Epoch: 17   Global Step: 87310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:14:53,123-Speed 3396.14 samples/sec   Loss 1.3582   LearningRate 0.0019   Epoch: 17   Global Step: 87320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:14:56,117-Speed 3421.21 samples/sec   Loss 1.3717   LearningRate 0.0019   Epoch: 17   Global Step: 87330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:14:59,128-Speed 3402.14 samples/sec   Loss 1.3920   LearningRate 0.0019   Epoch: 17   Global Step: 87340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:15:02,184-Speed 3351.69 samples/sec   Loss 1.3688   LearningRate 0.0019   Epoch: 17   Global Step: 87350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:15:05,249-Speed 3341.65 samples/sec   Loss 1.2883   LearningRate 0.0019   Epoch: 17   Global Step: 87360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:15:08,329-Speed 3325.87 samples/sec   Loss 1.3514   LearningRate 0.0019   Epoch: 17   Global Step: 87370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:15:11,340-Speed 3402.39 samples/sec   Loss 1.3654   LearningRate 0.0019   Epoch: 17   Global Step: 87380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:15:14,355-Speed 3396.64 samples/sec   Loss 1.3986   LearningRate 0.0019   Epoch: 17   Global Step: 87390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:15:17,384-Speed 3381.42 samples/sec   Loss 1.2709   LearningRate 0.0019   Epoch: 17   Global Step: 87400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:15:20,391-Speed 3406.27 samples/sec   Loss 1.3616   LearningRate 0.0018   Epoch: 17   Global Step: 87410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:15:23,410-Speed 3392.65 samples/sec   Loss 1.4479   LearningRate 0.0018   Epoch: 17   Global Step: 87420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:15:26,454-Speed 3365.67 samples/sec   Loss 1.4042   LearningRate 0.0018   Epoch: 17   Global Step: 87430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:15:29,520-Speed 3340.78 samples/sec   Loss 1.3478   LearningRate 0.0018   Epoch: 17   Global Step: 87440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:15:32,530-Speed 3402.75 samples/sec   Loss 1.3027   LearningRate 0.0018   Epoch: 17   Global Step: 87450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:15:35,538-Speed 3405.29 samples/sec   Loss 1.4647   LearningRate 0.0018   Epoch: 17   Global Step: 87460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:15:38,559-Speed 3390.47 samples/sec   Loss 1.3950   LearningRate 0.0018   Epoch: 17   Global Step: 87470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:15:41,569-Speed 3403.04 samples/sec   Loss 1.4899   LearningRate 0.0018   Epoch: 17   Global Step: 87480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:15:44,578-Speed 3404.37 samples/sec   Loss 1.4140   LearningRate 0.0018   Epoch: 17   Global Step: 87490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:15:47,604-Speed 3384.93 samples/sec   Loss 1.3026   LearningRate 0.0018   Epoch: 17   Global Step: 87500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:15:50,640-Speed 3374.30 samples/sec   Loss 1.3016   LearningRate 0.0018   Epoch: 17   Global Step: 87510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:15:53,646-Speed 3406.68 samples/sec   Loss 1.3404   LearningRate 0.0018   Epoch: 17   Global Step: 87520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:15:56,647-Speed 3413.30 samples/sec   Loss 1.4132   LearningRate 0.0018   Epoch: 17   Global Step: 87530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:15:59,656-Speed 3404.77 samples/sec   Loss 1.4400   LearningRate 0.0018   Epoch: 17   Global Step: 87540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:16:02,691-Speed 3373.65 samples/sec   Loss 1.4374   LearningRate 0.0018   Epoch: 17   Global Step: 87550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:16:05,709-Speed 3394.55 samples/sec   Loss 1.3762   LearningRate 0.0018   Epoch: 17   Global Step: 87560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:16:08,760-Speed 3357.47 samples/sec   Loss 1.3656   LearningRate 0.0018   Epoch: 17   Global Step: 87570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:16:11,774-Speed 3398.80 samples/sec   Loss 1.3729   LearningRate 0.0018   Epoch: 17   Global Step: 87580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:16:14,782-Speed 3405.61 samples/sec   Loss 1.5050   LearningRate 0.0018   Epoch: 17   Global Step: 87590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:16:17,827-Speed 3362.74 samples/sec   Loss 1.3578   LearningRate 0.0018   Epoch: 17   Global Step: 87600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:16:20,854-Speed 3384.80 samples/sec   Loss 1.3784   LearningRate 0.0018   Epoch: 17   Global Step: 87610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:16:23,866-Speed 3399.36 samples/sec   Loss 1.4052   LearningRate 0.0018   Epoch: 17   Global Step: 87620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:16:26,890-Speed 3387.38 samples/sec   Loss 1.3835   LearningRate 0.0018   Epoch: 17   Global Step: 87630   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 08:16:29,869-Speed 3439.47 samples/sec   Loss 1.4702   LearningRate 0.0018   Epoch: 17   Global Step: 87640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:16:32,883-Speed 3398.43 samples/sec   Loss 1.4091   LearningRate 0.0018   Epoch: 17   Global Step: 87650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:16:35,901-Speed 3393.25 samples/sec   Loss 1.3753   LearningRate 0.0018   Epoch: 17   Global Step: 87660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:16:38,916-Speed 3398.33 samples/sec   Loss 1.4080   LearningRate 0.0018   Epoch: 17   Global Step: 87670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:16:41,933-Speed 3395.90 samples/sec   Loss 1.3830   LearningRate 0.0018   Epoch: 17   Global Step: 87680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:16:44,948-Speed 3397.07 samples/sec   Loss 1.4769   LearningRate 0.0018   Epoch: 17   Global Step: 87690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:16:47,956-Speed 3404.18 samples/sec   Loss 1.3453   LearningRate 0.0018   Epoch: 17   Global Step: 87700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:16:50,967-Speed 3402.54 samples/sec   Loss 1.4178   LearningRate 0.0018   Epoch: 17   Global Step: 87710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:16:53,983-Speed 3395.53 samples/sec   Loss 1.4203   LearningRate 0.0018   Epoch: 17   Global Step: 87720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:16:56,999-Speed 3397.17 samples/sec   Loss 1.5259   LearningRate 0.0018   Epoch: 17   Global Step: 87730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:17:00,025-Speed 3383.81 samples/sec   Loss 1.3396   LearningRate 0.0018   Epoch: 17   Global Step: 87740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:17:03,077-Speed 3356.56 samples/sec   Loss 1.4785   LearningRate 0.0018   Epoch: 17   Global Step: 87750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:17:06,104-Speed 3384.43 samples/sec   Loss 1.3809   LearningRate 0.0018   Epoch: 17   Global Step: 87760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:17:09,122-Speed 3392.63 samples/sec   Loss 1.3002   LearningRate 0.0018   Epoch: 17   Global Step: 87770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:17:12,137-Speed 3398.39 samples/sec   Loss 1.3537   LearningRate 0.0017   Epoch: 17   Global Step: 87780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:17:15,174-Speed 3372.68 samples/sec   Loss 1.3814   LearningRate 0.0017   Epoch: 17   Global Step: 87790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:17:18,198-Speed 3386.51 samples/sec   Loss 1.3454   LearningRate 0.0017   Epoch: 17   Global Step: 87800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:17:21,209-Speed 3402.30 samples/sec   Loss 1.3597   LearningRate 0.0017   Epoch: 17   Global Step: 87810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:17:24,234-Speed 3386.27 samples/sec   Loss 1.3721   LearningRate 0.0017   Epoch: 17   Global Step: 87820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:17:27,245-Speed 3400.93 samples/sec   Loss 1.3416   LearningRate 0.0017   Epoch: 17   Global Step: 87830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:17:30,245-Speed 3414.56 samples/sec   Loss 1.4577   LearningRate 0.0017   Epoch: 17   Global Step: 87840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:17:33,255-Speed 3402.69 samples/sec   Loss 1.3967   LearningRate 0.0017   Epoch: 17   Global Step: 87850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:17:36,270-Speed 3397.53 samples/sec   Loss 1.4059   LearningRate 0.0017   Epoch: 17   Global Step: 87860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:17:39,294-Speed 3386.70 samples/sec   Loss 1.3956   LearningRate 0.0017   Epoch: 17   Global Step: 87870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:17:42,307-Speed 3400.43 samples/sec   Loss 1.4306   LearningRate 0.0017   Epoch: 17   Global Step: 87880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:17:45,313-Speed 3408.33 samples/sec   Loss 1.4391   LearningRate 0.0017   Epoch: 17   Global Step: 87890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:17:48,326-Speed 3399.00 samples/sec   Loss 1.3990   LearningRate 0.0017   Epoch: 17   Global Step: 87900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:17:51,416-Speed 3316.03 samples/sec   Loss 1.3748   LearningRate 0.0017   Epoch: 17   Global Step: 87910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:17:54,484-Speed 3338.01 samples/sec   Loss 1.3190   LearningRate 0.0017   Epoch: 17   Global Step: 87920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:17:57,503-Speed 3392.41 samples/sec   Loss 1.4038   LearningRate 0.0017   Epoch: 17   Global Step: 87930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:18:00,515-Speed 3401.45 samples/sec   Loss 1.3829   LearningRate 0.0017   Epoch: 17   Global Step: 87940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:18:03,550-Speed 3373.87 samples/sec   Loss 1.3525   LearningRate 0.0017   Epoch: 17   Global Step: 87950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:18:06,563-Speed 3400.60 samples/sec   Loss 1.3466   LearningRate 0.0017   Epoch: 17   Global Step: 87960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:18:09,585-Speed 3388.56 samples/sec   Loss 1.3550   LearningRate 0.0017   Epoch: 17   Global Step: 87970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:18:12,616-Speed 3379.00 samples/sec   Loss 1.4046   LearningRate 0.0017   Epoch: 17   Global Step: 87980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:18:15,632-Speed 3396.36 samples/sec   Loss 1.3661   LearningRate 0.0017   Epoch: 17   Global Step: 87990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:18:18,686-Speed 3353.85 samples/sec   Loss 1.3331   LearningRate 0.0017   Epoch: 17   Global Step: 88000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:19:03,003-[lfw][88000]XNorm: 22.146688
Training: 2022-04-11 08:19:03,003-[lfw][88000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 08:19:03,004-[lfw][88000]Accuracy-Highest: 0.99850
Training: 2022-04-11 08:19:54,257-[cfp_fp][88000]XNorm: 22.118556
Training: 2022-04-11 08:19:54,258-[cfp_fp][88000]Accuracy-Flip: 0.98857+-0.00461
Training: 2022-04-11 08:19:54,258-[cfp_fp][88000]Accuracy-Highest: 0.98857
Training: 2022-04-11 08:20:38,400-[agedb_30][88000]XNorm: 22.611050
Training: 2022-04-11 08:20:38,401-[agedb_30][88000]Accuracy-Flip: 0.98233+-0.00793
Training: 2022-04-11 08:20:38,401-[agedb_30][88000]Accuracy-Highest: 0.98550
Training: 2022-04-11 08:20:41,398-Speed 71.75 samples/sec   Loss 1.3531   LearningRate 0.0017   Epoch: 17   Global Step: 88010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:20:44,391-Speed 3421.56 samples/sec   Loss 1.3390   LearningRate 0.0017   Epoch: 17   Global Step: 88020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:20:47,395-Speed 3409.91 samples/sec   Loss 1.3188   LearningRate 0.0017   Epoch: 17   Global Step: 88030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:20:50,391-Speed 3418.85 samples/sec   Loss 1.4532   LearningRate 0.0017   Epoch: 17   Global Step: 88040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:20:53,401-Speed 3403.55 samples/sec   Loss 1.3794   LearningRate 0.0017   Epoch: 17   Global Step: 88050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:20:56,402-Speed 3413.99 samples/sec   Loss 1.3794   LearningRate 0.0017   Epoch: 17   Global Step: 88060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:20:59,401-Speed 3414.23 samples/sec   Loss 1.3948   LearningRate 0.0017   Epoch: 17   Global Step: 88070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:21:02,384-Speed 3433.63 samples/sec   Loss 1.4084   LearningRate 0.0017   Epoch: 17   Global Step: 88080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:21:05,392-Speed 3405.55 samples/sec   Loss 1.3908   LearningRate 0.0017   Epoch: 17   Global Step: 88090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:21:08,377-Speed 3431.05 samples/sec   Loss 1.3855   LearningRate 0.0017   Epoch: 17   Global Step: 88100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:21:11,372-Speed 3421.12 samples/sec   Loss 1.3058   LearningRate 0.0017   Epoch: 17   Global Step: 88110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:21:14,411-Speed 3369.75 samples/sec   Loss 1.3174   LearningRate 0.0017   Epoch: 17   Global Step: 88120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:21:17,413-Speed 3412.01 samples/sec   Loss 1.3915   LearningRate 0.0017   Epoch: 17   Global Step: 88130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:21:20,412-Speed 3415.35 samples/sec   Loss 1.3892   LearningRate 0.0017   Epoch: 17   Global Step: 88140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:21:23,441-Speed 3380.99 samples/sec   Loss 1.3359   LearningRate 0.0017   Epoch: 17   Global Step: 88150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:21:26,452-Speed 3402.74 samples/sec   Loss 1.2779   LearningRate 0.0017   Epoch: 17   Global Step: 88160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:21:29,468-Speed 3395.40 samples/sec   Loss 1.3653   LearningRate 0.0016   Epoch: 17   Global Step: 88170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:21:32,508-Speed 3369.95 samples/sec   Loss 1.3987   LearningRate 0.0016   Epoch: 17   Global Step: 88180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:21:35,525-Speed 3407.06 samples/sec   Loss 1.4186   LearningRate 0.0016   Epoch: 17   Global Step: 88190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:21:38,528-Speed 3410.89 samples/sec   Loss 1.3236   LearningRate 0.0016   Epoch: 17   Global Step: 88200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:21:41,635-Speed 3296.42 samples/sec   Loss 1.3053   LearningRate 0.0016   Epoch: 17   Global Step: 88210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:21:44,640-Speed 3408.07 samples/sec   Loss 1.4308   LearningRate 0.0016   Epoch: 17   Global Step: 88220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:21:47,668-Speed 3383.07 samples/sec   Loss 1.2678   LearningRate 0.0016   Epoch: 17   Global Step: 88230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:21:50,686-Speed 3393.61 samples/sec   Loss 1.4392   LearningRate 0.0016   Epoch: 17   Global Step: 88240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:21:53,702-Speed 3395.80 samples/sec   Loss 1.3513   LearningRate 0.0016   Epoch: 17   Global Step: 88250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:21:56,713-Speed 3402.18 samples/sec   Loss 1.5255   LearningRate 0.0016   Epoch: 17   Global Step: 88260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:21:59,715-Speed 3412.50 samples/sec   Loss 1.3735   LearningRate 0.0016   Epoch: 17   Global Step: 88270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:22:02,729-Speed 3398.44 samples/sec   Loss 1.3859   LearningRate 0.0016   Epoch: 17   Global Step: 88280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:22:05,744-Speed 3410.90 samples/sec   Loss 1.4042   LearningRate 0.0016   Epoch: 17   Global Step: 88290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:22:08,732-Speed 3427.13 samples/sec   Loss 1.3902   LearningRate 0.0016   Epoch: 17   Global Step: 88300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:22:11,787-Speed 3352.95 samples/sec   Loss 1.3823   LearningRate 0.0016   Epoch: 17   Global Step: 88310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:22:14,821-Speed 3380.35 samples/sec   Loss 1.3703   LearningRate 0.0016   Epoch: 17   Global Step: 88320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:22:17,831-Speed 3402.43 samples/sec   Loss 1.3916   LearningRate 0.0016   Epoch: 17   Global Step: 88330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:22:20,851-Speed 3402.39 samples/sec   Loss 1.3337   LearningRate 0.0016   Epoch: 17   Global Step: 88340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:22:23,855-Speed 3408.71 samples/sec   Loss 1.3878   LearningRate 0.0016   Epoch: 17   Global Step: 88350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:22:26,876-Speed 3390.61 samples/sec   Loss 1.4113   LearningRate 0.0016   Epoch: 17   Global Step: 88360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:22:29,900-Speed 3406.91 samples/sec   Loss 1.3421   LearningRate 0.0016   Epoch: 17   Global Step: 88370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:22:32,910-Speed 3402.77 samples/sec   Loss 1.3881   LearningRate 0.0016   Epoch: 17   Global Step: 88380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:22:35,918-Speed 3410.63 samples/sec   Loss 1.4359   LearningRate 0.0016   Epoch: 17   Global Step: 88390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:22:38,950-Speed 3378.38 samples/sec   Loss 1.4487   LearningRate 0.0016   Epoch: 17   Global Step: 88400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:22:41,965-Speed 3397.76 samples/sec   Loss 1.4072   LearningRate 0.0016   Epoch: 17   Global Step: 88410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:22:45,176-Speed 3411.63 samples/sec   Loss 1.3890   LearningRate 0.0016   Epoch: 17   Global Step: 88420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:22:48,182-Speed 3407.59 samples/sec   Loss 1.3643   LearningRate 0.0016   Epoch: 17   Global Step: 88430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:22:51,200-Speed 3399.01 samples/sec   Loss 1.3970   LearningRate 0.0016   Epoch: 17   Global Step: 88440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:22:54,205-Speed 3408.87 samples/sec   Loss 1.3489   LearningRate 0.0016   Epoch: 17   Global Step: 88450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:22:57,216-Speed 3402.10 samples/sec   Loss 1.4077   LearningRate 0.0016   Epoch: 17   Global Step: 88460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:23:00,242-Speed 3408.66 samples/sec   Loss 1.3915   LearningRate 0.0016   Epoch: 17   Global Step: 88470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:23:03,249-Speed 3405.81 samples/sec   Loss 1.4482   LearningRate 0.0016   Epoch: 17   Global Step: 88480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:23:06,362-Speed 3414.21 samples/sec   Loss 1.3400   LearningRate 0.0016   Epoch: 17   Global Step: 88490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:23:09,378-Speed 3396.84 samples/sec   Loss 1.4403   LearningRate 0.0016   Epoch: 17   Global Step: 88500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:23:12,388-Speed 3402.82 samples/sec   Loss 1.3773   LearningRate 0.0016   Epoch: 17   Global Step: 88510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:23:15,391-Speed 3413.27 samples/sec   Loss 1.3520   LearningRate 0.0016   Epoch: 17   Global Step: 88520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:23:18,404-Speed 3399.52 samples/sec   Loss 1.4097   LearningRate 0.0016   Epoch: 17   Global Step: 88530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:23:21,424-Speed 3398.30 samples/sec   Loss 1.3185   LearningRate 0.0016   Epoch: 17   Global Step: 88540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:23:24,430-Speed 3407.38 samples/sec   Loss 1.4543   LearningRate 0.0016   Epoch: 17   Global Step: 88550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:23:27,459-Speed 3381.39 samples/sec   Loss 1.3557   LearningRate 0.0016   Epoch: 17   Global Step: 88560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:23:30,486-Speed 3390.04 samples/sec   Loss 1.3437   LearningRate 0.0015   Epoch: 17   Global Step: 88570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:23:33,495-Speed 3403.86 samples/sec   Loss 1.3695   LearningRate 0.0015   Epoch: 17   Global Step: 88580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:23:36,510-Speed 3425.92 samples/sec   Loss 1.3637   LearningRate 0.0015   Epoch: 17   Global Step: 88590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:23:39,520-Speed 3402.25 samples/sec   Loss 1.3254   LearningRate 0.0015   Epoch: 17   Global Step: 88600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:23:42,509-Speed 3426.43 samples/sec   Loss 1.3526   LearningRate 0.0015   Epoch: 17   Global Step: 88610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:23:45,533-Speed 3387.75 samples/sec   Loss 1.4315   LearningRate 0.0015   Epoch: 17   Global Step: 88620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:23:48,547-Speed 3398.47 samples/sec   Loss 1.4626   LearningRate 0.0015   Epoch: 17   Global Step: 88630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:23:51,627-Speed 3325.03 samples/sec   Loss 1.3160   LearningRate 0.0015   Epoch: 17   Global Step: 88640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:23:54,640-Speed 3399.83 samples/sec   Loss 1.4334   LearningRate 0.0015   Epoch: 17   Global Step: 88650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:23:57,649-Speed 3403.30 samples/sec   Loss 1.3181   LearningRate 0.0015   Epoch: 17   Global Step: 88660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:00,662-Speed 3400.28 samples/sec   Loss 1.3347   LearningRate 0.0015   Epoch: 17   Global Step: 88670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:03,675-Speed 3400.15 samples/sec   Loss 1.3410   LearningRate 0.0015   Epoch: 17   Global Step: 88680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:06,694-Speed 3392.99 samples/sec   Loss 1.3802   LearningRate 0.0015   Epoch: 17   Global Step: 88690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:09,704-Speed 3402.51 samples/sec   Loss 1.3089   LearningRate 0.0015   Epoch: 17   Global Step: 88700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:12,727-Speed 3388.63 samples/sec   Loss 1.4100   LearningRate 0.0015   Epoch: 17   Global Step: 88710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:24:15,787-Speed 3346.79 samples/sec   Loss 1.4183   LearningRate 0.0015   Epoch: 17   Global Step: 88720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:24:18,821-Speed 3376.10 samples/sec   Loss 1.4276   LearningRate 0.0015   Epoch: 17   Global Step: 88730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:24:21,831-Speed 3406.26 samples/sec   Loss 1.4727   LearningRate 0.0015   Epoch: 17   Global Step: 88740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:24:24,826-Speed 3419.50 samples/sec   Loss 1.3794   LearningRate 0.0015   Epoch: 17   Global Step: 88750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:27,837-Speed 3401.41 samples/sec   Loss 1.3600   LearningRate 0.0015   Epoch: 17   Global Step: 88760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:30,853-Speed 3397.10 samples/sec   Loss 1.4284   LearningRate 0.0015   Epoch: 17   Global Step: 88770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:33,861-Speed 3405.13 samples/sec   Loss 1.2897   LearningRate 0.0015   Epoch: 17   Global Step: 88780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:36,940-Speed 3326.57 samples/sec   Loss 1.3623   LearningRate 0.0015   Epoch: 17   Global Step: 88790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:39,953-Speed 3398.81 samples/sec   Loss 1.4036   LearningRate 0.0015   Epoch: 17   Global Step: 88800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:43,002-Speed 3359.50 samples/sec   Loss 1.3823   LearningRate 0.0015   Epoch: 17   Global Step: 88810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:46,017-Speed 3397.83 samples/sec   Loss 1.4101   LearningRate 0.0015   Epoch: 17   Global Step: 88820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:49,064-Speed 3361.24 samples/sec   Loss 1.3755   LearningRate 0.0015   Epoch: 17   Global Step: 88830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:52,082-Speed 3393.32 samples/sec   Loss 1.4640   LearningRate 0.0015   Epoch: 17   Global Step: 88840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:24:55,112-Speed 3380.39 samples/sec   Loss 1.3259   LearningRate 0.0015   Epoch: 17   Global Step: 88850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:24:58,134-Speed 3389.56 samples/sec   Loss 1.3523   LearningRate 0.0015   Epoch: 17   Global Step: 88860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:25:01,165-Speed 3380.24 samples/sec   Loss 1.4420   LearningRate 0.0015   Epoch: 17   Global Step: 88870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:25:04,210-Speed 3362.88 samples/sec   Loss 1.3838   LearningRate 0.0015   Epoch: 17   Global Step: 88880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:25:07,204-Speed 3421.38 samples/sec   Loss 1.3355   LearningRate 0.0015   Epoch: 17   Global Step: 88890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:25:10,216-Speed 3401.49 samples/sec   Loss 1.3297   LearningRate 0.0015   Epoch: 17   Global Step: 88900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:25:13,227-Speed 3401.33 samples/sec   Loss 1.3608   LearningRate 0.0015   Epoch: 17   Global Step: 88910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:25:16,270-Speed 3366.27 samples/sec   Loss 1.4569   LearningRate 0.0015   Epoch: 17   Global Step: 88920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:25:19,283-Speed 3398.88 samples/sec   Loss 1.3981   LearningRate 0.0015   Epoch: 17   Global Step: 88930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:25:22,294-Speed 3401.59 samples/sec   Loss 1.3759   LearningRate 0.0015   Epoch: 17   Global Step: 88940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:25:25,315-Speed 3391.14 samples/sec   Loss 1.4786   LearningRate 0.0015   Epoch: 17   Global Step: 88950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:25:28,327-Speed 3399.94 samples/sec   Loss 1.4175   LearningRate 0.0015   Epoch: 17   Global Step: 88960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:25:31,340-Speed 3399.76 samples/sec   Loss 1.4053   LearningRate 0.0015   Epoch: 17   Global Step: 88970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:25:34,354-Speed 3398.73 samples/sec   Loss 1.4606   LearningRate 0.0014   Epoch: 17   Global Step: 88980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:25:37,372-Speed 3393.85 samples/sec   Loss 1.4143   LearningRate 0.0014   Epoch: 17   Global Step: 88990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:25:40,383-Speed 3401.77 samples/sec   Loss 1.2947   LearningRate 0.0014   Epoch: 17   Global Step: 89000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:25:43,399-Speed 3395.76 samples/sec   Loss 1.4162   LearningRate 0.0014   Epoch: 17   Global Step: 89010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:25:46,416-Speed 3394.87 samples/sec   Loss 1.4179   LearningRate 0.0014   Epoch: 17   Global Step: 89020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:25:49,521-Speed 3299.59 samples/sec   Loss 1.3416   LearningRate 0.0014   Epoch: 17   Global Step: 89030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:25:52,540-Speed 3391.77 samples/sec   Loss 1.4311   LearningRate 0.0014   Epoch: 17   Global Step: 89040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:25:55,578-Speed 3371.62 samples/sec   Loss 1.4142   LearningRate 0.0014   Epoch: 17   Global Step: 89050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:25:58,589-Speed 3402.32 samples/sec   Loss 1.3631   LearningRate 0.0014   Epoch: 17   Global Step: 89060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:26:01,608-Speed 3393.55 samples/sec   Loss 1.3278   LearningRate 0.0014   Epoch: 17   Global Step: 89070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:26:04,633-Speed 3385.99 samples/sec   Loss 1.4604   LearningRate 0.0014   Epoch: 17   Global Step: 89080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:26:07,632-Speed 3415.50 samples/sec   Loss 1.4872   LearningRate 0.0014   Epoch: 17   Global Step: 89090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:26:10,645-Speed 3399.66 samples/sec   Loss 1.4158   LearningRate 0.0014   Epoch: 17   Global Step: 89100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:26:13,670-Speed 3385.90 samples/sec   Loss 1.4127   LearningRate 0.0014   Epoch: 17   Global Step: 89110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:26:16,688-Speed 3394.35 samples/sec   Loss 1.3534   LearningRate 0.0014   Epoch: 17   Global Step: 89120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:26:19,701-Speed 3399.27 samples/sec   Loss 1.3737   LearningRate 0.0014   Epoch: 17   Global Step: 89130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:26:22,714-Speed 3399.59 samples/sec   Loss 1.4317   LearningRate 0.0014   Epoch: 17   Global Step: 89140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:26:25,728-Speed 3398.47 samples/sec   Loss 1.2359   LearningRate 0.0014   Epoch: 17   Global Step: 89150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:26:28,744-Speed 3396.76 samples/sec   Loss 1.3293   LearningRate 0.0014   Epoch: 17   Global Step: 89160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:26:31,740-Speed 3419.05 samples/sec   Loss 1.4270   LearningRate 0.0014   Epoch: 17   Global Step: 89170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:26:34,767-Speed 3383.56 samples/sec   Loss 1.4553   LearningRate 0.0014   Epoch: 17   Global Step: 89180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:26:37,788-Speed 3389.90 samples/sec   Loss 1.3982   LearningRate 0.0014   Epoch: 17   Global Step: 89190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:26:40,807-Speed 3393.23 samples/sec   Loss 1.3793   LearningRate 0.0014   Epoch: 17   Global Step: 89200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:26:43,820-Speed 3399.12 samples/sec   Loss 1.3996   LearningRate 0.0014   Epoch: 17   Global Step: 89210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:26:46,840-Speed 3392.46 samples/sec   Loss 1.3856   LearningRate 0.0014   Epoch: 17   Global Step: 89220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:26:49,870-Speed 3379.54 samples/sec   Loss 1.3290   LearningRate 0.0014   Epoch: 17   Global Step: 89230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:26:52,886-Speed 3396.38 samples/sec   Loss 1.2532   LearningRate 0.0014   Epoch: 17   Global Step: 89240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:26:55,906-Speed 3391.36 samples/sec   Loss 1.4547   LearningRate 0.0014   Epoch: 17   Global Step: 89250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:26:58,946-Speed 3369.79 samples/sec   Loss 1.3569   LearningRate 0.0014   Epoch: 17   Global Step: 89260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:27:01,999-Speed 3354.76 samples/sec   Loss 1.4087   LearningRate 0.0014   Epoch: 17   Global Step: 89270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:05,025-Speed 3384.89 samples/sec   Loss 1.4653   LearningRate 0.0014   Epoch: 17   Global Step: 89280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:08,055-Speed 3381.20 samples/sec   Loss 1.2970   LearningRate 0.0014   Epoch: 17   Global Step: 89290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:11,067-Speed 3400.84 samples/sec   Loss 1.3489   LearningRate 0.0014   Epoch: 17   Global Step: 89300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:14,090-Speed 3387.65 samples/sec   Loss 1.3598   LearningRate 0.0014   Epoch: 17   Global Step: 89310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:17,149-Speed 3348.88 samples/sec   Loss 1.4840   LearningRate 0.0014   Epoch: 17   Global Step: 89320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:20,186-Speed 3373.69 samples/sec   Loss 1.4105   LearningRate 0.0014   Epoch: 17   Global Step: 89330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:23,214-Speed 3381.56 samples/sec   Loss 1.3612   LearningRate 0.0014   Epoch: 17   Global Step: 89340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:26,224-Speed 3403.69 samples/sec   Loss 1.4181   LearningRate 0.0014   Epoch: 17   Global Step: 89350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:29,248-Speed 3387.36 samples/sec   Loss 1.3739   LearningRate 0.0014   Epoch: 17   Global Step: 89360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:32,258-Speed 3402.92 samples/sec   Loss 1.3540   LearningRate 0.0014   Epoch: 17   Global Step: 89370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:35,268-Speed 3402.45 samples/sec   Loss 1.4308   LearningRate 0.0014   Epoch: 17   Global Step: 89380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:38,282-Speed 3398.46 samples/sec   Loss 1.3542   LearningRate 0.0014   Epoch: 17   Global Step: 89390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:41,300-Speed 3393.66 samples/sec   Loss 1.3675   LearningRate 0.0014   Epoch: 17   Global Step: 89400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:44,310-Speed 3402.92 samples/sec   Loss 1.3921   LearningRate 0.0013   Epoch: 17   Global Step: 89410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:47,339-Speed 3381.73 samples/sec   Loss 1.2937   LearningRate 0.0013   Epoch: 17   Global Step: 89420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:50,376-Speed 3373.20 samples/sec   Loss 1.3541   LearningRate 0.0013   Epoch: 17   Global Step: 89430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:53,398-Speed 3389.16 samples/sec   Loss 1.2885   LearningRate 0.0013   Epoch: 17   Global Step: 89440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:56,418-Speed 3391.79 samples/sec   Loss 1.3399   LearningRate 0.0013   Epoch: 17   Global Step: 89450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:27:59,433-Speed 3396.65 samples/sec   Loss 1.4533   LearningRate 0.0013   Epoch: 17   Global Step: 89460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:28:02,437-Speed 3409.19 samples/sec   Loss 1.3983   LearningRate 0.0013   Epoch: 17   Global Step: 89470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:28:05,448-Speed 3402.08 samples/sec   Loss 1.4007   LearningRate 0.0013   Epoch: 17   Global Step: 89480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:28:08,469-Speed 3390.21 samples/sec   Loss 1.4584   LearningRate 0.0013   Epoch: 17   Global Step: 89490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:28:11,495-Speed 3385.26 samples/sec   Loss 1.3226   LearningRate 0.0013   Epoch: 17   Global Step: 89500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:28:14,538-Speed 3366.25 samples/sec   Loss 1.3837   LearningRate 0.0013   Epoch: 17   Global Step: 89510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:28:17,560-Speed 3388.84 samples/sec   Loss 1.3482   LearningRate 0.0013   Epoch: 17   Global Step: 89520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:28:20,582-Speed 3390.39 samples/sec   Loss 1.3723   LearningRate 0.0013   Epoch: 17   Global Step: 89530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:28:23,592-Speed 3402.73 samples/sec   Loss 1.3793   LearningRate 0.0013   Epoch: 17   Global Step: 89540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:28:26,611-Speed 3392.31 samples/sec   Loss 1.2868   LearningRate 0.0013   Epoch: 17   Global Step: 89550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:28:29,608-Speed 3417.72 samples/sec   Loss 1.3836   LearningRate 0.0013   Epoch: 17   Global Step: 89560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:28:32,621-Speed 3399.54 samples/sec   Loss 1.3915   LearningRate 0.0013   Epoch: 17   Global Step: 89570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:28:35,636-Speed 3397.68 samples/sec   Loss 1.3820   LearningRate 0.0013   Epoch: 17   Global Step: 89580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:28:38,654-Speed 3394.17 samples/sec   Loss 1.3796   LearningRate 0.0013   Epoch: 17   Global Step: 89590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:28:41,667-Speed 3399.63 samples/sec   Loss 1.3268   LearningRate 0.0013   Epoch: 17   Global Step: 89600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:28:44,705-Speed 3370.47 samples/sec   Loss 1.4160   LearningRate 0.0013   Epoch: 17   Global Step: 89610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:28:47,729-Speed 3387.32 samples/sec   Loss 1.3690   LearningRate 0.0013   Epoch: 17   Global Step: 89620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:28:50,751-Speed 3390.11 samples/sec   Loss 1.3650   LearningRate 0.0013   Epoch: 17   Global Step: 89630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:28:53,769-Speed 3393.69 samples/sec   Loss 1.3764   LearningRate 0.0013   Epoch: 17   Global Step: 89640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:28:56,794-Speed 3386.44 samples/sec   Loss 1.3678   LearningRate 0.0013   Epoch: 17   Global Step: 89650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:28:59,820-Speed 3384.12 samples/sec   Loss 1.2955   LearningRate 0.0013   Epoch: 17   Global Step: 89660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:29:02,883-Speed 3343.55 samples/sec   Loss 1.4281   LearningRate 0.0013   Epoch: 17   Global Step: 89670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:29:05,903-Speed 3391.75 samples/sec   Loss 1.4150   LearningRate 0.0013   Epoch: 17   Global Step: 89680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:29:08,947-Speed 3365.24 samples/sec   Loss 1.4184   LearningRate 0.0013   Epoch: 17   Global Step: 89690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:29:11,987-Speed 3368.93 samples/sec   Loss 1.4519   LearningRate 0.0013   Epoch: 17   Global Step: 89700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:29:15,005-Speed 3394.15 samples/sec   Loss 1.4370   LearningRate 0.0013   Epoch: 17   Global Step: 89710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:29:18,030-Speed 3385.81 samples/sec   Loss 1.3717   LearningRate 0.0013   Epoch: 17   Global Step: 89720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:29:21,032-Speed 3412.35 samples/sec   Loss 1.4792   LearningRate 0.0013   Epoch: 17   Global Step: 89730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:29:24,058-Speed 3385.20 samples/sec   Loss 1.3761   LearningRate 0.0013   Epoch: 17   Global Step: 89740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:29:27,079-Speed 3389.84 samples/sec   Loss 1.3842   LearningRate 0.0013   Epoch: 17   Global Step: 89750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:29:30,093-Speed 3398.72 samples/sec   Loss 1.4607   LearningRate 0.0013   Epoch: 17   Global Step: 89760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:29:33,114-Speed 3389.81 samples/sec   Loss 1.3456   LearningRate 0.0013   Epoch: 17   Global Step: 89770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:29:36,121-Speed 3406.37 samples/sec   Loss 1.4132   LearningRate 0.0013   Epoch: 17   Global Step: 89780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:29:39,163-Speed 3367.49 samples/sec   Loss 1.3965   LearningRate 0.0013   Epoch: 17   Global Step: 89790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:29:42,176-Speed 3399.46 samples/sec   Loss 1.3659   LearningRate 0.0013   Epoch: 17   Global Step: 89800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:29:45,192-Speed 3396.14 samples/sec   Loss 1.3455   LearningRate 0.0013   Epoch: 17   Global Step: 89810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:29:48,203-Speed 3402.35 samples/sec   Loss 1.3495   LearningRate 0.0013   Epoch: 17   Global Step: 89820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:29:51,220-Speed 3394.25 samples/sec   Loss 1.4227   LearningRate 0.0013   Epoch: 17   Global Step: 89830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:29:54,233-Speed 3400.07 samples/sec   Loss 1.4103   LearningRate 0.0013   Epoch: 17   Global Step: 89840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:29:57,248-Speed 3397.02 samples/sec   Loss 1.3454   LearningRate 0.0012   Epoch: 17   Global Step: 89850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:30:00,260-Speed 3400.66 samples/sec   Loss 1.3811   LearningRate 0.0012   Epoch: 17   Global Step: 89860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:30:03,324-Speed 3342.62 samples/sec   Loss 1.4039   LearningRate 0.0012   Epoch: 17   Global Step: 89870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:30:06,337-Speed 3399.87 samples/sec   Loss 1.3637   LearningRate 0.0012   Epoch: 17   Global Step: 89880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:30:09,353-Speed 3395.58 samples/sec   Loss 1.3696   LearningRate 0.0012   Epoch: 17   Global Step: 89890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:30:12,368-Speed 3398.15 samples/sec   Loss 1.4211   LearningRate 0.0012   Epoch: 17   Global Step: 89900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:30:15,420-Speed 3355.53 samples/sec   Loss 1.4505   LearningRate 0.0012   Epoch: 17   Global Step: 89910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:30:18,434-Speed 3397.43 samples/sec   Loss 1.4442   LearningRate 0.0012   Epoch: 17   Global Step: 89920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:30:21,446-Speed 3401.09 samples/sec   Loss 1.3834   LearningRate 0.0012   Epoch: 17   Global Step: 89930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:30:24,469-Speed 3388.10 samples/sec   Loss 1.3818   LearningRate 0.0012   Epoch: 17   Global Step: 89940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:30:27,488-Speed 3393.72 samples/sec   Loss 1.3708   LearningRate 0.0012   Epoch: 17   Global Step: 89950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:30:30,507-Speed 3392.24 samples/sec   Loss 1.4708   LearningRate 0.0012   Epoch: 17   Global Step: 89960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:30:33,525-Speed 3393.81 samples/sec   Loss 1.3246   LearningRate 0.0012   Epoch: 17   Global Step: 89970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:30:36,545-Speed 3392.26 samples/sec   Loss 1.3542   LearningRate 0.0012   Epoch: 17   Global Step: 89980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:30:39,672-Speed 3276.33 samples/sec   Loss 1.4309   LearningRate 0.0012   Epoch: 17   Global Step: 89990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:30:42,747-Speed 3330.61 samples/sec   Loss 1.4388   LearningRate 0.0012   Epoch: 17   Global Step: 90000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:31:27,017-[lfw][90000]XNorm: 21.895830
Training: 2022-04-11 08:31:27,017-[lfw][90000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 08:31:27,018-[lfw][90000]Accuracy-Highest: 0.99850
Training: 2022-04-11 08:32:18,246-[cfp_fp][90000]XNorm: 21.862363
Training: 2022-04-11 08:32:18,247-[cfp_fp][90000]Accuracy-Flip: 0.98800+-0.00483
Training: 2022-04-11 08:32:18,247-[cfp_fp][90000]Accuracy-Highest: 0.98857
Training: 2022-04-11 08:33:02,231-[agedb_30][90000]XNorm: 22.314070
Training: 2022-04-11 08:33:02,232-[agedb_30][90000]Accuracy-Flip: 0.98317+-0.00845
Training: 2022-04-11 08:33:02,233-[agedb_30][90000]Accuracy-Highest: 0.98550
Training: 2022-04-11 08:33:05,241-Speed 71.86 samples/sec   Loss 1.4257   LearningRate 0.0012   Epoch: 17   Global Step: 90010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:33:08,240-Speed 3414.98 samples/sec   Loss 1.4122   LearningRate 0.0012   Epoch: 17   Global Step: 90020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:33:11,231-Speed 3424.26 samples/sec   Loss 1.2799   LearningRate 0.0012   Epoch: 17   Global Step: 90030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:33:14,222-Speed 3425.35 samples/sec   Loss 1.4051   LearningRate 0.0012   Epoch: 17   Global Step: 90040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:33:17,271-Speed 3358.70 samples/sec   Loss 1.3326   LearningRate 0.0012   Epoch: 17   Global Step: 90050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:33:20,265-Speed 3421.85 samples/sec   Loss 1.4011   LearningRate 0.0012   Epoch: 17   Global Step: 90060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:33:23,262-Speed 3417.72 samples/sec   Loss 1.3138   LearningRate 0.0012   Epoch: 17   Global Step: 90070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:33:26,259-Speed 3417.96 samples/sec   Loss 1.4080   LearningRate 0.0012   Epoch: 17   Global Step: 90080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:33:29,309-Speed 3358.20 samples/sec   Loss 1.5181   LearningRate 0.0012   Epoch: 17   Global Step: 90090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:33:32,336-Speed 3383.95 samples/sec   Loss 1.3743   LearningRate 0.0012   Epoch: 17   Global Step: 90100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:33:35,335-Speed 3414.94 samples/sec   Loss 1.4532   LearningRate 0.0012   Epoch: 17   Global Step: 90110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:33:38,356-Speed 3390.79 samples/sec   Loss 1.3527   LearningRate 0.0012   Epoch: 17   Global Step: 90120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:33:41,370-Speed 3397.78 samples/sec   Loss 1.3962   LearningRate 0.0012   Epoch: 17   Global Step: 90130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:33:44,372-Speed 3412.62 samples/sec   Loss 1.3624   LearningRate 0.0012   Epoch: 17   Global Step: 90140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:33:47,373-Speed 3412.51 samples/sec   Loss 1.4065   LearningRate 0.0012   Epoch: 17   Global Step: 90150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:33:50,393-Speed 3392.12 samples/sec   Loss 1.3207   LearningRate 0.0012   Epoch: 17   Global Step: 90160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:33:53,401-Speed 3404.82 samples/sec   Loss 1.3334   LearningRate 0.0012   Epoch: 17   Global Step: 90170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:33:56,419-Speed 3393.98 samples/sec   Loss 1.3433   LearningRate 0.0012   Epoch: 17   Global Step: 90180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:33:59,441-Speed 3389.41 samples/sec   Loss 1.4739   LearningRate 0.0012   Epoch: 17   Global Step: 90190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:34:02,444-Speed 3411.13 samples/sec   Loss 1.3214   LearningRate 0.0012   Epoch: 17   Global Step: 90200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:34:05,495-Speed 3356.95 samples/sec   Loss 1.3471   LearningRate 0.0012   Epoch: 17   Global Step: 90210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:34:08,500-Speed 3408.94 samples/sec   Loss 1.4221   LearningRate 0.0012   Epoch: 17   Global Step: 90220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:34:11,502-Speed 3411.99 samples/sec   Loss 1.3915   LearningRate 0.0012   Epoch: 17   Global Step: 90230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:34:14,490-Speed 3428.17 samples/sec   Loss 1.3968   LearningRate 0.0012   Epoch: 17   Global Step: 90240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:34:17,496-Speed 3407.77 samples/sec   Loss 1.3854   LearningRate 0.0012   Epoch: 17   Global Step: 90250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:34:20,494-Speed 3415.64 samples/sec   Loss 1.4166   LearningRate 0.0012   Epoch: 17   Global Step: 90260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:34:23,496-Speed 3412.98 samples/sec   Loss 1.3675   LearningRate 0.0012   Epoch: 17   Global Step: 90270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:34:26,502-Speed 3406.36 samples/sec   Loss 1.4084   LearningRate 0.0012   Epoch: 17   Global Step: 90280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:34:29,521-Speed 3393.22 samples/sec   Loss 1.3558   LearningRate 0.0012   Epoch: 17   Global Step: 90290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:34:32,520-Speed 3415.82 samples/sec   Loss 1.3855   LearningRate 0.0012   Epoch: 17   Global Step: 90300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:34:35,526-Speed 3406.89 samples/sec   Loss 1.4227   LearningRate 0.0012   Epoch: 17   Global Step: 90310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:34:38,598-Speed 3334.81 samples/sec   Loss 1.4081   LearningRate 0.0011   Epoch: 17   Global Step: 90320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:34:41,603-Speed 3408.25 samples/sec   Loss 1.4079   LearningRate 0.0011   Epoch: 17   Global Step: 90330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:34:44,606-Speed 3410.54 samples/sec   Loss 1.4093   LearningRate 0.0011   Epoch: 17   Global Step: 90340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:34:47,610-Speed 3411.04 samples/sec   Loss 1.4071   LearningRate 0.0011   Epoch: 17   Global Step: 90350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:34:50,619-Speed 3404.19 samples/sec   Loss 1.3871   LearningRate 0.0011   Epoch: 17   Global Step: 90360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:34:53,665-Speed 3362.31 samples/sec   Loss 1.4118   LearningRate 0.0011   Epoch: 17   Global Step: 90370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:34:56,669-Speed 3409.45 samples/sec   Loss 1.4699   LearningRate 0.0011   Epoch: 17   Global Step: 90380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:34:59,705-Speed 3374.53 samples/sec   Loss 1.3549   LearningRate 0.0011   Epoch: 17   Global Step: 90390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:35:02,732-Speed 3383.79 samples/sec   Loss 1.3175   LearningRate 0.0011   Epoch: 17   Global Step: 90400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:35:05,766-Speed 3375.04 samples/sec   Loss 1.4097   LearningRate 0.0011   Epoch: 17   Global Step: 90410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:35:08,772-Speed 3407.60 samples/sec   Loss 1.4357   LearningRate 0.0011   Epoch: 17   Global Step: 90420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:35:11,757-Speed 3431.69 samples/sec   Loss 1.3890   LearningRate 0.0011   Epoch: 17   Global Step: 90430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:35:14,757-Speed 3414.82 samples/sec   Loss 1.2800   LearningRate 0.0011   Epoch: 17   Global Step: 90440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:35:17,765-Speed 3403.97 samples/sec   Loss 1.4075   LearningRate 0.0011   Epoch: 17   Global Step: 90450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:35:20,766-Speed 3413.33 samples/sec   Loss 1.3852   LearningRate 0.0011   Epoch: 17   Global Step: 90460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:35:23,799-Speed 3377.76 samples/sec   Loss 1.5125   LearningRate 0.0011   Epoch: 17   Global Step: 90470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:35:26,813-Speed 3398.76 samples/sec   Loss 1.4211   LearningRate 0.0011   Epoch: 17   Global Step: 90480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:35:29,825-Speed 3400.62 samples/sec   Loss 1.4674   LearningRate 0.0011   Epoch: 17   Global Step: 90490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:35:32,825-Speed 3413.33 samples/sec   Loss 1.4754   LearningRate 0.0011   Epoch: 17   Global Step: 90500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:35:35,828-Speed 3410.87 samples/sec   Loss 1.3604   LearningRate 0.0011   Epoch: 17   Global Step: 90510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:35:38,834-Speed 3407.94 samples/sec   Loss 1.4980   LearningRate 0.0011   Epoch: 17   Global Step: 90520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:35:41,845-Speed 3401.28 samples/sec   Loss 1.3171   LearningRate 0.0011   Epoch: 17   Global Step: 90530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:35:44,847-Speed 3411.90 samples/sec   Loss 1.4060   LearningRate 0.0011   Epoch: 17   Global Step: 90540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:35:47,864-Speed 3395.72 samples/sec   Loss 1.4405   LearningRate 0.0011   Epoch: 17   Global Step: 90550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:35:50,865-Speed 3412.95 samples/sec   Loss 1.4268   LearningRate 0.0011   Epoch: 17   Global Step: 90560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:35:53,873-Speed 3406.82 samples/sec   Loss 1.3695   LearningRate 0.0011   Epoch: 17   Global Step: 90570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:35:56,879-Speed 3407.13 samples/sec   Loss 1.3692   LearningRate 0.0011   Epoch: 17   Global Step: 90580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:35:59,886-Speed 3406.40 samples/sec   Loss 1.4414   LearningRate 0.0011   Epoch: 17   Global Step: 90590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:36:02,908-Speed 3389.62 samples/sec   Loss 1.4647   LearningRate 0.0011   Epoch: 17   Global Step: 90600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:36:05,914-Speed 3406.98 samples/sec   Loss 1.3967   LearningRate 0.0011   Epoch: 17   Global Step: 90610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:36:08,920-Speed 3407.86 samples/sec   Loss 1.5051   LearningRate 0.0011   Epoch: 17   Global Step: 90620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:36:11,929-Speed 3404.36 samples/sec   Loss 1.4021   LearningRate 0.0011   Epoch: 17   Global Step: 90630   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 08:36:14,928-Speed 3415.13 samples/sec   Loss 1.3872   LearningRate 0.0011   Epoch: 17   Global Step: 90640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:36:17,937-Speed 3404.83 samples/sec   Loss 1.3779   LearningRate 0.0011   Epoch: 17   Global Step: 90650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:36:20,941-Speed 3408.73 samples/sec   Loss 1.3093   LearningRate 0.0011   Epoch: 17   Global Step: 90660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:36:23,965-Speed 3387.82 samples/sec   Loss 1.3689   LearningRate 0.0011   Epoch: 17   Global Step: 90670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:36:26,968-Speed 3411.14 samples/sec   Loss 1.4241   LearningRate 0.0011   Epoch: 17   Global Step: 90680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:36:29,979-Speed 3402.13 samples/sec   Loss 1.4954   LearningRate 0.0011   Epoch: 17   Global Step: 90690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:36:32,990-Speed 3401.49 samples/sec   Loss 1.4268   LearningRate 0.0011   Epoch: 17   Global Step: 90700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:36:36,004-Speed 3397.57 samples/sec   Loss 1.3354   LearningRate 0.0011   Epoch: 17   Global Step: 90710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:36:39,014-Speed 3403.64 samples/sec   Loss 1.4129   LearningRate 0.0011   Epoch: 17   Global Step: 90720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:36:42,021-Speed 3405.66 samples/sec   Loss 1.4140   LearningRate 0.0011   Epoch: 17   Global Step: 90730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:36:45,035-Speed 3398.68 samples/sec   Loss 1.3548   LearningRate 0.0011   Epoch: 17   Global Step: 90740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:36:48,041-Speed 3407.79 samples/sec   Loss 1.3688   LearningRate 0.0011   Epoch: 17   Global Step: 90750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:36:51,051-Speed 3402.72 samples/sec   Loss 1.3722   LearningRate 0.0011   Epoch: 17   Global Step: 90760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:36:54,058-Speed 3407.34 samples/sec   Loss 1.3412   LearningRate 0.0011   Epoch: 17   Global Step: 90770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:36:57,070-Speed 3400.23 samples/sec   Loss 1.3077   LearningRate 0.0011   Epoch: 17   Global Step: 90780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:37:00,099-Speed 3381.00 samples/sec   Loss 1.4634   LearningRate 0.0011   Epoch: 17   Global Step: 90790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:37:03,107-Speed 3406.05 samples/sec   Loss 1.4237   LearningRate 0.0010   Epoch: 17   Global Step: 90800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:37:06,117-Speed 3402.69 samples/sec   Loss 1.4021   LearningRate 0.0010   Epoch: 17   Global Step: 90810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:37:09,119-Speed 3411.67 samples/sec   Loss 1.3794   LearningRate 0.0010   Epoch: 17   Global Step: 90820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:37:12,145-Speed 3385.30 samples/sec   Loss 1.3743   LearningRate 0.0010   Epoch: 17   Global Step: 90830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:37:15,157-Speed 3401.55 samples/sec   Loss 1.3615   LearningRate 0.0010   Epoch: 17   Global Step: 90840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:37:18,238-Speed 3323.61 samples/sec   Loss 1.4248   LearningRate 0.0010   Epoch: 17   Global Step: 90850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:37:21,249-Speed 3401.42 samples/sec   Loss 1.4185   LearningRate 0.0010   Epoch: 17   Global Step: 90860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:37:24,294-Speed 3364.84 samples/sec   Loss 1.4595   LearningRate 0.0010   Epoch: 17   Global Step: 90870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:37:27,315-Speed 3389.81 samples/sec   Loss 1.3541   LearningRate 0.0010   Epoch: 17   Global Step: 90880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:37:30,353-Speed 3372.23 samples/sec   Loss 1.3894   LearningRate 0.0010   Epoch: 17   Global Step: 90890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:37:33,347-Speed 3422.16 samples/sec   Loss 1.3121   LearningRate 0.0010   Epoch: 17   Global Step: 90900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:37:36,417-Speed 3335.98 samples/sec   Loss 1.3764   LearningRate 0.0010   Epoch: 17   Global Step: 90910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:37:39,461-Speed 3365.63 samples/sec   Loss 1.3534   LearningRate 0.0010   Epoch: 17   Global Step: 90920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:37:42,482-Speed 3390.28 samples/sec   Loss 1.4021   LearningRate 0.0010   Epoch: 17   Global Step: 90930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:37:45,491-Speed 3404.31 samples/sec   Loss 1.4274   LearningRate 0.0010   Epoch: 17   Global Step: 90940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:37:48,516-Speed 3386.09 samples/sec   Loss 1.3361   LearningRate 0.0010   Epoch: 17   Global Step: 90950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:37:51,549-Speed 3376.77 samples/sec   Loss 1.3534   LearningRate 0.0010   Epoch: 17   Global Step: 90960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:37:54,556-Speed 3406.83 samples/sec   Loss 1.3055   LearningRate 0.0010   Epoch: 17   Global Step: 90970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:37:57,564-Speed 3404.57 samples/sec   Loss 1.3674   LearningRate 0.0010   Epoch: 17   Global Step: 90980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:38:00,574-Speed 3402.94 samples/sec   Loss 1.3546   LearningRate 0.0010   Epoch: 17   Global Step: 90990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:38:03,586-Speed 3401.36 samples/sec   Loss 1.5091   LearningRate 0.0010   Epoch: 17   Global Step: 91000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:06,608-Speed 3388.57 samples/sec   Loss 1.5032   LearningRate 0.0010   Epoch: 17   Global Step: 91010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:09,618-Speed 3403.50 samples/sec   Loss 1.3948   LearningRate 0.0010   Epoch: 17   Global Step: 91020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:12,629-Speed 3401.47 samples/sec   Loss 1.3898   LearningRate 0.0010   Epoch: 17   Global Step: 91030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:15,802-Speed 3227.97 samples/sec   Loss 1.3089   LearningRate 0.0010   Epoch: 17   Global Step: 91040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:29,207-Speed 764.02 samples/sec   Loss 1.1833   LearningRate 0.0010   Epoch: 18   Global Step: 91050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:32,231-Speed 3390.83 samples/sec   Loss 0.9903   LearningRate 0.0010   Epoch: 18   Global Step: 91060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:35,251-Speed 3391.35 samples/sec   Loss 1.0522   LearningRate 0.0010   Epoch: 18   Global Step: 91070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:38,272-Speed 3392.58 samples/sec   Loss 1.0567   LearningRate 0.0010   Epoch: 18   Global Step: 91080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:41,331-Speed 3348.27 samples/sec   Loss 1.1006   LearningRate 0.0010   Epoch: 18   Global Step: 91090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:44,341-Speed 3402.65 samples/sec   Loss 1.0067   LearningRate 0.0010   Epoch: 18   Global Step: 91100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:47,354-Speed 3399.38 samples/sec   Loss 1.1515   LearningRate 0.0010   Epoch: 18   Global Step: 91110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:50,376-Speed 3390.34 samples/sec   Loss 0.9838   LearningRate 0.0010   Epoch: 18   Global Step: 91120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:53,391-Speed 3396.90 samples/sec   Loss 1.0968   LearningRate 0.0010   Epoch: 18   Global Step: 91130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:56,401-Speed 3403.27 samples/sec   Loss 1.0110   LearningRate 0.0010   Epoch: 18   Global Step: 91140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:38:59,421-Speed 3390.79 samples/sec   Loss 1.0812   LearningRate 0.0010   Epoch: 18   Global Step: 91150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:39:02,445-Speed 3387.07 samples/sec   Loss 1.0058   LearningRate 0.0010   Epoch: 18   Global Step: 91160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:39:05,484-Speed 3371.02 samples/sec   Loss 1.0513   LearningRate 0.0010   Epoch: 18   Global Step: 91170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:39:08,510-Speed 3385.13 samples/sec   Loss 1.1358   LearningRate 0.0010   Epoch: 18   Global Step: 91180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:39:11,579-Speed 3336.83 samples/sec   Loss 1.0471   LearningRate 0.0010   Epoch: 18   Global Step: 91190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:39:14,597-Speed 3394.80 samples/sec   Loss 1.0690   LearningRate 0.0010   Epoch: 18   Global Step: 91200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:39:17,620-Speed 3388.08 samples/sec   Loss 1.0274   LearningRate 0.0010   Epoch: 18   Global Step: 91210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:39:20,641-Speed 3390.84 samples/sec   Loss 1.0004   LearningRate 0.0010   Epoch: 18   Global Step: 91220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:39:23,666-Speed 3385.74 samples/sec   Loss 1.0417   LearningRate 0.0010   Epoch: 18   Global Step: 91230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:39:26,703-Speed 3371.98 samples/sec   Loss 1.0841   LearningRate 0.0010   Epoch: 18   Global Step: 91240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:39:29,747-Speed 3365.55 samples/sec   Loss 1.0275   LearningRate 0.0010   Epoch: 18   Global Step: 91250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:39:32,773-Speed 3385.22 samples/sec   Loss 1.0354   LearningRate 0.0010   Epoch: 18   Global Step: 91260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:39:35,798-Speed 3386.00 samples/sec   Loss 1.1228   LearningRate 0.0010   Epoch: 18   Global Step: 91270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:39:38,836-Speed 3371.86 samples/sec   Loss 1.0553   LearningRate 0.0010   Epoch: 18   Global Step: 91280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:39:41,870-Speed 3375.11 samples/sec   Loss 1.0123   LearningRate 0.0010   Epoch: 18   Global Step: 91290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:39:44,895-Speed 3386.33 samples/sec   Loss 1.0823   LearningRate 0.0010   Epoch: 18   Global Step: 91300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:39:47,920-Speed 3386.22 samples/sec   Loss 1.0779   LearningRate 0.0009   Epoch: 18   Global Step: 91310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:39:51,032-Speed 3290.52 samples/sec   Loss 1.0932   LearningRate 0.0009   Epoch: 18   Global Step: 91320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:39:54,063-Speed 3380.66 samples/sec   Loss 1.1537   LearningRate 0.0009   Epoch: 18   Global Step: 91330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:39:57,083-Speed 3391.59 samples/sec   Loss 1.0946   LearningRate 0.0009   Epoch: 18   Global Step: 91340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:00,107-Speed 3387.60 samples/sec   Loss 1.1268   LearningRate 0.0009   Epoch: 18   Global Step: 91350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:03,109-Speed 3411.59 samples/sec   Loss 1.1080   LearningRate 0.0009   Epoch: 18   Global Step: 91360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:06,131-Speed 3388.41 samples/sec   Loss 1.0051   LearningRate 0.0009   Epoch: 18   Global Step: 91370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:09,149-Speed 3393.98 samples/sec   Loss 1.0486   LearningRate 0.0009   Epoch: 18   Global Step: 91380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:12,166-Speed 3395.03 samples/sec   Loss 1.0398   LearningRate 0.0009   Epoch: 18   Global Step: 91390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:15,194-Speed 3383.41 samples/sec   Loss 1.0745   LearningRate 0.0009   Epoch: 18   Global Step: 91400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:18,224-Speed 3379.78 samples/sec   Loss 1.0752   LearningRate 0.0009   Epoch: 18   Global Step: 91410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:21,243-Speed 3392.08 samples/sec   Loss 1.1290   LearningRate 0.0009   Epoch: 18   Global Step: 91420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:24,256-Speed 3399.74 samples/sec   Loss 0.9679   LearningRate 0.0009   Epoch: 18   Global Step: 91430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:27,302-Speed 3363.23 samples/sec   Loss 1.0701   LearningRate 0.0009   Epoch: 18   Global Step: 91440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:30,315-Speed 3400.20 samples/sec   Loss 1.0569   LearningRate 0.0009   Epoch: 18   Global Step: 91450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:33,306-Speed 3424.24 samples/sec   Loss 1.0379   LearningRate 0.0009   Epoch: 18   Global Step: 91460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:36,315-Speed 3403.76 samples/sec   Loss 1.1457   LearningRate 0.0009   Epoch: 18   Global Step: 91470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:39,323-Speed 3405.26 samples/sec   Loss 0.9907   LearningRate 0.0009   Epoch: 18   Global Step: 91480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:42,331-Speed 3405.02 samples/sec   Loss 1.0226   LearningRate 0.0009   Epoch: 18   Global Step: 91490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:45,344-Speed 3400.13 samples/sec   Loss 1.0609   LearningRate 0.0009   Epoch: 18   Global Step: 91500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:48,352-Speed 3404.32 samples/sec   Loss 1.0062   LearningRate 0.0009   Epoch: 18   Global Step: 91510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:51,364-Speed 3401.06 samples/sec   Loss 0.9895   LearningRate 0.0009   Epoch: 18   Global Step: 91520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:54,374-Speed 3403.34 samples/sec   Loss 1.0796   LearningRate 0.0009   Epoch: 18   Global Step: 91530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:40:57,382-Speed 3405.34 samples/sec   Loss 1.1312   LearningRate 0.0009   Epoch: 18   Global Step: 91540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:00,391-Speed 3403.65 samples/sec   Loss 1.0614   LearningRate 0.0009   Epoch: 18   Global Step: 91550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:03,408-Speed 3395.41 samples/sec   Loss 1.0195   LearningRate 0.0009   Epoch: 18   Global Step: 91560   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 08:41:06,405-Speed 3417.56 samples/sec   Loss 1.1025   LearningRate 0.0009   Epoch: 18   Global Step: 91570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:09,417-Speed 3399.90 samples/sec   Loss 1.0582   LearningRate 0.0009   Epoch: 18   Global Step: 91580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:12,426-Speed 3404.02 samples/sec   Loss 1.0220   LearningRate 0.0009   Epoch: 18   Global Step: 91590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:15,434-Speed 3405.83 samples/sec   Loss 1.0398   LearningRate 0.0009   Epoch: 18   Global Step: 91600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:18,469-Speed 3373.84 samples/sec   Loss 1.0808   LearningRate 0.0009   Epoch: 18   Global Step: 91610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:21,481-Speed 3400.77 samples/sec   Loss 1.0789   LearningRate 0.0009   Epoch: 18   Global Step: 91620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:24,494-Speed 3399.36 samples/sec   Loss 1.0669   LearningRate 0.0009   Epoch: 18   Global Step: 91630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:27,530-Speed 3374.11 samples/sec   Loss 1.0251   LearningRate 0.0009   Epoch: 18   Global Step: 91640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:30,546-Speed 3396.89 samples/sec   Loss 1.1664   LearningRate 0.0009   Epoch: 18   Global Step: 91650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:33,561-Speed 3396.84 samples/sec   Loss 1.0680   LearningRate 0.0009   Epoch: 18   Global Step: 91660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:36,568-Speed 3406.51 samples/sec   Loss 1.0228   LearningRate 0.0009   Epoch: 18   Global Step: 91670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:39,603-Speed 3374.63 samples/sec   Loss 1.0556   LearningRate 0.0009   Epoch: 18   Global Step: 91680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:42,618-Speed 3396.65 samples/sec   Loss 1.1236   LearningRate 0.0009   Epoch: 18   Global Step: 91690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:45,629-Speed 3403.58 samples/sec   Loss 1.0894   LearningRate 0.0009   Epoch: 18   Global Step: 91700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:48,643-Speed 3398.16 samples/sec   Loss 1.1527   LearningRate 0.0009   Epoch: 18   Global Step: 91710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:51,661-Speed 3393.02 samples/sec   Loss 1.0358   LearningRate 0.0009   Epoch: 18   Global Step: 91720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:54,680-Speed 3393.42 samples/sec   Loss 1.0733   LearningRate 0.0009   Epoch: 18   Global Step: 91730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:41:57,692-Speed 3400.56 samples/sec   Loss 1.0901   LearningRate 0.0009   Epoch: 18   Global Step: 91740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:42:00,704-Speed 3400.89 samples/sec   Loss 1.0137   LearningRate 0.0009   Epoch: 18   Global Step: 91750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:42:03,726-Speed 3388.52 samples/sec   Loss 1.0187   LearningRate 0.0009   Epoch: 18   Global Step: 91760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:42:06,736-Speed 3402.90 samples/sec   Loss 1.1006   LearningRate 0.0009   Epoch: 18   Global Step: 91770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:42:09,751-Speed 3397.66 samples/sec   Loss 1.1026   LearningRate 0.0009   Epoch: 18   Global Step: 91780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:42:12,754-Speed 3410.77 samples/sec   Loss 1.0130   LearningRate 0.0009   Epoch: 18   Global Step: 91790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:42:15,773-Speed 3393.18 samples/sec   Loss 1.0269   LearningRate 0.0009   Epoch: 18   Global Step: 91800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:42:18,789-Speed 3395.92 samples/sec   Loss 1.0526   LearningRate 0.0009   Epoch: 18   Global Step: 91810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:42:21,803-Speed 3398.01 samples/sec   Loss 1.1464   LearningRate 0.0009   Epoch: 18   Global Step: 91820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:42:24,823-Speed 3392.32 samples/sec   Loss 1.0923   LearningRate 0.0009   Epoch: 18   Global Step: 91830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:42:27,836-Speed 3399.37 samples/sec   Loss 1.0505   LearningRate 0.0008   Epoch: 18   Global Step: 91840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:42:30,857-Speed 3390.13 samples/sec   Loss 1.1434   LearningRate 0.0008   Epoch: 18   Global Step: 91850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:42:33,873-Speed 3396.61 samples/sec   Loss 0.9943   LearningRate 0.0008   Epoch: 18   Global Step: 91860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:42:36,912-Speed 3370.14 samples/sec   Loss 1.0313   LearningRate 0.0008   Epoch: 18   Global Step: 91870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:42:39,924-Speed 3400.46 samples/sec   Loss 1.0956   LearningRate 0.0008   Epoch: 18   Global Step: 91880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:42:42,943-Speed 3392.80 samples/sec   Loss 1.0525   LearningRate 0.0008   Epoch: 18   Global Step: 91890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:42:45,966-Speed 3388.68 samples/sec   Loss 1.0257   LearningRate 0.0008   Epoch: 18   Global Step: 91900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:42:49,007-Speed 3367.73 samples/sec   Loss 1.0717   LearningRate 0.0008   Epoch: 18   Global Step: 91910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:42:52,047-Speed 3369.38 samples/sec   Loss 1.0583   LearningRate 0.0008   Epoch: 18   Global Step: 91920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:42:55,064-Speed 3395.16 samples/sec   Loss 1.0084   LearningRate 0.0008   Epoch: 18   Global Step: 91930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:42:58,075-Speed 3401.51 samples/sec   Loss 1.1328   LearningRate 0.0008   Epoch: 18   Global Step: 91940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:43:01,074-Speed 3415.67 samples/sec   Loss 1.0268   LearningRate 0.0008   Epoch: 18   Global Step: 91950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:43:04,091-Speed 3394.58 samples/sec   Loss 1.1136   LearningRate 0.0008   Epoch: 18   Global Step: 91960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:43:07,114-Speed 3388.11 samples/sec   Loss 1.1363   LearningRate 0.0008   Epoch: 18   Global Step: 91970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:43:10,133-Speed 3392.42 samples/sec   Loss 1.0882   LearningRate 0.0008   Epoch: 18   Global Step: 91980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:43:13,155-Speed 3389.79 samples/sec   Loss 1.1452   LearningRate 0.0008   Epoch: 18   Global Step: 91990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:43:16,187-Speed 3378.09 samples/sec   Loss 1.0310   LearningRate 0.0008   Epoch: 18   Global Step: 92000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:44:00,228-[lfw][92000]XNorm: 22.418110
Training: 2022-04-11 08:44:00,229-[lfw][92000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 08:44:00,229-[lfw][92000]Accuracy-Highest: 0.99850
Training: 2022-04-11 08:44:51,478-[cfp_fp][92000]XNorm: 22.511127
Training: 2022-04-11 08:44:51,478-[cfp_fp][92000]Accuracy-Flip: 0.98843+-0.00411
Training: 2022-04-11 08:44:51,479-[cfp_fp][92000]Accuracy-Highest: 0.98857
Training: 2022-04-11 08:45:35,388-[agedb_30][92000]XNorm: 22.807042
Training: 2022-04-11 08:45:35,389-[agedb_30][92000]Accuracy-Flip: 0.98433+-0.00746
Training: 2022-04-11 08:45:35,389-[agedb_30][92000]Accuracy-Highest: 0.98550
Training: 2022-04-11 08:45:38,383-Speed 72.01 samples/sec   Loss 1.1208   LearningRate 0.0008   Epoch: 18   Global Step: 92010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:45:41,379-Speed 3418.78 samples/sec   Loss 1.1054   LearningRate 0.0008   Epoch: 18   Global Step: 92020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:45:44,372-Speed 3422.27 samples/sec   Loss 1.0292   LearningRate 0.0008   Epoch: 18   Global Step: 92030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:45:47,360-Speed 3427.72 samples/sec   Loss 1.0697   LearningRate 0.0008   Epoch: 18   Global Step: 92040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:45:50,365-Speed 3409.16 samples/sec   Loss 1.0968   LearningRate 0.0008   Epoch: 18   Global Step: 92050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:45:53,362-Speed 3418.20 samples/sec   Loss 1.1078   LearningRate 0.0008   Epoch: 18   Global Step: 92060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:45:56,356-Speed 3420.53 samples/sec   Loss 1.1083   LearningRate 0.0008   Epoch: 18   Global Step: 92070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:45:59,348-Speed 3423.48 samples/sec   Loss 1.0593   LearningRate 0.0008   Epoch: 18   Global Step: 92080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:02,349-Speed 3412.25 samples/sec   Loss 1.0158   LearningRate 0.0008   Epoch: 18   Global Step: 92090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:05,346-Speed 3418.09 samples/sec   Loss 1.0475   LearningRate 0.0008   Epoch: 18   Global Step: 92100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:08,345-Speed 3415.42 samples/sec   Loss 1.1007   LearningRate 0.0008   Epoch: 18   Global Step: 92110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:11,344-Speed 3414.68 samples/sec   Loss 1.0266   LearningRate 0.0008   Epoch: 18   Global Step: 92120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:14,352-Speed 3405.35 samples/sec   Loss 1.0423   LearningRate 0.0008   Epoch: 18   Global Step: 92130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:17,351-Speed 3416.61 samples/sec   Loss 1.1103   LearningRate 0.0008   Epoch: 18   Global Step: 92140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:20,336-Speed 3430.91 samples/sec   Loss 1.0137   LearningRate 0.0008   Epoch: 18   Global Step: 92150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:23,337-Speed 3413.47 samples/sec   Loss 1.0261   LearningRate 0.0008   Epoch: 18   Global Step: 92160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:26,340-Speed 3410.39 samples/sec   Loss 1.0407   LearningRate 0.0008   Epoch: 18   Global Step: 92170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:29,361-Speed 3389.57 samples/sec   Loss 1.1007   LearningRate 0.0008   Epoch: 18   Global Step: 92180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:32,474-Speed 3290.57 samples/sec   Loss 1.0975   LearningRate 0.0008   Epoch: 18   Global Step: 92190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:35,486-Speed 3400.62 samples/sec   Loss 1.0782   LearningRate 0.0008   Epoch: 18   Global Step: 92200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:38,491-Speed 3408.78 samples/sec   Loss 1.1171   LearningRate 0.0008   Epoch: 18   Global Step: 92210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:41,504-Speed 3399.28 samples/sec   Loss 1.1254   LearningRate 0.0008   Epoch: 18   Global Step: 92220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:44,525-Speed 3391.31 samples/sec   Loss 1.0952   LearningRate 0.0008   Epoch: 18   Global Step: 92230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:47,533-Speed 3405.15 samples/sec   Loss 1.0783   LearningRate 0.0008   Epoch: 18   Global Step: 92240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:50,544-Speed 3400.93 samples/sec   Loss 1.0669   LearningRate 0.0008   Epoch: 18   Global Step: 92250   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 08:46:53,541-Speed 3418.15 samples/sec   Loss 1.0970   LearningRate 0.0008   Epoch: 18   Global Step: 92260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:56,548-Speed 3405.81 samples/sec   Loss 1.1022   LearningRate 0.0008   Epoch: 18   Global Step: 92270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:46:59,555-Speed 3407.04 samples/sec   Loss 1.0709   LearningRate 0.0008   Epoch: 18   Global Step: 92280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:02,567-Speed 3401.48 samples/sec   Loss 1.0356   LearningRate 0.0008   Epoch: 18   Global Step: 92290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:05,624-Speed 3349.49 samples/sec   Loss 1.0834   LearningRate 0.0008   Epoch: 18   Global Step: 92300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:08,639-Speed 3397.90 samples/sec   Loss 1.1275   LearningRate 0.0008   Epoch: 18   Global Step: 92310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:11,670-Speed 3379.60 samples/sec   Loss 0.9912   LearningRate 0.0008   Epoch: 18   Global Step: 92320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:14,719-Speed 3358.62 samples/sec   Loss 1.1413   LearningRate 0.0008   Epoch: 18   Global Step: 92330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:17,726-Speed 3406.82 samples/sec   Loss 1.0996   LearningRate 0.0008   Epoch: 18   Global Step: 92340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:20,732-Speed 3407.78 samples/sec   Loss 1.0553   LearningRate 0.0008   Epoch: 18   Global Step: 92350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:23,714-Speed 3434.04 samples/sec   Loss 1.0898   LearningRate 0.0008   Epoch: 18   Global Step: 92360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:26,719-Speed 3408.51 samples/sec   Loss 1.1094   LearningRate 0.0008   Epoch: 18   Global Step: 92370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:29,724-Speed 3409.20 samples/sec   Loss 1.0688   LearningRate 0.0008   Epoch: 18   Global Step: 92380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:32,724-Speed 3414.38 samples/sec   Loss 1.0121   LearningRate 0.0008   Epoch: 18   Global Step: 92390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:35,726-Speed 3410.84 samples/sec   Loss 1.1595   LearningRate 0.0007   Epoch: 18   Global Step: 92400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:38,745-Speed 3393.46 samples/sec   Loss 1.1237   LearningRate 0.0007   Epoch: 18   Global Step: 92410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:41,756-Speed 3401.48 samples/sec   Loss 1.1628   LearningRate 0.0007   Epoch: 18   Global Step: 92420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:44,789-Speed 3377.68 samples/sec   Loss 1.0900   LearningRate 0.0007   Epoch: 18   Global Step: 92430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:47,789-Speed 3413.91 samples/sec   Loss 1.0153   LearningRate 0.0007   Epoch: 18   Global Step: 92440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:50,838-Speed 3359.18 samples/sec   Loss 1.1235   LearningRate 0.0007   Epoch: 18   Global Step: 92450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:53,837-Speed 3415.98 samples/sec   Loss 1.1016   LearningRate 0.0007   Epoch: 18   Global Step: 92460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:56,839-Speed 3412.08 samples/sec   Loss 1.1104   LearningRate 0.0007   Epoch: 18   Global Step: 92470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:47:59,845-Speed 3406.87 samples/sec   Loss 1.0812   LearningRate 0.0007   Epoch: 18   Global Step: 92480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:02,845-Speed 3414.34 samples/sec   Loss 1.1353   LearningRate 0.0007   Epoch: 18   Global Step: 92490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:05,872-Speed 3383.97 samples/sec   Loss 1.0978   LearningRate 0.0007   Epoch: 18   Global Step: 92500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:08,871-Speed 3415.10 samples/sec   Loss 1.1017   LearningRate 0.0007   Epoch: 18   Global Step: 92510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:11,885-Speed 3398.62 samples/sec   Loss 1.0565   LearningRate 0.0007   Epoch: 18   Global Step: 92520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:14,888-Speed 3411.23 samples/sec   Loss 1.1263   LearningRate 0.0007   Epoch: 18   Global Step: 92530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:17,909-Speed 3390.45 samples/sec   Loss 1.0936   LearningRate 0.0007   Epoch: 18   Global Step: 92540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:20,912-Speed 3411.02 samples/sec   Loss 1.0052   LearningRate 0.0007   Epoch: 18   Global Step: 92550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:23,919-Speed 3405.72 samples/sec   Loss 1.0323   LearningRate 0.0007   Epoch: 18   Global Step: 92560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:26,924-Speed 3408.90 samples/sec   Loss 1.0708   LearningRate 0.0007   Epoch: 18   Global Step: 92570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:29,992-Speed 3338.70 samples/sec   Loss 1.0501   LearningRate 0.0007   Epoch: 18   Global Step: 92580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:33,042-Speed 3357.61 samples/sec   Loss 1.1311   LearningRate 0.0007   Epoch: 18   Global Step: 92590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:36,071-Speed 3381.31 samples/sec   Loss 1.1357   LearningRate 0.0007   Epoch: 18   Global Step: 92600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:39,139-Speed 3339.53 samples/sec   Loss 1.1133   LearningRate 0.0007   Epoch: 18   Global Step: 92610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:42,144-Speed 3408.74 samples/sec   Loss 1.0110   LearningRate 0.0007   Epoch: 18   Global Step: 92620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:45,146-Speed 3411.72 samples/sec   Loss 1.0730   LearningRate 0.0007   Epoch: 18   Global Step: 92630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:48,155-Speed 3404.22 samples/sec   Loss 1.1162   LearningRate 0.0007   Epoch: 18   Global Step: 92640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:51,165-Speed 3402.86 samples/sec   Loss 1.0262   LearningRate 0.0007   Epoch: 18   Global Step: 92650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:54,159-Speed 3420.94 samples/sec   Loss 1.0274   LearningRate 0.0007   Epoch: 18   Global Step: 92660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:48:57,162-Speed 3410.03 samples/sec   Loss 1.1651   LearningRate 0.0007   Epoch: 18   Global Step: 92670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:00,168-Speed 3407.34 samples/sec   Loss 1.1015   LearningRate 0.0007   Epoch: 18   Global Step: 92680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:03,182-Speed 3399.36 samples/sec   Loss 1.1103   LearningRate 0.0007   Epoch: 18   Global Step: 92690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:06,189-Speed 3406.01 samples/sec   Loss 1.0455   LearningRate 0.0007   Epoch: 18   Global Step: 92700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:09,193-Speed 3409.57 samples/sec   Loss 1.0930   LearningRate 0.0007   Epoch: 18   Global Step: 92710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:12,192-Speed 3415.16 samples/sec   Loss 1.0891   LearningRate 0.0007   Epoch: 18   Global Step: 92720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:15,203-Speed 3402.58 samples/sec   Loss 1.1863   LearningRate 0.0007   Epoch: 18   Global Step: 92730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:18,208-Speed 3408.23 samples/sec   Loss 1.1439   LearningRate 0.0007   Epoch: 18   Global Step: 92740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:21,214-Speed 3407.42 samples/sec   Loss 1.0417   LearningRate 0.0007   Epoch: 18   Global Step: 92750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:24,203-Speed 3426.22 samples/sec   Loss 1.0678   LearningRate 0.0007   Epoch: 18   Global Step: 92760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:27,215-Speed 3400.78 samples/sec   Loss 1.0942   LearningRate 0.0007   Epoch: 18   Global Step: 92770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:30,234-Speed 3392.36 samples/sec   Loss 1.1281   LearningRate 0.0007   Epoch: 18   Global Step: 92780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:33,242-Speed 3405.09 samples/sec   Loss 1.1663   LearningRate 0.0007   Epoch: 18   Global Step: 92790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:36,253-Speed 3402.78 samples/sec   Loss 1.0605   LearningRate 0.0007   Epoch: 18   Global Step: 92800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:39,287-Speed 3375.90 samples/sec   Loss 1.1155   LearningRate 0.0007   Epoch: 18   Global Step: 92810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:42,295-Speed 3404.82 samples/sec   Loss 1.1226   LearningRate 0.0007   Epoch: 18   Global Step: 92820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:45,312-Speed 3394.58 samples/sec   Loss 1.1116   LearningRate 0.0007   Epoch: 18   Global Step: 92830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:48,319-Speed 3406.54 samples/sec   Loss 1.2150   LearningRate 0.0007   Epoch: 18   Global Step: 92840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:51,348-Speed 3381.43 samples/sec   Loss 1.1361   LearningRate 0.0007   Epoch: 18   Global Step: 92850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:54,334-Speed 3430.30 samples/sec   Loss 1.0220   LearningRate 0.0007   Epoch: 18   Global Step: 92860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:49:57,346-Speed 3400.25 samples/sec   Loss 1.1104   LearningRate 0.0007   Epoch: 18   Global Step: 92870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:50:00,360-Speed 3399.12 samples/sec   Loss 1.1226   LearningRate 0.0007   Epoch: 18   Global Step: 92880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:50:03,354-Speed 3421.19 samples/sec   Loss 1.0486   LearningRate 0.0007   Epoch: 18   Global Step: 92890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:06,382-Speed 3382.48 samples/sec   Loss 1.1574   LearningRate 0.0007   Epoch: 18   Global Step: 92900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:09,387-Speed 3409.16 samples/sec   Loss 1.1596   LearningRate 0.0007   Epoch: 18   Global Step: 92910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:12,391-Speed 3409.63 samples/sec   Loss 1.0304   LearningRate 0.0007   Epoch: 18   Global Step: 92920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:15,418-Speed 3383.04 samples/sec   Loss 1.0609   LearningRate 0.0007   Epoch: 18   Global Step: 92930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:18,455-Speed 3373.09 samples/sec   Loss 1.1039   LearningRate 0.0007   Epoch: 18   Global Step: 92940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:21,468-Speed 3399.00 samples/sec   Loss 1.0724   LearningRate 0.0007   Epoch: 18   Global Step: 92950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:24,476-Speed 3404.78 samples/sec   Loss 1.0200   LearningRate 0.0007   Epoch: 18   Global Step: 92960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:27,490-Speed 3398.66 samples/sec   Loss 1.0454   LearningRate 0.0007   Epoch: 18   Global Step: 92970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:30,505-Speed 3396.91 samples/sec   Loss 1.0981   LearningRate 0.0007   Epoch: 18   Global Step: 92980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:33,524-Speed 3394.05 samples/sec   Loss 1.0929   LearningRate 0.0007   Epoch: 18   Global Step: 92990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:50:36,542-Speed 3392.87 samples/sec   Loss 1.0245   LearningRate 0.0007   Epoch: 18   Global Step: 93000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:39,552-Speed 3403.02 samples/sec   Loss 1.1445   LearningRate 0.0006   Epoch: 18   Global Step: 93010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:42,567-Speed 3397.18 samples/sec   Loss 1.0291   LearningRate 0.0006   Epoch: 18   Global Step: 93020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:45,572-Speed 3408.23 samples/sec   Loss 1.1055   LearningRate 0.0006   Epoch: 18   Global Step: 93030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:48,605-Speed 3377.71 samples/sec   Loss 1.1606   LearningRate 0.0006   Epoch: 18   Global Step: 93040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:51,611-Speed 3406.96 samples/sec   Loss 1.0666   LearningRate 0.0006   Epoch: 18   Global Step: 93050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:54,630-Speed 3393.15 samples/sec   Loss 1.0951   LearningRate 0.0006   Epoch: 18   Global Step: 93060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:50:57,634-Speed 3409.87 samples/sec   Loss 1.0670   LearningRate 0.0006   Epoch: 18   Global Step: 93070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:00,657-Speed 3388.10 samples/sec   Loss 1.1526   LearningRate 0.0006   Epoch: 18   Global Step: 93080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:03,671-Speed 3398.12 samples/sec   Loss 1.0836   LearningRate 0.0006   Epoch: 18   Global Step: 93090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:06,684-Speed 3399.23 samples/sec   Loss 1.0538   LearningRate 0.0006   Epoch: 18   Global Step: 93100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:51:09,694-Speed 3403.44 samples/sec   Loss 1.0553   LearningRate 0.0006   Epoch: 18   Global Step: 93110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:51:12,699-Speed 3408.30 samples/sec   Loss 1.0296   LearningRate 0.0006   Epoch: 18   Global Step: 93120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:51:15,691-Speed 3423.71 samples/sec   Loss 1.0784   LearningRate 0.0006   Epoch: 18   Global Step: 93130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:18,702-Speed 3401.73 samples/sec   Loss 1.0885   LearningRate 0.0006   Epoch: 18   Global Step: 93140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:21,710-Speed 3404.64 samples/sec   Loss 1.1433   LearningRate 0.0006   Epoch: 18   Global Step: 93150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:24,730-Speed 3391.66 samples/sec   Loss 1.0329   LearningRate 0.0006   Epoch: 18   Global Step: 93160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:27,800-Speed 3336.17 samples/sec   Loss 1.1257   LearningRate 0.0006   Epoch: 18   Global Step: 93170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:30,850-Speed 3357.79 samples/sec   Loss 1.0253   LearningRate 0.0006   Epoch: 18   Global Step: 93180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:33,863-Speed 3400.45 samples/sec   Loss 1.0354   LearningRate 0.0006   Epoch: 18   Global Step: 93190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:36,875-Speed 3400.17 samples/sec   Loss 1.0358   LearningRate 0.0006   Epoch: 18   Global Step: 93200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:39,892-Speed 3395.78 samples/sec   Loss 1.0752   LearningRate 0.0006   Epoch: 18   Global Step: 93210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:42,898-Speed 3407.51 samples/sec   Loss 1.1800   LearningRate 0.0006   Epoch: 18   Global Step: 93220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:45,911-Speed 3398.98 samples/sec   Loss 1.0752   LearningRate 0.0006   Epoch: 18   Global Step: 93230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:51:48,907-Speed 3418.32 samples/sec   Loss 1.0519   LearningRate 0.0006   Epoch: 18   Global Step: 93240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:51,920-Speed 3399.43 samples/sec   Loss 1.0169   LearningRate 0.0006   Epoch: 18   Global Step: 93250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:54,944-Speed 3387.68 samples/sec   Loss 1.0587   LearningRate 0.0006   Epoch: 18   Global Step: 93260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:51:58,018-Speed 3331.55 samples/sec   Loss 1.1472   LearningRate 0.0006   Epoch: 18   Global Step: 93270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:01,037-Speed 3392.94 samples/sec   Loss 1.1452   LearningRate 0.0006   Epoch: 18   Global Step: 93280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:04,073-Speed 3374.21 samples/sec   Loss 1.1250   LearningRate 0.0006   Epoch: 18   Global Step: 93290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:07,085-Speed 3400.12 samples/sec   Loss 1.0450   LearningRate 0.0006   Epoch: 18   Global Step: 93300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:10,094-Speed 3404.30 samples/sec   Loss 1.0845   LearningRate 0.0006   Epoch: 18   Global Step: 93310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:13,115-Speed 3390.75 samples/sec   Loss 1.0849   LearningRate 0.0006   Epoch: 18   Global Step: 93320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:16,129-Speed 3397.63 samples/sec   Loss 1.1025   LearningRate 0.0006   Epoch: 18   Global Step: 93330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:19,153-Speed 3387.71 samples/sec   Loss 1.0763   LearningRate 0.0006   Epoch: 18   Global Step: 93340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:52:22,165-Speed 3400.67 samples/sec   Loss 1.0712   LearningRate 0.0006   Epoch: 18   Global Step: 93350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:52:25,200-Speed 3374.76 samples/sec   Loss 1.1174   LearningRate 0.0006   Epoch: 18   Global Step: 93360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:52:28,293-Speed 3311.12 samples/sec   Loss 1.1442   LearningRate 0.0006   Epoch: 18   Global Step: 93370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:31,323-Speed 3380.38 samples/sec   Loss 1.0697   LearningRate 0.0006   Epoch: 18   Global Step: 93380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:34,347-Speed 3387.81 samples/sec   Loss 0.9622   LearningRate 0.0006   Epoch: 18   Global Step: 93390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:37,365-Speed 3393.87 samples/sec   Loss 1.0818   LearningRate 0.0006   Epoch: 18   Global Step: 93400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:40,396-Speed 3379.97 samples/sec   Loss 1.0984   LearningRate 0.0006   Epoch: 18   Global Step: 93410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:43,406-Speed 3402.33 samples/sec   Loss 1.1497   LearningRate 0.0006   Epoch: 18   Global Step: 93420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:46,420-Speed 3398.43 samples/sec   Loss 1.1048   LearningRate 0.0006   Epoch: 18   Global Step: 93430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:49,436-Speed 3395.31 samples/sec   Loss 1.1039   LearningRate 0.0006   Epoch: 18   Global Step: 93440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:52,467-Speed 3379.89 samples/sec   Loss 1.0861   LearningRate 0.0006   Epoch: 18   Global Step: 93450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:55,474-Speed 3406.01 samples/sec   Loss 1.1012   LearningRate 0.0006   Epoch: 18   Global Step: 93460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:52:58,485-Speed 3402.20 samples/sec   Loss 1.0056   LearningRate 0.0006   Epoch: 18   Global Step: 93470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:01,499-Speed 3397.79 samples/sec   Loss 1.0504   LearningRate 0.0006   Epoch: 18   Global Step: 93480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:04,512-Speed 3399.72 samples/sec   Loss 1.1256   LearningRate 0.0006   Epoch: 18   Global Step: 93490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:07,525-Speed 3399.91 samples/sec   Loss 1.0048   LearningRate 0.0006   Epoch: 18   Global Step: 93500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:10,534-Speed 3404.19 samples/sec   Loss 1.1009   LearningRate 0.0006   Epoch: 18   Global Step: 93510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:13,561-Speed 3383.88 samples/sec   Loss 1.1852   LearningRate 0.0006   Epoch: 18   Global Step: 93520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:16,576-Speed 3396.23 samples/sec   Loss 1.1244   LearningRate 0.0006   Epoch: 18   Global Step: 93530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:19,591-Speed 3397.25 samples/sec   Loss 1.1234   LearningRate 0.0006   Epoch: 18   Global Step: 93540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:22,606-Speed 3398.02 samples/sec   Loss 1.0713   LearningRate 0.0006   Epoch: 18   Global Step: 93550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:25,618-Speed 3399.93 samples/sec   Loss 1.1481   LearningRate 0.0006   Epoch: 18   Global Step: 93560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:28,614-Speed 3418.86 samples/sec   Loss 1.1240   LearningRate 0.0006   Epoch: 18   Global Step: 93570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:31,635-Speed 3390.50 samples/sec   Loss 1.0822   LearningRate 0.0006   Epoch: 18   Global Step: 93580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:34,651-Speed 3396.42 samples/sec   Loss 1.0139   LearningRate 0.0006   Epoch: 18   Global Step: 93590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:37,678-Speed 3384.06 samples/sec   Loss 1.1199   LearningRate 0.0006   Epoch: 18   Global Step: 93600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:40,690-Speed 3399.77 samples/sec   Loss 1.0980   LearningRate 0.0006   Epoch: 18   Global Step: 93610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:43,737-Speed 3361.49 samples/sec   Loss 1.0713   LearningRate 0.0006   Epoch: 18   Global Step: 93620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:53:46,731-Speed 3421.01 samples/sec   Loss 1.0913   LearningRate 0.0006   Epoch: 18   Global Step: 93630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:53:49,754-Speed 3388.60 samples/sec   Loss 1.0853   LearningRate 0.0006   Epoch: 18   Global Step: 93640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:53:52,814-Speed 3346.62 samples/sec   Loss 1.1123   LearningRate 0.0006   Epoch: 18   Global Step: 93650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:53:55,863-Speed 3359.31 samples/sec   Loss 1.0972   LearningRate 0.0005   Epoch: 18   Global Step: 93660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:53:58,879-Speed 3396.37 samples/sec   Loss 1.0730   LearningRate 0.0005   Epoch: 18   Global Step: 93670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:54:01,906-Speed 3384.29 samples/sec   Loss 1.0576   LearningRate 0.0005   Epoch: 18   Global Step: 93680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:54:04,923-Speed 3394.34 samples/sec   Loss 1.0689   LearningRate 0.0005   Epoch: 18   Global Step: 93690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:54:07,939-Speed 3396.18 samples/sec   Loss 1.0319   LearningRate 0.0005   Epoch: 18   Global Step: 93700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:54:10,962-Speed 3388.85 samples/sec   Loss 1.1535   LearningRate 0.0005   Epoch: 18   Global Step: 93710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:54:13,975-Speed 3399.75 samples/sec   Loss 1.1542   LearningRate 0.0005   Epoch: 18   Global Step: 93720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:54:16,988-Speed 3399.06 samples/sec   Loss 1.1187   LearningRate 0.0005   Epoch: 18   Global Step: 93730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:54:20,016-Speed 3382.49 samples/sec   Loss 1.1310   LearningRate 0.0005   Epoch: 18   Global Step: 93740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:54:23,036-Speed 3391.93 samples/sec   Loss 1.0347   LearningRate 0.0005   Epoch: 18   Global Step: 93750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:54:26,047-Speed 3401.05 samples/sec   Loss 1.0592   LearningRate 0.0005   Epoch: 18   Global Step: 93760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:54:29,068-Speed 3391.52 samples/sec   Loss 1.1830   LearningRate 0.0005   Epoch: 18   Global Step: 93770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:54:32,079-Speed 3401.17 samples/sec   Loss 1.0585   LearningRate 0.0005   Epoch: 18   Global Step: 93780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:54:35,104-Speed 3386.28 samples/sec   Loss 1.0703   LearningRate 0.0005   Epoch: 18   Global Step: 93790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:54:38,125-Speed 3390.28 samples/sec   Loss 1.0487   LearningRate 0.0005   Epoch: 18   Global Step: 93800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:54:41,135-Speed 3402.13 samples/sec   Loss 1.0986   LearningRate 0.0005   Epoch: 18   Global Step: 93810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:54:44,149-Speed 3399.50 samples/sec   Loss 1.1521   LearningRate 0.0005   Epoch: 18   Global Step: 93820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:54:47,165-Speed 3395.48 samples/sec   Loss 1.1350   LearningRate 0.0005   Epoch: 18   Global Step: 93830   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 08:54:50,164-Speed 3414.58 samples/sec   Loss 1.0854   LearningRate 0.0005   Epoch: 18   Global Step: 93840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:54:53,178-Speed 3399.23 samples/sec   Loss 1.0855   LearningRate 0.0005   Epoch: 18   Global Step: 93850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:54:56,191-Speed 3398.89 samples/sec   Loss 1.0246   LearningRate 0.0005   Epoch: 18   Global Step: 93860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:54:59,213-Speed 3389.67 samples/sec   Loss 1.1095   LearningRate 0.0005   Epoch: 18   Global Step: 93870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:55:02,229-Speed 3396.38 samples/sec   Loss 1.0726   LearningRate 0.0005   Epoch: 18   Global Step: 93880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:55:05,232-Speed 3410.69 samples/sec   Loss 1.1101   LearningRate 0.0005   Epoch: 18   Global Step: 93890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:55:08,250-Speed 3394.11 samples/sec   Loss 1.0552   LearningRate 0.0005   Epoch: 18   Global Step: 93900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:55:11,275-Speed 3386.01 samples/sec   Loss 1.1025   LearningRate 0.0005   Epoch: 18   Global Step: 93910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:55:14,296-Speed 3390.25 samples/sec   Loss 1.1374   LearningRate 0.0005   Epoch: 18   Global Step: 93920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:55:17,310-Speed 3397.91 samples/sec   Loss 1.0834   LearningRate 0.0005   Epoch: 18   Global Step: 93930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:55:20,322-Speed 3400.42 samples/sec   Loss 1.0702   LearningRate 0.0005   Epoch: 18   Global Step: 93940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:55:23,337-Speed 3397.75 samples/sec   Loss 1.0185   LearningRate 0.0005   Epoch: 18   Global Step: 93950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:55:26,401-Speed 3342.48 samples/sec   Loss 1.1813   LearningRate 0.0005   Epoch: 18   Global Step: 93960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:55:29,553-Speed 3249.41 samples/sec   Loss 1.0647   LearningRate 0.0005   Epoch: 18   Global Step: 93970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:55:32,595-Speed 3367.65 samples/sec   Loss 1.1892   LearningRate 0.0005   Epoch: 18   Global Step: 93980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:55:35,653-Speed 3349.11 samples/sec   Loss 1.1244   LearningRate 0.0005   Epoch: 18   Global Step: 93990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:55:38,675-Speed 3389.61 samples/sec   Loss 1.0864   LearningRate 0.0005   Epoch: 18   Global Step: 94000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:56:22,926-[lfw][94000]XNorm: 21.991098
Training: 2022-04-11 08:56:22,927-[lfw][94000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 08:56:22,927-[lfw][94000]Accuracy-Highest: 0.99850
Training: 2022-04-11 08:57:14,213-[cfp_fp][94000]XNorm: 22.270706
Training: 2022-04-11 08:57:14,214-[cfp_fp][94000]Accuracy-Flip: 0.98857+-0.00447
Training: 2022-04-11 08:57:14,214-[cfp_fp][94000]Accuracy-Highest: 0.98857
Training: 2022-04-11 08:57:58,341-[agedb_30][94000]XNorm: 22.440194
Training: 2022-04-11 08:57:58,342-[agedb_30][94000]Accuracy-Flip: 0.98417+-0.00712
Training: 2022-04-11 08:57:58,343-[agedb_30][94000]Accuracy-Highest: 0.98550
Training: 2022-04-11 08:58:01,358-Speed 71.77 samples/sec   Loss 1.0852   LearningRate 0.0005   Epoch: 18   Global Step: 94010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:04,354-Speed 3418.23 samples/sec   Loss 1.0973   LearningRate 0.0005   Epoch: 18   Global Step: 94020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:07,343-Speed 3426.87 samples/sec   Loss 1.1198   LearningRate 0.0005   Epoch: 18   Global Step: 94030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:10,337-Speed 3421.33 samples/sec   Loss 1.1165   LearningRate 0.0005   Epoch: 18   Global Step: 94040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:13,348-Speed 3400.73 samples/sec   Loss 1.1465   LearningRate 0.0005   Epoch: 18   Global Step: 94050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:16,360-Speed 3401.67 samples/sec   Loss 1.1132   LearningRate 0.0005   Epoch: 18   Global Step: 94060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:19,354-Speed 3421.33 samples/sec   Loss 1.0779   LearningRate 0.0005   Epoch: 18   Global Step: 94070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:22,353-Speed 3414.79 samples/sec   Loss 1.0929   LearningRate 0.0005   Epoch: 18   Global Step: 94080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:25,336-Speed 3433.69 samples/sec   Loss 1.1390   LearningRate 0.0005   Epoch: 18   Global Step: 94090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:28,337-Speed 3412.92 samples/sec   Loss 1.0897   LearningRate 0.0005   Epoch: 18   Global Step: 94100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:31,356-Speed 3392.94 samples/sec   Loss 1.2061   LearningRate 0.0005   Epoch: 18   Global Step: 94110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:34,350-Speed 3420.56 samples/sec   Loss 1.1090   LearningRate 0.0005   Epoch: 18   Global Step: 94120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:37,351-Speed 3413.11 samples/sec   Loss 1.1085   LearningRate 0.0005   Epoch: 18   Global Step: 94130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:40,351-Speed 3414.29 samples/sec   Loss 1.0667   LearningRate 0.0005   Epoch: 18   Global Step: 94140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:43,363-Speed 3400.60 samples/sec   Loss 1.1059   LearningRate 0.0005   Epoch: 18   Global Step: 94150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:58:46,349-Speed 3430.68 samples/sec   Loss 1.0419   LearningRate 0.0005   Epoch: 18   Global Step: 94160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:58:49,363-Speed 3398.77 samples/sec   Loss 1.1155   LearningRate 0.0005   Epoch: 18   Global Step: 94170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:58:52,368-Speed 3408.07 samples/sec   Loss 1.0858   LearningRate 0.0005   Epoch: 18   Global Step: 94180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:58:55,371-Speed 3410.58 samples/sec   Loss 1.0501   LearningRate 0.0005   Epoch: 18   Global Step: 94190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:58:58,388-Speed 3394.50 samples/sec   Loss 1.0876   LearningRate 0.0005   Epoch: 18   Global Step: 94200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:59:01,393-Speed 3408.67 samples/sec   Loss 1.1037   LearningRate 0.0005   Epoch: 18   Global Step: 94210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:59:04,419-Speed 3385.43 samples/sec   Loss 1.1708   LearningRate 0.0005   Epoch: 18   Global Step: 94220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:59:07,456-Speed 3372.34 samples/sec   Loss 1.0718   LearningRate 0.0005   Epoch: 18   Global Step: 94230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:59:10,460-Speed 3409.17 samples/sec   Loss 1.0224   LearningRate 0.0005   Epoch: 18   Global Step: 94240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:59:13,465-Speed 3409.15 samples/sec   Loss 1.0494   LearningRate 0.0005   Epoch: 18   Global Step: 94250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 08:59:16,478-Speed 3399.17 samples/sec   Loss 1.1161   LearningRate 0.0005   Epoch: 18   Global Step: 94260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:59:19,481-Speed 3411.66 samples/sec   Loss 1.1872   LearningRate 0.0005   Epoch: 18   Global Step: 94270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:59:22,490-Speed 3404.29 samples/sec   Loss 1.0669   LearningRate 0.0005   Epoch: 18   Global Step: 94280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:59:25,493-Speed 3410.76 samples/sec   Loss 1.1542   LearningRate 0.0005   Epoch: 18   Global Step: 94290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:59:28,556-Speed 3343.15 samples/sec   Loss 1.0222   LearningRate 0.0005   Epoch: 18   Global Step: 94300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:59:31,570-Speed 3398.45 samples/sec   Loss 1.0257   LearningRate 0.0005   Epoch: 18   Global Step: 94310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:59:34,576-Speed 3407.24 samples/sec   Loss 1.0857   LearningRate 0.0005   Epoch: 18   Global Step: 94320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:59:37,580-Speed 3409.37 samples/sec   Loss 1.0495   LearningRate 0.0005   Epoch: 18   Global Step: 94330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:59:40,596-Speed 3396.56 samples/sec   Loss 1.0841   LearningRate 0.0005   Epoch: 18   Global Step: 94340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:59:43,598-Speed 3412.82 samples/sec   Loss 1.1411   LearningRate 0.0005   Epoch: 18   Global Step: 94350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:59:46,583-Speed 3430.82 samples/sec   Loss 1.1359   LearningRate 0.0005   Epoch: 18   Global Step: 94360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:59:49,641-Speed 3350.20 samples/sec   Loss 1.1040   LearningRate 0.0005   Epoch: 18   Global Step: 94370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:59:52,669-Speed 3381.82 samples/sec   Loss 1.0655   LearningRate 0.0004   Epoch: 18   Global Step: 94380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:59:55,676-Speed 3407.33 samples/sec   Loss 1.1302   LearningRate 0.0004   Epoch: 18   Global Step: 94390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 08:59:58,684-Speed 3404.81 samples/sec   Loss 1.0481   LearningRate 0.0004   Epoch: 18   Global Step: 94400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:01,710-Speed 3383.81 samples/sec   Loss 1.0145   LearningRate 0.0004   Epoch: 18   Global Step: 94410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:04,738-Speed 3383.85 samples/sec   Loss 1.1004   LearningRate 0.0004   Epoch: 18   Global Step: 94420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:07,751-Speed 3399.14 samples/sec   Loss 1.1067   LearningRate 0.0004   Epoch: 18   Global Step: 94430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:10,771-Speed 3392.27 samples/sec   Loss 1.0695   LearningRate 0.0004   Epoch: 18   Global Step: 94440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:13,782-Speed 3401.50 samples/sec   Loss 1.0045   LearningRate 0.0004   Epoch: 18   Global Step: 94450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:17,025-Speed 3157.35 samples/sec   Loss 1.1115   LearningRate 0.0004   Epoch: 18   Global Step: 94460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:20,040-Speed 3398.01 samples/sec   Loss 1.0456   LearningRate 0.0004   Epoch: 18   Global Step: 94470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:23,044-Speed 3409.33 samples/sec   Loss 1.0858   LearningRate 0.0004   Epoch: 18   Global Step: 94480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:26,053-Speed 3403.83 samples/sec   Loss 1.1401   LearningRate 0.0004   Epoch: 18   Global Step: 94490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:29,064-Speed 3401.91 samples/sec   Loss 0.9909   LearningRate 0.0004   Epoch: 18   Global Step: 94500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:32,076-Speed 3400.30 samples/sec   Loss 1.0631   LearningRate 0.0004   Epoch: 18   Global Step: 94510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:35,096-Speed 3392.25 samples/sec   Loss 1.1585   LearningRate 0.0004   Epoch: 18   Global Step: 94520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:38,137-Speed 3368.20 samples/sec   Loss 1.0857   LearningRate 0.0004   Epoch: 18   Global Step: 94530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:41,150-Speed 3399.13 samples/sec   Loss 1.0143   LearningRate 0.0004   Epoch: 18   Global Step: 94540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:44,163-Speed 3399.46 samples/sec   Loss 1.1321   LearningRate 0.0004   Epoch: 18   Global Step: 94550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:47,153-Speed 3426.43 samples/sec   Loss 1.1034   LearningRate 0.0004   Epoch: 18   Global Step: 94560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:50,160-Speed 3406.68 samples/sec   Loss 1.0542   LearningRate 0.0004   Epoch: 18   Global Step: 94570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:53,170-Speed 3402.97 samples/sec   Loss 1.0887   LearningRate 0.0004   Epoch: 18   Global Step: 94580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:56,181-Speed 3402.29 samples/sec   Loss 1.0372   LearningRate 0.0004   Epoch: 18   Global Step: 94590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:00:59,171-Speed 3424.81 samples/sec   Loss 1.1624   LearningRate 0.0004   Epoch: 18   Global Step: 94600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:01:02,196-Speed 3385.76 samples/sec   Loss 1.1158   LearningRate 0.0004   Epoch: 18   Global Step: 94610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:01:05,271-Speed 3331.87 samples/sec   Loss 1.0864   LearningRate 0.0004   Epoch: 18   Global Step: 94620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:01:08,301-Speed 3380.15 samples/sec   Loss 1.0384   LearningRate 0.0004   Epoch: 18   Global Step: 94630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:01:11,320-Speed 3393.52 samples/sec   Loss 1.0874   LearningRate 0.0004   Epoch: 18   Global Step: 94640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:01:14,333-Speed 3399.06 samples/sec   Loss 1.0803   LearningRate 0.0004   Epoch: 18   Global Step: 94650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:01:17,348-Speed 3397.47 samples/sec   Loss 1.0294   LearningRate 0.0004   Epoch: 18   Global Step: 94660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:01:20,355-Speed 3405.58 samples/sec   Loss 1.0677   LearningRate 0.0004   Epoch: 18   Global Step: 94670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:01:23,367-Speed 3400.38 samples/sec   Loss 1.0963   LearningRate 0.0004   Epoch: 18   Global Step: 94680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:01:26,429-Speed 3345.75 samples/sec   Loss 1.0727   LearningRate 0.0004   Epoch: 18   Global Step: 94690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:01:29,459-Speed 3379.60 samples/sec   Loss 1.0169   LearningRate 0.0004   Epoch: 18   Global Step: 94700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:01:32,473-Speed 3398.87 samples/sec   Loss 1.0862   LearningRate 0.0004   Epoch: 18   Global Step: 94710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:01:35,492-Speed 3393.24 samples/sec   Loss 1.0989   LearningRate 0.0004   Epoch: 18   Global Step: 94720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:01:38,516-Speed 3386.84 samples/sec   Loss 1.0876   LearningRate 0.0004   Epoch: 18   Global Step: 94730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:01:41,542-Speed 3385.84 samples/sec   Loss 1.1147   LearningRate 0.0004   Epoch: 18   Global Step: 94740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:01:44,554-Speed 3400.63 samples/sec   Loss 1.0732   LearningRate 0.0004   Epoch: 18   Global Step: 94750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:01:47,568-Speed 3398.10 samples/sec   Loss 1.0144   LearningRate 0.0004   Epoch: 18   Global Step: 94760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:01:50,589-Speed 3390.66 samples/sec   Loss 1.0253   LearningRate 0.0004   Epoch: 18   Global Step: 94770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:01:53,609-Speed 3391.38 samples/sec   Loss 1.1077   LearningRate 0.0004   Epoch: 18   Global Step: 94780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:01:56,620-Speed 3401.71 samples/sec   Loss 1.0091   LearningRate 0.0004   Epoch: 18   Global Step: 94790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:01:59,608-Speed 3427.36 samples/sec   Loss 1.0608   LearningRate 0.0004   Epoch: 18   Global Step: 94800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:02:02,619-Speed 3401.22 samples/sec   Loss 1.1146   LearningRate 0.0004   Epoch: 18   Global Step: 94810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:02:05,635-Speed 3397.62 samples/sec   Loss 1.1705   LearningRate 0.0004   Epoch: 18   Global Step: 94820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:02:08,644-Speed 3404.40 samples/sec   Loss 1.0856   LearningRate 0.0004   Epoch: 18   Global Step: 94830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:02:11,657-Speed 3399.90 samples/sec   Loss 1.0748   LearningRate 0.0004   Epoch: 18   Global Step: 94840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:02:14,666-Speed 3403.14 samples/sec   Loss 1.0086   LearningRate 0.0004   Epoch: 18   Global Step: 94850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:02:17,679-Speed 3399.66 samples/sec   Loss 0.9590   LearningRate 0.0004   Epoch: 18   Global Step: 94860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:02:20,692-Speed 3400.00 samples/sec   Loss 1.1557   LearningRate 0.0004   Epoch: 18   Global Step: 94870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:02:23,712-Speed 3391.33 samples/sec   Loss 1.0812   LearningRate 0.0004   Epoch: 18   Global Step: 94880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:02:26,733-Speed 3390.77 samples/sec   Loss 1.0398   LearningRate 0.0004   Epoch: 18   Global Step: 94890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:02:29,743-Speed 3402.46 samples/sec   Loss 1.0868   LearningRate 0.0004   Epoch: 18   Global Step: 94900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:02:32,765-Speed 3388.75 samples/sec   Loss 1.0765   LearningRate 0.0004   Epoch: 18   Global Step: 94910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:02:35,779-Speed 3399.35 samples/sec   Loss 1.0913   LearningRate 0.0004   Epoch: 18   Global Step: 94920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:02:38,824-Speed 3363.64 samples/sec   Loss 1.0335   LearningRate 0.0004   Epoch: 18   Global Step: 94930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:02:41,854-Speed 3380.08 samples/sec   Loss 1.0154   LearningRate 0.0004   Epoch: 18   Global Step: 94940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:02:44,868-Speed 3399.46 samples/sec   Loss 1.0505   LearningRate 0.0004   Epoch: 18   Global Step: 94950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:02:47,884-Speed 3395.41 samples/sec   Loss 1.1120   LearningRate 0.0004   Epoch: 18   Global Step: 94960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:02:50,895-Speed 3402.16 samples/sec   Loss 1.0129   LearningRate 0.0004   Epoch: 18   Global Step: 94970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:02:53,921-Speed 3384.03 samples/sec   Loss 1.0371   LearningRate 0.0004   Epoch: 18   Global Step: 94980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:02:56,948-Speed 3383.65 samples/sec   Loss 1.0915   LearningRate 0.0004   Epoch: 18   Global Step: 94990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:02:59,949-Speed 3413.72 samples/sec   Loss 1.0145   LearningRate 0.0004   Epoch: 18   Global Step: 95000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:02,970-Speed 3390.69 samples/sec   Loss 1.0955   LearningRate 0.0004   Epoch: 18   Global Step: 95010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:05,992-Speed 3390.42 samples/sec   Loss 1.0316   LearningRate 0.0004   Epoch: 18   Global Step: 95020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:09,001-Speed 3403.08 samples/sec   Loss 1.0601   LearningRate 0.0004   Epoch: 18   Global Step: 95030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:12,022-Speed 3391.04 samples/sec   Loss 1.0733   LearningRate 0.0004   Epoch: 18   Global Step: 95040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:15,036-Speed 3398.25 samples/sec   Loss 1.0653   LearningRate 0.0004   Epoch: 18   Global Step: 95050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:18,060-Speed 3387.04 samples/sec   Loss 1.0970   LearningRate 0.0004   Epoch: 18   Global Step: 95060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:21,080-Speed 3391.98 samples/sec   Loss 1.0245   LearningRate 0.0004   Epoch: 18   Global Step: 95070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:24,113-Speed 3376.75 samples/sec   Loss 1.1493   LearningRate 0.0004   Epoch: 18   Global Step: 95080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:27,131-Speed 3392.92 samples/sec   Loss 1.0852   LearningRate 0.0004   Epoch: 18   Global Step: 95090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:30,133-Speed 3412.74 samples/sec   Loss 1.0294   LearningRate 0.0004   Epoch: 18   Global Step: 95100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:33,145-Speed 3401.04 samples/sec   Loss 1.1876   LearningRate 0.0004   Epoch: 18   Global Step: 95110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:36,168-Speed 3388.04 samples/sec   Loss 1.1260   LearningRate 0.0004   Epoch: 18   Global Step: 95120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:39,185-Speed 3394.79 samples/sec   Loss 1.1275   LearningRate 0.0004   Epoch: 18   Global Step: 95130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:42,199-Speed 3398.47 samples/sec   Loss 1.0762   LearningRate 0.0004   Epoch: 18   Global Step: 95140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:45,210-Speed 3401.89 samples/sec   Loss 1.0620   LearningRate 0.0004   Epoch: 18   Global Step: 95150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:48,223-Speed 3398.73 samples/sec   Loss 1.1781   LearningRate 0.0004   Epoch: 18   Global Step: 95160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:51,241-Speed 3394.54 samples/sec   Loss 1.1613   LearningRate 0.0004   Epoch: 18   Global Step: 95170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:54,276-Speed 3374.12 samples/sec   Loss 1.0521   LearningRate 0.0003   Epoch: 18   Global Step: 95180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:03:57,299-Speed 3388.26 samples/sec   Loss 1.1169   LearningRate 0.0003   Epoch: 18   Global Step: 95190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:04:00,300-Speed 3413.01 samples/sec   Loss 1.1482   LearningRate 0.0003   Epoch: 18   Global Step: 95200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:04:03,461-Speed 3240.24 samples/sec   Loss 1.1710   LearningRate 0.0003   Epoch: 18   Global Step: 95210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:04:06,481-Speed 3392.81 samples/sec   Loss 1.0263   LearningRate 0.0003   Epoch: 18   Global Step: 95220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:04:09,526-Speed 3362.57 samples/sec   Loss 1.1391   LearningRate 0.0003   Epoch: 18   Global Step: 95230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:04:12,565-Speed 3370.54 samples/sec   Loss 1.0407   LearningRate 0.0003   Epoch: 18   Global Step: 95240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:04:15,586-Speed 3390.81 samples/sec   Loss 1.0145   LearningRate 0.0003   Epoch: 18   Global Step: 95250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:04:18,604-Speed 3393.48 samples/sec   Loss 1.0056   LearningRate 0.0003   Epoch: 18   Global Step: 95260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:04:21,617-Speed 3399.56 samples/sec   Loss 1.1074   LearningRate 0.0003   Epoch: 18   Global Step: 95270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:04:24,665-Speed 3360.29 samples/sec   Loss 1.0595   LearningRate 0.0003   Epoch: 18   Global Step: 95280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:04:27,675-Speed 3403.57 samples/sec   Loss 1.1080   LearningRate 0.0003   Epoch: 18   Global Step: 95290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:04:30,688-Speed 3399.19 samples/sec   Loss 1.0418   LearningRate 0.0003   Epoch: 18   Global Step: 95300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:04:33,701-Speed 3399.95 samples/sec   Loss 1.0239   LearningRate 0.0003   Epoch: 18   Global Step: 95310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:04:36,719-Speed 3393.41 samples/sec   Loss 1.0068   LearningRate 0.0003   Epoch: 18   Global Step: 95320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:04:39,753-Speed 3375.57 samples/sec   Loss 1.0441   LearningRate 0.0003   Epoch: 18   Global Step: 95330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:04:42,775-Speed 3389.87 samples/sec   Loss 1.1489   LearningRate 0.0003   Epoch: 18   Global Step: 95340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:04:45,786-Speed 3401.60 samples/sec   Loss 1.0532   LearningRate 0.0003   Epoch: 18   Global Step: 95350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:04:48,808-Speed 3388.48 samples/sec   Loss 1.0841   LearningRate 0.0003   Epoch: 18   Global Step: 95360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:04:51,829-Speed 3391.89 samples/sec   Loss 0.9938   LearningRate 0.0003   Epoch: 18   Global Step: 95370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:04:54,848-Speed 3392.62 samples/sec   Loss 1.0629   LearningRate 0.0003   Epoch: 18   Global Step: 95380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:04:57,861-Speed 3399.37 samples/sec   Loss 1.0370   LearningRate 0.0003   Epoch: 18   Global Step: 95390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:00,872-Speed 3402.08 samples/sec   Loss 1.0187   LearningRate 0.0003   Epoch: 18   Global Step: 95400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:03,891-Speed 3392.29 samples/sec   Loss 1.1034   LearningRate 0.0003   Epoch: 18   Global Step: 95410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:06,907-Speed 3396.23 samples/sec   Loss 1.0427   LearningRate 0.0003   Epoch: 18   Global Step: 95420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:09,929-Speed 3388.61 samples/sec   Loss 1.1133   LearningRate 0.0003   Epoch: 18   Global Step: 95430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:12,945-Speed 3396.78 samples/sec   Loss 1.1168   LearningRate 0.0003   Epoch: 18   Global Step: 95440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:15,959-Speed 3397.60 samples/sec   Loss 1.1911   LearningRate 0.0003   Epoch: 18   Global Step: 95450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:18,987-Speed 3382.80 samples/sec   Loss 1.1148   LearningRate 0.0003   Epoch: 18   Global Step: 95460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:22,013-Speed 3385.09 samples/sec   Loss 1.0773   LearningRate 0.0003   Epoch: 18   Global Step: 95470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:25,045-Speed 3377.89 samples/sec   Loss 1.1005   LearningRate 0.0003   Epoch: 18   Global Step: 95480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:28,050-Speed 3408.30 samples/sec   Loss 1.1612   LearningRate 0.0003   Epoch: 18   Global Step: 95490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:31,086-Speed 3374.60 samples/sec   Loss 1.1381   LearningRate 0.0003   Epoch: 18   Global Step: 95500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:34,102-Speed 3395.73 samples/sec   Loss 1.0780   LearningRate 0.0003   Epoch: 18   Global Step: 95510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:37,118-Speed 3395.90 samples/sec   Loss 1.0552   LearningRate 0.0003   Epoch: 18   Global Step: 95520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:40,137-Speed 3393.08 samples/sec   Loss 1.0729   LearningRate 0.0003   Epoch: 18   Global Step: 95530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:43,162-Speed 3385.25 samples/sec   Loss 1.1469   LearningRate 0.0003   Epoch: 18   Global Step: 95540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:46,182-Speed 3392.12 samples/sec   Loss 0.9925   LearningRate 0.0003   Epoch: 18   Global Step: 95550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:49,202-Speed 3391.80 samples/sec   Loss 1.0105   LearningRate 0.0003   Epoch: 18   Global Step: 95560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:52,215-Speed 3399.00 samples/sec   Loss 1.0639   LearningRate 0.0003   Epoch: 18   Global Step: 95570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:55,282-Speed 3339.62 samples/sec   Loss 1.0604   LearningRate 0.0003   Epoch: 18   Global Step: 95580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:05:58,274-Speed 3423.63 samples/sec   Loss 1.1456   LearningRate 0.0003   Epoch: 18   Global Step: 95590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:06:01,289-Speed 3397.34 samples/sec   Loss 1.0473   LearningRate 0.0003   Epoch: 18   Global Step: 95600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:06:04,307-Speed 3393.96 samples/sec   Loss 1.0827   LearningRate 0.0003   Epoch: 18   Global Step: 95610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:06:07,325-Speed 3393.53 samples/sec   Loss 0.9952   LearningRate 0.0003   Epoch: 18   Global Step: 95620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:06:10,342-Speed 3395.00 samples/sec   Loss 0.9496   LearningRate 0.0003   Epoch: 18   Global Step: 95630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:06:13,348-Speed 3407.50 samples/sec   Loss 1.1354   LearningRate 0.0003   Epoch: 18   Global Step: 95640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:06:16,391-Speed 3365.76 samples/sec   Loss 1.1330   LearningRate 0.0003   Epoch: 18   Global Step: 95650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:06:19,410-Speed 3392.90 samples/sec   Loss 1.0368   LearningRate 0.0003   Epoch: 18   Global Step: 95660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:06:22,421-Speed 3402.23 samples/sec   Loss 1.0709   LearningRate 0.0003   Epoch: 18   Global Step: 95670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:06:25,436-Speed 3396.64 samples/sec   Loss 1.0738   LearningRate 0.0003   Epoch: 18   Global Step: 95680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:06:28,452-Speed 3396.36 samples/sec   Loss 1.0860   LearningRate 0.0003   Epoch: 18   Global Step: 95690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:06:31,458-Speed 3407.23 samples/sec   Loss 1.0284   LearningRate 0.0003   Epoch: 18   Global Step: 95700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:06:34,495-Speed 3372.62 samples/sec   Loss 1.1636   LearningRate 0.0003   Epoch: 18   Global Step: 95710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:06:37,519-Speed 3387.75 samples/sec   Loss 1.0496   LearningRate 0.0003   Epoch: 18   Global Step: 95720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:06:40,531-Speed 3400.63 samples/sec   Loss 1.0507   LearningRate 0.0003   Epoch: 18   Global Step: 95730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:06:43,547-Speed 3395.95 samples/sec   Loss 1.1413   LearningRate 0.0003   Epoch: 18   Global Step: 95740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:06:46,567-Speed 3392.22 samples/sec   Loss 1.1856   LearningRate 0.0003   Epoch: 18   Global Step: 95750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:06:49,587-Speed 3390.94 samples/sec   Loss 1.0560   LearningRate 0.0003   Epoch: 18   Global Step: 95760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:06:52,708-Speed 3281.71 samples/sec   Loss 1.1247   LearningRate 0.0003   Epoch: 18   Global Step: 95770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:06:55,727-Speed 3392.70 samples/sec   Loss 1.0836   LearningRate 0.0003   Epoch: 18   Global Step: 95780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:06:58,747-Speed 3391.31 samples/sec   Loss 1.0419   LearningRate 0.0003   Epoch: 18   Global Step: 95790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:07:01,768-Speed 3390.49 samples/sec   Loss 1.0826   LearningRate 0.0003   Epoch: 18   Global Step: 95800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:07:04,796-Speed 3382.91 samples/sec   Loss 1.1263   LearningRate 0.0003   Epoch: 18   Global Step: 95810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:07:07,813-Speed 3395.60 samples/sec   Loss 1.1017   LearningRate 0.0003   Epoch: 18   Global Step: 95820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:07:10,802-Speed 3426.83 samples/sec   Loss 1.0606   LearningRate 0.0003   Epoch: 18   Global Step: 95830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:07:13,814-Speed 3400.09 samples/sec   Loss 1.0048   LearningRate 0.0003   Epoch: 18   Global Step: 95840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:07:16,829-Speed 3397.83 samples/sec   Loss 1.0406   LearningRate 0.0003   Epoch: 18   Global Step: 95850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:07:19,842-Speed 3399.04 samples/sec   Loss 1.0877   LearningRate 0.0003   Epoch: 18   Global Step: 95860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:07:22,856-Speed 3398.08 samples/sec   Loss 1.1388   LearningRate 0.0003   Epoch: 18   Global Step: 95870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:07:25,869-Speed 3399.54 samples/sec   Loss 1.1447   LearningRate 0.0003   Epoch: 18   Global Step: 95880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:07:28,885-Speed 3395.72 samples/sec   Loss 1.1014   LearningRate 0.0003   Epoch: 18   Global Step: 95890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:07:31,908-Speed 3388.11 samples/sec   Loss 1.0975   LearningRate 0.0003   Epoch: 18   Global Step: 95900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:07:34,939-Speed 3380.05 samples/sec   Loss 1.1125   LearningRate 0.0003   Epoch: 18   Global Step: 95910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:07:37,962-Speed 3387.92 samples/sec   Loss 1.0982   LearningRate 0.0003   Epoch: 18   Global Step: 95920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:07:40,978-Speed 3396.89 samples/sec   Loss 0.9994   LearningRate 0.0003   Epoch: 18   Global Step: 95930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:07:44,003-Speed 3385.83 samples/sec   Loss 1.1448   LearningRate 0.0003   Epoch: 18   Global Step: 95940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:07:47,025-Speed 3389.67 samples/sec   Loss 1.0878   LearningRate 0.0003   Epoch: 18   Global Step: 95950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:07:50,034-Speed 3402.84 samples/sec   Loss 1.1104   LearningRate 0.0003   Epoch: 18   Global Step: 95960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:07:53,134-Speed 3304.60 samples/sec   Loss 1.0581   LearningRate 0.0003   Epoch: 18   Global Step: 95970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:07:56,187-Speed 3354.24 samples/sec   Loss 1.0635   LearningRate 0.0003   Epoch: 18   Global Step: 95980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:07:59,238-Speed 3357.73 samples/sec   Loss 1.0586   LearningRate 0.0003   Epoch: 18   Global Step: 95990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:08:02,250-Speed 3401.37 samples/sec   Loss 1.0423   LearningRate 0.0003   Epoch: 18   Global Step: 96000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:08:46,221-[lfw][96000]XNorm: 22.269381
Training: 2022-04-11 09:08:46,222-[lfw][96000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 09:08:46,222-[lfw][96000]Accuracy-Highest: 0.99850
Training: 2022-04-11 09:09:37,551-[cfp_fp][96000]XNorm: 22.442077
Training: 2022-04-11 09:09:37,551-[cfp_fp][96000]Accuracy-Flip: 0.98943+-0.00539
Training: 2022-04-11 09:09:37,552-[cfp_fp][96000]Accuracy-Highest: 0.98943
Training: 2022-04-11 09:10:21,666-[agedb_30][96000]XNorm: 22.679242
Training: 2022-04-11 09:10:21,667-[agedb_30][96000]Accuracy-Flip: 0.98400+-0.00688
Training: 2022-04-11 09:10:21,667-[agedb_30][96000]Accuracy-Highest: 0.98550
Training: 2022-04-11 09:10:24,666-Speed 71.90 samples/sec   Loss 1.0744   LearningRate 0.0003   Epoch: 18   Global Step: 96010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:10:27,670-Speed 3409.69 samples/sec   Loss 1.0556   LearningRate 0.0003   Epoch: 18   Global Step: 96020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:10:30,640-Speed 3448.40 samples/sec   Loss 1.0444   LearningRate 0.0003   Epoch: 18   Global Step: 96030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:10:33,635-Speed 3419.98 samples/sec   Loss 1.0770   LearningRate 0.0003   Epoch: 18   Global Step: 96040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:10:36,631-Speed 3418.43 samples/sec   Loss 1.1678   LearningRate 0.0003   Epoch: 18   Global Step: 96050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:10:39,643-Speed 3400.69 samples/sec   Loss 1.0820   LearningRate 0.0003   Epoch: 18   Global Step: 96060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:10:42,647-Speed 3410.39 samples/sec   Loss 1.0032   LearningRate 0.0003   Epoch: 18   Global Step: 96070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:10:45,633-Speed 3429.92 samples/sec   Loss 1.0342   LearningRate 0.0003   Epoch: 18   Global Step: 96080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:10:48,629-Speed 3419.16 samples/sec   Loss 1.0957   LearningRate 0.0003   Epoch: 18   Global Step: 96090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:10:51,726-Speed 3306.71 samples/sec   Loss 1.0103   LearningRate 0.0003   Epoch: 18   Global Step: 96100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:11:03,906-Speed 840.83 samples/sec   Loss 1.0279   LearningRate 0.0002   Epoch: 19   Global Step: 96110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:11:07,078-Speed 3229.52 samples/sec   Loss 0.9468   LearningRate 0.0002   Epoch: 19   Global Step: 96120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:11:10,072-Speed 3420.43 samples/sec   Loss 1.0121   LearningRate 0.0002   Epoch: 19   Global Step: 96130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:11:13,074-Speed 3412.40 samples/sec   Loss 0.9285   LearningRate 0.0002   Epoch: 19   Global Step: 96140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:11:16,088-Speed 3397.93 samples/sec   Loss 0.9187   LearningRate 0.0002   Epoch: 19   Global Step: 96150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:11:19,090-Speed 3412.86 samples/sec   Loss 0.8867   LearningRate 0.0002   Epoch: 19   Global Step: 96160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:11:22,089-Speed 3414.22 samples/sec   Loss 0.9566   LearningRate 0.0002   Epoch: 19   Global Step: 96170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:11:25,101-Speed 3401.12 samples/sec   Loss 0.9586   LearningRate 0.0002   Epoch: 19   Global Step: 96180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:11:28,130-Speed 3382.05 samples/sec   Loss 0.9255   LearningRate 0.0002   Epoch: 19   Global Step: 96190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:11:31,136-Speed 3406.92 samples/sec   Loss 0.8500   LearningRate 0.0002   Epoch: 19   Global Step: 96200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:11:34,146-Speed 3403.90 samples/sec   Loss 0.8934   LearningRate 0.0002   Epoch: 19   Global Step: 96210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:11:37,156-Speed 3401.82 samples/sec   Loss 0.8999   LearningRate 0.0002   Epoch: 19   Global Step: 96220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:11:40,159-Speed 3410.79 samples/sec   Loss 0.8705   LearningRate 0.0002   Epoch: 19   Global Step: 96230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:11:43,164-Speed 3408.37 samples/sec   Loss 0.8308   LearningRate 0.0002   Epoch: 19   Global Step: 96240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 09:11:46,154-Speed 3426.36 samples/sec   Loss 0.9282   LearningRate 0.0002   Epoch: 19   Global Step: 96250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:11:49,171-Speed 3394.62 samples/sec   Loss 0.9264   LearningRate 0.0002   Epoch: 19   Global Step: 96260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:11:52,224-Speed 3355.25 samples/sec   Loss 0.9794   LearningRate 0.0002   Epoch: 19   Global Step: 96270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:11:55,237-Speed 3399.45 samples/sec   Loss 0.9590   LearningRate 0.0002   Epoch: 19   Global Step: 96280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:11:58,247-Speed 3402.68 samples/sec   Loss 0.8906   LearningRate 0.0002   Epoch: 19   Global Step: 96290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:12:01,257-Speed 3402.75 samples/sec   Loss 0.9315   LearningRate 0.0002   Epoch: 19   Global Step: 96300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:12:04,266-Speed 3404.44 samples/sec   Loss 0.9217   LearningRate 0.0002   Epoch: 19   Global Step: 96310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:12:07,271-Speed 3408.41 samples/sec   Loss 0.9779   LearningRate 0.0002   Epoch: 19   Global Step: 96320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 09:12:10,278-Speed 3406.36 samples/sec   Loss 0.9449   LearningRate 0.0002   Epoch: 19   Global Step: 96330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:12:13,296-Speed 3394.18 samples/sec   Loss 0.9027   LearningRate 0.0002   Epoch: 19   Global Step: 96340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:12:16,327-Speed 3378.21 samples/sec   Loss 0.9467   LearningRate 0.0002   Epoch: 19   Global Step: 96350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:12:19,336-Speed 3404.98 samples/sec   Loss 0.9227   LearningRate 0.0002   Epoch: 19   Global Step: 96360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:12:22,342-Speed 3407.36 samples/sec   Loss 0.9339   LearningRate 0.0002   Epoch: 19   Global Step: 96370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:12:25,356-Speed 3398.32 samples/sec   Loss 0.8804   LearningRate 0.0002   Epoch: 19   Global Step: 96380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:12:28,461-Speed 3299.49 samples/sec   Loss 0.8729   LearningRate 0.0002   Epoch: 19   Global Step: 96390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:12:31,514-Speed 3355.48 samples/sec   Loss 0.9932   LearningRate 0.0002   Epoch: 19   Global Step: 96400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:12:34,515-Speed 3413.01 samples/sec   Loss 0.9415   LearningRate 0.0002   Epoch: 19   Global Step: 96410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:12:37,523-Speed 3404.68 samples/sec   Loss 0.9586   LearningRate 0.0002   Epoch: 19   Global Step: 96420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:12:40,570-Speed 3361.51 samples/sec   Loss 0.9786   LearningRate 0.0002   Epoch: 19   Global Step: 96430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:12:43,572-Speed 3412.06 samples/sec   Loss 0.9132   LearningRate 0.0002   Epoch: 19   Global Step: 96440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:12:46,567-Speed 3419.39 samples/sec   Loss 0.8701   LearningRate 0.0002   Epoch: 19   Global Step: 96450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:12:49,574-Speed 3407.54 samples/sec   Loss 0.8504   LearningRate 0.0002   Epoch: 19   Global Step: 96460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:12:52,577-Speed 3409.96 samples/sec   Loss 0.9865   LearningRate 0.0002   Epoch: 19   Global Step: 96470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:12:55,583-Speed 3407.66 samples/sec   Loss 0.9856   LearningRate 0.0002   Epoch: 19   Global Step: 96480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:12:58,594-Speed 3401.71 samples/sec   Loss 0.8549   LearningRate 0.0002   Epoch: 19   Global Step: 96490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:13:01,616-Speed 3388.63 samples/sec   Loss 0.9229   LearningRate 0.0002   Epoch: 19   Global Step: 96500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:13:04,625-Speed 3404.53 samples/sec   Loss 0.9237   LearningRate 0.0002   Epoch: 19   Global Step: 96510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:13:07,632-Speed 3405.65 samples/sec   Loss 0.8948   LearningRate 0.0002   Epoch: 19   Global Step: 96520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:13:10,643-Speed 3401.74 samples/sec   Loss 1.0667   LearningRate 0.0002   Epoch: 19   Global Step: 96530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:13:13,680-Speed 3372.88 samples/sec   Loss 0.9010   LearningRate 0.0002   Epoch: 19   Global Step: 96540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:13:16,675-Speed 3420.01 samples/sec   Loss 1.0138   LearningRate 0.0002   Epoch: 19   Global Step: 96550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:13:19,695-Speed 3392.19 samples/sec   Loss 1.0427   LearningRate 0.0002   Epoch: 19   Global Step: 96560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:13:22,699-Speed 3409.30 samples/sec   Loss 0.9623   LearningRate 0.0002   Epoch: 19   Global Step: 96570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:13:25,703-Speed 3409.86 samples/sec   Loss 0.8811   LearningRate 0.0002   Epoch: 19   Global Step: 96580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:13:28,711-Speed 3405.40 samples/sec   Loss 0.9198   LearningRate 0.0002   Epoch: 19   Global Step: 96590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:13:31,702-Speed 3423.38 samples/sec   Loss 0.8726   LearningRate 0.0002   Epoch: 19   Global Step: 96600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:13:34,712-Speed 3403.54 samples/sec   Loss 1.0044   LearningRate 0.0002   Epoch: 19   Global Step: 96610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:13:37,716-Speed 3409.23 samples/sec   Loss 0.9115   LearningRate 0.0002   Epoch: 19   Global Step: 96620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:13:40,728-Speed 3400.71 samples/sec   Loss 0.8478   LearningRate 0.0002   Epoch: 19   Global Step: 96630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:13:43,732-Speed 3410.22 samples/sec   Loss 0.9630   LearningRate 0.0002   Epoch: 19   Global Step: 96640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:13:46,739-Speed 3406.41 samples/sec   Loss 0.9256   LearningRate 0.0002   Epoch: 19   Global Step: 96650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:13:49,744-Speed 3408.59 samples/sec   Loss 0.9525   LearningRate 0.0002   Epoch: 19   Global Step: 96660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:13:52,749-Speed 3408.17 samples/sec   Loss 0.9491   LearningRate 0.0002   Epoch: 19   Global Step: 96670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:13:55,760-Speed 3401.99 samples/sec   Loss 0.9441   LearningRate 0.0002   Epoch: 19   Global Step: 96680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:13:58,764-Speed 3408.91 samples/sec   Loss 0.9445   LearningRate 0.0002   Epoch: 19   Global Step: 96690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:14:01,800-Speed 3373.95 samples/sec   Loss 0.8916   LearningRate 0.0002   Epoch: 19   Global Step: 96700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:14:04,810-Speed 3402.63 samples/sec   Loss 0.9764   LearningRate 0.0002   Epoch: 19   Global Step: 96710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:14:07,824-Speed 3399.18 samples/sec   Loss 0.9175   LearningRate 0.0002   Epoch: 19   Global Step: 96720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:14:10,832-Speed 3404.98 samples/sec   Loss 0.9888   LearningRate 0.0002   Epoch: 19   Global Step: 96730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:14:13,886-Speed 3353.48 samples/sec   Loss 0.9011   LearningRate 0.0002   Epoch: 19   Global Step: 96740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:14:16,907-Speed 3390.60 samples/sec   Loss 0.8989   LearningRate 0.0002   Epoch: 19   Global Step: 96750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:14:19,918-Speed 3401.81 samples/sec   Loss 0.9884   LearningRate 0.0002   Epoch: 19   Global Step: 96760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:14:22,940-Speed 3388.96 samples/sec   Loss 0.8590   LearningRate 0.0002   Epoch: 19   Global Step: 96770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:14:25,958-Speed 3394.44 samples/sec   Loss 1.0120   LearningRate 0.0002   Epoch: 19   Global Step: 96780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:14:28,967-Speed 3404.32 samples/sec   Loss 0.9700   LearningRate 0.0002   Epoch: 19   Global Step: 96790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:14:31,969-Speed 3411.46 samples/sec   Loss 0.9183   LearningRate 0.0002   Epoch: 19   Global Step: 96800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:14:34,976-Speed 3406.71 samples/sec   Loss 0.9458   LearningRate 0.0002   Epoch: 19   Global Step: 96810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:14:37,982-Speed 3407.30 samples/sec   Loss 0.9872   LearningRate 0.0002   Epoch: 19   Global Step: 96820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:14:40,999-Speed 3395.05 samples/sec   Loss 0.8750   LearningRate 0.0002   Epoch: 19   Global Step: 96830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:14:44,046-Speed 3361.66 samples/sec   Loss 0.9312   LearningRate 0.0002   Epoch: 19   Global Step: 96840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:14:47,052-Speed 3407.08 samples/sec   Loss 0.9929   LearningRate 0.0002   Epoch: 19   Global Step: 96850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:14:50,061-Speed 3404.46 samples/sec   Loss 0.9941   LearningRate 0.0002   Epoch: 19   Global Step: 96860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:14:53,088-Speed 3383.71 samples/sec   Loss 0.9375   LearningRate 0.0002   Epoch: 19   Global Step: 96870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:14:56,099-Speed 3400.90 samples/sec   Loss 0.8960   LearningRate 0.0002   Epoch: 19   Global Step: 96880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:14:59,116-Speed 3395.14 samples/sec   Loss 0.9910   LearningRate 0.0002   Epoch: 19   Global Step: 96890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:15:02,132-Speed 3396.63 samples/sec   Loss 0.9238   LearningRate 0.0002   Epoch: 19   Global Step: 96900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:15:05,177-Speed 3363.43 samples/sec   Loss 1.0153   LearningRate 0.0002   Epoch: 19   Global Step: 96910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:15:08,190-Speed 3399.89 samples/sec   Loss 1.0067   LearningRate 0.0002   Epoch: 19   Global Step: 96920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:11,219-Speed 3380.72 samples/sec   Loss 1.0099   LearningRate 0.0002   Epoch: 19   Global Step: 96930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:14,237-Speed 3394.41 samples/sec   Loss 0.9483   LearningRate 0.0002   Epoch: 19   Global Step: 96940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:17,243-Speed 3407.44 samples/sec   Loss 0.8893   LearningRate 0.0002   Epoch: 19   Global Step: 96950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:20,266-Speed 3388.20 samples/sec   Loss 0.9466   LearningRate 0.0002   Epoch: 19   Global Step: 96960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:23,275-Speed 3404.12 samples/sec   Loss 0.9797   LearningRate 0.0002   Epoch: 19   Global Step: 96970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:26,286-Speed 3401.06 samples/sec   Loss 0.8678   LearningRate 0.0002   Epoch: 19   Global Step: 96980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:29,299-Speed 3400.40 samples/sec   Loss 0.9483   LearningRate 0.0002   Epoch: 19   Global Step: 96990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:32,311-Speed 3400.98 samples/sec   Loss 0.9644   LearningRate 0.0002   Epoch: 19   Global Step: 97000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:35,333-Speed 3388.27 samples/sec   Loss 1.0240   LearningRate 0.0002   Epoch: 19   Global Step: 97010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:38,329-Speed 3419.92 samples/sec   Loss 0.9197   LearningRate 0.0002   Epoch: 19   Global Step: 97020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:41,344-Speed 3397.07 samples/sec   Loss 0.9874   LearningRate 0.0002   Epoch: 19   Global Step: 97030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:44,367-Speed 3388.07 samples/sec   Loss 0.9417   LearningRate 0.0002   Epoch: 19   Global Step: 97040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:47,383-Speed 3396.22 samples/sec   Loss 0.9329   LearningRate 0.0002   Epoch: 19   Global Step: 97050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:50,396-Speed 3399.06 samples/sec   Loss 0.8961   LearningRate 0.0002   Epoch: 19   Global Step: 97060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:53,421-Speed 3386.07 samples/sec   Loss 0.9989   LearningRate 0.0002   Epoch: 19   Global Step: 97070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:56,436-Speed 3396.43 samples/sec   Loss 0.9683   LearningRate 0.0002   Epoch: 19   Global Step: 97080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:15:59,456-Speed 3392.32 samples/sec   Loss 0.9939   LearningRate 0.0002   Epoch: 19   Global Step: 97090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:02,471-Speed 3397.40 samples/sec   Loss 0.9089   LearningRate 0.0002   Epoch: 19   Global Step: 97100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:05,493-Speed 3389.52 samples/sec   Loss 1.0183   LearningRate 0.0002   Epoch: 19   Global Step: 97110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:08,487-Speed 3421.51 samples/sec   Loss 1.0066   LearningRate 0.0002   Epoch: 19   Global Step: 97120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:11,505-Speed 3393.27 samples/sec   Loss 1.0003   LearningRate 0.0002   Epoch: 19   Global Step: 97130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:14,552-Speed 3361.78 samples/sec   Loss 0.9928   LearningRate 0.0002   Epoch: 19   Global Step: 97140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:17,564-Speed 3400.73 samples/sec   Loss 0.9115   LearningRate 0.0002   Epoch: 19   Global Step: 97150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:20,594-Speed 3380.35 samples/sec   Loss 0.9787   LearningRate 0.0002   Epoch: 19   Global Step: 97160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:23,610-Speed 3396.33 samples/sec   Loss 0.8785   LearningRate 0.0002   Epoch: 19   Global Step: 97170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:26,627-Speed 3394.46 samples/sec   Loss 0.9163   LearningRate 0.0002   Epoch: 19   Global Step: 97180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:29,639-Speed 3401.06 samples/sec   Loss 1.0230   LearningRate 0.0002   Epoch: 19   Global Step: 97190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:32,657-Speed 3393.99 samples/sec   Loss 0.9265   LearningRate 0.0002   Epoch: 19   Global Step: 97200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:35,670-Speed 3399.60 samples/sec   Loss 0.9268   LearningRate 0.0002   Epoch: 19   Global Step: 97210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:38,692-Speed 3389.64 samples/sec   Loss 0.9841   LearningRate 0.0002   Epoch: 19   Global Step: 97220   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-11 09:16:41,693-Speed 3412.76 samples/sec   Loss 0.9469   LearningRate 0.0002   Epoch: 19   Global Step: 97230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:44,716-Speed 3388.30 samples/sec   Loss 0.9163   LearningRate 0.0002   Epoch: 19   Global Step: 97240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:47,731-Speed 3396.92 samples/sec   Loss 0.8405   LearningRate 0.0001   Epoch: 19   Global Step: 97250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:50,753-Speed 3388.73 samples/sec   Loss 0.9545   LearningRate 0.0001   Epoch: 19   Global Step: 97260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:53,767-Speed 3399.31 samples/sec   Loss 0.8797   LearningRate 0.0001   Epoch: 19   Global Step: 97270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:56,789-Speed 3388.77 samples/sec   Loss 0.8880   LearningRate 0.0001   Epoch: 19   Global Step: 97280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:16:59,805-Speed 3396.84 samples/sec   Loss 0.9742   LearningRate 0.0001   Epoch: 19   Global Step: 97290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:17:02,849-Speed 3365.10 samples/sec   Loss 0.9640   LearningRate 0.0001   Epoch: 19   Global Step: 97300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:17:05,874-Speed 3385.27 samples/sec   Loss 0.9167   LearningRate 0.0001   Epoch: 19   Global Step: 97310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:17:08,891-Speed 3395.56 samples/sec   Loss 0.9266   LearningRate 0.0001   Epoch: 19   Global Step: 97320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:17:11,895-Speed 3409.39 samples/sec   Loss 1.0009   LearningRate 0.0001   Epoch: 19   Global Step: 97330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:17:14,907-Speed 3400.94 samples/sec   Loss 0.8761   LearningRate 0.0001   Epoch: 19   Global Step: 97340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:17:17,922-Speed 3396.91 samples/sec   Loss 0.9922   LearningRate 0.0001   Epoch: 19   Global Step: 97350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:17:20,936-Speed 3398.08 samples/sec   Loss 0.9534   LearningRate 0.0001   Epoch: 19   Global Step: 97360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:17:23,971-Speed 3375.46 samples/sec   Loss 0.9479   LearningRate 0.0001   Epoch: 19   Global Step: 97370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:17:27,027-Speed 3351.89 samples/sec   Loss 0.9020   LearningRate 0.0001   Epoch: 19   Global Step: 97380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:17:30,050-Speed 3387.91 samples/sec   Loss 0.8778   LearningRate 0.0001   Epoch: 19   Global Step: 97390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:17:33,069-Speed 3393.55 samples/sec   Loss 0.9756   LearningRate 0.0001   Epoch: 19   Global Step: 97400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:17:36,087-Speed 3393.03 samples/sec   Loss 0.9760   LearningRate 0.0001   Epoch: 19   Global Step: 97410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:17:39,099-Speed 3400.85 samples/sec   Loss 0.9176   LearningRate 0.0001   Epoch: 19   Global Step: 97420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:17:42,109-Speed 3403.15 samples/sec   Loss 0.9384   LearningRate 0.0001   Epoch: 19   Global Step: 97430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:17:45,125-Speed 3395.47 samples/sec   Loss 0.9558   LearningRate 0.0001   Epoch: 19   Global Step: 97440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:17:48,140-Speed 3397.32 samples/sec   Loss 0.9643   LearningRate 0.0001   Epoch: 19   Global Step: 97450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:17:51,160-Speed 3391.14 samples/sec   Loss 0.8869   LearningRate 0.0001   Epoch: 19   Global Step: 97460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:17:54,174-Speed 3398.42 samples/sec   Loss 0.9561   LearningRate 0.0001   Epoch: 19   Global Step: 97470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:17:57,186-Speed 3400.88 samples/sec   Loss 0.9285   LearningRate 0.0001   Epoch: 19   Global Step: 97480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:00,201-Speed 3397.44 samples/sec   Loss 0.9136   LearningRate 0.0001   Epoch: 19   Global Step: 97490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:03,218-Speed 3394.79 samples/sec   Loss 0.9799   LearningRate 0.0001   Epoch: 19   Global Step: 97500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:06,232-Speed 3398.97 samples/sec   Loss 0.8577   LearningRate 0.0001   Epoch: 19   Global Step: 97510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:09,251-Speed 3393.50 samples/sec   Loss 1.0183   LearningRate 0.0001   Epoch: 19   Global Step: 97520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:12,264-Speed 3398.40 samples/sec   Loss 0.9680   LearningRate 0.0001   Epoch: 19   Global Step: 97530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:15,283-Speed 3393.62 samples/sec   Loss 0.8319   LearningRate 0.0001   Epoch: 19   Global Step: 97540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:18,344-Speed 3346.28 samples/sec   Loss 0.8880   LearningRate 0.0001   Epoch: 19   Global Step: 97550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:21,360-Speed 3396.29 samples/sec   Loss 0.8991   LearningRate 0.0001   Epoch: 19   Global Step: 97560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:24,375-Speed 3396.91 samples/sec   Loss 0.9171   LearningRate 0.0001   Epoch: 19   Global Step: 97570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:27,389-Speed 3398.10 samples/sec   Loss 0.9749   LearningRate 0.0001   Epoch: 19   Global Step: 97580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:30,402-Speed 3399.91 samples/sec   Loss 0.9032   LearningRate 0.0001   Epoch: 19   Global Step: 97590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:33,417-Speed 3397.56 samples/sec   Loss 0.9170   LearningRate 0.0001   Epoch: 19   Global Step: 97600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:36,449-Speed 3377.52 samples/sec   Loss 0.9291   LearningRate 0.0001   Epoch: 19   Global Step: 97610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:39,518-Speed 3337.91 samples/sec   Loss 0.8684   LearningRate 0.0001   Epoch: 19   Global Step: 97620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:42,535-Speed 3394.64 samples/sec   Loss 0.9586   LearningRate 0.0001   Epoch: 19   Global Step: 97630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:45,558-Speed 3387.99 samples/sec   Loss 1.0078   LearningRate 0.0001   Epoch: 19   Global Step: 97640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:48,572-Speed 3398.64 samples/sec   Loss 0.9140   LearningRate 0.0001   Epoch: 19   Global Step: 97650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:51,596-Speed 3387.49 samples/sec   Loss 0.9323   LearningRate 0.0001   Epoch: 19   Global Step: 97660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:54,594-Speed 3416.38 samples/sec   Loss 0.9259   LearningRate 0.0001   Epoch: 19   Global Step: 97670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:18:57,593-Speed 3415.52 samples/sec   Loss 0.8716   LearningRate 0.0001   Epoch: 19   Global Step: 97680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:19:00,609-Speed 3396.20 samples/sec   Loss 0.9198   LearningRate 0.0001   Epoch: 19   Global Step: 97690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:19:03,650-Speed 3367.83 samples/sec   Loss 0.9470   LearningRate 0.0001   Epoch: 19   Global Step: 97700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:19:06,673-Speed 3388.36 samples/sec   Loss 0.9561   LearningRate 0.0001   Epoch: 19   Global Step: 97710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:19:09,694-Speed 3391.21 samples/sec   Loss 0.9151   LearningRate 0.0001   Epoch: 19   Global Step: 97720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:19:12,711-Speed 3395.20 samples/sec   Loss 0.9156   LearningRate 0.0001   Epoch: 19   Global Step: 97730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:19:15,745-Speed 3376.06 samples/sec   Loss 0.9707   LearningRate 0.0001   Epoch: 19   Global Step: 97740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:19:18,759-Speed 3400.12 samples/sec   Loss 0.9406   LearningRate 0.0001   Epoch: 19   Global Step: 97750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:19:21,788-Speed 3381.97 samples/sec   Loss 1.0063   LearningRate 0.0001   Epoch: 19   Global Step: 97760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:19:24,800-Speed 3400.99 samples/sec   Loss 0.9797   LearningRate 0.0001   Epoch: 19   Global Step: 97770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:19:27,820-Speed 3391.64 samples/sec   Loss 0.9540   LearningRate 0.0001   Epoch: 19   Global Step: 97780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:19:30,843-Speed 3388.06 samples/sec   Loss 0.9982   LearningRate 0.0001   Epoch: 19   Global Step: 97790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:19:33,870-Speed 3384.01 samples/sec   Loss 0.9542   LearningRate 0.0001   Epoch: 19   Global Step: 97800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:19:36,911-Speed 3367.14 samples/sec   Loss 0.8779   LearningRate 0.0001   Epoch: 19   Global Step: 97810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:19:39,935-Speed 3387.74 samples/sec   Loss 0.9703   LearningRate 0.0001   Epoch: 19   Global Step: 97820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:19:42,978-Speed 3365.57 samples/sec   Loss 0.9279   LearningRate 0.0001   Epoch: 19   Global Step: 97830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:19:45,990-Speed 3400.91 samples/sec   Loss 0.9863   LearningRate 0.0001   Epoch: 19   Global Step: 97840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:19:49,010-Speed 3391.46 samples/sec   Loss 1.0200   LearningRate 0.0001   Epoch: 19   Global Step: 97850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:19:52,046-Speed 3373.91 samples/sec   Loss 0.9333   LearningRate 0.0001   Epoch: 19   Global Step: 97860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:19:55,066-Speed 3392.05 samples/sec   Loss 0.9257   LearningRate 0.0001   Epoch: 19   Global Step: 97870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:19:58,064-Speed 3416.84 samples/sec   Loss 0.9376   LearningRate 0.0001   Epoch: 19   Global Step: 97880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:20:01,079-Speed 3396.26 samples/sec   Loss 0.9141   LearningRate 0.0001   Epoch: 19   Global Step: 97890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:20:04,103-Speed 3387.27 samples/sec   Loss 0.9369   LearningRate 0.0001   Epoch: 19   Global Step: 97900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:20:07,127-Speed 3387.75 samples/sec   Loss 0.9113   LearningRate 0.0001   Epoch: 19   Global Step: 97910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:20:10,144-Speed 3394.20 samples/sec   Loss 0.9176   LearningRate 0.0001   Epoch: 19   Global Step: 97920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:20:13,171-Speed 3383.63 samples/sec   Loss 0.9181   LearningRate 0.0001   Epoch: 19   Global Step: 97930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:20:16,220-Speed 3359.74 samples/sec   Loss 0.9541   LearningRate 0.0001   Epoch: 19   Global Step: 97940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:20:19,241-Speed 3391.35 samples/sec   Loss 0.9354   LearningRate 0.0001   Epoch: 19   Global Step: 97950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:20:22,256-Speed 3396.35 samples/sec   Loss 0.9377   LearningRate 0.0001   Epoch: 19   Global Step: 97960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:20:25,271-Speed 3397.46 samples/sec   Loss 0.9730   LearningRate 0.0001   Epoch: 19   Global Step: 97970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:20:28,287-Speed 3396.94 samples/sec   Loss 0.9343   LearningRate 0.0001   Epoch: 19   Global Step: 97980   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-11 09:20:31,282-Speed 3418.65 samples/sec   Loss 0.9609   LearningRate 0.0001   Epoch: 19   Global Step: 97990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:20:34,304-Speed 3390.47 samples/sec   Loss 0.9405   LearningRate 0.0001   Epoch: 19   Global Step: 98000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:21:18,614-[lfw][98000]XNorm: 22.098909
Training: 2022-04-11 09:21:18,615-[lfw][98000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 09:21:18,615-[lfw][98000]Accuracy-Highest: 0.99850
Training: 2022-04-11 09:22:10,011-[cfp_fp][98000]XNorm: 22.369260
Training: 2022-04-11 09:22:10,011-[cfp_fp][98000]Accuracy-Flip: 0.98986+-0.00573
Training: 2022-04-11 09:22:10,012-[cfp_fp][98000]Accuracy-Highest: 0.98986
Training: 2022-04-11 09:22:54,072-[agedb_30][98000]XNorm: 22.539158
Training: 2022-04-11 09:22:54,073-[agedb_30][98000]Accuracy-Flip: 0.98517+-0.00728
Training: 2022-04-11 09:22:54,073-[agedb_30][98000]Accuracy-Highest: 0.98550
Training: 2022-04-11 09:22:57,072-Speed 71.72 samples/sec   Loss 0.9238   LearningRate 0.0001   Epoch: 19   Global Step: 98010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:23:00,057-Speed 3431.97 samples/sec   Loss 0.9675   LearningRate 0.0001   Epoch: 19   Global Step: 98020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:23:03,105-Speed 3359.67 samples/sec   Loss 0.9578   LearningRate 0.0001   Epoch: 19   Global Step: 98030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:23:06,161-Speed 3352.48 samples/sec   Loss 0.9713   LearningRate 0.0001   Epoch: 19   Global Step: 98040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:23:09,145-Speed 3431.96 samples/sec   Loss 0.8549   LearningRate 0.0001   Epoch: 19   Global Step: 98050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:23:12,142-Speed 3418.17 samples/sec   Loss 0.9819   LearningRate 0.0001   Epoch: 19   Global Step: 98060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:23:15,144-Speed 3411.87 samples/sec   Loss 0.8641   LearningRate 0.0001   Epoch: 19   Global Step: 98070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:23:18,141-Speed 3417.25 samples/sec   Loss 0.8440   LearningRate 0.0001   Epoch: 19   Global Step: 98080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:23:21,134-Speed 3422.97 samples/sec   Loss 0.9525   LearningRate 0.0001   Epoch: 19   Global Step: 98090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:23:24,127-Speed 3422.70 samples/sec   Loss 0.9672   LearningRate 0.0001   Epoch: 19   Global Step: 98100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:23:27,130-Speed 3410.09 samples/sec   Loss 0.9190   LearningRate 0.0001   Epoch: 19   Global Step: 98110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:23:30,131-Speed 3413.71 samples/sec   Loss 0.9625   LearningRate 0.0001   Epoch: 19   Global Step: 98120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:23:33,129-Speed 3416.79 samples/sec   Loss 0.9255   LearningRate 0.0001   Epoch: 19   Global Step: 98130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:23:36,129-Speed 3413.92 samples/sec   Loss 0.8783   LearningRate 0.0001   Epoch: 19   Global Step: 98140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:23:39,130-Speed 3412.94 samples/sec   Loss 0.9323   LearningRate 0.0001   Epoch: 19   Global Step: 98150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:23:42,132-Speed 3413.25 samples/sec   Loss 0.9394   LearningRate 0.0001   Epoch: 19   Global Step: 98160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:23:45,141-Speed 3403.79 samples/sec   Loss 0.9603   LearningRate 0.0001   Epoch: 19   Global Step: 98170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:23:48,142-Speed 3412.01 samples/sec   Loss 0.9285   LearningRate 0.0001   Epoch: 19   Global Step: 98180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:23:51,128-Speed 3430.40 samples/sec   Loss 0.9461   LearningRate 0.0001   Epoch: 19   Global Step: 98190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:23:54,136-Speed 3404.90 samples/sec   Loss 0.9514   LearningRate 0.0001   Epoch: 19   Global Step: 98200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:23:57,144-Speed 3407.50 samples/sec   Loss 0.9445   LearningRate 0.0001   Epoch: 19   Global Step: 98210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:24:00,148-Speed 3408.91 samples/sec   Loss 1.0286   LearningRate 0.0001   Epoch: 19   Global Step: 98220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:24:03,154-Speed 3406.88 samples/sec   Loss 0.8892   LearningRate 0.0001   Epoch: 19   Global Step: 98230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:24:06,188-Speed 3376.34 samples/sec   Loss 0.9331   LearningRate 0.0001   Epoch: 19   Global Step: 98240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:24:09,196-Speed 3405.96 samples/sec   Loss 0.9671   LearningRate 0.0001   Epoch: 19   Global Step: 98250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:24:12,229-Speed 3377.12 samples/sec   Loss 0.9529   LearningRate 0.0001   Epoch: 19   Global Step: 98260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:24:15,367-Speed 3263.76 samples/sec   Loss 0.9886   LearningRate 0.0001   Epoch: 19   Global Step: 98270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:24:18,385-Speed 3394.07 samples/sec   Loss 0.9573   LearningRate 0.0001   Epoch: 19   Global Step: 98280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:24:21,396-Speed 3402.06 samples/sec   Loss 0.9735   LearningRate 0.0001   Epoch: 19   Global Step: 98290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:24:24,401-Speed 3407.73 samples/sec   Loss 0.9160   LearningRate 0.0001   Epoch: 19   Global Step: 98300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:24:27,478-Speed 3328.89 samples/sec   Loss 0.8619   LearningRate 0.0001   Epoch: 19   Global Step: 98310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:24:30,491-Speed 3400.23 samples/sec   Loss 0.9473   LearningRate 0.0001   Epoch: 19   Global Step: 98320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:24:33,501-Speed 3402.15 samples/sec   Loss 0.9588   LearningRate 0.0001   Epoch: 19   Global Step: 98330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:24:36,547-Speed 3362.93 samples/sec   Loss 0.8945   LearningRate 0.0001   Epoch: 19   Global Step: 98340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:24:39,558-Speed 3402.31 samples/sec   Loss 0.9011   LearningRate 0.0001   Epoch: 19   Global Step: 98350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:24:42,577-Speed 3392.43 samples/sec   Loss 0.9349   LearningRate 0.0001   Epoch: 19   Global Step: 98360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:24:45,590-Speed 3399.20 samples/sec   Loss 0.8741   LearningRate 0.0001   Epoch: 19   Global Step: 98370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:24:48,620-Speed 3380.45 samples/sec   Loss 1.0406   LearningRate 0.0001   Epoch: 19   Global Step: 98380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:24:51,645-Speed 3386.33 samples/sec   Loss 0.9111   LearningRate 0.0001   Epoch: 19   Global Step: 98390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:24:54,679-Speed 3375.80 samples/sec   Loss 1.0282   LearningRate 0.0001   Epoch: 19   Global Step: 98400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:24:57,694-Speed 3397.48 samples/sec   Loss 0.9186   LearningRate 0.0001   Epoch: 19   Global Step: 98410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:25:00,702-Speed 3404.54 samples/sec   Loss 0.9206   LearningRate 0.0001   Epoch: 19   Global Step: 98420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:25:03,791-Speed 3316.12 samples/sec   Loss 0.9057   LearningRate 0.0001   Epoch: 19   Global Step: 98430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:25:06,901-Speed 3293.76 samples/sec   Loss 0.9789   LearningRate 0.0001   Epoch: 19   Global Step: 98440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:25:09,919-Speed 3394.06 samples/sec   Loss 0.9632   LearningRate 0.0001   Epoch: 19   Global Step: 98450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:25:12,930-Speed 3401.46 samples/sec   Loss 1.0140   LearningRate 0.0001   Epoch: 19   Global Step: 98460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:25:15,946-Speed 3396.89 samples/sec   Loss 1.0269   LearningRate 0.0001   Epoch: 19   Global Step: 98470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:25:18,964-Speed 3393.22 samples/sec   Loss 0.9304   LearningRate 0.0001   Epoch: 19   Global Step: 98480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:25:21,966-Speed 3411.97 samples/sec   Loss 0.8376   LearningRate 0.0001   Epoch: 19   Global Step: 98490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:25:24,973-Speed 3405.74 samples/sec   Loss 0.9310   LearningRate 0.0001   Epoch: 19   Global Step: 98500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:25:27,980-Speed 3407.01 samples/sec   Loss 0.9430   LearningRate 0.0001   Epoch: 19   Global Step: 98510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:25:30,989-Speed 3403.82 samples/sec   Loss 0.9041   LearningRate 0.0001   Epoch: 19   Global Step: 98520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:25:34,002-Speed 3398.69 samples/sec   Loss 1.0325   LearningRate 0.0001   Epoch: 19   Global Step: 98530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:25:37,007-Speed 3409.96 samples/sec   Loss 0.9460   LearningRate 0.0001   Epoch: 19   Global Step: 98540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:25:40,021-Speed 3398.13 samples/sec   Loss 0.9158   LearningRate 0.0001   Epoch: 19   Global Step: 98550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:25:43,025-Speed 3409.25 samples/sec   Loss 0.9193   LearningRate 0.0001   Epoch: 19   Global Step: 98560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:25:46,032-Speed 3406.64 samples/sec   Loss 0.8878   LearningRate 0.0001   Epoch: 19   Global Step: 98570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:25:49,041-Speed 3403.93 samples/sec   Loss 1.0525   LearningRate 0.0001   Epoch: 19   Global Step: 98580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:25:52,050-Speed 3404.91 samples/sec   Loss 0.9010   LearningRate 0.0001   Epoch: 19   Global Step: 98590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:25:55,121-Speed 3335.59 samples/sec   Loss 0.9888   LearningRate 0.0001   Epoch: 19   Global Step: 98600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:25:58,133-Speed 3400.59 samples/sec   Loss 0.9210   LearningRate 0.0001   Epoch: 19   Global Step: 98610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:26:01,119-Speed 3430.27 samples/sec   Loss 0.9362   LearningRate 0.0001   Epoch: 19   Global Step: 98620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:26:04,131-Speed 3399.72 samples/sec   Loss 0.9207   LearningRate 0.0001   Epoch: 19   Global Step: 98630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:26:07,147-Speed 3396.68 samples/sec   Loss 1.0382   LearningRate 0.0001   Epoch: 19   Global Step: 98640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:26:10,160-Speed 3400.54 samples/sec   Loss 1.0041   LearningRate 0.0001   Epoch: 19   Global Step: 98650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:26:13,214-Speed 3353.27 samples/sec   Loss 0.8792   LearningRate 0.0001   Epoch: 19   Global Step: 98660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:26:16,224-Speed 3403.02 samples/sec   Loss 0.9341   LearningRate 0.0001   Epoch: 19   Global Step: 98670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:26:19,236-Speed 3400.03 samples/sec   Loss 0.9655   LearningRate 0.0001   Epoch: 19   Global Step: 98680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:26:22,242-Speed 3408.15 samples/sec   Loss 0.9783   LearningRate 0.0001   Epoch: 19   Global Step: 98690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:26:25,297-Speed 3351.89 samples/sec   Loss 0.9373   LearningRate 0.0001   Epoch: 19   Global Step: 98700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:26:28,310-Speed 3400.16 samples/sec   Loss 0.9283   LearningRate 0.0001   Epoch: 19   Global Step: 98710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:26:31,319-Speed 3403.76 samples/sec   Loss 0.9363   LearningRate 0.0001   Epoch: 19   Global Step: 98720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:26:34,334-Speed 3397.45 samples/sec   Loss 1.0680   LearningRate 0.0001   Epoch: 19   Global Step: 98730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:26:37,349-Speed 3397.96 samples/sec   Loss 0.8784   LearningRate 0.0001   Epoch: 19   Global Step: 98740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:26:40,386-Speed 3372.21 samples/sec   Loss 1.0082   LearningRate 0.0001   Epoch: 19   Global Step: 98750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:26:43,395-Speed 3403.35 samples/sec   Loss 0.9346   LearningRate 0.0001   Epoch: 19   Global Step: 98760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:26:46,403-Speed 3405.79 samples/sec   Loss 0.9442   LearningRate 0.0001   Epoch: 19   Global Step: 98770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:26:49,427-Speed 3386.89 samples/sec   Loss 0.9642   LearningRate 0.0001   Epoch: 19   Global Step: 98780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:26:52,439-Speed 3400.21 samples/sec   Loss 0.9331   LearningRate 0.0001   Epoch: 19   Global Step: 98790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:26:55,450-Speed 3401.97 samples/sec   Loss 0.9039   LearningRate 0.0001   Epoch: 19   Global Step: 98800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:26:58,463-Speed 3398.96 samples/sec   Loss 0.9513   LearningRate 0.0001   Epoch: 19   Global Step: 98810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:27:01,484-Speed 3391.35 samples/sec   Loss 0.9288   LearningRate 0.0001   Epoch: 19   Global Step: 98820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:27:04,498-Speed 3397.85 samples/sec   Loss 0.9680   LearningRate 0.0001   Epoch: 19   Global Step: 98830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:27:07,522-Speed 3387.12 samples/sec   Loss 0.8823   LearningRate 0.0001   Epoch: 19   Global Step: 98840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:27:10,548-Speed 3385.29 samples/sec   Loss 0.9618   LearningRate 0.0001   Epoch: 19   Global Step: 98850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:27:13,566-Speed 3393.64 samples/sec   Loss 0.9393   LearningRate 0.0001   Epoch: 19   Global Step: 98860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:27:16,559-Speed 3422.65 samples/sec   Loss 1.0388   LearningRate 0.0001   Epoch: 19   Global Step: 98870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:27:19,600-Speed 3368.41 samples/sec   Loss 0.9560   LearningRate 0.0001   Epoch: 19   Global Step: 98880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:27:22,616-Speed 3395.41 samples/sec   Loss 0.8986   LearningRate 0.0001   Epoch: 19   Global Step: 98890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:27:25,624-Speed 3404.99 samples/sec   Loss 0.9668   LearningRate 0.0000   Epoch: 19   Global Step: 98900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:27:28,649-Speed 3386.17 samples/sec   Loss 0.9684   LearningRate 0.0000   Epoch: 19   Global Step: 98910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:27:31,668-Speed 3392.72 samples/sec   Loss 0.9299   LearningRate 0.0000   Epoch: 19   Global Step: 98920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:27:34,676-Speed 3405.16 samples/sec   Loss 0.9246   LearningRate 0.0000   Epoch: 19   Global Step: 98930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:27:37,700-Speed 3387.66 samples/sec   Loss 0.9591   LearningRate 0.0000   Epoch: 19   Global Step: 98940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:27:40,713-Speed 3398.95 samples/sec   Loss 0.9892   LearningRate 0.0000   Epoch: 19   Global Step: 98950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:27:43,723-Speed 3403.52 samples/sec   Loss 0.9366   LearningRate 0.0000   Epoch: 19   Global Step: 98960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:27:46,742-Speed 3392.55 samples/sec   Loss 0.9409   LearningRate 0.0000   Epoch: 19   Global Step: 98970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:27:49,783-Speed 3367.87 samples/sec   Loss 0.9604   LearningRate 0.0000   Epoch: 19   Global Step: 98980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:27:52,799-Speed 3396.26 samples/sec   Loss 0.9496   LearningRate 0.0000   Epoch: 19   Global Step: 98990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:27:55,814-Speed 3397.57 samples/sec   Loss 1.0157   LearningRate 0.0000   Epoch: 19   Global Step: 99000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:27:58,824-Speed 3402.86 samples/sec   Loss 0.9738   LearningRate 0.0000   Epoch: 19   Global Step: 99010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:01,839-Speed 3396.89 samples/sec   Loss 0.9773   LearningRate 0.0000   Epoch: 19   Global Step: 99020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:04,860-Speed 3391.19 samples/sec   Loss 0.9722   LearningRate 0.0000   Epoch: 19   Global Step: 99030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:07,882-Speed 3389.46 samples/sec   Loss 0.9405   LearningRate 0.0000   Epoch: 19   Global Step: 99040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:10,893-Speed 3401.34 samples/sec   Loss 0.9069   LearningRate 0.0000   Epoch: 19   Global Step: 99050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:13,926-Speed 3377.53 samples/sec   Loss 0.9973   LearningRate 0.0000   Epoch: 19   Global Step: 99060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:16,962-Speed 3373.73 samples/sec   Loss 0.9320   LearningRate 0.0000   Epoch: 19   Global Step: 99070   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-11 09:28:19,968-Speed 3406.86 samples/sec   Loss 0.9102   LearningRate 0.0000   Epoch: 19   Global Step: 99080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:22,980-Speed 3400.46 samples/sec   Loss 1.0029   LearningRate 0.0000   Epoch: 19   Global Step: 99090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:26,002-Speed 3389.83 samples/sec   Loss 0.9574   LearningRate 0.0000   Epoch: 19   Global Step: 99100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:29,023-Speed 3389.81 samples/sec   Loss 0.9679   LearningRate 0.0000   Epoch: 19   Global Step: 99110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:32,040-Speed 3396.07 samples/sec   Loss 0.9062   LearningRate 0.0000   Epoch: 19   Global Step: 99120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:35,076-Speed 3373.82 samples/sec   Loss 0.9821   LearningRate 0.0000   Epoch: 19   Global Step: 99130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:38,136-Speed 3346.97 samples/sec   Loss 0.8795   LearningRate 0.0000   Epoch: 19   Global Step: 99140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:41,193-Speed 3350.43 samples/sec   Loss 0.8960   LearningRate 0.0000   Epoch: 19   Global Step: 99150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:44,228-Speed 3375.19 samples/sec   Loss 0.8948   LearningRate 0.0000   Epoch: 19   Global Step: 99160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:47,252-Speed 3387.00 samples/sec   Loss 0.9516   LearningRate 0.0000   Epoch: 19   Global Step: 99170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:50,247-Speed 3419.77 samples/sec   Loss 0.9877   LearningRate 0.0000   Epoch: 19   Global Step: 99180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:53,286-Speed 3370.16 samples/sec   Loss 0.9073   LearningRate 0.0000   Epoch: 19   Global Step: 99190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:56,331-Speed 3364.00 samples/sec   Loss 0.8788   LearningRate 0.0000   Epoch: 19   Global Step: 99200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:28:59,345-Speed 3398.48 samples/sec   Loss 0.9206   LearningRate 0.0000   Epoch: 19   Global Step: 99210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:29:02,382-Speed 3373.17 samples/sec   Loss 0.9614   LearningRate 0.0000   Epoch: 19   Global Step: 99220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:29:05,458-Speed 3330.32 samples/sec   Loss 0.9602   LearningRate 0.0000   Epoch: 19   Global Step: 99230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:29:08,502-Speed 3364.33 samples/sec   Loss 0.8629   LearningRate 0.0000   Epoch: 19   Global Step: 99240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:29:11,525-Speed 3388.22 samples/sec   Loss 0.8969   LearningRate 0.0000   Epoch: 19   Global Step: 99250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:29:14,606-Speed 3324.37 samples/sec   Loss 0.9541   LearningRate 0.0000   Epoch: 19   Global Step: 99260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:29:17,618-Speed 3401.25 samples/sec   Loss 0.9980   LearningRate 0.0000   Epoch: 19   Global Step: 99270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:29:20,630-Speed 3399.62 samples/sec   Loss 0.9281   LearningRate 0.0000   Epoch: 19   Global Step: 99280   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-11 09:29:23,627-Speed 3417.60 samples/sec   Loss 0.8976   LearningRate 0.0000   Epoch: 19   Global Step: 99290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:29:26,642-Speed 3398.40 samples/sec   Loss 0.9155   LearningRate 0.0000   Epoch: 19   Global Step: 99300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:29:29,660-Speed 3392.98 samples/sec   Loss 0.8570   LearningRate 0.0000   Epoch: 19   Global Step: 99310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:29:32,681-Speed 3391.63 samples/sec   Loss 0.9645   LearningRate 0.0000   Epoch: 19   Global Step: 99320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:29:35,701-Speed 3390.90 samples/sec   Loss 0.9438   LearningRate 0.0000   Epoch: 19   Global Step: 99330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:29:38,774-Speed 3332.94 samples/sec   Loss 0.9603   LearningRate 0.0000   Epoch: 19   Global Step: 99340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:29:41,815-Speed 3367.99 samples/sec   Loss 0.9515   LearningRate 0.0000   Epoch: 19   Global Step: 99350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:29:44,821-Speed 3407.28 samples/sec   Loss 0.9083   LearningRate 0.0000   Epoch: 19   Global Step: 99360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:29:47,860-Speed 3371.77 samples/sec   Loss 0.9976   LearningRate 0.0000   Epoch: 19   Global Step: 99370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:29:50,900-Speed 3369.28 samples/sec   Loss 0.9275   LearningRate 0.0000   Epoch: 19   Global Step: 99380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:29:53,924-Speed 3387.65 samples/sec   Loss 0.9560   LearningRate 0.0000   Epoch: 19   Global Step: 99390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:29:56,940-Speed 3396.70 samples/sec   Loss 0.9439   LearningRate 0.0000   Epoch: 19   Global Step: 99400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:29:59,958-Speed 3393.61 samples/sec   Loss 0.9191   LearningRate 0.0000   Epoch: 19   Global Step: 99410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:30:02,996-Speed 3370.87 samples/sec   Loss 0.9222   LearningRate 0.0000   Epoch: 19   Global Step: 99420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:30:06,039-Speed 3366.56 samples/sec   Loss 0.8746   LearningRate 0.0000   Epoch: 19   Global Step: 99430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:30:09,055-Speed 3396.06 samples/sec   Loss 0.9368   LearningRate 0.0000   Epoch: 19   Global Step: 99440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:30:12,081-Speed 3384.62 samples/sec   Loss 0.9040   LearningRate 0.0000   Epoch: 19   Global Step: 99450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:30:15,115-Speed 3375.72 samples/sec   Loss 0.9662   LearningRate 0.0000   Epoch: 19   Global Step: 99460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:30:18,134-Speed 3392.41 samples/sec   Loss 0.9499   LearningRate 0.0000   Epoch: 19   Global Step: 99470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:30:21,152-Speed 3394.55 samples/sec   Loss 0.9763   LearningRate 0.0000   Epoch: 19   Global Step: 99480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:30:24,161-Speed 3403.52 samples/sec   Loss 0.9598   LearningRate 0.0000   Epoch: 19   Global Step: 99490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:30:27,181-Speed 3392.46 samples/sec   Loss 0.9027   LearningRate 0.0000   Epoch: 19   Global Step: 99500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:30:30,191-Speed 3402.06 samples/sec   Loss 0.9501   LearningRate 0.0000   Epoch: 19   Global Step: 99510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:30:33,204-Speed 3399.21 samples/sec   Loss 1.0037   LearningRate 0.0000   Epoch: 19   Global Step: 99520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:30:36,238-Speed 3376.39 samples/sec   Loss 0.9843   LearningRate 0.0000   Epoch: 19   Global Step: 99530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:30:39,298-Speed 3347.27 samples/sec   Loss 0.9192   LearningRate 0.0000   Epoch: 19   Global Step: 99540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:30:42,315-Speed 3395.05 samples/sec   Loss 0.9427   LearningRate 0.0000   Epoch: 19   Global Step: 99550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:30:45,307-Speed 3423.06 samples/sec   Loss 0.9321   LearningRate 0.0000   Epoch: 19   Global Step: 99560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:30:48,334-Speed 3384.39 samples/sec   Loss 0.9552   LearningRate 0.0000   Epoch: 19   Global Step: 99570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:30:51,357-Speed 3388.18 samples/sec   Loss 0.9047   LearningRate 0.0000   Epoch: 19   Global Step: 99580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:30:54,381-Speed 3387.31 samples/sec   Loss 0.9388   LearningRate 0.0000   Epoch: 19   Global Step: 99590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:30:57,398-Speed 3394.58 samples/sec   Loss 0.9349   LearningRate 0.0000   Epoch: 19   Global Step: 99600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:00,502-Speed 3300.28 samples/sec   Loss 0.9102   LearningRate 0.0000   Epoch: 19   Global Step: 99610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:03,576-Speed 3331.67 samples/sec   Loss 1.0125   LearningRate 0.0000   Epoch: 19   Global Step: 99620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:06,679-Speed 3300.48 samples/sec   Loss 0.9832   LearningRate 0.0000   Epoch: 19   Global Step: 99630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:09,696-Speed 3395.39 samples/sec   Loss 0.9321   LearningRate 0.0000   Epoch: 19   Global Step: 99640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:12,708-Speed 3401.20 samples/sec   Loss 0.9047   LearningRate 0.0000   Epoch: 19   Global Step: 99650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:15,708-Speed 3414.17 samples/sec   Loss 0.9448   LearningRate 0.0000   Epoch: 19   Global Step: 99660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:18,728-Speed 3390.88 samples/sec   Loss 1.0237   LearningRate 0.0000   Epoch: 19   Global Step: 99670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:21,740-Speed 3401.49 samples/sec   Loss 0.9390   LearningRate 0.0000   Epoch: 19   Global Step: 99680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:24,756-Speed 3395.78 samples/sec   Loss 0.9053   LearningRate 0.0000   Epoch: 19   Global Step: 99690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:27,775-Speed 3393.30 samples/sec   Loss 0.9115   LearningRate 0.0000   Epoch: 19   Global Step: 99700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:30,794-Speed 3392.14 samples/sec   Loss 0.9839   LearningRate 0.0000   Epoch: 19   Global Step: 99710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:33,810-Speed 3395.38 samples/sec   Loss 0.9256   LearningRate 0.0000   Epoch: 19   Global Step: 99720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:36,826-Speed 3397.14 samples/sec   Loss 0.9657   LearningRate 0.0000   Epoch: 19   Global Step: 99730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:39,838-Speed 3400.43 samples/sec   Loss 0.9455   LearningRate 0.0000   Epoch: 19   Global Step: 99740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:42,853-Speed 3397.39 samples/sec   Loss 0.8892   LearningRate 0.0000   Epoch: 19   Global Step: 99750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:45,870-Speed 3395.38 samples/sec   Loss 0.8274   LearningRate 0.0000   Epoch: 19   Global Step: 99760   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-11 09:31:48,868-Speed 3416.23 samples/sec   Loss 0.9060   LearningRate 0.0000   Epoch: 19   Global Step: 99770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:51,883-Speed 3397.76 samples/sec   Loss 0.9444   LearningRate 0.0000   Epoch: 19   Global Step: 99780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:54,921-Speed 3371.37 samples/sec   Loss 0.9703   LearningRate 0.0000   Epoch: 19   Global Step: 99790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:31:57,941-Speed 3391.11 samples/sec   Loss 0.9192   LearningRate 0.0000   Epoch: 19   Global Step: 99800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:32:00,958-Speed 3395.14 samples/sec   Loss 0.9567   LearningRate 0.0000   Epoch: 19   Global Step: 99810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:32:03,977-Speed 3392.64 samples/sec   Loss 0.9253   LearningRate 0.0000   Epoch: 19   Global Step: 99820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:32:06,996-Speed 3392.81 samples/sec   Loss 0.9064   LearningRate 0.0000   Epoch: 19   Global Step: 99830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:32:10,009-Speed 3400.38 samples/sec   Loss 0.9763   LearningRate 0.0000   Epoch: 19   Global Step: 99840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:32:13,027-Speed 3394.13 samples/sec   Loss 0.8494   LearningRate 0.0000   Epoch: 19   Global Step: 99850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:32:16,041-Speed 3398.73 samples/sec   Loss 0.9918   LearningRate 0.0000   Epoch: 19   Global Step: 99860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:32:19,046-Speed 3408.31 samples/sec   Loss 0.9489   LearningRate 0.0000   Epoch: 19   Global Step: 99870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:32:22,053-Speed 3406.05 samples/sec   Loss 0.8741   LearningRate 0.0000   Epoch: 19   Global Step: 99880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:32:25,066-Speed 3398.99 samples/sec   Loss 0.9388   LearningRate 0.0000   Epoch: 19   Global Step: 99890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:32:28,081-Speed 3397.71 samples/sec   Loss 0.9625   LearningRate 0.0000   Epoch: 19   Global Step: 99900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:32:31,093-Speed 3400.61 samples/sec   Loss 0.9836   LearningRate 0.0000   Epoch: 19   Global Step: 99910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:32:34,086-Speed 3421.94 samples/sec   Loss 1.0179   LearningRate 0.0000   Epoch: 19   Global Step: 99920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:32:37,101-Speed 3397.47 samples/sec   Loss 0.9251   LearningRate 0.0000   Epoch: 19   Global Step: 99930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:32:40,128-Speed 3384.03 samples/sec   Loss 0.9257   LearningRate 0.0000   Epoch: 19   Global Step: 99940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:32:43,141-Speed 3399.40 samples/sec   Loss 0.9082   LearningRate 0.0000   Epoch: 19   Global Step: 99950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:32:46,159-Speed 3394.50 samples/sec   Loss 0.8859   LearningRate 0.0000   Epoch: 19   Global Step: 99960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:32:49,172-Speed 3398.52 samples/sec   Loss 0.9825   LearningRate 0.0000   Epoch: 19   Global Step: 99970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:32:52,197-Speed 3386.98 samples/sec   Loss 0.9582   LearningRate 0.0000   Epoch: 19   Global Step: 99980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:32:55,211-Speed 3398.07 samples/sec   Loss 0.8966   LearningRate 0.0000   Epoch: 19   Global Step: 99990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:32:58,234-Speed 3388.10 samples/sec   Loss 0.9836   LearningRate 0.0000   Epoch: 19   Global Step: 100000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:33:42,435-[lfw][100000]XNorm: 22.076277
Training: 2022-04-11 09:33:42,435-[lfw][100000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 09:33:42,436-[lfw][100000]Accuracy-Highest: 0.99850
Training: 2022-04-11 09:34:33,652-[cfp_fp][100000]XNorm: 22.350055
Training: 2022-04-11 09:34:33,653-[cfp_fp][100000]Accuracy-Flip: 0.98929+-0.00547
Training: 2022-04-11 09:34:33,653-[cfp_fp][100000]Accuracy-Highest: 0.98986
Training: 2022-04-11 09:35:17,945-[agedb_30][100000]XNorm: 22.507673
Training: 2022-04-11 09:35:17,945-[agedb_30][100000]Accuracy-Flip: 0.98550+-0.00683
Training: 2022-04-11 09:35:17,946-[agedb_30][100000]Accuracy-Highest: 0.98550
Training: 2022-04-11 09:35:20,950-Speed 71.75 samples/sec   Loss 0.9284   LearningRate 0.0000   Epoch: 19   Global Step: 100010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:35:24,000-Speed 3358.19 samples/sec   Loss 0.9416   LearningRate 0.0000   Epoch: 19   Global Step: 100020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:35:27,028-Speed 3382.33 samples/sec   Loss 0.9557   LearningRate 0.0000   Epoch: 19   Global Step: 100030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:35:30,042-Speed 3398.70 samples/sec   Loss 1.0298   LearningRate 0.0000   Epoch: 19   Global Step: 100040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:35:33,038-Speed 3419.55 samples/sec   Loss 0.9190   LearningRate 0.0000   Epoch: 19   Global Step: 100050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:35:36,041-Speed 3410.41 samples/sec   Loss 0.8940   LearningRate 0.0000   Epoch: 19   Global Step: 100060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:35:39,056-Speed 3397.62 samples/sec   Loss 0.9254   LearningRate 0.0000   Epoch: 19   Global Step: 100070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:35:42,059-Speed 3411.11 samples/sec   Loss 0.9074   LearningRate 0.0000   Epoch: 19   Global Step: 100080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:35:45,059-Speed 3414.36 samples/sec   Loss 0.9086   LearningRate 0.0000   Epoch: 19   Global Step: 100090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:35:48,077-Speed 3393.72 samples/sec   Loss 0.9233   LearningRate 0.0000   Epoch: 19   Global Step: 100100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:35:51,079-Speed 3411.84 samples/sec   Loss 0.9681   LearningRate 0.0000   Epoch: 19   Global Step: 100110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:35:54,083-Speed 3409.30 samples/sec   Loss 1.0167   LearningRate 0.0000   Epoch: 19   Global Step: 100120   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-11 09:35:57,064-Speed 3436.40 samples/sec   Loss 1.0122   LearningRate 0.0000   Epoch: 19   Global Step: 100130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:36:00,067-Speed 3411.36 samples/sec   Loss 0.9070   LearningRate 0.0000   Epoch: 19   Global Step: 100140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:36:03,068-Speed 3412.95 samples/sec   Loss 0.9090   LearningRate 0.0000   Epoch: 19   Global Step: 100150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:36:06,083-Speed 3396.57 samples/sec   Loss 0.9611   LearningRate 0.0000   Epoch: 19   Global Step: 100160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:36:09,094-Speed 3402.42 samples/sec   Loss 0.9036   LearningRate 0.0000   Epoch: 19   Global Step: 100170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:36:12,109-Speed 3397.51 samples/sec   Loss 0.9481   LearningRate 0.0000   Epoch: 19   Global Step: 100180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:36:15,115-Speed 3406.89 samples/sec   Loss 1.0183   LearningRate 0.0000   Epoch: 19   Global Step: 100190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:36:18,097-Speed 3434.57 samples/sec   Loss 0.9669   LearningRate 0.0000   Epoch: 19   Global Step: 100200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:36:21,098-Speed 3413.10 samples/sec   Loss 0.9341   LearningRate 0.0000   Epoch: 19   Global Step: 100210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:36:24,149-Speed 3356.87 samples/sec   Loss 0.9040   LearningRate 0.0000   Epoch: 19   Global Step: 100220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:36:27,160-Speed 3401.97 samples/sec   Loss 0.9640   LearningRate 0.0000   Epoch: 19   Global Step: 100230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:36:30,178-Speed 3394.58 samples/sec   Loss 0.9974   LearningRate 0.0000   Epoch: 19   Global Step: 100240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:36:33,178-Speed 3414.44 samples/sec   Loss 0.9363   LearningRate 0.0000   Epoch: 19   Global Step: 100250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:36:36,181-Speed 3409.81 samples/sec   Loss 0.9194   LearningRate 0.0000   Epoch: 19   Global Step: 100260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:36:39,190-Speed 3404.89 samples/sec   Loss 0.9072   LearningRate 0.0000   Epoch: 19   Global Step: 100270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:36:42,193-Speed 3410.56 samples/sec   Loss 0.9098   LearningRate 0.0000   Epoch: 19   Global Step: 100280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:36:45,197-Speed 3409.67 samples/sec   Loss 0.9301   LearningRate 0.0000   Epoch: 19   Global Step: 100290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:36:48,207-Speed 3402.93 samples/sec   Loss 0.9465   LearningRate 0.0000   Epoch: 19   Global Step: 100300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:36:51,217-Speed 3402.57 samples/sec   Loss 0.8730   LearningRate 0.0000   Epoch: 19   Global Step: 100310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:36:54,221-Speed 3410.82 samples/sec   Loss 0.9577   LearningRate 0.0000   Epoch: 19   Global Step: 100320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:36:57,225-Speed 3409.43 samples/sec   Loss 1.0226   LearningRate 0.0000   Epoch: 19   Global Step: 100330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:37:00,228-Speed 3411.79 samples/sec   Loss 0.8721   LearningRate 0.0000   Epoch: 19   Global Step: 100340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:37:03,243-Speed 3396.82 samples/sec   Loss 0.9601   LearningRate 0.0000   Epoch: 19   Global Step: 100350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:37:06,255-Speed 3400.34 samples/sec   Loss 0.9460   LearningRate 0.0000   Epoch: 19   Global Step: 100360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:37:09,275-Speed 3391.35 samples/sec   Loss 0.9599   LearningRate 0.0000   Epoch: 19   Global Step: 100370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:37:12,285-Speed 3403.24 samples/sec   Loss 1.0474   LearningRate 0.0000   Epoch: 19   Global Step: 100380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:37:15,296-Speed 3401.66 samples/sec   Loss 0.9770   LearningRate 0.0000   Epoch: 19   Global Step: 100390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:37:18,303-Speed 3406.73 samples/sec   Loss 0.9273   LearningRate 0.0000   Epoch: 19   Global Step: 100400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:37:21,309-Speed 3407.05 samples/sec   Loss 0.8961   LearningRate 0.0000   Epoch: 19   Global Step: 100410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:37:24,313-Speed 3409.46 samples/sec   Loss 0.9129   LearningRate 0.0000   Epoch: 19   Global Step: 100420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:37:27,330-Speed 3395.39 samples/sec   Loss 0.9632   LearningRate 0.0000   Epoch: 19   Global Step: 100430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:37:30,324-Speed 3421.51 samples/sec   Loss 0.9396   LearningRate 0.0000   Epoch: 19   Global Step: 100440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:37:33,334-Speed 3402.50 samples/sec   Loss 0.9458   LearningRate 0.0000   Epoch: 19   Global Step: 100450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:37:36,343-Speed 3403.54 samples/sec   Loss 0.9397   LearningRate 0.0000   Epoch: 19   Global Step: 100460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:37:39,350-Speed 3406.57 samples/sec   Loss 1.0101   LearningRate 0.0000   Epoch: 19   Global Step: 100470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:37:42,357-Speed 3406.13 samples/sec   Loss 0.8731   LearningRate 0.0000   Epoch: 19   Global Step: 100480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:37:45,363-Speed 3407.33 samples/sec   Loss 0.8953   LearningRate 0.0000   Epoch: 19   Global Step: 100490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:37:48,371-Speed 3405.97 samples/sec   Loss 0.8728   LearningRate 0.0000   Epoch: 19   Global Step: 100500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:37:51,393-Speed 3389.25 samples/sec   Loss 0.9537   LearningRate 0.0000   Epoch: 19   Global Step: 100510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:37:54,420-Speed 3383.65 samples/sec   Loss 0.9204   LearningRate 0.0000   Epoch: 19   Global Step: 100520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:37:57,428-Speed 3405.03 samples/sec   Loss 0.8309   LearningRate 0.0000   Epoch: 19   Global Step: 100530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 09:38:00,438-Speed 3402.68 samples/sec   Loss 0.9710   LearningRate 0.0000   Epoch: 19   Global Step: 100540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:03,457-Speed 3393.49 samples/sec   Loss 0.9180   LearningRate 0.0000   Epoch: 19   Global Step: 100550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:06,502-Speed 3363.40 samples/sec   Loss 0.9362   LearningRate 0.0000   Epoch: 19   Global Step: 100560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:09,521-Speed 3393.01 samples/sec   Loss 0.8906   LearningRate 0.0000   Epoch: 19   Global Step: 100570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:12,524-Speed 3410.57 samples/sec   Loss 0.9414   LearningRate 0.0000   Epoch: 19   Global Step: 100580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:15,534-Speed 3403.29 samples/sec   Loss 0.9649   LearningRate 0.0000   Epoch: 19   Global Step: 100590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:18,543-Speed 3404.02 samples/sec   Loss 0.9452   LearningRate 0.0000   Epoch: 19   Global Step: 100600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:21,551-Speed 3404.53 samples/sec   Loss 0.9395   LearningRate 0.0000   Epoch: 19   Global Step: 100610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:24,558-Speed 3406.46 samples/sec   Loss 0.9982   LearningRate 0.0000   Epoch: 19   Global Step: 100620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:27,581-Speed 3388.85 samples/sec   Loss 0.9475   LearningRate 0.0000   Epoch: 19   Global Step: 100630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:30,569-Speed 3427.92 samples/sec   Loss 0.9168   LearningRate 0.0000   Epoch: 19   Global Step: 100640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:33,578-Speed 3404.02 samples/sec   Loss 0.9311   LearningRate 0.0000   Epoch: 19   Global Step: 100650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:36,583-Speed 3407.89 samples/sec   Loss 0.9088   LearningRate 0.0000   Epoch: 19   Global Step: 100660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:39,596-Speed 3399.53 samples/sec   Loss 0.9356   LearningRate 0.0000   Epoch: 19   Global Step: 100670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:42,611-Speed 3397.51 samples/sec   Loss 0.9704   LearningRate 0.0000   Epoch: 19   Global Step: 100680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:45,630-Speed 3393.68 samples/sec   Loss 0.8974   LearningRate 0.0000   Epoch: 19   Global Step: 100690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:48,637-Speed 3405.51 samples/sec   Loss 0.9758   LearningRate 0.0000   Epoch: 19   Global Step: 100700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:51,644-Speed 3406.96 samples/sec   Loss 0.8962   LearningRate 0.0000   Epoch: 19   Global Step: 100710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:54,663-Speed 3392.78 samples/sec   Loss 1.0226   LearningRate 0.0000   Epoch: 19   Global Step: 100720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:38:57,667-Speed 3408.78 samples/sec   Loss 0.8879   LearningRate 0.0000   Epoch: 19   Global Step: 100730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:00,671-Speed 3410.51 samples/sec   Loss 0.8932   LearningRate 0.0000   Epoch: 19   Global Step: 100740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:03,813-Speed 3259.75 samples/sec   Loss 0.8953   LearningRate 0.0000   Epoch: 19   Global Step: 100750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:06,859-Speed 3361.99 samples/sec   Loss 0.9205   LearningRate 0.0000   Epoch: 19   Global Step: 100760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:09,865-Speed 3407.51 samples/sec   Loss 0.9130   LearningRate 0.0000   Epoch: 19   Global Step: 100770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:12,882-Speed 3395.63 samples/sec   Loss 0.9892   LearningRate 0.0000   Epoch: 19   Global Step: 100780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:15,888-Speed 3407.43 samples/sec   Loss 0.9326   LearningRate 0.0000   Epoch: 19   Global Step: 100790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:18,905-Speed 3394.31 samples/sec   Loss 0.9500   LearningRate 0.0000   Epoch: 19   Global Step: 100800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:21,923-Speed 3394.37 samples/sec   Loss 0.9113   LearningRate 0.0000   Epoch: 19   Global Step: 100810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:24,948-Speed 3385.74 samples/sec   Loss 0.9228   LearningRate 0.0000   Epoch: 19   Global Step: 100820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:27,977-Speed 3381.82 samples/sec   Loss 0.9844   LearningRate 0.0000   Epoch: 19   Global Step: 100830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:30,982-Speed 3408.87 samples/sec   Loss 0.9045   LearningRate 0.0000   Epoch: 19   Global Step: 100840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:33,991-Speed 3403.98 samples/sec   Loss 0.9299   LearningRate 0.0000   Epoch: 19   Global Step: 100850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:37,004-Speed 3398.76 samples/sec   Loss 0.9090   LearningRate 0.0000   Epoch: 19   Global Step: 100860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:40,014-Speed 3403.75 samples/sec   Loss 0.8382   LearningRate 0.0000   Epoch: 19   Global Step: 100870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:43,026-Speed 3400.55 samples/sec   Loss 0.9167   LearningRate 0.0000   Epoch: 19   Global Step: 100880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:46,046-Speed 3391.44 samples/sec   Loss 0.9142   LearningRate 0.0000   Epoch: 19   Global Step: 100890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:49,064-Speed 3393.13 samples/sec   Loss 0.9096   LearningRate 0.0000   Epoch: 19   Global Step: 100900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:52,076-Speed 3400.45 samples/sec   Loss 0.9084   LearningRate 0.0000   Epoch: 19   Global Step: 100910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:55,086-Speed 3403.99 samples/sec   Loss 0.9571   LearningRate 0.0000   Epoch: 19   Global Step: 100920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:39:58,100-Speed 3397.98 samples/sec   Loss 0.9287   LearningRate 0.0000   Epoch: 19   Global Step: 100930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:01,106-Speed 3407.34 samples/sec   Loss 0.8343   LearningRate 0.0000   Epoch: 19   Global Step: 100940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:04,117-Speed 3402.17 samples/sec   Loss 0.8951   LearningRate 0.0000   Epoch: 19   Global Step: 100950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:07,129-Speed 3399.99 samples/sec   Loss 0.9420   LearningRate 0.0000   Epoch: 19   Global Step: 100960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:10,137-Speed 3405.57 samples/sec   Loss 0.9020   LearningRate 0.0000   Epoch: 19   Global Step: 100970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:13,148-Speed 3402.53 samples/sec   Loss 0.9236   LearningRate 0.0000   Epoch: 19   Global Step: 100980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:16,199-Speed 3357.21 samples/sec   Loss 0.9573   LearningRate 0.0000   Epoch: 19   Global Step: 100990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:19,263-Speed 3342.34 samples/sec   Loss 0.9286   LearningRate 0.0000   Epoch: 19   Global Step: 101000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:22,276-Speed 3399.50 samples/sec   Loss 0.9515   LearningRate 0.0000   Epoch: 19   Global Step: 101010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:25,288-Speed 3400.78 samples/sec   Loss 0.9158   LearningRate 0.0000   Epoch: 19   Global Step: 101020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:28,328-Speed 3368.60 samples/sec   Loss 0.8932   LearningRate 0.0000   Epoch: 19   Global Step: 101030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:31,341-Speed 3399.85 samples/sec   Loss 0.9671   LearningRate 0.0000   Epoch: 19   Global Step: 101040   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-11 09:40:34,345-Speed 3410.27 samples/sec   Loss 0.8500   LearningRate 0.0000   Epoch: 19   Global Step: 101050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:37,385-Speed 3369.19 samples/sec   Loss 0.9696   LearningRate 0.0000   Epoch: 19   Global Step: 101060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:40,438-Speed 3355.85 samples/sec   Loss 0.9420   LearningRate 0.0000   Epoch: 19   Global Step: 101070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:43,449-Speed 3401.25 samples/sec   Loss 1.0150   LearningRate 0.0000   Epoch: 19   Global Step: 101080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:46,457-Speed 3405.35 samples/sec   Loss 0.9804   LearningRate 0.0000   Epoch: 19   Global Step: 101090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:49,468-Speed 3401.80 samples/sec   Loss 0.9802   LearningRate 0.0000   Epoch: 19   Global Step: 101100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:52,490-Speed 3388.49 samples/sec   Loss 0.9941   LearningRate 0.0000   Epoch: 19   Global Step: 101110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:55,511-Speed 3390.94 samples/sec   Loss 0.8778   LearningRate 0.0000   Epoch: 19   Global Step: 101120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:40:58,529-Speed 3393.66 samples/sec   Loss 0.8918   LearningRate 0.0000   Epoch: 19   Global Step: 101130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:41:01,549-Speed 3391.93 samples/sec   Loss 0.9865   LearningRate 0.0000   Epoch: 19   Global Step: 101140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:41:04,623-Speed 3332.65 samples/sec   Loss 0.9510   LearningRate 0.0000   Epoch: 19   Global Step: 101150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 09:41:07,635-Speed 3399.95 samples/sec   Loss 0.9520   LearningRate 0.0000   Epoch: 19   Global Step: 101160   Fp16 Grad Scale: 65536   Required: -0 hours