Training: 2022-04-27 01:27:05,279-rank_id: 0
Training: 2022-04-27 01:27:19,616-: margin_list              [1.0, 0.5, 0.0]
Training: 2022-04-27 01:27:19,617-: network                  r100
Training: 2022-04-27 01:27:19,617-: resume                   False
Training: 2022-04-27 01:27:19,617-: output                   work_dirs/ms1mv2_r100
Training: 2022-04-27 01:27:19,617-: embedding_size           512
Training: 2022-04-27 01:27:19,617-: sample_rate              1.0
Training: 2022-04-27 01:27:19,617-: interclass_filtering_threshold 0
Training: 2022-04-27 01:27:19,617-: fp16                     True
Training: 2022-04-27 01:27:19,617-: batch_size               128
Training: 2022-04-27 01:27:19,617-: optimizer                sgd
Training: 2022-04-27 01:27:19,617-: lr                       0.1
Training: 2022-04-27 01:27:19,617-: momentum                 0.9
Training: 2022-04-27 01:27:19,617-: weight_decay             0.0005
Training: 2022-04-27 01:27:19,617-: verbose                  2000
Training: 2022-04-27 01:27:19,617-: frequent                 10
Training: 2022-04-27 01:27:19,617-: dali                     False
Training: 2022-04-27 01:27:19,617-: rec                      /train_tmp/faces_emore
Training: 2022-04-27 01:27:19,617-: num_classes              85742
Training: 2022-04-27 01:27:19,617-: num_image                5822653
Training: 2022-04-27 01:27:19,617-: num_epoch                20
Training: 2022-04-27 01:27:19,617-: warmup_epoch             0
Training: 2022-04-27 01:27:19,617-: val_targets              ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-27 01:27:19,617-: total_batch_size         1024
Training: 2022-04-27 01:27:19,617-: warmup_step              0
Training: 2022-04-27 01:27:19,618-: total_step               113720
Training: 2022-04-27 01:28:26,950-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-27 01:28:32,357-Speed 3431.74 samples/sec   Loss 46.7125   LearningRate 0.1000   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-04-27 01:28:35,354-Speed 3417.36 samples/sec   Loss 47.6395   LearningRate 0.0999   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 01:28:38,318-Speed 3455.95 samples/sec   Loss 48.0146   LearningRate 0.0999   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 01:28:41,264-Speed 3477.52 samples/sec   Loss 47.0877   LearningRate 0.0999   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 01:28:44,197-Speed 3492.18 samples/sec   Loss 47.1361   LearningRate 0.0999   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 01:28:47,124-Speed 3499.40 samples/sec   Loss 46.9640   LearningRate 0.0999   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 01:28:50,053-Speed 3496.64 samples/sec   Loss 46.7923   LearningRate 0.0999   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 01:28:52,997-Speed 3479.21 samples/sec   Loss 46.2833   LearningRate 0.0998   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 01:28:55,948-Speed 3471.46 samples/sec   Loss 46.2290   LearningRate 0.0998   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 01:28:58,922-Speed 3443.20 samples/sec   Loss 46.1605   LearningRate 0.0998   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 01:29:01,855-Speed 3492.63 samples/sec   Loss 46.0213   LearningRate 0.0998   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 01:29:04,785-Speed 3495.88 samples/sec   Loss 45.8512   LearningRate 0.0998   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 01:29:07,720-Speed 3489.64 samples/sec   Loss 45.6772   LearningRate 0.0998   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 01:29:10,649-Speed 3497.20 samples/sec   Loss 45.6512   LearningRate 0.0997   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 01:29:13,580-Speed 3495.18 samples/sec   Loss 45.4421   LearningRate 0.0997   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 01:29:16,572-Speed 3423.07 samples/sec   Loss 45.2200   LearningRate 0.0997   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 01:29:19,509-Speed 3486.41 samples/sec   Loss 45.1176   LearningRate 0.0997   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 01:29:22,444-Speed 3491.13 samples/sec   Loss 44.8073   LearningRate 0.0997   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 01:29:25,376-Speed 3492.51 samples/sec   Loss 44.6877   LearningRate 0.0996   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 01:29:28,312-Speed 3489.39 samples/sec   Loss 44.5415   LearningRate 0.0996   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:29:31,249-Speed 3487.09 samples/sec   Loss 44.2927   LearningRate 0.0996   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:29:34,198-Speed 3473.13 samples/sec   Loss 44.1602   LearningRate 0.0996   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:29:37,136-Speed 3486.71 samples/sec   Loss 43.8958   LearningRate 0.0996   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:29:40,074-Speed 3485.60 samples/sec   Loss 43.6988   LearningRate 0.0996   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:29:43,017-Speed 3481.43 samples/sec   Loss 43.5673   LearningRate 0.0995   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:29:45,953-Speed 3488.13 samples/sec   Loss 43.3693   LearningRate 0.0995   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:29:48,891-Speed 3486.56 samples/sec   Loss 43.1162   LearningRate 0.0995   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:29:51,833-Speed 3481.73 samples/sec   Loss 43.0610   LearningRate 0.0995   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:29:54,773-Speed 3483.34 samples/sec   Loss 42.8345   LearningRate 0.0995   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:29:57,715-Speed 3481.58 samples/sec   Loss 42.6498   LearningRate 0.0995   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:30:00,652-Speed 3487.12 samples/sec   Loss 42.4366   LearningRate 0.0994   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:30:03,591-Speed 3486.01 samples/sec   Loss 42.2423   LearningRate 0.0994   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:30:06,531-Speed 3484.20 samples/sec   Loss 42.0894   LearningRate 0.0994   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:30:09,471-Speed 3483.31 samples/sec   Loss 41.8885   LearningRate 0.0994   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:30:12,411-Speed 3483.68 samples/sec   Loss 41.6302   LearningRate 0.0994   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:30:15,349-Speed 3486.52 samples/sec   Loss 41.4453   LearningRate 0.0994   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:30:18,287-Speed 3485.85 samples/sec   Loss 41.3575   LearningRate 0.0993   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:30:21,230-Speed 3480.56 samples/sec   Loss 41.1698   LearningRate 0.0993   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:30:24,168-Speed 3485.93 samples/sec   Loss 40.8791   LearningRate 0.0993   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:30:27,116-Speed 3474.65 samples/sec   Loss 40.7846   LearningRate 0.0993   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:30:30,054-Speed 3485.78 samples/sec   Loss 40.7333   LearningRate 0.0993   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:30:32,994-Speed 3483.35 samples/sec   Loss 40.3832   LearningRate 0.0992   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:30:35,950-Speed 3465.09 samples/sec   Loss 40.3120   LearningRate 0.0992   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:30:38,891-Speed 3483.13 samples/sec   Loss 40.0522   LearningRate 0.0992   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:30:41,842-Speed 3471.29 samples/sec   Loss 39.9291   LearningRate 0.0992   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:30:44,789-Speed 3475.58 samples/sec   Loss 39.7384   LearningRate 0.0992   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:30:47,731-Speed 3480.52 samples/sec   Loss 39.5834   LearningRate 0.0992   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:30:50,678-Speed 3475.95 samples/sec   Loss 39.4160   LearningRate 0.0991   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:30:53,623-Speed 3477.88 samples/sec   Loss 39.1798   LearningRate 0.0991   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:30:56,553-Speed 3495.39 samples/sec   Loss 39.1398   LearningRate 0.0991   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:30:59,499-Speed 3476.75 samples/sec   Loss 38.9353   LearningRate 0.0991   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:02,450-Speed 3471.72 samples/sec   Loss 38.6784   LearningRate 0.0991   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:05,395-Speed 3477.04 samples/sec   Loss 38.4823   LearningRate 0.0991   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:08,342-Speed 3476.32 samples/sec   Loss 38.3644   LearningRate 0.0990   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:11,285-Speed 3479.24 samples/sec   Loss 38.1609   LearningRate 0.0990   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:14,227-Speed 3481.57 samples/sec   Loss 37.9103   LearningRate 0.0990   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:17,174-Speed 3475.47 samples/sec   Loss 37.8805   LearningRate 0.0990   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:20,117-Speed 3480.55 samples/sec   Loss 37.6852   LearningRate 0.0990   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:23,061-Speed 3478.56 samples/sec   Loss 37.5015   LearningRate 0.0989   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:25,997-Speed 3489.60 samples/sec   Loss 37.2822   LearningRate 0.0989   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:28,950-Speed 3468.74 samples/sec   Loss 37.0927   LearningRate 0.0989   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:31,894-Speed 3478.98 samples/sec   Loss 36.7745   LearningRate 0.0989   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:34,838-Speed 3478.66 samples/sec   Loss 36.7736   LearningRate 0.0989   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:37,781-Speed 3480.29 samples/sec   Loss 36.6476   LearningRate 0.0989   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:40,726-Speed 3478.06 samples/sec   Loss 36.2974   LearningRate 0.0988   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:43,669-Speed 3480.82 samples/sec   Loss 36.1542   LearningRate 0.0988   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:46,619-Speed 3471.72 samples/sec   Loss 35.9093   LearningRate 0.0988   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:49,569-Speed 3471.68 samples/sec   Loss 35.7093   LearningRate 0.0988   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:52,518-Speed 3473.61 samples/sec   Loss 35.5678   LearningRate 0.0988   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:55,450-Speed 3492.80 samples/sec   Loss 35.3586   LearningRate 0.0988   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:31:58,404-Speed 3468.06 samples/sec   Loss 35.2870   LearningRate 0.0987   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:01,355-Speed 3471.75 samples/sec   Loss 35.1534   LearningRate 0.0987   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:04,302-Speed 3474.64 samples/sec   Loss 34.8681   LearningRate 0.0987   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:07,254-Speed 3470.64 samples/sec   Loss 34.7991   LearningRate 0.0987   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:10,203-Speed 3471.96 samples/sec   Loss 34.4469   LearningRate 0.0987   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:13,158-Speed 3466.40 samples/sec   Loss 34.2558   LearningRate 0.0987   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:16,107-Speed 3473.95 samples/sec   Loss 34.1711   LearningRate 0.0986   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:19,054-Speed 3475.42 samples/sec   Loss 34.0545   LearningRate 0.0986   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:22,017-Speed 3457.09 samples/sec   Loss 33.7512   LearningRate 0.0986   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:24,966-Speed 3473.29 samples/sec   Loss 33.5246   LearningRate 0.0986   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 524288   Required: 10 hours
Training: 2022-04-27 01:32:27,905-Speed 3485.28 samples/sec   Loss 33.4970   LearningRate 0.0986   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:30,856-Speed 3470.61 samples/sec   Loss 33.3390   LearningRate 0.0985   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:33,813-Speed 3463.89 samples/sec   Loss 33.0562   LearningRate 0.0985   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:36,765-Speed 3469.13 samples/sec   Loss 32.8666   LearningRate 0.0985   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:39,751-Speed 3430.58 samples/sec   Loss 32.6138   LearningRate 0.0985   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:42,721-Speed 3448.07 samples/sec   Loss 32.5411   LearningRate 0.0985   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:45,676-Speed 3465.70 samples/sec   Loss 32.2798   LearningRate 0.0985   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:48,628-Speed 3470.55 samples/sec   Loss 32.2559   LearningRate 0.0984   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:51,584-Speed 3465.29 samples/sec   Loss 31.9369   LearningRate 0.0984   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:54,541-Speed 3463.37 samples/sec   Loss 31.7570   LearningRate 0.0984   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:32:57,472-Speed 3494.92 samples/sec   Loss 31.6951   LearningRate 0.0984   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:00,424-Speed 3469.43 samples/sec   Loss 31.6234   LearningRate 0.0984   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:03,375-Speed 3471.31 samples/sec   Loss 31.3588   LearningRate 0.0984   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:06,331-Speed 3464.47 samples/sec   Loss 31.1023   LearningRate 0.0983   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:09,285-Speed 3466.55 samples/sec   Loss 30.9703   LearningRate 0.0983   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:12,243-Speed 3463.17 samples/sec   Loss 30.7795   LearningRate 0.0983   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:15,195-Speed 3469.27 samples/sec   Loss 30.5400   LearningRate 0.0983   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:18,147-Speed 3470.27 samples/sec   Loss 30.5388   LearningRate 0.0983   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:21,097-Speed 3472.36 samples/sec   Loss 30.1616   LearningRate 0.0982   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:24,051-Speed 3466.81 samples/sec   Loss 30.1865   LearningRate 0.0982   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:27,003-Speed 3469.88 samples/sec   Loss 29.8027   LearningRate 0.0982   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:33:29,954-Speed 3471.35 samples/sec   Loss 29.7471   LearningRate 0.0982   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:33:32,910-Speed 3464.59 samples/sec   Loss 29.5194   LearningRate 0.0982   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:33:35,860-Speed 3472.49 samples/sec   Loss 29.2013   LearningRate 0.0982   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:33:38,818-Speed 3462.08 samples/sec   Loss 29.0126   LearningRate 0.0981   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:33:41,772-Speed 3467.29 samples/sec   Loss 28.9054   LearningRate 0.0981   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:33:44,716-Speed 3479.87 samples/sec   Loss 28.7258   LearningRate 0.0981   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:47,667-Speed 3471.26 samples/sec   Loss 28.5205   LearningRate 0.0981   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:50,619-Speed 3469.59 samples/sec   Loss 28.5390   LearningRate 0.0981   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:53,571-Speed 3468.85 samples/sec   Loss 28.2381   LearningRate 0.0981   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:56,522-Speed 3471.32 samples/sec   Loss 28.1222   LearningRate 0.0980   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:33:59,475-Speed 3468.02 samples/sec   Loss 28.0501   LearningRate 0.0980   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:02,456-Speed 3436.36 samples/sec   Loss 27.8529   LearningRate 0.0980   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:05,414-Speed 3462.33 samples/sec   Loss 27.4859   LearningRate 0.0980   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:08,370-Speed 3465.22 samples/sec   Loss 27.4816   LearningRate 0.0980   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:11,322-Speed 3469.30 samples/sec   Loss 27.4047   LearningRate 0.0980   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:14,277-Speed 3466.47 samples/sec   Loss 27.1757   LearningRate 0.0979   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 01:34:17,230-Speed 3468.93 samples/sec   Loss 26.9348   LearningRate 0.0979   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 01:34:20,186-Speed 3465.48 samples/sec   Loss 26.8652   LearningRate 0.0979   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 01:34:23,141-Speed 3465.41 samples/sec   Loss 26.6820   LearningRate 0.0979   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 01:34:26,094-Speed 3468.61 samples/sec   Loss 26.6253   LearningRate 0.0979   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 01:34:29,051-Speed 3463.96 samples/sec   Loss 26.3177   LearningRate 0.0978   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:32,001-Speed 3471.45 samples/sec   Loss 26.3232   LearningRate 0.0978   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:34,957-Speed 3464.87 samples/sec   Loss 26.0734   LearningRate 0.0978   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:37,913-Speed 3465.48 samples/sec   Loss 25.8342   LearningRate 0.0978   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:40,866-Speed 3468.71 samples/sec   Loss 25.7510   LearningRate 0.0978   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:43,823-Speed 3464.53 samples/sec   Loss 25.5355   LearningRate 0.0978   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:46,783-Speed 3459.71 samples/sec   Loss 25.4521   LearningRate 0.0977   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:49,738-Speed 3465.97 samples/sec   Loss 25.3086   LearningRate 0.0977   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:52,691-Speed 3468.50 samples/sec   Loss 24.9476   LearningRate 0.0977   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:55,683-Speed 3422.61 samples/sec   Loss 24.9923   LearningRate 0.0977   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:34:58,652-Speed 3450.33 samples/sec   Loss 24.8184   LearningRate 0.0977   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 01:35:01,613-Speed 3459.64 samples/sec   Loss 24.7787   LearningRate 0.0977   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 01:35:04,576-Speed 3456.34 samples/sec   Loss 24.6523   LearningRate 0.0976   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 01:35:07,524-Speed 3474.77 samples/sec   Loss 24.4503   LearningRate 0.0976   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:35:10,480-Speed 3464.97 samples/sec   Loss 24.2784   LearningRate 0.0976   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:35:13,443-Speed 3456.38 samples/sec   Loss 24.0857   LearningRate 0.0976   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:35:16,401-Speed 3463.28 samples/sec   Loss 24.0810   LearningRate 0.0976   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:35:19,353-Speed 3469.10 samples/sec   Loss 23.9469   LearningRate 0.0976   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:35:22,313-Speed 3460.43 samples/sec   Loss 23.6692   LearningRate 0.0975   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:35:25,267-Speed 3467.39 samples/sec   Loss 23.6292   LearningRate 0.0975   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:35:28,231-Speed 3455.77 samples/sec   Loss 23.5041   LearningRate 0.0975   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:35:31,196-Speed 3453.70 samples/sec   Loss 23.3429   LearningRate 0.0975   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:35:34,154-Speed 3462.96 samples/sec   Loss 23.2969   LearningRate 0.0975   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:35:37,113-Speed 3461.96 samples/sec   Loss 23.2366   LearningRate 0.0974   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 01:35:40,040-Speed 3500.05 samples/sec   Loss 23.0133   LearningRate 0.0974   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 01:35:42,999-Speed 3461.25 samples/sec   Loss 22.8900   LearningRate 0.0974   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 01:35:45,954-Speed 3465.16 samples/sec   Loss 22.9868   LearningRate 0.0974   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 01:35:48,914-Speed 3460.88 samples/sec   Loss 22.7094   LearningRate 0.0974   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 01:35:51,877-Speed 3457.08 samples/sec   Loss 22.6335   LearningRate 0.0974   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 01:35:54,835-Speed 3461.70 samples/sec   Loss 22.4347   LearningRate 0.0973   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 01:35:57,799-Speed 3455.83 samples/sec   Loss 22.3818   LearningRate 0.0973   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 01:36:00,760-Speed 3459.62 samples/sec   Loss 22.1709   LearningRate 0.0973   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 01:36:03,718-Speed 3462.31 samples/sec   Loss 22.0594   LearningRate 0.0973   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 01:36:06,679-Speed 3459.13 samples/sec   Loss 21.9179   LearningRate 0.0973   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 01:36:09,639-Speed 3461.07 samples/sec   Loss 21.7345   LearningRate 0.0973   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:36:12,604-Speed 3453.96 samples/sec   Loss 21.9192   LearningRate 0.0972   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:36:15,573-Speed 3449.52 samples/sec   Loss 21.4801   LearningRate 0.0972   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:36:18,536-Speed 3457.60 samples/sec   Loss 21.4815   LearningRate 0.0972   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:36:21,507-Speed 3447.19 samples/sec   Loss 21.5369   LearningRate 0.0972   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:36:24,466-Speed 3462.01 samples/sec   Loss 21.4062   LearningRate 0.0972   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:36:27,430-Speed 3455.55 samples/sec   Loss 21.3606   LearningRate 0.0972   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:36:30,394-Speed 3455.37 samples/sec   Loss 21.1720   LearningRate 0.0971   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:36:33,354-Speed 3460.79 samples/sec   Loss 21.2957   LearningRate 0.0971   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:36:36,312-Speed 3462.28 samples/sec   Loss 21.0234   LearningRate 0.0971   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:36:39,270-Speed 3463.36 samples/sec   Loss 21.0604   LearningRate 0.0971   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:36:42,231-Speed 3459.17 samples/sec   Loss 20.7749   LearningRate 0.0971   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:36:45,181-Speed 3472.04 samples/sec   Loss 20.7017   LearningRate 0.0970   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:36:48,142-Speed 3458.85 samples/sec   Loss 20.6507   LearningRate 0.0970   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:36:51,103-Speed 3459.61 samples/sec   Loss 20.6199   LearningRate 0.0970   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:36:54,111-Speed 3404.61 samples/sec   Loss 20.4980   LearningRate 0.0970   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:36:57,129-Speed 3393.28 samples/sec   Loss 20.4476   LearningRate 0.0970   Epoch: 0   Global Step: 1730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:37:00,090-Speed 3460.05 samples/sec   Loss 20.1752   LearningRate 0.0970   Epoch: 0   Global Step: 1740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:37:03,052-Speed 3458.51 samples/sec   Loss 20.2595   LearningRate 0.0969   Epoch: 0   Global Step: 1750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:37:06,060-Speed 3404.15 samples/sec   Loss 20.0411   LearningRate 0.0969   Epoch: 0   Global Step: 1760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:37:09,021-Speed 3458.93 samples/sec   Loss 20.0573   LearningRate 0.0969   Epoch: 0   Global Step: 1770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:37:11,983-Speed 3458.89 samples/sec   Loss 19.9011   LearningRate 0.0969   Epoch: 0   Global Step: 1780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:37:14,946-Speed 3456.65 samples/sec   Loss 19.9083   LearningRate 0.0969   Epoch: 0   Global Step: 1790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:37:17,917-Speed 3446.91 samples/sec   Loss 19.8106   LearningRate 0.0969   Epoch: 0   Global Step: 1800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:37:20,884-Speed 3451.93 samples/sec   Loss 19.6799   LearningRate 0.0968   Epoch: 0   Global Step: 1810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:37:23,851-Speed 3452.45 samples/sec   Loss 19.6714   LearningRate 0.0968   Epoch: 0   Global Step: 1820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:37:26,820-Speed 3450.21 samples/sec   Loss 19.6649   LearningRate 0.0968   Epoch: 0   Global Step: 1830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:37:29,787-Speed 3451.54 samples/sec   Loss 19.4689   LearningRate 0.0968   Epoch: 0   Global Step: 1840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:37:32,747-Speed 3460.41 samples/sec   Loss 19.4527   LearningRate 0.0968   Epoch: 0   Global Step: 1850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:37:35,699-Speed 3470.33 samples/sec   Loss 19.3712   LearningRate 0.0968   Epoch: 0   Global Step: 1860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:37:38,662-Speed 3456.04 samples/sec   Loss 19.4042   LearningRate 0.0967   Epoch: 0   Global Step: 1870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:37:41,624-Speed 3458.90 samples/sec   Loss 19.0824   LearningRate 0.0967   Epoch: 0   Global Step: 1880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:37:44,589-Speed 3453.37 samples/sec   Loss 18.8792   LearningRate 0.0967   Epoch: 0   Global Step: 1890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:37:47,550-Speed 3459.43 samples/sec   Loss 19.0752   LearningRate 0.0967   Epoch: 0   Global Step: 1900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:37:50,511-Speed 3459.85 samples/sec   Loss 18.7519   LearningRate 0.0967   Epoch: 0   Global Step: 1910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:37:53,475-Speed 3455.63 samples/sec   Loss 18.8056   LearningRate 0.0967   Epoch: 0   Global Step: 1920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:37:56,436-Speed 3459.08 samples/sec   Loss 18.9784   LearningRate 0.0966   Epoch: 0   Global Step: 1930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:37:59,398-Speed 3458.38 samples/sec   Loss 18.6702   LearningRate 0.0966   Epoch: 0   Global Step: 1940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:38:02,365-Speed 3451.83 samples/sec   Loss 18.6783   LearningRate 0.0966   Epoch: 0   Global Step: 1950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:38:05,355-Speed 3438.58 samples/sec   Loss 18.6623   LearningRate 0.0966   Epoch: 0   Global Step: 1960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:38:08,317-Speed 3458.10 samples/sec   Loss 18.5580   LearningRate 0.0966   Epoch: 0   Global Step: 1970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:38:11,286-Speed 3452.93 samples/sec   Loss 18.4478   LearningRate 0.0965   Epoch: 0   Global Step: 1980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:38:14,252-Speed 3453.92 samples/sec   Loss 18.5991   LearningRate 0.0965   Epoch: 0   Global Step: 1990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 01:38:17,203-Speed 3470.47 samples/sec   Loss 18.4642   LearningRate 0.0965   Epoch: 0   Global Step: 2000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 01:39:00,686-[lfw][2000]XNorm: 22.268713
Training: 2022-04-27 01:39:00,691-[lfw][2000]Accuracy-Flip: 0.98117+-0.00606
Training: 2022-04-27 01:39:00,691-[lfw][2000]Accuracy-Highest: 0.98117
Training: 2022-04-27 01:39:50,905-[cfp_fp][2000]XNorm: 18.848542
Training: 2022-04-27 01:39:50,906-[cfp_fp][2000]Accuracy-Flip: 0.78671+-0.02209
Training: 2022-04-27 01:39:50,906-[cfp_fp][2000]Accuracy-Highest: 0.78671
Training: 2022-04-27 01:40:34,127-[agedb_30][2000]XNorm: 21.529968
Training: 2022-04-27 01:40:34,128-[agedb_30][2000]Accuracy-Flip: 0.89517+-0.01863
Training: 2022-04-27 01:40:34,129-[agedb_30][2000]Accuracy-Highest: 0.89517
Training: 2022-04-27 01:40:37,086-Speed 73.21 samples/sec   Loss 18.1831   LearningRate 0.0965   Epoch: 0   Global Step: 2010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:40:40,047-Speed 3459.48 samples/sec   Loss 18.0367   LearningRate 0.0965   Epoch: 0   Global Step: 2020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:40:43,012-Speed 3453.81 samples/sec   Loss 18.0553   LearningRate 0.0965   Epoch: 0   Global Step: 2030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:40:45,977-Speed 3467.35 samples/sec   Loss 18.1793   LearningRate 0.0964   Epoch: 0   Global Step: 2040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:40:48,931-Speed 3466.23 samples/sec   Loss 18.0511   LearningRate 0.0964   Epoch: 0   Global Step: 2050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:40:51,888-Speed 3464.47 samples/sec   Loss 18.1344   LearningRate 0.0964   Epoch: 0   Global Step: 2060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:40:54,858-Speed 3448.46 samples/sec   Loss 18.2913   LearningRate 0.0964   Epoch: 0   Global Step: 2070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:40:57,837-Speed 3458.59 samples/sec   Loss 18.0095   LearningRate 0.0964   Epoch: 0   Global Step: 2080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:41:00,797-Speed 3459.84 samples/sec   Loss 17.8124   LearningRate 0.0964   Epoch: 0   Global Step: 2090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:41:03,760-Speed 3457.01 samples/sec   Loss 17.9375   LearningRate 0.0963   Epoch: 0   Global Step: 2100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:06,742-Speed 3448.29 samples/sec   Loss 17.6548   LearningRate 0.0963   Epoch: 0   Global Step: 2110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:09,706-Speed 3455.21 samples/sec   Loss 17.5310   LearningRate 0.0963   Epoch: 0   Global Step: 2120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:12,671-Speed 3454.76 samples/sec   Loss 17.4227   LearningRate 0.0963   Epoch: 0   Global Step: 2130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:15,637-Speed 3452.77 samples/sec   Loss 17.3870   LearningRate 0.0963   Epoch: 0   Global Step: 2140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:18,603-Speed 3453.98 samples/sec   Loss 17.2406   LearningRate 0.0963   Epoch: 0   Global Step: 2150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:21,607-Speed 3453.97 samples/sec   Loss 17.3169   LearningRate 0.0962   Epoch: 0   Global Step: 2160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:24,573-Speed 3452.56 samples/sec   Loss 17.4195   LearningRate 0.0962   Epoch: 0   Global Step: 2170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:27,553-Speed 3441.46 samples/sec   Loss 17.1651   LearningRate 0.0962   Epoch: 0   Global Step: 2180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:30,528-Speed 3443.17 samples/sec   Loss 17.2418   LearningRate 0.0962   Epoch: 0   Global Step: 2190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:33,508-Speed 3437.39 samples/sec   Loss 17.2135   LearningRate 0.0962   Epoch: 0   Global Step: 2200   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:41:36,484-Speed 3447.40 samples/sec   Loss 17.1875   LearningRate 0.0962   Epoch: 0   Global Step: 2210   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:41:39,460-Speed 3441.44 samples/sec   Loss 17.0191   LearningRate 0.0961   Epoch: 0   Global Step: 2220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:42,436-Speed 3442.48 samples/sec   Loss 16.9554   LearningRate 0.0961   Epoch: 0   Global Step: 2230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:45,424-Speed 3427.51 samples/sec   Loss 17.0467   LearningRate 0.0961   Epoch: 0   Global Step: 2240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:48,403-Speed 3438.29 samples/sec   Loss 16.8527   LearningRate 0.0961   Epoch: 0   Global Step: 2250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:51,392-Speed 3436.79 samples/sec   Loss 16.9890   LearningRate 0.0961   Epoch: 0   Global Step: 2260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:54,371-Speed 3437.56 samples/sec   Loss 16.8323   LearningRate 0.0960   Epoch: 0   Global Step: 2270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:41:57,356-Speed 3437.51 samples/sec   Loss 16.7029   LearningRate 0.0960   Epoch: 0   Global Step: 2280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:00,340-Speed 3432.31 samples/sec   Loss 16.6687   LearningRate 0.0960   Epoch: 0   Global Step: 2290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:03,315-Speed 3442.65 samples/sec   Loss 16.5367   LearningRate 0.0960   Epoch: 0   Global Step: 2300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:06,302-Speed 3444.57 samples/sec   Loss 16.7361   LearningRate 0.0960   Epoch: 0   Global Step: 2310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:09,280-Speed 3438.64 samples/sec   Loss 16.4924   LearningRate 0.0960   Epoch: 0   Global Step: 2320   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:42:12,253-Speed 3445.31 samples/sec   Loss 16.7148   LearningRate 0.0959   Epoch: 0   Global Step: 2330   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:42:15,212-Speed 3461.34 samples/sec   Loss 16.2982   LearningRate 0.0959   Epoch: 0   Global Step: 2340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:18,180-Speed 3450.17 samples/sec   Loss 16.3453   LearningRate 0.0959   Epoch: 0   Global Step: 2350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:21,150-Speed 3453.25 samples/sec   Loss 16.3905   LearningRate 0.0959   Epoch: 0   Global Step: 2360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:24,113-Speed 3456.21 samples/sec   Loss 16.4681   LearningRate 0.0959   Epoch: 0   Global Step: 2370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:27,083-Speed 3449.00 samples/sec   Loss 16.4621   LearningRate 0.0959   Epoch: 0   Global Step: 2380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:30,058-Speed 3455.72 samples/sec   Loss 16.3237   LearningRate 0.0958   Epoch: 0   Global Step: 2390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:33,022-Speed 3455.87 samples/sec   Loss 16.1416   LearningRate 0.0958   Epoch: 0   Global Step: 2400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:35,985-Speed 3456.41 samples/sec   Loss 16.2304   LearningRate 0.0958   Epoch: 0   Global Step: 2410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:38,961-Speed 3450.47 samples/sec   Loss 16.0783   LearningRate 0.0958   Epoch: 0   Global Step: 2420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:41,929-Speed 3451.00 samples/sec   Loss 16.2764   LearningRate 0.0958   Epoch: 0   Global Step: 2430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:44,894-Speed 3454.45 samples/sec   Loss 16.3176   LearningRate 0.0958   Epoch: 0   Global Step: 2440   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:42:47,844-Speed 3472.34 samples/sec   Loss 16.0671   LearningRate 0.0957   Epoch: 0   Global Step: 2450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:50,822-Speed 3439.04 samples/sec   Loss 16.0153   LearningRate 0.0957   Epoch: 0   Global Step: 2460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:53,825-Speed 3424.90 samples/sec   Loss 15.8714   LearningRate 0.0957   Epoch: 0   Global Step: 2470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:56,782-Speed 3463.16 samples/sec   Loss 15.8232   LearningRate 0.0957   Epoch: 0   Global Step: 2480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:42:59,751-Speed 3457.04 samples/sec   Loss 15.8106   LearningRate 0.0957   Epoch: 0   Global Step: 2490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:43:02,716-Speed 3454.85 samples/sec   Loss 15.7955   LearningRate 0.0957   Epoch: 0   Global Step: 2500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:43:05,675-Speed 3460.89 samples/sec   Loss 15.9779   LearningRate 0.0956   Epoch: 0   Global Step: 2510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:43:08,635-Speed 3465.79 samples/sec   Loss 15.8576   LearningRate 0.0956   Epoch: 0   Global Step: 2520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:43:11,594-Speed 3461.07 samples/sec   Loss 15.7525   LearningRate 0.0956   Epoch: 0   Global Step: 2530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:43:14,553-Speed 3461.50 samples/sec   Loss 15.6378   LearningRate 0.0956   Epoch: 0   Global Step: 2540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:43:17,541-Speed 3427.83 samples/sec   Loss 15.7377   LearningRate 0.0956   Epoch: 0   Global Step: 2550   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:43:20,479-Speed 3486.18 samples/sec   Loss 15.7147   LearningRate 0.0955   Epoch: 0   Global Step: 2560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:43:23,441-Speed 3463.26 samples/sec   Loss 15.6587   LearningRate 0.0955   Epoch: 0   Global Step: 2570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:43:26,399-Speed 3462.42 samples/sec   Loss 15.5164   LearningRate 0.0955   Epoch: 0   Global Step: 2580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:43:29,375-Speed 3449.33 samples/sec   Loss 15.3378   LearningRate 0.0955   Epoch: 0   Global Step: 2590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:43:32,333-Speed 3462.65 samples/sec   Loss 15.5933   LearningRate 0.0955   Epoch: 0   Global Step: 2600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:43:35,296-Speed 3456.29 samples/sec   Loss 15.4742   LearningRate 0.0955   Epoch: 0   Global Step: 2610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:43:38,262-Speed 3461.91 samples/sec   Loss 15.4657   LearningRate 0.0954   Epoch: 0   Global Step: 2620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:43:41,220-Speed 3462.59 samples/sec   Loss 15.4065   LearningRate 0.0954   Epoch: 0   Global Step: 2630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:43:44,180-Speed 3459.51 samples/sec   Loss 15.3714   LearningRate 0.0954   Epoch: 0   Global Step: 2640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:43:47,140-Speed 3460.88 samples/sec   Loss 15.4012   LearningRate 0.0954   Epoch: 0   Global Step: 2650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:43:50,102-Speed 3457.51 samples/sec   Loss 15.2465   LearningRate 0.0954   Epoch: 0   Global Step: 2660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:43:53,068-Speed 3457.70 samples/sec   Loss 15.3444   LearningRate 0.0954   Epoch: 0   Global Step: 2670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:43:56,022-Speed 3467.50 samples/sec   Loss 15.2158   LearningRate 0.0953   Epoch: 0   Global Step: 2680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:43:58,999-Speed 3458.02 samples/sec   Loss 15.3221   LearningRate 0.0953   Epoch: 0   Global Step: 2690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:44:01,962-Speed 3457.37 samples/sec   Loss 15.4514   LearningRate 0.0953   Epoch: 0   Global Step: 2700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:44:04,925-Speed 3456.41 samples/sec   Loss 15.1176   LearningRate 0.0953   Epoch: 0   Global Step: 2710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:44:07,893-Speed 3459.22 samples/sec   Loss 15.1491   LearningRate 0.0953   Epoch: 0   Global Step: 2720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:44:10,851-Speed 3462.72 samples/sec   Loss 15.1606   LearningRate 0.0953   Epoch: 0   Global Step: 2730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:44:13,810-Speed 3460.52 samples/sec   Loss 14.9439   LearningRate 0.0952   Epoch: 0   Global Step: 2740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:44:16,770-Speed 3460.16 samples/sec   Loss 14.9734   LearningRate 0.0952   Epoch: 0   Global Step: 2750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:44:19,733-Speed 3457.25 samples/sec   Loss 15.0973   LearningRate 0.0952   Epoch: 0   Global Step: 2760   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:44:22,698-Speed 3463.66 samples/sec   Loss 15.0369   LearningRate 0.0952   Epoch: 0   Global Step: 2770   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:44:25,658-Speed 3460.30 samples/sec   Loss 15.0502   LearningRate 0.0952   Epoch: 0   Global Step: 2780   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:44:28,662-Speed 3416.45 samples/sec   Loss 15.0699   LearningRate 0.0952   Epoch: 0   Global Step: 2790   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:44:31,626-Speed 3456.05 samples/sec   Loss 15.0686   LearningRate 0.0951   Epoch: 0   Global Step: 2800   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:44:34,583-Speed 3463.11 samples/sec   Loss 14.8821   LearningRate 0.0951   Epoch: 0   Global Step: 2810   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:44:37,543-Speed 3460.85 samples/sec   Loss 14.9850   LearningRate 0.0951   Epoch: 0   Global Step: 2820   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:44:40,520-Speed 3440.01 samples/sec   Loss 14.8803   LearningRate 0.0951   Epoch: 0   Global Step: 2830   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:44:43,485-Speed 3454.72 samples/sec   Loss 15.0090   LearningRate 0.0951   Epoch: 0   Global Step: 2840   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:44:46,447-Speed 3457.41 samples/sec   Loss 14.7879   LearningRate 0.0951   Epoch: 0   Global Step: 2850   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:44:49,407-Speed 3460.58 samples/sec   Loss 14.7958   LearningRate 0.0950   Epoch: 0   Global Step: 2860   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:44:52,369-Speed 3458.28 samples/sec   Loss 14.6468   LearningRate 0.0950   Epoch: 0   Global Step: 2870   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:44:55,340-Speed 3447.23 samples/sec   Loss 14.7457   LearningRate 0.0950   Epoch: 0   Global Step: 2880   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:44:58,292-Speed 3469.94 samples/sec   Loss 14.3814   LearningRate 0.0950   Epoch: 0   Global Step: 2890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:01,254-Speed 3458.24 samples/sec   Loss 14.6328   LearningRate 0.0950   Epoch: 0   Global Step: 2900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:04,215-Speed 3458.16 samples/sec   Loss 14.8609   LearningRate 0.0949   Epoch: 0   Global Step: 2910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:07,180-Speed 3454.80 samples/sec   Loss 14.4447   LearningRate 0.0949   Epoch: 0   Global Step: 2920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:10,144-Speed 3455.89 samples/sec   Loss 14.6500   LearningRate 0.0949   Epoch: 0   Global Step: 2930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:13,117-Speed 3445.07 samples/sec   Loss 14.6104   LearningRate 0.0949   Epoch: 0   Global Step: 2940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:16,085-Speed 3450.31 samples/sec   Loss 14.6259   LearningRate 0.0949   Epoch: 0   Global Step: 2950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:19,046-Speed 3460.04 samples/sec   Loss 14.5242   LearningRate 0.0949   Epoch: 0   Global Step: 2960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:22,012-Speed 3453.15 samples/sec   Loss 14.4962   LearningRate 0.0948   Epoch: 0   Global Step: 2970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:24,975-Speed 3456.38 samples/sec   Loss 14.7006   LearningRate 0.0948   Epoch: 0   Global Step: 2980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:27,941-Speed 3453.14 samples/sec   Loss 14.4967   LearningRate 0.0948   Epoch: 0   Global Step: 2990   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:45:30,907-Speed 3453.95 samples/sec   Loss 14.4583   LearningRate 0.0948   Epoch: 0   Global Step: 3000   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:45:33,862-Speed 3465.79 samples/sec   Loss 14.3377   LearningRate 0.0948   Epoch: 0   Global Step: 3010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:36,824-Speed 3458.39 samples/sec   Loss 14.4729   LearningRate 0.0948   Epoch: 0   Global Step: 3020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:39,790-Speed 3452.76 samples/sec   Loss 14.4582   LearningRate 0.0947   Epoch: 0   Global Step: 3030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:42,751-Speed 3458.88 samples/sec   Loss 14.6884   LearningRate 0.0947   Epoch: 0   Global Step: 3040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:45,715-Speed 3456.50 samples/sec   Loss 14.3913   LearningRate 0.0947   Epoch: 0   Global Step: 3050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:48,680-Speed 3454.32 samples/sec   Loss 14.3389   LearningRate 0.0947   Epoch: 0   Global Step: 3060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:51,655-Speed 3442.51 samples/sec   Loss 14.2578   LearningRate 0.0947   Epoch: 0   Global Step: 3070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:54,617-Speed 3457.82 samples/sec   Loss 14.1476   LearningRate 0.0947   Epoch: 0   Global Step: 3080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:45:57,584-Speed 3452.70 samples/sec   Loss 14.1439   LearningRate 0.0946   Epoch: 0   Global Step: 3090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:46:00,548-Speed 3455.20 samples/sec   Loss 14.2396   LearningRate 0.0946   Epoch: 0   Global Step: 3100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:46:03,524-Speed 3441.44 samples/sec   Loss 14.2403   LearningRate 0.0946   Epoch: 0   Global Step: 3110   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:46:06,475-Speed 3470.75 samples/sec   Loss 14.2431   LearningRate 0.0946   Epoch: 0   Global Step: 3120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:46:09,438-Speed 3457.80 samples/sec   Loss 14.1686   LearningRate 0.0946   Epoch: 0   Global Step: 3130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:46:12,402-Speed 3455.51 samples/sec   Loss 14.2021   LearningRate 0.0946   Epoch: 0   Global Step: 3140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:46:15,370-Speed 3451.23 samples/sec   Loss 14.0034   LearningRate 0.0945   Epoch: 0   Global Step: 3150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:46:18,338-Speed 3450.80 samples/sec   Loss 14.0763   LearningRate 0.0945   Epoch: 0   Global Step: 3160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:46:21,303-Speed 3453.95 samples/sec   Loss 13.9800   LearningRate 0.0945   Epoch: 0   Global Step: 3170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:46:24,272-Speed 3449.69 samples/sec   Loss 14.0742   LearningRate 0.0945   Epoch: 0   Global Step: 3180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:46:27,237-Speed 3455.07 samples/sec   Loss 13.9726   LearningRate 0.0945   Epoch: 0   Global Step: 3190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:46:30,205-Speed 3450.38 samples/sec   Loss 14.1402   LearningRate 0.0945   Epoch: 0   Global Step: 3200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:46:33,167-Speed 3457.35 samples/sec   Loss 13.9443   LearningRate 0.0944   Epoch: 0   Global Step: 3210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:46:36,136-Speed 3451.08 samples/sec   Loss 13.9287   LearningRate 0.0944   Epoch: 0   Global Step: 3220   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:46:39,118-Speed 3434.38 samples/sec   Loss 13.9053   LearningRate 0.0944   Epoch: 0   Global Step: 3230   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:46:42,085-Speed 3451.97 samples/sec   Loss 13.8279   LearningRate 0.0944   Epoch: 0   Global Step: 3240   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:46:45,029-Speed 3479.56 samples/sec   Loss 14.0538   LearningRate 0.0944   Epoch: 0   Global Step: 3250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:46:47,994-Speed 3454.38 samples/sec   Loss 13.7916   LearningRate 0.0943   Epoch: 0   Global Step: 3260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:46:50,981-Speed 3428.27 samples/sec   Loss 13.7024   LearningRate 0.0943   Epoch: 0   Global Step: 3270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:46:53,944-Speed 3457.03 samples/sec   Loss 13.7790   LearningRate 0.0943   Epoch: 0   Global Step: 3280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:46:56,906-Speed 3457.86 samples/sec   Loss 13.8780   LearningRate 0.0943   Epoch: 0   Global Step: 3290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:46:59,876-Speed 3448.51 samples/sec   Loss 13.8322   LearningRate 0.0943   Epoch: 0   Global Step: 3300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:47:02,843-Speed 3452.26 samples/sec   Loss 13.7727   LearningRate 0.0943   Epoch: 0   Global Step: 3310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:47:05,807-Speed 3455.73 samples/sec   Loss 13.8067   LearningRate 0.0942   Epoch: 0   Global Step: 3320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:47:08,775-Speed 3450.51 samples/sec   Loss 13.6751   LearningRate 0.0942   Epoch: 0   Global Step: 3330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:47:11,738-Speed 3457.52 samples/sec   Loss 13.7858   LearningRate 0.0942   Epoch: 0   Global Step: 3340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:47:14,700-Speed 3457.23 samples/sec   Loss 13.7152   LearningRate 0.0942   Epoch: 0   Global Step: 3350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:47:17,668-Speed 3451.11 samples/sec   Loss 13.5333   LearningRate 0.0942   Epoch: 0   Global Step: 3360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:47:20,632-Speed 3455.42 samples/sec   Loss 13.8620   LearningRate 0.0942   Epoch: 0   Global Step: 3370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:47:23,602-Speed 3449.02 samples/sec   Loss 13.6661   LearningRate 0.0941   Epoch: 0   Global Step: 3380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:47:26,575-Speed 3445.41 samples/sec   Loss 13.6261   LearningRate 0.0941   Epoch: 0   Global Step: 3390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:47:29,543-Speed 3450.01 samples/sec   Loss 13.4465   LearningRate 0.0941   Epoch: 0   Global Step: 3400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:47:32,516-Speed 3445.35 samples/sec   Loss 13.6180   LearningRate 0.0941   Epoch: 0   Global Step: 3410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:47:35,481-Speed 3455.45 samples/sec   Loss 13.5932   LearningRate 0.0941   Epoch: 0   Global Step: 3420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:47:38,456-Speed 3442.33 samples/sec   Loss 13.2598   LearningRate 0.0941   Epoch: 0   Global Step: 3430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:47:41,422-Speed 3453.10 samples/sec   Loss 13.6328   LearningRate 0.0940   Epoch: 0   Global Step: 3440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:47:44,389-Speed 3452.70 samples/sec   Loss 13.6105   LearningRate 0.0940   Epoch: 0   Global Step: 3450   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:47:47,343-Speed 3466.86 samples/sec   Loss 13.4114   LearningRate 0.0940   Epoch: 0   Global Step: 3460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:47:50,322-Speed 3438.76 samples/sec   Loss 13.7047   LearningRate 0.0940   Epoch: 0   Global Step: 3470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:47:53,289-Speed 3451.55 samples/sec   Loss 13.5576   LearningRate 0.0940   Epoch: 0   Global Step: 3480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:47:56,255-Speed 3453.13 samples/sec   Loss 13.4074   LearningRate 0.0940   Epoch: 0   Global Step: 3490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:47:59,224-Speed 3450.17 samples/sec   Loss 13.6480   LearningRate 0.0939   Epoch: 0   Global Step: 3500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:48:02,187-Speed 3456.60 samples/sec   Loss 13.6496   LearningRate 0.0939   Epoch: 0   Global Step: 3510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:48:05,166-Speed 3438.78 samples/sec   Loss 13.4692   LearningRate 0.0939   Epoch: 0   Global Step: 3520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:48:08,139-Speed 3445.51 samples/sec   Loss 13.4721   LearningRate 0.0939   Epoch: 0   Global Step: 3530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:48:11,118-Speed 3437.04 samples/sec   Loss 13.3394   LearningRate 0.0939   Epoch: 0   Global Step: 3540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:48:14,088-Speed 3449.32 samples/sec   Loss 13.2881   LearningRate 0.0939   Epoch: 0   Global Step: 3550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:48:17,044-Speed 3465.51 samples/sec   Loss 13.3024   LearningRate 0.0938   Epoch: 0   Global Step: 3560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:48:20,011-Speed 3450.99 samples/sec   Loss 13.3821   LearningRate 0.0938   Epoch: 0   Global Step: 3570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:48:22,980-Speed 3450.68 samples/sec   Loss 13.2987   LearningRate 0.0938   Epoch: 0   Global Step: 3580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:48:25,948-Speed 3450.79 samples/sec   Loss 13.1173   LearningRate 0.0938   Epoch: 0   Global Step: 3590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:48:28,907-Speed 3461.47 samples/sec   Loss 12.9718   LearningRate 0.0938   Epoch: 0   Global Step: 3600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:48:31,877-Speed 3449.01 samples/sec   Loss 13.3359   LearningRate 0.0938   Epoch: 0   Global Step: 3610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:48:34,859-Speed 3434.96 samples/sec   Loss 13.4502   LearningRate 0.0937   Epoch: 0   Global Step: 3620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:48:37,829-Speed 3448.29 samples/sec   Loss 13.2952   LearningRate 0.0937   Epoch: 0   Global Step: 3630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:48:40,797-Speed 3450.06 samples/sec   Loss 13.1307   LearningRate 0.0937   Epoch: 0   Global Step: 3640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:48:43,765-Speed 3451.27 samples/sec   Loss 13.3252   LearningRate 0.0937   Epoch: 0   Global Step: 3650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:48:46,740-Speed 3443.47 samples/sec   Loss 13.1921   LearningRate 0.0937   Epoch: 0   Global Step: 3660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:48:49,715-Speed 3442.28 samples/sec   Loss 13.1363   LearningRate 0.0936   Epoch: 0   Global Step: 3670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:48:52,703-Speed 3428.55 samples/sec   Loss 13.2379   LearningRate 0.0936   Epoch: 0   Global Step: 3680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:48:55,674-Speed 3446.51 samples/sec   Loss 13.2232   LearningRate 0.0936   Epoch: 0   Global Step: 3690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:48:58,642-Speed 3452.17 samples/sec   Loss 13.1979   LearningRate 0.0936   Epoch: 0   Global Step: 3700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:01,630-Speed 3426.72 samples/sec   Loss 13.2076   LearningRate 0.0936   Epoch: 0   Global Step: 3710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:04,611-Speed 3437.19 samples/sec   Loss 13.2507   LearningRate 0.0936   Epoch: 0   Global Step: 3720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:07,578-Speed 3451.69 samples/sec   Loss 12.8917   LearningRate 0.0935   Epoch: 0   Global Step: 3730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:10,548-Speed 3449.36 samples/sec   Loss 13.1249   LearningRate 0.0935   Epoch: 0   Global Step: 3740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:13,518-Speed 3449.01 samples/sec   Loss 12.9993   LearningRate 0.0935   Epoch: 0   Global Step: 3750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:16,485-Speed 3452.54 samples/sec   Loss 13.2269   LearningRate 0.0935   Epoch: 0   Global Step: 3760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:19,459-Speed 3444.23 samples/sec   Loss 13.2934   LearningRate 0.0935   Epoch: 0   Global Step: 3770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:22,429-Speed 3448.20 samples/sec   Loss 13.1207   LearningRate 0.0935   Epoch: 0   Global Step: 3780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:25,398-Speed 3449.44 samples/sec   Loss 12.8527   LearningRate 0.0934   Epoch: 0   Global Step: 3790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:28,370-Speed 3447.27 samples/sec   Loss 13.0059   LearningRate 0.0934   Epoch: 0   Global Step: 3800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:31,367-Speed 3417.05 samples/sec   Loss 12.9649   LearningRate 0.0934   Epoch: 0   Global Step: 3810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:34,340-Speed 3445.19 samples/sec   Loss 12.9584   LearningRate 0.0934   Epoch: 0   Global Step: 3820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:37,315-Speed 3443.41 samples/sec   Loss 12.7697   LearningRate 0.0934   Epoch: 0   Global Step: 3830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:40,289-Speed 3443.47 samples/sec   Loss 12.9830   LearningRate 0.0934   Epoch: 0   Global Step: 3840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:43,263-Speed 3443.88 samples/sec   Loss 12.8748   LearningRate 0.0933   Epoch: 0   Global Step: 3850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:46,237-Speed 3444.16 samples/sec   Loss 12.9882   LearningRate 0.0933   Epoch: 0   Global Step: 3860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:49,208-Speed 3447.05 samples/sec   Loss 13.0280   LearningRate 0.0933   Epoch: 0   Global Step: 3870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:52,180-Speed 3446.37 samples/sec   Loss 13.0841   LearningRate 0.0933   Epoch: 0   Global Step: 3880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:55,152-Speed 3447.01 samples/sec   Loss 12.7353   LearningRate 0.0933   Epoch: 0   Global Step: 3890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:49:58,125-Speed 3445.02 samples/sec   Loss 12.8807   LearningRate 0.0933   Epoch: 0   Global Step: 3900   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:50:01,110-Speed 3430.95 samples/sec   Loss 12.9883   LearningRate 0.0932   Epoch: 0   Global Step: 3910   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:50:04,084-Speed 3444.71 samples/sec   Loss 12.7149   LearningRate 0.0932   Epoch: 0   Global Step: 3920   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 01:50:07,047-Speed 3456.24 samples/sec   Loss 12.7856   LearningRate 0.0932   Epoch: 0   Global Step: 3930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:50:10,022-Speed 3442.28 samples/sec   Loss 12.8257   LearningRate 0.0932   Epoch: 0   Global Step: 3940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:50:12,998-Speed 3442.55 samples/sec   Loss 12.8412   LearningRate 0.0932   Epoch: 0   Global Step: 3950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:50:15,988-Speed 3425.11 samples/sec   Loss 12.9467   LearningRate 0.0932   Epoch: 0   Global Step: 3960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:50:18,995-Speed 3406.95 samples/sec   Loss 12.7295   LearningRate 0.0931   Epoch: 0   Global Step: 3970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:50:21,967-Speed 3445.93 samples/sec   Loss 12.8251   LearningRate 0.0931   Epoch: 0   Global Step: 3980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:50:24,943-Speed 3441.78 samples/sec   Loss 12.9355   LearningRate 0.0931   Epoch: 0   Global Step: 3990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:50:27,918-Speed 3443.98 samples/sec   Loss 12.7543   LearningRate 0.0931   Epoch: 0   Global Step: 4000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 01:51:11,467-[lfw][4000]XNorm: 22.044406
Training: 2022-04-27 01:51:11,468-[lfw][4000]Accuracy-Flip: 0.99200+-0.00314
Training: 2022-04-27 01:51:11,468-[lfw][4000]Accuracy-Highest: 0.99200
Training: 2022-04-27 01:52:02,153-[cfp_fp][4000]XNorm: 19.166544
Training: 2022-04-27 01:52:02,153-[cfp_fp][4000]Accuracy-Flip: 0.87400+-0.01373
Training: 2022-04-27 01:52:02,154-[cfp_fp][4000]Accuracy-Highest: 0.87400
Training: 2022-04-27 01:52:45,825-[agedb_30][4000]XNorm: 21.483009
Training: 2022-04-27 01:52:45,825-[agedb_30][4000]Accuracy-Flip: 0.94000+-0.01402
Training: 2022-04-27 01:52:45,826-[agedb_30][4000]Accuracy-Highest: 0.94000
Training: 2022-04-27 01:52:48,782-Speed 72.69 samples/sec   Loss 12.7220   LearningRate 0.0931   Epoch: 0   Global Step: 4010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:52:51,736-Speed 3466.51 samples/sec   Loss 12.6761   LearningRate 0.0931   Epoch: 0   Global Step: 4020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:52:54,698-Speed 3458.05 samples/sec   Loss 12.6618   LearningRate 0.0930   Epoch: 0   Global Step: 4030   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:52:57,662-Speed 3456.00 samples/sec   Loss 12.6964   LearningRate 0.0930   Epoch: 0   Global Step: 4040   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:53:00,625-Speed 3456.67 samples/sec   Loss 12.7081   LearningRate 0.0930   Epoch: 0   Global Step: 4050   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:53:03,587-Speed 3458.15 samples/sec   Loss 12.6644   LearningRate 0.0930   Epoch: 0   Global Step: 4060   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:53:06,547-Speed 3459.77 samples/sec   Loss 12.6412   LearningRate 0.0930   Epoch: 0   Global Step: 4070   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:53:09,522-Speed 3442.35 samples/sec   Loss 12.5219   LearningRate 0.0930   Epoch: 0   Global Step: 4080   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:53:12,490-Speed 3451.46 samples/sec   Loss 12.7559   LearningRate 0.0929   Epoch: 0   Global Step: 4090   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:53:15,444-Speed 3467.01 samples/sec   Loss 12.6429   LearningRate 0.0929   Epoch: 0   Global Step: 4100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:53:18,413-Speed 3450.61 samples/sec   Loss 12.5599   LearningRate 0.0929   Epoch: 0   Global Step: 4110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:53:21,386-Speed 3445.29 samples/sec   Loss 12.7687   LearningRate 0.0929   Epoch: 0   Global Step: 4120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:53:24,361-Speed 3442.69 samples/sec   Loss 12.5487   LearningRate 0.0929   Epoch: 0   Global Step: 4130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:53:27,331-Speed 3448.38 samples/sec   Loss 12.7661   LearningRate 0.0929   Epoch: 0   Global Step: 4140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:53:30,310-Speed 3438.14 samples/sec   Loss 12.5928   LearningRate 0.0928   Epoch: 0   Global Step: 4150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:53:33,280-Speed 3448.99 samples/sec   Loss 12.5189   LearningRate 0.0928   Epoch: 0   Global Step: 4160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:53:36,264-Speed 3431.53 samples/sec   Loss 12.5061   LearningRate 0.0928   Epoch: 0   Global Step: 4170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:53:39,228-Speed 3456.30 samples/sec   Loss 12.3325   LearningRate 0.0928   Epoch: 0   Global Step: 4180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:53:42,202-Speed 3443.72 samples/sec   Loss 12.4509   LearningRate 0.0928   Epoch: 0   Global Step: 4190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:53:45,169-Speed 3452.18 samples/sec   Loss 12.4642   LearningRate 0.0927   Epoch: 0   Global Step: 4200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:53:48,137-Speed 3451.11 samples/sec   Loss 12.5399   LearningRate 0.0927   Epoch: 0   Global Step: 4210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:53:51,102-Speed 3454.45 samples/sec   Loss 12.1225   LearningRate 0.0927   Epoch: 0   Global Step: 4220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:53:54,067-Speed 3454.15 samples/sec   Loss 12.4095   LearningRate 0.0927   Epoch: 0   Global Step: 4230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:53:57,031-Speed 3456.16 samples/sec   Loss 12.5147   LearningRate 0.0927   Epoch: 0   Global Step: 4240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:53:59,994-Speed 3455.60 samples/sec   Loss 12.4771   LearningRate 0.0927   Epoch: 0   Global Step: 4250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:54:02,972-Speed 3439.93 samples/sec   Loss 12.4487   LearningRate 0.0926   Epoch: 0   Global Step: 4260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:54:05,937-Speed 3454.43 samples/sec   Loss 12.2745   LearningRate 0.0926   Epoch: 0   Global Step: 4270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:54:08,900-Speed 3457.31 samples/sec   Loss 12.4280   LearningRate 0.0926   Epoch: 0   Global Step: 4280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:54:11,860-Speed 3459.38 samples/sec   Loss 12.4596   LearningRate 0.0926   Epoch: 0   Global Step: 4290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:54:14,817-Speed 3464.02 samples/sec   Loss 12.5640   LearningRate 0.0926   Epoch: 0   Global Step: 4300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:54:17,791-Speed 3444.63 samples/sec   Loss 12.3124   LearningRate 0.0926   Epoch: 0   Global Step: 4310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:54:20,749-Speed 3462.29 samples/sec   Loss 12.4590   LearningRate 0.0925   Epoch: 0   Global Step: 4320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:54:23,704-Speed 3466.01 samples/sec   Loss 12.2962   LearningRate 0.0925   Epoch: 0   Global Step: 4330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:54:26,659-Speed 3466.43 samples/sec   Loss 12.3322   LearningRate 0.0925   Epoch: 0   Global Step: 4340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:54:29,616-Speed 3463.60 samples/sec   Loss 12.3833   LearningRate 0.0925   Epoch: 0   Global Step: 4350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:54:32,573-Speed 3463.52 samples/sec   Loss 12.2661   LearningRate 0.0925   Epoch: 0   Global Step: 4360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:54:35,529-Speed 3465.27 samples/sec   Loss 12.2460   LearningRate 0.0925   Epoch: 0   Global Step: 4370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:54:38,488-Speed 3461.05 samples/sec   Loss 12.2668   LearningRate 0.0924   Epoch: 0   Global Step: 4380   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:54:41,441-Speed 3468.96 samples/sec   Loss 12.3868   LearningRate 0.0924   Epoch: 0   Global Step: 4390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:54:44,405-Speed 3455.49 samples/sec   Loss 12.2766   LearningRate 0.0924   Epoch: 0   Global Step: 4400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:54:47,353-Speed 3474.77 samples/sec   Loss 12.0989   LearningRate 0.0924   Epoch: 0   Global Step: 4410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:54:50,307-Speed 3467.55 samples/sec   Loss 12.2504   LearningRate 0.0924   Epoch: 0   Global Step: 4420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:54:53,259-Speed 3469.91 samples/sec   Loss 12.2606   LearningRate 0.0924   Epoch: 0   Global Step: 4430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:54:56,212-Speed 3467.41 samples/sec   Loss 12.2635   LearningRate 0.0923   Epoch: 0   Global Step: 4440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:54:59,170-Speed 3462.92 samples/sec   Loss 12.1805   LearningRate 0.0923   Epoch: 0   Global Step: 4450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:55:02,141-Speed 3448.04 samples/sec   Loss 12.0922   LearningRate 0.0923   Epoch: 0   Global Step: 4460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:55:05,097-Speed 3464.68 samples/sec   Loss 11.9597   LearningRate 0.0923   Epoch: 0   Global Step: 4470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:55:08,060-Speed 3456.22 samples/sec   Loss 12.1669   LearningRate 0.0923   Epoch: 0   Global Step: 4480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:55:11,020-Speed 3460.60 samples/sec   Loss 12.0596   LearningRate 0.0923   Epoch: 0   Global Step: 4490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:55:13,979-Speed 3461.41 samples/sec   Loss 12.1990   LearningRate 0.0922   Epoch: 0   Global Step: 4500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:55:16,942-Speed 3457.15 samples/sec   Loss 12.1878   LearningRate 0.0922   Epoch: 0   Global Step: 4510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:55:19,902-Speed 3459.71 samples/sec   Loss 12.2039   LearningRate 0.0922   Epoch: 0   Global Step: 4520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:55:22,859-Speed 3464.06 samples/sec   Loss 12.1032   LearningRate 0.0922   Epoch: 0   Global Step: 4530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:55:25,828-Speed 3449.40 samples/sec   Loss 12.0611   LearningRate 0.0922   Epoch: 0   Global Step: 4540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:55:28,791-Speed 3457.16 samples/sec   Loss 12.1608   LearningRate 0.0922   Epoch: 0   Global Step: 4550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:55:31,750-Speed 3461.86 samples/sec   Loss 12.0708   LearningRate 0.0921   Epoch: 0   Global Step: 4560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:55:34,707-Speed 3463.95 samples/sec   Loss 12.0174   LearningRate 0.0921   Epoch: 0   Global Step: 4570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:55:37,666-Speed 3461.52 samples/sec   Loss 12.1736   LearningRate 0.0921   Epoch: 0   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:55:40,622-Speed 3464.54 samples/sec   Loss 12.0749   LearningRate 0.0921   Epoch: 0   Global Step: 4590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:55:43,578-Speed 3464.57 samples/sec   Loss 12.0388   LearningRate 0.0921   Epoch: 0   Global Step: 4600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:55:46,539-Speed 3460.22 samples/sec   Loss 12.1564   LearningRate 0.0921   Epoch: 0   Global Step: 4610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:55:49,536-Speed 3417.38 samples/sec   Loss 11.9758   LearningRate 0.0920   Epoch: 0   Global Step: 4620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:55:52,533-Speed 3417.86 samples/sec   Loss 11.9349   LearningRate 0.0920   Epoch: 0   Global Step: 4630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:55:55,528-Speed 3419.59 samples/sec   Loss 11.8644   LearningRate 0.0920   Epoch: 0   Global Step: 4640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:55:58,498-Speed 3448.86 samples/sec   Loss 12.0072   LearningRate 0.0920   Epoch: 0   Global Step: 4650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:01,457-Speed 3460.90 samples/sec   Loss 12.1318   LearningRate 0.0920   Epoch: 0   Global Step: 4660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:04,422-Speed 3455.04 samples/sec   Loss 12.0745   LearningRate 0.0920   Epoch: 0   Global Step: 4670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:07,384-Speed 3457.47 samples/sec   Loss 11.9165   LearningRate 0.0919   Epoch: 0   Global Step: 4680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:10,332-Speed 3474.55 samples/sec   Loss 11.9810   LearningRate 0.0919   Epoch: 0   Global Step: 4690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:13,289-Speed 3463.08 samples/sec   Loss 11.9553   LearningRate 0.0919   Epoch: 0   Global Step: 4700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:16,252-Speed 3457.01 samples/sec   Loss 11.9592   LearningRate 0.0919   Epoch: 0   Global Step: 4710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:19,217-Speed 3455.05 samples/sec   Loss 11.9294   LearningRate 0.0919   Epoch: 0   Global Step: 4720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:22,178-Speed 3459.57 samples/sec   Loss 12.0220   LearningRate 0.0919   Epoch: 0   Global Step: 4730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:25,145-Speed 3451.29 samples/sec   Loss 12.0176   LearningRate 0.0918   Epoch: 0   Global Step: 4740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:28,107-Speed 3457.76 samples/sec   Loss 11.9086   LearningRate 0.0918   Epoch: 0   Global Step: 4750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:31,067-Speed 3461.01 samples/sec   Loss 11.9220   LearningRate 0.0918   Epoch: 0   Global Step: 4760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:34,031-Speed 3456.11 samples/sec   Loss 11.8397   LearningRate 0.0918   Epoch: 0   Global Step: 4770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:36,995-Speed 3454.86 samples/sec   Loss 12.1992   LearningRate 0.0918   Epoch: 0   Global Step: 4780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:39,961-Speed 3453.01 samples/sec   Loss 11.8839   LearningRate 0.0918   Epoch: 0   Global Step: 4790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:56:42,930-Speed 3449.94 samples/sec   Loss 11.9992   LearningRate 0.0917   Epoch: 0   Global Step: 4800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:56:45,893-Speed 3457.18 samples/sec   Loss 11.7339   LearningRate 0.0917   Epoch: 0   Global Step: 4810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:56:48,868-Speed 3441.88 samples/sec   Loss 11.8986   LearningRate 0.0917   Epoch: 0   Global Step: 4820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:56:51,830-Speed 3458.46 samples/sec   Loss 11.8758   LearningRate 0.0917   Epoch: 0   Global Step: 4830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:56:54,826-Speed 3418.86 samples/sec   Loss 11.6342   LearningRate 0.0917   Epoch: 0   Global Step: 4840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:56:57,789-Speed 3456.83 samples/sec   Loss 11.8295   LearningRate 0.0917   Epoch: 0   Global Step: 4850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:57:00,758-Speed 3450.29 samples/sec   Loss 11.9476   LearningRate 0.0916   Epoch: 0   Global Step: 4860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:57:03,732-Speed 3443.71 samples/sec   Loss 11.8129   LearningRate 0.0916   Epoch: 0   Global Step: 4870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:57:06,694-Speed 3458.12 samples/sec   Loss 11.7009   LearningRate 0.0916   Epoch: 0   Global Step: 4880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:57:09,670-Speed 3441.18 samples/sec   Loss 11.8622   LearningRate 0.0916   Epoch: 0   Global Step: 4890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:57:12,629-Speed 3460.87 samples/sec   Loss 11.7870   LearningRate 0.0916   Epoch: 0   Global Step: 4900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:57:15,598-Speed 3449.85 samples/sec   Loss 12.0083   LearningRate 0.0916   Epoch: 0   Global Step: 4910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:57:18,565-Speed 3451.84 samples/sec   Loss 11.8415   LearningRate 0.0915   Epoch: 0   Global Step: 4920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:57:21,530-Speed 3454.90 samples/sec   Loss 11.7247   LearningRate 0.0915   Epoch: 0   Global Step: 4930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:57:24,491-Speed 3458.97 samples/sec   Loss 11.7343   LearningRate 0.0915   Epoch: 0   Global Step: 4940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:57:27,464-Speed 3446.31 samples/sec   Loss 11.8280   LearningRate 0.0915   Epoch: 0   Global Step: 4950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:57:30,439-Speed 3442.62 samples/sec   Loss 11.8131   LearningRate 0.0915   Epoch: 0   Global Step: 4960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:57:33,418-Speed 3437.42 samples/sec   Loss 11.8861   LearningRate 0.0915   Epoch: 0   Global Step: 4970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:57:36,384-Speed 3453.89 samples/sec   Loss 11.8026   LearningRate 0.0914   Epoch: 0   Global Step: 4980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:57:39,351-Speed 3452.01 samples/sec   Loss 11.7676   LearningRate 0.0914   Epoch: 0   Global Step: 4990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:57:42,319-Speed 3451.09 samples/sec   Loss 11.6989   LearningRate 0.0914   Epoch: 0   Global Step: 5000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:57:45,284-Speed 3453.79 samples/sec   Loss 11.5832   LearningRate 0.0914   Epoch: 0   Global Step: 5010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:57:48,248-Speed 3456.04 samples/sec   Loss 11.7145   LearningRate 0.0914   Epoch: 0   Global Step: 5020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:57:51,215-Speed 3451.84 samples/sec   Loss 11.7248   LearningRate 0.0913   Epoch: 0   Global Step: 5030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:57:54,214-Speed 3415.74 samples/sec   Loss 11.6808   LearningRate 0.0913   Epoch: 0   Global Step: 5040   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:57:57,177-Speed 3456.94 samples/sec   Loss 11.7859   LearningRate 0.0913   Epoch: 0   Global Step: 5050   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:58:00,142-Speed 3454.02 samples/sec   Loss 11.5850   LearningRate 0.0913   Epoch: 0   Global Step: 5060   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:58:03,107-Speed 3455.14 samples/sec   Loss 11.5795   LearningRate 0.0913   Epoch: 0   Global Step: 5070   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:58:06,060-Speed 3467.94 samples/sec   Loss 11.5968   LearningRate 0.0913   Epoch: 0   Global Step: 5080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:09,023-Speed 3456.49 samples/sec   Loss 11.4955   LearningRate 0.0912   Epoch: 0   Global Step: 5090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:11,994-Speed 3447.81 samples/sec   Loss 11.5108   LearningRate 0.0912   Epoch: 0   Global Step: 5100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:14,974-Speed 3437.17 samples/sec   Loss 11.5584   LearningRate 0.0912   Epoch: 0   Global Step: 5110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:17,943-Speed 3448.75 samples/sec   Loss 11.7204   LearningRate 0.0912   Epoch: 0   Global Step: 5120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:20,908-Speed 3454.70 samples/sec   Loss 11.5258   LearningRate 0.0912   Epoch: 0   Global Step: 5130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:23,878-Speed 3449.31 samples/sec   Loss 11.6642   LearningRate 0.0912   Epoch: 0   Global Step: 5140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:26,847-Speed 3449.73 samples/sec   Loss 11.6276   LearningRate 0.0911   Epoch: 0   Global Step: 5150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:29,813-Speed 3453.50 samples/sec   Loss 11.7060   LearningRate 0.0911   Epoch: 0   Global Step: 5160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:32,782-Speed 3449.50 samples/sec   Loss 11.5916   LearningRate 0.0911   Epoch: 0   Global Step: 5170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:35,746-Speed 3455.06 samples/sec   Loss 11.5142   LearningRate 0.0911   Epoch: 0   Global Step: 5180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:38,726-Speed 3437.92 samples/sec   Loss 11.6085   LearningRate 0.0911   Epoch: 0   Global Step: 5190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:41,702-Speed 3441.38 samples/sec   Loss 11.6366   LearningRate 0.0911   Epoch: 0   Global Step: 5200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:44,673-Speed 3447.12 samples/sec   Loss 11.6984   LearningRate 0.0910   Epoch: 0   Global Step: 5210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:47,651-Speed 3439.13 samples/sec   Loss 11.5753   LearningRate 0.0910   Epoch: 0   Global Step: 5220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:50,618-Speed 3452.03 samples/sec   Loss 11.5495   LearningRate 0.0910   Epoch: 0   Global Step: 5230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:53,585-Speed 3453.28 samples/sec   Loss 11.5842   LearningRate 0.0910   Epoch: 0   Global Step: 5240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:56,557-Speed 3445.76 samples/sec   Loss 11.3767   LearningRate 0.0910   Epoch: 0   Global Step: 5250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:58:59,521-Speed 3456.36 samples/sec   Loss 11.3179   LearningRate 0.0910   Epoch: 0   Global Step: 5260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:59:02,490-Speed 3448.85 samples/sec   Loss 11.6310   LearningRate 0.0909   Epoch: 0   Global Step: 5270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:59:05,463-Speed 3445.82 samples/sec   Loss 11.4901   LearningRate 0.0909   Epoch: 0   Global Step: 5280   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 01:59:08,417-Speed 3466.68 samples/sec   Loss 11.6950   LearningRate 0.0909   Epoch: 0   Global Step: 5290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:59:11,387-Speed 3448.87 samples/sec   Loss 11.5676   LearningRate 0.0909   Epoch: 0   Global Step: 5300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:59:14,354-Speed 3451.43 samples/sec   Loss 11.5911   LearningRate 0.0909   Epoch: 0   Global Step: 5310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:59:17,319-Speed 3454.61 samples/sec   Loss 11.7352   LearningRate 0.0909   Epoch: 0   Global Step: 5320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:59:20,287-Speed 3451.47 samples/sec   Loss 11.4962   LearningRate 0.0908   Epoch: 0   Global Step: 5330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:59:23,257-Speed 3448.43 samples/sec   Loss 11.7128   LearningRate 0.0908   Epoch: 0   Global Step: 5340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:59:26,226-Speed 3449.76 samples/sec   Loss 11.5622   LearningRate 0.0908   Epoch: 0   Global Step: 5350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:59:29,204-Speed 3440.16 samples/sec   Loss 11.5311   LearningRate 0.0908   Epoch: 0   Global Step: 5360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 01:59:32,158-Speed 3466.83 samples/sec   Loss 11.4741   LearningRate 0.0908   Epoch: 0   Global Step: 5370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:59:35,135-Speed 3440.85 samples/sec   Loss 11.4841   LearningRate 0.0908   Epoch: 0   Global Step: 5380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:59:38,106-Speed 3446.91 samples/sec   Loss 11.4434   LearningRate 0.0907   Epoch: 0   Global Step: 5390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:59:41,080-Speed 3444.06 samples/sec   Loss 11.4643   LearningRate 0.0907   Epoch: 0   Global Step: 5400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 01:59:44,059-Speed 3438.37 samples/sec   Loss 11.4438   LearningRate 0.0907   Epoch: 0   Global Step: 5410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:59:47,032-Speed 3445.63 samples/sec   Loss 11.5566   LearningRate 0.0907   Epoch: 0   Global Step: 5420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:59:50,019-Speed 3428.71 samples/sec   Loss 11.4290   LearningRate 0.0907   Epoch: 0   Global Step: 5430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:59:53,012-Speed 3429.48 samples/sec   Loss 11.5018   LearningRate 0.0907   Epoch: 0   Global Step: 5440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:59:55,983-Speed 3447.76 samples/sec   Loss 11.5005   LearningRate 0.0906   Epoch: 0   Global Step: 5450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 01:59:58,954-Speed 3448.15 samples/sec   Loss 11.4596   LearningRate 0.0906   Epoch: 0   Global Step: 5460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:00:01,923-Speed 3449.30 samples/sec   Loss 11.3218   LearningRate 0.0906   Epoch: 0   Global Step: 5470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:04,891-Speed 3451.07 samples/sec   Loss 11.3086   LearningRate 0.0906   Epoch: 0   Global Step: 5480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:07,861-Speed 3448.62 samples/sec   Loss 11.4523   LearningRate 0.0906   Epoch: 0   Global Step: 5490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:10,840-Speed 3438.29 samples/sec   Loss 11.4423   LearningRate 0.0906   Epoch: 0   Global Step: 5500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:13,811-Speed 3447.17 samples/sec   Loss 11.3704   LearningRate 0.0905   Epoch: 0   Global Step: 5510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:16,799-Speed 3428.01 samples/sec   Loss 11.2679   LearningRate 0.0905   Epoch: 0   Global Step: 5520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:19,775-Speed 3441.64 samples/sec   Loss 11.4258   LearningRate 0.0905   Epoch: 0   Global Step: 5530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:22,760-Speed 3432.79 samples/sec   Loss 11.3139   LearningRate 0.0905   Epoch: 0   Global Step: 5540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:25,732-Speed 3445.29 samples/sec   Loss 11.4127   LearningRate 0.0905   Epoch: 0   Global Step: 5550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:28,699-Speed 3452.06 samples/sec   Loss 11.2902   LearningRate 0.0905   Epoch: 0   Global Step: 5560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:31,666-Speed 3452.81 samples/sec   Loss 11.3586   LearningRate 0.0904   Epoch: 0   Global Step: 5570   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:00:34,630-Speed 3455.57 samples/sec   Loss 11.3395   LearningRate 0.0904   Epoch: 0   Global Step: 5580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:37,605-Speed 3442.32 samples/sec   Loss 11.2115   LearningRate 0.0904   Epoch: 0   Global Step: 5590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:40,576-Speed 3448.37 samples/sec   Loss 11.3782   LearningRate 0.0904   Epoch: 0   Global Step: 5600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:43,554-Speed 3439.30 samples/sec   Loss 11.3691   LearningRate 0.0904   Epoch: 0   Global Step: 5610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:46,523-Speed 3449.27 samples/sec   Loss 11.2051   LearningRate 0.0904   Epoch: 0   Global Step: 5620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:49,499-Speed 3441.51 samples/sec   Loss 11.1439   LearningRate 0.0903   Epoch: 0   Global Step: 5630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:52,477-Speed 3439.70 samples/sec   Loss 11.1068   LearningRate 0.0903   Epoch: 0   Global Step: 5640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:55,453-Speed 3442.01 samples/sec   Loss 11.4076   LearningRate 0.0903   Epoch: 0   Global Step: 5650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:00:58,421-Speed 3450.81 samples/sec   Loss 11.2732   LearningRate 0.0903   Epoch: 0   Global Step: 5660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:01:01,398-Speed 3439.75 samples/sec   Loss 11.3598   LearningRate 0.0903   Epoch: 0   Global Step: 5670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:01:04,431-Speed 3377.23 samples/sec   Loss 11.3454   LearningRate 0.0903   Epoch: 0   Global Step: 5680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:01:17,614-Speed 776.82 samples/sec   Loss 10.8269   LearningRate 0.0902   Epoch: 1   Global Step: 5690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:01:20,787-Speed 3228.55 samples/sec   Loss 10.6072   LearningRate 0.0902   Epoch: 1   Global Step: 5700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:01:23,754-Speed 3451.88 samples/sec   Loss 10.4715   LearningRate 0.0902   Epoch: 1   Global Step: 5710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:01:26,758-Speed 3410.13 samples/sec   Loss 10.5011   LearningRate 0.0902   Epoch: 1   Global Step: 5720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:01:29,734-Speed 3442.10 samples/sec   Loss 10.3723   LearningRate 0.0902   Epoch: 1   Global Step: 5730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:01:32,698-Speed 3455.63 samples/sec   Loss 10.5867   LearningRate 0.0902   Epoch: 1   Global Step: 5740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:01:35,667-Speed 3449.98 samples/sec   Loss 10.5834   LearningRate 0.0901   Epoch: 1   Global Step: 5750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:01:38,634-Speed 3452.00 samples/sec   Loss 10.4450   LearningRate 0.0901   Epoch: 1   Global Step: 5760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:01:41,602-Speed 3450.81 samples/sec   Loss 10.5189   LearningRate 0.0901   Epoch: 1   Global Step: 5770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:01:44,570-Speed 3451.05 samples/sec   Loss 10.6047   LearningRate 0.0901   Epoch: 1   Global Step: 5780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:01:47,536-Speed 3452.70 samples/sec   Loss 10.6301   LearningRate 0.0901   Epoch: 1   Global Step: 5790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:01:50,506-Speed 3448.81 samples/sec   Loss 10.6487   LearningRate 0.0901   Epoch: 1   Global Step: 5800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:01:53,476-Speed 3448.71 samples/sec   Loss 10.5849   LearningRate 0.0900   Epoch: 1   Global Step: 5810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:01:56,455-Speed 3437.68 samples/sec   Loss 10.5846   LearningRate 0.0900   Epoch: 1   Global Step: 5820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:01:59,433-Speed 3440.17 samples/sec   Loss 10.7574   LearningRate 0.0900   Epoch: 1   Global Step: 5830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:02:02,411-Speed 3439.56 samples/sec   Loss 10.7184   LearningRate 0.0900   Epoch: 1   Global Step: 5840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:02:05,384-Speed 3444.43 samples/sec   Loss 10.7082   LearningRate 0.0900   Epoch: 1   Global Step: 5850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:02:08,366-Speed 3435.11 samples/sec   Loss 10.5023   LearningRate 0.0900   Epoch: 1   Global Step: 5860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:02:11,339-Speed 3445.03 samples/sec   Loss 10.6521   LearningRate 0.0899   Epoch: 1   Global Step: 5870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:02:14,321-Speed 3434.41 samples/sec   Loss 10.6254   LearningRate 0.0899   Epoch: 1   Global Step: 5880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:02:17,331-Speed 3403.40 samples/sec   Loss 10.7144   LearningRate 0.0899   Epoch: 1   Global Step: 5890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:02:20,309-Speed 3439.19 samples/sec   Loss 10.7861   LearningRate 0.0899   Epoch: 1   Global Step: 5900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:02:23,290-Speed 3435.69 samples/sec   Loss 10.6975   LearningRate 0.0899   Epoch: 1   Global Step: 5910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:02:26,262-Speed 3446.70 samples/sec   Loss 10.6946   LearningRate 0.0899   Epoch: 1   Global Step: 5920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:02:29,249-Speed 3429.32 samples/sec   Loss 10.7485   LearningRate 0.0898   Epoch: 1   Global Step: 5930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:02:32,233-Speed 3431.61 samples/sec   Loss 10.7220   LearningRate 0.0898   Epoch: 1   Global Step: 5940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:02:35,222-Speed 3427.19 samples/sec   Loss 10.9531   LearningRate 0.0898   Epoch: 1   Global Step: 5950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:02:38,253-Speed 3379.48 samples/sec   Loss 10.8661   LearningRate 0.0898   Epoch: 1   Global Step: 5960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:02:41,260-Speed 3406.67 samples/sec   Loss 10.7004   LearningRate 0.0898   Epoch: 1   Global Step: 5970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:02:44,241-Speed 3435.40 samples/sec   Loss 10.7769   LearningRate 0.0898   Epoch: 1   Global Step: 5980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:02:47,231-Speed 3425.07 samples/sec   Loss 10.6545   LearningRate 0.0897   Epoch: 1   Global Step: 5990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:02:50,216-Speed 3431.47 samples/sec   Loss 10.7644   LearningRate 0.0897   Epoch: 1   Global Step: 6000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:03:33,585-[lfw][6000]XNorm: 23.073785
Training: 2022-04-27 02:03:33,585-[lfw][6000]Accuracy-Flip: 0.99367+-0.00371
Training: 2022-04-27 02:03:33,586-[lfw][6000]Accuracy-Highest: 0.99367
Training: 2022-04-27 02:04:23,975-[cfp_fp][6000]XNorm: 19.895782
Training: 2022-04-27 02:04:23,976-[cfp_fp][6000]Accuracy-Flip: 0.88571+-0.01898
Training: 2022-04-27 02:04:23,976-[cfp_fp][6000]Accuracy-Highest: 0.88571
Training: 2022-04-27 02:05:07,190-[agedb_30][6000]XNorm: 22.293554
Training: 2022-04-27 02:05:07,191-[agedb_30][6000]Accuracy-Flip: 0.95450+-0.00873
Training: 2022-04-27 02:05:07,191-[agedb_30][6000]Accuracy-Highest: 0.95450
Training: 2022-04-27 02:05:10,246-Speed 73.13 samples/sec   Loss 10.5289   LearningRate 0.0897   Epoch: 1   Global Step: 6010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:05:13,184-Speed 3485.59 samples/sec   Loss 10.7065   LearningRate 0.0897   Epoch: 1   Global Step: 6020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:05:16,124-Speed 3484.20 samples/sec   Loss 10.7681   LearningRate 0.0897   Epoch: 1   Global Step: 6030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:05:19,084-Speed 3460.13 samples/sec   Loss 10.8672   LearningRate 0.0897   Epoch: 1   Global Step: 6040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:05:22,035-Speed 3471.54 samples/sec   Loss 10.6543   LearningRate 0.0896   Epoch: 1   Global Step: 6050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:05:24,984-Speed 3473.16 samples/sec   Loss 10.6421   LearningRate 0.0896   Epoch: 1   Global Step: 6060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:05:27,933-Speed 3472.34 samples/sec   Loss 10.6934   LearningRate 0.0896   Epoch: 1   Global Step: 6070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:05:30,889-Speed 3465.52 samples/sec   Loss 10.7527   LearningRate 0.0896   Epoch: 1   Global Step: 6080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:05:33,846-Speed 3463.06 samples/sec   Loss 10.7204   LearningRate 0.0896   Epoch: 1   Global Step: 6090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:05:36,808-Speed 3458.60 samples/sec   Loss 10.6154   LearningRate 0.0896   Epoch: 1   Global Step: 6100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:05:39,767-Speed 3460.97 samples/sec   Loss 10.6772   LearningRate 0.0895   Epoch: 1   Global Step: 6110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:05:42,725-Speed 3463.43 samples/sec   Loss 10.7883   LearningRate 0.0895   Epoch: 1   Global Step: 6120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:05:45,681-Speed 3464.29 samples/sec   Loss 10.8516   LearningRate 0.0895   Epoch: 1   Global Step: 6130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:05:48,645-Speed 3455.52 samples/sec   Loss 10.8135   LearningRate 0.0895   Epoch: 1   Global Step: 6140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:05:51,648-Speed 3411.24 samples/sec   Loss 10.8408   LearningRate 0.0895   Epoch: 1   Global Step: 6150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:05:54,620-Speed 3446.34 samples/sec   Loss 10.7989   LearningRate 0.0895   Epoch: 1   Global Step: 6160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:05:57,581-Speed 3458.09 samples/sec   Loss 10.9222   LearningRate 0.0894   Epoch: 1   Global Step: 6170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:00,553-Speed 3446.24 samples/sec   Loss 10.7898   LearningRate 0.0894   Epoch: 1   Global Step: 6180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:03,524-Speed 3448.16 samples/sec   Loss 10.7260   LearningRate 0.0894   Epoch: 1   Global Step: 6190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:06,493-Speed 3449.29 samples/sec   Loss 10.6569   LearningRate 0.0894   Epoch: 1   Global Step: 6200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:09,464-Speed 3448.24 samples/sec   Loss 10.6387   LearningRate 0.0894   Epoch: 1   Global Step: 6210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:12,435-Speed 3447.50 samples/sec   Loss 10.6409   LearningRate 0.0894   Epoch: 1   Global Step: 6220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:15,395-Speed 3459.78 samples/sec   Loss 10.7698   LearningRate 0.0893   Epoch: 1   Global Step: 6230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:18,379-Speed 3432.86 samples/sec   Loss 10.5541   LearningRate 0.0893   Epoch: 1   Global Step: 6240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:21,355-Speed 3441.46 samples/sec   Loss 10.7689   LearningRate 0.0893   Epoch: 1   Global Step: 6250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:24,325-Speed 3449.47 samples/sec   Loss 10.6237   LearningRate 0.0893   Epoch: 1   Global Step: 6260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:27,294-Speed 3449.30 samples/sec   Loss 10.8297   LearningRate 0.0893   Epoch: 1   Global Step: 6270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:30,261-Speed 3452.27 samples/sec   Loss 10.7151   LearningRate 0.0893   Epoch: 1   Global Step: 6280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:33,232-Speed 3447.46 samples/sec   Loss 10.5236   LearningRate 0.0892   Epoch: 1   Global Step: 6290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:36,196-Speed 3455.50 samples/sec   Loss 10.7107   LearningRate 0.0892   Epoch: 1   Global Step: 6300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:39,163-Speed 3452.91 samples/sec   Loss 10.6512   LearningRate 0.0892   Epoch: 1   Global Step: 6310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:42,123-Speed 3459.70 samples/sec   Loss 10.7294   LearningRate 0.0892   Epoch: 1   Global Step: 6320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:45,064-Speed 3483.13 samples/sec   Loss 10.6558   LearningRate 0.0892   Epoch: 1   Global Step: 6330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:06:48,006-Speed 3480.95 samples/sec   Loss 10.6628   LearningRate 0.0892   Epoch: 1   Global Step: 6340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:06:50,965-Speed 3461.21 samples/sec   Loss 10.6995   LearningRate 0.0891   Epoch: 1   Global Step: 6350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:06:53,921-Speed 3464.70 samples/sec   Loss 10.7592   LearningRate 0.0891   Epoch: 1   Global Step: 6360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:06:56,876-Speed 3466.12 samples/sec   Loss 10.6906   LearningRate 0.0891   Epoch: 1   Global Step: 6370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:06:59,834-Speed 3462.71 samples/sec   Loss 10.6857   LearningRate 0.0891   Epoch: 1   Global Step: 6380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:02,796-Speed 3458.82 samples/sec   Loss 10.7154   LearningRate 0.0891   Epoch: 1   Global Step: 6390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:05,755-Speed 3460.48 samples/sec   Loss 10.6490   LearningRate 0.0891   Epoch: 1   Global Step: 6400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:08,708-Speed 3468.93 samples/sec   Loss 10.7876   LearningRate 0.0890   Epoch: 1   Global Step: 6410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:11,664-Speed 3465.46 samples/sec   Loss 10.6204   LearningRate 0.0890   Epoch: 1   Global Step: 6420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:14,618-Speed 3466.88 samples/sec   Loss 10.5849   LearningRate 0.0890   Epoch: 1   Global Step: 6430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:18,635-Speed 2549.64 samples/sec   Loss 10.6405   LearningRate 0.0890   Epoch: 1   Global Step: 6440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:07:21,989-Speed 3053.37 samples/sec   Loss 10.9193   LearningRate 0.0890   Epoch: 1   Global Step: 6450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:07:24,934-Speed 3478.59 samples/sec   Loss 10.7994   LearningRate 0.0890   Epoch: 1   Global Step: 6460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:27,896-Speed 3457.09 samples/sec   Loss 10.6753   LearningRate 0.0889   Epoch: 1   Global Step: 6470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:30,855-Speed 3462.59 samples/sec   Loss 10.6318   LearningRate 0.0889   Epoch: 1   Global Step: 6480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:33,814-Speed 3461.57 samples/sec   Loss 10.6656   LearningRate 0.0889   Epoch: 1   Global Step: 6490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:36,769-Speed 3465.83 samples/sec   Loss 10.7309   LearningRate 0.0889   Epoch: 1   Global Step: 6500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:39,738-Speed 3449.70 samples/sec   Loss 10.5875   LearningRate 0.0889   Epoch: 1   Global Step: 6510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:42,695-Speed 3463.43 samples/sec   Loss 10.6032   LearningRate 0.0889   Epoch: 1   Global Step: 6520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:45,649-Speed 3467.56 samples/sec   Loss 10.5886   LearningRate 0.0888   Epoch: 1   Global Step: 6530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:48,612-Speed 3456.02 samples/sec   Loss 10.6710   LearningRate 0.0888   Epoch: 1   Global Step: 6540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:51,593-Speed 3436.07 samples/sec   Loss 10.5963   LearningRate 0.0888   Epoch: 1   Global Step: 6550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:07:54,547-Speed 3467.30 samples/sec   Loss 10.6970   LearningRate 0.0888   Epoch: 1   Global Step: 6560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:07:57,500-Speed 3468.65 samples/sec   Loss 10.7024   LearningRate 0.0888   Epoch: 1   Global Step: 6570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:00,469-Speed 3450.13 samples/sec   Loss 10.7724   LearningRate 0.0888   Epoch: 1   Global Step: 6580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:03,421-Speed 3469.18 samples/sec   Loss 10.7604   LearningRate 0.0887   Epoch: 1   Global Step: 6590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:06,380-Speed 3461.70 samples/sec   Loss 10.8319   LearningRate 0.0887   Epoch: 1   Global Step: 6600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:09,334-Speed 3467.14 samples/sec   Loss 10.7395   LearningRate 0.0887   Epoch: 1   Global Step: 6610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:12,291-Speed 3464.20 samples/sec   Loss 10.5716   LearningRate 0.0887   Epoch: 1   Global Step: 6620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:15,288-Speed 3417.35 samples/sec   Loss 10.7703   LearningRate 0.0887   Epoch: 1   Global Step: 6630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:18,251-Speed 3456.46 samples/sec   Loss 10.4119   LearningRate 0.0887   Epoch: 1   Global Step: 6640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:21,206-Speed 3466.18 samples/sec   Loss 10.5784   LearningRate 0.0886   Epoch: 1   Global Step: 6650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:24,164-Speed 3463.26 samples/sec   Loss 10.6671   LearningRate 0.0886   Epoch: 1   Global Step: 6660   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:08:27,122-Speed 3462.57 samples/sec   Loss 10.5522   LearningRate 0.0886   Epoch: 1   Global Step: 6670   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:08:30,079-Speed 3463.99 samples/sec   Loss 10.6308   LearningRate 0.0886   Epoch: 1   Global Step: 6680   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:08:33,039-Speed 3460.06 samples/sec   Loss 10.5427   LearningRate 0.0886   Epoch: 1   Global Step: 6690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:36,000-Speed 3458.94 samples/sec   Loss 10.6931   LearningRate 0.0886   Epoch: 1   Global Step: 6700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:38,960-Speed 3460.32 samples/sec   Loss 10.6240   LearningRate 0.0885   Epoch: 1   Global Step: 6710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:41,927-Speed 3451.85 samples/sec   Loss 10.4661   LearningRate 0.0885   Epoch: 1   Global Step: 6720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:44,887-Speed 3460.40 samples/sec   Loss 10.5663   LearningRate 0.0885   Epoch: 1   Global Step: 6730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:47,842-Speed 3466.53 samples/sec   Loss 10.5252   LearningRate 0.0885   Epoch: 1   Global Step: 6740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:50,799-Speed 3463.98 samples/sec   Loss 10.4807   LearningRate 0.0885   Epoch: 1   Global Step: 6750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:53,758-Speed 3462.15 samples/sec   Loss 10.7586   LearningRate 0.0885   Epoch: 1   Global Step: 6760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:56,715-Speed 3463.89 samples/sec   Loss 10.6169   LearningRate 0.0884   Epoch: 1   Global Step: 6770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:08:59,672-Speed 3463.34 samples/sec   Loss 10.5026   LearningRate 0.0884   Epoch: 1   Global Step: 6780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:09:02,620-Speed 3474.81 samples/sec   Loss 10.4474   LearningRate 0.0884   Epoch: 1   Global Step: 6790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:09:05,580-Speed 3459.46 samples/sec   Loss 10.5860   LearningRate 0.0884   Epoch: 1   Global Step: 6800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:09:08,539-Speed 3462.56 samples/sec   Loss 10.6524   LearningRate 0.0884   Epoch: 1   Global Step: 6810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:09:11,496-Speed 3463.26 samples/sec   Loss 10.5506   LearningRate 0.0884   Epoch: 1   Global Step: 6820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:09:14,452-Speed 3464.55 samples/sec   Loss 10.5429   LearningRate 0.0883   Epoch: 1   Global Step: 6830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:09:17,409-Speed 3464.60 samples/sec   Loss 10.5179   LearningRate 0.0883   Epoch: 1   Global Step: 6840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:09:20,365-Speed 3464.63 samples/sec   Loss 10.6633   LearningRate 0.0883   Epoch: 1   Global Step: 6850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:09:23,323-Speed 3462.92 samples/sec   Loss 10.5832   LearningRate 0.0883   Epoch: 1   Global Step: 6860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:09:26,302-Speed 3438.09 samples/sec   Loss 10.5510   LearningRate 0.0883   Epoch: 1   Global Step: 6870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:09:29,270-Speed 3451.38 samples/sec   Loss 10.6934   LearningRate 0.0883   Epoch: 1   Global Step: 6880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:09:32,229-Speed 3461.05 samples/sec   Loss 10.4751   LearningRate 0.0882   Epoch: 1   Global Step: 6890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:09:35,190-Speed 3458.35 samples/sec   Loss 10.5096   LearningRate 0.0882   Epoch: 1   Global Step: 6900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:09:38,152-Speed 3458.42 samples/sec   Loss 10.4647   LearningRate 0.0882   Epoch: 1   Global Step: 6910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:09:41,114-Speed 3458.29 samples/sec   Loss 10.5579   LearningRate 0.0882   Epoch: 1   Global Step: 6920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:09:44,076-Speed 3458.25 samples/sec   Loss 10.5310   LearningRate 0.0882   Epoch: 1   Global Step: 6930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:09:47,036-Speed 3460.68 samples/sec   Loss 10.4103   LearningRate 0.0882   Epoch: 1   Global Step: 6940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:09:50,015-Speed 3437.50 samples/sec   Loss 10.5441   LearningRate 0.0882   Epoch: 1   Global Step: 6950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:09:52,979-Speed 3456.24 samples/sec   Loss 10.5088   LearningRate 0.0881   Epoch: 1   Global Step: 6960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:09:55,943-Speed 3455.60 samples/sec   Loss 10.5790   LearningRate 0.0881   Epoch: 1   Global Step: 6970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:09:58,905-Speed 3457.48 samples/sec   Loss 10.5226   LearningRate 0.0881   Epoch: 1   Global Step: 6980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:10:01,853-Speed 3474.70 samples/sec   Loss 10.4920   LearningRate 0.0881   Epoch: 1   Global Step: 6990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:10:04,811-Speed 3461.97 samples/sec   Loss 10.4184   LearningRate 0.0881   Epoch: 1   Global Step: 7000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:10:07,804-Speed 3422.80 samples/sec   Loss 10.7787   LearningRate 0.0881   Epoch: 1   Global Step: 7010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:10:10,773-Speed 3449.03 samples/sec   Loss 10.4955   LearningRate 0.0880   Epoch: 1   Global Step: 7020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:10:13,743-Speed 3449.60 samples/sec   Loss 10.3526   LearningRate 0.0880   Epoch: 1   Global Step: 7030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:10:16,704-Speed 3458.66 samples/sec   Loss 10.5644   LearningRate 0.0880   Epoch: 1   Global Step: 7040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:10:19,663-Speed 3461.61 samples/sec   Loss 10.3965   LearningRate 0.0880   Epoch: 1   Global Step: 7050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:10:22,639-Speed 3442.62 samples/sec   Loss 10.4579   LearningRate 0.0880   Epoch: 1   Global Step: 7060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:10:25,599-Speed 3459.48 samples/sec   Loss 10.5719   LearningRate 0.0880   Epoch: 1   Global Step: 7070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:10:28,562-Speed 3457.20 samples/sec   Loss 10.5213   LearningRate 0.0879   Epoch: 1   Global Step: 7080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:10:31,525-Speed 3457.58 samples/sec   Loss 10.5082   LearningRate 0.0879   Epoch: 1   Global Step: 7090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:10:34,486-Speed 3458.27 samples/sec   Loss 10.2785   LearningRate 0.0879   Epoch: 1   Global Step: 7100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:10:37,449-Speed 3456.63 samples/sec   Loss 10.4097   LearningRate 0.0879   Epoch: 1   Global Step: 7110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:10:40,409-Speed 3460.09 samples/sec   Loss 10.4781   LearningRate 0.0879   Epoch: 1   Global Step: 7120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:10:43,374-Speed 3455.25 samples/sec   Loss 10.3461   LearningRate 0.0879   Epoch: 1   Global Step: 7130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:10:46,336-Speed 3458.31 samples/sec   Loss 10.5360   LearningRate 0.0878   Epoch: 1   Global Step: 7140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:10:49,302-Speed 3453.76 samples/sec   Loss 10.4254   LearningRate 0.0878   Epoch: 1   Global Step: 7150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:10:52,266-Speed 3455.65 samples/sec   Loss 10.3127   LearningRate 0.0878   Epoch: 1   Global Step: 7160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:10:55,232-Speed 3452.17 samples/sec   Loss 10.5698   LearningRate 0.0878   Epoch: 1   Global Step: 7170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:10:58,195-Speed 3457.52 samples/sec   Loss 10.4690   LearningRate 0.0878   Epoch: 1   Global Step: 7180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:11:01,175-Speed 3436.65 samples/sec   Loss 10.5089   LearningRate 0.0878   Epoch: 1   Global Step: 7190   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:11:04,150-Speed 3443.29 samples/sec   Loss 10.5650   LearningRate 0.0877   Epoch: 1   Global Step: 7200   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:11:07,113-Speed 3457.20 samples/sec   Loss 10.4711   LearningRate 0.0877   Epoch: 1   Global Step: 7210   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:11:10,087-Speed 3442.80 samples/sec   Loss 10.2449   LearningRate 0.0877   Epoch: 1   Global Step: 7220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:11:13,063-Speed 3443.31 samples/sec   Loss 10.4969   LearningRate 0.0877   Epoch: 1   Global Step: 7230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:11:16,077-Speed 3397.33 samples/sec   Loss 10.3287   LearningRate 0.0877   Epoch: 1   Global Step: 7240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:11:19,043-Speed 3454.44 samples/sec   Loss 10.3897   LearningRate 0.0877   Epoch: 1   Global Step: 7250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:11:22,006-Speed 3455.90 samples/sec   Loss 10.3020   LearningRate 0.0876   Epoch: 1   Global Step: 7260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:11:24,961-Speed 3466.53 samples/sec   Loss 10.3475   LearningRate 0.0876   Epoch: 1   Global Step: 7270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:11:27,924-Speed 3456.76 samples/sec   Loss 10.2909   LearningRate 0.0876   Epoch: 1   Global Step: 7280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:11:30,887-Speed 3456.20 samples/sec   Loss 10.2417   LearningRate 0.0876   Epoch: 1   Global Step: 7290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:11:33,853-Speed 3453.92 samples/sec   Loss 10.2756   LearningRate 0.0876   Epoch: 1   Global Step: 7300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:11:36,833-Speed 3437.44 samples/sec   Loss 10.2931   LearningRate 0.0876   Epoch: 1   Global Step: 7310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:11:39,803-Speed 3447.82 samples/sec   Loss 10.3350   LearningRate 0.0875   Epoch: 1   Global Step: 7320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:11:42,772-Speed 3450.61 samples/sec   Loss 10.2536   LearningRate 0.0875   Epoch: 1   Global Step: 7330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:11:45,734-Speed 3457.53 samples/sec   Loss 10.1401   LearningRate 0.0875   Epoch: 1   Global Step: 7340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:11:48,715-Speed 3435.77 samples/sec   Loss 10.3934   LearningRate 0.0875   Epoch: 1   Global Step: 7350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:11:51,702-Speed 3429.31 samples/sec   Loss 10.2870   LearningRate 0.0875   Epoch: 1   Global Step: 7360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:11:54,683-Speed 3435.90 samples/sec   Loss 10.2650   LearningRate 0.0875   Epoch: 1   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:11:57,655-Speed 3445.25 samples/sec   Loss 10.1990   LearningRate 0.0874   Epoch: 1   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:00,635-Speed 3438.39 samples/sec   Loss 10.4105   LearningRate 0.0874   Epoch: 1   Global Step: 7390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:03,605-Speed 3447.69 samples/sec   Loss 10.1475   LearningRate 0.0874   Epoch: 1   Global Step: 7400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:06,575-Speed 3449.53 samples/sec   Loss 10.1233   LearningRate 0.0874   Epoch: 1   Global Step: 7410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:09,543-Speed 3450.51 samples/sec   Loss 10.3261   LearningRate 0.0874   Epoch: 1   Global Step: 7420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:12,510-Speed 3452.01 samples/sec   Loss 10.2554   LearningRate 0.0874   Epoch: 1   Global Step: 7430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:15,477-Speed 3452.37 samples/sec   Loss 10.1451   LearningRate 0.0873   Epoch: 1   Global Step: 7440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:18,448-Speed 3446.77 samples/sec   Loss 10.2505   LearningRate 0.0873   Epoch: 1   Global Step: 7450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:21,414-Speed 3454.16 samples/sec   Loss 10.1401   LearningRate 0.0873   Epoch: 1   Global Step: 7460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:24,398-Speed 3432.21 samples/sec   Loss 10.2269   LearningRate 0.0873   Epoch: 1   Global Step: 7470   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:12:27,368-Speed 3448.71 samples/sec   Loss 10.3761   LearningRate 0.0873   Epoch: 1   Global Step: 7480   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:12:30,341-Speed 3445.41 samples/sec   Loss 10.2290   LearningRate 0.0873   Epoch: 1   Global Step: 7490   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:12:33,314-Speed 3444.85 samples/sec   Loss 10.2777   LearningRate 0.0872   Epoch: 1   Global Step: 7500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:36,280-Speed 3453.88 samples/sec   Loss 9.8846   LearningRate 0.0872   Epoch: 1   Global Step: 7510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:39,260-Speed 3437.64 samples/sec   Loss 10.1869   LearningRate 0.0872   Epoch: 1   Global Step: 7520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:42,240-Speed 3437.14 samples/sec   Loss 10.0918   LearningRate 0.0872   Epoch: 1   Global Step: 7530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:45,207-Speed 3451.92 samples/sec   Loss 10.2810   LearningRate 0.0872   Epoch: 1   Global Step: 7540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:48,186-Speed 3437.89 samples/sec   Loss 10.1473   LearningRate 0.0872   Epoch: 1   Global Step: 7550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:51,157-Speed 3448.05 samples/sec   Loss 10.1726   LearningRate 0.0871   Epoch: 1   Global Step: 7560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:54,127-Speed 3448.57 samples/sec   Loss 10.2842   LearningRate 0.0871   Epoch: 1   Global Step: 7570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:12:57,100-Speed 3445.19 samples/sec   Loss 10.3577   LearningRate 0.0871   Epoch: 1   Global Step: 7580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:13:00,079-Speed 3438.29 samples/sec   Loss 10.2854   LearningRate 0.0871   Epoch: 1   Global Step: 7590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:13:03,052-Speed 3445.35 samples/sec   Loss 10.2659   LearningRate 0.0871   Epoch: 1   Global Step: 7600   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:13:06,026-Speed 3443.29 samples/sec   Loss 10.2474   LearningRate 0.0871   Epoch: 1   Global Step: 7610   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:13:09,000-Speed 3444.07 samples/sec   Loss 10.3385   LearningRate 0.0870   Epoch: 1   Global Step: 7620   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:13:11,962-Speed 3457.33 samples/sec   Loss 10.2175   LearningRate 0.0870   Epoch: 1   Global Step: 7630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:13:14,938-Speed 3443.09 samples/sec   Loss 10.0786   LearningRate 0.0870   Epoch: 1   Global Step: 7640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:13:17,914-Speed 3441.02 samples/sec   Loss 10.1349   LearningRate 0.0870   Epoch: 1   Global Step: 7650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:13:20,883-Speed 3449.72 samples/sec   Loss 10.2065   LearningRate 0.0870   Epoch: 1   Global Step: 7660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:13:23,859-Speed 3442.01 samples/sec   Loss 10.0868   LearningRate 0.0870   Epoch: 1   Global Step: 7670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:13:26,843-Speed 3432.11 samples/sec   Loss 10.1743   LearningRate 0.0869   Epoch: 1   Global Step: 7680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:13:29,817-Speed 3444.53 samples/sec   Loss 10.0673   LearningRate 0.0869   Epoch: 1   Global Step: 7690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:13:32,787-Speed 3448.00 samples/sec   Loss 10.3431   LearningRate 0.0869   Epoch: 1   Global Step: 7700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:13:35,756-Speed 3449.56 samples/sec   Loss 10.2412   LearningRate 0.0869   Epoch: 1   Global Step: 7710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:13:38,725-Speed 3450.18 samples/sec   Loss 10.1996   LearningRate 0.0869   Epoch: 1   Global Step: 7720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:13:41,709-Speed 3432.37 samples/sec   Loss 10.1636   LearningRate 0.0869   Epoch: 1   Global Step: 7730   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:13:44,683-Speed 3444.89 samples/sec   Loss 10.1298   LearningRate 0.0869   Epoch: 1   Global Step: 7740   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:13:47,656-Speed 3444.21 samples/sec   Loss 10.0596   LearningRate 0.0868   Epoch: 1   Global Step: 7750   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:13:50,648-Speed 3423.60 samples/sec   Loss 10.2902   LearningRate 0.0868   Epoch: 1   Global Step: 7760   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:13:53,662-Speed 3397.89 samples/sec   Loss 10.1462   LearningRate 0.0868   Epoch: 1   Global Step: 7770   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:13:56,623-Speed 3460.20 samples/sec   Loss 10.0607   LearningRate 0.0868   Epoch: 1   Global Step: 7780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:13:59,597-Speed 3442.88 samples/sec   Loss 10.2453   LearningRate 0.0868   Epoch: 1   Global Step: 7790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:14:02,568-Speed 3447.74 samples/sec   Loss 10.1503   LearningRate 0.0868   Epoch: 1   Global Step: 7800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:14:05,540-Speed 3446.35 samples/sec   Loss 9.9707   LearningRate 0.0867   Epoch: 1   Global Step: 7810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:14:08,511-Speed 3447.09 samples/sec   Loss 10.1844   LearningRate 0.0867   Epoch: 1   Global Step: 7820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:14:11,491-Speed 3437.78 samples/sec   Loss 10.2345   LearningRate 0.0867   Epoch: 1   Global Step: 7830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:14:14,460-Speed 3450.30 samples/sec   Loss 10.2028   LearningRate 0.0867   Epoch: 1   Global Step: 7840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:14:17,455-Speed 3419.56 samples/sec   Loss 10.1004   LearningRate 0.0867   Epoch: 1   Global Step: 7850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:14:20,432-Speed 3440.73 samples/sec   Loss 9.9670   LearningRate 0.0867   Epoch: 1   Global Step: 7860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:14:23,420-Speed 3427.24 samples/sec   Loss 9.9192   LearningRate 0.0866   Epoch: 1   Global Step: 7870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:14:26,391-Speed 3447.34 samples/sec   Loss 10.3640   LearningRate 0.0866   Epoch: 1   Global Step: 7880   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:14:29,342-Speed 3471.26 samples/sec   Loss 10.1644   LearningRate 0.0866   Epoch: 1   Global Step: 7890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:14:32,312-Speed 3448.22 samples/sec   Loss 10.1325   LearningRate 0.0866   Epoch: 1   Global Step: 7900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:14:35,284-Speed 3446.57 samples/sec   Loss 10.1559   LearningRate 0.0866   Epoch: 1   Global Step: 7910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:14:38,260-Speed 3441.42 samples/sec   Loss 10.1023   LearningRate 0.0866   Epoch: 1   Global Step: 7920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:14:41,246-Speed 3430.25 samples/sec   Loss 10.2757   LearningRate 0.0865   Epoch: 1   Global Step: 7930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:14:44,221-Speed 3443.27 samples/sec   Loss 10.0308   LearningRate 0.0865   Epoch: 1   Global Step: 7940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:14:47,189-Speed 3450.54 samples/sec   Loss 9.9515   LearningRate 0.0865   Epoch: 1   Global Step: 7950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:14:50,172-Speed 3434.03 samples/sec   Loss 10.0592   LearningRate 0.0865   Epoch: 1   Global Step: 7960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:14:53,139-Speed 3452.27 samples/sec   Loss 10.0240   LearningRate 0.0865   Epoch: 1   Global Step: 7970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:14:56,108-Speed 3448.93 samples/sec   Loss 10.0941   LearningRate 0.0865   Epoch: 1   Global Step: 7980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:14:59,078-Speed 3449.26 samples/sec   Loss 10.0679   LearningRate 0.0864   Epoch: 1   Global Step: 7990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:15:02,071-Speed 3421.45 samples/sec   Loss 10.0461   LearningRate 0.0864   Epoch: 1   Global Step: 8000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:15:45,403-[lfw][8000]XNorm: 22.481622
Training: 2022-04-27 02:15:45,404-[lfw][8000]Accuracy-Flip: 0.99300+-0.00379
Training: 2022-04-27 02:15:45,404-[lfw][8000]Accuracy-Highest: 0.99367
Training: 2022-04-27 02:16:35,626-[cfp_fp][8000]XNorm: 19.511639
Training: 2022-04-27 02:16:35,626-[cfp_fp][8000]Accuracy-Flip: 0.91271+-0.01837
Training: 2022-04-27 02:16:35,627-[cfp_fp][8000]Accuracy-Highest: 0.91271
Training: 2022-04-27 02:17:19,001-[agedb_30][8000]XNorm: 22.350999
Training: 2022-04-27 02:17:19,002-[agedb_30][8000]Accuracy-Flip: 0.95733+-0.00964
Training: 2022-04-27 02:17:19,002-[agedb_30][8000]Accuracy-Highest: 0.95733
Training: 2022-04-27 02:17:21,961-Speed 73.20 samples/sec   Loss 10.0692   LearningRate 0.0864   Epoch: 1   Global Step: 8010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:17:24,934-Speed 3444.50 samples/sec   Loss 10.0313   LearningRate 0.0864   Epoch: 1   Global Step: 8020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:17:27,897-Speed 3456.84 samples/sec   Loss 9.8848   LearningRate 0.0864   Epoch: 1   Global Step: 8030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:17:30,869-Speed 3446.81 samples/sec   Loss 10.2110   LearningRate 0.0864   Epoch: 1   Global Step: 8040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:17:33,865-Speed 3418.43 samples/sec   Loss 9.9268   LearningRate 0.0863   Epoch: 1   Global Step: 8050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:17:36,835-Speed 3449.06 samples/sec   Loss 10.1880   LearningRate 0.0863   Epoch: 1   Global Step: 8060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:17:39,802-Speed 3451.63 samples/sec   Loss 10.0784   LearningRate 0.0863   Epoch: 1   Global Step: 8070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:17:42,770-Speed 3451.64 samples/sec   Loss 10.0917   LearningRate 0.0863   Epoch: 1   Global Step: 8080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:17:45,740-Speed 3448.21 samples/sec   Loss 9.8892   LearningRate 0.0863   Epoch: 1   Global Step: 8090   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:17:48,708-Speed 3450.98 samples/sec   Loss 9.9809   LearningRate 0.0863   Epoch: 1   Global Step: 8100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:17:51,675-Speed 3452.15 samples/sec   Loss 10.0077   LearningRate 0.0862   Epoch: 1   Global Step: 8110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:17:54,648-Speed 3445.22 samples/sec   Loss 10.1764   LearningRate 0.0862   Epoch: 1   Global Step: 8120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:17:57,629-Speed 3435.36 samples/sec   Loss 10.1115   LearningRate 0.0862   Epoch: 1   Global Step: 8130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:18:00,595-Speed 3453.54 samples/sec   Loss 10.0225   LearningRate 0.0862   Epoch: 1   Global Step: 8140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:18:03,566-Speed 3447.93 samples/sec   Loss 9.9209   LearningRate 0.0862   Epoch: 1   Global Step: 8150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:18:06,533-Speed 3452.22 samples/sec   Loss 9.9388   LearningRate 0.0862   Epoch: 1   Global Step: 8160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:18:09,497-Speed 3454.68 samples/sec   Loss 10.0434   LearningRate 0.0861   Epoch: 1   Global Step: 8170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:18:12,487-Speed 3425.36 samples/sec   Loss 10.1321   LearningRate 0.0861   Epoch: 1   Global Step: 8180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:18:15,465-Speed 3439.37 samples/sec   Loss 10.0065   LearningRate 0.0861   Epoch: 1   Global Step: 8190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:18:18,432-Speed 3452.18 samples/sec   Loss 10.0002   LearningRate 0.0861   Epoch: 1   Global Step: 8200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:18:21,396-Speed 3456.40 samples/sec   Loss 9.9577   LearningRate 0.0861   Epoch: 1   Global Step: 8210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:18:24,361-Speed 3454.81 samples/sec   Loss 9.9631   LearningRate 0.0861   Epoch: 1   Global Step: 8220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:18:27,324-Speed 3456.03 samples/sec   Loss 9.9240   LearningRate 0.0860   Epoch: 1   Global Step: 8230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:18:30,286-Speed 3458.44 samples/sec   Loss 9.9797   LearningRate 0.0860   Epoch: 1   Global Step: 8240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:18:33,248-Speed 3457.51 samples/sec   Loss 9.7905   LearningRate 0.0860   Epoch: 1   Global Step: 8250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:18:36,215-Speed 3451.76 samples/sec   Loss 9.9893   LearningRate 0.0860   Epoch: 1   Global Step: 8260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:18:39,179-Speed 3455.71 samples/sec   Loss 9.7925   LearningRate 0.0860   Epoch: 1   Global Step: 8270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:18:42,149-Speed 3449.45 samples/sec   Loss 9.9353   LearningRate 0.0860   Epoch: 1   Global Step: 8280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:18:45,113-Speed 3455.11 samples/sec   Loss 9.8447   LearningRate 0.0860   Epoch: 1   Global Step: 8290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:18:48,075-Speed 3457.90 samples/sec   Loss 9.8963   LearningRate 0.0859   Epoch: 1   Global Step: 8300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:18:51,036-Speed 3459.71 samples/sec   Loss 9.6828   LearningRate 0.0859   Epoch: 1   Global Step: 8310   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:18:53,982-Speed 3476.16 samples/sec   Loss 9.8360   LearningRate 0.0859   Epoch: 1   Global Step: 8320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:18:56,938-Speed 3464.58 samples/sec   Loss 9.7089   LearningRate 0.0859   Epoch: 1   Global Step: 8330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:18:59,898-Speed 3460.64 samples/sec   Loss 9.9158   LearningRate 0.0859   Epoch: 1   Global Step: 8340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:19:02,859-Speed 3459.56 samples/sec   Loss 9.9601   LearningRate 0.0859   Epoch: 1   Global Step: 8350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:19:05,825-Speed 3452.96 samples/sec   Loss 10.0533   LearningRate 0.0858   Epoch: 1   Global Step: 8360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:19:08,786-Speed 3459.10 samples/sec   Loss 9.9732   LearningRate 0.0858   Epoch: 1   Global Step: 8370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:19:11,746-Speed 3460.14 samples/sec   Loss 9.8339   LearningRate 0.0858   Epoch: 1   Global Step: 8380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:19:14,699-Speed 3468.89 samples/sec   Loss 9.8662   LearningRate 0.0858   Epoch: 1   Global Step: 8390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:19:17,667-Speed 3451.05 samples/sec   Loss 9.9053   LearningRate 0.0858   Epoch: 1   Global Step: 8400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:19:20,627-Speed 3460.32 samples/sec   Loss 9.9754   LearningRate 0.0858   Epoch: 1   Global Step: 8410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:19:23,609-Speed 3434.87 samples/sec   Loss 10.0028   LearningRate 0.0857   Epoch: 1   Global Step: 8420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:19:26,578-Speed 3449.23 samples/sec   Loss 10.0171   LearningRate 0.0857   Epoch: 1   Global Step: 8430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:19:29,556-Speed 3439.83 samples/sec   Loss 9.7729   LearningRate 0.0857   Epoch: 1   Global Step: 8440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:19:32,523-Speed 3452.03 samples/sec   Loss 9.7875   LearningRate 0.0857   Epoch: 1   Global Step: 8450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:19:35,485-Speed 3457.70 samples/sec   Loss 9.9340   LearningRate 0.0857   Epoch: 1   Global Step: 8460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:19:38,445-Speed 3460.30 samples/sec   Loss 9.9952   LearningRate 0.0857   Epoch: 1   Global Step: 8470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:19:41,402-Speed 3464.37 samples/sec   Loss 9.8700   LearningRate 0.0856   Epoch: 1   Global Step: 8480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:19:44,361-Speed 3461.64 samples/sec   Loss 9.8802   LearningRate 0.0856   Epoch: 1   Global Step: 8490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:19:47,320-Speed 3461.93 samples/sec   Loss 9.8040   LearningRate 0.0856   Epoch: 1   Global Step: 8500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:19:50,280-Speed 3460.30 samples/sec   Loss 9.9300   LearningRate 0.0856   Epoch: 1   Global Step: 8510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:19:53,241-Speed 3458.73 samples/sec   Loss 9.7443   LearningRate 0.0856   Epoch: 1   Global Step: 8520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:19:56,199-Speed 3462.98 samples/sec   Loss 9.8061   LearningRate 0.0856   Epoch: 1   Global Step: 8530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:19:59,158-Speed 3461.27 samples/sec   Loss 9.8740   LearningRate 0.0855   Epoch: 1   Global Step: 8540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:02,119-Speed 3458.93 samples/sec   Loss 9.6585   LearningRate 0.0855   Epoch: 1   Global Step: 8550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:05,087-Speed 3450.25 samples/sec   Loss 9.8344   LearningRate 0.0855   Epoch: 1   Global Step: 8560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:08,045-Speed 3462.75 samples/sec   Loss 9.7535   LearningRate 0.0855   Epoch: 1   Global Step: 8570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:11,003-Speed 3463.28 samples/sec   Loss 9.9025   LearningRate 0.0855   Epoch: 1   Global Step: 8580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:13,961-Speed 3462.24 samples/sec   Loss 9.6444   LearningRate 0.0855   Epoch: 1   Global Step: 8590   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:20:16,943-Speed 3435.29 samples/sec   Loss 9.8880   LearningRate 0.0854   Epoch: 1   Global Step: 8600   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:20:19,891-Speed 3474.36 samples/sec   Loss 10.0078   LearningRate 0.0854   Epoch: 1   Global Step: 8610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:22,849-Speed 3462.38 samples/sec   Loss 9.8217   LearningRate 0.0854   Epoch: 1   Global Step: 8620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:25,810-Speed 3458.92 samples/sec   Loss 9.7415   LearningRate 0.0854   Epoch: 1   Global Step: 8630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:28,776-Speed 3453.35 samples/sec   Loss 9.9836   LearningRate 0.0854   Epoch: 1   Global Step: 8640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:31,741-Speed 3454.02 samples/sec   Loss 9.8748   LearningRate 0.0854   Epoch: 1   Global Step: 8650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:34,702-Speed 3459.64 samples/sec   Loss 9.6805   LearningRate 0.0853   Epoch: 1   Global Step: 8660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:37,664-Speed 3457.66 samples/sec   Loss 9.7511   LearningRate 0.0853   Epoch: 1   Global Step: 8670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:40,623-Speed 3461.58 samples/sec   Loss 9.7492   LearningRate 0.0853   Epoch: 1   Global Step: 8680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:43,593-Speed 3449.45 samples/sec   Loss 9.6790   LearningRate 0.0853   Epoch: 1   Global Step: 8690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:46,556-Speed 3456.19 samples/sec   Loss 9.6578   LearningRate 0.0853   Epoch: 1   Global Step: 8700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:20:49,517-Speed 3458.96 samples/sec   Loss 9.6822   LearningRate 0.0853   Epoch: 1   Global Step: 8710   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:20:52,481-Speed 3456.12 samples/sec   Loss 9.8248   LearningRate 0.0853   Epoch: 1   Global Step: 8720   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:20:55,444-Speed 3456.18 samples/sec   Loss 9.7289   LearningRate 0.0852   Epoch: 1   Global Step: 8730   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:20:58,419-Speed 3443.19 samples/sec   Loss 9.8617   LearningRate 0.0852   Epoch: 1   Global Step: 8740   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:21:01,386-Speed 3452.44 samples/sec   Loss 9.7760   LearningRate 0.0852   Epoch: 1   Global Step: 8750   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:21:04,364-Speed 3438.94 samples/sec   Loss 9.6532   LearningRate 0.0852   Epoch: 1   Global Step: 8760   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:21:07,328-Speed 3456.22 samples/sec   Loss 9.7384   LearningRate 0.0852   Epoch: 1   Global Step: 8770   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:21:10,289-Speed 3458.93 samples/sec   Loss 9.7544   LearningRate 0.0852   Epoch: 1   Global Step: 8780   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:21:13,258-Speed 3449.92 samples/sec   Loss 9.7269   LearningRate 0.0851   Epoch: 1   Global Step: 8790   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:21:16,220-Speed 3457.62 samples/sec   Loss 9.7420   LearningRate 0.0851   Epoch: 1   Global Step: 8800   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:21:19,170-Speed 3471.56 samples/sec   Loss 9.6505   LearningRate 0.0851   Epoch: 1   Global Step: 8810   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:21:22,134-Speed 3455.84 samples/sec   Loss 9.7786   LearningRate 0.0851   Epoch: 1   Global Step: 8820   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-27 02:21:25,094-Speed 3459.66 samples/sec   Loss 9.7086   LearningRate 0.0851   Epoch: 1   Global Step: 8830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:21:28,063-Speed 3451.33 samples/sec   Loss 9.6077   LearningRate 0.0851   Epoch: 1   Global Step: 8840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:21:31,027-Speed 3454.93 samples/sec   Loss 9.6119   LearningRate 0.0850   Epoch: 1   Global Step: 8850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:21:33,992-Speed 3454.67 samples/sec   Loss 9.5182   LearningRate 0.0850   Epoch: 1   Global Step: 8860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:21:36,958-Speed 3453.51 samples/sec   Loss 9.6045   LearningRate 0.0850   Epoch: 1   Global Step: 8870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:21:39,923-Speed 3454.64 samples/sec   Loss 9.6888   LearningRate 0.0850   Epoch: 1   Global Step: 8880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:21:42,889-Speed 3452.86 samples/sec   Loss 9.4488   LearningRate 0.0850   Epoch: 1   Global Step: 8890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:21:45,854-Speed 3454.92 samples/sec   Loss 9.5248   LearningRate 0.0850   Epoch: 1   Global Step: 8900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:21:48,820-Speed 3452.55 samples/sec   Loss 9.6814   LearningRate 0.0849   Epoch: 1   Global Step: 8910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:21:51,792-Speed 3446.48 samples/sec   Loss 9.6512   LearningRate 0.0849   Epoch: 1   Global Step: 8920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:21:54,763-Speed 3447.31 samples/sec   Loss 9.5142   LearningRate 0.0849   Epoch: 1   Global Step: 8930   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:21:57,729-Speed 3454.16 samples/sec   Loss 9.6403   LearningRate 0.0849   Epoch: 1   Global Step: 8940   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:22:00,708-Speed 3438.34 samples/sec   Loss 9.5796   LearningRate 0.0849   Epoch: 1   Global Step: 8950   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:22:03,678-Speed 3448.25 samples/sec   Loss 9.6602   LearningRate 0.0849   Epoch: 1   Global Step: 8960   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:22:06,650-Speed 3445.50 samples/sec   Loss 9.7572   LearningRate 0.0848   Epoch: 1   Global Step: 8970   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:22:09,606-Speed 3465.01 samples/sec   Loss 9.6384   LearningRate 0.0848   Epoch: 1   Global Step: 8980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:22:12,574-Speed 3451.51 samples/sec   Loss 9.5106   LearningRate 0.0848   Epoch: 1   Global Step: 8990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:22:15,584-Speed 3402.47 samples/sec   Loss 9.6941   LearningRate 0.0848   Epoch: 1   Global Step: 9000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:22:18,553-Speed 3450.45 samples/sec   Loss 9.5933   LearningRate 0.0848   Epoch: 1   Global Step: 9010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:22:21,517-Speed 3455.04 samples/sec   Loss 9.6645   LearningRate 0.0848   Epoch: 1   Global Step: 9020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:22:24,490-Speed 3445.04 samples/sec   Loss 9.6580   LearningRate 0.0847   Epoch: 1   Global Step: 9030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:22:27,459-Speed 3450.78 samples/sec   Loss 9.7104   LearningRate 0.0847   Epoch: 1   Global Step: 9040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:22:30,425-Speed 3453.70 samples/sec   Loss 9.6876   LearningRate 0.0847   Epoch: 1   Global Step: 9050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:22:33,389-Speed 3454.65 samples/sec   Loss 9.5230   LearningRate 0.0847   Epoch: 1   Global Step: 9060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:22:36,359-Speed 3448.65 samples/sec   Loss 9.8033   LearningRate 0.0847   Epoch: 1   Global Step: 9070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:22:39,328-Speed 3449.65 samples/sec   Loss 9.6954   LearningRate 0.0847   Epoch: 1   Global Step: 9080   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:22:42,313-Speed 3431.39 samples/sec   Loss 9.7620   LearningRate 0.0847   Epoch: 1   Global Step: 9090   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:22:45,269-Speed 3464.92 samples/sec   Loss 9.6830   LearningRate 0.0846   Epoch: 1   Global Step: 9100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:22:48,241-Speed 3446.64 samples/sec   Loss 9.7062   LearningRate 0.0846   Epoch: 1   Global Step: 9110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:22:51,209-Speed 3450.56 samples/sec   Loss 9.6835   LearningRate 0.0846   Epoch: 1   Global Step: 9120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:22:54,190-Speed 3436.51 samples/sec   Loss 9.5929   LearningRate 0.0846   Epoch: 1   Global Step: 9130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:22:57,172-Speed 3434.94 samples/sec   Loss 9.7697   LearningRate 0.0846   Epoch: 1   Global Step: 9140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:23:00,144-Speed 3445.97 samples/sec   Loss 9.7771   LearningRate 0.0846   Epoch: 1   Global Step: 9150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:23:03,111-Speed 3452.67 samples/sec   Loss 9.6915   LearningRate 0.0845   Epoch: 1   Global Step: 9160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:23:06,078-Speed 3451.81 samples/sec   Loss 9.7020   LearningRate 0.0845   Epoch: 1   Global Step: 9170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:23:09,030-Speed 3469.30 samples/sec   Loss 9.5135   LearningRate 0.0845   Epoch: 1   Global Step: 9180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:23:12,000-Speed 3448.17 samples/sec   Loss 9.5059   LearningRate 0.0845   Epoch: 1   Global Step: 9190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:23:14,975-Speed 3443.37 samples/sec   Loss 9.6408   LearningRate 0.0845   Epoch: 1   Global Step: 9200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:23:17,944-Speed 3449.95 samples/sec   Loss 9.5421   LearningRate 0.0845   Epoch: 1   Global Step: 9210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:23:20,915-Speed 3447.86 samples/sec   Loss 9.6718   LearningRate 0.0844   Epoch: 1   Global Step: 9220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:23:23,894-Speed 3437.56 samples/sec   Loss 9.5909   LearningRate 0.0844   Epoch: 1   Global Step: 9230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:23:26,874-Speed 3436.81 samples/sec   Loss 9.4600   LearningRate 0.0844   Epoch: 1   Global Step: 9240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:23:29,838-Speed 3455.43 samples/sec   Loss 9.5935   LearningRate 0.0844   Epoch: 1   Global Step: 9250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:23:32,803-Speed 3454.99 samples/sec   Loss 9.5456   LearningRate 0.0844   Epoch: 1   Global Step: 9260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:23:35,768-Speed 3454.27 samples/sec   Loss 9.6474   LearningRate 0.0844   Epoch: 1   Global Step: 9270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:23:38,748-Speed 3436.56 samples/sec   Loss 9.5867   LearningRate 0.0843   Epoch: 1   Global Step: 9280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:23:41,720-Speed 3446.94 samples/sec   Loss 9.6007   LearningRate 0.0843   Epoch: 1   Global Step: 9290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:23:44,684-Speed 3454.73 samples/sec   Loss 9.5647   LearningRate 0.0843   Epoch: 1   Global Step: 9300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:23:47,654-Speed 3450.10 samples/sec   Loss 9.5831   LearningRate 0.0843   Epoch: 1   Global Step: 9310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:23:50,663-Speed 3403.04 samples/sec   Loss 9.7703   LearningRate 0.0843   Epoch: 1   Global Step: 9320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:23:53,636-Speed 3445.52 samples/sec   Loss 9.6329   LearningRate 0.0843   Epoch: 1   Global Step: 9330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:23:56,602-Speed 3453.20 samples/sec   Loss 9.5410   LearningRate 0.0842   Epoch: 1   Global Step: 9340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:23:59,572-Speed 3448.57 samples/sec   Loss 9.6053   LearningRate 0.0842   Epoch: 1   Global Step: 9350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:24:02,544-Speed 3446.28 samples/sec   Loss 9.4859   LearningRate 0.0842   Epoch: 1   Global Step: 9360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:24:05,512-Speed 3450.61 samples/sec   Loss 9.5707   LearningRate 0.0842   Epoch: 1   Global Step: 9370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:24:08,480-Speed 3451.16 samples/sec   Loss 9.5174   LearningRate 0.0842   Epoch: 1   Global Step: 9380   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:24:11,436-Speed 3464.92 samples/sec   Loss 9.5813   LearningRate 0.0842   Epoch: 1   Global Step: 9390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:24:14,431-Speed 3420.09 samples/sec   Loss 9.5239   LearningRate 0.0842   Epoch: 1   Global Step: 9400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:24:17,415-Speed 3433.83 samples/sec   Loss 9.6059   LearningRate 0.0841   Epoch: 1   Global Step: 9410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:24:20,390-Speed 3442.67 samples/sec   Loss 9.5937   LearningRate 0.0841   Epoch: 1   Global Step: 9420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:24:23,363-Speed 3445.09 samples/sec   Loss 9.5035   LearningRate 0.0841   Epoch: 1   Global Step: 9430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:24:26,333-Speed 3448.70 samples/sec   Loss 9.5874   LearningRate 0.0841   Epoch: 1   Global Step: 9440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:24:29,299-Speed 3452.61 samples/sec   Loss 9.4835   LearningRate 0.0841   Epoch: 1   Global Step: 9450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:24:32,265-Speed 3453.65 samples/sec   Loss 9.5529   LearningRate 0.0841   Epoch: 1   Global Step: 9460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:24:35,257-Speed 3423.01 samples/sec   Loss 9.5599   LearningRate 0.0840   Epoch: 1   Global Step: 9470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:24:38,230-Speed 3445.62 samples/sec   Loss 9.5257   LearningRate 0.0840   Epoch: 1   Global Step: 9480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:24:41,200-Speed 3448.23 samples/sec   Loss 9.6115   LearningRate 0.0840   Epoch: 1   Global Step: 9490   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:24:44,168-Speed 3451.24 samples/sec   Loss 9.5972   LearningRate 0.0840   Epoch: 1   Global Step: 9500   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:24:47,136-Speed 3451.35 samples/sec   Loss 9.5231   LearningRate 0.0840   Epoch: 1   Global Step: 9510   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:24:50,102-Speed 3453.66 samples/sec   Loss 9.4963   LearningRate 0.0840   Epoch: 1   Global Step: 9520   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:24:53,072-Speed 3448.25 samples/sec   Loss 9.4817   LearningRate 0.0839   Epoch: 1   Global Step: 9530   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:24:56,030-Speed 3462.42 samples/sec   Loss 9.5215   LearningRate 0.0839   Epoch: 1   Global Step: 9540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:24:58,999-Speed 3449.75 samples/sec   Loss 9.5892   LearningRate 0.0839   Epoch: 1   Global Step: 9550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:01,974-Speed 3442.89 samples/sec   Loss 9.4250   LearningRate 0.0839   Epoch: 1   Global Step: 9560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:04,981-Speed 3406.53 samples/sec   Loss 9.4262   LearningRate 0.0839   Epoch: 1   Global Step: 9570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:07,956-Speed 3442.73 samples/sec   Loss 9.6550   LearningRate 0.0839   Epoch: 1   Global Step: 9580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:10,928-Speed 3447.01 samples/sec   Loss 9.4395   LearningRate 0.0838   Epoch: 1   Global Step: 9590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:13,895-Speed 3451.29 samples/sec   Loss 9.5077   LearningRate 0.0838   Epoch: 1   Global Step: 9600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:16,865-Speed 3449.06 samples/sec   Loss 9.4784   LearningRate 0.0838   Epoch: 1   Global Step: 9610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:19,836-Speed 3447.48 samples/sec   Loss 9.4517   LearningRate 0.0838   Epoch: 1   Global Step: 9620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:22,841-Speed 3407.98 samples/sec   Loss 9.5163   LearningRate 0.0838   Epoch: 1   Global Step: 9630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:25,831-Speed 3425.47 samples/sec   Loss 9.6028   LearningRate 0.0838   Epoch: 1   Global Step: 9640   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:25:28,816-Speed 3431.40 samples/sec   Loss 9.5215   LearningRate 0.0837   Epoch: 1   Global Step: 9650   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:25:31,789-Speed 3445.30 samples/sec   Loss 9.6140   LearningRate 0.0837   Epoch: 1   Global Step: 9660   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:25:34,765-Speed 3441.87 samples/sec   Loss 9.4768   LearningRate 0.0837   Epoch: 1   Global Step: 9670   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:25:37,732-Speed 3451.76 samples/sec   Loss 9.2860   LearningRate 0.0837   Epoch: 1   Global Step: 9680   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:25:40,728-Speed 3418.59 samples/sec   Loss 9.6005   LearningRate 0.0837   Epoch: 1   Global Step: 9690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:43,698-Speed 3449.20 samples/sec   Loss 9.5165   LearningRate 0.0837   Epoch: 1   Global Step: 9700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:46,685-Speed 3429.39 samples/sec   Loss 9.4554   LearningRate 0.0837   Epoch: 1   Global Step: 9710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:49,665-Speed 3436.84 samples/sec   Loss 9.4322   LearningRate 0.0836   Epoch: 1   Global Step: 9720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:52,636-Speed 3447.58 samples/sec   Loss 9.5387   LearningRate 0.0836   Epoch: 1   Global Step: 9730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:55,606-Speed 3448.21 samples/sec   Loss 9.3915   LearningRate 0.0836   Epoch: 1   Global Step: 9740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:25:58,576-Speed 3448.31 samples/sec   Loss 9.3292   LearningRate 0.0836   Epoch: 1   Global Step: 9750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:01,554-Speed 3439.69 samples/sec   Loss 9.3896   LearningRate 0.0836   Epoch: 1   Global Step: 9760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:04,522-Speed 3450.56 samples/sec   Loss 9.4528   LearningRate 0.0836   Epoch: 1   Global Step: 9770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:07,493-Speed 3447.76 samples/sec   Loss 9.4742   LearningRate 0.0835   Epoch: 1   Global Step: 9780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:10,462-Speed 3450.34 samples/sec   Loss 9.2487   LearningRate 0.0835   Epoch: 1   Global Step: 9790   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:26:13,441-Speed 3437.30 samples/sec   Loss 9.3288   LearningRate 0.0835   Epoch: 1   Global Step: 9800   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:26:16,413-Speed 3446.85 samples/sec   Loss 9.1837   LearningRate 0.0835   Epoch: 1   Global Step: 9810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:19,383-Speed 3448.61 samples/sec   Loss 9.3296   LearningRate 0.0835   Epoch: 1   Global Step: 9820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:22,358-Speed 3442.09 samples/sec   Loss 9.3860   LearningRate 0.0835   Epoch: 1   Global Step: 9830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:25,333-Speed 3443.87 samples/sec   Loss 9.5475   LearningRate 0.0834   Epoch: 1   Global Step: 9840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:28,313-Speed 3436.31 samples/sec   Loss 9.3958   LearningRate 0.0834   Epoch: 1   Global Step: 9850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:31,291-Speed 3440.16 samples/sec   Loss 9.5296   LearningRate 0.0834   Epoch: 1   Global Step: 9860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:34,267-Speed 3441.34 samples/sec   Loss 9.2516   LearningRate 0.0834   Epoch: 1   Global Step: 9870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:37,247-Speed 3437.47 samples/sec   Loss 9.1116   LearningRate 0.0834   Epoch: 1   Global Step: 9880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:40,235-Speed 3427.59 samples/sec   Loss 9.2642   LearningRate 0.0834   Epoch: 1   Global Step: 9890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:43,245-Speed 3401.85 samples/sec   Loss 9.3862   LearningRate 0.0833   Epoch: 1   Global Step: 9900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:46,194-Speed 3473.33 samples/sec   Loss 9.4544   LearningRate 0.0833   Epoch: 1   Global Step: 9910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:49,162-Speed 3452.72 samples/sec   Loss 9.3918   LearningRate 0.0833   Epoch: 1   Global Step: 9920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:52,137-Speed 3442.28 samples/sec   Loss 9.2695   LearningRate 0.0833   Epoch: 1   Global Step: 9930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:55,105-Speed 3450.80 samples/sec   Loss 9.2940   LearningRate 0.0833   Epoch: 1   Global Step: 9940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:26:58,074-Speed 3449.98 samples/sec   Loss 9.4831   LearningRate 0.0833   Epoch: 1   Global Step: 9950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:27:01,044-Speed 3449.26 samples/sec   Loss 9.4688   LearningRate 0.0833   Epoch: 1   Global Step: 9960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:27:04,005-Speed 3459.35 samples/sec   Loss 9.3840   LearningRate 0.0832   Epoch: 1   Global Step: 9970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:27:06,971-Speed 3453.08 samples/sec   Loss 9.3375   LearningRate 0.0832   Epoch: 1   Global Step: 9980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:27:09,946-Speed 3443.26 samples/sec   Loss 9.4971   LearningRate 0.0832   Epoch: 1   Global Step: 9990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:27:12,916-Speed 3449.24 samples/sec   Loss 9.3752   LearningRate 0.0832   Epoch: 1   Global Step: 10000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:27:56,368-[lfw][10000]XNorm: 22.190784
Training: 2022-04-27 02:27:56,369-[lfw][10000]Accuracy-Flip: 0.99483+-0.00263
Training: 2022-04-27 02:27:56,369-[lfw][10000]Accuracy-Highest: 0.99483
Training: 2022-04-27 02:28:46,783-[cfp_fp][10000]XNorm: 19.089473
Training: 2022-04-27 02:28:46,784-[cfp_fp][10000]Accuracy-Flip: 0.90543+-0.01380
Training: 2022-04-27 02:28:46,784-[cfp_fp][10000]Accuracy-Highest: 0.91271
Training: 2022-04-27 02:29:30,045-[agedb_30][10000]XNorm: 22.184881
Training: 2022-04-27 02:29:30,045-[agedb_30][10000]Accuracy-Flip: 0.95550+-0.00827
Training: 2022-04-27 02:29:30,046-[agedb_30][10000]Accuracy-Highest: 0.95733
Training: 2022-04-27 02:29:32,989-Speed 73.10 samples/sec   Loss 9.2891   LearningRate 0.0832   Epoch: 1   Global Step: 10010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:29:35,934-Speed 3477.91 samples/sec   Loss 9.2535   LearningRate 0.0832   Epoch: 1   Global Step: 10020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:29:38,875-Speed 3482.51 samples/sec   Loss 9.3335   LearningRate 0.0831   Epoch: 1   Global Step: 10030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:29:41,835-Speed 3460.70 samples/sec   Loss 9.3081   LearningRate 0.0831   Epoch: 1   Global Step: 10040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:29:44,778-Speed 3480.44 samples/sec   Loss 9.3989   LearningRate 0.0831   Epoch: 1   Global Step: 10050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:29:47,726-Speed 3473.48 samples/sec   Loss 9.2462   LearningRate 0.0831   Epoch: 1   Global Step: 10060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:29:50,694-Speed 3451.38 samples/sec   Loss 9.3533   LearningRate 0.0831   Epoch: 1   Global Step: 10070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:29:53,648-Speed 3466.87 samples/sec   Loss 9.3950   LearningRate 0.0831   Epoch: 1   Global Step: 10080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:29:56,606-Speed 3462.39 samples/sec   Loss 9.4901   LearningRate 0.0830   Epoch: 1   Global Step: 10090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:29:59,561-Speed 3466.76 samples/sec   Loss 9.5309   LearningRate 0.0830   Epoch: 1   Global Step: 10100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:30:02,517-Speed 3464.84 samples/sec   Loss 9.3611   LearningRate 0.0830   Epoch: 1   Global Step: 10110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:30:05,475-Speed 3463.12 samples/sec   Loss 9.3376   LearningRate 0.0830   Epoch: 1   Global Step: 10120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:30:08,435-Speed 3460.21 samples/sec   Loss 9.3695   LearningRate 0.0830   Epoch: 1   Global Step: 10130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:30:11,403-Speed 3451.03 samples/sec   Loss 9.3354   LearningRate 0.0830   Epoch: 1   Global Step: 10140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:30:14,360-Speed 3463.11 samples/sec   Loss 9.4076   LearningRate 0.0829   Epoch: 1   Global Step: 10150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:30:17,312-Speed 3470.04 samples/sec   Loss 9.4701   LearningRate 0.0829   Epoch: 1   Global Step: 10160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:30:20,283-Speed 3447.35 samples/sec   Loss 9.2816   LearningRate 0.0829   Epoch: 1   Global Step: 10170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:30:23,247-Speed 3455.02 samples/sec   Loss 9.2815   LearningRate 0.0829   Epoch: 1   Global Step: 10180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:30:26,219-Speed 3446.81 samples/sec   Loss 9.3077   LearningRate 0.0829   Epoch: 1   Global Step: 10190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:30:29,189-Speed 3448.09 samples/sec   Loss 9.3229   LearningRate 0.0829   Epoch: 1   Global Step: 10200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:30:32,156-Speed 3452.47 samples/sec   Loss 9.1343   LearningRate 0.0828   Epoch: 1   Global Step: 10210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:30:35,124-Speed 3451.43 samples/sec   Loss 9.2167   LearningRate 0.0828   Epoch: 1   Global Step: 10220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:30:38,097-Speed 3445.17 samples/sec   Loss 9.1998   LearningRate 0.0828   Epoch: 1   Global Step: 10230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:30:41,068-Speed 3447.12 samples/sec   Loss 9.3163   LearningRate 0.0828   Epoch: 1   Global Step: 10240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:30:44,040-Speed 3446.72 samples/sec   Loss 9.1943   LearningRate 0.0828   Epoch: 1   Global Step: 10250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 02:30:47,024-Speed 3431.89 samples/sec   Loss 9.3244   LearningRate 0.0828   Epoch: 1   Global Step: 10260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:30:49,994-Speed 3448.65 samples/sec   Loss 9.1728   LearningRate 0.0828   Epoch: 1   Global Step: 10270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 02:30:52,969-Speed 3442.99 samples/sec   Loss 9.2947   LearningRate 0.0827   Epoch: 1   Global Step: 10280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:30:55,932-Speed 3456.21 samples/sec   Loss 9.3106   LearningRate 0.0827   Epoch: 1   Global Step: 10290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:30:58,899-Speed 3452.98 samples/sec   Loss 9.3902   LearningRate 0.0827   Epoch: 1   Global Step: 10300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:01,869-Speed 3448.08 samples/sec   Loss 9.2540   LearningRate 0.0827   Epoch: 1   Global Step: 10310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:04,834-Speed 3454.92 samples/sec   Loss 9.3407   LearningRate 0.0827   Epoch: 1   Global Step: 10320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:07,797-Speed 3456.89 samples/sec   Loss 9.2041   LearningRate 0.0827   Epoch: 1   Global Step: 10330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:10,758-Speed 3459.36 samples/sec   Loss 9.1264   LearningRate 0.0826   Epoch: 1   Global Step: 10340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:13,726-Speed 3450.79 samples/sec   Loss 9.3196   LearningRate 0.0826   Epoch: 1   Global Step: 10350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:16,680-Speed 3466.98 samples/sec   Loss 9.3807   LearningRate 0.0826   Epoch: 1   Global Step: 10360   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:31:19,638-Speed 3463.45 samples/sec   Loss 9.4370   LearningRate 0.0826   Epoch: 1   Global Step: 10370   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:31:22,591-Speed 3467.73 samples/sec   Loss 9.4078   LearningRate 0.0826   Epoch: 1   Global Step: 10380   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:31:25,549-Speed 3463.39 samples/sec   Loss 9.3608   LearningRate 0.0826   Epoch: 1   Global Step: 10390   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:31:28,494-Speed 3477.90 samples/sec   Loss 9.3442   LearningRate 0.0825   Epoch: 1   Global Step: 10400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:31,447-Speed 3467.84 samples/sec   Loss 9.1051   LearningRate 0.0825   Epoch: 1   Global Step: 10410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:34,402-Speed 3466.99 samples/sec   Loss 9.3177   LearningRate 0.0825   Epoch: 1   Global Step: 10420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:37,354-Speed 3469.76 samples/sec   Loss 9.2233   LearningRate 0.0825   Epoch: 1   Global Step: 10430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:40,313-Speed 3460.84 samples/sec   Loss 9.1675   LearningRate 0.0825   Epoch: 1   Global Step: 10440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:43,269-Speed 3465.48 samples/sec   Loss 9.2768   LearningRate 0.0825   Epoch: 1   Global Step: 10450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:46,232-Speed 3456.47 samples/sec   Loss 9.4785   LearningRate 0.0824   Epoch: 1   Global Step: 10460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:49,187-Speed 3465.68 samples/sec   Loss 9.4253   LearningRate 0.0824   Epoch: 1   Global Step: 10470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:52,143-Speed 3465.17 samples/sec   Loss 9.1086   LearningRate 0.0824   Epoch: 1   Global Step: 10480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:55,097-Speed 3467.35 samples/sec   Loss 9.1740   LearningRate 0.0824   Epoch: 1   Global Step: 10490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:31:58,050-Speed 3468.71 samples/sec   Loss 9.2070   LearningRate 0.0824   Epoch: 1   Global Step: 10500   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:32:01,006-Speed 3465.65 samples/sec   Loss 9.1098   LearningRate 0.0824   Epoch: 1   Global Step: 10510   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:32:03,953-Speed 3475.29 samples/sec   Loss 9.1563   LearningRate 0.0824   Epoch: 1   Global Step: 10520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:32:06,907-Speed 3467.02 samples/sec   Loss 9.1144   LearningRate 0.0823   Epoch: 1   Global Step: 10530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:32:09,863-Speed 3464.87 samples/sec   Loss 9.2930   LearningRate 0.0823   Epoch: 1   Global Step: 10540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:32:12,853-Speed 3425.08 samples/sec   Loss 9.3373   LearningRate 0.0823   Epoch: 1   Global Step: 10550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:32:15,815-Speed 3458.81 samples/sec   Loss 9.1407   LearningRate 0.0823   Epoch: 1   Global Step: 10560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:32:18,767-Speed 3469.25 samples/sec   Loss 9.1812   LearningRate 0.0823   Epoch: 1   Global Step: 10570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:32:21,731-Speed 3456.11 samples/sec   Loss 9.0756   LearningRate 0.0823   Epoch: 1   Global Step: 10580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:32:24,685-Speed 3466.95 samples/sec   Loss 9.1564   LearningRate 0.0822   Epoch: 1   Global Step: 10590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:32:27,640-Speed 3465.65 samples/sec   Loss 9.0358   LearningRate 0.0822   Epoch: 1   Global Step: 10600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:32:30,596-Speed 3465.64 samples/sec   Loss 9.1526   LearningRate 0.0822   Epoch: 1   Global Step: 10610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:32:33,549-Speed 3467.84 samples/sec   Loss 9.1457   LearningRate 0.0822   Epoch: 1   Global Step: 10620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:32:36,511-Speed 3458.60 samples/sec   Loss 8.9709   LearningRate 0.0822   Epoch: 1   Global Step: 10630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:32:39,478-Speed 3451.75 samples/sec   Loss 9.1944   LearningRate 0.0822   Epoch: 1   Global Step: 10640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:32:42,434-Speed 3464.79 samples/sec   Loss 9.1838   LearningRate 0.0821   Epoch: 1   Global Step: 10650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:32:45,388-Speed 3466.83 samples/sec   Loss 9.2665   LearningRate 0.0821   Epoch: 1   Global Step: 10660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:32:48,354-Speed 3454.60 samples/sec   Loss 9.2384   LearningRate 0.0821   Epoch: 1   Global Step: 10670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:32:51,314-Speed 3460.15 samples/sec   Loss 9.2665   LearningRate 0.0821   Epoch: 1   Global Step: 10680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:32:54,268-Speed 3466.67 samples/sec   Loss 9.1575   LearningRate 0.0821   Epoch: 1   Global Step: 10690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:32:57,224-Speed 3464.97 samples/sec   Loss 9.1672   LearningRate 0.0821   Epoch: 1   Global Step: 10700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:00,182-Speed 3463.00 samples/sec   Loss 8.9677   LearningRate 0.0821   Epoch: 1   Global Step: 10710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:03,143-Speed 3459.25 samples/sec   Loss 9.0035   LearningRate 0.0820   Epoch: 1   Global Step: 10720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:06,106-Speed 3456.65 samples/sec   Loss 9.1638   LearningRate 0.0820   Epoch: 1   Global Step: 10730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:09,062-Speed 3464.49 samples/sec   Loss 9.2630   LearningRate 0.0820   Epoch: 1   Global Step: 10740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:12,020-Speed 3462.27 samples/sec   Loss 9.1905   LearningRate 0.0820   Epoch: 1   Global Step: 10750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:14,988-Speed 3452.05 samples/sec   Loss 9.2254   LearningRate 0.0820   Epoch: 1   Global Step: 10760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:17,954-Speed 3453.45 samples/sec   Loss 9.0129   LearningRate 0.0820   Epoch: 1   Global Step: 10770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:20,915-Speed 3458.96 samples/sec   Loss 9.2196   LearningRate 0.0819   Epoch: 1   Global Step: 10780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:23,872-Speed 3463.24 samples/sec   Loss 9.0397   LearningRate 0.0819   Epoch: 1   Global Step: 10790   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:33:26,828-Speed 3464.54 samples/sec   Loss 9.1843   LearningRate 0.0819   Epoch: 1   Global Step: 10800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:29,785-Speed 3463.95 samples/sec   Loss 9.0446   LearningRate 0.0819   Epoch: 1   Global Step: 10810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:32,746-Speed 3459.49 samples/sec   Loss 9.0373   LearningRate 0.0819   Epoch: 1   Global Step: 10820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:35,705-Speed 3460.51 samples/sec   Loss 8.9260   LearningRate 0.0819   Epoch: 1   Global Step: 10830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:38,664-Speed 3462.23 samples/sec   Loss 9.0704   LearningRate 0.0818   Epoch: 1   Global Step: 10840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:41,626-Speed 3458.77 samples/sec   Loss 9.2277   LearningRate 0.0818   Epoch: 1   Global Step: 10850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:44,583-Speed 3463.83 samples/sec   Loss 9.3381   LearningRate 0.0818   Epoch: 1   Global Step: 10860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:47,538-Speed 3465.32 samples/sec   Loss 9.0980   LearningRate 0.0818   Epoch: 1   Global Step: 10870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:50,500-Speed 3458.81 samples/sec   Loss 9.1473   LearningRate 0.0818   Epoch: 1   Global Step: 10880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:53,458-Speed 3461.63 samples/sec   Loss 9.2201   LearningRate 0.0818   Epoch: 1   Global Step: 10890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:56,405-Speed 3475.40 samples/sec   Loss 9.1369   LearningRate 0.0817   Epoch: 1   Global Step: 10900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:33:59,351-Speed 3476.79 samples/sec   Loss 9.1255   LearningRate 0.0817   Epoch: 1   Global Step: 10910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:34:02,310-Speed 3461.14 samples/sec   Loss 8.9893   LearningRate 0.0817   Epoch: 1   Global Step: 10920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:34:05,277-Speed 3452.08 samples/sec   Loss 9.1727   LearningRate 0.0817   Epoch: 1   Global Step: 10930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:34:08,251-Speed 3445.03 samples/sec   Loss 9.1715   LearningRate 0.0817   Epoch: 1   Global Step: 10940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:34:11,209-Speed 3462.65 samples/sec   Loss 9.3148   LearningRate 0.0817   Epoch: 1   Global Step: 10950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:34:14,164-Speed 3465.93 samples/sec   Loss 9.0534   LearningRate 0.0817   Epoch: 1   Global Step: 10960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:34:17,124-Speed 3460.35 samples/sec   Loss 9.0863   LearningRate 0.0816   Epoch: 1   Global Step: 10970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:34:20,085-Speed 3458.50 samples/sec   Loss 9.1506   LearningRate 0.0816   Epoch: 1   Global Step: 10980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:34:23,052-Speed 3452.57 samples/sec   Loss 9.1438   LearningRate 0.0816   Epoch: 1   Global Step: 10990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:34:26,020-Speed 3450.84 samples/sec   Loss 9.2623   LearningRate 0.0816   Epoch: 1   Global Step: 11000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:34:28,993-Speed 3445.45 samples/sec   Loss 9.1859   LearningRate 0.0816   Epoch: 1   Global Step: 11010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:34:31,955-Speed 3457.29 samples/sec   Loss 9.1339   LearningRate 0.0816   Epoch: 1   Global Step: 11020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:34:34,940-Speed 3431.57 samples/sec   Loss 8.9219   LearningRate 0.0815   Epoch: 1   Global Step: 11030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:34:37,912-Speed 3446.48 samples/sec   Loss 9.0326   LearningRate 0.0815   Epoch: 1   Global Step: 11040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:34:40,887-Speed 3443.00 samples/sec   Loss 9.1718   LearningRate 0.0815   Epoch: 1   Global Step: 11050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:34:43,861-Speed 3445.45 samples/sec   Loss 9.0993   LearningRate 0.0815   Epoch: 1   Global Step: 11060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:34:46,825-Speed 3455.67 samples/sec   Loss 9.0670   LearningRate 0.0815   Epoch: 1   Global Step: 11070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:34:49,784-Speed 3460.75 samples/sec   Loss 9.0391   LearningRate 0.0815   Epoch: 1   Global Step: 11080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:34:52,744-Speed 3460.64 samples/sec   Loss 9.0347   LearningRate 0.0814   Epoch: 1   Global Step: 11090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:34:55,706-Speed 3457.30 samples/sec   Loss 9.0816   LearningRate 0.0814   Epoch: 1   Global Step: 11100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:34:58,676-Speed 3449.26 samples/sec   Loss 9.0009   LearningRate 0.0814   Epoch: 1   Global Step: 11110   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:35:01,641-Speed 3454.53 samples/sec   Loss 8.9568   LearningRate 0.0814   Epoch: 1   Global Step: 11120   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:35:04,590-Speed 3472.84 samples/sec   Loss 9.1947   LearningRate 0.0814   Epoch: 1   Global Step: 11130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:07,552-Speed 3457.76 samples/sec   Loss 9.1268   LearningRate 0.0814   Epoch: 1   Global Step: 11140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:10,522-Speed 3449.44 samples/sec   Loss 9.1328   LearningRate 0.0814   Epoch: 1   Global Step: 11150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:13,483-Speed 3459.06 samples/sec   Loss 9.0296   LearningRate 0.0813   Epoch: 1   Global Step: 11160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:16,444-Speed 3458.71 samples/sec   Loss 9.2550   LearningRate 0.0813   Epoch: 1   Global Step: 11170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:19,403-Speed 3461.75 samples/sec   Loss 9.1382   LearningRate 0.0813   Epoch: 1   Global Step: 11180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:22,384-Speed 3435.23 samples/sec   Loss 9.1580   LearningRate 0.0813   Epoch: 1   Global Step: 11190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:25,349-Speed 3454.52 samples/sec   Loss 9.1952   LearningRate 0.0813   Epoch: 1   Global Step: 11200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:28,314-Speed 3454.99 samples/sec   Loss 9.1366   LearningRate 0.0813   Epoch: 1   Global Step: 11210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:31,284-Speed 3448.87 samples/sec   Loss 9.1156   LearningRate 0.0812   Epoch: 1   Global Step: 11220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:34,249-Speed 3454.55 samples/sec   Loss 8.9286   LearningRate 0.0812   Epoch: 1   Global Step: 11230   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:35:37,199-Speed 3472.54 samples/sec   Loss 8.9594   LearningRate 0.0812   Epoch: 1   Global Step: 11240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:40,161-Speed 3457.19 samples/sec   Loss 9.0871   LearningRate 0.0812   Epoch: 1   Global Step: 11250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:43,125-Speed 3455.58 samples/sec   Loss 8.9413   LearningRate 0.0812   Epoch: 1   Global Step: 11260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:46,085-Speed 3460.57 samples/sec   Loss 9.1832   LearningRate 0.0812   Epoch: 1   Global Step: 11270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:49,069-Speed 3432.67 samples/sec   Loss 9.1366   LearningRate 0.0811   Epoch: 1   Global Step: 11280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:52,027-Speed 3462.08 samples/sec   Loss 8.9133   LearningRate 0.0811   Epoch: 1   Global Step: 11290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:54,990-Speed 3456.84 samples/sec   Loss 8.9714   LearningRate 0.0811   Epoch: 1   Global Step: 11300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:35:57,954-Speed 3456.05 samples/sec   Loss 9.0260   LearningRate 0.0811   Epoch: 1   Global Step: 11310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:36:00,923-Speed 3450.00 samples/sec   Loss 9.0391   LearningRate 0.0811   Epoch: 1   Global Step: 11320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:36:03,887-Speed 3455.25 samples/sec   Loss 9.0272   LearningRate 0.0811   Epoch: 1   Global Step: 11330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:36:06,845-Speed 3462.74 samples/sec   Loss 9.0132   LearningRate 0.0811   Epoch: 1   Global Step: 11340   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:36:09,805-Speed 3460.21 samples/sec   Loss 9.0413   LearningRate 0.0810   Epoch: 1   Global Step: 11350   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:36:12,757-Speed 3470.10 samples/sec   Loss 9.1754   LearningRate 0.0810   Epoch: 1   Global Step: 11360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:36:15,795-Speed 3370.44 samples/sec   Loss 8.9825   LearningRate 0.0810   Epoch: 1   Global Step: 11370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:36:28,798-Speed 787.63 samples/sec   Loss 8.6045   LearningRate 0.0810   Epoch: 2   Global Step: 11380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:36:31,761-Speed 3456.39 samples/sec   Loss 8.4724   LearningRate 0.0810   Epoch: 2   Global Step: 11390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:36:34,729-Speed 3451.88 samples/sec   Loss 8.5073   LearningRate 0.0810   Epoch: 2   Global Step: 11400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:36:37,750-Speed 3390.23 samples/sec   Loss 8.3042   LearningRate 0.0809   Epoch: 2   Global Step: 11410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:36:40,732-Speed 3434.35 samples/sec   Loss 8.3410   LearningRate 0.0809   Epoch: 2   Global Step: 11420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:36:43,707-Speed 3443.62 samples/sec   Loss 8.3430   LearningRate 0.0809   Epoch: 2   Global Step: 11430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:36:46,682-Speed 3442.62 samples/sec   Loss 8.4696   LearningRate 0.0809   Epoch: 2   Global Step: 11440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:36:49,654-Speed 3446.69 samples/sec   Loss 8.2999   LearningRate 0.0809   Epoch: 2   Global Step: 11450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:36:52,635-Speed 3435.31 samples/sec   Loss 8.3283   LearningRate 0.0809   Epoch: 2   Global Step: 11460   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:36:55,613-Speed 3439.56 samples/sec   Loss 8.5107   LearningRate 0.0808   Epoch: 2   Global Step: 11470   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:36:58,583-Speed 3448.42 samples/sec   Loss 8.3868   LearningRate 0.0808   Epoch: 2   Global Step: 11480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:01,554-Speed 3447.94 samples/sec   Loss 8.5992   LearningRate 0.0808   Epoch: 2   Global Step: 11490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:04,542-Speed 3427.69 samples/sec   Loss 8.4515   LearningRate 0.0808   Epoch: 2   Global Step: 11500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:07,548-Speed 3407.53 samples/sec   Loss 8.5580   LearningRate 0.0808   Epoch: 2   Global Step: 11510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:10,541-Speed 3422.06 samples/sec   Loss 8.5337   LearningRate 0.0808   Epoch: 2   Global Step: 11520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:13,522-Speed 3436.43 samples/sec   Loss 8.4040   LearningRate 0.0808   Epoch: 2   Global Step: 11530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:16,503-Speed 3435.72 samples/sec   Loss 8.5570   LearningRate 0.0807   Epoch: 2   Global Step: 11540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:19,487-Speed 3433.34 samples/sec   Loss 8.6215   LearningRate 0.0807   Epoch: 2   Global Step: 11550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:22,465-Speed 3438.71 samples/sec   Loss 8.7114   LearningRate 0.0807   Epoch: 2   Global Step: 11560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:25,443-Speed 3439.21 samples/sec   Loss 8.5851   LearningRate 0.0807   Epoch: 2   Global Step: 11570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:28,412-Speed 3450.33 samples/sec   Loss 8.4642   LearningRate 0.0807   Epoch: 2   Global Step: 11580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:31,406-Speed 3420.77 samples/sec   Loss 8.6710   LearningRate 0.0807   Epoch: 2   Global Step: 11590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:34,386-Speed 3436.49 samples/sec   Loss 8.6856   LearningRate 0.0806   Epoch: 2   Global Step: 11600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:37,371-Speed 3431.85 samples/sec   Loss 8.7338   LearningRate 0.0806   Epoch: 2   Global Step: 11610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:40,360-Speed 3427.07 samples/sec   Loss 8.5024   LearningRate 0.0806   Epoch: 2   Global Step: 11620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:43,349-Speed 3426.07 samples/sec   Loss 8.5492   LearningRate 0.0806   Epoch: 2   Global Step: 11630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:46,408-Speed 3348.11 samples/sec   Loss 8.4412   LearningRate 0.0806   Epoch: 2   Global Step: 11640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:49,402-Speed 3421.44 samples/sec   Loss 8.6084   LearningRate 0.0806   Epoch: 2   Global Step: 11650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:52,390-Speed 3428.19 samples/sec   Loss 8.6236   LearningRate 0.0805   Epoch: 2   Global Step: 11660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:55,372-Speed 3434.76 samples/sec   Loss 8.5805   LearningRate 0.0805   Epoch: 2   Global Step: 11670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:37:58,347-Speed 3443.05 samples/sec   Loss 8.6169   LearningRate 0.0805   Epoch: 2   Global Step: 11680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:38:01,352-Speed 3408.28 samples/sec   Loss 8.5896   LearningRate 0.0805   Epoch: 2   Global Step: 11690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:38:04,333-Speed 3436.10 samples/sec   Loss 8.4465   LearningRate 0.0805   Epoch: 2   Global Step: 11700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:38:07,319-Speed 3429.47 samples/sec   Loss 8.6017   LearningRate 0.0805   Epoch: 2   Global Step: 11710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:38:10,292-Speed 3445.73 samples/sec   Loss 8.5032   LearningRate 0.0805   Epoch: 2   Global Step: 11720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:38:13,280-Speed 3427.73 samples/sec   Loss 8.6757   LearningRate 0.0804   Epoch: 2   Global Step: 11730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:38:16,269-Speed 3426.42 samples/sec   Loss 8.9050   LearningRate 0.0804   Epoch: 2   Global Step: 11740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:38:19,236-Speed 3452.20 samples/sec   Loss 8.8143   LearningRate 0.0804   Epoch: 2   Global Step: 11750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:38:22,213-Speed 3440.10 samples/sec   Loss 8.7233   LearningRate 0.0804   Epoch: 2   Global Step: 11760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:38:25,184-Speed 3447.98 samples/sec   Loss 8.7255   LearningRate 0.0804   Epoch: 2   Global Step: 11770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:38:28,156-Speed 3445.75 samples/sec   Loss 8.7900   LearningRate 0.0804   Epoch: 2   Global Step: 11780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:38:31,142-Speed 3432.20 samples/sec   Loss 8.7415   LearningRate 0.0803   Epoch: 2   Global Step: 11790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:38:34,133-Speed 3424.23 samples/sec   Loss 8.6530   LearningRate 0.0803   Epoch: 2   Global Step: 11800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:38:37,116-Speed 3432.55 samples/sec   Loss 8.5816   LearningRate 0.0803   Epoch: 2   Global Step: 11810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:38:40,090-Speed 3443.86 samples/sec   Loss 8.6871   LearningRate 0.0803   Epoch: 2   Global Step: 11820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:38:43,058-Speed 3451.54 samples/sec   Loss 8.5467   LearningRate 0.0803   Epoch: 2   Global Step: 11830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:38:46,025-Speed 3451.89 samples/sec   Loss 8.7309   LearningRate 0.0803   Epoch: 2   Global Step: 11840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:38:48,997-Speed 3446.71 samples/sec   Loss 8.6451   LearningRate 0.0802   Epoch: 2   Global Step: 11850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:38:51,974-Speed 3440.28 samples/sec   Loss 8.7882   LearningRate 0.0802   Epoch: 2   Global Step: 11860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:38:54,951-Speed 3440.75 samples/sec   Loss 8.7368   LearningRate 0.0802   Epoch: 2   Global Step: 11870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:38:57,922-Speed 3447.55 samples/sec   Loss 8.6361   LearningRate 0.0802   Epoch: 2   Global Step: 11880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:39:00,890-Speed 3450.54 samples/sec   Loss 8.5299   LearningRate 0.0802   Epoch: 2   Global Step: 11890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:39:03,901-Speed 3401.71 samples/sec   Loss 8.6496   LearningRate 0.0802   Epoch: 2   Global Step: 11900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:39:06,880-Speed 3438.03 samples/sec   Loss 8.7293   LearningRate 0.0802   Epoch: 2   Global Step: 11910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:39:09,849-Speed 3450.05 samples/sec   Loss 8.6040   LearningRate 0.0801   Epoch: 2   Global Step: 11920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:39:12,822-Speed 3445.55 samples/sec   Loss 8.6795   LearningRate 0.0801   Epoch: 2   Global Step: 11930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:39:15,792-Speed 3448.71 samples/sec   Loss 8.7296   LearningRate 0.0801   Epoch: 2   Global Step: 11940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:39:18,752-Speed 3460.59 samples/sec   Loss 8.8028   LearningRate 0.0801   Epoch: 2   Global Step: 11950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:39:21,780-Speed 3382.19 samples/sec   Loss 8.8391   LearningRate 0.0801   Epoch: 2   Global Step: 11960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:39:24,772-Speed 3423.15 samples/sec   Loss 8.7708   LearningRate 0.0801   Epoch: 2   Global Step: 11970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:39:27,747-Speed 3442.14 samples/sec   Loss 8.6809   LearningRate 0.0800   Epoch: 2   Global Step: 11980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:39:30,717-Speed 3448.98 samples/sec   Loss 8.6785   LearningRate 0.0800   Epoch: 2   Global Step: 11990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:39:33,685-Speed 3450.82 samples/sec   Loss 8.5722   LearningRate 0.0800   Epoch: 2   Global Step: 12000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:40:16,957-[lfw][12000]XNorm: 22.718744
Training: 2022-04-27 02:40:16,958-[lfw][12000]Accuracy-Flip: 0.99600+-0.00291
Training: 2022-04-27 02:40:16,958-[lfw][12000]Accuracy-Highest: 0.99600
Training: 2022-04-27 02:41:07,144-[cfp_fp][12000]XNorm: 19.773736
Training: 2022-04-27 02:41:07,145-[cfp_fp][12000]Accuracy-Flip: 0.93214+-0.01344
Training: 2022-04-27 02:41:07,145-[cfp_fp][12000]Accuracy-Highest: 0.93214
Training: 2022-04-27 02:41:50,248-[agedb_30][12000]XNorm: 22.603312
Training: 2022-04-27 02:41:50,249-[agedb_30][12000]Accuracy-Flip: 0.96700+-0.00985
Training: 2022-04-27 02:41:50,249-[agedb_30][12000]Accuracy-Highest: 0.96700
Training: 2022-04-27 02:41:53,212-Speed 73.39 samples/sec   Loss 8.8026   LearningRate 0.0800   Epoch: 2   Global Step: 12010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:41:56,163-Speed 3470.53 samples/sec   Loss 8.6237   LearningRate 0.0800   Epoch: 2   Global Step: 12020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:41:59,126-Speed 3456.82 samples/sec   Loss 8.7304   LearningRate 0.0800   Epoch: 2   Global Step: 12030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:42:02,083-Speed 3464.29 samples/sec   Loss 8.6652   LearningRate 0.0799   Epoch: 2   Global Step: 12040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:42:05,047-Speed 3455.71 samples/sec   Loss 8.6697   LearningRate 0.0799   Epoch: 2   Global Step: 12050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:42:08,007-Speed 3459.39 samples/sec   Loss 8.6882   LearningRate 0.0799   Epoch: 2   Global Step: 12060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:42:10,992-Speed 3432.12 samples/sec   Loss 8.6894   LearningRate 0.0799   Epoch: 2   Global Step: 12070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:42:13,961-Speed 3449.39 samples/sec   Loss 8.7036   LearningRate 0.0799   Epoch: 2   Global Step: 12080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:42:17,022-Speed 3346.20 samples/sec   Loss 8.7359   LearningRate 0.0799   Epoch: 2   Global Step: 12090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:42:20,010-Speed 3428.27 samples/sec   Loss 8.5775   LearningRate 0.0799   Epoch: 2   Global Step: 12100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:42:22,974-Speed 3455.15 samples/sec   Loss 8.8298   LearningRate 0.0798   Epoch: 2   Global Step: 12110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:42:25,963-Speed 3426.54 samples/sec   Loss 8.7553   LearningRate 0.0798   Epoch: 2   Global Step: 12120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:42:28,930-Speed 3452.11 samples/sec   Loss 8.6425   LearningRate 0.0798   Epoch: 2   Global Step: 12130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:42:31,900-Speed 3449.06 samples/sec   Loss 8.6533   LearningRate 0.0798   Epoch: 2   Global Step: 12140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:42:34,875-Speed 3442.35 samples/sec   Loss 8.6975   LearningRate 0.0798   Epoch: 2   Global Step: 12150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:42:37,823-Speed 3474.36 samples/sec   Loss 8.8720   LearningRate 0.0798   Epoch: 2   Global Step: 12160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:42:40,772-Speed 3474.14 samples/sec   Loss 8.7509   LearningRate 0.0797   Epoch: 2   Global Step: 12170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:42:43,738-Speed 3452.43 samples/sec   Loss 8.7746   LearningRate 0.0797   Epoch: 2   Global Step: 12180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:42:46,698-Speed 3460.87 samples/sec   Loss 8.7567   LearningRate 0.0797   Epoch: 2   Global Step: 12190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:42:49,666-Speed 3449.91 samples/sec   Loss 8.6847   LearningRate 0.0797   Epoch: 2   Global Step: 12200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:42:52,634-Speed 3451.46 samples/sec   Loss 8.6322   LearningRate 0.0797   Epoch: 2   Global Step: 12210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:42:55,590-Speed 3464.84 samples/sec   Loss 8.7045   LearningRate 0.0797   Epoch: 2   Global Step: 12220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:42:58,566-Speed 3442.41 samples/sec   Loss 8.7654   LearningRate 0.0796   Epoch: 2   Global Step: 12230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:43:01,523-Speed 3463.13 samples/sec   Loss 8.6886   LearningRate 0.0796   Epoch: 2   Global Step: 12240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:43:04,486-Speed 3456.47 samples/sec   Loss 8.9470   LearningRate 0.0796   Epoch: 2   Global Step: 12250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:43:07,474-Speed 3428.42 samples/sec   Loss 8.7775   LearningRate 0.0796   Epoch: 2   Global Step: 12260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:43:10,431-Speed 3464.50 samples/sec   Loss 8.7681   LearningRate 0.0796   Epoch: 2   Global Step: 12270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:43:13,394-Speed 3456.62 samples/sec   Loss 8.6623   LearningRate 0.0796   Epoch: 2   Global Step: 12280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:43:16,355-Speed 3458.42 samples/sec   Loss 8.8449   LearningRate 0.0796   Epoch: 2   Global Step: 12290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:43:19,314-Speed 3461.25 samples/sec   Loss 8.6800   LearningRate 0.0795   Epoch: 2   Global Step: 12300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:43:22,297-Speed 3433.91 samples/sec   Loss 8.5401   LearningRate 0.0795   Epoch: 2   Global Step: 12310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:43:25,289-Speed 3423.19 samples/sec   Loss 8.8403   LearningRate 0.0795   Epoch: 2   Global Step: 12320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:43:28,262-Speed 3444.92 samples/sec   Loss 8.7064   LearningRate 0.0795   Epoch: 2   Global Step: 12330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:43:31,223-Speed 3459.69 samples/sec   Loss 8.7585   LearningRate 0.0795   Epoch: 2   Global Step: 12340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:43:34,180-Speed 3463.29 samples/sec   Loss 8.6683   LearningRate 0.0795   Epoch: 2   Global Step: 12350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:43:37,144-Speed 3455.95 samples/sec   Loss 8.8703   LearningRate 0.0794   Epoch: 2   Global Step: 12360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:43:40,102-Speed 3462.50 samples/sec   Loss 8.7548   LearningRate 0.0794   Epoch: 2   Global Step: 12370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:43:43,058-Speed 3465.59 samples/sec   Loss 8.8291   LearningRate 0.0794   Epoch: 2   Global Step: 12380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:43:46,012-Speed 3466.82 samples/sec   Loss 8.6946   LearningRate 0.0794   Epoch: 2   Global Step: 12390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:43:48,968-Speed 3464.60 samples/sec   Loss 8.7860   LearningRate 0.0794   Epoch: 2   Global Step: 12400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:43:51,926-Speed 3463.27 samples/sec   Loss 8.7044   LearningRate 0.0794   Epoch: 2   Global Step: 12410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:43:54,884-Speed 3462.56 samples/sec   Loss 8.6381   LearningRate 0.0793   Epoch: 2   Global Step: 12420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:43:57,854-Speed 3448.88 samples/sec   Loss 8.5345   LearningRate 0.0793   Epoch: 2   Global Step: 12430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:44:00,817-Speed 3456.04 samples/sec   Loss 8.7549   LearningRate 0.0793   Epoch: 2   Global Step: 12440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:44:03,785-Speed 3451.50 samples/sec   Loss 8.7293   LearningRate 0.0793   Epoch: 2   Global Step: 12450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:44:06,750-Speed 3455.23 samples/sec   Loss 8.5426   LearningRate 0.0793   Epoch: 2   Global Step: 12460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:44:09,715-Speed 3453.45 samples/sec   Loss 8.7692   LearningRate 0.0793   Epoch: 2   Global Step: 12470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:44:12,701-Speed 3429.83 samples/sec   Loss 8.8623   LearningRate 0.0793   Epoch: 2   Global Step: 12480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:44:15,670-Speed 3449.66 samples/sec   Loss 8.6734   LearningRate 0.0792   Epoch: 2   Global Step: 12490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:44:18,646-Speed 3441.85 samples/sec   Loss 8.6806   LearningRate 0.0792   Epoch: 2   Global Step: 12500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:44:21,611-Speed 3454.90 samples/sec   Loss 8.6066   LearningRate 0.0792   Epoch: 2   Global Step: 12510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:44:24,590-Speed 3438.03 samples/sec   Loss 8.6651   LearningRate 0.0792   Epoch: 2   Global Step: 12520   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:44:27,564-Speed 3444.76 samples/sec   Loss 8.6569   LearningRate 0.0792   Epoch: 2   Global Step: 12530   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:44:30,525-Speed 3460.03 samples/sec   Loss 8.7488   LearningRate 0.0792   Epoch: 2   Global Step: 12540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:44:33,470-Speed 3477.87 samples/sec   Loss 8.6435   LearningRate 0.0791   Epoch: 2   Global Step: 12550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:44:36,450-Speed 3436.71 samples/sec   Loss 8.6274   LearningRate 0.0791   Epoch: 2   Global Step: 12560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:44:39,412-Speed 3458.24 samples/sec   Loss 8.4967   LearningRate 0.0791   Epoch: 2   Global Step: 12570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:44:42,377-Speed 3453.72 samples/sec   Loss 8.6512   LearningRate 0.0791   Epoch: 2   Global Step: 12580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:44:45,344-Speed 3452.22 samples/sec   Loss 8.8442   LearningRate 0.0791   Epoch: 2   Global Step: 12590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:44:48,304-Speed 3460.24 samples/sec   Loss 8.6951   LearningRate 0.0791   Epoch: 2   Global Step: 12600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:44:51,272-Speed 3451.55 samples/sec   Loss 8.7102   LearningRate 0.0791   Epoch: 2   Global Step: 12610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:44:54,234-Speed 3457.68 samples/sec   Loss 8.7412   LearningRate 0.0790   Epoch: 2   Global Step: 12620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:44:57,200-Speed 3453.74 samples/sec   Loss 8.8401   LearningRate 0.0790   Epoch: 2   Global Step: 12630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:45:00,179-Speed 3437.84 samples/sec   Loss 8.7314   LearningRate 0.0790   Epoch: 2   Global Step: 12640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:45:03,161-Speed 3434.27 samples/sec   Loss 8.5555   LearningRate 0.0790   Epoch: 2   Global Step: 12650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:45:06,155-Speed 3422.35 samples/sec   Loss 8.8152   LearningRate 0.0790   Epoch: 2   Global Step: 12660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:45:09,116-Speed 3459.30 samples/sec   Loss 8.6343   LearningRate 0.0790   Epoch: 2   Global Step: 12670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:45:12,089-Speed 3444.59 samples/sec   Loss 8.6766   LearningRate 0.0789   Epoch: 2   Global Step: 12680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:45:15,071-Speed 3434.23 samples/sec   Loss 8.5970   LearningRate 0.0789   Epoch: 2   Global Step: 12690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:45:18,039-Speed 3451.55 samples/sec   Loss 8.6494   LearningRate 0.0789   Epoch: 2   Global Step: 12700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:45:21,004-Speed 3454.48 samples/sec   Loss 8.4130   LearningRate 0.0789   Epoch: 2   Global Step: 12710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:45:23,972-Speed 3451.02 samples/sec   Loss 8.5432   LearningRate 0.0789   Epoch: 2   Global Step: 12720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:45:26,933-Speed 3458.78 samples/sec   Loss 8.6331   LearningRate 0.0789   Epoch: 2   Global Step: 12730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:45:29,900-Speed 3453.09 samples/sec   Loss 8.7386   LearningRate 0.0788   Epoch: 2   Global Step: 12740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:45:32,864-Speed 3454.45 samples/sec   Loss 8.5989   LearningRate 0.0788   Epoch: 2   Global Step: 12750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:45:35,835-Speed 3448.45 samples/sec   Loss 8.5863   LearningRate 0.0788   Epoch: 2   Global Step: 12760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:45:38,808-Speed 3444.27 samples/sec   Loss 8.7080   LearningRate 0.0788   Epoch: 2   Global Step: 12770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:45:41,801-Speed 3422.31 samples/sec   Loss 8.5807   LearningRate 0.0788   Epoch: 2   Global Step: 12780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:45:44,782-Speed 3435.84 samples/sec   Loss 8.7273   LearningRate 0.0788   Epoch: 2   Global Step: 12790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:45:47,768-Speed 3429.56 samples/sec   Loss 8.6895   LearningRate 0.0788   Epoch: 2   Global Step: 12800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:45:50,737-Speed 3450.53 samples/sec   Loss 8.5051   LearningRate 0.0787   Epoch: 2   Global Step: 12810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:45:53,709-Speed 3446.47 samples/sec   Loss 8.5868   LearningRate 0.0787   Epoch: 2   Global Step: 12820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:45:56,691-Speed 3434.91 samples/sec   Loss 8.7688   LearningRate 0.0787   Epoch: 2   Global Step: 12830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:45:59,668-Speed 3440.58 samples/sec   Loss 8.6002   LearningRate 0.0787   Epoch: 2   Global Step: 12840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:46:02,638-Speed 3448.93 samples/sec   Loss 8.6849   LearningRate 0.0787   Epoch: 2   Global Step: 12850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:46:05,608-Speed 3448.33 samples/sec   Loss 8.6045   LearningRate 0.0787   Epoch: 2   Global Step: 12860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:46:08,570-Speed 3457.60 samples/sec   Loss 8.8041   LearningRate 0.0786   Epoch: 2   Global Step: 12870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:46:11,548-Speed 3440.31 samples/sec   Loss 8.6302   LearningRate 0.0786   Epoch: 2   Global Step: 12880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:46:14,518-Speed 3447.83 samples/sec   Loss 8.6677   LearningRate 0.0786   Epoch: 2   Global Step: 12890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:46:17,499-Speed 3435.52 samples/sec   Loss 8.6984   LearningRate 0.0786   Epoch: 2   Global Step: 12900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:46:20,470-Speed 3448.50 samples/sec   Loss 8.7387   LearningRate 0.0786   Epoch: 2   Global Step: 12910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:46:23,441-Speed 3446.94 samples/sec   Loss 8.6353   LearningRate 0.0786   Epoch: 2   Global Step: 12920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:46:26,408-Speed 3452.53 samples/sec   Loss 8.6365   LearningRate 0.0786   Epoch: 2   Global Step: 12930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:46:29,384-Speed 3442.08 samples/sec   Loss 8.8280   LearningRate 0.0785   Epoch: 2   Global Step: 12940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:46:32,361-Speed 3440.23 samples/sec   Loss 8.7879   LearningRate 0.0785   Epoch: 2   Global Step: 12950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:46:35,330-Speed 3450.05 samples/sec   Loss 8.6240   LearningRate 0.0785   Epoch: 2   Global Step: 12960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:46:38,303-Speed 3446.20 samples/sec   Loss 8.5727   LearningRate 0.0785   Epoch: 2   Global Step: 12970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:46:41,324-Speed 3389.52 samples/sec   Loss 8.5543   LearningRate 0.0785   Epoch: 2   Global Step: 12980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:46:44,296-Speed 3446.23 samples/sec   Loss 8.7376   LearningRate 0.0785   Epoch: 2   Global Step: 12990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:46:47,262-Speed 3453.91 samples/sec   Loss 8.6671   LearningRate 0.0784   Epoch: 2   Global Step: 13000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:46:50,250-Speed 3427.81 samples/sec   Loss 8.6905   LearningRate 0.0784   Epoch: 2   Global Step: 13010   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:46:53,217-Speed 3452.27 samples/sec   Loss 8.6740   LearningRate 0.0784   Epoch: 2   Global Step: 13020   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:46:56,186-Speed 3450.04 samples/sec   Loss 8.6933   LearningRate 0.0784   Epoch: 2   Global Step: 13030   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:46:59,157-Speed 3446.27 samples/sec   Loss 8.6197   LearningRate 0.0784   Epoch: 2   Global Step: 13040   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:47:02,120-Speed 3457.44 samples/sec   Loss 8.8459   LearningRate 0.0784   Epoch: 2   Global Step: 13050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:47:05,124-Speed 3409.87 samples/sec   Loss 8.6361   LearningRate 0.0784   Epoch: 2   Global Step: 13060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:47:08,087-Speed 3456.22 samples/sec   Loss 8.5787   LearningRate 0.0783   Epoch: 2   Global Step: 13070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:47:11,056-Speed 3450.31 samples/sec   Loss 8.6464   LearningRate 0.0783   Epoch: 2   Global Step: 13080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:47:14,031-Speed 3443.36 samples/sec   Loss 8.5983   LearningRate 0.0783   Epoch: 2   Global Step: 13090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:47:16,999-Speed 3450.84 samples/sec   Loss 8.7832   LearningRate 0.0783   Epoch: 2   Global Step: 13100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:47:19,989-Speed 3425.58 samples/sec   Loss 8.6987   LearningRate 0.0783   Epoch: 2   Global Step: 13110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:47:22,961-Speed 3446.61 samples/sec   Loss 8.7217   LearningRate 0.0783   Epoch: 2   Global Step: 13120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:47:25,952-Speed 3423.58 samples/sec   Loss 8.5792   LearningRate 0.0782   Epoch: 2   Global Step: 13130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:47:28,924-Speed 3446.75 samples/sec   Loss 8.6400   LearningRate 0.0782   Epoch: 2   Global Step: 13140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:47:31,889-Speed 3453.82 samples/sec   Loss 8.5550   LearningRate 0.0782   Epoch: 2   Global Step: 13150   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:47:34,854-Speed 3454.13 samples/sec   Loss 8.6245   LearningRate 0.0782   Epoch: 2   Global Step: 13160   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:47:37,814-Speed 3461.51 samples/sec   Loss 8.6155   LearningRate 0.0782   Epoch: 2   Global Step: 13170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:47:40,785-Speed 3446.84 samples/sec   Loss 8.5742   LearningRate 0.0782   Epoch: 2   Global Step: 13180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:47:43,755-Speed 3449.52 samples/sec   Loss 8.7935   LearningRate 0.0781   Epoch: 2   Global Step: 13190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:47:46,818-Speed 3343.24 samples/sec   Loss 8.4200   LearningRate 0.0781   Epoch: 2   Global Step: 13200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:47:49,857-Speed 3370.95 samples/sec   Loss 8.5739   LearningRate 0.0781   Epoch: 2   Global Step: 13210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:47:52,873-Speed 3395.30 samples/sec   Loss 8.6461   LearningRate 0.0781   Epoch: 2   Global Step: 13220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:47:55,850-Speed 3440.32 samples/sec   Loss 8.6824   LearningRate 0.0781   Epoch: 2   Global Step: 13230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:47:58,824-Speed 3444.10 samples/sec   Loss 8.3772   LearningRate 0.0781   Epoch: 2   Global Step: 13240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:48:01,808-Speed 3432.80 samples/sec   Loss 8.7459   LearningRate 0.0781   Epoch: 2   Global Step: 13250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:48:04,788-Speed 3437.63 samples/sec   Loss 8.5587   LearningRate 0.0780   Epoch: 2   Global Step: 13260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:48:07,766-Speed 3439.17 samples/sec   Loss 8.7301   LearningRate 0.0780   Epoch: 2   Global Step: 13270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:48:10,751-Speed 3430.81 samples/sec   Loss 8.6433   LearningRate 0.0780   Epoch: 2   Global Step: 13280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:48:13,758-Speed 3406.14 samples/sec   Loss 8.5048   LearningRate 0.0780   Epoch: 2   Global Step: 13290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:16,724-Speed 3453.20 samples/sec   Loss 8.3605   LearningRate 0.0780   Epoch: 2   Global Step: 13300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:19,699-Speed 3442.78 samples/sec   Loss 8.5453   LearningRate 0.0780   Epoch: 2   Global Step: 13310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:22,680-Speed 3436.01 samples/sec   Loss 8.4781   LearningRate 0.0779   Epoch: 2   Global Step: 13320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:25,683-Speed 3410.27 samples/sec   Loss 8.6381   LearningRate 0.0779   Epoch: 2   Global Step: 13330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:28,652-Speed 3450.17 samples/sec   Loss 8.5182   LearningRate 0.0779   Epoch: 2   Global Step: 13340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:31,619-Speed 3452.96 samples/sec   Loss 8.5179   LearningRate 0.0779   Epoch: 2   Global Step: 13350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:34,604-Speed 3430.77 samples/sec   Loss 8.5631   LearningRate 0.0779   Epoch: 2   Global Step: 13360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:37,572-Speed 3451.30 samples/sec   Loss 8.5247   LearningRate 0.0779   Epoch: 2   Global Step: 13370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:40,540-Speed 3450.48 samples/sec   Loss 8.6175   LearningRate 0.0779   Epoch: 2   Global Step: 13380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:43,498-Speed 3463.03 samples/sec   Loss 8.4313   LearningRate 0.0778   Epoch: 2   Global Step: 13390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:46,480-Speed 3434.78 samples/sec   Loss 8.4873   LearningRate 0.0778   Epoch: 2   Global Step: 13400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:49,466-Speed 3429.94 samples/sec   Loss 8.4842   LearningRate 0.0778   Epoch: 2   Global Step: 13410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:52,453-Speed 3429.25 samples/sec   Loss 8.4239   LearningRate 0.0778   Epoch: 2   Global Step: 13420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:55,426-Speed 3444.07 samples/sec   Loss 8.4318   LearningRate 0.0778   Epoch: 2   Global Step: 13430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:48:58,401-Speed 3443.28 samples/sec   Loss 8.3408   LearningRate 0.0778   Epoch: 2   Global Step: 13440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:01,385-Speed 3432.66 samples/sec   Loss 8.7077   LearningRate 0.0777   Epoch: 2   Global Step: 13450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:04,368-Speed 3433.37 samples/sec   Loss 8.5171   LearningRate 0.0777   Epoch: 2   Global Step: 13460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:07,338-Speed 3449.60 samples/sec   Loss 8.3781   LearningRate 0.0777   Epoch: 2   Global Step: 13470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:10,314-Speed 3441.13 samples/sec   Loss 8.6604   LearningRate 0.0777   Epoch: 2   Global Step: 13480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:13,294-Speed 3437.04 samples/sec   Loss 8.5272   LearningRate 0.0777   Epoch: 2   Global Step: 13490   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:49:16,264-Speed 3448.77 samples/sec   Loss 8.4905   LearningRate 0.0777   Epoch: 2   Global Step: 13500   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:49:19,228-Speed 3455.22 samples/sec   Loss 8.5595   LearningRate 0.0777   Epoch: 2   Global Step: 13510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:22,205-Speed 3440.14 samples/sec   Loss 8.6737   LearningRate 0.0776   Epoch: 2   Global Step: 13520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:25,178-Speed 3446.13 samples/sec   Loss 8.4256   LearningRate 0.0776   Epoch: 2   Global Step: 13530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:28,158-Speed 3436.25 samples/sec   Loss 8.7204   LearningRate 0.0776   Epoch: 2   Global Step: 13540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:31,155-Speed 3418.98 samples/sec   Loss 8.4390   LearningRate 0.0776   Epoch: 2   Global Step: 13550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:34,126-Speed 3447.04 samples/sec   Loss 8.5222   LearningRate 0.0776   Epoch: 2   Global Step: 13560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:37,119-Speed 3422.13 samples/sec   Loss 8.4967   LearningRate 0.0776   Epoch: 2   Global Step: 13570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:40,095-Speed 3441.86 samples/sec   Loss 8.5641   LearningRate 0.0775   Epoch: 2   Global Step: 13580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:43,062-Speed 3451.60 samples/sec   Loss 8.6211   LearningRate 0.0775   Epoch: 2   Global Step: 13590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:46,031-Speed 3450.15 samples/sec   Loss 8.4624   LearningRate 0.0775   Epoch: 2   Global Step: 13600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:49,027-Speed 3417.69 samples/sec   Loss 8.4569   LearningRate 0.0775   Epoch: 2   Global Step: 13610   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:49:51,991-Speed 3455.40 samples/sec   Loss 8.6149   LearningRate 0.0775   Epoch: 2   Global Step: 13620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:49:54,960-Speed 3450.96 samples/sec   Loss 8.5797   LearningRate 0.0775   Epoch: 2   Global Step: 13630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:49:57,934-Speed 3443.31 samples/sec   Loss 8.4693   LearningRate 0.0774   Epoch: 2   Global Step: 13640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:00,910-Speed 3443.36 samples/sec   Loss 8.5435   LearningRate 0.0774   Epoch: 2   Global Step: 13650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:03,888-Speed 3438.80 samples/sec   Loss 8.5398   LearningRate 0.0774   Epoch: 2   Global Step: 13660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:06,866-Speed 3438.72 samples/sec   Loss 8.4982   LearningRate 0.0774   Epoch: 2   Global Step: 13670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:09,836-Speed 3449.60 samples/sec   Loss 8.5365   LearningRate 0.0774   Epoch: 2   Global Step: 13680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:12,808-Speed 3445.42 samples/sec   Loss 8.4630   LearningRate 0.0774   Epoch: 2   Global Step: 13690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:15,777-Speed 3449.53 samples/sec   Loss 8.2404   LearningRate 0.0774   Epoch: 2   Global Step: 13700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:18,757-Speed 3438.17 samples/sec   Loss 8.3486   LearningRate 0.0773   Epoch: 2   Global Step: 13710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:21,733-Speed 3441.04 samples/sec   Loss 8.4595   LearningRate 0.0773   Epoch: 2   Global Step: 13720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:24,706-Speed 3445.23 samples/sec   Loss 8.4331   LearningRate 0.0773   Epoch: 2   Global Step: 13730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:50:27,688-Speed 3434.93 samples/sec   Loss 8.5395   LearningRate 0.0773   Epoch: 2   Global Step: 13740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:50:30,677-Speed 3427.14 samples/sec   Loss 8.4466   LearningRate 0.0773   Epoch: 2   Global Step: 13750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:50:33,631-Speed 3467.22 samples/sec   Loss 8.4398   LearningRate 0.0773   Epoch: 2   Global Step: 13760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:36,609-Speed 3438.60 samples/sec   Loss 8.4833   LearningRate 0.0772   Epoch: 2   Global Step: 13770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:39,598-Speed 3427.38 samples/sec   Loss 8.4017   LearningRate 0.0772   Epoch: 2   Global Step: 13780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:42,590-Speed 3423.32 samples/sec   Loss 8.6306   LearningRate 0.0772   Epoch: 2   Global Step: 13790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:45,573-Speed 3433.36 samples/sec   Loss 8.5203   LearningRate 0.0772   Epoch: 2   Global Step: 13800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:48,545-Speed 3446.76 samples/sec   Loss 8.4349   LearningRate 0.0772   Epoch: 2   Global Step: 13810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:51,514-Speed 3449.86 samples/sec   Loss 8.5099   LearningRate 0.0772   Epoch: 2   Global Step: 13820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:54,491-Speed 3440.11 samples/sec   Loss 8.4302   LearningRate 0.0772   Epoch: 2   Global Step: 13830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:50:57,479-Speed 3427.52 samples/sec   Loss 8.4114   LearningRate 0.0771   Epoch: 2   Global Step: 13840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:51:00,460-Speed 3435.57 samples/sec   Loss 8.4112   LearningRate 0.0771   Epoch: 2   Global Step: 13850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:51:03,433-Speed 3446.04 samples/sec   Loss 8.5778   LearningRate 0.0771   Epoch: 2   Global Step: 13860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:51:06,403-Speed 3447.91 samples/sec   Loss 8.4096   LearningRate 0.0771   Epoch: 2   Global Step: 13870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:51:09,394-Speed 3424.54 samples/sec   Loss 8.5038   LearningRate 0.0771   Epoch: 2   Global Step: 13880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:51:12,366-Speed 3446.61 samples/sec   Loss 8.3273   LearningRate 0.0771   Epoch: 2   Global Step: 13890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:51:15,355-Speed 3427.41 samples/sec   Loss 8.4162   LearningRate 0.0770   Epoch: 2   Global Step: 13900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:51:18,325-Speed 3449.19 samples/sec   Loss 8.5203   LearningRate 0.0770   Epoch: 2   Global Step: 13910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:51:21,298-Speed 3445.13 samples/sec   Loss 8.4826   LearningRate 0.0770   Epoch: 2   Global Step: 13920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:51:24,270-Speed 3445.38 samples/sec   Loss 8.5018   LearningRate 0.0770   Epoch: 2   Global Step: 13930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:51:27,268-Speed 3417.87 samples/sec   Loss 8.4656   LearningRate 0.0770   Epoch: 2   Global Step: 13940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:51:30,237-Speed 3448.83 samples/sec   Loss 8.3657   LearningRate 0.0770   Epoch: 2   Global Step: 13950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:51:33,213-Speed 3441.87 samples/sec   Loss 8.4240   LearningRate 0.0770   Epoch: 2   Global Step: 13960   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:51:36,185-Speed 3446.20 samples/sec   Loss 8.4003   LearningRate 0.0769   Epoch: 2   Global Step: 13970   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:51:39,156-Speed 3447.99 samples/sec   Loss 8.4584   LearningRate 0.0769   Epoch: 2   Global Step: 13980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:51:42,150-Speed 3420.99 samples/sec   Loss 8.4895   LearningRate 0.0769   Epoch: 2   Global Step: 13990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:51:45,125-Speed 3442.95 samples/sec   Loss 8.5025   LearningRate 0.0769   Epoch: 2   Global Step: 14000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:52:28,336-[lfw][14000]XNorm: 21.906293
Training: 2022-04-27 02:52:28,336-[lfw][14000]Accuracy-Flip: 0.99617+-0.00279
Training: 2022-04-27 02:52:28,337-[lfw][14000]Accuracy-Highest: 0.99617
Training: 2022-04-27 02:53:18,525-[cfp_fp][14000]XNorm: 19.340853
Training: 2022-04-27 02:53:18,526-[cfp_fp][14000]Accuracy-Flip: 0.93029+-0.01142
Training: 2022-04-27 02:53:18,526-[cfp_fp][14000]Accuracy-Highest: 0.93214
Training: 2022-04-27 02:54:01,701-[agedb_30][14000]XNorm: 21.228078
Training: 2022-04-27 02:54:01,702-[agedb_30][14000]Accuracy-Flip: 0.96217+-0.00940
Training: 2022-04-27 02:54:01,702-[agedb_30][14000]Accuracy-Highest: 0.96700
Training: 2022-04-27 02:54:04,658-Speed 73.39 samples/sec   Loss 8.5295   LearningRate 0.0769   Epoch: 2   Global Step: 14010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:54:07,611-Speed 3468.93 samples/sec   Loss 8.5830   LearningRate 0.0769   Epoch: 2   Global Step: 14020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:54:10,580-Speed 3449.71 samples/sec   Loss 8.4358   LearningRate 0.0768   Epoch: 2   Global Step: 14030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:54:13,536-Speed 3464.91 samples/sec   Loss 8.3354   LearningRate 0.0768   Epoch: 2   Global Step: 14040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:54:16,506-Speed 3448.39 samples/sec   Loss 8.5037   LearningRate 0.0768   Epoch: 2   Global Step: 14050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:54:19,466-Speed 3460.79 samples/sec   Loss 8.4184   LearningRate 0.0768   Epoch: 2   Global Step: 14060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:54:22,423-Speed 3463.71 samples/sec   Loss 8.4899   LearningRate 0.0768   Epoch: 2   Global Step: 14070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:54:25,382-Speed 3460.69 samples/sec   Loss 8.2598   LearningRate 0.0768   Epoch: 2   Global Step: 14080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:54:28,357-Speed 3443.51 samples/sec   Loss 8.3635   LearningRate 0.0768   Epoch: 2   Global Step: 14090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:54:31,326-Speed 3449.14 samples/sec   Loss 8.4603   LearningRate 0.0767   Epoch: 2   Global Step: 14100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:54:34,289-Speed 3457.64 samples/sec   Loss 8.2471   LearningRate 0.0767   Epoch: 2   Global Step: 14110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:54:37,262-Speed 3444.50 samples/sec   Loss 8.4468   LearningRate 0.0767   Epoch: 2   Global Step: 14120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:54:40,232-Speed 3448.99 samples/sec   Loss 8.4797   LearningRate 0.0767   Epoch: 2   Global Step: 14130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:54:43,201-Speed 3449.77 samples/sec   Loss 8.4550   LearningRate 0.0767   Epoch: 2   Global Step: 14140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:54:46,171-Speed 3448.13 samples/sec   Loss 8.4771   LearningRate 0.0767   Epoch: 2   Global Step: 14150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:54:49,152-Speed 3435.91 samples/sec   Loss 8.4527   LearningRate 0.0766   Epoch: 2   Global Step: 14160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:54:52,129-Speed 3441.16 samples/sec   Loss 8.3487   LearningRate 0.0766   Epoch: 2   Global Step: 14170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:54:55,120-Speed 3424.46 samples/sec   Loss 8.3548   LearningRate 0.0766   Epoch: 2   Global Step: 14180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:54:58,095-Speed 3442.11 samples/sec   Loss 8.3656   LearningRate 0.0766   Epoch: 2   Global Step: 14190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:55:01,060-Speed 3455.10 samples/sec   Loss 8.2896   LearningRate 0.0766   Epoch: 2   Global Step: 14200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:55:04,037-Speed 3440.99 samples/sec   Loss 8.5388   LearningRate 0.0766   Epoch: 2   Global Step: 14210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:55:07,073-Speed 3373.73 samples/sec   Loss 8.4654   LearningRate 0.0766   Epoch: 2   Global Step: 14220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:55:10,042-Speed 3448.63 samples/sec   Loss 8.3958   LearningRate 0.0765   Epoch: 2   Global Step: 14230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:55:13,024-Speed 3434.99 samples/sec   Loss 8.3633   LearningRate 0.0765   Epoch: 2   Global Step: 14240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:55:16,000-Speed 3441.45 samples/sec   Loss 8.3778   LearningRate 0.0765   Epoch: 2   Global Step: 14250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:55:18,971-Speed 3447.65 samples/sec   Loss 8.2928   LearningRate 0.0765   Epoch: 2   Global Step: 14260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:55:21,941-Speed 3448.68 samples/sec   Loss 8.3900   LearningRate 0.0765   Epoch: 2   Global Step: 14270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:55:24,909-Speed 3450.86 samples/sec   Loss 8.3474   LearningRate 0.0765   Epoch: 2   Global Step: 14280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:55:27,879-Speed 3448.42 samples/sec   Loss 8.2086   LearningRate 0.0764   Epoch: 2   Global Step: 14290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:55:30,853-Speed 3444.59 samples/sec   Loss 8.4201   LearningRate 0.0764   Epoch: 2   Global Step: 14300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:55:33,815-Speed 3457.63 samples/sec   Loss 8.3307   LearningRate 0.0764   Epoch: 2   Global Step: 14310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:55:36,795-Speed 3437.89 samples/sec   Loss 8.4095   LearningRate 0.0764   Epoch: 2   Global Step: 14320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:55:39,760-Speed 3453.36 samples/sec   Loss 8.4761   LearningRate 0.0764   Epoch: 2   Global Step: 14330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:55:42,734-Speed 3444.78 samples/sec   Loss 8.4837   LearningRate 0.0764   Epoch: 2   Global Step: 14340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:55:45,703-Speed 3448.70 samples/sec   Loss 8.3683   LearningRate 0.0764   Epoch: 2   Global Step: 14350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:55:48,671-Speed 3451.84 samples/sec   Loss 8.4261   LearningRate 0.0763   Epoch: 2   Global Step: 14360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:55:51,631-Speed 3459.52 samples/sec   Loss 8.4429   LearningRate 0.0763   Epoch: 2   Global Step: 14370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:55:54,586-Speed 3466.16 samples/sec   Loss 8.3968   LearningRate 0.0763   Epoch: 2   Global Step: 14380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:55:57,571-Speed 3431.40 samples/sec   Loss 8.2451   LearningRate 0.0763   Epoch: 2   Global Step: 14390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:00,529-Speed 3463.52 samples/sec   Loss 8.2697   LearningRate 0.0763   Epoch: 2   Global Step: 14400   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:56:03,476-Speed 3475.61 samples/sec   Loss 8.2790   LearningRate 0.0763   Epoch: 2   Global Step: 14410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:06,442-Speed 3453.21 samples/sec   Loss 8.4530   LearningRate 0.0762   Epoch: 2   Global Step: 14420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:09,399-Speed 3463.35 samples/sec   Loss 8.2594   LearningRate 0.0762   Epoch: 2   Global Step: 14430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:12,359-Speed 3460.31 samples/sec   Loss 8.3954   LearningRate 0.0762   Epoch: 2   Global Step: 14440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:15,312-Speed 3467.72 samples/sec   Loss 8.3766   LearningRate 0.0762   Epoch: 2   Global Step: 14450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:18,267-Speed 3466.58 samples/sec   Loss 8.3360   LearningRate 0.0762   Epoch: 2   Global Step: 14460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:21,228-Speed 3459.50 samples/sec   Loss 8.3473   LearningRate 0.0762   Epoch: 2   Global Step: 14470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:24,192-Speed 3455.21 samples/sec   Loss 8.3535   LearningRate 0.0762   Epoch: 2   Global Step: 14480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:27,159-Speed 3451.96 samples/sec   Loss 8.2597   LearningRate 0.0761   Epoch: 2   Global Step: 14490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:30,136-Speed 3441.12 samples/sec   Loss 8.3162   LearningRate 0.0761   Epoch: 2   Global Step: 14500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:33,101-Speed 3454.16 samples/sec   Loss 8.2425   LearningRate 0.0761   Epoch: 2   Global Step: 14510   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:56:36,053-Speed 3469.23 samples/sec   Loss 8.4379   LearningRate 0.0761   Epoch: 2   Global Step: 14520   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:56:39,020-Speed 3453.53 samples/sec   Loss 8.3927   LearningRate 0.0761   Epoch: 2   Global Step: 14530   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:56:41,992-Speed 3445.28 samples/sec   Loss 8.3298   LearningRate 0.0761   Epoch: 2   Global Step: 14540   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:56:44,947-Speed 3466.65 samples/sec   Loss 8.2128   LearningRate 0.0760   Epoch: 2   Global Step: 14550   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:56:47,890-Speed 3480.16 samples/sec   Loss 8.4296   LearningRate 0.0760   Epoch: 2   Global Step: 14560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:50,847-Speed 3464.10 samples/sec   Loss 8.3902   LearningRate 0.0760   Epoch: 2   Global Step: 14570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:53,824-Speed 3440.52 samples/sec   Loss 8.4108   LearningRate 0.0760   Epoch: 2   Global Step: 14580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:56,809-Speed 3431.06 samples/sec   Loss 8.3978   LearningRate 0.0760   Epoch: 2   Global Step: 14590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:56:59,822-Speed 3399.88 samples/sec   Loss 8.3595   LearningRate 0.0760   Epoch: 2   Global Step: 14600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:02,791-Speed 3449.29 samples/sec   Loss 8.2810   LearningRate 0.0760   Epoch: 2   Global Step: 14610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:05,748-Speed 3463.60 samples/sec   Loss 8.5684   LearningRate 0.0759   Epoch: 2   Global Step: 14620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:08,711-Speed 3456.88 samples/sec   Loss 8.3291   LearningRate 0.0759   Epoch: 2   Global Step: 14630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:11,687-Speed 3442.03 samples/sec   Loss 8.3178   LearningRate 0.0759   Epoch: 2   Global Step: 14640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:14,646-Speed 3461.46 samples/sec   Loss 8.4948   LearningRate 0.0759   Epoch: 2   Global Step: 14650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:17,605-Speed 3461.00 samples/sec   Loss 8.2148   LearningRate 0.0759   Epoch: 2   Global Step: 14660   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:57:20,562-Speed 3463.98 samples/sec   Loss 8.2745   LearningRate 0.0759   Epoch: 2   Global Step: 14670   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:57:23,537-Speed 3442.82 samples/sec   Loss 8.2326   LearningRate 0.0758   Epoch: 2   Global Step: 14680   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:57:26,498-Speed 3459.26 samples/sec   Loss 8.3338   LearningRate 0.0758   Epoch: 2   Global Step: 14690   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:57:29,448-Speed 3471.98 samples/sec   Loss 8.3241   LearningRate 0.0758   Epoch: 2   Global Step: 14700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:32,415-Speed 3452.67 samples/sec   Loss 8.3180   LearningRate 0.0758   Epoch: 2   Global Step: 14710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:35,369-Speed 3466.85 samples/sec   Loss 8.2791   LearningRate 0.0758   Epoch: 2   Global Step: 14720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:38,336-Speed 3452.19 samples/sec   Loss 8.2646   LearningRate 0.0758   Epoch: 2   Global Step: 14730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:41,312-Speed 3441.46 samples/sec   Loss 8.3229   LearningRate 0.0758   Epoch: 2   Global Step: 14740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:44,303-Speed 3424.25 samples/sec   Loss 8.4101   LearningRate 0.0757   Epoch: 2   Global Step: 14750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:47,269-Speed 3453.68 samples/sec   Loss 8.1902   LearningRate 0.0757   Epoch: 2   Global Step: 14760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:50,244-Speed 3443.49 samples/sec   Loss 8.1727   LearningRate 0.0757   Epoch: 2   Global Step: 14770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:53,210-Speed 3452.77 samples/sec   Loss 8.4986   LearningRate 0.0757   Epoch: 2   Global Step: 14780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:56,194-Speed 3432.38 samples/sec   Loss 8.3335   LearningRate 0.0757   Epoch: 2   Global Step: 14790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:57:59,152-Speed 3462.59 samples/sec   Loss 8.3376   LearningRate 0.0757   Epoch: 2   Global Step: 14800   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:58:02,100-Speed 3475.10 samples/sec   Loss 8.3684   LearningRate 0.0756   Epoch: 2   Global Step: 14810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:58:05,058-Speed 3462.08 samples/sec   Loss 8.2210   LearningRate 0.0756   Epoch: 2   Global Step: 14820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:58:08,028-Speed 3448.32 samples/sec   Loss 8.2045   LearningRate 0.0756   Epoch: 2   Global Step: 14830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:58:10,989-Speed 3459.69 samples/sec   Loss 8.2258   LearningRate 0.0756   Epoch: 2   Global Step: 14840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:58:13,954-Speed 3454.51 samples/sec   Loss 8.3122   LearningRate 0.0756   Epoch: 2   Global Step: 14850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:58:16,919-Speed 3454.49 samples/sec   Loss 8.2813   LearningRate 0.0756   Epoch: 2   Global Step: 14860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:58:19,882-Speed 3456.64 samples/sec   Loss 8.3198   LearningRate 0.0756   Epoch: 2   Global Step: 14870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:58:22,851-Speed 3449.48 samples/sec   Loss 8.4261   LearningRate 0.0755   Epoch: 2   Global Step: 14880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:58:25,816-Speed 3454.33 samples/sec   Loss 8.2075   LearningRate 0.0755   Epoch: 2   Global Step: 14890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:58:28,789-Speed 3444.96 samples/sec   Loss 8.2911   LearningRate 0.0755   Epoch: 2   Global Step: 14900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:58:31,757-Speed 3451.87 samples/sec   Loss 8.0955   LearningRate 0.0755   Epoch: 2   Global Step: 14910   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 02:58:34,722-Speed 3454.30 samples/sec   Loss 8.2851   LearningRate 0.0755   Epoch: 2   Global Step: 14920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:58:37,719-Speed 3416.84 samples/sec   Loss 8.5049   LearningRate 0.0755   Epoch: 2   Global Step: 14930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:58:40,684-Speed 3455.53 samples/sec   Loss 8.3148   LearningRate 0.0755   Epoch: 2   Global Step: 14940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:58:43,648-Speed 3455.54 samples/sec   Loss 8.3003   LearningRate 0.0754   Epoch: 2   Global Step: 14950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 02:58:46,619-Speed 3446.87 samples/sec   Loss 8.2256   LearningRate 0.0754   Epoch: 2   Global Step: 14960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:58:49,592-Speed 3446.12 samples/sec   Loss 8.4484   LearningRate 0.0754   Epoch: 2   Global Step: 14970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:58:52,549-Speed 3463.49 samples/sec   Loss 8.1994   LearningRate 0.0754   Epoch: 2   Global Step: 14980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:58:55,538-Speed 3426.69 samples/sec   Loss 8.2216   LearningRate 0.0754   Epoch: 2   Global Step: 14990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:58:58,544-Speed 3407.10 samples/sec   Loss 8.1492   LearningRate 0.0754   Epoch: 2   Global Step: 15000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:59:01,531-Speed 3428.51 samples/sec   Loss 8.3323   LearningRate 0.0753   Epoch: 2   Global Step: 15010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:59:04,525-Speed 3421.22 samples/sec   Loss 8.3638   LearningRate 0.0753   Epoch: 2   Global Step: 15020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:59:07,484-Speed 3461.96 samples/sec   Loss 8.2598   LearningRate 0.0753   Epoch: 2   Global Step: 15030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:59:10,462-Speed 3439.89 samples/sec   Loss 8.1607   LearningRate 0.0753   Epoch: 2   Global Step: 15040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:59:13,425-Speed 3456.55 samples/sec   Loss 8.3903   LearningRate 0.0753   Epoch: 2   Global Step: 15050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:59:16,387-Speed 3457.15 samples/sec   Loss 8.2856   LearningRate 0.0753   Epoch: 2   Global Step: 15060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:59:19,353-Speed 3453.96 samples/sec   Loss 8.3065   LearningRate 0.0753   Epoch: 2   Global Step: 15070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:59:22,316-Speed 3456.22 samples/sec   Loss 8.3096   LearningRate 0.0752   Epoch: 2   Global Step: 15080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:59:25,283-Speed 3451.94 samples/sec   Loss 8.3070   LearningRate 0.0752   Epoch: 2   Global Step: 15090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:59:28,262-Speed 3437.81 samples/sec   Loss 8.0976   LearningRate 0.0752   Epoch: 2   Global Step: 15100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:59:31,252-Speed 3425.83 samples/sec   Loss 8.2906   LearningRate 0.0752   Epoch: 2   Global Step: 15110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:59:34,243-Speed 3425.13 samples/sec   Loss 8.3447   LearningRate 0.0752   Epoch: 2   Global Step: 15120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 02:59:37,208-Speed 3454.16 samples/sec   Loss 8.2849   LearningRate 0.0752   Epoch: 2   Global Step: 15130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:59:40,190-Speed 3434.98 samples/sec   Loss 8.2596   LearningRate 0.0751   Epoch: 2   Global Step: 15140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:59:43,170-Speed 3436.48 samples/sec   Loss 8.3797   LearningRate 0.0751   Epoch: 2   Global Step: 15150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:59:46,153-Speed 3433.63 samples/sec   Loss 8.3523   LearningRate 0.0751   Epoch: 2   Global Step: 15160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:59:49,116-Speed 3457.02 samples/sec   Loss 8.2519   LearningRate 0.0751   Epoch: 2   Global Step: 15170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:59:52,080-Speed 3455.20 samples/sec   Loss 8.1404   LearningRate 0.0751   Epoch: 2   Global Step: 15180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:59:55,054-Speed 3444.49 samples/sec   Loss 8.4225   LearningRate 0.0751   Epoch: 2   Global Step: 15190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 02:59:58,001-Speed 3475.76 samples/sec   Loss 8.3189   LearningRate 0.0751   Epoch: 2   Global Step: 15200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 03:00:00,980-Speed 3437.75 samples/sec   Loss 8.2578   LearningRate 0.0750   Epoch: 2   Global Step: 15210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 03:00:03,967-Speed 3429.94 samples/sec   Loss 8.2152   LearningRate 0.0750   Epoch: 2   Global Step: 15220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 03:00:06,947-Speed 3437.29 samples/sec   Loss 8.1660   LearningRate 0.0750   Epoch: 2   Global Step: 15230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 03:00:09,922-Speed 3442.77 samples/sec   Loss 8.2570   LearningRate 0.0750   Epoch: 2   Global Step: 15240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 03:00:12,908-Speed 3430.75 samples/sec   Loss 8.3088   LearningRate 0.0750   Epoch: 2   Global Step: 15250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 03:00:15,892-Speed 3431.68 samples/sec   Loss 8.1541   LearningRate 0.0750   Epoch: 2   Global Step: 15260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 03:00:18,880-Speed 3427.67 samples/sec   Loss 8.0980   LearningRate 0.0749   Epoch: 2   Global Step: 15270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 03:00:21,850-Speed 3448.67 samples/sec   Loss 8.2992   LearningRate 0.0749   Epoch: 2   Global Step: 15280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 03:00:24,818-Speed 3450.91 samples/sec   Loss 8.2189   LearningRate 0.0749   Epoch: 2   Global Step: 15290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 03:00:27,804-Speed 3430.71 samples/sec   Loss 8.2446   LearningRate 0.0749   Epoch: 2   Global Step: 15300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:00:30,793-Speed 3426.57 samples/sec   Loss 8.2214   LearningRate 0.0749   Epoch: 2   Global Step: 15310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:00:33,761-Speed 3451.41 samples/sec   Loss 8.3153   LearningRate 0.0749   Epoch: 2   Global Step: 15320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:00:36,723-Speed 3458.31 samples/sec   Loss 8.1616   LearningRate 0.0749   Epoch: 2   Global Step: 15330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:00:39,698-Speed 3442.19 samples/sec   Loss 8.3175   LearningRate 0.0748   Epoch: 2   Global Step: 15340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:00:42,664-Speed 3453.04 samples/sec   Loss 8.1649   LearningRate 0.0748   Epoch: 2   Global Step: 15350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:00:45,633-Speed 3450.74 samples/sec   Loss 8.1547   LearningRate 0.0748   Epoch: 2   Global Step: 15360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:00:48,594-Speed 3458.14 samples/sec   Loss 8.3579   LearningRate 0.0748   Epoch: 2   Global Step: 15370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:00:51,560-Speed 3453.69 samples/sec   Loss 8.2532   LearningRate 0.0748   Epoch: 2   Global Step: 15380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:00:54,540-Speed 3436.86 samples/sec   Loss 8.1322   LearningRate 0.0748   Epoch: 2   Global Step: 15390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:00:57,522-Speed 3435.05 samples/sec   Loss 8.2173   LearningRate 0.0747   Epoch: 2   Global Step: 15400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:00,514-Speed 3423.45 samples/sec   Loss 8.3037   LearningRate 0.0747   Epoch: 2   Global Step: 15410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:03,498-Speed 3432.41 samples/sec   Loss 8.2617   LearningRate 0.0747   Epoch: 2   Global Step: 15420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:06,489-Speed 3424.00 samples/sec   Loss 8.0422   LearningRate 0.0747   Epoch: 2   Global Step: 15430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:09,473-Speed 3433.06 samples/sec   Loss 8.2081   LearningRate 0.0747   Epoch: 2   Global Step: 15440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:12,465-Speed 3423.56 samples/sec   Loss 8.1742   LearningRate 0.0747   Epoch: 2   Global Step: 15450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:15,467-Speed 3411.08 samples/sec   Loss 8.2639   LearningRate 0.0747   Epoch: 2   Global Step: 15460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:18,442-Speed 3442.63 samples/sec   Loss 8.3024   LearningRate 0.0746   Epoch: 2   Global Step: 15470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:21,434-Speed 3423.53 samples/sec   Loss 8.0773   LearningRate 0.0746   Epoch: 2   Global Step: 15480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:24,426-Speed 3423.64 samples/sec   Loss 8.2233   LearningRate 0.0746   Epoch: 2   Global Step: 15490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:27,412-Speed 3430.36 samples/sec   Loss 8.4285   LearningRate 0.0746   Epoch: 2   Global Step: 15500   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:01:30,397-Speed 3430.87 samples/sec   Loss 8.2252   LearningRate 0.0746   Epoch: 2   Global Step: 15510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:33,374-Speed 3440.36 samples/sec   Loss 8.3581   LearningRate 0.0746   Epoch: 2   Global Step: 15520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:36,347-Speed 3452.57 samples/sec   Loss 8.1522   LearningRate 0.0746   Epoch: 2   Global Step: 15530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:39,314-Speed 3452.18 samples/sec   Loss 8.1462   LearningRate 0.0745   Epoch: 2   Global Step: 15540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:42,298-Speed 3432.37 samples/sec   Loss 8.3290   LearningRate 0.0745   Epoch: 2   Global Step: 15550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:45,279-Speed 3435.07 samples/sec   Loss 8.2315   LearningRate 0.0745   Epoch: 2   Global Step: 15560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:48,276-Speed 3418.47 samples/sec   Loss 8.1719   LearningRate 0.0745   Epoch: 2   Global Step: 15570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:51,249-Speed 3445.10 samples/sec   Loss 8.1776   LearningRate 0.0745   Epoch: 2   Global Step: 15580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:54,236-Speed 3428.65 samples/sec   Loss 8.2328   LearningRate 0.0745   Epoch: 2   Global Step: 15590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:01:57,212-Speed 3442.13 samples/sec   Loss 8.2551   LearningRate 0.0744   Epoch: 2   Global Step: 15600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:02:00,153-Speed 3482.29 samples/sec   Loss 8.2033   LearningRate 0.0744   Epoch: 2   Global Step: 15610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:02:03,138-Speed 3430.76 samples/sec   Loss 8.1103   LearningRate 0.0744   Epoch: 2   Global Step: 15620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:02:06,112-Speed 3444.08 samples/sec   Loss 8.2428   LearningRate 0.0744   Epoch: 2   Global Step: 15630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:02:09,077-Speed 3455.14 samples/sec   Loss 8.0873   LearningRate 0.0744   Epoch: 2   Global Step: 15640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:02:12,051-Speed 3443.67 samples/sec   Loss 8.0008   LearningRate 0.0744   Epoch: 2   Global Step: 15650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:02:15,018-Speed 3452.52 samples/sec   Loss 8.0327   LearningRate 0.0744   Epoch: 2   Global Step: 15660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:02:17,997-Speed 3438.09 samples/sec   Loss 8.1009   LearningRate 0.0743   Epoch: 2   Global Step: 15670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:02:20,985-Speed 3428.05 samples/sec   Loss 8.1380   LearningRate 0.0743   Epoch: 2   Global Step: 15680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:02:23,965-Speed 3437.15 samples/sec   Loss 8.2772   LearningRate 0.0743   Epoch: 2   Global Step: 15690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:02:26,942-Speed 3440.76 samples/sec   Loss 8.2560   LearningRate 0.0743   Epoch: 2   Global Step: 15700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:02:29,916-Speed 3443.47 samples/sec   Loss 8.1503   LearningRate 0.0743   Epoch: 2   Global Step: 15710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:02:32,902-Speed 3430.40 samples/sec   Loss 8.1991   LearningRate 0.0743   Epoch: 2   Global Step: 15720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:02:35,865-Speed 3456.02 samples/sec   Loss 8.0511   LearningRate 0.0742   Epoch: 2   Global Step: 15730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:02:38,863-Speed 3416.67 samples/sec   Loss 8.0180   LearningRate 0.0742   Epoch: 2   Global Step: 15740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:02:41,846-Speed 3434.45 samples/sec   Loss 8.2056   LearningRate 0.0742   Epoch: 2   Global Step: 15750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:02:44,812-Speed 3452.95 samples/sec   Loss 8.1963   LearningRate 0.0742   Epoch: 2   Global Step: 15760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:02:47,801-Speed 3426.23 samples/sec   Loss 8.0445   LearningRate 0.0742   Epoch: 2   Global Step: 15770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:02:50,770-Speed 3450.74 samples/sec   Loss 8.1893   LearningRate 0.0742   Epoch: 2   Global Step: 15780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:02:53,754-Speed 3431.45 samples/sec   Loss 8.0255   LearningRate 0.0742   Epoch: 2   Global Step: 15790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:02:56,751-Speed 3418.17 samples/sec   Loss 8.0969   LearningRate 0.0741   Epoch: 2   Global Step: 15800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:02:59,723-Speed 3446.13 samples/sec   Loss 8.1474   LearningRate 0.0741   Epoch: 2   Global Step: 15810   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:03:02,702-Speed 3437.60 samples/sec   Loss 8.1543   LearningRate 0.0741   Epoch: 2   Global Step: 15820   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:03:05,648-Speed 3477.02 samples/sec   Loss 8.2207   LearningRate 0.0741   Epoch: 2   Global Step: 15830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:03:08,638-Speed 3425.81 samples/sec   Loss 8.2136   LearningRate 0.0741   Epoch: 2   Global Step: 15840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:03:11,609-Speed 3446.98 samples/sec   Loss 8.1597   LearningRate 0.0741   Epoch: 2   Global Step: 15850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:03:14,585-Speed 3442.01 samples/sec   Loss 8.1060   LearningRate 0.0741   Epoch: 2   Global Step: 15860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:03:17,549-Speed 3455.33 samples/sec   Loss 8.1628   LearningRate 0.0740   Epoch: 2   Global Step: 15870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:03:20,526-Speed 3441.73 samples/sec   Loss 8.3012   LearningRate 0.0740   Epoch: 2   Global Step: 15880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:03:23,518-Speed 3422.12 samples/sec   Loss 8.1810   LearningRate 0.0740   Epoch: 2   Global Step: 15890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:03:26,532-Speed 3398.71 samples/sec   Loss 8.1397   LearningRate 0.0740   Epoch: 2   Global Step: 15900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:03:29,513-Speed 3435.65 samples/sec   Loss 8.0612   LearningRate 0.0740   Epoch: 2   Global Step: 15910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:03:32,484-Speed 3447.03 samples/sec   Loss 7.9337   LearningRate 0.0740   Epoch: 2   Global Step: 15920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:03:35,455-Speed 3448.30 samples/sec   Loss 7.9544   LearningRate 0.0739   Epoch: 2   Global Step: 15930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:03:38,428-Speed 3445.20 samples/sec   Loss 8.0290   LearningRate 0.0739   Epoch: 2   Global Step: 15940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:03:41,409-Speed 3435.61 samples/sec   Loss 8.1144   LearningRate 0.0739   Epoch: 2   Global Step: 15950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:03:44,394-Speed 3431.62 samples/sec   Loss 8.0211   LearningRate 0.0739   Epoch: 2   Global Step: 15960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:03:47,397-Speed 3410.53 samples/sec   Loss 8.1448   LearningRate 0.0739   Epoch: 2   Global Step: 15970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:03:50,370-Speed 3445.78 samples/sec   Loss 8.0580   LearningRate 0.0739   Epoch: 2   Global Step: 15980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:03:53,382-Speed 3399.63 samples/sec   Loss 8.1522   LearningRate 0.0739   Epoch: 2   Global Step: 15990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:03:56,361-Speed 3438.04 samples/sec   Loss 8.0119   LearningRate 0.0738   Epoch: 2   Global Step: 16000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:04:39,843-[lfw][16000]XNorm: 21.961942
Training: 2022-04-27 03:04:39,844-[lfw][16000]Accuracy-Flip: 0.99633+-0.00306
Training: 2022-04-27 03:04:39,844-[lfw][16000]Accuracy-Highest: 0.99633
Training: 2022-04-27 03:05:30,035-[cfp_fp][16000]XNorm: 18.888790
Training: 2022-04-27 03:05:30,036-[cfp_fp][16000]Accuracy-Flip: 0.92114+-0.01830
Training: 2022-04-27 03:05:30,036-[cfp_fp][16000]Accuracy-Highest: 0.93214
Training: 2022-04-27 03:06:13,131-[agedb_30][16000]XNorm: 21.890852
Training: 2022-04-27 03:06:13,132-[agedb_30][16000]Accuracy-Flip: 0.96700+-0.01115
Training: 2022-04-27 03:06:13,132-[agedb_30][16000]Accuracy-Highest: 0.96700
Training: 2022-04-27 03:06:16,103-Speed 73.28 samples/sec   Loss 8.2020   LearningRate 0.0738   Epoch: 2   Global Step: 16010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:06:19,057-Speed 3466.93 samples/sec   Loss 8.2026   LearningRate 0.0738   Epoch: 2   Global Step: 16020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:06:22,027-Speed 3448.35 samples/sec   Loss 8.0855   LearningRate 0.0738   Epoch: 2   Global Step: 16030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:06:24,999-Speed 3445.66 samples/sec   Loss 8.0881   LearningRate 0.0738   Epoch: 2   Global Step: 16040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:06:27,964-Speed 3454.74 samples/sec   Loss 7.9979   LearningRate 0.0738   Epoch: 2   Global Step: 16050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:06:30,934-Speed 3449.16 samples/sec   Loss 7.9406   LearningRate 0.0737   Epoch: 2   Global Step: 16060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:06:33,893-Speed 3461.09 samples/sec   Loss 7.8188   LearningRate 0.0737   Epoch: 2   Global Step: 16070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:06:36,864-Speed 3446.83 samples/sec   Loss 8.0811   LearningRate 0.0737   Epoch: 2   Global Step: 16080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:06:39,946-Speed 3324.20 samples/sec   Loss 8.0774   LearningRate 0.0737   Epoch: 2   Global Step: 16090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:06:42,922-Speed 3441.26 samples/sec   Loss 8.2206   LearningRate 0.0737   Epoch: 2   Global Step: 16100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:06:45,910-Speed 3427.46 samples/sec   Loss 8.1274   LearningRate 0.0737   Epoch: 2   Global Step: 16110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:06:48,885-Speed 3442.75 samples/sec   Loss 8.0555   LearningRate 0.0737   Epoch: 2   Global Step: 16120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:06:51,860-Speed 3443.97 samples/sec   Loss 8.0486   LearningRate 0.0736   Epoch: 2   Global Step: 16130   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:06:54,824-Speed 3455.14 samples/sec   Loss 7.8538   LearningRate 0.0736   Epoch: 2   Global Step: 16140   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:06:57,794-Speed 3448.19 samples/sec   Loss 8.1264   LearningRate 0.0736   Epoch: 2   Global Step: 16150   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:07:00,782-Speed 3428.49 samples/sec   Loss 8.1562   LearningRate 0.0736   Epoch: 2   Global Step: 16160   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:07:03,770-Speed 3427.24 samples/sec   Loss 8.2240   LearningRate 0.0736   Epoch: 2   Global Step: 16170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:07:06,760-Speed 3425.67 samples/sec   Loss 8.1748   LearningRate 0.0736   Epoch: 2   Global Step: 16180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:07:09,736-Speed 3441.86 samples/sec   Loss 8.1162   LearningRate 0.0736   Epoch: 2   Global Step: 16190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:07:12,735-Speed 3414.63 samples/sec   Loss 8.1315   LearningRate 0.0735   Epoch: 2   Global Step: 16200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:07:15,712-Speed 3440.78 samples/sec   Loss 8.1171   LearningRate 0.0735   Epoch: 2   Global Step: 16210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:07:18,685-Speed 3445.49 samples/sec   Loss 7.9731   LearningRate 0.0735   Epoch: 2   Global Step: 16220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:07:21,662-Speed 3440.67 samples/sec   Loss 8.0577   LearningRate 0.0735   Epoch: 2   Global Step: 16230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:07:24,653-Speed 3424.05 samples/sec   Loss 8.0651   LearningRate 0.0735   Epoch: 2   Global Step: 16240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:07:27,657-Speed 3409.92 samples/sec   Loss 8.0525   LearningRate 0.0735   Epoch: 2   Global Step: 16250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:07:30,653-Speed 3418.56 samples/sec   Loss 8.0797   LearningRate 0.0734   Epoch: 2   Global Step: 16260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:07:33,644-Speed 3424.32 samples/sec   Loss 7.8837   LearningRate 0.0734   Epoch: 2   Global Step: 16270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:07:36,619-Speed 3442.81 samples/sec   Loss 7.9646   LearningRate 0.0734   Epoch: 2   Global Step: 16280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:07:39,606-Speed 3429.71 samples/sec   Loss 8.1348   LearningRate 0.0734   Epoch: 2   Global Step: 16290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:07:42,582-Speed 3440.90 samples/sec   Loss 8.0615   LearningRate 0.0734   Epoch: 2   Global Step: 16300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:07:45,576-Speed 3421.72 samples/sec   Loss 8.0419   LearningRate 0.0734   Epoch: 2   Global Step: 16310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:07:48,540-Speed 3455.42 samples/sec   Loss 7.9949   LearningRate 0.0734   Epoch: 2   Global Step: 16320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:07:51,517-Speed 3440.94 samples/sec   Loss 8.0353   LearningRate 0.0733   Epoch: 2   Global Step: 16330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:07:54,484-Speed 3451.11 samples/sec   Loss 8.1269   LearningRate 0.0733   Epoch: 2   Global Step: 16340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:07:57,479-Speed 3419.92 samples/sec   Loss 8.1334   LearningRate 0.0733   Epoch: 2   Global Step: 16350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:08:00,464-Speed 3432.47 samples/sec   Loss 8.0932   LearningRate 0.0733   Epoch: 2   Global Step: 16360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:08:03,440-Speed 3440.56 samples/sec   Loss 8.1254   LearningRate 0.0733   Epoch: 2   Global Step: 16370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:08:06,435-Speed 3420.00 samples/sec   Loss 8.0495   LearningRate 0.0733   Epoch: 2   Global Step: 16380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:08:09,418-Speed 3433.96 samples/sec   Loss 8.0090   LearningRate 0.0733   Epoch: 2   Global Step: 16390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:08:12,393-Speed 3442.81 samples/sec   Loss 7.9731   LearningRate 0.0732   Epoch: 2   Global Step: 16400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:08:15,365-Speed 3445.69 samples/sec   Loss 7.8356   LearningRate 0.0732   Epoch: 2   Global Step: 16410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:08:18,356-Speed 3424.99 samples/sec   Loss 8.1536   LearningRate 0.0732   Epoch: 2   Global Step: 16420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:08:21,328-Speed 3446.36 samples/sec   Loss 7.8782   LearningRate 0.0732   Epoch: 2   Global Step: 16430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:08:24,329-Speed 3413.17 samples/sec   Loss 8.0700   LearningRate 0.0732   Epoch: 2   Global Step: 16440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:08:27,321-Speed 3422.62 samples/sec   Loss 7.9279   LearningRate 0.0732   Epoch: 2   Global Step: 16450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:08:30,332-Speed 3402.77 samples/sec   Loss 8.2090   LearningRate 0.0731   Epoch: 2   Global Step: 16460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:08:33,322-Speed 3424.51 samples/sec   Loss 8.0693   LearningRate 0.0731   Epoch: 2   Global Step: 16470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:08:36,303-Speed 3436.13 samples/sec   Loss 7.8687   LearningRate 0.0731   Epoch: 2   Global Step: 16480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:08:39,315-Speed 3400.69 samples/sec   Loss 8.1544   LearningRate 0.0731   Epoch: 2   Global Step: 16490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:08:42,334-Speed 3391.83 samples/sec   Loss 7.9584   LearningRate 0.0731   Epoch: 2   Global Step: 16500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:08:45,310-Speed 3441.70 samples/sec   Loss 7.9711   LearningRate 0.0731   Epoch: 2   Global Step: 16510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:08:48,288-Speed 3439.43 samples/sec   Loss 7.9416   LearningRate 0.0731   Epoch: 2   Global Step: 16520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:08:51,288-Speed 3414.24 samples/sec   Loss 8.0273   LearningRate 0.0730   Epoch: 2   Global Step: 16530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:08:54,256-Speed 3451.13 samples/sec   Loss 8.0653   LearningRate 0.0730   Epoch: 2   Global Step: 16540   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:08:57,211-Speed 3466.37 samples/sec   Loss 8.0316   LearningRate 0.0730   Epoch: 2   Global Step: 16550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:00,183-Speed 3446.34 samples/sec   Loss 8.0037   LearningRate 0.0730   Epoch: 2   Global Step: 16560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:03,151-Speed 3451.39 samples/sec   Loss 7.8970   LearningRate 0.0730   Epoch: 2   Global Step: 16570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:06,129-Speed 3438.64 samples/sec   Loss 8.0820   LearningRate 0.0730   Epoch: 2   Global Step: 16580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:09,119-Speed 3425.58 samples/sec   Loss 7.9661   LearningRate 0.0730   Epoch: 2   Global Step: 16590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:12,120-Speed 3412.82 samples/sec   Loss 7.8978   LearningRate 0.0729   Epoch: 2   Global Step: 16600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:15,099-Speed 3438.74 samples/sec   Loss 7.9587   LearningRate 0.0729   Epoch: 2   Global Step: 16610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:18,076-Speed 3440.58 samples/sec   Loss 7.8931   LearningRate 0.0729   Epoch: 2   Global Step: 16620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:21,063-Speed 3429.85 samples/sec   Loss 8.0525   LearningRate 0.0729   Epoch: 2   Global Step: 16630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:24,034-Speed 3448.03 samples/sec   Loss 7.8939   LearningRate 0.0729   Epoch: 2   Global Step: 16640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:27,029-Speed 3419.98 samples/sec   Loss 8.1917   LearningRate 0.0729   Epoch: 2   Global Step: 16650   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:09:29,980-Speed 3470.27 samples/sec   Loss 7.9754   LearningRate 0.0728   Epoch: 2   Global Step: 16660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:32,948-Speed 3451.12 samples/sec   Loss 7.9723   LearningRate 0.0728   Epoch: 2   Global Step: 16670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:35,939-Speed 3423.68 samples/sec   Loss 8.0555   LearningRate 0.0728   Epoch: 2   Global Step: 16680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:38,927-Speed 3428.49 samples/sec   Loss 8.0453   LearningRate 0.0728   Epoch: 2   Global Step: 16690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:41,910-Speed 3432.81 samples/sec   Loss 8.0264   LearningRate 0.0728   Epoch: 2   Global Step: 16700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:44,899-Speed 3426.96 samples/sec   Loss 8.0593   LearningRate 0.0728   Epoch: 2   Global Step: 16710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:47,899-Speed 3414.80 samples/sec   Loss 8.0475   LearningRate 0.0728   Epoch: 2   Global Step: 16720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:50,891-Speed 3423.67 samples/sec   Loss 8.0690   LearningRate 0.0727   Epoch: 2   Global Step: 16730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:53,891-Speed 3413.35 samples/sec   Loss 8.0722   LearningRate 0.0727   Epoch: 2   Global Step: 16740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:56,866-Speed 3443.16 samples/sec   Loss 7.9896   LearningRate 0.0727   Epoch: 2   Global Step: 16750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:09:59,850-Speed 3432.24 samples/sec   Loss 7.8893   LearningRate 0.0727   Epoch: 2   Global Step: 16760   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:10:02,851-Speed 3413.23 samples/sec   Loss 7.8627   LearningRate 0.0727   Epoch: 2   Global Step: 16770   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:10:05,843-Speed 3422.60 samples/sec   Loss 7.9459   LearningRate 0.0727   Epoch: 2   Global Step: 16780   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:10:08,836-Speed 3422.90 samples/sec   Loss 7.9429   LearningRate 0.0727   Epoch: 2   Global Step: 16790   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:10:11,917-Speed 3324.37 samples/sec   Loss 8.0394   LearningRate 0.0726   Epoch: 2   Global Step: 16800   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:10:14,898-Speed 3436.00 samples/sec   Loss 7.8870   LearningRate 0.0726   Epoch: 2   Global Step: 16810   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:10:17,878-Speed 3437.14 samples/sec   Loss 7.7558   LearningRate 0.0726   Epoch: 2   Global Step: 16820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:10:20,870-Speed 3422.82 samples/sec   Loss 8.0028   LearningRate 0.0726   Epoch: 2   Global Step: 16830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:10:23,857-Speed 3429.18 samples/sec   Loss 7.8892   LearningRate 0.0726   Epoch: 2   Global Step: 16840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:10:26,860-Speed 3411.04 samples/sec   Loss 7.9306   LearningRate 0.0726   Epoch: 2   Global Step: 16850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:10:29,839-Speed 3438.19 samples/sec   Loss 7.8935   LearningRate 0.0725   Epoch: 2   Global Step: 16860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:10:32,824-Speed 3431.41 samples/sec   Loss 8.0811   LearningRate 0.0725   Epoch: 2   Global Step: 16870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:10:35,804-Speed 3436.75 samples/sec   Loss 7.9572   LearningRate 0.0725   Epoch: 2   Global Step: 16880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:10:38,808-Speed 3409.23 samples/sec   Loss 8.1454   LearningRate 0.0725   Epoch: 2   Global Step: 16890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:10:41,799-Speed 3424.54 samples/sec   Loss 7.9187   LearningRate 0.0725   Epoch: 2   Global Step: 16900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:10:44,785-Speed 3430.69 samples/sec   Loss 7.8788   LearningRate 0.0725   Epoch: 2   Global Step: 16910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:10:47,758-Speed 3444.84 samples/sec   Loss 7.8339   LearningRate 0.0725   Epoch: 2   Global Step: 16920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:10:50,741-Speed 3433.24 samples/sec   Loss 8.0237   LearningRate 0.0724   Epoch: 2   Global Step: 16930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:10:53,727-Speed 3430.40 samples/sec   Loss 7.9960   LearningRate 0.0724   Epoch: 2   Global Step: 16940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:10:56,700-Speed 3444.82 samples/sec   Loss 7.9259   LearningRate 0.0724   Epoch: 2   Global Step: 16950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:10:59,676-Speed 3441.53 samples/sec   Loss 7.9621   LearningRate 0.0724   Epoch: 2   Global Step: 16960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:11:02,645-Speed 3450.78 samples/sec   Loss 7.9584   LearningRate 0.0724   Epoch: 2   Global Step: 16970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:11:05,631-Speed 3429.34 samples/sec   Loss 7.8662   LearningRate 0.0724   Epoch: 2   Global Step: 16980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:11:08,612-Speed 3436.11 samples/sec   Loss 7.8672   LearningRate 0.0724   Epoch: 2   Global Step: 16990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:11:11,610-Speed 3416.66 samples/sec   Loss 7.9038   LearningRate 0.0723   Epoch: 2   Global Step: 17000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:11:14,603-Speed 3422.70 samples/sec   Loss 7.9785   LearningRate 0.0723   Epoch: 2   Global Step: 17010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:11:17,590-Speed 3428.96 samples/sec   Loss 7.9395   LearningRate 0.0723   Epoch: 2   Global Step: 17020   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:11:20,596-Speed 3407.30 samples/sec   Loss 7.8231   LearningRate 0.0723   Epoch: 2   Global Step: 17030   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:11:23,563-Speed 3451.56 samples/sec   Loss 7.9000   LearningRate 0.0723   Epoch: 2   Global Step: 17040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:11:26,607-Speed 3364.91 samples/sec   Loss 8.1746   LearningRate 0.0723   Epoch: 2   Global Step: 17050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:11:39,751-Speed 779.10 samples/sec   Loss 7.8263   LearningRate 0.0722   Epoch: 3   Global Step: 17060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:11:42,745-Speed 3422.43 samples/sec   Loss 7.3061   LearningRate 0.0722   Epoch: 3   Global Step: 17070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:11:45,877-Speed 3270.96 samples/sec   Loss 7.1845   LearningRate 0.0722   Epoch: 3   Global Step: 17080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:11:48,875-Speed 3416.25 samples/sec   Loss 7.3882   LearningRate 0.0722   Epoch: 3   Global Step: 17090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:11:51,881-Speed 3407.69 samples/sec   Loss 7.2892   LearningRate 0.0722   Epoch: 3   Global Step: 17100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:11:54,885-Speed 3410.51 samples/sec   Loss 7.2068   LearningRate 0.0722   Epoch: 3   Global Step: 17110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:11:57,900-Speed 3397.23 samples/sec   Loss 7.2141   LearningRate 0.0722   Epoch: 3   Global Step: 17120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:12:00,900-Speed 3415.23 samples/sec   Loss 7.1894   LearningRate 0.0721   Epoch: 3   Global Step: 17130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:12:03,900-Speed 3414.37 samples/sec   Loss 7.4200   LearningRate 0.0721   Epoch: 3   Global Step: 17140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:12:06,886-Speed 3430.24 samples/sec   Loss 7.4053   LearningRate 0.0721   Epoch: 3   Global Step: 17150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:12:09,878-Speed 3423.78 samples/sec   Loss 7.5607   LearningRate 0.0721   Epoch: 3   Global Step: 17160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:12:12,879-Speed 3413.20 samples/sec   Loss 7.5331   LearningRate 0.0721   Epoch: 3   Global Step: 17170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:12:15,861-Speed 3435.30 samples/sec   Loss 7.2968   LearningRate 0.0721   Epoch: 3   Global Step: 17180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:12:18,856-Speed 3419.20 samples/sec   Loss 7.4130   LearningRate 0.0721   Epoch: 3   Global Step: 17190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:12:21,833-Speed 3441.75 samples/sec   Loss 7.3662   LearningRate 0.0720   Epoch: 3   Global Step: 17200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:12:24,967-Speed 3267.79 samples/sec   Loss 7.4494   LearningRate 0.0720   Epoch: 3   Global Step: 17210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:12:27,962-Speed 3420.33 samples/sec   Loss 7.5304   LearningRate 0.0720   Epoch: 3   Global Step: 17220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:12:30,965-Speed 3410.22 samples/sec   Loss 7.5298   LearningRate 0.0720   Epoch: 3   Global Step: 17230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:12:33,962-Speed 3418.69 samples/sec   Loss 7.5157   LearningRate 0.0720   Epoch: 3   Global Step: 17240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:12:36,937-Speed 3442.60 samples/sec   Loss 7.4297   LearningRate 0.0720   Epoch: 3   Global Step: 17250   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:12:39,933-Speed 3418.51 samples/sec   Loss 7.5177   LearningRate 0.0719   Epoch: 3   Global Step: 17260   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:12:42,930-Speed 3416.76 samples/sec   Loss 7.6374   LearningRate 0.0719   Epoch: 3   Global Step: 17270   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:12:45,914-Speed 3432.33 samples/sec   Loss 7.5380   LearningRate 0.0719   Epoch: 3   Global Step: 17280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:12:48,914-Speed 3414.55 samples/sec   Loss 7.5632   LearningRate 0.0719   Epoch: 3   Global Step: 17290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:12:51,915-Speed 3413.44 samples/sec   Loss 7.6450   LearningRate 0.0719   Epoch: 3   Global Step: 17300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:12:54,894-Speed 3438.12 samples/sec   Loss 7.5645   LearningRate 0.0719   Epoch: 3   Global Step: 17310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:12:57,878-Speed 3432.61 samples/sec   Loss 7.6225   LearningRate 0.0719   Epoch: 3   Global Step: 17320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:13:00,864-Speed 3430.02 samples/sec   Loss 7.4636   LearningRate 0.0718   Epoch: 3   Global Step: 17330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:13:03,969-Speed 3298.35 samples/sec   Loss 7.5329   LearningRate 0.0718   Epoch: 3   Global Step: 17340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:13:06,969-Speed 3414.58 samples/sec   Loss 7.6636   LearningRate 0.0718   Epoch: 3   Global Step: 17350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:13:09,964-Speed 3419.69 samples/sec   Loss 7.4500   LearningRate 0.0718   Epoch: 3   Global Step: 17360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:13:12,946-Speed 3434.64 samples/sec   Loss 7.6630   LearningRate 0.0718   Epoch: 3   Global Step: 17370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:13:15,926-Speed 3437.64 samples/sec   Loss 7.6570   LearningRate 0.0718   Epoch: 3   Global Step: 17380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:13:18,908-Speed 3434.69 samples/sec   Loss 7.4985   LearningRate 0.0718   Epoch: 3   Global Step: 17390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:13:21,909-Speed 3412.32 samples/sec   Loss 7.5661   LearningRate 0.0717   Epoch: 3   Global Step: 17400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:13:24,886-Speed 3440.64 samples/sec   Loss 7.6062   LearningRate 0.0717   Epoch: 3   Global Step: 17410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:13:27,893-Speed 3406.05 samples/sec   Loss 7.6564   LearningRate 0.0717   Epoch: 3   Global Step: 17420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:13:30,896-Speed 3410.53 samples/sec   Loss 7.6646   LearningRate 0.0717   Epoch: 3   Global Step: 17430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:13:33,887-Speed 3424.60 samples/sec   Loss 7.5751   LearningRate 0.0717   Epoch: 3   Global Step: 17440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:13:36,877-Speed 3425.82 samples/sec   Loss 7.6410   LearningRate 0.0717   Epoch: 3   Global Step: 17450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:13:39,870-Speed 3422.57 samples/sec   Loss 7.7013   LearningRate 0.0717   Epoch: 3   Global Step: 17460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:13:42,851-Speed 3435.44 samples/sec   Loss 7.6950   LearningRate 0.0716   Epoch: 3   Global Step: 17470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:13:45,841-Speed 3425.25 samples/sec   Loss 7.6038   LearningRate 0.0716   Epoch: 3   Global Step: 17480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:13:48,858-Speed 3395.73 samples/sec   Loss 7.5904   LearningRate 0.0716   Epoch: 3   Global Step: 17490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:13:51,835-Speed 3440.29 samples/sec   Loss 7.5920   LearningRate 0.0716   Epoch: 3   Global Step: 17500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:13:54,821-Speed 3430.12 samples/sec   Loss 7.5945   LearningRate 0.0716   Epoch: 3   Global Step: 17510   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:13:57,783-Speed 3457.88 samples/sec   Loss 7.4592   LearningRate 0.0716   Epoch: 3   Global Step: 17520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:00,775-Speed 3422.63 samples/sec   Loss 7.5083   LearningRate 0.0715   Epoch: 3   Global Step: 17530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:03,775-Speed 3414.64 samples/sec   Loss 7.5330   LearningRate 0.0715   Epoch: 3   Global Step: 17540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:06,776-Speed 3413.13 samples/sec   Loss 7.6959   LearningRate 0.0715   Epoch: 3   Global Step: 17550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:09,783-Speed 3406.53 samples/sec   Loss 7.6123   LearningRate 0.0715   Epoch: 3   Global Step: 17560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:12,788-Speed 3407.92 samples/sec   Loss 7.6903   LearningRate 0.0715   Epoch: 3   Global Step: 17570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:15,787-Speed 3415.78 samples/sec   Loss 7.5350   LearningRate 0.0715   Epoch: 3   Global Step: 17580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:18,800-Speed 3399.20 samples/sec   Loss 7.8129   LearningRate 0.0715   Epoch: 3   Global Step: 17590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:21,804-Speed 3409.82 samples/sec   Loss 7.6198   LearningRate 0.0714   Epoch: 3   Global Step: 17600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:24,912-Speed 3294.89 samples/sec   Loss 7.7478   LearningRate 0.0714   Epoch: 3   Global Step: 17610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:27,912-Speed 3413.73 samples/sec   Loss 7.6231   LearningRate 0.0714   Epoch: 3   Global Step: 17620   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:14:30,903-Speed 3425.25 samples/sec   Loss 7.6840   LearningRate 0.0714   Epoch: 3   Global Step: 17630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:33,893-Speed 3424.80 samples/sec   Loss 7.6827   LearningRate 0.0714   Epoch: 3   Global Step: 17640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:36,884-Speed 3425.14 samples/sec   Loss 7.6618   LearningRate 0.0714   Epoch: 3   Global Step: 17650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:39,869-Speed 3431.38 samples/sec   Loss 7.6193   LearningRate 0.0714   Epoch: 3   Global Step: 17660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:42,848-Speed 3437.84 samples/sec   Loss 7.6040   LearningRate 0.0713   Epoch: 3   Global Step: 17670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:45,832-Speed 3432.04 samples/sec   Loss 7.7618   LearningRate 0.0713   Epoch: 3   Global Step: 17680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:48,819-Speed 3429.50 samples/sec   Loss 7.7916   LearningRate 0.0713   Epoch: 3   Global Step: 17690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:51,793-Speed 3444.44 samples/sec   Loss 7.5851   LearningRate 0.0713   Epoch: 3   Global Step: 17700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:54,796-Speed 3410.13 samples/sec   Loss 7.7041   LearningRate 0.0713   Epoch: 3   Global Step: 17710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:14:57,793-Speed 3418.25 samples/sec   Loss 7.6435   LearningRate 0.0713   Epoch: 3   Global Step: 17720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:00,779-Speed 3430.01 samples/sec   Loss 7.6448   LearningRate 0.0712   Epoch: 3   Global Step: 17730   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:15:03,754-Speed 3442.78 samples/sec   Loss 7.8288   LearningRate 0.0712   Epoch: 3   Global Step: 17740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:06,765-Speed 3401.64 samples/sec   Loss 7.6389   LearningRate 0.0712   Epoch: 3   Global Step: 17750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:09,762-Speed 3417.23 samples/sec   Loss 7.8096   LearningRate 0.0712   Epoch: 3   Global Step: 17760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:12,767-Speed 3408.24 samples/sec   Loss 7.7188   LearningRate 0.0712   Epoch: 3   Global Step: 17770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:15,749-Speed 3435.27 samples/sec   Loss 7.5655   LearningRate 0.0712   Epoch: 3   Global Step: 17780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:18,729-Speed 3437.14 samples/sec   Loss 7.7416   LearningRate 0.0712   Epoch: 3   Global Step: 17790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:21,722-Speed 3421.78 samples/sec   Loss 7.6062   LearningRate 0.0711   Epoch: 3   Global Step: 17800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:24,722-Speed 3414.28 samples/sec   Loss 7.5862   LearningRate 0.0711   Epoch: 3   Global Step: 17810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:27,741-Speed 3392.20 samples/sec   Loss 7.7084   LearningRate 0.0711   Epoch: 3   Global Step: 17820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:30,759-Speed 3394.03 samples/sec   Loss 7.7732   LearningRate 0.0711   Epoch: 3   Global Step: 17830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:33,777-Speed 3394.41 samples/sec   Loss 7.7383   LearningRate 0.0711   Epoch: 3   Global Step: 17840   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:15:36,773-Speed 3419.14 samples/sec   Loss 7.6951   LearningRate 0.0711   Epoch: 3   Global Step: 17850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:39,791-Speed 3393.73 samples/sec   Loss 7.7693   LearningRate 0.0711   Epoch: 3   Global Step: 17860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:42,803-Speed 3400.54 samples/sec   Loss 7.6512   LearningRate 0.0710   Epoch: 3   Global Step: 17870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:45,835-Speed 3378.31 samples/sec   Loss 7.6271   LearningRate 0.0710   Epoch: 3   Global Step: 17880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:48,835-Speed 3414.68 samples/sec   Loss 7.6921   LearningRate 0.0710   Epoch: 3   Global Step: 17890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:51,828-Speed 3422.01 samples/sec   Loss 7.7503   LearningRate 0.0710   Epoch: 3   Global Step: 17900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:54,839-Speed 3400.89 samples/sec   Loss 7.8306   LearningRate 0.0710   Epoch: 3   Global Step: 17910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:15:57,857-Speed 3394.58 samples/sec   Loss 7.6995   LearningRate 0.0710   Epoch: 3   Global Step: 17920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:16:00,886-Speed 3381.10 samples/sec   Loss 7.8562   LearningRate 0.0710   Epoch: 3   Global Step: 17930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:16:03,911-Speed 3386.67 samples/sec   Loss 7.8840   LearningRate 0.0709   Epoch: 3   Global Step: 17940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:16:06,935-Speed 3387.03 samples/sec   Loss 7.7127   LearningRate 0.0709   Epoch: 3   Global Step: 17950   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:16:09,924-Speed 3425.87 samples/sec   Loss 7.7293   LearningRate 0.0709   Epoch: 3   Global Step: 17960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:16:12,910-Speed 3431.09 samples/sec   Loss 7.5480   LearningRate 0.0709   Epoch: 3   Global Step: 17970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:16:15,889-Speed 3437.57 samples/sec   Loss 7.8240   LearningRate 0.0709   Epoch: 3   Global Step: 17980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:16:18,911-Speed 3389.22 samples/sec   Loss 7.7183   LearningRate 0.0709   Epoch: 3   Global Step: 17990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:16:21,897-Speed 3430.11 samples/sec   Loss 7.7898   LearningRate 0.0708   Epoch: 3   Global Step: 18000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:17:05,215-[lfw][18000]XNorm: 21.702406
Training: 2022-04-27 03:17:05,216-[lfw][18000]Accuracy-Flip: 0.99533+-0.00348
Training: 2022-04-27 03:17:05,216-[lfw][18000]Accuracy-Highest: 0.99633
Training: 2022-04-27 03:17:55,618-[cfp_fp][18000]XNorm: 18.879926
Training: 2022-04-27 03:17:55,619-[cfp_fp][18000]Accuracy-Flip: 0.92943+-0.01472
Training: 2022-04-27 03:17:55,619-[cfp_fp][18000]Accuracy-Highest: 0.93214
Training: 2022-04-27 03:18:39,074-[agedb_30][18000]XNorm: 21.278972
Training: 2022-04-27 03:18:39,075-[agedb_30][18000]Accuracy-Flip: 0.96583+-0.00790
Training: 2022-04-27 03:18:39,075-[agedb_30][18000]Accuracy-Highest: 0.96700
Training: 2022-04-27 03:18:42,070-Speed 73.05 samples/sec   Loss 7.7985   LearningRate 0.0708   Epoch: 3   Global Step: 18010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:18:45,034-Speed 3455.41 samples/sec   Loss 7.6265   LearningRate 0.0708   Epoch: 3   Global Step: 18020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:18:48,036-Speed 3412.04 samples/sec   Loss 7.7247   LearningRate 0.0708   Epoch: 3   Global Step: 18030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:18:51,025-Speed 3426.97 samples/sec   Loss 7.8285   LearningRate 0.0708   Epoch: 3   Global Step: 18040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:18:54,002-Speed 3440.67 samples/sec   Loss 7.6176   LearningRate 0.0708   Epoch: 3   Global Step: 18050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:18:56,998-Speed 3418.65 samples/sec   Loss 7.7467   LearningRate 0.0708   Epoch: 3   Global Step: 18060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:19:00,024-Speed 3384.86 samples/sec   Loss 7.7955   LearningRate 0.0707   Epoch: 3   Global Step: 18070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:19:03,069-Speed 3363.86 samples/sec   Loss 7.7107   LearningRate 0.0707   Epoch: 3   Global Step: 18080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:19:06,072-Speed 3410.72 samples/sec   Loss 7.8319   LearningRate 0.0707   Epoch: 3   Global Step: 18090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:19:09,070-Speed 3416.62 samples/sec   Loss 7.6809   LearningRate 0.0707   Epoch: 3   Global Step: 18100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:19:12,051-Speed 3435.78 samples/sec   Loss 7.7490   LearningRate 0.0707   Epoch: 3   Global Step: 18110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:19:15,063-Speed 3399.83 samples/sec   Loss 7.8048   LearningRate 0.0707   Epoch: 3   Global Step: 18120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:19:18,118-Speed 3352.91 samples/sec   Loss 7.7618   LearningRate 0.0707   Epoch: 3   Global Step: 18130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:19:21,107-Speed 3427.03 samples/sec   Loss 7.5527   LearningRate 0.0706   Epoch: 3   Global Step: 18140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:19:24,094-Speed 3429.22 samples/sec   Loss 7.6804   LearningRate 0.0706   Epoch: 3   Global Step: 18150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:19:27,085-Speed 3424.60 samples/sec   Loss 7.6763   LearningRate 0.0706   Epoch: 3   Global Step: 18160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:19:30,083-Speed 3416.17 samples/sec   Loss 7.7382   LearningRate 0.0706   Epoch: 3   Global Step: 18170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:19:33,041-Speed 3462.77 samples/sec   Loss 7.6534   LearningRate 0.0706   Epoch: 3   Global Step: 18180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:19:36,037-Speed 3417.98 samples/sec   Loss 7.7553   LearningRate 0.0706   Epoch: 3   Global Step: 18190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:19:39,023-Speed 3430.41 samples/sec   Loss 7.7666   LearningRate 0.0706   Epoch: 3   Global Step: 18200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:19:42,011-Speed 3427.70 samples/sec   Loss 7.6131   LearningRate 0.0705   Epoch: 3   Global Step: 18210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:19:44,990-Speed 3437.85 samples/sec   Loss 7.6836   LearningRate 0.0705   Epoch: 3   Global Step: 18220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:19:47,986-Speed 3418.56 samples/sec   Loss 7.8663   LearningRate 0.0705   Epoch: 3   Global Step: 18230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:19:50,985-Speed 3415.76 samples/sec   Loss 7.5930   LearningRate 0.0705   Epoch: 3   Global Step: 18240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:19:53,977-Speed 3423.14 samples/sec   Loss 7.8730   LearningRate 0.0705   Epoch: 3   Global Step: 18250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:19:56,958-Speed 3435.98 samples/sec   Loss 7.6953   LearningRate 0.0705   Epoch: 3   Global Step: 18260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:19:59,919-Speed 3458.61 samples/sec   Loss 7.6864   LearningRate 0.0704   Epoch: 3   Global Step: 18270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:20:02,923-Speed 3410.75 samples/sec   Loss 7.6364   LearningRate 0.0704   Epoch: 3   Global Step: 18280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:20:05,921-Speed 3416.01 samples/sec   Loss 7.7808   LearningRate 0.0704   Epoch: 3   Global Step: 18290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:20:08,900-Speed 3437.87 samples/sec   Loss 7.7535   LearningRate 0.0704   Epoch: 3   Global Step: 18300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:20:11,933-Speed 3377.56 samples/sec   Loss 7.6288   LearningRate 0.0704   Epoch: 3   Global Step: 18310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:20:14,925-Speed 3422.18 samples/sec   Loss 7.5487   LearningRate 0.0704   Epoch: 3   Global Step: 18320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:20:17,914-Speed 3427.00 samples/sec   Loss 7.7234   LearningRate 0.0704   Epoch: 3   Global Step: 18330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:20:20,891-Speed 3441.09 samples/sec   Loss 7.6883   LearningRate 0.0703   Epoch: 3   Global Step: 18340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:20:23,893-Speed 3412.23 samples/sec   Loss 7.6098   LearningRate 0.0703   Epoch: 3   Global Step: 18350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:20:26,884-Speed 3423.87 samples/sec   Loss 7.8244   LearningRate 0.0703   Epoch: 3   Global Step: 18360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:20:29,886-Speed 3411.61 samples/sec   Loss 7.8459   LearningRate 0.0703   Epoch: 3   Global Step: 18370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:20:32,894-Speed 3405.62 samples/sec   Loss 7.6137   LearningRate 0.0703   Epoch: 3   Global Step: 18380   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:20:35,887-Speed 3422.44 samples/sec   Loss 7.6798   LearningRate 0.0703   Epoch: 3   Global Step: 18390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:20:38,856-Speed 3448.89 samples/sec   Loss 7.6111   LearningRate 0.0703   Epoch: 3   Global Step: 18400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:20:41,843-Speed 3428.95 samples/sec   Loss 7.7119   LearningRate 0.0702   Epoch: 3   Global Step: 18410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:20:44,825-Speed 3434.70 samples/sec   Loss 7.6681   LearningRate 0.0702   Epoch: 3   Global Step: 18420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:20:47,822-Speed 3417.49 samples/sec   Loss 7.7973   LearningRate 0.0702   Epoch: 3   Global Step: 18430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:20:50,788-Speed 3453.77 samples/sec   Loss 7.5511   LearningRate 0.0702   Epoch: 3   Global Step: 18440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:20:53,781-Speed 3422.57 samples/sec   Loss 7.7019   LearningRate 0.0702   Epoch: 3   Global Step: 18450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:20:56,785-Speed 3408.99 samples/sec   Loss 7.8359   LearningRate 0.0702   Epoch: 3   Global Step: 18460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:20:59,784-Speed 3415.45 samples/sec   Loss 7.6144   LearningRate 0.0702   Epoch: 3   Global Step: 18470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:21:02,789-Speed 3408.80 samples/sec   Loss 7.9264   LearningRate 0.0701   Epoch: 3   Global Step: 18480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:21:05,793-Speed 3408.74 samples/sec   Loss 7.6969   LearningRate 0.0701   Epoch: 3   Global Step: 18490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:21:08,793-Speed 3414.28 samples/sec   Loss 7.7355   LearningRate 0.0701   Epoch: 3   Global Step: 18500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:21:11,798-Speed 3408.25 samples/sec   Loss 7.7713   LearningRate 0.0701   Epoch: 3   Global Step: 18510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:21:14,812-Speed 3399.18 samples/sec   Loss 7.8533   LearningRate 0.0701   Epoch: 3   Global Step: 18520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:21:17,798-Speed 3429.71 samples/sec   Loss 7.7450   LearningRate 0.0701   Epoch: 3   Global Step: 18530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:21:20,795-Speed 3418.47 samples/sec   Loss 7.7435   LearningRate 0.0701   Epoch: 3   Global Step: 18540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:21:23,780-Speed 3430.56 samples/sec   Loss 7.7312   LearningRate 0.0700   Epoch: 3   Global Step: 18550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:21:26,779-Speed 3414.86 samples/sec   Loss 7.7524   LearningRate 0.0700   Epoch: 3   Global Step: 18560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:21:29,747-Speed 3451.00 samples/sec   Loss 7.6475   LearningRate 0.0700   Epoch: 3   Global Step: 18570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:21:32,718-Speed 3447.33 samples/sec   Loss 7.7429   LearningRate 0.0700   Epoch: 3   Global Step: 18580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:21:35,715-Speed 3417.50 samples/sec   Loss 7.7190   LearningRate 0.0700   Epoch: 3   Global Step: 18590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:21:38,705-Speed 3426.13 samples/sec   Loss 7.6042   LearningRate 0.0700   Epoch: 3   Global Step: 18600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:21:41,683-Speed 3439.31 samples/sec   Loss 7.6608   LearningRate 0.0699   Epoch: 3   Global Step: 18610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:21:44,663-Speed 3437.35 samples/sec   Loss 7.6961   LearningRate 0.0699   Epoch: 3   Global Step: 18620   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:21:47,658-Speed 3419.81 samples/sec   Loss 7.8637   LearningRate 0.0699   Epoch: 3   Global Step: 18630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:21:50,668-Speed 3402.75 samples/sec   Loss 7.7290   LearningRate 0.0699   Epoch: 3   Global Step: 18640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:21:53,737-Speed 3336.97 samples/sec   Loss 7.7663   LearningRate 0.0699   Epoch: 3   Global Step: 18650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:21:56,701-Speed 3455.37 samples/sec   Loss 7.7239   LearningRate 0.0699   Epoch: 3   Global Step: 18660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:21:59,706-Speed 3409.17 samples/sec   Loss 7.7168   LearningRate 0.0699   Epoch: 3   Global Step: 18670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:22:02,693-Speed 3428.61 samples/sec   Loss 7.4515   LearningRate 0.0698   Epoch: 3   Global Step: 18680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:22:05,696-Speed 3410.44 samples/sec   Loss 7.7857   LearningRate 0.0698   Epoch: 3   Global Step: 18690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:22:08,680-Speed 3432.38 samples/sec   Loss 7.6572   LearningRate 0.0698   Epoch: 3   Global Step: 18700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:22:11,711-Speed 3380.31 samples/sec   Loss 7.5355   LearningRate 0.0698   Epoch: 3   Global Step: 18710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:22:14,720-Speed 3403.31 samples/sec   Loss 7.6151   LearningRate 0.0698   Epoch: 3   Global Step: 18720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:22:17,731-Speed 3402.20 samples/sec   Loss 7.7179   LearningRate 0.0698   Epoch: 3   Global Step: 18730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:22:20,727-Speed 3417.87 samples/sec   Loss 7.6252   LearningRate 0.0698   Epoch: 3   Global Step: 18740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:22:23,747-Speed 3391.79 samples/sec   Loss 7.7119   LearningRate 0.0697   Epoch: 3   Global Step: 18750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:22:26,729-Speed 3434.58 samples/sec   Loss 7.6702   LearningRate 0.0697   Epoch: 3   Global Step: 18760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:22:29,714-Speed 3431.02 samples/sec   Loss 7.6384   LearningRate 0.0697   Epoch: 3   Global Step: 18770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:22:32,696-Speed 3434.70 samples/sec   Loss 7.7072   LearningRate 0.0697   Epoch: 3   Global Step: 18780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:22:35,694-Speed 3417.23 samples/sec   Loss 7.5547   LearningRate 0.0697   Epoch: 3   Global Step: 18790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:22:38,692-Speed 3416.21 samples/sec   Loss 7.6689   LearningRate 0.0697   Epoch: 3   Global Step: 18800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:22:41,697-Speed 3409.12 samples/sec   Loss 7.6189   LearningRate 0.0697   Epoch: 3   Global Step: 18810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:22:44,708-Speed 3401.29 samples/sec   Loss 7.6273   LearningRate 0.0696   Epoch: 3   Global Step: 18820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:22:47,711-Speed 3410.37 samples/sec   Loss 7.5945   LearningRate 0.0696   Epoch: 3   Global Step: 18830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:22:50,692-Speed 3435.49 samples/sec   Loss 7.5921   LearningRate 0.0696   Epoch: 3   Global Step: 18840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:22:53,694-Speed 3412.11 samples/sec   Loss 7.5833   LearningRate 0.0696   Epoch: 3   Global Step: 18850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:22:56,680-Speed 3429.72 samples/sec   Loss 7.6398   LearningRate 0.0696   Epoch: 3   Global Step: 18860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:22:59,669-Speed 3426.88 samples/sec   Loss 7.6218   LearningRate 0.0696   Epoch: 3   Global Step: 18870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:23:02,677-Speed 3405.05 samples/sec   Loss 7.7118   LearningRate 0.0696   Epoch: 3   Global Step: 18880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:23:05,659-Speed 3434.87 samples/sec   Loss 7.6237   LearningRate 0.0695   Epoch: 3   Global Step: 18890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:23:08,656-Speed 3417.90 samples/sec   Loss 7.6511   LearningRate 0.0695   Epoch: 3   Global Step: 18900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:23:11,648-Speed 3423.07 samples/sec   Loss 7.6979   LearningRate 0.0695   Epoch: 3   Global Step: 18910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:23:14,674-Speed 3384.63 samples/sec   Loss 7.5621   LearningRate 0.0695   Epoch: 3   Global Step: 18920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:23:17,676-Speed 3411.51 samples/sec   Loss 7.7125   LearningRate 0.0695   Epoch: 3   Global Step: 18930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:23:20,694-Speed 3394.19 samples/sec   Loss 7.5855   LearningRate 0.0695   Epoch: 3   Global Step: 18940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:23:23,687-Speed 3422.71 samples/sec   Loss 7.6522   LearningRate 0.0694   Epoch: 3   Global Step: 18950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:23:26,681-Speed 3420.55 samples/sec   Loss 7.5076   LearningRate 0.0694   Epoch: 3   Global Step: 18960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:23:29,670-Speed 3426.63 samples/sec   Loss 7.6540   LearningRate 0.0694   Epoch: 3   Global Step: 18970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:23:32,664-Speed 3420.76 samples/sec   Loss 7.5686   LearningRate 0.0694   Epoch: 3   Global Step: 18980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:23:35,690-Speed 3385.57 samples/sec   Loss 7.6426   LearningRate 0.0694   Epoch: 3   Global Step: 18990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:23:38,679-Speed 3426.68 samples/sec   Loss 7.5768   LearningRate 0.0694   Epoch: 3   Global Step: 19000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:23:41,673-Speed 3420.97 samples/sec   Loss 7.6927   LearningRate 0.0694   Epoch: 3   Global Step: 19010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:23:44,690-Speed 3394.52 samples/sec   Loss 7.6787   LearningRate 0.0693   Epoch: 3   Global Step: 19020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:23:47,683-Speed 3422.00 samples/sec   Loss 7.5778   LearningRate 0.0693   Epoch: 3   Global Step: 19030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:23:50,675-Speed 3422.65 samples/sec   Loss 7.6053   LearningRate 0.0693   Epoch: 3   Global Step: 19040   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:23:53,655-Speed 3437.00 samples/sec   Loss 7.4828   LearningRate 0.0693   Epoch: 3   Global Step: 19050   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:23:56,665-Speed 3402.76 samples/sec   Loss 7.7067   LearningRate 0.0693   Epoch: 3   Global Step: 19060   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:23:59,670-Speed 3409.11 samples/sec   Loss 7.7008   LearningRate 0.0693   Epoch: 3   Global Step: 19070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:24:02,648-Speed 3439.45 samples/sec   Loss 7.6218   LearningRate 0.0693   Epoch: 3   Global Step: 19080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:24:05,650-Speed 3412.23 samples/sec   Loss 7.4842   LearningRate 0.0692   Epoch: 3   Global Step: 19090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:24:08,660-Speed 3402.38 samples/sec   Loss 7.5629   LearningRate 0.0692   Epoch: 3   Global Step: 19100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:24:11,675-Speed 3397.08 samples/sec   Loss 7.4711   LearningRate 0.0692   Epoch: 3   Global Step: 19110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:24:14,678-Speed 3410.81 samples/sec   Loss 7.6747   LearningRate 0.0692   Epoch: 3   Global Step: 19120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:24:17,695-Speed 3394.52 samples/sec   Loss 7.6338   LearningRate 0.0692   Epoch: 3   Global Step: 19130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:24:20,712-Speed 3394.90 samples/sec   Loss 7.6456   LearningRate 0.0692   Epoch: 3   Global Step: 19140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:24:23,710-Speed 3416.12 samples/sec   Loss 7.5422   LearningRate 0.0692   Epoch: 3   Global Step: 19150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:24:26,698-Speed 3428.70 samples/sec   Loss 7.7426   LearningRate 0.0691   Epoch: 3   Global Step: 19160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:24:30,001-Speed 3100.68 samples/sec   Loss 7.6659   LearningRate 0.0691   Epoch: 3   Global Step: 19170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:24:33,000-Speed 3415.36 samples/sec   Loss 7.7848   LearningRate 0.0691   Epoch: 3   Global Step: 19180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:24:35,995-Speed 3419.60 samples/sec   Loss 7.4875   LearningRate 0.0691   Epoch: 3   Global Step: 19190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:24:38,996-Speed 3413.08 samples/sec   Loss 7.4876   LearningRate 0.0691   Epoch: 3   Global Step: 19200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:24:41,981-Speed 3430.95 samples/sec   Loss 7.6384   LearningRate 0.0691   Epoch: 3   Global Step: 19210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:24:44,985-Speed 3409.09 samples/sec   Loss 7.6934   LearningRate 0.0691   Epoch: 3   Global Step: 19220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:24:47,986-Speed 3412.93 samples/sec   Loss 7.6750   LearningRate 0.0690   Epoch: 3   Global Step: 19230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:24:50,985-Speed 3415.37 samples/sec   Loss 7.7137   LearningRate 0.0690   Epoch: 3   Global Step: 19240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:24:53,983-Speed 3416.39 samples/sec   Loss 7.6491   LearningRate 0.0690   Epoch: 3   Global Step: 19250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:24:57,011-Speed 3382.54 samples/sec   Loss 7.6775   LearningRate 0.0690   Epoch: 3   Global Step: 19260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:25:00,034-Speed 3388.79 samples/sec   Loss 7.5309   LearningRate 0.0690   Epoch: 3   Global Step: 19270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:25:03,045-Speed 3402.16 samples/sec   Loss 7.4758   LearningRate 0.0690   Epoch: 3   Global Step: 19280   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:25:06,068-Speed 3388.58 samples/sec   Loss 7.7104   LearningRate 0.0690   Epoch: 3   Global Step: 19290   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:25:09,078-Speed 3402.46 samples/sec   Loss 7.6014   LearningRate 0.0689   Epoch: 3   Global Step: 19300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:25:12,059-Speed 3436.63 samples/sec   Loss 7.5011   LearningRate 0.0689   Epoch: 3   Global Step: 19310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:25:15,056-Speed 3417.34 samples/sec   Loss 7.5211   LearningRate 0.0689   Epoch: 3   Global Step: 19320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:25:18,062-Speed 3406.91 samples/sec   Loss 7.5312   LearningRate 0.0689   Epoch: 3   Global Step: 19330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:25:21,169-Speed 3296.42 samples/sec   Loss 7.6088   LearningRate 0.0689   Epoch: 3   Global Step: 19340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:25:24,290-Speed 3282.18 samples/sec   Loss 7.6826   LearningRate 0.0689   Epoch: 3   Global Step: 19350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:25:27,324-Speed 3376.47 samples/sec   Loss 7.6802   LearningRate 0.0688   Epoch: 3   Global Step: 19360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:25:30,341-Speed 3394.91 samples/sec   Loss 7.5817   LearningRate 0.0688   Epoch: 3   Global Step: 19370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:25:33,346-Speed 3407.59 samples/sec   Loss 7.5857   LearningRate 0.0688   Epoch: 3   Global Step: 19380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:25:36,351-Speed 3408.73 samples/sec   Loss 7.7001   LearningRate 0.0688   Epoch: 3   Global Step: 19390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:25:39,362-Speed 3401.72 samples/sec   Loss 7.6537   LearningRate 0.0688   Epoch: 3   Global Step: 19400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:25:42,368-Speed 3407.45 samples/sec   Loss 7.6037   LearningRate 0.0688   Epoch: 3   Global Step: 19410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:25:45,392-Speed 3387.29 samples/sec   Loss 7.6189   LearningRate 0.0688   Epoch: 3   Global Step: 19420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:25:48,419-Speed 3383.64 samples/sec   Loss 7.7574   LearningRate 0.0687   Epoch: 3   Global Step: 19430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:25:51,413-Speed 3420.67 samples/sec   Loss 7.5587   LearningRate 0.0687   Epoch: 3   Global Step: 19440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:25:54,426-Speed 3400.08 samples/sec   Loss 7.6314   LearningRate 0.0687   Epoch: 3   Global Step: 19450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:25:57,435-Speed 3403.49 samples/sec   Loss 7.4862   LearningRate 0.0687   Epoch: 3   Global Step: 19460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:26:00,446-Speed 3402.01 samples/sec   Loss 7.5442   LearningRate 0.0687   Epoch: 3   Global Step: 19470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:26:03,458-Speed 3399.80 samples/sec   Loss 7.5191   LearningRate 0.0687   Epoch: 3   Global Step: 19480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:26:06,446-Speed 3428.72 samples/sec   Loss 7.4440   LearningRate 0.0687   Epoch: 3   Global Step: 19490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:26:09,470-Speed 3386.21 samples/sec   Loss 7.6514   LearningRate 0.0686   Epoch: 3   Global Step: 19500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:26:12,493-Speed 3389.18 samples/sec   Loss 7.7438   LearningRate 0.0686   Epoch: 3   Global Step: 19510   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 03:26:15,483-Speed 3424.65 samples/sec   Loss 7.4790   LearningRate 0.0686   Epoch: 3   Global Step: 19520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:26:18,486-Speed 3411.99 samples/sec   Loss 7.5406   LearningRate 0.0686   Epoch: 3   Global Step: 19530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:26:21,480-Speed 3420.00 samples/sec   Loss 7.4391   LearningRate 0.0686   Epoch: 3   Global Step: 19540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:26:24,497-Speed 3395.25 samples/sec   Loss 7.5903   LearningRate 0.0686   Epoch: 3   Global Step: 19550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:26:27,511-Speed 3398.02 samples/sec   Loss 7.5264   LearningRate 0.0686   Epoch: 3   Global Step: 19560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:26:30,520-Speed 3403.62 samples/sec   Loss 7.5722   LearningRate 0.0685   Epoch: 3   Global Step: 19570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:26:33,517-Speed 3417.98 samples/sec   Loss 7.5189   LearningRate 0.0685   Epoch: 3   Global Step: 19580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:26:36,513-Speed 3418.05 samples/sec   Loss 7.4658   LearningRate 0.0685   Epoch: 3   Global Step: 19590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:26:39,536-Speed 3389.25 samples/sec   Loss 7.6546   LearningRate 0.0685   Epoch: 3   Global Step: 19600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:26:42,555-Speed 3392.22 samples/sec   Loss 7.7609   LearningRate 0.0685   Epoch: 3   Global Step: 19610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:26:45,569-Speed 3398.43 samples/sec   Loss 7.5181   LearningRate 0.0685   Epoch: 3   Global Step: 19620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:26:48,559-Speed 3426.32 samples/sec   Loss 7.5483   LearningRate 0.0685   Epoch: 3   Global Step: 19630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:26:51,576-Speed 3393.91 samples/sec   Loss 7.7061   LearningRate 0.0684   Epoch: 3   Global Step: 19640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:26:54,589-Speed 3399.80 samples/sec   Loss 7.6449   LearningRate 0.0684   Epoch: 3   Global Step: 19650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:26:57,626-Speed 3372.00 samples/sec   Loss 7.6496   LearningRate 0.0684   Epoch: 3   Global Step: 19660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:27:00,649-Speed 3388.18 samples/sec   Loss 7.4137   LearningRate 0.0684   Epoch: 3   Global Step: 19670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:27:03,678-Speed 3381.26 samples/sec   Loss 7.5955   LearningRate 0.0684   Epoch: 3   Global Step: 19680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:27:06,684-Speed 3407.30 samples/sec   Loss 7.5434   LearningRate 0.0684   Epoch: 3   Global Step: 19690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:27:09,685-Speed 3414.13 samples/sec   Loss 7.4623   LearningRate 0.0684   Epoch: 3   Global Step: 19700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:27:12,692-Speed 3405.39 samples/sec   Loss 7.5666   LearningRate 0.0683   Epoch: 3   Global Step: 19710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:27:15,706-Speed 3398.14 samples/sec   Loss 7.4482   LearningRate 0.0683   Epoch: 3   Global Step: 19720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:27:18,709-Speed 3411.76 samples/sec   Loss 7.7088   LearningRate 0.0683   Epoch: 3   Global Step: 19730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:27:21,716-Speed 3405.35 samples/sec   Loss 7.5852   LearningRate 0.0683   Epoch: 3   Global Step: 19740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:27:24,726-Speed 3403.20 samples/sec   Loss 7.5290   LearningRate 0.0683   Epoch: 3   Global Step: 19750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:27:27,760-Speed 3375.51 samples/sec   Loss 7.4765   LearningRate 0.0683   Epoch: 3   Global Step: 19760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:27:30,786-Speed 3385.08 samples/sec   Loss 7.5030   LearningRate 0.0683   Epoch: 3   Global Step: 19770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:27:33,818-Speed 3377.75 samples/sec   Loss 7.6792   LearningRate 0.0682   Epoch: 3   Global Step: 19780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:27:36,811-Speed 3422.47 samples/sec   Loss 7.5889   LearningRate 0.0682   Epoch: 3   Global Step: 19790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:27:39,833-Speed 3389.25 samples/sec   Loss 7.6579   LearningRate 0.0682   Epoch: 3   Global Step: 19800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:27:42,866-Speed 3377.50 samples/sec   Loss 7.6460   LearningRate 0.0682   Epoch: 3   Global Step: 19810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:27:45,894-Speed 3381.99 samples/sec   Loss 7.5330   LearningRate 0.0682   Epoch: 3   Global Step: 19820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:27:48,930-Speed 3373.49 samples/sec   Loss 7.4628   LearningRate 0.0682   Epoch: 3   Global Step: 19830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:27:51,954-Speed 3386.94 samples/sec   Loss 7.6671   LearningRate 0.0682   Epoch: 3   Global Step: 19840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:27:54,992-Speed 3371.76 samples/sec   Loss 7.5224   LearningRate 0.0681   Epoch: 3   Global Step: 19850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:27:58,019-Speed 3383.93 samples/sec   Loss 7.5998   LearningRate 0.0681   Epoch: 3   Global Step: 19860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:28:01,057-Speed 3370.85 samples/sec   Loss 7.4758   LearningRate 0.0681   Epoch: 3   Global Step: 19870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:28:04,068-Speed 3402.06 samples/sec   Loss 7.4956   LearningRate 0.0681   Epoch: 3   Global Step: 19880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:28:07,083-Speed 3396.91 samples/sec   Loss 7.6408   LearningRate 0.0681   Epoch: 3   Global Step: 19890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:28:10,089-Speed 3407.35 samples/sec   Loss 7.5708   LearningRate 0.0681   Epoch: 3   Global Step: 19900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:28:13,078-Speed 3426.64 samples/sec   Loss 7.4473   LearningRate 0.0680   Epoch: 3   Global Step: 19910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:28:16,118-Speed 3369.88 samples/sec   Loss 7.5498   LearningRate 0.0680   Epoch: 3   Global Step: 19920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:28:19,131-Speed 3399.47 samples/sec   Loss 7.4117   LearningRate 0.0680   Epoch: 3   Global Step: 19930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:28:22,157-Speed 3384.98 samples/sec   Loss 7.3860   LearningRate 0.0680   Epoch: 3   Global Step: 19940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:28:25,169-Speed 3400.91 samples/sec   Loss 7.5044   LearningRate 0.0680   Epoch: 3   Global Step: 19950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:28:28,183-Speed 3397.72 samples/sec   Loss 7.4408   LearningRate 0.0680   Epoch: 3   Global Step: 19960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:28:31,193-Speed 3403.20 samples/sec   Loss 7.5799   LearningRate 0.0680   Epoch: 3   Global Step: 19970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:28:34,219-Speed 3385.08 samples/sec   Loss 7.5435   LearningRate 0.0679   Epoch: 3   Global Step: 19980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:28:37,261-Speed 3367.17 samples/sec   Loss 7.5722   LearningRate 0.0679   Epoch: 3   Global Step: 19990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:28:40,296-Speed 3373.76 samples/sec   Loss 7.5731   LearningRate 0.0679   Epoch: 3   Global Step: 20000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:29:23,734-[lfw][20000]XNorm: 21.442830
Training: 2022-04-27 03:29:23,734-[lfw][20000]Accuracy-Flip: 0.99600+-0.00351
Training: 2022-04-27 03:29:23,735-[lfw][20000]Accuracy-Highest: 0.99633
Training: 2022-04-27 03:30:14,203-[cfp_fp][20000]XNorm: 19.539685
Training: 2022-04-27 03:30:14,203-[cfp_fp][20000]Accuracy-Flip: 0.93114+-0.01514
Training: 2022-04-27 03:30:14,204-[cfp_fp][20000]Accuracy-Highest: 0.93214
Training: 2022-04-27 03:30:57,601-[agedb_30][20000]XNorm: 21.576006
Training: 2022-04-27 03:30:57,602-[agedb_30][20000]Accuracy-Flip: 0.97167+-0.00771
Training: 2022-04-27 03:30:57,602-[agedb_30][20000]Accuracy-Highest: 0.97167
Training: 2022-04-27 03:31:00,614-Speed 72.98 samples/sec   Loss 7.5742   LearningRate 0.0679   Epoch: 3   Global Step: 20010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:31:03,607-Speed 3421.31 samples/sec   Loss 7.3463   LearningRate 0.0679   Epoch: 3   Global Step: 20020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:31:06,604-Speed 3417.79 samples/sec   Loss 7.4157   LearningRate 0.0679   Epoch: 3   Global Step: 20030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:31:09,582-Speed 3439.00 samples/sec   Loss 7.4190   LearningRate 0.0679   Epoch: 3   Global Step: 20040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:31:12,606-Speed 3387.37 samples/sec   Loss 7.5717   LearningRate 0.0678   Epoch: 3   Global Step: 20050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:31:15,595-Speed 3425.88 samples/sec   Loss 7.5570   LearningRate 0.0678   Epoch: 3   Global Step: 20060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:31:18,614-Speed 3393.50 samples/sec   Loss 7.5244   LearningRate 0.0678   Epoch: 3   Global Step: 20070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:31:21,621-Speed 3406.09 samples/sec   Loss 7.3571   LearningRate 0.0678   Epoch: 3   Global Step: 20080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:31:24,641-Speed 3391.63 samples/sec   Loss 7.4560   LearningRate 0.0678   Epoch: 3   Global Step: 20090   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-27 03:31:27,735-Speed 3310.49 samples/sec   Loss 7.5317   LearningRate 0.0678   Epoch: 3   Global Step: 20100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:31:30,739-Speed 3409.62 samples/sec   Loss 7.4287   LearningRate 0.0678   Epoch: 3   Global Step: 20110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:31:33,754-Speed 3396.40 samples/sec   Loss 7.5535   LearningRate 0.0677   Epoch: 3   Global Step: 20120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:31:36,771-Speed 3395.08 samples/sec   Loss 7.5843   LearningRate 0.0677   Epoch: 3   Global Step: 20130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:31:39,768-Speed 3417.41 samples/sec   Loss 7.4061   LearningRate 0.0677   Epoch: 3   Global Step: 20140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:31:42,757-Speed 3427.08 samples/sec   Loss 7.4228   LearningRate 0.0677   Epoch: 3   Global Step: 20150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:31:45,778-Speed 3390.73 samples/sec   Loss 7.4979   LearningRate 0.0677   Epoch: 3   Global Step: 20160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:31:48,795-Speed 3394.72 samples/sec   Loss 7.5426   LearningRate 0.0677   Epoch: 3   Global Step: 20170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:31:51,825-Speed 3380.83 samples/sec   Loss 7.5500   LearningRate 0.0677   Epoch: 3   Global Step: 20180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:31:54,877-Speed 3355.38 samples/sec   Loss 7.5545   LearningRate 0.0676   Epoch: 3   Global Step: 20190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:31:57,895-Speed 3394.12 samples/sec   Loss 7.6523   LearningRate 0.0676   Epoch: 3   Global Step: 20200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:32:00,913-Speed 3394.12 samples/sec   Loss 7.4397   LearningRate 0.0676   Epoch: 3   Global Step: 20210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:32:03,941-Speed 3382.86 samples/sec   Loss 7.4859   LearningRate 0.0676   Epoch: 3   Global Step: 20220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:32:06,968-Speed 3383.29 samples/sec   Loss 7.5125   LearningRate 0.0676   Epoch: 3   Global Step: 20230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:32:09,989-Speed 3390.39 samples/sec   Loss 7.5852   LearningRate 0.0676   Epoch: 3   Global Step: 20240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:32:13,004-Speed 3397.48 samples/sec   Loss 7.4494   LearningRate 0.0676   Epoch: 3   Global Step: 20250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:32:16,037-Speed 3377.31 samples/sec   Loss 7.3952   LearningRate 0.0675   Epoch: 3   Global Step: 20260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:32:19,066-Speed 3381.38 samples/sec   Loss 7.3298   LearningRate 0.0675   Epoch: 3   Global Step: 20270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:32:22,085-Speed 3392.81 samples/sec   Loss 7.4873   LearningRate 0.0675   Epoch: 3   Global Step: 20280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:32:25,106-Speed 3390.55 samples/sec   Loss 7.4090   LearningRate 0.0675   Epoch: 3   Global Step: 20290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:32:28,130-Speed 3386.88 samples/sec   Loss 7.3767   LearningRate 0.0675   Epoch: 3   Global Step: 20300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:32:31,135-Speed 3408.10 samples/sec   Loss 7.3997   LearningRate 0.0675   Epoch: 3   Global Step: 20310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:32:34,153-Speed 3394.10 samples/sec   Loss 7.4026   LearningRate 0.0675   Epoch: 3   Global Step: 20320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:32:37,150-Speed 3417.30 samples/sec   Loss 7.3439   LearningRate 0.0674   Epoch: 3   Global Step: 20330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:32:40,178-Speed 3382.58 samples/sec   Loss 7.4667   LearningRate 0.0674   Epoch: 3   Global Step: 20340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:32:43,184-Speed 3408.18 samples/sec   Loss 7.4904   LearningRate 0.0674   Epoch: 3   Global Step: 20350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:32:46,191-Speed 3405.32 samples/sec   Loss 7.4895   LearningRate 0.0674   Epoch: 3   Global Step: 20360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:32:49,190-Speed 3415.67 samples/sec   Loss 7.3712   LearningRate 0.0674   Epoch: 3   Global Step: 20370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:32:52,204-Speed 3398.27 samples/sec   Loss 7.3810   LearningRate 0.0674   Epoch: 3   Global Step: 20380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:32:55,204-Speed 3413.79 samples/sec   Loss 7.3774   LearningRate 0.0674   Epoch: 3   Global Step: 20390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:32:58,191-Speed 3429.45 samples/sec   Loss 7.3719   LearningRate 0.0673   Epoch: 3   Global Step: 20400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:33:01,205-Speed 3397.58 samples/sec   Loss 7.5305   LearningRate 0.0673   Epoch: 3   Global Step: 20410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:33:04,204-Speed 3416.30 samples/sec   Loss 7.2459   LearningRate 0.0673   Epoch: 3   Global Step: 20420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 03:33:07,224-Speed 3391.57 samples/sec   Loss 7.5860   LearningRate 0.0673   Epoch: 3   Global Step: 20430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 03:33:10,244-Speed 3391.22 samples/sec   Loss 7.4569   LearningRate 0.0673   Epoch: 3   Global Step: 20440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:33:13,254-Speed 3403.07 samples/sec   Loss 7.6904   LearningRate 0.0673   Epoch: 3   Global Step: 20450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:33:16,263-Speed 3403.22 samples/sec   Loss 7.3755   LearningRate 0.0673   Epoch: 3   Global Step: 20460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:33:19,284-Speed 3391.02 samples/sec   Loss 7.3978   LearningRate 0.0672   Epoch: 3   Global Step: 20470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:33:22,301-Speed 3394.70 samples/sec   Loss 7.4320   LearningRate 0.0672   Epoch: 3   Global Step: 20480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:33:25,323-Speed 3389.31 samples/sec   Loss 7.4796   LearningRate 0.0672   Epoch: 3   Global Step: 20490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:33:28,321-Speed 3416.28 samples/sec   Loss 7.2400   LearningRate 0.0672   Epoch: 3   Global Step: 20500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:33:31,342-Speed 3389.98 samples/sec   Loss 7.5217   LearningRate 0.0672   Epoch: 3   Global Step: 20510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:33:34,324-Speed 3436.05 samples/sec   Loss 7.3600   LearningRate 0.0672   Epoch: 3   Global Step: 20520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:33:37,341-Speed 3393.97 samples/sec   Loss 7.3467   LearningRate 0.0672   Epoch: 3   Global Step: 20530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:33:40,340-Speed 3416.15 samples/sec   Loss 7.4513   LearningRate 0.0671   Epoch: 3   Global Step: 20540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:33:43,347-Speed 3404.92 samples/sec   Loss 7.3992   LearningRate 0.0671   Epoch: 3   Global Step: 20550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:33:46,380-Speed 3378.06 samples/sec   Loss 7.5161   LearningRate 0.0671   Epoch: 3   Global Step: 20560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:33:49,403-Speed 3387.66 samples/sec   Loss 7.4147   LearningRate 0.0671   Epoch: 3   Global Step: 20570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:33:52,425-Speed 3389.63 samples/sec   Loss 7.5711   LearningRate 0.0671   Epoch: 3   Global Step: 20580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:33:55,432-Speed 3405.12 samples/sec   Loss 7.4032   LearningRate 0.0671   Epoch: 3   Global Step: 20590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:33:58,455-Speed 3388.62 samples/sec   Loss 7.5301   LearningRate 0.0671   Epoch: 3   Global Step: 20600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:34:01,477-Speed 3389.95 samples/sec   Loss 7.5856   LearningRate 0.0670   Epoch: 3   Global Step: 20610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:34:04,492-Speed 3397.10 samples/sec   Loss 7.5411   LearningRate 0.0670   Epoch: 3   Global Step: 20620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:34:07,505-Speed 3399.69 samples/sec   Loss 7.4747   LearningRate 0.0670   Epoch: 3   Global Step: 20630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:34:10,526-Speed 3390.55 samples/sec   Loss 7.5534   LearningRate 0.0670   Epoch: 3   Global Step: 20640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:34:13,547-Speed 3389.62 samples/sec   Loss 7.5776   LearningRate 0.0670   Epoch: 3   Global Step: 20650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:34:16,555-Speed 3405.16 samples/sec   Loss 7.6411   LearningRate 0.0670   Epoch: 3   Global Step: 20660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:34:19,554-Speed 3414.97 samples/sec   Loss 7.4576   LearningRate 0.0670   Epoch: 3   Global Step: 20670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:34:22,573-Speed 3392.80 samples/sec   Loss 7.6101   LearningRate 0.0669   Epoch: 3   Global Step: 20680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:34:25,599-Speed 3385.35 samples/sec   Loss 7.5818   LearningRate 0.0669   Epoch: 3   Global Step: 20690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:34:28,574-Speed 3442.53 samples/sec   Loss 7.4362   LearningRate 0.0669   Epoch: 3   Global Step: 20700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:34:31,572-Speed 3416.27 samples/sec   Loss 7.4263   LearningRate 0.0669   Epoch: 3   Global Step: 20710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:34:34,593-Speed 3391.07 samples/sec   Loss 7.4473   LearningRate 0.0669   Epoch: 3   Global Step: 20720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:34:37,602-Speed 3404.46 samples/sec   Loss 7.3261   LearningRate 0.0669   Epoch: 3   Global Step: 20730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:34:40,620-Speed 3393.71 samples/sec   Loss 7.5023   LearningRate 0.0669   Epoch: 3   Global Step: 20740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:34:43,638-Speed 3392.76 samples/sec   Loss 7.4754   LearningRate 0.0668   Epoch: 3   Global Step: 20750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:34:46,666-Speed 3382.64 samples/sec   Loss 7.4602   LearningRate 0.0668   Epoch: 3   Global Step: 20760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:34:49,687-Speed 3390.89 samples/sec   Loss 7.3565   LearningRate 0.0668   Epoch: 3   Global Step: 20770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:34:52,702-Speed 3397.33 samples/sec   Loss 7.4476   LearningRate 0.0668   Epoch: 3   Global Step: 20780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:34:55,719-Speed 3395.23 samples/sec   Loss 7.3586   LearningRate 0.0668   Epoch: 3   Global Step: 20790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:34:58,743-Speed 3386.70 samples/sec   Loss 7.3502   LearningRate 0.0668   Epoch: 3   Global Step: 20800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:35:01,759-Speed 3396.46 samples/sec   Loss 7.4882   LearningRate 0.0667   Epoch: 3   Global Step: 20810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:35:04,777-Speed 3394.55 samples/sec   Loss 7.3885   LearningRate 0.0667   Epoch: 3   Global Step: 20820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:35:07,794-Speed 3394.60 samples/sec   Loss 7.4521   LearningRate 0.0667   Epoch: 3   Global Step: 20830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:35:10,818-Speed 3387.03 samples/sec   Loss 7.4854   LearningRate 0.0667   Epoch: 3   Global Step: 20840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:35:13,839-Speed 3390.23 samples/sec   Loss 7.4102   LearningRate 0.0667   Epoch: 3   Global Step: 20850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:35:16,859-Speed 3391.62 samples/sec   Loss 7.3850   LearningRate 0.0667   Epoch: 3   Global Step: 20860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:35:19,878-Speed 3393.04 samples/sec   Loss 7.5158   LearningRate 0.0667   Epoch: 3   Global Step: 20870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:35:22,906-Speed 3381.72 samples/sec   Loss 7.3493   LearningRate 0.0666   Epoch: 3   Global Step: 20880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:35:25,918-Speed 3400.99 samples/sec   Loss 7.5158   LearningRate 0.0666   Epoch: 3   Global Step: 20890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:35:28,946-Speed 3382.86 samples/sec   Loss 7.3947   LearningRate 0.0666   Epoch: 3   Global Step: 20900   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 03:35:31,967-Speed 3390.06 samples/sec   Loss 7.2988   LearningRate 0.0666   Epoch: 3   Global Step: 20910   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 03:35:34,992-Speed 3386.61 samples/sec   Loss 7.5861   LearningRate 0.0666   Epoch: 3   Global Step: 20920   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 03:35:37,987-Speed 3419.17 samples/sec   Loss 7.4642   LearningRate 0.0666   Epoch: 3   Global Step: 20930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:35:41,019-Speed 3378.03 samples/sec   Loss 7.3765   LearningRate 0.0666   Epoch: 3   Global Step: 20940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:35:44,046-Speed 3383.81 samples/sec   Loss 7.4054   LearningRate 0.0665   Epoch: 3   Global Step: 20950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:35:47,050-Speed 3409.28 samples/sec   Loss 7.2580   LearningRate 0.0665   Epoch: 3   Global Step: 20960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:35:50,053-Speed 3410.60 samples/sec   Loss 7.3064   LearningRate 0.0665   Epoch: 3   Global Step: 20970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:35:53,065-Speed 3400.61 samples/sec   Loss 7.3127   LearningRate 0.0665   Epoch: 3   Global Step: 20980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:35:56,097-Speed 3378.24 samples/sec   Loss 7.4150   LearningRate 0.0665   Epoch: 3   Global Step: 20990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:35:59,119-Speed 3389.19 samples/sec   Loss 7.4917   LearningRate 0.0665   Epoch: 3   Global Step: 21000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:02,140-Speed 3390.64 samples/sec   Loss 7.2726   LearningRate 0.0665   Epoch: 3   Global Step: 21010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:05,166-Speed 3384.39 samples/sec   Loss 7.2524   LearningRate 0.0664   Epoch: 3   Global Step: 21020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:08,183-Speed 3394.74 samples/sec   Loss 7.2490   LearningRate 0.0664   Epoch: 3   Global Step: 21030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:11,208-Speed 3386.33 samples/sec   Loss 7.4101   LearningRate 0.0664   Epoch: 3   Global Step: 21040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:14,229-Speed 3390.25 samples/sec   Loss 7.2374   LearningRate 0.0664   Epoch: 3   Global Step: 21050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:17,256-Speed 3383.74 samples/sec   Loss 7.3333   LearningRate 0.0664   Epoch: 3   Global Step: 21060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:36:20,281-Speed 3385.84 samples/sec   Loss 7.4061   LearningRate 0.0664   Epoch: 3   Global Step: 21070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:36:23,290-Speed 3403.97 samples/sec   Loss 7.3892   LearningRate 0.0664   Epoch: 3   Global Step: 21080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:26,307-Speed 3395.63 samples/sec   Loss 7.3607   LearningRate 0.0663   Epoch: 3   Global Step: 21090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:29,315-Speed 3405.14 samples/sec   Loss 7.3781   LearningRate 0.0663   Epoch: 3   Global Step: 21100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:32,370-Speed 3351.92 samples/sec   Loss 7.4789   LearningRate 0.0663   Epoch: 3   Global Step: 21110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:35,386-Speed 3396.00 samples/sec   Loss 7.3596   LearningRate 0.0663   Epoch: 3   Global Step: 21120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:38,413-Speed 3384.17 samples/sec   Loss 7.5909   LearningRate 0.0663   Epoch: 3   Global Step: 21130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:41,436-Speed 3387.46 samples/sec   Loss 7.5336   LearningRate 0.0663   Epoch: 3   Global Step: 21140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:44,468-Speed 3378.24 samples/sec   Loss 7.3601   LearningRate 0.0663   Epoch: 3   Global Step: 21150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:47,493-Speed 3386.38 samples/sec   Loss 7.5063   LearningRate 0.0662   Epoch: 3   Global Step: 21160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:50,518-Speed 3385.65 samples/sec   Loss 7.3390   LearningRate 0.0662   Epoch: 3   Global Step: 21170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:36:53,535-Speed 3395.03 samples/sec   Loss 7.4207   LearningRate 0.0662   Epoch: 3   Global Step: 21180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:36:56,554-Speed 3392.63 samples/sec   Loss 7.1832   LearningRate 0.0662   Epoch: 3   Global Step: 21190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:36:59,579-Speed 3386.28 samples/sec   Loss 7.2922   LearningRate 0.0662   Epoch: 3   Global Step: 21200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:37:02,594-Speed 3397.29 samples/sec   Loss 7.3450   LearningRate 0.0662   Epoch: 3   Global Step: 21210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:05,629-Speed 3374.52 samples/sec   Loss 7.2852   LearningRate 0.0662   Epoch: 3   Global Step: 21220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:08,646-Speed 3394.81 samples/sec   Loss 7.2811   LearningRate 0.0661   Epoch: 3   Global Step: 21230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:11,651-Speed 3407.72 samples/sec   Loss 7.3711   LearningRate 0.0661   Epoch: 3   Global Step: 21240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:14,666-Speed 3397.54 samples/sec   Loss 7.3408   LearningRate 0.0661   Epoch: 3   Global Step: 21250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:17,694-Speed 3383.33 samples/sec   Loss 7.4455   LearningRate 0.0661   Epoch: 3   Global Step: 21260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:20,724-Speed 3380.09 samples/sec   Loss 7.4233   LearningRate 0.0661   Epoch: 3   Global Step: 21270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:23,756-Speed 3377.84 samples/sec   Loss 7.2034   LearningRate 0.0661   Epoch: 3   Global Step: 21280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:26,816-Speed 3347.56 samples/sec   Loss 7.4340   LearningRate 0.0661   Epoch: 3   Global Step: 21290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:29,904-Speed 3316.07 samples/sec   Loss 7.5451   LearningRate 0.0660   Epoch: 3   Global Step: 21300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:32,934-Speed 3380.46 samples/sec   Loss 7.2225   LearningRate 0.0660   Epoch: 3   Global Step: 21310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:37:35,939-Speed 3407.95 samples/sec   Loss 7.3841   LearningRate 0.0660   Epoch: 3   Global Step: 21320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:38,965-Speed 3385.61 samples/sec   Loss 7.3533   LearningRate 0.0660   Epoch: 3   Global Step: 21330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:41,982-Speed 3395.13 samples/sec   Loss 7.3068   LearningRate 0.0660   Epoch: 3   Global Step: 21340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:44,995-Speed 3399.06 samples/sec   Loss 7.3676   LearningRate 0.0660   Epoch: 3   Global Step: 21350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:48,043-Speed 3360.90 samples/sec   Loss 7.3115   LearningRate 0.0660   Epoch: 3   Global Step: 21360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:51,073-Speed 3380.13 samples/sec   Loss 7.1999   LearningRate 0.0659   Epoch: 3   Global Step: 21370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:54,089-Speed 3395.67 samples/sec   Loss 7.3164   LearningRate 0.0659   Epoch: 3   Global Step: 21380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:37:57,130-Speed 3368.44 samples/sec   Loss 7.5458   LearningRate 0.0659   Epoch: 3   Global Step: 21390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:38:00,160-Speed 3380.03 samples/sec   Loss 7.5167   LearningRate 0.0659   Epoch: 3   Global Step: 21400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:38:03,180-Speed 3391.32 samples/sec   Loss 7.3017   LearningRate 0.0659   Epoch: 3   Global Step: 21410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:38:06,197-Speed 3395.75 samples/sec   Loss 7.4277   LearningRate 0.0659   Epoch: 3   Global Step: 21420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:38:09,227-Speed 3380.52 samples/sec   Loss 7.3236   LearningRate 0.0659   Epoch: 3   Global Step: 21430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:38:12,380-Speed 3247.83 samples/sec   Loss 7.4059   LearningRate 0.0658   Epoch: 3   Global Step: 21440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:38:15,399-Speed 3392.84 samples/sec   Loss 7.3912   LearningRate 0.0658   Epoch: 3   Global Step: 21450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:38:18,420-Speed 3390.60 samples/sec   Loss 7.4795   LearningRate 0.0658   Epoch: 3   Global Step: 21460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:38:21,448-Speed 3381.83 samples/sec   Loss 7.3969   LearningRate 0.0658   Epoch: 3   Global Step: 21470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:38:24,482-Speed 3376.77 samples/sec   Loss 7.3820   LearningRate 0.0658   Epoch: 3   Global Step: 21480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:38:27,520-Speed 3370.91 samples/sec   Loss 7.4172   LearningRate 0.0658   Epoch: 3   Global Step: 21490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:38:30,555-Speed 3374.65 samples/sec   Loss 7.3697   LearningRate 0.0658   Epoch: 3   Global Step: 21500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:38:33,629-Speed 3332.64 samples/sec   Loss 7.3566   LearningRate 0.0657   Epoch: 3   Global Step: 21510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:38:36,659-Speed 3380.04 samples/sec   Loss 7.3092   LearningRate 0.0657   Epoch: 3   Global Step: 21520   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 03:38:39,675-Speed 3396.39 samples/sec   Loss 7.4339   LearningRate 0.0657   Epoch: 3   Global Step: 21530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:38:42,680-Speed 3408.51 samples/sec   Loss 7.3215   LearningRate 0.0657   Epoch: 3   Global Step: 21540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:38:45,691-Speed 3401.23 samples/sec   Loss 7.3601   LearningRate 0.0657   Epoch: 3   Global Step: 21550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:38:48,717-Speed 3384.71 samples/sec   Loss 7.4138   LearningRate 0.0657   Epoch: 3   Global Step: 21560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:38:51,757-Speed 3369.68 samples/sec   Loss 7.2597   LearningRate 0.0657   Epoch: 3   Global Step: 21570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:38:54,789-Speed 3377.91 samples/sec   Loss 7.3113   LearningRate 0.0656   Epoch: 3   Global Step: 21580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:38:57,860-Speed 3335.66 samples/sec   Loss 7.3136   LearningRate 0.0656   Epoch: 3   Global Step: 21590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:00,886-Speed 3384.15 samples/sec   Loss 7.3416   LearningRate 0.0656   Epoch: 3   Global Step: 21600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:03,920-Speed 3375.65 samples/sec   Loss 7.2265   LearningRate 0.0656   Epoch: 3   Global Step: 21610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:06,949-Speed 3382.51 samples/sec   Loss 7.3868   LearningRate 0.0656   Epoch: 3   Global Step: 21620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:09,971-Speed 3388.52 samples/sec   Loss 7.2496   LearningRate 0.0656   Epoch: 3   Global Step: 21630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:13,003-Speed 3378.94 samples/sec   Loss 7.2762   LearningRate 0.0656   Epoch: 3   Global Step: 21640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:39:16,034-Speed 3379.15 samples/sec   Loss 7.3369   LearningRate 0.0655   Epoch: 3   Global Step: 21650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:39:19,065-Speed 3379.71 samples/sec   Loss 7.2310   LearningRate 0.0655   Epoch: 3   Global Step: 21660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:39:22,076-Speed 3401.26 samples/sec   Loss 7.1326   LearningRate 0.0655   Epoch: 3   Global Step: 21670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:25,133-Speed 3350.71 samples/sec   Loss 7.2321   LearningRate 0.0655   Epoch: 3   Global Step: 21680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:28,171-Speed 3371.70 samples/sec   Loss 7.5001   LearningRate 0.0655   Epoch: 3   Global Step: 21690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:31,204-Speed 3376.42 samples/sec   Loss 7.2116   LearningRate 0.0655   Epoch: 3   Global Step: 21700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:34,227-Speed 3387.72 samples/sec   Loss 7.3895   LearningRate 0.0655   Epoch: 3   Global Step: 21710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:37,247-Speed 3392.35 samples/sec   Loss 7.2271   LearningRate 0.0654   Epoch: 3   Global Step: 21720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:40,277-Speed 3380.57 samples/sec   Loss 7.2941   LearningRate 0.0654   Epoch: 3   Global Step: 21730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:43,304-Speed 3384.16 samples/sec   Loss 7.2964   LearningRate 0.0654   Epoch: 3   Global Step: 21740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:46,328-Speed 3386.10 samples/sec   Loss 7.1904   LearningRate 0.0654   Epoch: 3   Global Step: 21750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:49,356-Speed 3382.85 samples/sec   Loss 7.3764   LearningRate 0.0654   Epoch: 3   Global Step: 21760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:39:52,384-Speed 3382.79 samples/sec   Loss 7.4272   LearningRate 0.0654   Epoch: 3   Global Step: 21770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:39:55,410-Speed 3384.94 samples/sec   Loss 7.2999   LearningRate 0.0654   Epoch: 3   Global Step: 21780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:39:58,446-Speed 3373.56 samples/sec   Loss 7.3786   LearningRate 0.0653   Epoch: 3   Global Step: 21790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:40:01,448-Speed 3411.98 samples/sec   Loss 7.3168   LearningRate 0.0653   Epoch: 3   Global Step: 21800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:40:04,479-Speed 3378.83 samples/sec   Loss 7.3860   LearningRate 0.0653   Epoch: 3   Global Step: 21810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:40:07,507-Speed 3382.18 samples/sec   Loss 7.2010   LearningRate 0.0653   Epoch: 3   Global Step: 21820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:40:10,535-Speed 3382.70 samples/sec   Loss 7.2805   LearningRate 0.0653   Epoch: 3   Global Step: 21830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:40:13,591-Speed 3351.57 samples/sec   Loss 7.2132   LearningRate 0.0653   Epoch: 3   Global Step: 21840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:40:16,632-Speed 3368.60 samples/sec   Loss 7.3148   LearningRate 0.0653   Epoch: 3   Global Step: 21850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:40:19,667-Speed 3374.24 samples/sec   Loss 7.1790   LearningRate 0.0652   Epoch: 3   Global Step: 21860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:40:22,699-Speed 3378.05 samples/sec   Loss 7.2492   LearningRate 0.0652   Epoch: 3   Global Step: 21870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:40:25,732-Speed 3378.13 samples/sec   Loss 7.2733   LearningRate 0.0652   Epoch: 3   Global Step: 21880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:40:28,762-Speed 3379.25 samples/sec   Loss 7.2250   LearningRate 0.0652   Epoch: 3   Global Step: 21890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:40:31,785-Speed 3388.91 samples/sec   Loss 7.2055   LearningRate 0.0652   Epoch: 3   Global Step: 21900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:40:34,819-Speed 3375.29 samples/sec   Loss 7.1567   LearningRate 0.0652   Epoch: 3   Global Step: 21910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:40:37,840-Speed 3390.56 samples/sec   Loss 7.3674   LearningRate 0.0652   Epoch: 3   Global Step: 21920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:40:40,865-Speed 3385.80 samples/sec   Loss 7.3102   LearningRate 0.0652   Epoch: 3   Global Step: 21930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:40:43,894-Speed 3381.50 samples/sec   Loss 7.3008   LearningRate 0.0651   Epoch: 3   Global Step: 21940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:40:46,895-Speed 3413.59 samples/sec   Loss 7.2111   LearningRate 0.0651   Epoch: 3   Global Step: 21950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:40:49,920-Speed 3385.33 samples/sec   Loss 7.0888   LearningRate 0.0651   Epoch: 3   Global Step: 21960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:40:52,933-Speed 3399.07 samples/sec   Loss 7.2024   LearningRate 0.0651   Epoch: 3   Global Step: 21970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:40:55,959-Speed 3385.65 samples/sec   Loss 7.3254   LearningRate 0.0651   Epoch: 3   Global Step: 21980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:40:58,989-Speed 3379.74 samples/sec   Loss 7.4264   LearningRate 0.0651   Epoch: 3   Global Step: 21990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:41:02,094-Speed 3298.81 samples/sec   Loss 7.3613   LearningRate 0.0651   Epoch: 3   Global Step: 22000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:41:45,527-[lfw][22000]XNorm: 22.253873
Training: 2022-04-27 03:41:45,528-[lfw][22000]Accuracy-Flip: 0.99717+-0.00325
Training: 2022-04-27 03:41:45,528-[lfw][22000]Accuracy-Highest: 0.99717
Training: 2022-04-27 03:42:36,070-[cfp_fp][22000]XNorm: 19.574763
Training: 2022-04-27 03:42:36,071-[cfp_fp][22000]Accuracy-Flip: 0.94257+-0.01108
Training: 2022-04-27 03:42:36,071-[cfp_fp][22000]Accuracy-Highest: 0.94257
Training: 2022-04-27 03:43:19,382-[agedb_30][22000]XNorm: 22.148440
Training: 2022-04-27 03:43:19,383-[agedb_30][22000]Accuracy-Flip: 0.96983+-0.00893
Training: 2022-04-27 03:43:19,383-[agedb_30][22000]Accuracy-Highest: 0.97167
Training: 2022-04-27 03:43:22,398-Speed 72.98 samples/sec   Loss 7.1951   LearningRate 0.0650   Epoch: 3   Global Step: 22010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:43:25,410-Speed 3400.79 samples/sec   Loss 7.3108   LearningRate 0.0650   Epoch: 3   Global Step: 22020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:43:28,428-Speed 3394.27 samples/sec   Loss 7.2444   LearningRate 0.0650   Epoch: 3   Global Step: 22030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:43:31,441-Speed 3399.34 samples/sec   Loss 7.2526   LearningRate 0.0650   Epoch: 3   Global Step: 22040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:43:34,456-Speed 3397.46 samples/sec   Loss 7.1681   LearningRate 0.0650   Epoch: 3   Global Step: 22050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:43:37,470-Speed 3397.29 samples/sec   Loss 7.3932   LearningRate 0.0650   Epoch: 3   Global Step: 22060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:43:40,495-Speed 3386.00 samples/sec   Loss 7.2190   LearningRate 0.0650   Epoch: 3   Global Step: 22070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:43:43,503-Speed 3405.91 samples/sec   Loss 7.0947   LearningRate 0.0649   Epoch: 3   Global Step: 22080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:43:46,518-Speed 3396.42 samples/sec   Loss 7.3178   LearningRate 0.0649   Epoch: 3   Global Step: 22090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:43:49,530-Speed 3400.72 samples/sec   Loss 7.1310   LearningRate 0.0649   Epoch: 3   Global Step: 22100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:43:52,551-Speed 3390.61 samples/sec   Loss 7.3329   LearningRate 0.0649   Epoch: 3   Global Step: 22110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:43:55,571-Speed 3391.34 samples/sec   Loss 7.1886   LearningRate 0.0649   Epoch: 3   Global Step: 22120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:43:58,595-Speed 3386.92 samples/sec   Loss 7.2998   LearningRate 0.0649   Epoch: 3   Global Step: 22130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:44:01,617-Speed 3389.86 samples/sec   Loss 7.2480   LearningRate 0.0649   Epoch: 3   Global Step: 22140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:44:04,636-Speed 3392.06 samples/sec   Loss 7.1749   LearningRate 0.0648   Epoch: 3   Global Step: 22150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:44:07,648-Speed 3400.30 samples/sec   Loss 7.1324   LearningRate 0.0648   Epoch: 3   Global Step: 22160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:44:10,688-Speed 3369.29 samples/sec   Loss 7.1769   LearningRate 0.0648   Epoch: 3   Global Step: 22170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:44:13,712-Speed 3387.58 samples/sec   Loss 7.4067   LearningRate 0.0648   Epoch: 3   Global Step: 22180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:44:16,729-Speed 3394.04 samples/sec   Loss 7.3266   LearningRate 0.0648   Epoch: 3   Global Step: 22190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:44:19,754-Speed 3386.28 samples/sec   Loss 7.1780   LearningRate 0.0648   Epoch: 3   Global Step: 22200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:44:22,786-Speed 3377.87 samples/sec   Loss 7.2092   LearningRate 0.0648   Epoch: 3   Global Step: 22210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:44:25,810-Speed 3387.67 samples/sec   Loss 7.2251   LearningRate 0.0647   Epoch: 3   Global Step: 22220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:44:28,830-Speed 3391.75 samples/sec   Loss 7.1543   LearningRate 0.0647   Epoch: 3   Global Step: 22230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:44:31,837-Speed 3406.52 samples/sec   Loss 7.1482   LearningRate 0.0647   Epoch: 3   Global Step: 22240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:44:34,866-Speed 3380.52 samples/sec   Loss 7.2625   LearningRate 0.0647   Epoch: 3   Global Step: 22250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:44:37,898-Speed 3378.53 samples/sec   Loss 7.2666   LearningRate 0.0647   Epoch: 3   Global Step: 22260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:44:40,898-Speed 3413.50 samples/sec   Loss 7.2566   LearningRate 0.0647   Epoch: 3   Global Step: 22270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:44:43,875-Speed 3441.39 samples/sec   Loss 7.1614   LearningRate 0.0647   Epoch: 3   Global Step: 22280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:44:46,891-Speed 3395.08 samples/sec   Loss 7.2449   LearningRate 0.0646   Epoch: 3   Global Step: 22290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:44:49,933-Speed 3367.35 samples/sec   Loss 7.3266   LearningRate 0.0646   Epoch: 3   Global Step: 22300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:44:52,944-Speed 3401.73 samples/sec   Loss 7.0772   LearningRate 0.0646   Epoch: 3   Global Step: 22310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:44:55,961-Speed 3395.43 samples/sec   Loss 7.1493   LearningRate 0.0646   Epoch: 3   Global Step: 22320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:44:58,978-Speed 3394.49 samples/sec   Loss 7.2628   LearningRate 0.0646   Epoch: 3   Global Step: 22330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:45:01,995-Speed 3394.81 samples/sec   Loss 7.3449   LearningRate 0.0646   Epoch: 3   Global Step: 22340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:45:05,015-Speed 3391.66 samples/sec   Loss 7.3359   LearningRate 0.0646   Epoch: 3   Global Step: 22350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:45:08,031-Speed 3395.83 samples/sec   Loss 7.3629   LearningRate 0.0645   Epoch: 3   Global Step: 22360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:45:11,064-Speed 3376.51 samples/sec   Loss 7.3341   LearningRate 0.0645   Epoch: 3   Global Step: 22370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 03:45:14,073-Speed 3404.36 samples/sec   Loss 7.1860   LearningRate 0.0645   Epoch: 3   Global Step: 22380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:45:17,090-Speed 3395.29 samples/sec   Loss 7.2047   LearningRate 0.0645   Epoch: 3   Global Step: 22390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:45:20,107-Speed 3394.98 samples/sec   Loss 7.2736   LearningRate 0.0645   Epoch: 3   Global Step: 22400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:45:23,122-Speed 3397.07 samples/sec   Loss 7.2243   LearningRate 0.0645   Epoch: 3   Global Step: 22410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:45:26,139-Speed 3394.93 samples/sec   Loss 7.3620   LearningRate 0.0645   Epoch: 3   Global Step: 22420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:45:29,152-Speed 3399.00 samples/sec   Loss 7.2114   LearningRate 0.0644   Epoch: 3   Global Step: 22430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:45:32,167-Speed 3397.76 samples/sec   Loss 7.1278   LearningRate 0.0644   Epoch: 3   Global Step: 22440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:45:35,191-Speed 3386.28 samples/sec   Loss 7.3203   LearningRate 0.0644   Epoch: 3   Global Step: 22450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:45:38,207-Speed 3396.26 samples/sec   Loss 7.2422   LearningRate 0.0644   Epoch: 3   Global Step: 22460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:45:41,228-Speed 3390.28 samples/sec   Loss 7.2338   LearningRate 0.0644   Epoch: 3   Global Step: 22470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:45:44,243-Speed 3396.95 samples/sec   Loss 7.2629   LearningRate 0.0644   Epoch: 3   Global Step: 22480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:45:47,258-Speed 3397.98 samples/sec   Loss 7.1305   LearningRate 0.0644   Epoch: 3   Global Step: 22490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:45:50,278-Speed 3391.65 samples/sec   Loss 7.1476   LearningRate 0.0643   Epoch: 3   Global Step: 22500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:45:53,304-Speed 3384.55 samples/sec   Loss 7.2821   LearningRate 0.0643   Epoch: 3   Global Step: 22510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:45:56,323-Speed 3392.11 samples/sec   Loss 7.1256   LearningRate 0.0643   Epoch: 3   Global Step: 22520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:45:59,341-Speed 3394.22 samples/sec   Loss 7.1364   LearningRate 0.0643   Epoch: 3   Global Step: 22530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:46:02,361-Speed 3390.85 samples/sec   Loss 7.1376   LearningRate 0.0643   Epoch: 3   Global Step: 22540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:46:05,370-Speed 3403.90 samples/sec   Loss 7.1777   LearningRate 0.0643   Epoch: 3   Global Step: 22550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:46:08,384-Speed 3398.10 samples/sec   Loss 7.0889   LearningRate 0.0643   Epoch: 3   Global Step: 22560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:46:11,399-Speed 3397.61 samples/sec   Loss 7.1909   LearningRate 0.0642   Epoch: 3   Global Step: 22570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:46:14,413-Speed 3398.58 samples/sec   Loss 7.2363   LearningRate 0.0642   Epoch: 3   Global Step: 22580   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 03:46:17,413-Speed 3413.65 samples/sec   Loss 7.0739   LearningRate 0.0642   Epoch: 3   Global Step: 22590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:46:20,428-Speed 3398.04 samples/sec   Loss 7.1825   LearningRate 0.0642   Epoch: 3   Global Step: 22600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:46:23,442-Speed 3398.26 samples/sec   Loss 7.1457   LearningRate 0.0642   Epoch: 3   Global Step: 22610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:46:26,458-Speed 3395.01 samples/sec   Loss 7.2651   LearningRate 0.0642   Epoch: 3   Global Step: 22620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:46:29,473-Speed 3397.60 samples/sec   Loss 7.2603   LearningRate 0.0642   Epoch: 3   Global Step: 22630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:46:32,481-Speed 3405.59 samples/sec   Loss 7.1472   LearningRate 0.0641   Epoch: 3   Global Step: 22640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:46:35,504-Speed 3388.80 samples/sec   Loss 7.1489   LearningRate 0.0641   Epoch: 3   Global Step: 22650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:46:38,521-Speed 3394.73 samples/sec   Loss 7.1999   LearningRate 0.0641   Epoch: 3   Global Step: 22660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:46:41,548-Speed 3383.48 samples/sec   Loss 7.1950   LearningRate 0.0641   Epoch: 3   Global Step: 22670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:46:44,567-Speed 3393.76 samples/sec   Loss 7.3588   LearningRate 0.0641   Epoch: 3   Global Step: 22680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:46:47,584-Speed 3393.93 samples/sec   Loss 7.2577   LearningRate 0.0641   Epoch: 3   Global Step: 22690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:46:50,598-Speed 3398.22 samples/sec   Loss 7.2089   LearningRate 0.0641   Epoch: 3   Global Step: 22700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:46:53,611-Speed 3400.12 samples/sec   Loss 7.2031   LearningRate 0.0640   Epoch: 3   Global Step: 22710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:46:56,617-Speed 3407.34 samples/sec   Loss 7.1487   LearningRate 0.0640   Epoch: 3   Global Step: 22720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:46:59,641-Speed 3385.88 samples/sec   Loss 7.3585   LearningRate 0.0640   Epoch: 3   Global Step: 22730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:47:02,743-Speed 3302.36 samples/sec   Loss 7.1574   LearningRate 0.0640   Epoch: 3   Global Step: 22740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:47:16,257-Speed 757.81 samples/sec   Loss 6.7898   LearningRate 0.0640   Epoch: 4   Global Step: 22750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:47:19,275-Speed 3393.67 samples/sec   Loss 6.6017   LearningRate 0.0640   Epoch: 4   Global Step: 22760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:47:22,331-Speed 3351.76 samples/sec   Loss 6.5562   LearningRate 0.0640   Epoch: 4   Global Step: 22770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:47:25,510-Speed 3222.21 samples/sec   Loss 6.5530   LearningRate 0.0639   Epoch: 4   Global Step: 22780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:47:28,528-Speed 3393.63 samples/sec   Loss 6.5103   LearningRate 0.0639   Epoch: 4   Global Step: 22790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:47:31,558-Speed 3379.88 samples/sec   Loss 6.4951   LearningRate 0.0639   Epoch: 4   Global Step: 22800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:47:36,065-Speed 2272.61 samples/sec   Loss 6.6043   LearningRate 0.0639   Epoch: 4   Global Step: 22810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:47:39,096-Speed 3378.60 samples/sec   Loss 6.6183   LearningRate 0.0639   Epoch: 4   Global Step: 22820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:47:42,119-Speed 3388.23 samples/sec   Loss 6.5144   LearningRate 0.0639   Epoch: 4   Global Step: 22830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:47:45,139-Speed 3391.88 samples/sec   Loss 6.6169   LearningRate 0.0639   Epoch: 4   Global Step: 22840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:47:48,151-Speed 3400.53 samples/sec   Loss 6.7527   LearningRate 0.0639   Epoch: 4   Global Step: 22850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:47:51,183-Speed 3378.83 samples/sec   Loss 6.6572   LearningRate 0.0638   Epoch: 4   Global Step: 22860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:47:54,198-Speed 3397.28 samples/sec   Loss 6.6876   LearningRate 0.0638   Epoch: 4   Global Step: 22870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:47:57,213-Speed 3396.58 samples/sec   Loss 6.6435   LearningRate 0.0638   Epoch: 4   Global Step: 22880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:48:00,252-Speed 3370.49 samples/sec   Loss 6.8037   LearningRate 0.0638   Epoch: 4   Global Step: 22890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:48:03,286-Speed 3376.30 samples/sec   Loss 6.9326   LearningRate 0.0638   Epoch: 4   Global Step: 22900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:48:06,310-Speed 3386.12 samples/sec   Loss 6.6491   LearningRate 0.0638   Epoch: 4   Global Step: 22910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:48:09,360-Speed 3358.33 samples/sec   Loss 6.7099   LearningRate 0.0638   Epoch: 4   Global Step: 22920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:48:12,415-Speed 3352.65 samples/sec   Loss 6.6289   LearningRate 0.0637   Epoch: 4   Global Step: 22930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:48:15,473-Speed 3349.08 samples/sec   Loss 6.7882   LearningRate 0.0637   Epoch: 4   Global Step: 22940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:48:18,564-Speed 3313.80 samples/sec   Loss 6.8623   LearningRate 0.0637   Epoch: 4   Global Step: 22950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:48:21,597-Speed 3377.42 samples/sec   Loss 6.4916   LearningRate 0.0637   Epoch: 4   Global Step: 22960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:48:24,623-Speed 3384.47 samples/sec   Loss 6.8169   LearningRate 0.0637   Epoch: 4   Global Step: 22970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:48:27,654-Speed 3379.94 samples/sec   Loss 6.7553   LearningRate 0.0637   Epoch: 4   Global Step: 22980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:48:30,686-Speed 3377.18 samples/sec   Loss 6.8742   LearningRate 0.0637   Epoch: 4   Global Step: 22990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:48:33,734-Speed 3360.97 samples/sec   Loss 6.7060   LearningRate 0.0636   Epoch: 4   Global Step: 23000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:48:36,788-Speed 3353.58 samples/sec   Loss 6.7027   LearningRate 0.0636   Epoch: 4   Global Step: 23010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:48:39,884-Speed 3307.94 samples/sec   Loss 6.6878   LearningRate 0.0636   Epoch: 4   Global Step: 23020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:48:42,917-Speed 3377.66 samples/sec   Loss 6.7254   LearningRate 0.0636   Epoch: 4   Global Step: 23030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:48:45,936-Speed 3393.00 samples/sec   Loss 6.6523   LearningRate 0.0636   Epoch: 4   Global Step: 23040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:48:48,972-Speed 3373.14 samples/sec   Loss 6.7925   LearningRate 0.0636   Epoch: 4   Global Step: 23050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:48:51,997-Speed 3385.74 samples/sec   Loss 6.8050   LearningRate 0.0636   Epoch: 4   Global Step: 23060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:48:55,015-Speed 3393.88 samples/sec   Loss 6.8258   LearningRate 0.0635   Epoch: 4   Global Step: 23070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:48:58,039-Speed 3386.53 samples/sec   Loss 6.8302   LearningRate 0.0635   Epoch: 4   Global Step: 23080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:49:01,062-Speed 3388.74 samples/sec   Loss 6.9066   LearningRate 0.0635   Epoch: 4   Global Step: 23090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:49:04,091-Speed 3381.63 samples/sec   Loss 6.7539   LearningRate 0.0635   Epoch: 4   Global Step: 23100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:49:07,115-Speed 3386.55 samples/sec   Loss 6.8540   LearningRate 0.0635   Epoch: 4   Global Step: 23110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:49:10,134-Speed 3393.18 samples/sec   Loss 6.7214   LearningRate 0.0635   Epoch: 4   Global Step: 23120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:49:13,151-Speed 3395.14 samples/sec   Loss 6.8733   LearningRate 0.0635   Epoch: 4   Global Step: 23130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:49:16,169-Speed 3393.29 samples/sec   Loss 6.8242   LearningRate 0.0634   Epoch: 4   Global Step: 23140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:49:19,172-Speed 3411.67 samples/sec   Loss 6.8441   LearningRate 0.0634   Epoch: 4   Global Step: 23150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:49:22,215-Speed 3365.75 samples/sec   Loss 7.0100   LearningRate 0.0634   Epoch: 4   Global Step: 23160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:49:25,240-Speed 3385.20 samples/sec   Loss 7.0237   LearningRate 0.0634   Epoch: 4   Global Step: 23170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:49:28,291-Speed 3356.71 samples/sec   Loss 6.8283   LearningRate 0.0634   Epoch: 4   Global Step: 23180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:49:31,314-Speed 3388.94 samples/sec   Loss 6.8590   LearningRate 0.0634   Epoch: 4   Global Step: 23190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:49:34,324-Speed 3402.79 samples/sec   Loss 6.8233   LearningRate 0.0634   Epoch: 4   Global Step: 23200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:49:37,341-Speed 3394.82 samples/sec   Loss 6.9689   LearningRate 0.0633   Epoch: 4   Global Step: 23210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:49:40,346-Speed 3408.15 samples/sec   Loss 6.8694   LearningRate 0.0633   Epoch: 4   Global Step: 23220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:49:43,373-Speed 3384.31 samples/sec   Loss 7.0409   LearningRate 0.0633   Epoch: 4   Global Step: 23230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:49:46,404-Speed 3378.45 samples/sec   Loss 6.9650   LearningRate 0.0633   Epoch: 4   Global Step: 23240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:49:49,427-Speed 3388.42 samples/sec   Loss 6.9681   LearningRate 0.0633   Epoch: 4   Global Step: 23250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:49:52,446-Speed 3392.91 samples/sec   Loss 6.8105   LearningRate 0.0633   Epoch: 4   Global Step: 23260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:49:55,462-Speed 3395.75 samples/sec   Loss 7.1044   LearningRate 0.0633   Epoch: 4   Global Step: 23270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:49:58,482-Speed 3391.71 samples/sec   Loss 6.9878   LearningRate 0.0632   Epoch: 4   Global Step: 23280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:50:01,490-Speed 3405.55 samples/sec   Loss 6.8987   LearningRate 0.0632   Epoch: 4   Global Step: 23290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:50:04,503-Speed 3398.76 samples/sec   Loss 7.0096   LearningRate 0.0632   Epoch: 4   Global Step: 23300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:50:07,528-Speed 3386.28 samples/sec   Loss 6.7895   LearningRate 0.0632   Epoch: 4   Global Step: 23310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:50:10,566-Speed 3371.50 samples/sec   Loss 6.9209   LearningRate 0.0632   Epoch: 4   Global Step: 23320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:50:13,673-Speed 3296.07 samples/sec   Loss 6.7811   LearningRate 0.0632   Epoch: 4   Global Step: 23330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:50:16,712-Speed 3370.34 samples/sec   Loss 7.0568   LearningRate 0.0632   Epoch: 4   Global Step: 23340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:50:19,747-Speed 3388.26 samples/sec   Loss 6.8411   LearningRate 0.0632   Epoch: 4   Global Step: 23350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:50:22,768-Speed 3390.16 samples/sec   Loss 6.9517   LearningRate 0.0631   Epoch: 4   Global Step: 23360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:50:25,803-Speed 3374.93 samples/sec   Loss 7.0548   LearningRate 0.0631   Epoch: 4   Global Step: 23370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:50:28,850-Speed 3361.43 samples/sec   Loss 6.9913   LearningRate 0.0631   Epoch: 4   Global Step: 23380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:50:31,880-Speed 3380.59 samples/sec   Loss 6.8433   LearningRate 0.0631   Epoch: 4   Global Step: 23390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:50:34,893-Speed 3399.25 samples/sec   Loss 6.8868   LearningRate 0.0631   Epoch: 4   Global Step: 23400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:50:37,975-Speed 3323.13 samples/sec   Loss 6.9364   LearningRate 0.0631   Epoch: 4   Global Step: 23410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:50:40,997-Speed 3389.00 samples/sec   Loss 6.9801   LearningRate 0.0631   Epoch: 4   Global Step: 23420   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 03:50:44,037-Speed 3369.75 samples/sec   Loss 7.1047   LearningRate 0.0630   Epoch: 4   Global Step: 23430   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 03:50:47,060-Speed 3387.54 samples/sec   Loss 6.9310   LearningRate 0.0630   Epoch: 4   Global Step: 23440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:50:50,090-Speed 3380.47 samples/sec   Loss 7.0243   LearningRate 0.0630   Epoch: 4   Global Step: 23450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:50:53,109-Speed 3393.22 samples/sec   Loss 6.9547   LearningRate 0.0630   Epoch: 4   Global Step: 23460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:50:56,142-Speed 3376.68 samples/sec   Loss 6.9852   LearningRate 0.0630   Epoch: 4   Global Step: 23470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:50:59,166-Speed 3387.15 samples/sec   Loss 6.9156   LearningRate 0.0630   Epoch: 4   Global Step: 23480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:02,217-Speed 3357.53 samples/sec   Loss 6.7963   LearningRate 0.0630   Epoch: 4   Global Step: 23490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:05,245-Speed 3382.44 samples/sec   Loss 6.9727   LearningRate 0.0629   Epoch: 4   Global Step: 23500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:08,272-Speed 3382.81 samples/sec   Loss 7.0155   LearningRate 0.0629   Epoch: 4   Global Step: 23510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:11,305-Speed 3377.29 samples/sec   Loss 7.0667   LearningRate 0.0629   Epoch: 4   Global Step: 23520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:14,334-Speed 3381.24 samples/sec   Loss 7.0227   LearningRate 0.0629   Epoch: 4   Global Step: 23530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:17,365-Speed 3379.72 samples/sec   Loss 6.9159   LearningRate 0.0629   Epoch: 4   Global Step: 23540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:20,389-Speed 3387.22 samples/sec   Loss 6.8723   LearningRate 0.0629   Epoch: 4   Global Step: 23550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:23,468-Speed 3326.17 samples/sec   Loss 6.9424   LearningRate 0.0629   Epoch: 4   Global Step: 23560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:51:26,494-Speed 3385.17 samples/sec   Loss 7.0527   LearningRate 0.0628   Epoch: 4   Global Step: 23570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:51:29,522-Speed 3381.92 samples/sec   Loss 6.9586   LearningRate 0.0628   Epoch: 4   Global Step: 23580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:51:32,528-Speed 3408.11 samples/sec   Loss 6.9985   LearningRate 0.0628   Epoch: 4   Global Step: 23590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:35,564-Speed 3373.58 samples/sec   Loss 7.0018   LearningRate 0.0628   Epoch: 4   Global Step: 23600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:38,591-Speed 3383.95 samples/sec   Loss 6.9442   LearningRate 0.0628   Epoch: 4   Global Step: 23610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:41,616-Speed 3384.94 samples/sec   Loss 6.7636   LearningRate 0.0628   Epoch: 4   Global Step: 23620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:44,648-Speed 3378.49 samples/sec   Loss 6.9398   LearningRate 0.0628   Epoch: 4   Global Step: 23630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:47,672-Speed 3387.50 samples/sec   Loss 7.0540   LearningRate 0.0627   Epoch: 4   Global Step: 23640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:50,707-Speed 3375.12 samples/sec   Loss 6.8654   LearningRate 0.0627   Epoch: 4   Global Step: 23650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:53,729-Speed 3389.15 samples/sec   Loss 7.0181   LearningRate 0.0627   Epoch: 4   Global Step: 23660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:56,755-Speed 3384.59 samples/sec   Loss 7.0439   LearningRate 0.0627   Epoch: 4   Global Step: 23670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:51:59,775-Speed 3391.20 samples/sec   Loss 6.9708   LearningRate 0.0627   Epoch: 4   Global Step: 23680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:52:02,801-Speed 3385.53 samples/sec   Loss 7.0013   LearningRate 0.0627   Epoch: 4   Global Step: 23690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:52:05,806-Speed 3407.35 samples/sec   Loss 7.0808   LearningRate 0.0627   Epoch: 4   Global Step: 23700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:52:08,829-Speed 3388.35 samples/sec   Loss 6.8819   LearningRate 0.0626   Epoch: 4   Global Step: 23710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:52:11,918-Speed 3315.30 samples/sec   Loss 6.9598   LearningRate 0.0626   Epoch: 4   Global Step: 23720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:52:14,990-Speed 3335.14 samples/sec   Loss 6.9753   LearningRate 0.0626   Epoch: 4   Global Step: 23730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:52:18,014-Speed 3387.34 samples/sec   Loss 7.0131   LearningRate 0.0626   Epoch: 4   Global Step: 23740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:52:21,041-Speed 3383.22 samples/sec   Loss 6.8682   LearningRate 0.0626   Epoch: 4   Global Step: 23750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:52:24,063-Speed 3389.36 samples/sec   Loss 6.9526   LearningRate 0.0626   Epoch: 4   Global Step: 23760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:52:27,084-Speed 3389.72 samples/sec   Loss 6.9935   LearningRate 0.0626   Epoch: 4   Global Step: 23770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:52:30,120-Speed 3373.89 samples/sec   Loss 6.9128   LearningRate 0.0626   Epoch: 4   Global Step: 23780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:52:33,162-Speed 3367.25 samples/sec   Loss 6.8785   LearningRate 0.0625   Epoch: 4   Global Step: 23790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:52:36,265-Speed 3300.94 samples/sec   Loss 6.9934   LearningRate 0.0625   Epoch: 4   Global Step: 23800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:52:39,367-Speed 3301.75 samples/sec   Loss 6.9463   LearningRate 0.0625   Epoch: 4   Global Step: 23810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:52:42,397-Speed 3380.07 samples/sec   Loss 6.9298   LearningRate 0.0625   Epoch: 4   Global Step: 23820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:52:45,426-Speed 3381.80 samples/sec   Loss 7.0811   LearningRate 0.0625   Epoch: 4   Global Step: 23830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:52:48,454-Speed 3382.82 samples/sec   Loss 6.8784   LearningRate 0.0625   Epoch: 4   Global Step: 23840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:52:51,479-Speed 3385.71 samples/sec   Loss 7.0261   LearningRate 0.0625   Epoch: 4   Global Step: 23850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:52:54,503-Speed 3387.43 samples/sec   Loss 7.1284   LearningRate 0.0624   Epoch: 4   Global Step: 23860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:52:57,516-Speed 3398.66 samples/sec   Loss 6.9937   LearningRate 0.0624   Epoch: 4   Global Step: 23870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:53:00,542-Speed 3385.30 samples/sec   Loss 7.1572   LearningRate 0.0624   Epoch: 4   Global Step: 23880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:53:03,566-Speed 3386.25 samples/sec   Loss 6.9075   LearningRate 0.0624   Epoch: 4   Global Step: 23890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:53:06,602-Speed 3373.99 samples/sec   Loss 7.0283   LearningRate 0.0624   Epoch: 4   Global Step: 23900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:53:09,632-Speed 3379.81 samples/sec   Loss 7.0193   LearningRate 0.0624   Epoch: 4   Global Step: 23910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:53:12,659-Speed 3384.32 samples/sec   Loss 6.9727   LearningRate 0.0624   Epoch: 4   Global Step: 23920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:53:15,684-Speed 3385.96 samples/sec   Loss 6.8192   LearningRate 0.0623   Epoch: 4   Global Step: 23930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:53:18,726-Speed 3366.59 samples/sec   Loss 7.1133   LearningRate 0.0623   Epoch: 4   Global Step: 23940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:53:21,755-Speed 3381.47 samples/sec   Loss 6.9897   LearningRate 0.0623   Epoch: 4   Global Step: 23950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:53:24,781-Speed 3384.65 samples/sec   Loss 6.9858   LearningRate 0.0623   Epoch: 4   Global Step: 23960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:53:27,817-Speed 3373.69 samples/sec   Loss 7.1994   LearningRate 0.0623   Epoch: 4   Global Step: 23970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:53:30,840-Speed 3388.88 samples/sec   Loss 7.0030   LearningRate 0.0623   Epoch: 4   Global Step: 23980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:53:33,874-Speed 3375.43 samples/sec   Loss 6.9236   LearningRate 0.0623   Epoch: 4   Global Step: 23990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:53:36,905-Speed 3379.14 samples/sec   Loss 6.9334   LearningRate 0.0622   Epoch: 4   Global Step: 24000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:54:20,188-[lfw][24000]XNorm: 21.892247
Training: 2022-04-27 03:54:20,189-[lfw][24000]Accuracy-Flip: 0.99700+-0.00277
Training: 2022-04-27 03:54:20,189-[lfw][24000]Accuracy-Highest: 0.99717
Training: 2022-04-27 03:55:10,431-[cfp_fp][24000]XNorm: 19.193311
Training: 2022-04-27 03:55:10,431-[cfp_fp][24000]Accuracy-Flip: 0.93714+-0.01194
Training: 2022-04-27 03:55:10,432-[cfp_fp][24000]Accuracy-Highest: 0.94257
Training: 2022-04-27 03:55:53,781-[agedb_30][24000]XNorm: 21.897793
Training: 2022-04-27 03:55:53,782-[agedb_30][24000]Accuracy-Flip: 0.96750+-0.00739
Training: 2022-04-27 03:55:53,782-[agedb_30][24000]Accuracy-Highest: 0.97167
Training: 2022-04-27 03:55:56,800-Speed 73.20 samples/sec   Loss 7.1318   LearningRate 0.0622   Epoch: 4   Global Step: 24010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:55:59,810-Speed 3403.66 samples/sec   Loss 7.0214   LearningRate 0.0622   Epoch: 4   Global Step: 24020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:56:02,822-Speed 3401.04 samples/sec   Loss 6.9673   LearningRate 0.0622   Epoch: 4   Global Step: 24030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:56:05,835-Speed 3398.58 samples/sec   Loss 7.0037   LearningRate 0.0622   Epoch: 4   Global Step: 24040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:56:08,876-Speed 3368.57 samples/sec   Loss 6.8831   LearningRate 0.0622   Epoch: 4   Global Step: 24050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:56:11,885-Speed 3403.34 samples/sec   Loss 7.0923   LearningRate 0.0622   Epoch: 4   Global Step: 24060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:56:14,902-Speed 3395.12 samples/sec   Loss 6.8972   LearningRate 0.0621   Epoch: 4   Global Step: 24070   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 03:56:17,899-Speed 3417.05 samples/sec   Loss 6.8309   LearningRate 0.0621   Epoch: 4   Global Step: 24080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:56:20,899-Speed 3413.87 samples/sec   Loss 6.9708   LearningRate 0.0621   Epoch: 4   Global Step: 24090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:56:23,918-Speed 3393.22 samples/sec   Loss 6.9136   LearningRate 0.0621   Epoch: 4   Global Step: 24100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:56:26,936-Speed 3394.40 samples/sec   Loss 6.9582   LearningRate 0.0621   Epoch: 4   Global Step: 24110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:56:29,961-Speed 3385.41 samples/sec   Loss 7.0657   LearningRate 0.0621   Epoch: 4   Global Step: 24120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:56:32,980-Speed 3393.31 samples/sec   Loss 6.9219   LearningRate 0.0621   Epoch: 4   Global Step: 24130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:56:35,997-Speed 3395.13 samples/sec   Loss 6.9136   LearningRate 0.0621   Epoch: 4   Global Step: 24140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:56:39,012-Speed 3396.18 samples/sec   Loss 6.9942   LearningRate 0.0620   Epoch: 4   Global Step: 24150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:56:42,031-Speed 3393.10 samples/sec   Loss 7.0611   LearningRate 0.0620   Epoch: 4   Global Step: 24160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:56:45,047-Speed 3395.75 samples/sec   Loss 6.9189   LearningRate 0.0620   Epoch: 4   Global Step: 24170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:56:48,139-Speed 3313.04 samples/sec   Loss 7.0116   LearningRate 0.0620   Epoch: 4   Global Step: 24180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:56:51,162-Speed 3387.45 samples/sec   Loss 7.0538   LearningRate 0.0620   Epoch: 4   Global Step: 24190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:56:54,177-Speed 3397.67 samples/sec   Loss 6.8926   LearningRate 0.0620   Epoch: 4   Global Step: 24200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:56:57,228-Speed 3357.08 samples/sec   Loss 7.0350   LearningRate 0.0620   Epoch: 4   Global Step: 24210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:57:00,264-Speed 3373.82 samples/sec   Loss 6.9672   LearningRate 0.0619   Epoch: 4   Global Step: 24220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:57:03,284-Speed 3390.90 samples/sec   Loss 6.8827   LearningRate 0.0619   Epoch: 4   Global Step: 24230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:57:06,302-Speed 3394.59 samples/sec   Loss 6.9614   LearningRate 0.0619   Epoch: 4   Global Step: 24240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:57:09,319-Speed 3394.70 samples/sec   Loss 6.9703   LearningRate 0.0619   Epoch: 4   Global Step: 24250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:57:12,339-Speed 3391.26 samples/sec   Loss 7.0011   LearningRate 0.0619   Epoch: 4   Global Step: 24260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:57:15,356-Speed 3394.75 samples/sec   Loss 7.1137   LearningRate 0.0619   Epoch: 4   Global Step: 24270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:57:18,383-Speed 3383.94 samples/sec   Loss 7.0750   LearningRate 0.0619   Epoch: 4   Global Step: 24280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:57:21,394-Speed 3401.53 samples/sec   Loss 7.1105   LearningRate 0.0618   Epoch: 4   Global Step: 24290   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 03:57:24,408-Speed 3398.24 samples/sec   Loss 7.0896   LearningRate 0.0618   Epoch: 4   Global Step: 24300   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 03:57:27,427-Speed 3392.28 samples/sec   Loss 7.2381   LearningRate 0.0618   Epoch: 4   Global Step: 24310   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 03:57:30,417-Speed 3425.85 samples/sec   Loss 6.8338   LearningRate 0.0618   Epoch: 4   Global Step: 24320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:57:33,431-Speed 3398.63 samples/sec   Loss 6.9893   LearningRate 0.0618   Epoch: 4   Global Step: 24330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:57:36,440-Speed 3404.10 samples/sec   Loss 7.0407   LearningRate 0.0618   Epoch: 4   Global Step: 24340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:57:39,438-Speed 3416.47 samples/sec   Loss 7.0127   LearningRate 0.0618   Epoch: 4   Global Step: 24350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:57:42,452-Speed 3399.18 samples/sec   Loss 6.8160   LearningRate 0.0617   Epoch: 4   Global Step: 24360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:57:45,469-Speed 3394.49 samples/sec   Loss 6.9770   LearningRate 0.0617   Epoch: 4   Global Step: 24370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:57:48,482-Speed 3400.40 samples/sec   Loss 7.1098   LearningRate 0.0617   Epoch: 4   Global Step: 24380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:57:51,491-Speed 3403.22 samples/sec   Loss 6.9342   LearningRate 0.0617   Epoch: 4   Global Step: 24390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:57:54,515-Speed 3386.80 samples/sec   Loss 6.8485   LearningRate 0.0617   Epoch: 4   Global Step: 24400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:57:57,528-Speed 3400.14 samples/sec   Loss 6.8527   LearningRate 0.0617   Epoch: 4   Global Step: 24410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:58:00,577-Speed 3358.88 samples/sec   Loss 7.0268   LearningRate 0.0617   Epoch: 4   Global Step: 24420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:58:03,586-Speed 3403.68 samples/sec   Loss 6.9613   LearningRate 0.0616   Epoch: 4   Global Step: 24430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:58:06,597-Speed 3402.56 samples/sec   Loss 6.8921   LearningRate 0.0616   Epoch: 4   Global Step: 24440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:58:09,614-Speed 3394.12 samples/sec   Loss 7.0286   LearningRate 0.0616   Epoch: 4   Global Step: 24450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:58:12,647-Speed 3377.66 samples/sec   Loss 7.2039   LearningRate 0.0616   Epoch: 4   Global Step: 24460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:58:15,662-Speed 3396.36 samples/sec   Loss 6.9931   LearningRate 0.0616   Epoch: 4   Global Step: 24470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:58:18,678-Speed 3396.60 samples/sec   Loss 6.8814   LearningRate 0.0616   Epoch: 4   Global Step: 24480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:58:21,693-Speed 3397.59 samples/sec   Loss 6.9578   LearningRate 0.0616   Epoch: 4   Global Step: 24490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:58:24,686-Speed 3421.81 samples/sec   Loss 7.0226   LearningRate 0.0616   Epoch: 4   Global Step: 24500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:58:27,699-Speed 3398.83 samples/sec   Loss 6.9571   LearningRate 0.0615   Epoch: 4   Global Step: 24510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:58:30,713-Speed 3398.22 samples/sec   Loss 6.9832   LearningRate 0.0615   Epoch: 4   Global Step: 24520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:58:33,731-Speed 3393.71 samples/sec   Loss 6.9816   LearningRate 0.0615   Epoch: 4   Global Step: 24530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:58:36,955-Speed 3177.92 samples/sec   Loss 6.8685   LearningRate 0.0615   Epoch: 4   Global Step: 24540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:58:39,989-Speed 3375.26 samples/sec   Loss 7.1216   LearningRate 0.0615   Epoch: 4   Global Step: 24550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:58:43,010-Speed 3391.00 samples/sec   Loss 7.0109   LearningRate 0.0615   Epoch: 4   Global Step: 24560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:58:46,022-Speed 3400.89 samples/sec   Loss 6.9841   LearningRate 0.0615   Epoch: 4   Global Step: 24570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:58:49,052-Speed 3380.11 samples/sec   Loss 6.9929   LearningRate 0.0614   Epoch: 4   Global Step: 24580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:58:52,070-Speed 3394.22 samples/sec   Loss 6.9676   LearningRate 0.0614   Epoch: 4   Global Step: 24590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:58:55,088-Speed 3393.46 samples/sec   Loss 6.9112   LearningRate 0.0614   Epoch: 4   Global Step: 24600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:58:58,153-Speed 3341.61 samples/sec   Loss 6.8743   LearningRate 0.0614   Epoch: 4   Global Step: 24610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:59:01,190-Speed 3373.10 samples/sec   Loss 7.0400   LearningRate 0.0614   Epoch: 4   Global Step: 24620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:59:04,407-Speed 3183.28 samples/sec   Loss 6.8661   LearningRate 0.0614   Epoch: 4   Global Step: 24630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:59:07,420-Speed 3400.29 samples/sec   Loss 6.9797   LearningRate 0.0614   Epoch: 4   Global Step: 24640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:59:10,433-Speed 3399.05 samples/sec   Loss 6.9435   LearningRate 0.0613   Epoch: 4   Global Step: 24650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:59:13,427-Speed 3421.21 samples/sec   Loss 6.9783   LearningRate 0.0613   Epoch: 4   Global Step: 24660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:59:16,441-Speed 3398.12 samples/sec   Loss 7.0019   LearningRate 0.0613   Epoch: 4   Global Step: 24670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:59:19,458-Speed 3395.57 samples/sec   Loss 7.0640   LearningRate 0.0613   Epoch: 4   Global Step: 24680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:59:22,485-Speed 3383.48 samples/sec   Loss 6.9274   LearningRate 0.0613   Epoch: 4   Global Step: 24690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:59:25,529-Speed 3364.65 samples/sec   Loss 7.0313   LearningRate 0.0613   Epoch: 4   Global Step: 24700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:59:28,569-Speed 3368.09 samples/sec   Loss 6.8108   LearningRate 0.0613   Epoch: 4   Global Step: 24710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:59:31,589-Speed 3392.74 samples/sec   Loss 6.7941   LearningRate 0.0613   Epoch: 4   Global Step: 24720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:59:34,608-Speed 3392.86 samples/sec   Loss 6.9716   LearningRate 0.0612   Epoch: 4   Global Step: 24730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:59:37,624-Speed 3396.47 samples/sec   Loss 7.0566   LearningRate 0.0612   Epoch: 4   Global Step: 24740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:59:40,643-Speed 3392.64 samples/sec   Loss 7.0487   LearningRate 0.0612   Epoch: 4   Global Step: 24750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 03:59:43,667-Speed 3387.18 samples/sec   Loss 6.9587   LearningRate 0.0612   Epoch: 4   Global Step: 24760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:59:46,722-Speed 3352.22 samples/sec   Loss 6.8824   LearningRate 0.0612   Epoch: 4   Global Step: 24770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:59:49,738-Speed 3396.70 samples/sec   Loss 6.9587   LearningRate 0.0612   Epoch: 4   Global Step: 24780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:59:52,752-Speed 3397.36 samples/sec   Loss 7.0160   LearningRate 0.0612   Epoch: 4   Global Step: 24790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:59:55,771-Speed 3393.44 samples/sec   Loss 6.8691   LearningRate 0.0611   Epoch: 4   Global Step: 24800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 03:59:58,786-Speed 3396.08 samples/sec   Loss 7.0207   LearningRate 0.0611   Epoch: 4   Global Step: 24810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:00:01,803-Speed 3395.59 samples/sec   Loss 6.9553   LearningRate 0.0611   Epoch: 4   Global Step: 24820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:00:04,817-Speed 3397.96 samples/sec   Loss 6.9049   LearningRate 0.0611   Epoch: 4   Global Step: 24830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:00:07,832-Speed 3397.37 samples/sec   Loss 6.7832   LearningRate 0.0611   Epoch: 4   Global Step: 24840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:00:10,860-Speed 3382.67 samples/sec   Loss 6.9198   LearningRate 0.0611   Epoch: 4   Global Step: 24850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:00:13,909-Speed 3359.10 samples/sec   Loss 6.9813   LearningRate 0.0611   Epoch: 4   Global Step: 24860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:00:16,937-Speed 3382.53 samples/sec   Loss 6.9101   LearningRate 0.0610   Epoch: 4   Global Step: 24870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:00:19,957-Speed 3392.07 samples/sec   Loss 7.0022   LearningRate 0.0610   Epoch: 4   Global Step: 24880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:00:22,973-Speed 3395.16 samples/sec   Loss 6.8952   LearningRate 0.0610   Epoch: 4   Global Step: 24890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:00:25,998-Speed 3386.46 samples/sec   Loss 6.8327   LearningRate 0.0610   Epoch: 4   Global Step: 24900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:00:29,013-Speed 3396.93 samples/sec   Loss 6.8397   LearningRate 0.0610   Epoch: 4   Global Step: 24910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:00:32,054-Speed 3367.99 samples/sec   Loss 7.0024   LearningRate 0.0610   Epoch: 4   Global Step: 24920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:00:35,097-Speed 3366.14 samples/sec   Loss 6.9247   LearningRate 0.0610   Epoch: 4   Global Step: 24930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:00:38,166-Speed 3337.54 samples/sec   Loss 6.8048   LearningRate 0.0609   Epoch: 4   Global Step: 24940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:00:41,168-Speed 3412.27 samples/sec   Loss 6.7550   LearningRate 0.0609   Epoch: 4   Global Step: 24950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:00:44,189-Speed 3389.89 samples/sec   Loss 6.8939   LearningRate 0.0609   Epoch: 4   Global Step: 24960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:00:47,248-Speed 3351.51 samples/sec   Loss 6.9368   LearningRate 0.0609   Epoch: 4   Global Step: 24970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:00:50,267-Speed 3392.90 samples/sec   Loss 6.9149   LearningRate 0.0609   Epoch: 4   Global Step: 24980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:00:53,290-Speed 3388.37 samples/sec   Loss 7.0899   LearningRate 0.0609   Epoch: 4   Global Step: 24990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:00:56,308-Speed 3392.56 samples/sec   Loss 6.8495   LearningRate 0.0609   Epoch: 4   Global Step: 25000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:00:59,407-Speed 3305.30 samples/sec   Loss 6.9051   LearningRate 0.0609   Epoch: 4   Global Step: 25010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:01:02,429-Speed 3389.44 samples/sec   Loss 6.9963   LearningRate 0.0608   Epoch: 4   Global Step: 25020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:01:05,475-Speed 3363.11 samples/sec   Loss 6.9077   LearningRate 0.0608   Epoch: 4   Global Step: 25030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:01:08,494-Speed 3392.50 samples/sec   Loss 6.9277   LearningRate 0.0608   Epoch: 4   Global Step: 25040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:01:11,525-Speed 3380.13 samples/sec   Loss 6.9139   LearningRate 0.0608   Epoch: 4   Global Step: 25050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:14,544-Speed 3392.35 samples/sec   Loss 6.8926   LearningRate 0.0608   Epoch: 4   Global Step: 25060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:17,558-Speed 3398.06 samples/sec   Loss 6.7052   LearningRate 0.0608   Epoch: 4   Global Step: 25070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:20,574-Speed 3396.40 samples/sec   Loss 6.8686   LearningRate 0.0608   Epoch: 4   Global Step: 25080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:23,603-Speed 3381.19 samples/sec   Loss 6.8747   LearningRate 0.0607   Epoch: 4   Global Step: 25090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:26,644-Speed 3367.12 samples/sec   Loss 6.8772   LearningRate 0.0607   Epoch: 4   Global Step: 25100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:29,668-Speed 3387.09 samples/sec   Loss 6.8912   LearningRate 0.0607   Epoch: 4   Global Step: 25110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:32,689-Speed 3390.87 samples/sec   Loss 6.8752   LearningRate 0.0607   Epoch: 4   Global Step: 25120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:35,704-Speed 3397.61 samples/sec   Loss 6.9194   LearningRate 0.0607   Epoch: 4   Global Step: 25130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:38,726-Speed 3389.39 samples/sec   Loss 6.7855   LearningRate 0.0607   Epoch: 4   Global Step: 25140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:41,750-Speed 3387.40 samples/sec   Loss 6.9122   LearningRate 0.0607   Epoch: 4   Global Step: 25150   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:01:44,752-Speed 3411.62 samples/sec   Loss 7.0176   LearningRate 0.0606   Epoch: 4   Global Step: 25160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:47,790-Speed 3371.56 samples/sec   Loss 6.7969   LearningRate 0.0606   Epoch: 4   Global Step: 25170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:50,808-Speed 3393.66 samples/sec   Loss 6.9564   LearningRate 0.0606   Epoch: 4   Global Step: 25180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:53,838-Speed 3380.48 samples/sec   Loss 6.9789   LearningRate 0.0606   Epoch: 4   Global Step: 25190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:56,864-Speed 3383.79 samples/sec   Loss 6.7670   LearningRate 0.0606   Epoch: 4   Global Step: 25200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:01:59,941-Speed 3329.39 samples/sec   Loss 6.8831   LearningRate 0.0606   Epoch: 4   Global Step: 25210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:02:02,965-Speed 3387.35 samples/sec   Loss 6.8846   LearningRate 0.0606   Epoch: 4   Global Step: 25220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:02:05,994-Speed 3380.54 samples/sec   Loss 6.9633   LearningRate 0.0606   Epoch: 4   Global Step: 25230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:02:09,019-Speed 3386.45 samples/sec   Loss 6.8487   LearningRate 0.0605   Epoch: 4   Global Step: 25240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:02:12,044-Speed 3385.56 samples/sec   Loss 6.8384   LearningRate 0.0605   Epoch: 4   Global Step: 25250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:02:15,078-Speed 3375.81 samples/sec   Loss 6.8887   LearningRate 0.0605   Epoch: 4   Global Step: 25260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:02:18,151-Speed 3334.08 samples/sec   Loss 7.0122   LearningRate 0.0605   Epoch: 4   Global Step: 25270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:02:21,177-Speed 3384.33 samples/sec   Loss 7.0200   LearningRate 0.0605   Epoch: 4   Global Step: 25280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:02:24,206-Speed 3381.02 samples/sec   Loss 6.8980   LearningRate 0.0605   Epoch: 4   Global Step: 25290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:02:27,248-Speed 3367.83 samples/sec   Loss 6.9379   LearningRate 0.0605   Epoch: 4   Global Step: 25300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:02:30,281-Speed 3377.00 samples/sec   Loss 6.8976   LearningRate 0.0604   Epoch: 4   Global Step: 25310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:02:33,381-Speed 3303.03 samples/sec   Loss 6.8942   LearningRate 0.0604   Epoch: 4   Global Step: 25320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:02:36,409-Speed 3382.97 samples/sec   Loss 6.8847   LearningRate 0.0604   Epoch: 4   Global Step: 25330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:02:39,432-Speed 3388.62 samples/sec   Loss 6.7174   LearningRate 0.0604   Epoch: 4   Global Step: 25340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:02:42,472-Speed 3369.04 samples/sec   Loss 6.9310   LearningRate 0.0604   Epoch: 4   Global Step: 25350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:02:45,511-Speed 3370.19 samples/sec   Loss 6.8429   LearningRate 0.0604   Epoch: 4   Global Step: 25360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:02:48,541-Speed 3380.01 samples/sec   Loss 6.8986   LearningRate 0.0604   Epoch: 4   Global Step: 25370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:02:51,566-Speed 3385.94 samples/sec   Loss 6.7821   LearningRate 0.0603   Epoch: 4   Global Step: 25380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:02:54,591-Speed 3386.22 samples/sec   Loss 6.9524   LearningRate 0.0603   Epoch: 4   Global Step: 25390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:02:57,615-Speed 3386.80 samples/sec   Loss 6.7681   LearningRate 0.0603   Epoch: 4   Global Step: 25400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:03:00,641-Speed 3384.67 samples/sec   Loss 6.9054   LearningRate 0.0603   Epoch: 4   Global Step: 25410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:03:03,741-Speed 3304.45 samples/sec   Loss 6.8082   LearningRate 0.0603   Epoch: 4   Global Step: 25420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:03:06,764-Speed 3388.26 samples/sec   Loss 6.9432   LearningRate 0.0603   Epoch: 4   Global Step: 25430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:03:09,789-Speed 3385.26 samples/sec   Loss 6.8705   LearningRate 0.0603   Epoch: 4   Global Step: 25440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:03:12,832-Speed 3366.52 samples/sec   Loss 6.8071   LearningRate 0.0602   Epoch: 4   Global Step: 25450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:03:15,883-Speed 3356.85 samples/sec   Loss 7.0616   LearningRate 0.0602   Epoch: 4   Global Step: 25460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:03:18,922-Speed 3370.02 samples/sec   Loss 6.9063   LearningRate 0.0602   Epoch: 4   Global Step: 25470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:03:21,949-Speed 3384.19 samples/sec   Loss 6.9006   LearningRate 0.0602   Epoch: 4   Global Step: 25480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:03:24,979-Speed 3380.14 samples/sec   Loss 6.8240   LearningRate 0.0602   Epoch: 4   Global Step: 25490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:03:28,037-Speed 3349.07 samples/sec   Loss 6.7634   LearningRate 0.0602   Epoch: 4   Global Step: 25500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:03:31,057-Speed 3391.54 samples/sec   Loss 6.9503   LearningRate 0.0602   Epoch: 4   Global Step: 25510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:03:34,083-Speed 3384.66 samples/sec   Loss 6.6305   LearningRate 0.0602   Epoch: 4   Global Step: 25520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:03:37,108-Speed 3385.89 samples/sec   Loss 6.8707   LearningRate 0.0601   Epoch: 4   Global Step: 25530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:03:40,134-Speed 3385.36 samples/sec   Loss 7.0121   LearningRate 0.0601   Epoch: 4   Global Step: 25540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:03:43,179-Speed 3363.11 samples/sec   Loss 6.6298   LearningRate 0.0601   Epoch: 4   Global Step: 25550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:03:46,206-Speed 3384.43 samples/sec   Loss 6.8233   LearningRate 0.0601   Epoch: 4   Global Step: 25560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:03:49,231-Speed 3385.95 samples/sec   Loss 6.7964   LearningRate 0.0601   Epoch: 4   Global Step: 25570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:03:52,257-Speed 3385.65 samples/sec   Loss 6.6841   LearningRate 0.0601   Epoch: 4   Global Step: 25580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:03:55,282-Speed 3385.11 samples/sec   Loss 7.0152   LearningRate 0.0601   Epoch: 4   Global Step: 25590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:03:58,320-Speed 3371.31 samples/sec   Loss 6.7267   LearningRate 0.0600   Epoch: 4   Global Step: 25600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:01,346-Speed 3385.32 samples/sec   Loss 6.8325   LearningRate 0.0600   Epoch: 4   Global Step: 25610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:04,376-Speed 3380.46 samples/sec   Loss 6.8505   LearningRate 0.0600   Epoch: 4   Global Step: 25620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:07,403-Speed 3383.76 samples/sec   Loss 6.9037   LearningRate 0.0600   Epoch: 4   Global Step: 25630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:10,441-Speed 3371.22 samples/sec   Loss 6.8035   LearningRate 0.0600   Epoch: 4   Global Step: 25640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:13,466-Speed 3386.09 samples/sec   Loss 6.7800   LearningRate 0.0600   Epoch: 4   Global Step: 25650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:16,518-Speed 3355.46 samples/sec   Loss 6.9495   LearningRate 0.0600   Epoch: 4   Global Step: 25660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:19,541-Speed 3388.18 samples/sec   Loss 6.8498   LearningRate 0.0599   Epoch: 4   Global Step: 25670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:22,565-Speed 3387.31 samples/sec   Loss 6.8386   LearningRate 0.0599   Epoch: 4   Global Step: 25680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:25,588-Speed 3387.83 samples/sec   Loss 6.8264   LearningRate 0.0599   Epoch: 4   Global Step: 25690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:28,616-Speed 3382.61 samples/sec   Loss 6.9016   LearningRate 0.0599   Epoch: 4   Global Step: 25700   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:04:31,623-Speed 3406.44 samples/sec   Loss 6.7773   LearningRate 0.0599   Epoch: 4   Global Step: 25710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:34,647-Speed 3387.36 samples/sec   Loss 6.7116   LearningRate 0.0599   Epoch: 4   Global Step: 25720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:37,727-Speed 3324.97 samples/sec   Loss 6.7152   LearningRate 0.0599   Epoch: 4   Global Step: 25730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:40,799-Speed 3334.50 samples/sec   Loss 6.8629   LearningRate 0.0599   Epoch: 4   Global Step: 25740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:43,872-Speed 3332.95 samples/sec   Loss 6.9347   LearningRate 0.0598   Epoch: 4   Global Step: 25750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:46,961-Speed 3316.66 samples/sec   Loss 6.8273   LearningRate 0.0598   Epoch: 4   Global Step: 25760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:50,011-Speed 3357.40 samples/sec   Loss 6.8011   LearningRate 0.0598   Epoch: 4   Global Step: 25770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:53,124-Speed 3290.89 samples/sec   Loss 6.8964   LearningRate 0.0598   Epoch: 4   Global Step: 25780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:56,195-Speed 3335.05 samples/sec   Loss 6.6366   LearningRate 0.0598   Epoch: 4   Global Step: 25790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:04:59,243-Speed 3359.56 samples/sec   Loss 6.7981   LearningRate 0.0598   Epoch: 4   Global Step: 25800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:02,290-Speed 3362.13 samples/sec   Loss 6.9820   LearningRate 0.0598   Epoch: 4   Global Step: 25810   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:05:05,298-Speed 3404.52 samples/sec   Loss 6.8463   LearningRate 0.0597   Epoch: 4   Global Step: 25820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:08,321-Speed 3388.24 samples/sec   Loss 6.8264   LearningRate 0.0597   Epoch: 4   Global Step: 25830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:11,358-Speed 3373.34 samples/sec   Loss 6.8848   LearningRate 0.0597   Epoch: 4   Global Step: 25840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:14,386-Speed 3382.04 samples/sec   Loss 6.7269   LearningRate 0.0597   Epoch: 4   Global Step: 25850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:17,408-Speed 3389.14 samples/sec   Loss 6.9116   LearningRate 0.0597   Epoch: 4   Global Step: 25860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:20,435-Speed 3383.58 samples/sec   Loss 6.8238   LearningRate 0.0597   Epoch: 4   Global Step: 25870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:23,459-Speed 3388.05 samples/sec   Loss 6.9882   LearningRate 0.0597   Epoch: 4   Global Step: 25880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:26,590-Speed 3271.50 samples/sec   Loss 6.9246   LearningRate 0.0597   Epoch: 4   Global Step: 25890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:29,727-Speed 3264.43 samples/sec   Loss 6.7857   LearningRate 0.0596   Epoch: 4   Global Step: 25900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:32,753-Speed 3385.47 samples/sec   Loss 6.7974   LearningRate 0.0596   Epoch: 4   Global Step: 25910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:35,758-Speed 3408.90 samples/sec   Loss 6.8032   LearningRate 0.0596   Epoch: 4   Global Step: 25920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:38,786-Speed 3381.57 samples/sec   Loss 6.8789   LearningRate 0.0596   Epoch: 4   Global Step: 25930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:41,808-Speed 3389.46 samples/sec   Loss 7.0051   LearningRate 0.0596   Epoch: 4   Global Step: 25940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:44,834-Speed 3384.55 samples/sec   Loss 6.7266   LearningRate 0.0596   Epoch: 4   Global Step: 25950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:47,867-Speed 3377.75 samples/sec   Loss 6.9106   LearningRate 0.0596   Epoch: 4   Global Step: 25960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:50,890-Speed 3388.14 samples/sec   Loss 6.9267   LearningRate 0.0595   Epoch: 4   Global Step: 25970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:53,916-Speed 3384.05 samples/sec   Loss 6.7661   LearningRate 0.0595   Epoch: 4   Global Step: 25980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:56,937-Speed 3391.28 samples/sec   Loss 6.8812   LearningRate 0.0595   Epoch: 4   Global Step: 25990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:05:59,964-Speed 3383.44 samples/sec   Loss 6.8243   LearningRate 0.0595   Epoch: 4   Global Step: 26000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:06:43,508-[lfw][26000]XNorm: 23.087392
Training: 2022-04-27 04:06:43,509-[lfw][26000]Accuracy-Flip: 0.99750+-0.00261
Training: 2022-04-27 04:06:43,509-[lfw][26000]Accuracy-Highest: 0.99750
Training: 2022-04-27 04:07:34,265-[cfp_fp][26000]XNorm: 20.063840
Training: 2022-04-27 04:07:34,266-[cfp_fp][26000]Accuracy-Flip: 0.94443+-0.01470
Training: 2022-04-27 04:07:34,266-[cfp_fp][26000]Accuracy-Highest: 0.94443
Training: 2022-04-27 04:08:17,832-[agedb_30][26000]XNorm: 22.752791
Training: 2022-04-27 04:08:17,832-[agedb_30][26000]Accuracy-Flip: 0.96917+-0.00955
Training: 2022-04-27 04:08:17,833-[agedb_30][26000]Accuracy-Highest: 0.97167
Training: 2022-04-27 04:08:20,845-Speed 72.69 samples/sec   Loss 6.8609   LearningRate 0.0595   Epoch: 4   Global Step: 26010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:08:23,851-Speed 3408.32 samples/sec   Loss 6.7603   LearningRate 0.0595   Epoch: 4   Global Step: 26020   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:08:26,842-Speed 3424.34 samples/sec   Loss 6.8454   LearningRate 0.0595   Epoch: 4   Global Step: 26030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:08:29,850-Speed 3404.92 samples/sec   Loss 6.9258   LearningRate 0.0594   Epoch: 4   Global Step: 26040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:08:32,857-Speed 3405.43 samples/sec   Loss 6.7818   LearningRate 0.0594   Epoch: 4   Global Step: 26050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:08:36,100-Speed 3158.13 samples/sec   Loss 6.6342   LearningRate 0.0594   Epoch: 4   Global Step: 26060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:08:39,136-Speed 3374.04 samples/sec   Loss 6.5885   LearningRate 0.0594   Epoch: 4   Global Step: 26070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:08:42,152-Speed 3396.66 samples/sec   Loss 6.8445   LearningRate 0.0594   Epoch: 4   Global Step: 26080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:08:45,168-Speed 3395.45 samples/sec   Loss 6.8144   LearningRate 0.0594   Epoch: 4   Global Step: 26090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:08:48,186-Speed 3394.17 samples/sec   Loss 6.7033   LearningRate 0.0594   Epoch: 4   Global Step: 26100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:08:51,226-Speed 3369.24 samples/sec   Loss 6.8972   LearningRate 0.0594   Epoch: 4   Global Step: 26110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:08:54,255-Speed 3381.91 samples/sec   Loss 6.8564   LearningRate 0.0593   Epoch: 4   Global Step: 26120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:08:57,279-Speed 3387.22 samples/sec   Loss 6.9184   LearningRate 0.0593   Epoch: 4   Global Step: 26130   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:09:00,356-Speed 3329.01 samples/sec   Loss 6.7975   LearningRate 0.0593   Epoch: 4   Global Step: 26140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:09:03,577-Speed 3179.70 samples/sec   Loss 6.8512   LearningRate 0.0593   Epoch: 4   Global Step: 26150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:09:06,611-Speed 3375.85 samples/sec   Loss 6.8672   LearningRate 0.0593   Epoch: 4   Global Step: 26160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:09:09,639-Speed 3381.98 samples/sec   Loss 6.7486   LearningRate 0.0593   Epoch: 4   Global Step: 26170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:09:12,678-Speed 3370.78 samples/sec   Loss 6.8675   LearningRate 0.0593   Epoch: 4   Global Step: 26180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:09:15,709-Speed 3379.79 samples/sec   Loss 6.8344   LearningRate 0.0592   Epoch: 4   Global Step: 26190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:09:18,731-Speed 3389.06 samples/sec   Loss 6.7177   LearningRate 0.0592   Epoch: 4   Global Step: 26200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:09:21,753-Speed 3389.05 samples/sec   Loss 6.7505   LearningRate 0.0592   Epoch: 4   Global Step: 26210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:09:24,793-Speed 3369.59 samples/sec   Loss 6.9399   LearningRate 0.0592   Epoch: 4   Global Step: 26220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:09:27,826-Speed 3377.18 samples/sec   Loss 6.7522   LearningRate 0.0592   Epoch: 4   Global Step: 26230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:09:30,852-Speed 3384.71 samples/sec   Loss 6.7169   LearningRate 0.0592   Epoch: 4   Global Step: 26240   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:09:33,851-Speed 3415.18 samples/sec   Loss 6.7648   LearningRate 0.0592   Epoch: 4   Global Step: 26250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:09:36,886-Speed 3375.02 samples/sec   Loss 6.8394   LearningRate 0.0591   Epoch: 4   Global Step: 26260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:09:39,904-Speed 3393.28 samples/sec   Loss 6.7351   LearningRate 0.0591   Epoch: 4   Global Step: 26270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:09:42,916-Speed 3400.73 samples/sec   Loss 6.7181   LearningRate 0.0591   Epoch: 4   Global Step: 26280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:09:45,904-Speed 3428.65 samples/sec   Loss 6.7559   LearningRate 0.0591   Epoch: 4   Global Step: 26290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:09:48,927-Speed 3387.76 samples/sec   Loss 6.7363   LearningRate 0.0591   Epoch: 4   Global Step: 26300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:09:51,943-Speed 3396.56 samples/sec   Loss 6.5860   LearningRate 0.0591   Epoch: 4   Global Step: 26310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:09:54,957-Speed 3398.42 samples/sec   Loss 6.6887   LearningRate 0.0591   Epoch: 4   Global Step: 26320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:09:57,967-Speed 3401.98 samples/sec   Loss 6.8324   LearningRate 0.0591   Epoch: 4   Global Step: 26330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:10:00,972-Speed 3409.19 samples/sec   Loss 6.9529   LearningRate 0.0590   Epoch: 4   Global Step: 26340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:10:03,981-Speed 3403.70 samples/sec   Loss 6.5581   LearningRate 0.0590   Epoch: 4   Global Step: 26350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:10:06,996-Speed 3397.43 samples/sec   Loss 6.5834   LearningRate 0.0590   Epoch: 4   Global Step: 26360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:10:10,006-Speed 3401.60 samples/sec   Loss 6.8721   LearningRate 0.0590   Epoch: 4   Global Step: 26370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:10:13,033-Speed 3384.05 samples/sec   Loss 6.7056   LearningRate 0.0590   Epoch: 4   Global Step: 26380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:10:16,055-Speed 3389.80 samples/sec   Loss 6.7667   LearningRate 0.0590   Epoch: 4   Global Step: 26390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:10:19,103-Speed 3360.12 samples/sec   Loss 6.7467   LearningRate 0.0590   Epoch: 4   Global Step: 26400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:10:22,110-Speed 3406.73 samples/sec   Loss 6.6128   LearningRate 0.0589   Epoch: 4   Global Step: 26410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:10:25,128-Speed 3393.73 samples/sec   Loss 6.8131   LearningRate 0.0589   Epoch: 4   Global Step: 26420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:10:28,132-Speed 3408.99 samples/sec   Loss 6.8180   LearningRate 0.0589   Epoch: 4   Global Step: 26430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:10:31,138-Speed 3407.60 samples/sec   Loss 6.7356   LearningRate 0.0589   Epoch: 4   Global Step: 26440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:10:34,148-Speed 3403.55 samples/sec   Loss 6.7482   LearningRate 0.0589   Epoch: 4   Global Step: 26450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:10:37,171-Speed 3387.28 samples/sec   Loss 6.6906   LearningRate 0.0589   Epoch: 4   Global Step: 26460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:10:40,235-Speed 3342.56 samples/sec   Loss 6.8702   LearningRate 0.0589   Epoch: 4   Global Step: 26470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:10:43,259-Speed 3388.07 samples/sec   Loss 6.7702   LearningRate 0.0589   Epoch: 4   Global Step: 26480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:10:46,281-Speed 3388.89 samples/sec   Loss 6.7452   LearningRate 0.0588   Epoch: 4   Global Step: 26490   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:10:49,275-Speed 3422.10 samples/sec   Loss 6.6135   LearningRate 0.0588   Epoch: 4   Global Step: 26500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:10:52,283-Speed 3404.91 samples/sec   Loss 6.7105   LearningRate 0.0588   Epoch: 4   Global Step: 26510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:10:55,291-Speed 3405.73 samples/sec   Loss 6.7238   LearningRate 0.0588   Epoch: 4   Global Step: 26520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:10:58,299-Speed 3404.77 samples/sec   Loss 6.6160   LearningRate 0.0588   Epoch: 4   Global Step: 26530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:01,348-Speed 3359.52 samples/sec   Loss 6.5864   LearningRate 0.0588   Epoch: 4   Global Step: 26540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:04,366-Speed 3393.94 samples/sec   Loss 6.6253   LearningRate 0.0588   Epoch: 4   Global Step: 26550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:07,377-Speed 3401.63 samples/sec   Loss 6.6714   LearningRate 0.0587   Epoch: 4   Global Step: 26560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:10,453-Speed 3329.89 samples/sec   Loss 6.8414   LearningRate 0.0587   Epoch: 4   Global Step: 26570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:13,476-Speed 3387.99 samples/sec   Loss 6.9059   LearningRate 0.0587   Epoch: 4   Global Step: 26580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:16,493-Speed 3394.82 samples/sec   Loss 6.7918   LearningRate 0.0587   Epoch: 4   Global Step: 26590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:19,486-Speed 3422.20 samples/sec   Loss 6.6893   LearningRate 0.0587   Epoch: 4   Global Step: 26600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:22,497-Speed 3401.01 samples/sec   Loss 6.8121   LearningRate 0.0587   Epoch: 4   Global Step: 26610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:25,523-Speed 3385.85 samples/sec   Loss 6.7591   LearningRate 0.0587   Epoch: 4   Global Step: 26620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:28,564-Speed 3367.56 samples/sec   Loss 6.7503   LearningRate 0.0586   Epoch: 4   Global Step: 26630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:31,581-Speed 3394.89 samples/sec   Loss 6.7778   LearningRate 0.0586   Epoch: 4   Global Step: 26640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:34,604-Speed 3388.47 samples/sec   Loss 6.8014   LearningRate 0.0586   Epoch: 4   Global Step: 26650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:37,620-Speed 3395.80 samples/sec   Loss 6.7168   LearningRate 0.0586   Epoch: 4   Global Step: 26660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:40,679-Speed 3348.22 samples/sec   Loss 6.7502   LearningRate 0.0586   Epoch: 4   Global Step: 26670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:43,694-Speed 3397.37 samples/sec   Loss 6.8532   LearningRate 0.0586   Epoch: 4   Global Step: 26680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:46,733-Speed 3370.08 samples/sec   Loss 6.8040   LearningRate 0.0586   Epoch: 4   Global Step: 26690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:49,763-Speed 3380.48 samples/sec   Loss 6.7219   LearningRate 0.0586   Epoch: 4   Global Step: 26700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:52,780-Speed 3394.65 samples/sec   Loss 6.7279   LearningRate 0.0585   Epoch: 4   Global Step: 26710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:11:55,777-Speed 3417.39 samples/sec   Loss 6.8945   LearningRate 0.0585   Epoch: 4   Global Step: 26720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:11:58,790-Speed 3400.02 samples/sec   Loss 6.7330   LearningRate 0.0585   Epoch: 4   Global Step: 26730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:01,802-Speed 3400.48 samples/sec   Loss 6.7314   LearningRate 0.0585   Epoch: 4   Global Step: 26740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:04,827-Speed 3385.41 samples/sec   Loss 6.7616   LearningRate 0.0585   Epoch: 4   Global Step: 26750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:07,840-Speed 3399.32 samples/sec   Loss 6.7422   LearningRate 0.0585   Epoch: 4   Global Step: 26760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:10,853-Speed 3399.57 samples/sec   Loss 6.6298   LearningRate 0.0585   Epoch: 4   Global Step: 26770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:13,873-Speed 3391.83 samples/sec   Loss 6.6233   LearningRate 0.0584   Epoch: 4   Global Step: 26780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:16,889-Speed 3395.53 samples/sec   Loss 6.7909   LearningRate 0.0584   Epoch: 4   Global Step: 26790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:19,899-Speed 3403.00 samples/sec   Loss 6.7288   LearningRate 0.0584   Epoch: 4   Global Step: 26800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:22,919-Speed 3391.41 samples/sec   Loss 6.8621   LearningRate 0.0584   Epoch: 4   Global Step: 26810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:25,918-Speed 3416.15 samples/sec   Loss 6.9222   LearningRate 0.0584   Epoch: 4   Global Step: 26820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:28,931-Speed 3398.47 samples/sec   Loss 6.6971   LearningRate 0.0584   Epoch: 4   Global Step: 26830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:31,946-Speed 3397.94 samples/sec   Loss 6.6842   LearningRate 0.0584   Epoch: 4   Global Step: 26840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:34,958-Speed 3400.08 samples/sec   Loss 6.7677   LearningRate 0.0584   Epoch: 4   Global Step: 26850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:37,981-Speed 3387.93 samples/sec   Loss 6.6751   LearningRate 0.0583   Epoch: 4   Global Step: 26860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:41,005-Speed 3387.22 samples/sec   Loss 6.7660   LearningRate 0.0583   Epoch: 4   Global Step: 26870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:44,022-Speed 3394.49 samples/sec   Loss 6.8619   LearningRate 0.0583   Epoch: 4   Global Step: 26880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:47,040-Speed 3394.69 samples/sec   Loss 6.6523   LearningRate 0.0583   Epoch: 4   Global Step: 26890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:50,062-Speed 3389.43 samples/sec   Loss 6.8414   LearningRate 0.0583   Epoch: 4   Global Step: 26900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:53,083-Speed 3390.18 samples/sec   Loss 6.8157   LearningRate 0.0583   Epoch: 4   Global Step: 26910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:12:56,099-Speed 3395.33 samples/sec   Loss 6.7727   LearningRate 0.0583   Epoch: 4   Global Step: 26920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:12:59,128-Speed 3382.05 samples/sec   Loss 6.7874   LearningRate 0.0582   Epoch: 4   Global Step: 26930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:02,157-Speed 3381.24 samples/sec   Loss 6.7154   LearningRate 0.0582   Epoch: 4   Global Step: 26940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:05,227-Speed 3336.77 samples/sec   Loss 6.7189   LearningRate 0.0582   Epoch: 4   Global Step: 26950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:08,271-Speed 3364.77 samples/sec   Loss 6.8711   LearningRate 0.0582   Epoch: 4   Global Step: 26960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:11,291-Speed 3391.39 samples/sec   Loss 6.9450   LearningRate 0.0582   Epoch: 4   Global Step: 26970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:14,310-Speed 3393.54 samples/sec   Loss 6.6661   LearningRate 0.0582   Epoch: 4   Global Step: 26980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:17,326-Speed 3395.46 samples/sec   Loss 6.7253   LearningRate 0.0582   Epoch: 4   Global Step: 26990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:20,337-Speed 3401.55 samples/sec   Loss 6.8504   LearningRate 0.0582   Epoch: 4   Global Step: 27000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:23,373-Speed 3374.42 samples/sec   Loss 6.7398   LearningRate 0.0581   Epoch: 4   Global Step: 27010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:26,372-Speed 3415.40 samples/sec   Loss 6.5897   LearningRate 0.0581   Epoch: 4   Global Step: 27020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:29,387-Speed 3396.64 samples/sec   Loss 6.6672   LearningRate 0.0581   Epoch: 4   Global Step: 27030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:32,407-Speed 3391.24 samples/sec   Loss 6.7585   LearningRate 0.0581   Epoch: 4   Global Step: 27040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:35,425-Speed 3393.57 samples/sec   Loss 6.6929   LearningRate 0.0581   Epoch: 4   Global Step: 27050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:38,438-Speed 3399.14 samples/sec   Loss 6.6121   LearningRate 0.0581   Epoch: 4   Global Step: 27060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:41,459-Speed 3390.91 samples/sec   Loss 6.6858   LearningRate 0.0581   Epoch: 4   Global Step: 27070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:44,476-Speed 3395.55 samples/sec   Loss 6.7128   LearningRate 0.0580   Epoch: 4   Global Step: 27080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:47,513-Speed 3372.25 samples/sec   Loss 6.6505   LearningRate 0.0580   Epoch: 4   Global Step: 27090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:50,537-Speed 3387.41 samples/sec   Loss 6.6508   LearningRate 0.0580   Epoch: 4   Global Step: 27100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:53,555-Speed 3393.81 samples/sec   Loss 6.6177   LearningRate 0.0580   Epoch: 4   Global Step: 27110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:13:56,569-Speed 3398.55 samples/sec   Loss 6.7253   LearningRate 0.0580   Epoch: 4   Global Step: 27120   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:13:59,585-Speed 3395.46 samples/sec   Loss 6.6897   LearningRate 0.0580   Epoch: 4   Global Step: 27130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:02,611-Speed 3384.23 samples/sec   Loss 6.6404   LearningRate 0.0580   Epoch: 4   Global Step: 27140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:05,627-Speed 3397.21 samples/sec   Loss 6.6928   LearningRate 0.0580   Epoch: 4   Global Step: 27150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:08,646-Speed 3392.13 samples/sec   Loss 6.7091   LearningRate 0.0579   Epoch: 4   Global Step: 27160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:11,670-Speed 3387.70 samples/sec   Loss 6.5581   LearningRate 0.0579   Epoch: 4   Global Step: 27170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:14,732-Speed 3344.72 samples/sec   Loss 6.7593   LearningRate 0.0579   Epoch: 4   Global Step: 27180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:17,778-Speed 3362.61 samples/sec   Loss 6.6592   LearningRate 0.0579   Epoch: 4   Global Step: 27190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:20,817-Speed 3370.75 samples/sec   Loss 6.7124   LearningRate 0.0579   Epoch: 4   Global Step: 27200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:23,847-Speed 3379.85 samples/sec   Loss 6.6230   LearningRate 0.0579   Epoch: 4   Global Step: 27210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:26,865-Speed 3393.32 samples/sec   Loss 6.6559   LearningRate 0.0579   Epoch: 4   Global Step: 27220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:29,863-Speed 3416.23 samples/sec   Loss 6.7926   LearningRate 0.0578   Epoch: 4   Global Step: 27230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:32,881-Speed 3394.45 samples/sec   Loss 6.7405   LearningRate 0.0578   Epoch: 4   Global Step: 27240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:35,920-Speed 3370.49 samples/sec   Loss 6.5588   LearningRate 0.0578   Epoch: 4   Global Step: 27250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:38,943-Speed 3388.08 samples/sec   Loss 6.6366   LearningRate 0.0578   Epoch: 4   Global Step: 27260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:41,961-Speed 3393.99 samples/sec   Loss 6.6606   LearningRate 0.0578   Epoch: 4   Global Step: 27270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:44,985-Speed 3387.44 samples/sec   Loss 6.5953   LearningRate 0.0578   Epoch: 4   Global Step: 27280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:14:47,990-Speed 3407.83 samples/sec   Loss 6.7246   LearningRate 0.0578   Epoch: 4   Global Step: 27290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:14:51,004-Speed 3398.22 samples/sec   Loss 6.8245   LearningRate 0.0578   Epoch: 4   Global Step: 27300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 04:14:54,024-Speed 3392.08 samples/sec   Loss 6.7670   LearningRate 0.0577   Epoch: 4   Global Step: 27310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 04:14:57,049-Speed 3385.86 samples/sec   Loss 6.6087   LearningRate 0.0577   Epoch: 4   Global Step: 27320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 04:15:00,068-Speed 3392.57 samples/sec   Loss 6.8179   LearningRate 0.0577   Epoch: 4   Global Step: 27330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 04:15:03,094-Speed 3384.22 samples/sec   Loss 6.6023   LearningRate 0.0577   Epoch: 4   Global Step: 27340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 04:15:06,130-Speed 3375.53 samples/sec   Loss 6.7188   LearningRate 0.0577   Epoch: 4   Global Step: 27350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 04:15:09,152-Speed 3388.83 samples/sec   Loss 6.5692   LearningRate 0.0577   Epoch: 4   Global Step: 27360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 04:15:12,175-Speed 3388.46 samples/sec   Loss 6.5228   LearningRate 0.0577   Epoch: 4   Global Step: 27370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 04:15:15,191-Speed 3395.49 samples/sec   Loss 6.6627   LearningRate 0.0576   Epoch: 4   Global Step: 27380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 04:15:18,210-Speed 3393.26 samples/sec   Loss 6.7884   LearningRate 0.0576   Epoch: 4   Global Step: 27390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 04:15:21,230-Speed 3390.90 samples/sec   Loss 6.6879   LearningRate 0.0576   Epoch: 4   Global Step: 27400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:15:24,262-Speed 3378.10 samples/sec   Loss 6.6973   LearningRate 0.0576   Epoch: 4   Global Step: 27410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:15:27,290-Speed 3382.36 samples/sec   Loss 6.6793   LearningRate 0.0576   Epoch: 4   Global Step: 27420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:15:30,318-Speed 3383.03 samples/sec   Loss 6.6419   LearningRate 0.0576   Epoch: 4   Global Step: 27430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:15:33,340-Speed 3389.83 samples/sec   Loss 6.6978   LearningRate 0.0576   Epoch: 4   Global Step: 27440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:15:36,358-Speed 3393.42 samples/sec   Loss 6.5733   LearningRate 0.0576   Epoch: 4   Global Step: 27450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:15:39,379-Speed 3390.22 samples/sec   Loss 6.7159   LearningRate 0.0575   Epoch: 4   Global Step: 27460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:15:42,401-Speed 3389.36 samples/sec   Loss 6.6857   LearningRate 0.0575   Epoch: 4   Global Step: 27470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:15:45,420-Speed 3392.97 samples/sec   Loss 6.7726   LearningRate 0.0575   Epoch: 4   Global Step: 27480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:15:48,469-Speed 3358.60 samples/sec   Loss 6.6284   LearningRate 0.0575   Epoch: 4   Global Step: 27490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:15:51,491-Speed 3389.55 samples/sec   Loss 6.6125   LearningRate 0.0575   Epoch: 4   Global Step: 27500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:15:54,518-Speed 3383.77 samples/sec   Loss 6.6494   LearningRate 0.0575   Epoch: 4   Global Step: 27510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:15:57,536-Speed 3393.82 samples/sec   Loss 6.7549   LearningRate 0.0575   Epoch: 4   Global Step: 27520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:00,553-Speed 3395.10 samples/sec   Loss 6.4713   LearningRate 0.0574   Epoch: 4   Global Step: 27530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:03,579-Speed 3384.85 samples/sec   Loss 6.6805   LearningRate 0.0574   Epoch: 4   Global Step: 27540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:06,604-Speed 3385.92 samples/sec   Loss 6.7249   LearningRate 0.0574   Epoch: 4   Global Step: 27550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:09,635-Speed 3379.39 samples/sec   Loss 6.6892   LearningRate 0.0574   Epoch: 4   Global Step: 27560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:12,664-Speed 3382.17 samples/sec   Loss 6.6448   LearningRate 0.0574   Epoch: 4   Global Step: 27570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:15,691-Speed 3382.69 samples/sec   Loss 6.7297   LearningRate 0.0574   Epoch: 4   Global Step: 27580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:18,711-Speed 3392.29 samples/sec   Loss 6.5607   LearningRate 0.0574   Epoch: 4   Global Step: 27590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:21,715-Speed 3408.83 samples/sec   Loss 6.6876   LearningRate 0.0574   Epoch: 4   Global Step: 27600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:24,741-Speed 3385.70 samples/sec   Loss 6.5975   LearningRate 0.0573   Epoch: 4   Global Step: 27610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:27,764-Speed 3387.31 samples/sec   Loss 6.6593   LearningRate 0.0573   Epoch: 4   Global Step: 27620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:30,785-Speed 3390.80 samples/sec   Loss 6.6981   LearningRate 0.0573   Epoch: 4   Global Step: 27630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:33,900-Speed 3288.68 samples/sec   Loss 6.7159   LearningRate 0.0573   Epoch: 4   Global Step: 27640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:36,943-Speed 3365.52 samples/sec   Loss 6.5528   LearningRate 0.0573   Epoch: 4   Global Step: 27650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:40,133-Speed 3211.09 samples/sec   Loss 6.6697   LearningRate 0.0573   Epoch: 4   Global Step: 27660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:43,242-Speed 3294.55 samples/sec   Loss 6.5614   LearningRate 0.0573   Epoch: 4   Global Step: 27670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:46,262-Speed 3391.36 samples/sec   Loss 6.7548   LearningRate 0.0572   Epoch: 4   Global Step: 27680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:49,285-Speed 3387.57 samples/sec   Loss 6.7949   LearningRate 0.0572   Epoch: 4   Global Step: 27690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:52,296-Speed 3401.84 samples/sec   Loss 6.5456   LearningRate 0.0572   Epoch: 4   Global Step: 27700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:16:55,300-Speed 3409.97 samples/sec   Loss 6.5964   LearningRate 0.0572   Epoch: 4   Global Step: 27710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:16:58,325-Speed 3385.47 samples/sec   Loss 6.6071   LearningRate 0.0572   Epoch: 4   Global Step: 27720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:17:01,349-Speed 3386.80 samples/sec   Loss 6.5212   LearningRate 0.0572   Epoch: 4   Global Step: 27730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:17:04,403-Speed 3354.34 samples/sec   Loss 6.7428   LearningRate 0.0572   Epoch: 4   Global Step: 27740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:17:07,428-Speed 3385.97 samples/sec   Loss 6.5696   LearningRate 0.0572   Epoch: 4   Global Step: 27750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:17:10,455-Speed 3383.46 samples/sec   Loss 6.5948   LearningRate 0.0571   Epoch: 4   Global Step: 27760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:17:13,484-Speed 3381.97 samples/sec   Loss 6.5816   LearningRate 0.0571   Epoch: 4   Global Step: 27770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:17:16,517-Speed 3376.71 samples/sec   Loss 6.5480   LearningRate 0.0571   Epoch: 4   Global Step: 27780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:17:19,545-Speed 3382.47 samples/sec   Loss 6.6394   LearningRate 0.0571   Epoch: 4   Global Step: 27790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:17:22,575-Speed 3380.18 samples/sec   Loss 6.6532   LearningRate 0.0571   Epoch: 4   Global Step: 27800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:17:25,615-Speed 3370.05 samples/sec   Loss 6.5295   LearningRate 0.0571   Epoch: 4   Global Step: 27810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:17:28,648-Speed 3376.99 samples/sec   Loss 6.5040   LearningRate 0.0571   Epoch: 4   Global Step: 27820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:17:31,668-Speed 3390.44 samples/sec   Loss 6.6268   LearningRate 0.0570   Epoch: 4   Global Step: 27830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:17:34,694-Speed 3385.61 samples/sec   Loss 6.5905   LearningRate 0.0570   Epoch: 4   Global Step: 27840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:17:37,733-Speed 3370.07 samples/sec   Loss 6.6355   LearningRate 0.0570   Epoch: 4   Global Step: 27850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:17:40,752-Speed 3392.13 samples/sec   Loss 6.5910   LearningRate 0.0570   Epoch: 4   Global Step: 27860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:17:43,786-Speed 3375.63 samples/sec   Loss 6.6946   LearningRate 0.0570   Epoch: 4   Global Step: 27870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:17:46,816-Speed 3380.94 samples/sec   Loss 6.5928   LearningRate 0.0570   Epoch: 4   Global Step: 27880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:17:49,845-Speed 3381.36 samples/sec   Loss 6.6521   LearningRate 0.0570   Epoch: 4   Global Step: 27890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:17:52,872-Speed 3383.68 samples/sec   Loss 6.6342   LearningRate 0.0570   Epoch: 4   Global Step: 27900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:17:55,896-Speed 3387.47 samples/sec   Loss 6.6379   LearningRate 0.0569   Epoch: 4   Global Step: 27910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:17:58,919-Speed 3388.11 samples/sec   Loss 6.3860   LearningRate 0.0569   Epoch: 4   Global Step: 27920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:18:01,940-Speed 3389.65 samples/sec   Loss 6.5595   LearningRate 0.0569   Epoch: 4   Global Step: 27930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:18:04,966-Speed 3384.92 samples/sec   Loss 6.6465   LearningRate 0.0569   Epoch: 4   Global Step: 27940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:18:07,993-Speed 3383.81 samples/sec   Loss 6.6515   LearningRate 0.0569   Epoch: 4   Global Step: 27950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:18:11,026-Speed 3377.22 samples/sec   Loss 6.6718   LearningRate 0.0569   Epoch: 4   Global Step: 27960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:18:14,050-Speed 3386.72 samples/sec   Loss 6.5897   LearningRate 0.0569   Epoch: 4   Global Step: 27970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:18:17,080-Speed 3381.38 samples/sec   Loss 6.5766   LearningRate 0.0568   Epoch: 4   Global Step: 27980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:18:20,109-Speed 3380.65 samples/sec   Loss 6.7657   LearningRate 0.0568   Epoch: 4   Global Step: 27990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:18:23,200-Speed 3313.72 samples/sec   Loss 6.6644   LearningRate 0.0568   Epoch: 4   Global Step: 28000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:19:06,448-[lfw][28000]XNorm: 20.600976
Training: 2022-04-27 04:19:06,448-[lfw][28000]Accuracy-Flip: 0.99600+-0.00260
Training: 2022-04-27 04:19:06,449-[lfw][28000]Accuracy-Highest: 0.99750
Training: 2022-04-27 04:19:56,931-[cfp_fp][28000]XNorm: 18.298377
Training: 2022-04-27 04:19:56,932-[cfp_fp][28000]Accuracy-Flip: 0.93886+-0.01355
Training: 2022-04-27 04:19:56,932-[cfp_fp][28000]Accuracy-Highest: 0.94443
Training: 2022-04-27 04:20:40,342-[agedb_30][28000]XNorm: 20.489127
Training: 2022-04-27 04:20:40,343-[agedb_30][28000]Accuracy-Flip: 0.97133+-0.01011
Training: 2022-04-27 04:20:40,343-[agedb_30][28000]Accuracy-Highest: 0.97167
Training: 2022-04-27 04:20:43,364-Speed 73.06 samples/sec   Loss 6.7090   LearningRate 0.0568   Epoch: 4   Global Step: 28010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:20:46,387-Speed 3387.46 samples/sec   Loss 6.5221   LearningRate 0.0568   Epoch: 4   Global Step: 28020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:20:49,385-Speed 3416.96 samples/sec   Loss 6.5855   LearningRate 0.0568   Epoch: 4   Global Step: 28030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:20:52,397-Speed 3400.42 samples/sec   Loss 6.5815   LearningRate 0.0568   Epoch: 4   Global Step: 28040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:20:55,411-Speed 3398.35 samples/sec   Loss 6.6225   LearningRate 0.0568   Epoch: 4   Global Step: 28050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:20:58,429-Speed 3393.78 samples/sec   Loss 6.4728   LearningRate 0.0567   Epoch: 4   Global Step: 28060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:21:01,463-Speed 3375.93 samples/sec   Loss 6.5367   LearningRate 0.0567   Epoch: 4   Global Step: 28070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:21:04,484-Speed 3389.56 samples/sec   Loss 6.6303   LearningRate 0.0567   Epoch: 4   Global Step: 28080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:21:07,503-Speed 3392.53 samples/sec   Loss 6.6451   LearningRate 0.0567   Epoch: 4   Global Step: 28090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:21:10,524-Speed 3390.71 samples/sec   Loss 6.7046   LearningRate 0.0567   Epoch: 4   Global Step: 28100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:21:13,545-Speed 3390.48 samples/sec   Loss 6.6895   LearningRate 0.0567   Epoch: 4   Global Step: 28110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:21:16,566-Speed 3391.48 samples/sec   Loss 6.6635   LearningRate 0.0567   Epoch: 4   Global Step: 28120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:21:19,592-Speed 3384.27 samples/sec   Loss 6.5855   LearningRate 0.0566   Epoch: 4   Global Step: 28130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:21:22,614-Speed 3389.42 samples/sec   Loss 6.6673   LearningRate 0.0566   Epoch: 4   Global Step: 28140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:21:25,636-Speed 3388.77 samples/sec   Loss 6.5562   LearningRate 0.0566   Epoch: 4   Global Step: 28150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:21:28,660-Speed 3386.95 samples/sec   Loss 6.4710   LearningRate 0.0566   Epoch: 4   Global Step: 28160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:21:31,680-Speed 3391.22 samples/sec   Loss 6.4610   LearningRate 0.0566   Epoch: 4   Global Step: 28170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:21:34,704-Speed 3388.01 samples/sec   Loss 6.6731   LearningRate 0.0566   Epoch: 4   Global Step: 28180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:21:37,740-Speed 3373.01 samples/sec   Loss 6.5271   LearningRate 0.0566   Epoch: 4   Global Step: 28190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:21:40,769-Speed 3381.45 samples/sec   Loss 6.5566   LearningRate 0.0566   Epoch: 4   Global Step: 28200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:21:43,814-Speed 3363.94 samples/sec   Loss 6.5637   LearningRate 0.0565   Epoch: 4   Global Step: 28210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:21:46,843-Speed 3381.69 samples/sec   Loss 6.4893   LearningRate 0.0565   Epoch: 4   Global Step: 28220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:21:49,877-Speed 3375.61 samples/sec   Loss 6.6909   LearningRate 0.0565   Epoch: 4   Global Step: 28230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:21:52,921-Speed 3364.96 samples/sec   Loss 6.6809   LearningRate 0.0565   Epoch: 4   Global Step: 28240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:21:55,957-Speed 3373.27 samples/sec   Loss 6.5695   LearningRate 0.0565   Epoch: 4   Global Step: 28250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:21:59,008-Speed 3357.08 samples/sec   Loss 6.5994   LearningRate 0.0565   Epoch: 4   Global Step: 28260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:22:02,042-Speed 3375.80 samples/sec   Loss 6.4801   LearningRate 0.0565   Epoch: 4   Global Step: 28270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:22:05,069-Speed 3384.08 samples/sec   Loss 6.5425   LearningRate 0.0564   Epoch: 4   Global Step: 28280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:22:08,093-Speed 3387.84 samples/sec   Loss 6.5049   LearningRate 0.0564   Epoch: 4   Global Step: 28290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:22:11,160-Speed 3339.47 samples/sec   Loss 6.4665   LearningRate 0.0564   Epoch: 4   Global Step: 28300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:22:14,187-Speed 3383.62 samples/sec   Loss 6.4931   LearningRate 0.0564   Epoch: 4   Global Step: 28310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:22:17,208-Speed 3389.65 samples/sec   Loss 6.6132   LearningRate 0.0564   Epoch: 4   Global Step: 28320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:22:20,232-Speed 3386.98 samples/sec   Loss 6.7068   LearningRate 0.0564   Epoch: 4   Global Step: 28330   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:22:23,253-Speed 3390.74 samples/sec   Loss 6.5797   LearningRate 0.0564   Epoch: 4   Global Step: 28340   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:22:26,258-Speed 3408.68 samples/sec   Loss 6.5406   LearningRate 0.0564   Epoch: 4   Global Step: 28350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:22:29,280-Speed 3388.55 samples/sec   Loss 6.6048   LearningRate 0.0563   Epoch: 4   Global Step: 28360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:22:32,282-Speed 3412.47 samples/sec   Loss 6.6766   LearningRate 0.0563   Epoch: 4   Global Step: 28370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:22:35,302-Speed 3392.10 samples/sec   Loss 6.6452   LearningRate 0.0563   Epoch: 4   Global Step: 28380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:22:38,324-Speed 3389.29 samples/sec   Loss 6.6440   LearningRate 0.0563   Epoch: 4   Global Step: 28390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:22:41,344-Speed 3390.74 samples/sec   Loss 6.4998   LearningRate 0.0563   Epoch: 4   Global Step: 28400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:22:44,364-Speed 3391.60 samples/sec   Loss 6.5512   LearningRate 0.0563   Epoch: 4   Global Step: 28410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:22:47,562-Speed 3203.33 samples/sec   Loss 6.6577   LearningRate 0.0563   Epoch: 4   Global Step: 28420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:22:50,583-Speed 3389.27 samples/sec   Loss 6.5507   LearningRate 0.0563   Epoch: 4   Global Step: 28430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:23:03,920-Speed 767.87 samples/sec   Loss 6.0581   LearningRate 0.0562   Epoch: 5   Global Step: 28440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:23:06,962-Speed 3368.35 samples/sec   Loss 5.9586   LearningRate 0.0562   Epoch: 5   Global Step: 28450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:23:09,997-Speed 3375.28 samples/sec   Loss 6.1011   LearningRate 0.0562   Epoch: 5   Global Step: 28460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:23:13,011-Speed 3397.81 samples/sec   Loss 5.9808   LearningRate 0.0562   Epoch: 5   Global Step: 28470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:23:16,051-Speed 3368.99 samples/sec   Loss 5.9908   LearningRate 0.0562   Epoch: 5   Global Step: 28480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:23:19,069-Speed 3395.08 samples/sec   Loss 5.9135   LearningRate 0.0562   Epoch: 5   Global Step: 28490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:23:22,089-Speed 3390.97 samples/sec   Loss 5.9870   LearningRate 0.0562   Epoch: 5   Global Step: 28500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:23:25,129-Speed 3369.04 samples/sec   Loss 6.0094   LearningRate 0.0561   Epoch: 5   Global Step: 28510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:23:28,163-Speed 3376.54 samples/sec   Loss 6.0446   LearningRate 0.0561   Epoch: 5   Global Step: 28520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:23:31,166-Speed 3410.25 samples/sec   Loss 6.0945   LearningRate 0.0561   Epoch: 5   Global Step: 28530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:23:34,186-Speed 3391.43 samples/sec   Loss 6.1125   LearningRate 0.0561   Epoch: 5   Global Step: 28540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:23:37,209-Speed 3388.94 samples/sec   Loss 6.0254   LearningRate 0.0561   Epoch: 5   Global Step: 28550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:23:40,235-Speed 3384.10 samples/sec   Loss 6.0734   LearningRate 0.0561   Epoch: 5   Global Step: 28560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:23:43,259-Speed 3387.63 samples/sec   Loss 6.0741   LearningRate 0.0561   Epoch: 5   Global Step: 28570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:23:46,291-Speed 3377.50 samples/sec   Loss 5.9974   LearningRate 0.0561   Epoch: 5   Global Step: 28580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:23:49,342-Speed 3356.88 samples/sec   Loss 6.0468   LearningRate 0.0560   Epoch: 5   Global Step: 28590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:23:52,370-Speed 3383.43 samples/sec   Loss 6.0610   LearningRate 0.0560   Epoch: 5   Global Step: 28600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:23:55,403-Speed 3377.32 samples/sec   Loss 6.1386   LearningRate 0.0560   Epoch: 5   Global Step: 28610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:23:58,435-Speed 3377.70 samples/sec   Loss 5.9311   LearningRate 0.0560   Epoch: 5   Global Step: 28620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:24:01,471-Speed 3373.41 samples/sec   Loss 6.2108   LearningRate 0.0560   Epoch: 5   Global Step: 28630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:24:04,505-Speed 3375.70 samples/sec   Loss 6.1234   LearningRate 0.0560   Epoch: 5   Global Step: 28640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:24:07,546-Speed 3368.24 samples/sec   Loss 6.1246   LearningRate 0.0560   Epoch: 5   Global Step: 28650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:24:10,587-Speed 3368.16 samples/sec   Loss 6.2423   LearningRate 0.0559   Epoch: 5   Global Step: 28660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:24:13,649-Speed 3345.44 samples/sec   Loss 6.1418   LearningRate 0.0559   Epoch: 5   Global Step: 28670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:24:16,672-Speed 3387.93 samples/sec   Loss 6.1166   LearningRate 0.0559   Epoch: 5   Global Step: 28680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:24:19,708-Speed 3373.33 samples/sec   Loss 6.1114   LearningRate 0.0559   Epoch: 5   Global Step: 28690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:24:22,769-Speed 3346.86 samples/sec   Loss 6.2816   LearningRate 0.0559   Epoch: 5   Global Step: 28700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:24:25,802-Speed 3376.23 samples/sec   Loss 6.2510   LearningRate 0.0559   Epoch: 5   Global Step: 28710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:24:28,855-Speed 3355.13 samples/sec   Loss 6.2471   LearningRate 0.0559   Epoch: 5   Global Step: 28720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:24:31,884-Speed 3382.10 samples/sec   Loss 6.2027   LearningRate 0.0559   Epoch: 5   Global Step: 28730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:24:34,912-Speed 3381.67 samples/sec   Loss 6.1075   LearningRate 0.0558   Epoch: 5   Global Step: 28740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:24:37,951-Speed 3370.15 samples/sec   Loss 6.2437   LearningRate 0.0558   Epoch: 5   Global Step: 28750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:24:41,003-Speed 3357.17 samples/sec   Loss 6.1197   LearningRate 0.0558   Epoch: 5   Global Step: 28760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:24:44,035-Speed 3377.69 samples/sec   Loss 6.2133   LearningRate 0.0558   Epoch: 5   Global Step: 28770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:24:47,065-Speed 3380.13 samples/sec   Loss 6.0631   LearningRate 0.0558   Epoch: 5   Global Step: 28780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:24:50,087-Speed 3390.15 samples/sec   Loss 6.2110   LearningRate 0.0558   Epoch: 5   Global Step: 28790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:24:53,107-Speed 3391.35 samples/sec   Loss 6.2302   LearningRate 0.0558   Epoch: 5   Global Step: 28800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:24:56,132-Speed 3385.16 samples/sec   Loss 6.2842   LearningRate 0.0557   Epoch: 5   Global Step: 28810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:24:59,153-Speed 3391.46 samples/sec   Loss 6.2979   LearningRate 0.0557   Epoch: 5   Global Step: 28820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:02,219-Speed 3340.55 samples/sec   Loss 6.3249   LearningRate 0.0557   Epoch: 5   Global Step: 28830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:05,240-Speed 3390.06 samples/sec   Loss 6.1415   LearningRate 0.0557   Epoch: 5   Global Step: 28840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:08,262-Speed 3388.86 samples/sec   Loss 6.4004   LearningRate 0.0557   Epoch: 5   Global Step: 28850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:11,284-Speed 3389.18 samples/sec   Loss 6.2302   LearningRate 0.0557   Epoch: 5   Global Step: 28860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:14,298-Speed 3399.26 samples/sec   Loss 6.2347   LearningRate 0.0557   Epoch: 5   Global Step: 28870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:17,318-Speed 3391.25 samples/sec   Loss 6.1551   LearningRate 0.0557   Epoch: 5   Global Step: 28880   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:25:20,315-Speed 3417.37 samples/sec   Loss 6.1446   LearningRate 0.0556   Epoch: 5   Global Step: 28890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:23,359-Speed 3364.46 samples/sec   Loss 6.1764   LearningRate 0.0556   Epoch: 5   Global Step: 28900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:26,375-Speed 3396.61 samples/sec   Loss 6.2682   LearningRate 0.0556   Epoch: 5   Global Step: 28910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:29,385-Speed 3402.04 samples/sec   Loss 6.2383   LearningRate 0.0556   Epoch: 5   Global Step: 28920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:32,411-Speed 3384.65 samples/sec   Loss 6.2872   LearningRate 0.0556   Epoch: 5   Global Step: 28930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:35,432-Speed 3391.24 samples/sec   Loss 6.4516   LearningRate 0.0556   Epoch: 5   Global Step: 28940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:38,457-Speed 3385.33 samples/sec   Loss 6.3799   LearningRate 0.0556   Epoch: 5   Global Step: 28950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:41,481-Speed 3387.52 samples/sec   Loss 6.2466   LearningRate 0.0556   Epoch: 5   Global Step: 28960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:44,495-Speed 3398.34 samples/sec   Loss 6.2302   LearningRate 0.0555   Epoch: 5   Global Step: 28970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:47,576-Speed 3324.70 samples/sec   Loss 6.2843   LearningRate 0.0555   Epoch: 5   Global Step: 28980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:50,616-Speed 3368.43 samples/sec   Loss 6.2597   LearningRate 0.0555   Epoch: 5   Global Step: 28990   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:25:53,660-Speed 3365.30 samples/sec   Loss 6.2839   LearningRate 0.0555   Epoch: 5   Global Step: 29000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:25:56,653-Speed 3421.16 samples/sec   Loss 6.3063   LearningRate 0.0555   Epoch: 5   Global Step: 29010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:25:59,681-Speed 3382.72 samples/sec   Loss 6.2427   LearningRate 0.0555   Epoch: 5   Global Step: 29020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:26:02,708-Speed 3383.79 samples/sec   Loss 6.4099   LearningRate 0.0555   Epoch: 5   Global Step: 29030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:26:05,730-Speed 3390.11 samples/sec   Loss 6.2059   LearningRate 0.0554   Epoch: 5   Global Step: 29040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:26:08,751-Speed 3390.25 samples/sec   Loss 6.3701   LearningRate 0.0554   Epoch: 5   Global Step: 29050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:26:11,767-Speed 3395.53 samples/sec   Loss 6.1499   LearningRate 0.0554   Epoch: 5   Global Step: 29060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:26:14,796-Speed 3381.87 samples/sec   Loss 6.4190   LearningRate 0.0554   Epoch: 5   Global Step: 29070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:26:17,823-Speed 3383.89 samples/sec   Loss 6.2339   LearningRate 0.0554   Epoch: 5   Global Step: 29080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:26:20,843-Speed 3391.43 samples/sec   Loss 6.1581   LearningRate 0.0554   Epoch: 5   Global Step: 29090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:26:23,863-Speed 3390.98 samples/sec   Loss 6.1593   LearningRate 0.0554   Epoch: 5   Global Step: 29100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:26:26,882-Speed 3392.21 samples/sec   Loss 6.3320   LearningRate 0.0554   Epoch: 5   Global Step: 29110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:26:29,904-Speed 3389.66 samples/sec   Loss 6.3120   LearningRate 0.0553   Epoch: 5   Global Step: 29120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:26:32,928-Speed 3387.46 samples/sec   Loss 6.3593   LearningRate 0.0553   Epoch: 5   Global Step: 29130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:26:35,960-Speed 3378.15 samples/sec   Loss 6.2970   LearningRate 0.0553   Epoch: 5   Global Step: 29140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:26:38,989-Speed 3380.93 samples/sec   Loss 6.3713   LearningRate 0.0553   Epoch: 5   Global Step: 29150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:26:42,021-Speed 3378.44 samples/sec   Loss 6.2687   LearningRate 0.0553   Epoch: 5   Global Step: 29160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:26:45,041-Speed 3391.27 samples/sec   Loss 6.2558   LearningRate 0.0553   Epoch: 5   Global Step: 29170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:26:48,074-Speed 3376.85 samples/sec   Loss 6.3296   LearningRate 0.0553   Epoch: 5   Global Step: 29180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:26:51,097-Speed 3388.44 samples/sec   Loss 6.4390   LearningRate 0.0553   Epoch: 5   Global Step: 29190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:26:54,118-Speed 3390.16 samples/sec   Loss 6.3096   LearningRate 0.0552   Epoch: 5   Global Step: 29200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:26:57,145-Speed 3384.38 samples/sec   Loss 6.3339   LearningRate 0.0552   Epoch: 5   Global Step: 29210   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:27:00,180-Speed 3374.55 samples/sec   Loss 6.3648   LearningRate 0.0552   Epoch: 5   Global Step: 29220   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:27:03,204-Speed 3387.23 samples/sec   Loss 6.1761   LearningRate 0.0552   Epoch: 5   Global Step: 29230   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:27:06,220-Speed 3395.86 samples/sec   Loss 6.3021   LearningRate 0.0552   Epoch: 5   Global Step: 29240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:09,241-Speed 3390.22 samples/sec   Loss 6.2159   LearningRate 0.0552   Epoch: 5   Global Step: 29250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:12,261-Speed 3391.74 samples/sec   Loss 6.2571   LearningRate 0.0552   Epoch: 5   Global Step: 29260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:15,281-Speed 3391.51 samples/sec   Loss 6.2604   LearningRate 0.0551   Epoch: 5   Global Step: 29270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:18,303-Speed 3389.67 samples/sec   Loss 6.4101   LearningRate 0.0551   Epoch: 5   Global Step: 29280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:21,335-Speed 3377.96 samples/sec   Loss 6.2315   LearningRate 0.0551   Epoch: 5   Global Step: 29290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:24,374-Speed 3370.02 samples/sec   Loss 6.2628   LearningRate 0.0551   Epoch: 5   Global Step: 29300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:27,394-Speed 3391.88 samples/sec   Loss 6.2386   LearningRate 0.0551   Epoch: 5   Global Step: 29310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:30,416-Speed 3389.55 samples/sec   Loss 6.4200   LearningRate 0.0551   Epoch: 5   Global Step: 29320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:33,434-Speed 3393.33 samples/sec   Loss 6.2536   LearningRate 0.0551   Epoch: 5   Global Step: 29330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:36,454-Speed 3391.51 samples/sec   Loss 6.3048   LearningRate 0.0551   Epoch: 5   Global Step: 29340   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:27:39,461-Speed 3405.87 samples/sec   Loss 6.2734   LearningRate 0.0550   Epoch: 5   Global Step: 29350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:42,482-Speed 3389.91 samples/sec   Loss 6.3984   LearningRate 0.0550   Epoch: 5   Global Step: 29360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:45,509-Speed 3384.39 samples/sec   Loss 6.2112   LearningRate 0.0550   Epoch: 5   Global Step: 29370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:48,536-Speed 3383.95 samples/sec   Loss 6.3482   LearningRate 0.0550   Epoch: 5   Global Step: 29380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:51,560-Speed 3385.92 samples/sec   Loss 6.4955   LearningRate 0.0550   Epoch: 5   Global Step: 29390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:54,599-Speed 3370.97 samples/sec   Loss 6.4213   LearningRate 0.0550   Epoch: 5   Global Step: 29400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:27:57,625-Speed 3385.29 samples/sec   Loss 6.1749   LearningRate 0.0550   Epoch: 5   Global Step: 29410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:28:00,662-Speed 3372.22 samples/sec   Loss 6.2676   LearningRate 0.0550   Epoch: 5   Global Step: 29420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:28:03,710-Speed 3360.75 samples/sec   Loss 6.3748   LearningRate 0.0549   Epoch: 5   Global Step: 29430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:28:06,732-Speed 3389.23 samples/sec   Loss 6.2022   LearningRate 0.0549   Epoch: 5   Global Step: 29440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:28:09,798-Speed 3340.69 samples/sec   Loss 6.3842   LearningRate 0.0549   Epoch: 5   Global Step: 29450   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-27 04:28:12,806-Speed 3405.04 samples/sec   Loss 6.2207   LearningRate 0.0549   Epoch: 5   Global Step: 29460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:28:15,827-Speed 3390.17 samples/sec   Loss 6.3131   LearningRate 0.0549   Epoch: 5   Global Step: 29470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:28:18,860-Speed 3377.51 samples/sec   Loss 6.4202   LearningRate 0.0549   Epoch: 5   Global Step: 29480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:28:21,861-Speed 3413.06 samples/sec   Loss 6.3223   LearningRate 0.0549   Epoch: 5   Global Step: 29490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:28:24,883-Speed 3389.35 samples/sec   Loss 6.3742   LearningRate 0.0548   Epoch: 5   Global Step: 29500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:28:27,907-Speed 3387.26 samples/sec   Loss 6.3160   LearningRate 0.0548   Epoch: 5   Global Step: 29510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:28:30,932-Speed 3385.15 samples/sec   Loss 6.3385   LearningRate 0.0548   Epoch: 5   Global Step: 29520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:28:33,953-Speed 3390.86 samples/sec   Loss 6.2444   LearningRate 0.0548   Epoch: 5   Global Step: 29530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:28:36,972-Speed 3392.91 samples/sec   Loss 6.4784   LearningRate 0.0548   Epoch: 5   Global Step: 29540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:28:40,003-Speed 3378.86 samples/sec   Loss 6.3011   LearningRate 0.0548   Epoch: 5   Global Step: 29550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:28:43,029-Speed 3384.52 samples/sec   Loss 6.4632   LearningRate 0.0548   Epoch: 5   Global Step: 29560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:28:46,057-Speed 3382.88 samples/sec   Loss 6.4195   LearningRate 0.0548   Epoch: 5   Global Step: 29570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:28:49,083-Speed 3385.25 samples/sec   Loss 6.4155   LearningRate 0.0547   Epoch: 5   Global Step: 29580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:28:52,100-Speed 3394.87 samples/sec   Loss 6.2864   LearningRate 0.0547   Epoch: 5   Global Step: 29590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:28:55,101-Speed 3412.28 samples/sec   Loss 6.3902   LearningRate 0.0547   Epoch: 5   Global Step: 29600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:28:58,122-Speed 3391.10 samples/sec   Loss 6.5820   LearningRate 0.0547   Epoch: 5   Global Step: 29610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:29:01,139-Speed 3394.44 samples/sec   Loss 6.2833   LearningRate 0.0547   Epoch: 5   Global Step: 29620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:29:04,165-Speed 3384.40 samples/sec   Loss 6.2001   LearningRate 0.0547   Epoch: 5   Global Step: 29630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:29:07,189-Speed 3386.92 samples/sec   Loss 6.3935   LearningRate 0.0547   Epoch: 5   Global Step: 29640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:29:10,227-Speed 3372.63 samples/sec   Loss 6.3798   LearningRate 0.0547   Epoch: 5   Global Step: 29650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:29:13,257-Speed 3379.27 samples/sec   Loss 6.4024   LearningRate 0.0546   Epoch: 5   Global Step: 29660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:29:16,280-Speed 3388.68 samples/sec   Loss 6.3428   LearningRate 0.0546   Epoch: 5   Global Step: 29670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:29:19,341-Speed 3346.13 samples/sec   Loss 6.3590   LearningRate 0.0546   Epoch: 5   Global Step: 29680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:29:22,365-Speed 3386.92 samples/sec   Loss 6.4183   LearningRate 0.0546   Epoch: 5   Global Step: 29690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:29:25,413-Speed 3360.23 samples/sec   Loss 6.3040   LearningRate 0.0546   Epoch: 5   Global Step: 29700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:29:28,454-Speed 3368.47 samples/sec   Loss 6.4194   LearningRate 0.0546   Epoch: 5   Global Step: 29710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:29:31,477-Speed 3387.53 samples/sec   Loss 6.2951   LearningRate 0.0546   Epoch: 5   Global Step: 29720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:29:34,499-Speed 3389.93 samples/sec   Loss 6.3687   LearningRate 0.0545   Epoch: 5   Global Step: 29730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:29:37,525-Speed 3385.03 samples/sec   Loss 6.4550   LearningRate 0.0545   Epoch: 5   Global Step: 29740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:29:40,545-Speed 3391.51 samples/sec   Loss 6.2130   LearningRate 0.0545   Epoch: 5   Global Step: 29750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:29:43,570-Speed 3386.09 samples/sec   Loss 6.2982   LearningRate 0.0545   Epoch: 5   Global Step: 29760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:29:46,566-Speed 3418.63 samples/sec   Loss 6.3446   LearningRate 0.0545   Epoch: 5   Global Step: 29770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:29:49,596-Speed 3380.59 samples/sec   Loss 6.5474   LearningRate 0.0545   Epoch: 5   Global Step: 29780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:29:52,628-Speed 3377.95 samples/sec   Loss 6.4196   LearningRate 0.0545   Epoch: 5   Global Step: 29790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:29:55,659-Speed 3378.56 samples/sec   Loss 6.3488   LearningRate 0.0545   Epoch: 5   Global Step: 29800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:29:58,694-Speed 3374.62 samples/sec   Loss 6.2327   LearningRate 0.0544   Epoch: 5   Global Step: 29810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:30:01,715-Speed 3390.59 samples/sec   Loss 6.3546   LearningRate 0.0544   Epoch: 5   Global Step: 29820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:30:04,755-Speed 3369.05 samples/sec   Loss 6.3971   LearningRate 0.0544   Epoch: 5   Global Step: 29830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:30:07,776-Speed 3390.64 samples/sec   Loss 6.3958   LearningRate 0.0544   Epoch: 5   Global Step: 29840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:30:10,797-Speed 3390.10 samples/sec   Loss 6.3306   LearningRate 0.0544   Epoch: 5   Global Step: 29850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:30:13,820-Speed 3388.67 samples/sec   Loss 6.3475   LearningRate 0.0544   Epoch: 5   Global Step: 29860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:30:16,843-Speed 3388.13 samples/sec   Loss 6.2396   LearningRate 0.0544   Epoch: 5   Global Step: 29870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:30:19,875-Speed 3377.96 samples/sec   Loss 6.3806   LearningRate 0.0544   Epoch: 5   Global Step: 29880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:30:22,917-Speed 3366.99 samples/sec   Loss 6.4120   LearningRate 0.0543   Epoch: 5   Global Step: 29890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:30:25,925-Speed 3405.35 samples/sec   Loss 6.1548   LearningRate 0.0543   Epoch: 5   Global Step: 29900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:30:28,994-Speed 3336.57 samples/sec   Loss 6.4759   LearningRate 0.0543   Epoch: 5   Global Step: 29910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:30:32,058-Speed 3343.77 samples/sec   Loss 6.4614   LearningRate 0.0543   Epoch: 5   Global Step: 29920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:30:35,100-Speed 3366.08 samples/sec   Loss 6.3105   LearningRate 0.0543   Epoch: 5   Global Step: 29930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:30:38,125-Speed 3386.94 samples/sec   Loss 6.2954   LearningRate 0.0543   Epoch: 5   Global Step: 29940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:30:41,144-Speed 3392.71 samples/sec   Loss 6.2281   LearningRate 0.0543   Epoch: 5   Global Step: 29950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:30:44,162-Speed 3393.14 samples/sec   Loss 6.4878   LearningRate 0.0542   Epoch: 5   Global Step: 29960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:30:47,187-Speed 3386.41 samples/sec   Loss 6.3575   LearningRate 0.0542   Epoch: 5   Global Step: 29970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:30:50,214-Speed 3384.17 samples/sec   Loss 6.2969   LearningRate 0.0542   Epoch: 5   Global Step: 29980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:30:53,242-Speed 3382.27 samples/sec   Loss 6.2774   LearningRate 0.0542   Epoch: 5   Global Step: 29990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:30:56,279-Speed 3372.47 samples/sec   Loss 6.4366   LearningRate 0.0542   Epoch: 5   Global Step: 30000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:31:39,774-[lfw][30000]XNorm: 21.604066
Training: 2022-04-27 04:31:39,775-[lfw][30000]Accuracy-Flip: 0.99683+-0.00329
Training: 2022-04-27 04:31:39,775-[lfw][30000]Accuracy-Highest: 0.99750
Training: 2022-04-27 04:32:30,254-[cfp_fp][30000]XNorm: 19.520216
Training: 2022-04-27 04:32:30,255-[cfp_fp][30000]Accuracy-Flip: 0.95071+-0.01122
Training: 2022-04-27 04:32:30,255-[cfp_fp][30000]Accuracy-Highest: 0.95071
Training: 2022-04-27 04:33:13,673-[agedb_30][30000]XNorm: 21.513922
Training: 2022-04-27 04:33:13,673-[agedb_30][30000]Accuracy-Flip: 0.97233+-0.00782
Training: 2022-04-27 04:33:13,674-[agedb_30][30000]Accuracy-Highest: 0.97233
Training: 2022-04-27 04:33:16,701-Speed 72.92 samples/sec   Loss 6.2756   LearningRate 0.0542   Epoch: 5   Global Step: 30010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:33:19,709-Speed 3405.12 samples/sec   Loss 6.3951   LearningRate 0.0542   Epoch: 5   Global Step: 30020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:33:22,741-Speed 3378.00 samples/sec   Loss 6.2914   LearningRate 0.0542   Epoch: 5   Global Step: 30030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:33:25,759-Speed 3393.87 samples/sec   Loss 6.4194   LearningRate 0.0541   Epoch: 5   Global Step: 30040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:33:28,767-Speed 3405.20 samples/sec   Loss 6.3582   LearningRate 0.0541   Epoch: 5   Global Step: 30050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:33:31,778-Speed 3401.63 samples/sec   Loss 6.3833   LearningRate 0.0541   Epoch: 5   Global Step: 30060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:33:34,799-Speed 3391.06 samples/sec   Loss 6.4203   LearningRate 0.0541   Epoch: 5   Global Step: 30070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:33:37,811-Speed 3400.43 samples/sec   Loss 6.2717   LearningRate 0.0541   Epoch: 5   Global Step: 30080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:33:40,845-Speed 3375.74 samples/sec   Loss 6.4813   LearningRate 0.0541   Epoch: 5   Global Step: 30090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:33:43,852-Speed 3406.04 samples/sec   Loss 6.3175   LearningRate 0.0541   Epoch: 5   Global Step: 30100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:33:46,859-Speed 3406.55 samples/sec   Loss 6.4768   LearningRate 0.0541   Epoch: 5   Global Step: 30110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:33:49,888-Speed 3381.50 samples/sec   Loss 6.3412   LearningRate 0.0540   Epoch: 5   Global Step: 30120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:33:52,901-Speed 3398.45 samples/sec   Loss 6.2620   LearningRate 0.0540   Epoch: 5   Global Step: 30130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:33:55,902-Speed 3413.51 samples/sec   Loss 6.2955   LearningRate 0.0540   Epoch: 5   Global Step: 30140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:33:58,917-Speed 3397.66 samples/sec   Loss 6.2445   LearningRate 0.0540   Epoch: 5   Global Step: 30150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:34:01,940-Speed 3387.75 samples/sec   Loss 6.3804   LearningRate 0.0540   Epoch: 5   Global Step: 30160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:34:04,990-Speed 3357.65 samples/sec   Loss 6.3545   LearningRate 0.0540   Epoch: 5   Global Step: 30170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:34:08,029-Speed 3370.49 samples/sec   Loss 6.5291   LearningRate 0.0540   Epoch: 5   Global Step: 30180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:34:11,040-Speed 3401.87 samples/sec   Loss 6.4499   LearningRate 0.0540   Epoch: 5   Global Step: 30190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:34:14,049-Speed 3404.82 samples/sec   Loss 6.3602   LearningRate 0.0539   Epoch: 5   Global Step: 30200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:34:17,059-Speed 3402.36 samples/sec   Loss 6.1988   LearningRate 0.0539   Epoch: 5   Global Step: 30210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:34:20,070-Speed 3401.59 samples/sec   Loss 6.4357   LearningRate 0.0539   Epoch: 5   Global Step: 30220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:34:23,095-Speed 3386.36 samples/sec   Loss 6.3172   LearningRate 0.0539   Epoch: 5   Global Step: 30230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:34:26,108-Speed 3398.56 samples/sec   Loss 6.2439   LearningRate 0.0539   Epoch: 5   Global Step: 30240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:34:29,127-Speed 3393.07 samples/sec   Loss 6.4076   LearningRate 0.0539   Epoch: 5   Global Step: 30250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:34:32,143-Speed 3395.93 samples/sec   Loss 6.4199   LearningRate 0.0539   Epoch: 5   Global Step: 30260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:34:35,159-Speed 3395.45 samples/sec   Loss 6.4761   LearningRate 0.0538   Epoch: 5   Global Step: 30270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:34:38,178-Speed 3393.07 samples/sec   Loss 6.3507   LearningRate 0.0538   Epoch: 5   Global Step: 30280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:34:41,207-Speed 3382.13 samples/sec   Loss 6.3848   LearningRate 0.0538   Epoch: 5   Global Step: 30290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:34:44,250-Speed 3365.52 samples/sec   Loss 6.1614   LearningRate 0.0538   Epoch: 5   Global Step: 30300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:34:47,262-Speed 3399.90 samples/sec   Loss 6.3050   LearningRate 0.0538   Epoch: 5   Global Step: 30310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:34:50,297-Speed 3375.34 samples/sec   Loss 6.4188   LearningRate 0.0538   Epoch: 5   Global Step: 30320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:34:53,301-Speed 3409.40 samples/sec   Loss 6.2933   LearningRate 0.0538   Epoch: 5   Global Step: 30330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:34:56,316-Speed 3397.31 samples/sec   Loss 6.5575   LearningRate 0.0538   Epoch: 5   Global Step: 30340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:34:59,333-Speed 3395.00 samples/sec   Loss 6.3334   LearningRate 0.0537   Epoch: 5   Global Step: 30350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:02,356-Speed 3388.25 samples/sec   Loss 6.3934   LearningRate 0.0537   Epoch: 5   Global Step: 30360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:05,371-Speed 3397.33 samples/sec   Loss 6.3429   LearningRate 0.0537   Epoch: 5   Global Step: 30370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:08,425-Speed 3353.90 samples/sec   Loss 6.4585   LearningRate 0.0537   Epoch: 5   Global Step: 30380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:11,442-Speed 3394.74 samples/sec   Loss 6.2511   LearningRate 0.0537   Epoch: 5   Global Step: 30390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:14,458-Speed 3396.00 samples/sec   Loss 6.2164   LearningRate 0.0537   Epoch: 5   Global Step: 30400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:17,491-Speed 3376.55 samples/sec   Loss 6.3746   LearningRate 0.0537   Epoch: 5   Global Step: 30410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:20,515-Speed 3387.56 samples/sec   Loss 6.3052   LearningRate 0.0537   Epoch: 5   Global Step: 30420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:23,534-Speed 3392.41 samples/sec   Loss 6.4998   LearningRate 0.0536   Epoch: 5   Global Step: 30430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:35:26,562-Speed 3382.68 samples/sec   Loss 6.3605   LearningRate 0.0536   Epoch: 5   Global Step: 30440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:29,578-Speed 3395.72 samples/sec   Loss 6.1637   LearningRate 0.0536   Epoch: 5   Global Step: 30450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:32,595-Speed 3395.30 samples/sec   Loss 6.2758   LearningRate 0.0536   Epoch: 5   Global Step: 30460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:35,609-Speed 3398.37 samples/sec   Loss 6.4242   LearningRate 0.0536   Epoch: 5   Global Step: 30470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:38,624-Speed 3396.94 samples/sec   Loss 6.2592   LearningRate 0.0536   Epoch: 5   Global Step: 30480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:41,633-Speed 3404.18 samples/sec   Loss 6.2080   LearningRate 0.0536   Epoch: 5   Global Step: 30490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:44,648-Speed 3396.09 samples/sec   Loss 6.2874   LearningRate 0.0536   Epoch: 5   Global Step: 30500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:47,676-Speed 3382.67 samples/sec   Loss 6.3909   LearningRate 0.0535   Epoch: 5   Global Step: 30510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:50,690-Speed 3398.18 samples/sec   Loss 6.4039   LearningRate 0.0535   Epoch: 5   Global Step: 30520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:53,704-Speed 3398.82 samples/sec   Loss 6.3794   LearningRate 0.0535   Epoch: 5   Global Step: 30530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:35:56,716-Speed 3400.82 samples/sec   Loss 6.5153   LearningRate 0.0535   Epoch: 5   Global Step: 30540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:35:59,739-Speed 3388.70 samples/sec   Loss 6.3711   LearningRate 0.0535   Epoch: 5   Global Step: 30550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:36:02,769-Speed 3380.11 samples/sec   Loss 6.2829   LearningRate 0.0535   Epoch: 5   Global Step: 30560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:36:05,784-Speed 3397.05 samples/sec   Loss 6.1660   LearningRate 0.0535   Epoch: 5   Global Step: 30570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:36:08,804-Speed 3391.17 samples/sec   Loss 6.5331   LearningRate 0.0534   Epoch: 5   Global Step: 30580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:36:11,830-Speed 3385.15 samples/sec   Loss 6.2788   LearningRate 0.0534   Epoch: 5   Global Step: 30590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:36:14,867-Speed 3371.77 samples/sec   Loss 6.4323   LearningRate 0.0534   Epoch: 5   Global Step: 30600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:36:17,896-Speed 3381.22 samples/sec   Loss 6.3775   LearningRate 0.0534   Epoch: 5   Global Step: 30610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:36:20,917-Speed 3391.33 samples/sec   Loss 6.4249   LearningRate 0.0534   Epoch: 5   Global Step: 30620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 04:36:23,931-Speed 3398.21 samples/sec   Loss 6.2433   LearningRate 0.0534   Epoch: 5   Global Step: 30630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:36:26,962-Speed 3379.55 samples/sec   Loss 6.3170   LearningRate 0.0534   Epoch: 5   Global Step: 30640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:36:29,982-Speed 3391.19 samples/sec   Loss 6.2737   LearningRate 0.0534   Epoch: 5   Global Step: 30650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:36:32,996-Speed 3398.51 samples/sec   Loss 6.1907   LearningRate 0.0533   Epoch: 5   Global Step: 30660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 04:36:36,012-Speed 3396.16 samples/sec   Loss 6.4344   LearningRate 0.0533   Epoch: 5   Global Step: 30670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:36:39,014-Speed 3411.49 samples/sec   Loss 6.2493   LearningRate 0.0533   Epoch: 5   Global Step: 30680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:36:42,056-Speed 3366.88 samples/sec   Loss 6.2831   LearningRate 0.0533   Epoch: 5   Global Step: 30690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:36:45,067-Speed 3401.24 samples/sec   Loss 6.3254   LearningRate 0.0533   Epoch: 5   Global Step: 30700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:36:48,085-Speed 3394.40 samples/sec   Loss 6.2591   LearningRate 0.0533   Epoch: 5   Global Step: 30710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:36:51,108-Speed 3388.33 samples/sec   Loss 6.1826   LearningRate 0.0533   Epoch: 5   Global Step: 30720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:36:54,130-Speed 3388.96 samples/sec   Loss 6.1771   LearningRate 0.0533   Epoch: 5   Global Step: 30730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:36:57,153-Speed 3388.79 samples/sec   Loss 6.1174   LearningRate 0.0532   Epoch: 5   Global Step: 30740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:37:00,217-Speed 3343.00 samples/sec   Loss 6.2946   LearningRate 0.0532   Epoch: 5   Global Step: 30750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:37:03,240-Speed 3388.20 samples/sec   Loss 6.3185   LearningRate 0.0532   Epoch: 5   Global Step: 30760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:37:06,261-Speed 3390.07 samples/sec   Loss 6.4845   LearningRate 0.0532   Epoch: 5   Global Step: 30770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:37:09,298-Speed 3371.78 samples/sec   Loss 6.3317   LearningRate 0.0532   Epoch: 5   Global Step: 30780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:37:12,336-Speed 3372.29 samples/sec   Loss 6.2381   LearningRate 0.0532   Epoch: 5   Global Step: 30790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:37:15,378-Speed 3366.65 samples/sec   Loss 6.3671   LearningRate 0.0532   Epoch: 5   Global Step: 30800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:37:18,428-Speed 3357.60 samples/sec   Loss 6.2496   LearningRate 0.0532   Epoch: 5   Global Step: 30810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:37:21,458-Speed 3380.63 samples/sec   Loss 6.4826   LearningRate 0.0531   Epoch: 5   Global Step: 30820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:37:24,481-Speed 3388.48 samples/sec   Loss 6.2999   LearningRate 0.0531   Epoch: 5   Global Step: 30830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:37:27,514-Speed 3376.97 samples/sec   Loss 6.2295   LearningRate 0.0531   Epoch: 5   Global Step: 30840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:37:30,568-Speed 3353.58 samples/sec   Loss 6.3628   LearningRate 0.0531   Epoch: 5   Global Step: 30850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:37:33,599-Speed 3379.12 samples/sec   Loss 6.3784   LearningRate 0.0531   Epoch: 5   Global Step: 30860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:37:36,624-Speed 3385.55 samples/sec   Loss 6.2226   LearningRate 0.0531   Epoch: 5   Global Step: 30870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:37:39,706-Speed 3324.03 samples/sec   Loss 6.3615   LearningRate 0.0531   Epoch: 5   Global Step: 30880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:37:42,729-Speed 3388.29 samples/sec   Loss 6.2246   LearningRate 0.0531   Epoch: 5   Global Step: 30890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:37:45,756-Speed 3383.07 samples/sec   Loss 6.1711   LearningRate 0.0530   Epoch: 5   Global Step: 30900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:37:48,781-Speed 3386.15 samples/sec   Loss 6.3236   LearningRate 0.0530   Epoch: 5   Global Step: 30910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:37:51,829-Speed 3361.27 samples/sec   Loss 6.2590   LearningRate 0.0530   Epoch: 5   Global Step: 30920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:37:54,874-Speed 3363.21 samples/sec   Loss 6.1984   LearningRate 0.0530   Epoch: 5   Global Step: 30930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:37:57,897-Speed 3388.66 samples/sec   Loss 6.2181   LearningRate 0.0530   Epoch: 5   Global Step: 30940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:00,918-Speed 3390.26 samples/sec   Loss 6.2887   LearningRate 0.0530   Epoch: 5   Global Step: 30950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:03,965-Speed 3361.30 samples/sec   Loss 6.3845   LearningRate 0.0530   Epoch: 5   Global Step: 30960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:06,985-Speed 3391.97 samples/sec   Loss 6.3277   LearningRate 0.0529   Epoch: 5   Global Step: 30970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:10,009-Speed 3387.31 samples/sec   Loss 6.2365   LearningRate 0.0529   Epoch: 5   Global Step: 30980   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-27 04:38:13,032-Speed 3387.85 samples/sec   Loss 6.2945   LearningRate 0.0529   Epoch: 5   Global Step: 30990   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-27 04:38:16,053-Speed 3390.77 samples/sec   Loss 6.2487   LearningRate 0.0529   Epoch: 5   Global Step: 31000   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-27 04:38:19,069-Speed 3395.48 samples/sec   Loss 6.0778   LearningRate 0.0529   Epoch: 5   Global Step: 31010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:22,142-Speed 3332.98 samples/sec   Loss 6.1560   LearningRate 0.0529   Epoch: 5   Global Step: 31020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:25,166-Speed 3387.19 samples/sec   Loss 6.4276   LearningRate 0.0529   Epoch: 5   Global Step: 31030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:28,195-Speed 3381.88 samples/sec   Loss 6.3290   LearningRate 0.0529   Epoch: 5   Global Step: 31040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:31,218-Speed 3388.43 samples/sec   Loss 6.3200   LearningRate 0.0528   Epoch: 5   Global Step: 31050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:34,248-Speed 3380.53 samples/sec   Loss 6.1336   LearningRate 0.0528   Epoch: 5   Global Step: 31060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:37,274-Speed 3384.44 samples/sec   Loss 6.4003   LearningRate 0.0528   Epoch: 5   Global Step: 31070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:40,299-Speed 3385.75 samples/sec   Loss 6.2570   LearningRate 0.0528   Epoch: 5   Global Step: 31080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:43,320-Speed 3390.64 samples/sec   Loss 6.0915   LearningRate 0.0528   Epoch: 5   Global Step: 31090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:46,342-Speed 3389.37 samples/sec   Loss 6.1999   LearningRate 0.0528   Epoch: 5   Global Step: 31100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:49,349-Speed 3406.08 samples/sec   Loss 6.2095   LearningRate 0.0528   Epoch: 5   Global Step: 31110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:52,368-Speed 3393.06 samples/sec   Loss 6.3172   LearningRate 0.0528   Epoch: 5   Global Step: 31120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:55,388-Speed 3390.90 samples/sec   Loss 6.2641   LearningRate 0.0527   Epoch: 5   Global Step: 31130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:38:58,421-Speed 3376.92 samples/sec   Loss 6.1707   LearningRate 0.0527   Epoch: 5   Global Step: 31140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:39:01,441-Speed 3392.44 samples/sec   Loss 6.2290   LearningRate 0.0527   Epoch: 5   Global Step: 31150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:39:04,514-Speed 3331.91 samples/sec   Loss 6.3768   LearningRate 0.0527   Epoch: 5   Global Step: 31160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:39:07,547-Speed 3376.84 samples/sec   Loss 6.3302   LearningRate 0.0527   Epoch: 5   Global Step: 31170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:39:10,586-Speed 3370.63 samples/sec   Loss 6.3171   LearningRate 0.0527   Epoch: 5   Global Step: 31180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:39:13,614-Speed 3383.36 samples/sec   Loss 6.2726   LearningRate 0.0527   Epoch: 5   Global Step: 31190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:39:16,635-Speed 3390.68 samples/sec   Loss 6.2542   LearningRate 0.0527   Epoch: 5   Global Step: 31200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:39:19,657-Speed 3389.14 samples/sec   Loss 6.4300   LearningRate 0.0526   Epoch: 5   Global Step: 31210   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-27 04:39:22,679-Speed 3388.99 samples/sec   Loss 6.3566   LearningRate 0.0526   Epoch: 5   Global Step: 31220   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-27 04:39:25,687-Speed 3404.81 samples/sec   Loss 6.3687   LearningRate 0.0526   Epoch: 5   Global Step: 31230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:39:28,710-Speed 3388.31 samples/sec   Loss 6.3189   LearningRate 0.0526   Epoch: 5   Global Step: 31240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:39:31,726-Speed 3395.95 samples/sec   Loss 6.3049   LearningRate 0.0526   Epoch: 5   Global Step: 31250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:39:34,799-Speed 3333.40 samples/sec   Loss 6.3597   LearningRate 0.0526   Epoch: 5   Global Step: 31260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:39:37,831-Speed 3377.34 samples/sec   Loss 6.2274   LearningRate 0.0526   Epoch: 5   Global Step: 31270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:39:40,860-Speed 3381.49 samples/sec   Loss 6.5353   LearningRate 0.0526   Epoch: 5   Global Step: 31280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:39:43,910-Speed 3358.86 samples/sec   Loss 6.4605   LearningRate 0.0525   Epoch: 5   Global Step: 31290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:39:46,951-Speed 3367.51 samples/sec   Loss 6.2728   LearningRate 0.0525   Epoch: 5   Global Step: 31300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:39:49,967-Speed 3395.98 samples/sec   Loss 6.2698   LearningRate 0.0525   Epoch: 5   Global Step: 31310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:39:52,988-Speed 3390.06 samples/sec   Loss 6.1402   LearningRate 0.0525   Epoch: 5   Global Step: 31320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:39:56,010-Speed 3389.41 samples/sec   Loss 6.1321   LearningRate 0.0525   Epoch: 5   Global Step: 31330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:39:59,047-Speed 3372.66 samples/sec   Loss 6.2130   LearningRate 0.0525   Epoch: 5   Global Step: 31340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:40:02,094-Speed 3374.75 samples/sec   Loss 6.2227   LearningRate 0.0525   Epoch: 5   Global Step: 31350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:40:05,107-Speed 3400.48 samples/sec   Loss 6.1132   LearningRate 0.0525   Epoch: 5   Global Step: 31360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:40:08,128-Speed 3389.64 samples/sec   Loss 6.2373   LearningRate 0.0524   Epoch: 5   Global Step: 31370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:40:11,153-Speed 3386.55 samples/sec   Loss 6.3474   LearningRate 0.0524   Epoch: 5   Global Step: 31380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:40:14,178-Speed 3385.28 samples/sec   Loss 6.4063   LearningRate 0.0524   Epoch: 5   Global Step: 31390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:40:17,227-Speed 3359.85 samples/sec   Loss 6.1046   LearningRate 0.0524   Epoch: 5   Global Step: 31400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:40:20,249-Speed 3389.07 samples/sec   Loss 6.2057   LearningRate 0.0524   Epoch: 5   Global Step: 31410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:40:23,276-Speed 3382.90 samples/sec   Loss 6.1441   LearningRate 0.0524   Epoch: 5   Global Step: 31420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:40:26,327-Speed 3357.58 samples/sec   Loss 6.3205   LearningRate 0.0524   Epoch: 5   Global Step: 31430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:40:29,372-Speed 3363.81 samples/sec   Loss 6.3245   LearningRate 0.0523   Epoch: 5   Global Step: 31440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:40:32,398-Speed 3384.92 samples/sec   Loss 6.2431   LearningRate 0.0523   Epoch: 5   Global Step: 31450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:40:35,425-Speed 3384.04 samples/sec   Loss 6.2965   LearningRate 0.0523   Epoch: 5   Global Step: 31460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:40:38,457-Speed 3377.84 samples/sec   Loss 6.0664   LearningRate 0.0523   Epoch: 5   Global Step: 31470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:40:41,501-Speed 3364.71 samples/sec   Loss 6.2964   LearningRate 0.0523   Epoch: 5   Global Step: 31480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:40:44,525-Speed 3387.63 samples/sec   Loss 6.2811   LearningRate 0.0523   Epoch: 5   Global Step: 31490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:40:47,549-Speed 3387.10 samples/sec   Loss 6.3015   LearningRate 0.0523   Epoch: 5   Global Step: 31500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:40:50,578-Speed 3380.82 samples/sec   Loss 6.1244   LearningRate 0.0523   Epoch: 5   Global Step: 31510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:40:53,612-Speed 3375.69 samples/sec   Loss 6.2104   LearningRate 0.0522   Epoch: 5   Global Step: 31520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:40:56,641-Speed 3382.49 samples/sec   Loss 6.1877   LearningRate 0.0522   Epoch: 5   Global Step: 31530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:40:59,665-Speed 3386.34 samples/sec   Loss 6.4050   LearningRate 0.0522   Epoch: 5   Global Step: 31540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:41:02,717-Speed 3357.87 samples/sec   Loss 6.2202   LearningRate 0.0522   Epoch: 5   Global Step: 31550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:41:05,733-Speed 3395.79 samples/sec   Loss 6.1620   LearningRate 0.0522   Epoch: 5   Global Step: 31560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:41:08,761-Speed 3382.46 samples/sec   Loss 6.2059   LearningRate 0.0522   Epoch: 5   Global Step: 31570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:41:11,785-Speed 3386.81 samples/sec   Loss 6.2293   LearningRate 0.0522   Epoch: 5   Global Step: 31580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:41:14,812-Speed 3383.84 samples/sec   Loss 6.2468   LearningRate 0.0522   Epoch: 5   Global Step: 31590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:41:17,821-Speed 3404.68 samples/sec   Loss 6.2336   LearningRate 0.0521   Epoch: 5   Global Step: 31600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:41:20,842-Speed 3390.47 samples/sec   Loss 6.2011   LearningRate 0.0521   Epoch: 5   Global Step: 31610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:41:23,874-Speed 3377.20 samples/sec   Loss 6.2511   LearningRate 0.0521   Epoch: 5   Global Step: 31620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:41:26,912-Speed 3372.43 samples/sec   Loss 6.3350   LearningRate 0.0521   Epoch: 5   Global Step: 31630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:41:29,935-Speed 3388.09 samples/sec   Loss 6.0808   LearningRate 0.0521   Epoch: 5   Global Step: 31640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:41:32,961-Speed 3384.74 samples/sec   Loss 6.1584   LearningRate 0.0521   Epoch: 5   Global Step: 31650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:41:36,006-Speed 3363.96 samples/sec   Loss 6.2185   LearningRate 0.0521   Epoch: 5   Global Step: 31660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:41:39,032-Speed 3384.66 samples/sec   Loss 6.1322   LearningRate 0.0521   Epoch: 5   Global Step: 31670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:41:42,054-Speed 3388.67 samples/sec   Loss 6.3137   LearningRate 0.0520   Epoch: 5   Global Step: 31680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:41:45,075-Speed 3390.19 samples/sec   Loss 6.1815   LearningRate 0.0520   Epoch: 5   Global Step: 31690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:41:48,112-Speed 3372.41 samples/sec   Loss 6.1224   LearningRate 0.0520   Epoch: 5   Global Step: 31700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:41:51,142-Speed 3380.95 samples/sec   Loss 6.2992   LearningRate 0.0520   Epoch: 5   Global Step: 31710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:41:54,175-Speed 3377.00 samples/sec   Loss 6.3012   LearningRate 0.0520   Epoch: 5   Global Step: 31720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:41:57,266-Speed 3313.69 samples/sec   Loss 6.2695   LearningRate 0.0520   Epoch: 5   Global Step: 31730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:42:00,290-Speed 3386.58 samples/sec   Loss 6.1316   LearningRate 0.0520   Epoch: 5   Global Step: 31740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:42:03,316-Speed 3385.47 samples/sec   Loss 6.2975   LearningRate 0.0520   Epoch: 5   Global Step: 31750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:42:06,359-Speed 3366.21 samples/sec   Loss 6.2388   LearningRate 0.0519   Epoch: 5   Global Step: 31760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:42:09,380-Speed 3390.09 samples/sec   Loss 6.2437   LearningRate 0.0519   Epoch: 5   Global Step: 31770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:42:12,384-Speed 3409.28 samples/sec   Loss 6.2108   LearningRate 0.0519   Epoch: 5   Global Step: 31780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:42:15,429-Speed 3363.57 samples/sec   Loss 6.2095   LearningRate 0.0519   Epoch: 5   Global Step: 31790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:42:18,460-Speed 3379.70 samples/sec   Loss 6.2419   LearningRate 0.0519   Epoch: 5   Global Step: 31800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:42:21,499-Speed 3370.87 samples/sec   Loss 6.2223   LearningRate 0.0519   Epoch: 5   Global Step: 31810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:42:24,524-Speed 3385.37 samples/sec   Loss 6.2656   LearningRate 0.0519   Epoch: 5   Global Step: 31820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:42:27,551-Speed 3383.69 samples/sec   Loss 6.3823   LearningRate 0.0519   Epoch: 5   Global Step: 31830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:42:30,626-Speed 3330.69 samples/sec   Loss 6.1462   LearningRate 0.0518   Epoch: 5   Global Step: 31840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:42:33,653-Speed 3384.01 samples/sec   Loss 6.1921   LearningRate 0.0518   Epoch: 5   Global Step: 31850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:42:36,676-Speed 3388.15 samples/sec   Loss 6.0471   LearningRate 0.0518   Epoch: 5   Global Step: 31860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:42:39,700-Speed 3386.34 samples/sec   Loss 6.2360   LearningRate 0.0518   Epoch: 5   Global Step: 31870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:42:42,730-Speed 3380.35 samples/sec   Loss 6.2000   LearningRate 0.0518   Epoch: 5   Global Step: 31880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:42:45,819-Speed 3315.86 samples/sec   Loss 6.1980   LearningRate 0.0518   Epoch: 5   Global Step: 31890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:42:48,872-Speed 3355.40 samples/sec   Loss 6.2666   LearningRate 0.0518   Epoch: 5   Global Step: 31900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:42:51,897-Speed 3386.08 samples/sec   Loss 6.2743   LearningRate 0.0518   Epoch: 5   Global Step: 31910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:42:54,925-Speed 3383.63 samples/sec   Loss 6.2399   LearningRate 0.0517   Epoch: 5   Global Step: 31920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:42:57,948-Speed 3387.24 samples/sec   Loss 6.3501   LearningRate 0.0517   Epoch: 5   Global Step: 31930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:43:00,956-Speed 3405.48 samples/sec   Loss 6.2364   LearningRate 0.0517   Epoch: 5   Global Step: 31940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:43:03,998-Speed 3366.14 samples/sec   Loss 6.3474   LearningRate 0.0517   Epoch: 5   Global Step: 31950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:43:07,031-Speed 3377.10 samples/sec   Loss 6.1911   LearningRate 0.0517   Epoch: 5   Global Step: 31960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:43:10,077-Speed 3362.75 samples/sec   Loss 6.1721   LearningRate 0.0517   Epoch: 5   Global Step: 31970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:43:13,159-Speed 3323.88 samples/sec   Loss 6.2714   LearningRate 0.0517   Epoch: 5   Global Step: 31980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:43:16,184-Speed 3385.50 samples/sec   Loss 6.2713   LearningRate 0.0517   Epoch: 5   Global Step: 31990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:43:19,211-Speed 3383.96 samples/sec   Loss 6.2355   LearningRate 0.0516   Epoch: 5   Global Step: 32000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:44:02,670-[lfw][32000]XNorm: 21.457548
Training: 2022-04-27 04:44:02,671-[lfw][32000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-04-27 04:44:02,671-[lfw][32000]Accuracy-Highest: 0.99817
Training: 2022-04-27 04:44:53,165-[cfp_fp][32000]XNorm: 18.387733
Training: 2022-04-27 04:44:53,166-[cfp_fp][32000]Accuracy-Flip: 0.95171+-0.00797
Training: 2022-04-27 04:44:53,166-[cfp_fp][32000]Accuracy-Highest: 0.95171
Training: 2022-04-27 04:45:36,480-[agedb_30][32000]XNorm: 20.998163
Training: 2022-04-27 04:45:36,480-[agedb_30][32000]Accuracy-Flip: 0.97117+-0.00940
Training: 2022-04-27 04:45:36,481-[agedb_30][32000]Accuracy-Highest: 0.97233
Training: 2022-04-27 04:45:39,499-Speed 72.99 samples/sec   Loss 6.2059   LearningRate 0.0516   Epoch: 5   Global Step: 32010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:45:42,501-Speed 3411.60 samples/sec   Loss 6.0726   LearningRate 0.0516   Epoch: 5   Global Step: 32020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:45:45,563-Speed 3345.65 samples/sec   Loss 6.2340   LearningRate 0.0516   Epoch: 5   Global Step: 32030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:45:48,587-Speed 3387.21 samples/sec   Loss 6.1384   LearningRate 0.0516   Epoch: 5   Global Step: 32040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:45:51,600-Speed 3398.67 samples/sec   Loss 6.0497   LearningRate 0.0516   Epoch: 5   Global Step: 32050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:45:54,614-Speed 3398.40 samples/sec   Loss 6.2240   LearningRate 0.0516   Epoch: 5   Global Step: 32060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:45:57,627-Speed 3399.75 samples/sec   Loss 6.2498   LearningRate 0.0516   Epoch: 5   Global Step: 32070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:46:00,638-Speed 3401.15 samples/sec   Loss 6.1331   LearningRate 0.0515   Epoch: 5   Global Step: 32080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:46:03,659-Speed 3390.71 samples/sec   Loss 6.2700   LearningRate 0.0515   Epoch: 5   Global Step: 32090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:46:06,674-Speed 3396.08 samples/sec   Loss 6.2355   LearningRate 0.0515   Epoch: 5   Global Step: 32100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:46:09,707-Speed 3377.60 samples/sec   Loss 6.0720   LearningRate 0.0515   Epoch: 5   Global Step: 32110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:46:12,718-Speed 3404.02 samples/sec   Loss 6.0631   LearningRate 0.0515   Epoch: 5   Global Step: 32120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:46:15,735-Speed 3394.59 samples/sec   Loss 6.1817   LearningRate 0.0515   Epoch: 5   Global Step: 32130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:46:18,766-Speed 3378.87 samples/sec   Loss 6.2925   LearningRate 0.0515   Epoch: 5   Global Step: 32140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:46:21,793-Speed 3383.77 samples/sec   Loss 6.1453   LearningRate 0.0515   Epoch: 5   Global Step: 32150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:46:24,817-Speed 3387.19 samples/sec   Loss 6.3270   LearningRate 0.0514   Epoch: 5   Global Step: 32160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:46:27,832-Speed 3397.19 samples/sec   Loss 6.4171   LearningRate 0.0514   Epoch: 5   Global Step: 32170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:46:30,857-Speed 3386.16 samples/sec   Loss 6.2139   LearningRate 0.0514   Epoch: 5   Global Step: 32180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:46:33,890-Speed 3375.95 samples/sec   Loss 6.2741   LearningRate 0.0514   Epoch: 5   Global Step: 32190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:46:36,910-Speed 3391.91 samples/sec   Loss 6.2776   LearningRate 0.0514   Epoch: 5   Global Step: 32200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:46:39,931-Speed 3391.14 samples/sec   Loss 6.3246   LearningRate 0.0514   Epoch: 5   Global Step: 32210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:46:42,942-Speed 3401.23 samples/sec   Loss 6.1987   LearningRate 0.0514   Epoch: 5   Global Step: 32220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 04:46:45,954-Speed 3400.09 samples/sec   Loss 6.3533   LearningRate 0.0513   Epoch: 5   Global Step: 32230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:46:48,968-Speed 3398.42 samples/sec   Loss 6.1270   LearningRate 0.0513   Epoch: 5   Global Step: 32240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:46:51,977-Speed 3404.48 samples/sec   Loss 6.2378   LearningRate 0.0513   Epoch: 5   Global Step: 32250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:46:54,996-Speed 3392.43 samples/sec   Loss 6.2033   LearningRate 0.0513   Epoch: 5   Global Step: 32260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:46:58,030-Speed 3375.30 samples/sec   Loss 6.2052   LearningRate 0.0513   Epoch: 5   Global Step: 32270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:01,048-Speed 3393.91 samples/sec   Loss 6.2091   LearningRate 0.0513   Epoch: 5   Global Step: 32280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:04,058-Speed 3402.29 samples/sec   Loss 6.2485   LearningRate 0.0513   Epoch: 5   Global Step: 32290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:07,065-Speed 3406.67 samples/sec   Loss 6.1935   LearningRate 0.0513   Epoch: 5   Global Step: 32300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:10,075-Speed 3403.10 samples/sec   Loss 6.1858   LearningRate 0.0512   Epoch: 5   Global Step: 32310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:13,085-Speed 3403.10 samples/sec   Loss 6.3114   LearningRate 0.0512   Epoch: 5   Global Step: 32320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:16,075-Speed 3425.16 samples/sec   Loss 6.2041   LearningRate 0.0512   Epoch: 5   Global Step: 32330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:19,091-Speed 3396.27 samples/sec   Loss 6.1830   LearningRate 0.0512   Epoch: 5   Global Step: 32340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:22,123-Speed 3378.33 samples/sec   Loss 6.2069   LearningRate 0.0512   Epoch: 5   Global Step: 32350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:25,152-Speed 3380.61 samples/sec   Loss 6.1634   LearningRate 0.0512   Epoch: 5   Global Step: 32360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:28,178-Speed 3384.63 samples/sec   Loss 6.2522   LearningRate 0.0512   Epoch: 5   Global Step: 32370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:31,207-Speed 3382.80 samples/sec   Loss 6.0712   LearningRate 0.0512   Epoch: 5   Global Step: 32380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:34,223-Speed 3395.03 samples/sec   Loss 6.1727   LearningRate 0.0511   Epoch: 5   Global Step: 32390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:37,234-Speed 3401.91 samples/sec   Loss 6.2431   LearningRate 0.0511   Epoch: 5   Global Step: 32400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:40,239-Speed 3408.76 samples/sec   Loss 6.2787   LearningRate 0.0511   Epoch: 5   Global Step: 32410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:43,324-Speed 3320.07 samples/sec   Loss 6.3322   LearningRate 0.0511   Epoch: 5   Global Step: 32420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:47:46,342-Speed 3393.85 samples/sec   Loss 6.2531   LearningRate 0.0511   Epoch: 5   Global Step: 32430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:47:49,423-Speed 3324.09 samples/sec   Loss 6.0399   LearningRate 0.0511   Epoch: 5   Global Step: 32440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:47:52,460-Speed 3372.57 samples/sec   Loss 6.1026   LearningRate 0.0511   Epoch: 5   Global Step: 32450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:47:55,473-Speed 3399.12 samples/sec   Loss 6.1691   LearningRate 0.0511   Epoch: 5   Global Step: 32460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:47:58,543-Speed 3336.71 samples/sec   Loss 6.0900   LearningRate 0.0510   Epoch: 5   Global Step: 32470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:48:01,544-Speed 3413.49 samples/sec   Loss 6.1612   LearningRate 0.0510   Epoch: 5   Global Step: 32480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:48:04,576-Speed 3377.80 samples/sec   Loss 6.1300   LearningRate 0.0510   Epoch: 5   Global Step: 32490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:48:07,608-Speed 3378.28 samples/sec   Loss 6.1521   LearningRate 0.0510   Epoch: 5   Global Step: 32500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:48:10,638-Speed 3379.77 samples/sec   Loss 6.2826   LearningRate 0.0510   Epoch: 5   Global Step: 32510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:48:13,809-Speed 3230.68 samples/sec   Loss 6.2813   LearningRate 0.0510   Epoch: 5   Global Step: 32520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:48:16,840-Speed 3378.94 samples/sec   Loss 6.1274   LearningRate 0.0510   Epoch: 5   Global Step: 32530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:48:19,849-Speed 3404.13 samples/sec   Loss 6.1567   LearningRate 0.0510   Epoch: 5   Global Step: 32540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:48:22,865-Speed 3395.03 samples/sec   Loss 6.0872   LearningRate 0.0509   Epoch: 5   Global Step: 32550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:48:25,875-Speed 3403.66 samples/sec   Loss 6.1980   LearningRate 0.0509   Epoch: 5   Global Step: 32560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:48:28,890-Speed 3397.10 samples/sec   Loss 6.1691   LearningRate 0.0509   Epoch: 5   Global Step: 32570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:48:31,905-Speed 3397.42 samples/sec   Loss 6.2172   LearningRate 0.0509   Epoch: 5   Global Step: 32580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:48:34,917-Speed 3400.61 samples/sec   Loss 6.2333   LearningRate 0.0509   Epoch: 5   Global Step: 32590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:48:37,932-Speed 3396.90 samples/sec   Loss 5.9726   LearningRate 0.0509   Epoch: 5   Global Step: 32600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:48:40,968-Speed 3372.89 samples/sec   Loss 6.2149   LearningRate 0.0509   Epoch: 5   Global Step: 32610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:48:43,987-Speed 3393.25 samples/sec   Loss 6.3014   LearningRate 0.0509   Epoch: 5   Global Step: 32620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:48:47,013-Speed 3384.72 samples/sec   Loss 6.1339   LearningRate 0.0508   Epoch: 5   Global Step: 32630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:48:50,031-Speed 3394.27 samples/sec   Loss 6.1699   LearningRate 0.0508   Epoch: 5   Global Step: 32640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:48:53,024-Speed 3421.79 samples/sec   Loss 6.2405   LearningRate 0.0508   Epoch: 5   Global Step: 32650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:48:56,032-Speed 3404.61 samples/sec   Loss 6.1324   LearningRate 0.0508   Epoch: 5   Global Step: 32660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:48:59,043-Speed 3401.73 samples/sec   Loss 6.1960   LearningRate 0.0508   Epoch: 5   Global Step: 32670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:02,057-Speed 3398.79 samples/sec   Loss 6.2137   LearningRate 0.0508   Epoch: 5   Global Step: 32680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:05,088-Speed 3379.60 samples/sec   Loss 6.0766   LearningRate 0.0508   Epoch: 5   Global Step: 32690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:08,099-Speed 3401.50 samples/sec   Loss 6.2985   LearningRate 0.0508   Epoch: 5   Global Step: 32700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:11,113-Speed 3397.24 samples/sec   Loss 6.3565   LearningRate 0.0507   Epoch: 5   Global Step: 32710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:14,132-Speed 3393.60 samples/sec   Loss 6.0414   LearningRate 0.0507   Epoch: 5   Global Step: 32720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:17,148-Speed 3395.13 samples/sec   Loss 6.1712   LearningRate 0.0507   Epoch: 5   Global Step: 32730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:20,206-Speed 3350.32 samples/sec   Loss 6.1859   LearningRate 0.0507   Epoch: 5   Global Step: 32740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:23,417-Speed 3188.96 samples/sec   Loss 6.1828   LearningRate 0.0507   Epoch: 5   Global Step: 32750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:49:26,455-Speed 3371.65 samples/sec   Loss 6.1083   LearningRate 0.0507   Epoch: 5   Global Step: 32760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:49:29,481-Speed 3385.37 samples/sec   Loss 6.1219   LearningRate 0.0507   Epoch: 5   Global Step: 32770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:49:32,478-Speed 3417.03 samples/sec   Loss 6.2003   LearningRate 0.0507   Epoch: 5   Global Step: 32780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:35,493-Speed 3397.65 samples/sec   Loss 6.0735   LearningRate 0.0506   Epoch: 5   Global Step: 32790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:38,505-Speed 3400.04 samples/sec   Loss 6.1640   LearningRate 0.0506   Epoch: 5   Global Step: 32800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:41,528-Speed 3388.46 samples/sec   Loss 6.1042   LearningRate 0.0506   Epoch: 5   Global Step: 32810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:44,581-Speed 3355.00 samples/sec   Loss 6.0930   LearningRate 0.0506   Epoch: 5   Global Step: 32820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:47,610-Speed 3381.35 samples/sec   Loss 6.1204   LearningRate 0.0506   Epoch: 5   Global Step: 32830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:50,634-Speed 3386.95 samples/sec   Loss 6.1264   LearningRate 0.0506   Epoch: 5   Global Step: 32840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:53,651-Speed 3394.86 samples/sec   Loss 6.1188   LearningRate 0.0506   Epoch: 5   Global Step: 32850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:56,663-Speed 3400.53 samples/sec   Loss 6.1563   LearningRate 0.0506   Epoch: 5   Global Step: 32860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:49:59,685-Speed 3389.38 samples/sec   Loss 6.1045   LearningRate 0.0505   Epoch: 5   Global Step: 32870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:50:02,702-Speed 3394.98 samples/sec   Loss 6.0435   LearningRate 0.0505   Epoch: 5   Global Step: 32880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:05,719-Speed 3395.14 samples/sec   Loss 6.2656   LearningRate 0.0505   Epoch: 5   Global Step: 32890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:08,745-Speed 3384.56 samples/sec   Loss 6.2224   LearningRate 0.0505   Epoch: 5   Global Step: 32900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:11,769-Speed 3386.35 samples/sec   Loss 6.0492   LearningRate 0.0505   Epoch: 5   Global Step: 32910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:14,795-Speed 3384.66 samples/sec   Loss 6.1675   LearningRate 0.0505   Epoch: 5   Global Step: 32920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:17,819-Speed 3387.54 samples/sec   Loss 6.1074   LearningRate 0.0505   Epoch: 5   Global Step: 32930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:20,838-Speed 3393.47 samples/sec   Loss 6.1582   LearningRate 0.0505   Epoch: 5   Global Step: 32940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:23,854-Speed 3395.03 samples/sec   Loss 5.9896   LearningRate 0.0504   Epoch: 5   Global Step: 32950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:26,875-Speed 3391.36 samples/sec   Loss 6.2978   LearningRate 0.0504   Epoch: 5   Global Step: 32960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:29,891-Speed 3395.56 samples/sec   Loss 6.1517   LearningRate 0.0504   Epoch: 5   Global Step: 32970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:32,895-Speed 3409.78 samples/sec   Loss 6.1472   LearningRate 0.0504   Epoch: 5   Global Step: 32980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:35,912-Speed 3394.55 samples/sec   Loss 6.1927   LearningRate 0.0504   Epoch: 5   Global Step: 32990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:38,943-Speed 3380.26 samples/sec   Loss 5.9356   LearningRate 0.0504   Epoch: 5   Global Step: 33000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:41,962-Speed 3391.54 samples/sec   Loss 6.0415   LearningRate 0.0504   Epoch: 5   Global Step: 33010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:44,989-Speed 3384.42 samples/sec   Loss 6.1581   LearningRate 0.0504   Epoch: 5   Global Step: 33020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:48,008-Speed 3392.98 samples/sec   Loss 6.1084   LearningRate 0.0503   Epoch: 5   Global Step: 33030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:51,029-Speed 3390.20 samples/sec   Loss 6.1191   LearningRate 0.0503   Epoch: 5   Global Step: 33040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:54,058-Speed 3382.04 samples/sec   Loss 6.1197   LearningRate 0.0503   Epoch: 5   Global Step: 33050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:50:57,077-Speed 3392.57 samples/sec   Loss 6.1501   LearningRate 0.0503   Epoch: 5   Global Step: 33060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:51:00,116-Speed 3369.56 samples/sec   Loss 6.1621   LearningRate 0.0503   Epoch: 5   Global Step: 33070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:51:03,121-Speed 3407.83 samples/sec   Loss 6.1737   LearningRate 0.0503   Epoch: 5   Global Step: 33080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:51:06,125-Speed 3409.89 samples/sec   Loss 6.3361   LearningRate 0.0503   Epoch: 5   Global Step: 33090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:51:09,148-Speed 3388.80 samples/sec   Loss 6.0831   LearningRate 0.0503   Epoch: 5   Global Step: 33100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:51:12,163-Speed 3396.30 samples/sec   Loss 6.3007   LearningRate 0.0502   Epoch: 5   Global Step: 33110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:51:15,182-Speed 3393.74 samples/sec   Loss 5.9419   LearningRate 0.0502   Epoch: 5   Global Step: 33120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:51:18,196-Speed 3397.54 samples/sec   Loss 6.1155   LearningRate 0.0502   Epoch: 5   Global Step: 33130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:51:21,216-Speed 3391.72 samples/sec   Loss 6.0241   LearningRate 0.0502   Epoch: 5   Global Step: 33140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:51:24,238-Speed 3389.41 samples/sec   Loss 6.0593   LearningRate 0.0502   Epoch: 5   Global Step: 33150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:51:27,325-Speed 3318.12 samples/sec   Loss 6.1058   LearningRate 0.0502   Epoch: 5   Global Step: 33160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:51:30,342-Speed 3395.37 samples/sec   Loss 6.0813   LearningRate 0.0502   Epoch: 5   Global Step: 33170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:51:33,366-Speed 3386.03 samples/sec   Loss 5.9721   LearningRate 0.0502   Epoch: 5   Global Step: 33180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:51:36,386-Speed 3391.51 samples/sec   Loss 6.1050   LearningRate 0.0501   Epoch: 5   Global Step: 33190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:51:39,405-Speed 3393.38 samples/sec   Loss 6.1679   LearningRate 0.0501   Epoch: 5   Global Step: 33200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:51:42,422-Speed 3394.58 samples/sec   Loss 6.0108   LearningRate 0.0501   Epoch: 5   Global Step: 33210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:51:45,440-Speed 3393.84 samples/sec   Loss 6.2523   LearningRate 0.0501   Epoch: 5   Global Step: 33220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:51:48,467-Speed 3384.19 samples/sec   Loss 6.1524   LearningRate 0.0501   Epoch: 5   Global Step: 33230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:51:51,512-Speed 3363.95 samples/sec   Loss 6.0479   LearningRate 0.0501   Epoch: 5   Global Step: 33240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:51:54,531-Speed 3392.02 samples/sec   Loss 5.9221   LearningRate 0.0501   Epoch: 5   Global Step: 33250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:51:57,555-Speed 3386.56 samples/sec   Loss 6.2193   LearningRate 0.0501   Epoch: 5   Global Step: 33260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:52:00,576-Speed 3390.74 samples/sec   Loss 6.0675   LearningRate 0.0500   Epoch: 5   Global Step: 33270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:52:03,598-Speed 3389.77 samples/sec   Loss 6.1969   LearningRate 0.0500   Epoch: 5   Global Step: 33280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:52:06,598-Speed 3413.57 samples/sec   Loss 5.9814   LearningRate 0.0500   Epoch: 5   Global Step: 33290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:09,621-Speed 3388.05 samples/sec   Loss 6.0241   LearningRate 0.0500   Epoch: 5   Global Step: 33300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:12,702-Speed 3324.55 samples/sec   Loss 6.0138   LearningRate 0.0500   Epoch: 5   Global Step: 33310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:15,722-Speed 3391.52 samples/sec   Loss 6.2496   LearningRate 0.0500   Epoch: 5   Global Step: 33320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:18,743-Speed 3391.22 samples/sec   Loss 6.0117   LearningRate 0.0500   Epoch: 5   Global Step: 33330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:21,767-Speed 3386.09 samples/sec   Loss 6.0778   LearningRate 0.0500   Epoch: 5   Global Step: 33340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:24,789-Speed 3390.40 samples/sec   Loss 6.1469   LearningRate 0.0499   Epoch: 5   Global Step: 33350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:27,810-Speed 3390.11 samples/sec   Loss 5.9585   LearningRate 0.0499   Epoch: 5   Global Step: 33360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:30,832-Speed 3388.97 samples/sec   Loss 6.0808   LearningRate 0.0499   Epoch: 5   Global Step: 33370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:33,865-Speed 3376.26 samples/sec   Loss 6.1905   LearningRate 0.0499   Epoch: 5   Global Step: 33380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:36,874-Speed 3404.53 samples/sec   Loss 6.0361   LearningRate 0.0499   Epoch: 5   Global Step: 33390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:39,892-Speed 3393.99 samples/sec   Loss 6.1056   LearningRate 0.0499   Epoch: 5   Global Step: 33400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:42,928-Speed 3373.86 samples/sec   Loss 6.0759   LearningRate 0.0499   Epoch: 5   Global Step: 33410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:45,949-Speed 3389.55 samples/sec   Loss 6.1241   LearningRate 0.0499   Epoch: 5   Global Step: 33420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:48,974-Speed 3386.53 samples/sec   Loss 6.0635   LearningRate 0.0498   Epoch: 5   Global Step: 33430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:51,996-Speed 3389.51 samples/sec   Loss 6.0699   LearningRate 0.0498   Epoch: 5   Global Step: 33440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:55,030-Speed 3375.08 samples/sec   Loss 6.2127   LearningRate 0.0498   Epoch: 5   Global Step: 33450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:52:58,055-Speed 3386.23 samples/sec   Loss 6.1351   LearningRate 0.0498   Epoch: 5   Global Step: 33460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:53:01,073-Speed 3393.22 samples/sec   Loss 6.0879   LearningRate 0.0498   Epoch: 5   Global Step: 33470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:53:04,096-Speed 3388.39 samples/sec   Loss 6.0934   LearningRate 0.0498   Epoch: 5   Global Step: 33480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:53:07,120-Speed 3387.64 samples/sec   Loss 6.0915   LearningRate 0.0498   Epoch: 5   Global Step: 33490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:53:10,142-Speed 3389.91 samples/sec   Loss 6.1928   LearningRate 0.0498   Epoch: 5   Global Step: 33500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:53:13,162-Speed 3390.91 samples/sec   Loss 6.2144   LearningRate 0.0497   Epoch: 5   Global Step: 33510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:53:16,183-Speed 3390.55 samples/sec   Loss 6.1384   LearningRate 0.0497   Epoch: 5   Global Step: 33520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:53:19,205-Speed 3389.69 samples/sec   Loss 6.0630   LearningRate 0.0497   Epoch: 5   Global Step: 33530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:53:22,234-Speed 3380.97 samples/sec   Loss 6.2538   LearningRate 0.0497   Epoch: 5   Global Step: 33540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:53:25,264-Speed 3380.15 samples/sec   Loss 6.0691   LearningRate 0.0497   Epoch: 5   Global Step: 33550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:53:28,312-Speed 3360.54 samples/sec   Loss 6.1835   LearningRate 0.0497   Epoch: 5   Global Step: 33560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:53:31,337-Speed 3386.20 samples/sec   Loss 5.9254   LearningRate 0.0497   Epoch: 5   Global Step: 33570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:53:34,358-Speed 3390.40 samples/sec   Loss 6.1753   LearningRate 0.0497   Epoch: 5   Global Step: 33580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:53:37,377-Speed 3392.68 samples/sec   Loss 6.1465   LearningRate 0.0496   Epoch: 5   Global Step: 33590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:53:40,418-Speed 3368.82 samples/sec   Loss 6.0054   LearningRate 0.0496   Epoch: 5   Global Step: 33600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:53:43,441-Speed 3387.12 samples/sec   Loss 6.0156   LearningRate 0.0496   Epoch: 5   Global Step: 33610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:53:46,466-Speed 3386.26 samples/sec   Loss 6.0353   LearningRate 0.0496   Epoch: 5   Global Step: 33620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:53:49,574-Speed 3295.96 samples/sec   Loss 6.0509   LearningRate 0.0496   Epoch: 5   Global Step: 33630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:53:52,627-Speed 3354.79 samples/sec   Loss 6.1820   LearningRate 0.0496   Epoch: 5   Global Step: 33640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:53:55,649-Speed 3389.91 samples/sec   Loss 6.2024   LearningRate 0.0496   Epoch: 5   Global Step: 33650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:53:58,701-Speed 3355.64 samples/sec   Loss 6.1116   LearningRate 0.0496   Epoch: 5   Global Step: 33660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:01,716-Speed 3397.48 samples/sec   Loss 6.0759   LearningRate 0.0496   Epoch: 5   Global Step: 33670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:04,758-Speed 3366.73 samples/sec   Loss 6.1399   LearningRate 0.0495   Epoch: 5   Global Step: 33680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:07,786-Speed 3382.66 samples/sec   Loss 6.0722   LearningRate 0.0495   Epoch: 5   Global Step: 33690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:10,809-Speed 3388.53 samples/sec   Loss 6.1076   LearningRate 0.0495   Epoch: 5   Global Step: 33700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:13,835-Speed 3385.12 samples/sec   Loss 5.9953   LearningRate 0.0495   Epoch: 5   Global Step: 33710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:16,867-Speed 3377.49 samples/sec   Loss 5.9168   LearningRate 0.0495   Epoch: 5   Global Step: 33720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:20,169-Speed 3101.25 samples/sec   Loss 6.0557   LearningRate 0.0495   Epoch: 5   Global Step: 33730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:54:23,190-Speed 3391.06 samples/sec   Loss 6.0781   LearningRate 0.0495   Epoch: 5   Global Step: 33740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:54:26,219-Speed 3381.54 samples/sec   Loss 5.9841   LearningRate 0.0495   Epoch: 5   Global Step: 33750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:54:29,241-Speed 3389.38 samples/sec   Loss 5.9945   LearningRate 0.0494   Epoch: 5   Global Step: 33760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:54:32,239-Speed 3415.94 samples/sec   Loss 5.9728   LearningRate 0.0494   Epoch: 5   Global Step: 33770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:35,316-Speed 3329.35 samples/sec   Loss 6.0699   LearningRate 0.0494   Epoch: 5   Global Step: 33780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:38,345-Speed 3380.65 samples/sec   Loss 6.1037   LearningRate 0.0494   Epoch: 5   Global Step: 33790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:41,376-Speed 3379.96 samples/sec   Loss 6.2407   LearningRate 0.0494   Epoch: 5   Global Step: 33800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:44,412-Speed 3373.72 samples/sec   Loss 6.0437   LearningRate 0.0494   Epoch: 5   Global Step: 33810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:47,442-Speed 3380.13 samples/sec   Loss 5.9336   LearningRate 0.0494   Epoch: 5   Global Step: 33820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:50,468-Speed 3384.85 samples/sec   Loss 6.0669   LearningRate 0.0494   Epoch: 5   Global Step: 33830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:53,504-Speed 3374.41 samples/sec   Loss 5.9722   LearningRate 0.0493   Epoch: 5   Global Step: 33840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:56,529-Speed 3385.04 samples/sec   Loss 6.0088   LearningRate 0.0493   Epoch: 5   Global Step: 33850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:54:59,555-Speed 3385.54 samples/sec   Loss 5.9421   LearningRate 0.0493   Epoch: 5   Global Step: 33860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:55:02,597-Speed 3367.08 samples/sec   Loss 6.0270   LearningRate 0.0493   Epoch: 5   Global Step: 33870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:55:05,625-Speed 3382.75 samples/sec   Loss 6.0982   LearningRate 0.0493   Epoch: 5   Global Step: 33880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:55:08,643-Speed 3393.31 samples/sec   Loss 6.1752   LearningRate 0.0493   Epoch: 5   Global Step: 33890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:55:11,665-Speed 3389.70 samples/sec   Loss 6.0908   LearningRate 0.0493   Epoch: 5   Global Step: 33900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:55:14,697-Speed 3377.88 samples/sec   Loss 6.1126   LearningRate 0.0493   Epoch: 5   Global Step: 33910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:55:17,722-Speed 3386.95 samples/sec   Loss 6.2652   LearningRate 0.0492   Epoch: 5   Global Step: 33920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:55:20,748-Speed 3384.05 samples/sec   Loss 6.0860   LearningRate 0.0492   Epoch: 5   Global Step: 33930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:55:23,792-Speed 3365.46 samples/sec   Loss 6.1424   LearningRate 0.0492   Epoch: 5   Global Step: 33940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:55:26,842-Speed 3357.86 samples/sec   Loss 6.1205   LearningRate 0.0492   Epoch: 5   Global Step: 33950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:55:29,881-Speed 3371.01 samples/sec   Loss 6.0836   LearningRate 0.0492   Epoch: 5   Global Step: 33960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:55:32,888-Speed 3405.63 samples/sec   Loss 5.9424   LearningRate 0.0492   Epoch: 5   Global Step: 33970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:55:35,933-Speed 3363.47 samples/sec   Loss 6.0840   LearningRate 0.0492   Epoch: 5   Global Step: 33980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:55:38,957-Speed 3386.85 samples/sec   Loss 6.0740   LearningRate 0.0492   Epoch: 5   Global Step: 33990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:55:41,988-Speed 3379.06 samples/sec   Loss 6.1245   LearningRate 0.0491   Epoch: 5   Global Step: 34000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:56:25,467-[lfw][34000]XNorm: 22.048982
Training: 2022-04-27 04:56:25,468-[lfw][34000]Accuracy-Flip: 0.99650+-0.00241
Training: 2022-04-27 04:56:25,468-[lfw][34000]Accuracy-Highest: 0.99817
Training: 2022-04-27 04:57:15,935-[cfp_fp][34000]XNorm: 19.422187
Training: 2022-04-27 04:57:15,935-[cfp_fp][34000]Accuracy-Flip: 0.94429+-0.01020
Training: 2022-04-27 04:57:15,936-[cfp_fp][34000]Accuracy-Highest: 0.95171
Training: 2022-04-27 04:57:59,388-[agedb_30][34000]XNorm: 21.921999
Training: 2022-04-27 04:57:59,389-[agedb_30][34000]Accuracy-Flip: 0.97467+-0.00733
Training: 2022-04-27 04:57:59,389-[agedb_30][34000]Accuracy-Highest: 0.97467
Training: 2022-04-27 04:58:02,404-Speed 72.93 samples/sec   Loss 5.9770   LearningRate 0.0491   Epoch: 5   Global Step: 34010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:05,407-Speed 3410.35 samples/sec   Loss 6.1000   LearningRate 0.0491   Epoch: 5   Global Step: 34020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:08,418-Speed 3402.04 samples/sec   Loss 6.0136   LearningRate 0.0491   Epoch: 5   Global Step: 34030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:11,505-Speed 3318.25 samples/sec   Loss 5.9405   LearningRate 0.0491   Epoch: 5   Global Step: 34040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:14,512-Speed 3405.79 samples/sec   Loss 6.0713   LearningRate 0.0491   Epoch: 5   Global Step: 34050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:17,521-Speed 3403.44 samples/sec   Loss 5.9715   LearningRate 0.0491   Epoch: 5   Global Step: 34060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:20,515-Speed 3421.37 samples/sec   Loss 6.0955   LearningRate 0.0491   Epoch: 5   Global Step: 34070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:23,533-Speed 3393.19 samples/sec   Loss 6.1513   LearningRate 0.0490   Epoch: 5   Global Step: 34080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:26,569-Speed 3373.63 samples/sec   Loss 6.0400   LearningRate 0.0490   Epoch: 5   Global Step: 34090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:29,584-Speed 3397.57 samples/sec   Loss 6.0054   LearningRate 0.0490   Epoch: 5   Global Step: 34100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:32,676-Speed 3312.58 samples/sec   Loss 6.1396   LearningRate 0.0490   Epoch: 5   Global Step: 34110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:45,906-Speed 774.03 samples/sec   Loss 5.7697   LearningRate 0.0490   Epoch: 6   Global Step: 34120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:48,930-Speed 3387.91 samples/sec   Loss 5.4939   LearningRate 0.0490   Epoch: 6   Global Step: 34130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:51,948-Speed 3393.59 samples/sec   Loss 5.3537   LearningRate 0.0490   Epoch: 6   Global Step: 34140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:54,962-Speed 3398.59 samples/sec   Loss 5.4388   LearningRate 0.0490   Epoch: 6   Global Step: 34150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:58:57,997-Speed 3374.77 samples/sec   Loss 5.2846   LearningRate 0.0489   Epoch: 6   Global Step: 34160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:59:01,013-Speed 3395.96 samples/sec   Loss 5.4239   LearningRate 0.0489   Epoch: 6   Global Step: 34170   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-27 04:59:04,018-Speed 3407.78 samples/sec   Loss 5.3882   LearningRate 0.0489   Epoch: 6   Global Step: 34180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:59:07,022-Speed 3410.35 samples/sec   Loss 5.2876   LearningRate 0.0489   Epoch: 6   Global Step: 34190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:10,050-Speed 3382.11 samples/sec   Loss 5.4923   LearningRate 0.0489   Epoch: 6   Global Step: 34200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:13,074-Speed 3387.82 samples/sec   Loss 5.4459   LearningRate 0.0489   Epoch: 6   Global Step: 34210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:16,098-Speed 3387.04 samples/sec   Loss 5.5158   LearningRate 0.0489   Epoch: 6   Global Step: 34220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:19,122-Speed 3386.69 samples/sec   Loss 5.5717   LearningRate 0.0489   Epoch: 6   Global Step: 34230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:22,157-Speed 3373.90 samples/sec   Loss 5.4854   LearningRate 0.0488   Epoch: 6   Global Step: 34240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:25,301-Speed 3258.65 samples/sec   Loss 5.5217   LearningRate 0.0488   Epoch: 6   Global Step: 34250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:28,323-Speed 3389.22 samples/sec   Loss 5.6587   LearningRate 0.0488   Epoch: 6   Global Step: 34260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:31,517-Speed 3205.84 samples/sec   Loss 5.4709   LearningRate 0.0488   Epoch: 6   Global Step: 34270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:34,529-Speed 3401.08 samples/sec   Loss 5.5684   LearningRate 0.0488   Epoch: 6   Global Step: 34280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:37,543-Speed 3398.58 samples/sec   Loss 5.4981   LearningRate 0.0488   Epoch: 6   Global Step: 34290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:59:40,555-Speed 3400.34 samples/sec   Loss 5.5732   LearningRate 0.0488   Epoch: 6   Global Step: 34300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 04:59:43,547-Speed 3423.96 samples/sec   Loss 5.5709   LearningRate 0.0488   Epoch: 6   Global Step: 34310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:46,557-Speed 3401.98 samples/sec   Loss 5.6163   LearningRate 0.0487   Epoch: 6   Global Step: 34320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:49,579-Speed 3389.77 samples/sec   Loss 5.6734   LearningRate 0.0487   Epoch: 6   Global Step: 34330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:52,593-Speed 3397.62 samples/sec   Loss 5.4294   LearningRate 0.0487   Epoch: 6   Global Step: 34340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:55,603-Speed 3404.52 samples/sec   Loss 5.5396   LearningRate 0.0487   Epoch: 6   Global Step: 34350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 04:59:58,607-Speed 3408.99 samples/sec   Loss 5.6087   LearningRate 0.0487   Epoch: 6   Global Step: 34360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:00:01,617-Speed 3403.23 samples/sec   Loss 5.6839   LearningRate 0.0487   Epoch: 6   Global Step: 34370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:00:04,691-Speed 3332.23 samples/sec   Loss 5.6933   LearningRate 0.0487   Epoch: 6   Global Step: 34380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:00:07,699-Speed 3406.54 samples/sec   Loss 5.5984   LearningRate 0.0487   Epoch: 6   Global Step: 34390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:00:10,718-Speed 3393.55 samples/sec   Loss 5.6267   LearningRate 0.0487   Epoch: 6   Global Step: 34400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:00:13,758-Speed 3369.45 samples/sec   Loss 5.5589   LearningRate 0.0486   Epoch: 6   Global Step: 34410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:16,768-Speed 3402.28 samples/sec   Loss 5.4005   LearningRate 0.0486   Epoch: 6   Global Step: 34420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:19,783-Speed 3396.88 samples/sec   Loss 5.5804   LearningRate 0.0486   Epoch: 6   Global Step: 34430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:22,796-Speed 3400.02 samples/sec   Loss 5.6018   LearningRate 0.0486   Epoch: 6   Global Step: 34440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:25,821-Speed 3386.19 samples/sec   Loss 5.7238   LearningRate 0.0486   Epoch: 6   Global Step: 34450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:28,823-Speed 3411.50 samples/sec   Loss 5.7600   LearningRate 0.0486   Epoch: 6   Global Step: 34460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:31,831-Speed 3405.32 samples/sec   Loss 5.6338   LearningRate 0.0486   Epoch: 6   Global Step: 34470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:34,867-Speed 3374.03 samples/sec   Loss 5.7328   LearningRate 0.0486   Epoch: 6   Global Step: 34480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:37,880-Speed 3399.48 samples/sec   Loss 5.6890   LearningRate 0.0485   Epoch: 6   Global Step: 34490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:40,945-Speed 3341.17 samples/sec   Loss 5.4715   LearningRate 0.0485   Epoch: 6   Global Step: 34500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:43,933-Speed 3428.22 samples/sec   Loss 5.7753   LearningRate 0.0485   Epoch: 6   Global Step: 34510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:46,941-Speed 3405.39 samples/sec   Loss 5.6055   LearningRate 0.0485   Epoch: 6   Global Step: 34520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:49,949-Speed 3405.10 samples/sec   Loss 5.5892   LearningRate 0.0485   Epoch: 6   Global Step: 34530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:52,958-Speed 3403.22 samples/sec   Loss 5.6273   LearningRate 0.0485   Epoch: 6   Global Step: 34540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:55,981-Speed 3389.88 samples/sec   Loss 5.8013   LearningRate 0.0485   Epoch: 6   Global Step: 34550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:00:59,059-Speed 3327.67 samples/sec   Loss 5.7682   LearningRate 0.0485   Epoch: 6   Global Step: 34560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:01:02,072-Speed 3399.29 samples/sec   Loss 5.7473   LearningRate 0.0484   Epoch: 6   Global Step: 34570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:01:05,085-Speed 3398.31 samples/sec   Loss 5.6868   LearningRate 0.0484   Epoch: 6   Global Step: 34580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:01:08,104-Speed 3394.03 samples/sec   Loss 5.6862   LearningRate 0.0484   Epoch: 6   Global Step: 34590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:01:11,147-Speed 3365.30 samples/sec   Loss 5.5688   LearningRate 0.0484   Epoch: 6   Global Step: 34600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:01:14,140-Speed 3423.35 samples/sec   Loss 5.7473   LearningRate 0.0484   Epoch: 6   Global Step: 34610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:01:17,146-Speed 3407.59 samples/sec   Loss 5.7560   LearningRate 0.0484   Epoch: 6   Global Step: 34620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:01:20,155-Speed 3403.94 samples/sec   Loss 5.7248   LearningRate 0.0484   Epoch: 6   Global Step: 34630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:01:23,279-Speed 3278.01 samples/sec   Loss 5.6957   LearningRate 0.0484   Epoch: 6   Global Step: 34640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:01:26,303-Speed 3386.72 samples/sec   Loss 5.7575   LearningRate 0.0483   Epoch: 6   Global Step: 34650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:01:29,317-Speed 3399.08 samples/sec   Loss 5.6867   LearningRate 0.0483   Epoch: 6   Global Step: 34660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:01:32,315-Speed 3415.90 samples/sec   Loss 5.6544   LearningRate 0.0483   Epoch: 6   Global Step: 34670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:01:35,324-Speed 3404.17 samples/sec   Loss 5.6425   LearningRate 0.0483   Epoch: 6   Global Step: 34680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:01:38,338-Speed 3398.29 samples/sec   Loss 5.6944   LearningRate 0.0483   Epoch: 6   Global Step: 34690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:01:41,353-Speed 3396.57 samples/sec   Loss 5.8292   LearningRate 0.0483   Epoch: 6   Global Step: 34700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:01:44,361-Speed 3405.64 samples/sec   Loss 5.8080   LearningRate 0.0483   Epoch: 6   Global Step: 34710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:01:47,373-Speed 3401.06 samples/sec   Loss 5.6385   LearningRate 0.0483   Epoch: 6   Global Step: 34720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:01:50,415-Speed 3366.65 samples/sec   Loss 5.7830   LearningRate 0.0482   Epoch: 6   Global Step: 34730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:01:53,544-Speed 3272.61 samples/sec   Loss 5.6905   LearningRate 0.0482   Epoch: 6   Global Step: 34740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:01:56,554-Speed 3403.71 samples/sec   Loss 5.8279   LearningRate 0.0482   Epoch: 6   Global Step: 34750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:01:59,564-Speed 3401.88 samples/sec   Loss 5.8844   LearningRate 0.0482   Epoch: 6   Global Step: 34760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:02:02,583-Speed 3392.88 samples/sec   Loss 5.6930   LearningRate 0.0482   Epoch: 6   Global Step: 34770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:02:05,601-Speed 3393.47 samples/sec   Loss 5.7530   LearningRate 0.0482   Epoch: 6   Global Step: 34780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:02:08,619-Speed 3394.13 samples/sec   Loss 5.8416   LearningRate 0.0482   Epoch: 6   Global Step: 34790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:02:11,627-Speed 3404.93 samples/sec   Loss 5.8031   LearningRate 0.0482   Epoch: 6   Global Step: 34800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:02:14,644-Speed 3395.91 samples/sec   Loss 5.6165   LearningRate 0.0481   Epoch: 6   Global Step: 34810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:02:17,655-Speed 3400.73 samples/sec   Loss 5.6806   LearningRate 0.0481   Epoch: 6   Global Step: 34820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:02:20,670-Speed 3397.07 samples/sec   Loss 5.7098   LearningRate 0.0481   Epoch: 6   Global Step: 34830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:02:23,684-Speed 3398.50 samples/sec   Loss 5.7892   LearningRate 0.0481   Epoch: 6   Global Step: 34840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:02:26,709-Speed 3385.77 samples/sec   Loss 5.8777   LearningRate 0.0481   Epoch: 6   Global Step: 34850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:02:29,728-Speed 3392.87 samples/sec   Loss 5.7131   LearningRate 0.0481   Epoch: 6   Global Step: 34860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:02:32,746-Speed 3393.43 samples/sec   Loss 5.7259   LearningRate 0.0481   Epoch: 6   Global Step: 34870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:02:35,768-Speed 3390.27 samples/sec   Loss 5.7917   LearningRate 0.0481   Epoch: 6   Global Step: 34880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:02:38,809-Speed 3368.14 samples/sec   Loss 5.7611   LearningRate 0.0481   Epoch: 6   Global Step: 34890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:02:41,829-Speed 3390.69 samples/sec   Loss 5.8061   LearningRate 0.0480   Epoch: 6   Global Step: 34900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:02:44,844-Speed 3397.44 samples/sec   Loss 5.6451   LearningRate 0.0480   Epoch: 6   Global Step: 34910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:02:47,880-Speed 3373.60 samples/sec   Loss 5.7223   LearningRate 0.0480   Epoch: 6   Global Step: 34920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:02:50,890-Speed 3402.28 samples/sec   Loss 5.8015   LearningRate 0.0480   Epoch: 6   Global Step: 34930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:02:53,907-Speed 3394.64 samples/sec   Loss 5.7902   LearningRate 0.0480   Epoch: 6   Global Step: 34940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:02:56,944-Speed 3373.64 samples/sec   Loss 5.8927   LearningRate 0.0480   Epoch: 6   Global Step: 34950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:02:59,960-Speed 3396.02 samples/sec   Loss 5.7717   LearningRate 0.0480   Epoch: 6   Global Step: 34960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:03:03,093-Speed 3269.17 samples/sec   Loss 5.6472   LearningRate 0.0480   Epoch: 6   Global Step: 34970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:03:06,194-Speed 3302.86 samples/sec   Loss 5.8298   LearningRate 0.0479   Epoch: 6   Global Step: 34980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:03:09,209-Speed 3397.27 samples/sec   Loss 5.8084   LearningRate 0.0479   Epoch: 6   Global Step: 34990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:03:12,217-Speed 3404.89 samples/sec   Loss 5.9709   LearningRate 0.0479   Epoch: 6   Global Step: 35000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:03:15,231-Speed 3398.27 samples/sec   Loss 5.7157   LearningRate 0.0479   Epoch: 6   Global Step: 35010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:03:18,243-Speed 3400.32 samples/sec   Loss 5.8168   LearningRate 0.0479   Epoch: 6   Global Step: 35020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:03:21,268-Speed 3386.13 samples/sec   Loss 5.7944   LearningRate 0.0479   Epoch: 6   Global Step: 35030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:03:24,328-Speed 3347.01 samples/sec   Loss 5.7330   LearningRate 0.0479   Epoch: 6   Global Step: 35040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:03:27,352-Speed 3387.00 samples/sec   Loss 5.6851   LearningRate 0.0479   Epoch: 6   Global Step: 35050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:03:30,367-Speed 3397.14 samples/sec   Loss 5.6925   LearningRate 0.0478   Epoch: 6   Global Step: 35060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:03:33,387-Speed 3391.83 samples/sec   Loss 5.7454   LearningRate 0.0478   Epoch: 6   Global Step: 35070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:03:36,400-Speed 3399.36 samples/sec   Loss 5.7344   LearningRate 0.0478   Epoch: 6   Global Step: 35080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:03:39,419-Speed 3393.99 samples/sec   Loss 5.7992   LearningRate 0.0478   Epoch: 6   Global Step: 35090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:03:42,436-Speed 3394.64 samples/sec   Loss 5.6899   LearningRate 0.0478   Epoch: 6   Global Step: 35100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:03:45,453-Speed 3394.68 samples/sec   Loss 5.7623   LearningRate 0.0478   Epoch: 6   Global Step: 35110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:03:48,474-Speed 3390.82 samples/sec   Loss 5.9207   LearningRate 0.0478   Epoch: 6   Global Step: 35120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:03:51,503-Speed 3380.74 samples/sec   Loss 5.8533   LearningRate 0.0478   Epoch: 6   Global Step: 35130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:03:54,522-Speed 3393.43 samples/sec   Loss 5.9527   LearningRate 0.0477   Epoch: 6   Global Step: 35140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:03:57,542-Speed 3391.55 samples/sec   Loss 5.8365   LearningRate 0.0477   Epoch: 6   Global Step: 35150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:04:00,555-Speed 3399.34 samples/sec   Loss 5.8006   LearningRate 0.0477   Epoch: 6   Global Step: 35160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:04:03,550-Speed 3419.14 samples/sec   Loss 5.7763   LearningRate 0.0477   Epoch: 6   Global Step: 35170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:06,563-Speed 3400.34 samples/sec   Loss 5.7286   LearningRate 0.0477   Epoch: 6   Global Step: 35180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:09,572-Speed 3402.97 samples/sec   Loss 5.8572   LearningRate 0.0477   Epoch: 6   Global Step: 35190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:12,594-Speed 3389.75 samples/sec   Loss 5.9129   LearningRate 0.0477   Epoch: 6   Global Step: 35200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:15,610-Speed 3395.86 samples/sec   Loss 5.8581   LearningRate 0.0477   Epoch: 6   Global Step: 35210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:18,626-Speed 3395.80 samples/sec   Loss 5.8046   LearningRate 0.0477   Epoch: 6   Global Step: 35220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:21,650-Speed 3387.92 samples/sec   Loss 5.7919   LearningRate 0.0476   Epoch: 6   Global Step: 35230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:24,669-Speed 3392.06 samples/sec   Loss 5.8621   LearningRate 0.0476   Epoch: 6   Global Step: 35240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:27,692-Speed 3388.40 samples/sec   Loss 5.6866   LearningRate 0.0476   Epoch: 6   Global Step: 35250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:30,712-Speed 3391.12 samples/sec   Loss 5.8798   LearningRate 0.0476   Epoch: 6   Global Step: 35260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:33,793-Speed 3324.75 samples/sec   Loss 5.9860   LearningRate 0.0476   Epoch: 6   Global Step: 35270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:04:36,831-Speed 3371.16 samples/sec   Loss 5.8750   LearningRate 0.0476   Epoch: 6   Global Step: 35280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:39,850-Speed 3393.34 samples/sec   Loss 5.8550   LearningRate 0.0476   Epoch: 6   Global Step: 35290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:42,864-Speed 3398.00 samples/sec   Loss 5.8450   LearningRate 0.0476   Epoch: 6   Global Step: 35300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:45,893-Speed 3381.55 samples/sec   Loss 5.9237   LearningRate 0.0475   Epoch: 6   Global Step: 35310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:48,910-Speed 3394.75 samples/sec   Loss 5.8951   LearningRate 0.0475   Epoch: 6   Global Step: 35320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:51,926-Speed 3396.30 samples/sec   Loss 5.8699   LearningRate 0.0475   Epoch: 6   Global Step: 35330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:54,946-Speed 3391.73 samples/sec   Loss 5.8128   LearningRate 0.0475   Epoch: 6   Global Step: 35340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:04:57,963-Speed 3395.09 samples/sec   Loss 5.8230   LearningRate 0.0475   Epoch: 6   Global Step: 35350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:05:00,980-Speed 3393.78 samples/sec   Loss 5.9331   LearningRate 0.0475   Epoch: 6   Global Step: 35360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:05:04,021-Speed 3368.47 samples/sec   Loss 5.7978   LearningRate 0.0475   Epoch: 6   Global Step: 35370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:05:07,054-Speed 3377.12 samples/sec   Loss 5.7292   LearningRate 0.0475   Epoch: 6   Global Step: 35380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:05:10,068-Speed 3398.06 samples/sec   Loss 5.8455   LearningRate 0.0474   Epoch: 6   Global Step: 35390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:05:13,087-Speed 3392.57 samples/sec   Loss 5.7203   LearningRate 0.0474   Epoch: 6   Global Step: 35400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:05:16,107-Speed 3392.05 samples/sec   Loss 5.6846   LearningRate 0.0474   Epoch: 6   Global Step: 35410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:05:19,125-Speed 3394.01 samples/sec   Loss 5.8426   LearningRate 0.0474   Epoch: 6   Global Step: 35420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:05:22,143-Speed 3393.64 samples/sec   Loss 5.9271   LearningRate 0.0474   Epoch: 6   Global Step: 35430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:05:25,170-Speed 3384.07 samples/sec   Loss 5.6278   LearningRate 0.0474   Epoch: 6   Global Step: 35440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:05:28,205-Speed 3374.89 samples/sec   Loss 5.7965   LearningRate 0.0474   Epoch: 6   Global Step: 35450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:05:31,222-Speed 3394.67 samples/sec   Loss 5.7519   LearningRate 0.0474   Epoch: 6   Global Step: 35460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:05:34,253-Speed 3379.28 samples/sec   Loss 5.8175   LearningRate 0.0473   Epoch: 6   Global Step: 35470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:05:37,255-Speed 3411.60 samples/sec   Loss 5.9958   LearningRate 0.0473   Epoch: 6   Global Step: 35480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:05:40,271-Speed 3395.68 samples/sec   Loss 6.0198   LearningRate 0.0473   Epoch: 6   Global Step: 35490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:05:43,265-Speed 3420.81 samples/sec   Loss 5.7564   LearningRate 0.0473   Epoch: 6   Global Step: 35500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:05:46,296-Speed 3379.62 samples/sec   Loss 5.8371   LearningRate 0.0473   Epoch: 6   Global Step: 35510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:05:49,322-Speed 3384.81 samples/sec   Loss 5.6950   LearningRate 0.0473   Epoch: 6   Global Step: 35520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:05:52,339-Speed 3395.63 samples/sec   Loss 5.7423   LearningRate 0.0473   Epoch: 6   Global Step: 35530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:05:55,358-Speed 3392.09 samples/sec   Loss 5.7073   LearningRate 0.0473   Epoch: 6   Global Step: 35540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:05:58,392-Speed 3376.29 samples/sec   Loss 5.7750   LearningRate 0.0473   Epoch: 6   Global Step: 35550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:06:01,406-Speed 3398.05 samples/sec   Loss 5.8111   LearningRate 0.0472   Epoch: 6   Global Step: 35560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:06:04,424-Speed 3393.76 samples/sec   Loss 5.7937   LearningRate 0.0472   Epoch: 6   Global Step: 35570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:06:07,448-Speed 3386.99 samples/sec   Loss 5.8080   LearningRate 0.0472   Epoch: 6   Global Step: 35580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:06:10,469-Speed 3389.36 samples/sec   Loss 5.9467   LearningRate 0.0472   Epoch: 6   Global Step: 35590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:06:13,492-Speed 3388.97 samples/sec   Loss 5.7864   LearningRate 0.0472   Epoch: 6   Global Step: 35600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:06:16,510-Speed 3393.86 samples/sec   Loss 5.7670   LearningRate 0.0472   Epoch: 6   Global Step: 35610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:06:19,527-Speed 3394.97 samples/sec   Loss 5.9703   LearningRate 0.0472   Epoch: 6   Global Step: 35620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:06:22,548-Speed 3390.90 samples/sec   Loss 5.7435   LearningRate 0.0472   Epoch: 6   Global Step: 35630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:06:25,568-Speed 3390.74 samples/sec   Loss 5.9139   LearningRate 0.0471   Epoch: 6   Global Step: 35640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:06:28,590-Speed 3389.56 samples/sec   Loss 5.8754   LearningRate 0.0471   Epoch: 6   Global Step: 35650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:06:31,611-Speed 3390.14 samples/sec   Loss 5.8837   LearningRate 0.0471   Epoch: 6   Global Step: 35660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:06:34,612-Speed 3413.33 samples/sec   Loss 5.7174   LearningRate 0.0471   Epoch: 6   Global Step: 35670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:06:37,647-Speed 3374.27 samples/sec   Loss 5.8480   LearningRate 0.0471   Epoch: 6   Global Step: 35680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:06:40,669-Speed 3390.40 samples/sec   Loss 5.8082   LearningRate 0.0471   Epoch: 6   Global Step: 35690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:06:43,684-Speed 3396.16 samples/sec   Loss 5.8671   LearningRate 0.0471   Epoch: 6   Global Step: 35700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:06:46,718-Speed 3376.18 samples/sec   Loss 5.7395   LearningRate 0.0471   Epoch: 6   Global Step: 35710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:06:49,781-Speed 3344.55 samples/sec   Loss 5.7566   LearningRate 0.0470   Epoch: 6   Global Step: 35720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:06:52,896-Speed 3288.12 samples/sec   Loss 5.9177   LearningRate 0.0470   Epoch: 6   Global Step: 35730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:06:55,915-Speed 3392.11 samples/sec   Loss 5.8572   LearningRate 0.0470   Epoch: 6   Global Step: 35740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:06:58,938-Speed 3387.75 samples/sec   Loss 5.8007   LearningRate 0.0470   Epoch: 6   Global Step: 35750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:07:01,971-Speed 3376.91 samples/sec   Loss 5.7616   LearningRate 0.0470   Epoch: 6   Global Step: 35760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:07:05,003-Speed 3378.67 samples/sec   Loss 5.8462   LearningRate 0.0470   Epoch: 6   Global Step: 35770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:07:08,007-Speed 3409.28 samples/sec   Loss 5.7981   LearningRate 0.0470   Epoch: 6   Global Step: 35780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:07:11,024-Speed 3394.85 samples/sec   Loss 5.8075   LearningRate 0.0470   Epoch: 6   Global Step: 35790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:07:14,050-Speed 3385.10 samples/sec   Loss 5.8851   LearningRate 0.0469   Epoch: 6   Global Step: 35800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:07:17,067-Speed 3395.06 samples/sec   Loss 5.7339   LearningRate 0.0469   Epoch: 6   Global Step: 35810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:07:20,088-Speed 3390.26 samples/sec   Loss 5.8677   LearningRate 0.0469   Epoch: 6   Global Step: 35820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:07:23,119-Speed 3378.87 samples/sec   Loss 5.8479   LearningRate 0.0469   Epoch: 6   Global Step: 35830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:07:26,140-Speed 3391.21 samples/sec   Loss 5.7574   LearningRate 0.0469   Epoch: 6   Global Step: 35840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:07:29,161-Speed 3390.25 samples/sec   Loss 5.9462   LearningRate 0.0469   Epoch: 6   Global Step: 35850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:07:32,179-Speed 3393.12 samples/sec   Loss 5.8646   LearningRate 0.0469   Epoch: 6   Global Step: 35860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:07:35,207-Speed 3383.53 samples/sec   Loss 5.8880   LearningRate 0.0469   Epoch: 6   Global Step: 35870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:07:38,237-Speed 3380.22 samples/sec   Loss 5.8590   LearningRate 0.0469   Epoch: 6   Global Step: 35880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:07:41,270-Speed 3376.49 samples/sec   Loss 5.8103   LearningRate 0.0468   Epoch: 6   Global Step: 35890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:07:44,296-Speed 3385.07 samples/sec   Loss 5.7515   LearningRate 0.0468   Epoch: 6   Global Step: 35900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:07:47,322-Speed 3384.08 samples/sec   Loss 5.8948   LearningRate 0.0468   Epoch: 6   Global Step: 35910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:07:50,468-Speed 3255.76 samples/sec   Loss 5.7739   LearningRate 0.0468   Epoch: 6   Global Step: 35920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:07:53,497-Speed 3381.43 samples/sec   Loss 5.7926   LearningRate 0.0468   Epoch: 6   Global Step: 35930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:07:56,527-Speed 3380.23 samples/sec   Loss 5.7041   LearningRate 0.0468   Epoch: 6   Global Step: 35940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:07:59,534-Speed 3406.04 samples/sec   Loss 5.6901   LearningRate 0.0468   Epoch: 6   Global Step: 35950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:08:02,567-Speed 3377.55 samples/sec   Loss 5.8763   LearningRate 0.0468   Epoch: 6   Global Step: 35960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:08:05,588-Speed 3390.81 samples/sec   Loss 5.7157   LearningRate 0.0467   Epoch: 6   Global Step: 35970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:08:08,612-Speed 3386.86 samples/sec   Loss 5.8895   LearningRate 0.0467   Epoch: 6   Global Step: 35980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:08:11,638-Speed 3385.02 samples/sec   Loss 5.7919   LearningRate 0.0467   Epoch: 6   Global Step: 35990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:08:14,667-Speed 3380.83 samples/sec   Loss 5.7868   LearningRate 0.0467   Epoch: 6   Global Step: 36000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:08:58,001-[lfw][36000]XNorm: 22.579901
Training: 2022-04-27 05:08:58,002-[lfw][36000]Accuracy-Flip: 0.99633+-0.00323
Training: 2022-04-27 05:08:58,002-[lfw][36000]Accuracy-Highest: 0.99817
Training: 2022-04-27 05:09:48,499-[cfp_fp][36000]XNorm: 19.885575
Training: 2022-04-27 05:09:48,500-[cfp_fp][36000]Accuracy-Flip: 0.95186+-0.01214
Training: 2022-04-27 05:09:48,500-[cfp_fp][36000]Accuracy-Highest: 0.95186
Training: 2022-04-27 05:10:31,812-[agedb_30][36000]XNorm: 22.228437
Training: 2022-04-27 05:10:31,812-[agedb_30][36000]Accuracy-Flip: 0.97433+-0.00742
Training: 2022-04-27 05:10:31,813-[agedb_30][36000]Accuracy-Highest: 0.97467
Training: 2022-04-27 05:10:34,832-Speed 73.06 samples/sec   Loss 5.8540   LearningRate 0.0467   Epoch: 6   Global Step: 36010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:10:37,868-Speed 3373.08 samples/sec   Loss 5.8007   LearningRate 0.0467   Epoch: 6   Global Step: 36020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:10:40,890-Speed 3389.31 samples/sec   Loss 5.7804   LearningRate 0.0467   Epoch: 6   Global Step: 36030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:10:43,902-Speed 3400.47 samples/sec   Loss 5.7788   LearningRate 0.0467   Epoch: 6   Global Step: 36040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:10:46,915-Speed 3400.03 samples/sec   Loss 5.7710   LearningRate 0.0466   Epoch: 6   Global Step: 36050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:10:49,925-Speed 3402.44 samples/sec   Loss 5.8117   LearningRate 0.0466   Epoch: 6   Global Step: 36060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:10:52,941-Speed 3395.67 samples/sec   Loss 5.9063   LearningRate 0.0466   Epoch: 6   Global Step: 36070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:10:55,940-Speed 3415.10 samples/sec   Loss 5.8764   LearningRate 0.0466   Epoch: 6   Global Step: 36080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:10:58,966-Speed 3385.47 samples/sec   Loss 5.8845   LearningRate 0.0466   Epoch: 6   Global Step: 36090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:11:02,013-Speed 3361.59 samples/sec   Loss 5.8996   LearningRate 0.0466   Epoch: 6   Global Step: 36100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:11:05,032-Speed 3392.83 samples/sec   Loss 5.8672   LearningRate 0.0466   Epoch: 6   Global Step: 36110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:11:08,064-Speed 3377.26 samples/sec   Loss 5.7778   LearningRate 0.0466   Epoch: 6   Global Step: 36120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:11:11,082-Speed 3394.53 samples/sec   Loss 5.6838   LearningRate 0.0466   Epoch: 6   Global Step: 36130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:11:14,101-Speed 3392.41 samples/sec   Loss 5.8511   LearningRate 0.0465   Epoch: 6   Global Step: 36140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:11:17,121-Speed 3391.65 samples/sec   Loss 5.7776   LearningRate 0.0465   Epoch: 6   Global Step: 36150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:11:20,146-Speed 3384.93 samples/sec   Loss 5.7904   LearningRate 0.0465   Epoch: 6   Global Step: 36160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:11:23,164-Speed 3394.82 samples/sec   Loss 5.8452   LearningRate 0.0465   Epoch: 6   Global Step: 36170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:11:26,239-Speed 3331.17 samples/sec   Loss 5.7661   LearningRate 0.0465   Epoch: 6   Global Step: 36180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:11:29,255-Speed 3395.32 samples/sec   Loss 5.8509   LearningRate 0.0465   Epoch: 6   Global Step: 36190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:11:32,272-Speed 3395.00 samples/sec   Loss 5.7710   LearningRate 0.0465   Epoch: 6   Global Step: 36200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:11:35,295-Speed 3388.16 samples/sec   Loss 5.8700   LearningRate 0.0465   Epoch: 6   Global Step: 36210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:11:38,316-Speed 3390.84 samples/sec   Loss 5.8711   LearningRate 0.0464   Epoch: 6   Global Step: 36220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:11:41,338-Speed 3388.70 samples/sec   Loss 5.7817   LearningRate 0.0464   Epoch: 6   Global Step: 36230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:11:44,346-Speed 3404.68 samples/sec   Loss 5.7696   LearningRate 0.0464   Epoch: 6   Global Step: 36240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:11:47,378-Speed 3378.04 samples/sec   Loss 5.8935   LearningRate 0.0464   Epoch: 6   Global Step: 36250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:11:50,407-Speed 3382.13 samples/sec   Loss 5.8346   LearningRate 0.0464   Epoch: 6   Global Step: 36260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:11:53,432-Speed 3386.38 samples/sec   Loss 5.7593   LearningRate 0.0464   Epoch: 6   Global Step: 36270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:11:56,455-Speed 3387.48 samples/sec   Loss 5.9256   LearningRate 0.0464   Epoch: 6   Global Step: 36280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:11:59,484-Speed 3381.92 samples/sec   Loss 5.8543   LearningRate 0.0464   Epoch: 6   Global Step: 36290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:02,513-Speed 3381.33 samples/sec   Loss 5.7610   LearningRate 0.0463   Epoch: 6   Global Step: 36300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:05,542-Speed 3381.06 samples/sec   Loss 5.7439   LearningRate 0.0463   Epoch: 6   Global Step: 36310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:08,568-Speed 3384.99 samples/sec   Loss 5.7668   LearningRate 0.0463   Epoch: 6   Global Step: 36320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:11,604-Speed 3373.93 samples/sec   Loss 5.6248   LearningRate 0.0463   Epoch: 6   Global Step: 36330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:14,639-Speed 3373.88 samples/sec   Loss 5.8668   LearningRate 0.0463   Epoch: 6   Global Step: 36340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:12:17,693-Speed 3354.85 samples/sec   Loss 5.6914   LearningRate 0.0463   Epoch: 6   Global Step: 36350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:12:20,697-Speed 3409.26 samples/sec   Loss 5.8281   LearningRate 0.0463   Epoch: 6   Global Step: 36360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:23,724-Speed 3383.79 samples/sec   Loss 5.6638   LearningRate 0.0463   Epoch: 6   Global Step: 36370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:26,745-Speed 3392.06 samples/sec   Loss 5.8461   LearningRate 0.0463   Epoch: 6   Global Step: 36380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:29,765-Speed 3391.25 samples/sec   Loss 5.8464   LearningRate 0.0462   Epoch: 6   Global Step: 36390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:32,788-Speed 3387.59 samples/sec   Loss 5.7837   LearningRate 0.0462   Epoch: 6   Global Step: 36400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:35,816-Speed 3383.42 samples/sec   Loss 5.7493   LearningRate 0.0462   Epoch: 6   Global Step: 36410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:38,842-Speed 3383.81 samples/sec   Loss 5.8292   LearningRate 0.0462   Epoch: 6   Global Step: 36420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:41,868-Speed 3385.46 samples/sec   Loss 5.7290   LearningRate 0.0462   Epoch: 6   Global Step: 36430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:44,886-Speed 3394.20 samples/sec   Loss 5.7274   LearningRate 0.0462   Epoch: 6   Global Step: 36440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:47,904-Speed 3393.08 samples/sec   Loss 5.8202   LearningRate 0.0462   Epoch: 6   Global Step: 36450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:50,929-Speed 3385.87 samples/sec   Loss 5.8134   LearningRate 0.0462   Epoch: 6   Global Step: 36460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:12:53,968-Speed 3370.42 samples/sec   Loss 5.7076   LearningRate 0.0461   Epoch: 6   Global Step: 36470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:12:56,993-Speed 3385.96 samples/sec   Loss 5.8643   LearningRate 0.0461   Epoch: 6   Global Step: 36480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:00,013-Speed 3391.71 samples/sec   Loss 5.7715   LearningRate 0.0461   Epoch: 6   Global Step: 36490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:03,037-Speed 3387.23 samples/sec   Loss 5.7210   LearningRate 0.0461   Epoch: 6   Global Step: 36500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:06,061-Speed 3387.68 samples/sec   Loss 5.6123   LearningRate 0.0461   Epoch: 6   Global Step: 36510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:09,089-Speed 3382.73 samples/sec   Loss 5.7927   LearningRate 0.0461   Epoch: 6   Global Step: 36520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:12,110-Speed 3389.80 samples/sec   Loss 5.6720   LearningRate 0.0461   Epoch: 6   Global Step: 36530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:15,135-Speed 3386.10 samples/sec   Loss 5.7167   LearningRate 0.0461   Epoch: 6   Global Step: 36540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:18,158-Speed 3388.65 samples/sec   Loss 5.8985   LearningRate 0.0460   Epoch: 6   Global Step: 36550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:21,198-Speed 3369.51 samples/sec   Loss 5.9056   LearningRate 0.0460   Epoch: 6   Global Step: 36560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:24,221-Speed 3387.72 samples/sec   Loss 5.7509   LearningRate 0.0460   Epoch: 6   Global Step: 36570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:13:27,252-Speed 3379.43 samples/sec   Loss 5.7672   LearningRate 0.0460   Epoch: 6   Global Step: 36580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:13:30,270-Speed 3393.37 samples/sec   Loss 5.7194   LearningRate 0.0460   Epoch: 6   Global Step: 36590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:13:33,293-Speed 3388.30 samples/sec   Loss 5.6185   LearningRate 0.0460   Epoch: 6   Global Step: 36600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:13:36,313-Speed 3391.39 samples/sec   Loss 5.7293   LearningRate 0.0460   Epoch: 6   Global Step: 36610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:13:39,324-Speed 3401.66 samples/sec   Loss 5.6936   LearningRate 0.0460   Epoch: 6   Global Step: 36620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:42,346-Speed 3390.14 samples/sec   Loss 5.9220   LearningRate 0.0460   Epoch: 6   Global Step: 36630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:45,370-Speed 3386.88 samples/sec   Loss 5.8367   LearningRate 0.0459   Epoch: 6   Global Step: 36640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:48,390-Speed 3391.15 samples/sec   Loss 5.8442   LearningRate 0.0459   Epoch: 6   Global Step: 36650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:51,413-Speed 3387.53 samples/sec   Loss 5.7795   LearningRate 0.0459   Epoch: 6   Global Step: 36660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:54,436-Speed 3388.32 samples/sec   Loss 5.8565   LearningRate 0.0459   Epoch: 6   Global Step: 36670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:13:57,468-Speed 3378.69 samples/sec   Loss 5.8205   LearningRate 0.0459   Epoch: 6   Global Step: 36680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:00,492-Speed 3387.14 samples/sec   Loss 5.7855   LearningRate 0.0459   Epoch: 6   Global Step: 36690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:03,520-Speed 3381.93 samples/sec   Loss 5.6302   LearningRate 0.0459   Epoch: 6   Global Step: 36700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:06,570-Speed 3359.15 samples/sec   Loss 5.8004   LearningRate 0.0459   Epoch: 6   Global Step: 36710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:09,575-Speed 3408.17 samples/sec   Loss 5.7962   LearningRate 0.0458   Epoch: 6   Global Step: 36720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:12,613-Speed 3371.60 samples/sec   Loss 5.9545   LearningRate 0.0458   Epoch: 6   Global Step: 36730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:15,731-Speed 3284.68 samples/sec   Loss 5.7476   LearningRate 0.0458   Epoch: 6   Global Step: 36740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:18,763-Speed 3378.31 samples/sec   Loss 5.6905   LearningRate 0.0458   Epoch: 6   Global Step: 36750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:21,793-Speed 3379.58 samples/sec   Loss 5.7367   LearningRate 0.0458   Epoch: 6   Global Step: 36760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:24,823-Speed 3379.94 samples/sec   Loss 5.8363   LearningRate 0.0458   Epoch: 6   Global Step: 36770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:27,850-Speed 3383.43 samples/sec   Loss 5.8571   LearningRate 0.0458   Epoch: 6   Global Step: 36780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:30,880-Speed 3380.91 samples/sec   Loss 5.7562   LearningRate 0.0458   Epoch: 6   Global Step: 36790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:33,911-Speed 3380.12 samples/sec   Loss 5.7109   LearningRate 0.0458   Epoch: 6   Global Step: 36800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:36,938-Speed 3383.81 samples/sec   Loss 5.7363   LearningRate 0.0457   Epoch: 6   Global Step: 36810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:39,981-Speed 3365.60 samples/sec   Loss 5.9763   LearningRate 0.0457   Epoch: 6   Global Step: 36820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:14:42,990-Speed 3403.53 samples/sec   Loss 5.7001   LearningRate 0.0457   Epoch: 6   Global Step: 36830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:46,014-Speed 3387.31 samples/sec   Loss 5.6609   LearningRate 0.0457   Epoch: 6   Global Step: 36840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:49,037-Speed 3388.12 samples/sec   Loss 5.8510   LearningRate 0.0457   Epoch: 6   Global Step: 36850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:52,065-Speed 3382.48 samples/sec   Loss 5.7465   LearningRate 0.0457   Epoch: 6   Global Step: 36860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:55,091-Speed 3384.97 samples/sec   Loss 5.8179   LearningRate 0.0457   Epoch: 6   Global Step: 36870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:14:58,135-Speed 3364.81 samples/sec   Loss 5.9444   LearningRate 0.0457   Epoch: 6   Global Step: 36880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:01,202-Speed 3340.01 samples/sec   Loss 5.7132   LearningRate 0.0456   Epoch: 6   Global Step: 36890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:04,226-Speed 3386.58 samples/sec   Loss 5.8259   LearningRate 0.0456   Epoch: 6   Global Step: 36900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:07,252-Speed 3384.94 samples/sec   Loss 5.8894   LearningRate 0.0456   Epoch: 6   Global Step: 36910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:10,278-Speed 3384.38 samples/sec   Loss 5.7622   LearningRate 0.0456   Epoch: 6   Global Step: 36920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:13,306-Speed 3382.58 samples/sec   Loss 5.8176   LearningRate 0.0456   Epoch: 6   Global Step: 36930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:15:16,369-Speed 3343.70 samples/sec   Loss 5.8879   LearningRate 0.0456   Epoch: 6   Global Step: 36940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:15:19,414-Speed 3363.63 samples/sec   Loss 5.8861   LearningRate 0.0456   Epoch: 6   Global Step: 36950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:15:22,438-Speed 3387.32 samples/sec   Loss 5.8824   LearningRate 0.0456   Epoch: 6   Global Step: 36960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:15:25,464-Speed 3384.74 samples/sec   Loss 5.7370   LearningRate 0.0455   Epoch: 6   Global Step: 36970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:28,529-Speed 3341.70 samples/sec   Loss 5.7442   LearningRate 0.0455   Epoch: 6   Global Step: 36980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:31,586-Speed 3350.67 samples/sec   Loss 5.7037   LearningRate 0.0455   Epoch: 6   Global Step: 36990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:34,616-Speed 3380.37 samples/sec   Loss 5.7712   LearningRate 0.0455   Epoch: 6   Global Step: 37000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:37,641-Speed 3385.40 samples/sec   Loss 5.6752   LearningRate 0.0455   Epoch: 6   Global Step: 37010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:40,666-Speed 3386.87 samples/sec   Loss 5.8494   LearningRate 0.0455   Epoch: 6   Global Step: 37020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:43,688-Speed 3388.47 samples/sec   Loss 5.7004   LearningRate 0.0455   Epoch: 6   Global Step: 37030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:46,723-Speed 3374.33 samples/sec   Loss 5.6811   LearningRate 0.0455   Epoch: 6   Global Step: 37040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:49,759-Speed 3373.92 samples/sec   Loss 5.8860   LearningRate 0.0455   Epoch: 6   Global Step: 37050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:52,796-Speed 3373.05 samples/sec   Loss 5.7164   LearningRate 0.0454   Epoch: 6   Global Step: 37060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:15:55,826-Speed 3379.99 samples/sec   Loss 5.6236   LearningRate 0.0454   Epoch: 6   Global Step: 37070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:15:58,849-Speed 3388.71 samples/sec   Loss 5.7367   LearningRate 0.0454   Epoch: 6   Global Step: 37080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:16:01,876-Speed 3383.75 samples/sec   Loss 5.7134   LearningRate 0.0454   Epoch: 6   Global Step: 37090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:16:04,900-Speed 3386.32 samples/sec   Loss 5.7980   LearningRate 0.0454   Epoch: 6   Global Step: 37100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:16:07,927-Speed 3384.49 samples/sec   Loss 5.7058   LearningRate 0.0454   Epoch: 6   Global Step: 37110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:16:10,942-Speed 3396.06 samples/sec   Loss 5.7877   LearningRate 0.0454   Epoch: 6   Global Step: 37120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:16:13,968-Speed 3384.82 samples/sec   Loss 5.8762   LearningRate 0.0454   Epoch: 6   Global Step: 37130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:16:17,023-Speed 3352.77 samples/sec   Loss 5.7589   LearningRate 0.0453   Epoch: 6   Global Step: 37140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:16:20,049-Speed 3385.09 samples/sec   Loss 5.6642   LearningRate 0.0453   Epoch: 6   Global Step: 37150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:16:23,070-Speed 3390.42 samples/sec   Loss 5.5799   LearningRate 0.0453   Epoch: 6   Global Step: 37160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:16:26,098-Speed 3383.10 samples/sec   Loss 5.6886   LearningRate 0.0453   Epoch: 6   Global Step: 37170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:16:29,125-Speed 3382.68 samples/sec   Loss 5.7781   LearningRate 0.0453   Epoch: 6   Global Step: 37180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:16:32,149-Speed 3387.25 samples/sec   Loss 5.7631   LearningRate 0.0453   Epoch: 6   Global Step: 37190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:16:35,181-Speed 3378.39 samples/sec   Loss 5.8264   LearningRate 0.0453   Epoch: 6   Global Step: 37200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:16:38,264-Speed 3322.35 samples/sec   Loss 5.8482   LearningRate 0.0453   Epoch: 6   Global Step: 37210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:16:41,290-Speed 3384.70 samples/sec   Loss 5.6801   LearningRate 0.0453   Epoch: 6   Global Step: 37220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:16:44,315-Speed 3385.42 samples/sec   Loss 5.7332   LearningRate 0.0452   Epoch: 6   Global Step: 37230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:16:47,348-Speed 3377.86 samples/sec   Loss 5.7345   LearningRate 0.0452   Epoch: 6   Global Step: 37240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:16:50,369-Speed 3389.91 samples/sec   Loss 5.5688   LearningRate 0.0452   Epoch: 6   Global Step: 37250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:16:53,388-Speed 3392.77 samples/sec   Loss 5.8011   LearningRate 0.0452   Epoch: 6   Global Step: 37260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:16:56,411-Speed 3388.33 samples/sec   Loss 5.8093   LearningRate 0.0452   Epoch: 6   Global Step: 37270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:16:59,437-Speed 3384.27 samples/sec   Loss 5.5784   LearningRate 0.0452   Epoch: 6   Global Step: 37280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:17:02,460-Speed 3387.84 samples/sec   Loss 5.7878   LearningRate 0.0452   Epoch: 6   Global Step: 37290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:17:05,494-Speed 3375.96 samples/sec   Loss 5.6911   LearningRate 0.0452   Epoch: 6   Global Step: 37300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:17:08,522-Speed 3382.81 samples/sec   Loss 5.6992   LearningRate 0.0451   Epoch: 6   Global Step: 37310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:17:11,549-Speed 3383.45 samples/sec   Loss 5.6757   LearningRate 0.0451   Epoch: 6   Global Step: 37320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:17:14,588-Speed 3370.91 samples/sec   Loss 5.7388   LearningRate 0.0451   Epoch: 6   Global Step: 37330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:17:17,614-Speed 3384.86 samples/sec   Loss 5.5986   LearningRate 0.0451   Epoch: 6   Global Step: 37340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:17:20,642-Speed 3382.21 samples/sec   Loss 5.7645   LearningRate 0.0451   Epoch: 6   Global Step: 37350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:17:23,667-Speed 3387.23 samples/sec   Loss 5.7943   LearningRate 0.0451   Epoch: 6   Global Step: 37360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:17:26,691-Speed 3387.01 samples/sec   Loss 5.7805   LearningRate 0.0451   Epoch: 6   Global Step: 37370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:17:29,715-Speed 3385.84 samples/sec   Loss 5.7648   LearningRate 0.0451   Epoch: 6   Global Step: 37380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:17:32,749-Speed 3376.69 samples/sec   Loss 5.8156   LearningRate 0.0451   Epoch: 6   Global Step: 37390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:17:35,778-Speed 3381.05 samples/sec   Loss 5.8170   LearningRate 0.0450   Epoch: 6   Global Step: 37400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:17:38,805-Speed 3383.91 samples/sec   Loss 5.6790   LearningRate 0.0450   Epoch: 6   Global Step: 37410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:17:41,837-Speed 3377.79 samples/sec   Loss 5.5345   LearningRate 0.0450   Epoch: 6   Global Step: 37420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:17:44,883-Speed 3362.50 samples/sec   Loss 5.7425   LearningRate 0.0450   Epoch: 6   Global Step: 37430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:17:47,935-Speed 3356.74 samples/sec   Loss 5.6453   LearningRate 0.0450   Epoch: 6   Global Step: 37440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:17:50,964-Speed 3381.46 samples/sec   Loss 5.8367   LearningRate 0.0450   Epoch: 6   Global Step: 37450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:17:53,988-Speed 3386.80 samples/sec   Loss 5.7501   LearningRate 0.0450   Epoch: 6   Global Step: 37460   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-27 05:17:56,998-Speed 3402.97 samples/sec   Loss 5.7739   LearningRate 0.0450   Epoch: 6   Global Step: 37470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:18:00,020-Speed 3388.97 samples/sec   Loss 5.7578   LearningRate 0.0449   Epoch: 6   Global Step: 37480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:18:03,051-Speed 3378.69 samples/sec   Loss 5.7489   LearningRate 0.0449   Epoch: 6   Global Step: 37490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:18:06,089-Speed 3372.11 samples/sec   Loss 5.5716   LearningRate 0.0449   Epoch: 6   Global Step: 37500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:18:09,122-Speed 3376.96 samples/sec   Loss 5.7275   LearningRate 0.0449   Epoch: 6   Global Step: 37510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:18:12,151-Speed 3381.77 samples/sec   Loss 5.5893   LearningRate 0.0449   Epoch: 6   Global Step: 37520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:18:15,178-Speed 3382.85 samples/sec   Loss 5.6617   LearningRate 0.0449   Epoch: 6   Global Step: 37530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:18:18,211-Speed 3377.95 samples/sec   Loss 5.8451   LearningRate 0.0449   Epoch: 6   Global Step: 37540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:18:21,236-Speed 3384.95 samples/sec   Loss 5.6534   LearningRate 0.0449   Epoch: 6   Global Step: 37550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:18:24,247-Speed 3402.44 samples/sec   Loss 5.8194   LearningRate 0.0449   Epoch: 6   Global Step: 37560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:18:27,274-Speed 3383.01 samples/sec   Loss 5.6440   LearningRate 0.0448   Epoch: 6   Global Step: 37570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:18:30,300-Speed 3385.55 samples/sec   Loss 5.6582   LearningRate 0.0448   Epoch: 6   Global Step: 37580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:18:33,339-Speed 3369.97 samples/sec   Loss 5.8522   LearningRate 0.0448   Epoch: 6   Global Step: 37590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:18:36,465-Speed 3276.12 samples/sec   Loss 5.7505   LearningRate 0.0448   Epoch: 6   Global Step: 37600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:18:39,495-Speed 3381.18 samples/sec   Loss 5.8234   LearningRate 0.0448   Epoch: 6   Global Step: 37610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:18:42,523-Speed 3381.98 samples/sec   Loss 5.6967   LearningRate 0.0448   Epoch: 6   Global Step: 37620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:18:45,555-Speed 3378.24 samples/sec   Loss 5.8116   LearningRate 0.0448   Epoch: 6   Global Step: 37630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:18:48,583-Speed 3383.67 samples/sec   Loss 5.6538   LearningRate 0.0448   Epoch: 6   Global Step: 37640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:18:51,611-Speed 3381.68 samples/sec   Loss 5.6800   LearningRate 0.0447   Epoch: 6   Global Step: 37650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:18:54,643-Speed 3377.76 samples/sec   Loss 5.6432   LearningRate 0.0447   Epoch: 6   Global Step: 37660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:18:57,676-Speed 3377.30 samples/sec   Loss 5.7838   LearningRate 0.0447   Epoch: 6   Global Step: 37670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:19:00,743-Speed 3339.74 samples/sec   Loss 5.6248   LearningRate 0.0447   Epoch: 6   Global Step: 37680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:19:03,864-Speed 3282.32 samples/sec   Loss 5.6824   LearningRate 0.0447   Epoch: 6   Global Step: 37690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:19:06,901-Speed 3372.27 samples/sec   Loss 5.7528   LearningRate 0.0447   Epoch: 6   Global Step: 37700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:19:09,931-Speed 3380.07 samples/sec   Loss 5.6903   LearningRate 0.0447   Epoch: 6   Global Step: 37710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:19:12,959-Speed 3382.40 samples/sec   Loss 5.8224   LearningRate 0.0447   Epoch: 6   Global Step: 37720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:19:15,986-Speed 3383.95 samples/sec   Loss 5.7195   LearningRate 0.0447   Epoch: 6   Global Step: 37730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:19:18,997-Speed 3403.85 samples/sec   Loss 5.6048   LearningRate 0.0446   Epoch: 6   Global Step: 37740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:19:22,043-Speed 3362.56 samples/sec   Loss 5.7686   LearningRate 0.0446   Epoch: 6   Global Step: 37750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:19:25,085-Speed 3366.97 samples/sec   Loss 5.6838   LearningRate 0.0446   Epoch: 6   Global Step: 37760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:19:28,118-Speed 3376.99 samples/sec   Loss 5.8582   LearningRate 0.0446   Epoch: 6   Global Step: 37770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:19:31,140-Speed 3389.57 samples/sec   Loss 5.7218   LearningRate 0.0446   Epoch: 6   Global Step: 37780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:19:34,173-Speed 3377.10 samples/sec   Loss 5.5892   LearningRate 0.0446   Epoch: 6   Global Step: 37790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:19:37,201-Speed 3382.12 samples/sec   Loss 5.6725   LearningRate 0.0446   Epoch: 6   Global Step: 37800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:19:40,257-Speed 3351.70 samples/sec   Loss 5.6642   LearningRate 0.0446   Epoch: 6   Global Step: 37810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:19:43,333-Speed 3330.44 samples/sec   Loss 5.7810   LearningRate 0.0445   Epoch: 6   Global Step: 37820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:19:46,364-Speed 3378.74 samples/sec   Loss 5.7816   LearningRate 0.0445   Epoch: 6   Global Step: 37830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:19:49,449-Speed 3320.79 samples/sec   Loss 5.7571   LearningRate 0.0445   Epoch: 6   Global Step: 37840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:19:52,484-Speed 3374.09 samples/sec   Loss 5.7876   LearningRate 0.0445   Epoch: 6   Global Step: 37850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:19:55,525-Speed 3367.94 samples/sec   Loss 5.7666   LearningRate 0.0445   Epoch: 6   Global Step: 37860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:19:58,561-Speed 3374.24 samples/sec   Loss 5.5770   LearningRate 0.0445   Epoch: 6   Global Step: 37870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:20:01,596-Speed 3374.58 samples/sec   Loss 5.6771   LearningRate 0.0445   Epoch: 6   Global Step: 37880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:20:04,625-Speed 3381.75 samples/sec   Loss 5.7127   LearningRate 0.0445   Epoch: 6   Global Step: 37890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:20:07,653-Speed 3381.96 samples/sec   Loss 5.7525   LearningRate 0.0445   Epoch: 6   Global Step: 37900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:20:10,767-Speed 3289.62 samples/sec   Loss 5.7664   LearningRate 0.0444   Epoch: 6   Global Step: 37910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:20:13,797-Speed 3380.40 samples/sec   Loss 5.5331   LearningRate 0.0444   Epoch: 6   Global Step: 37920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:20:16,831-Speed 3376.01 samples/sec   Loss 5.7212   LearningRate 0.0444   Epoch: 6   Global Step: 37930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:20:19,852-Speed 3389.63 samples/sec   Loss 5.6906   LearningRate 0.0444   Epoch: 6   Global Step: 37940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:20:22,866-Speed 3398.19 samples/sec   Loss 5.6012   LearningRate 0.0444   Epoch: 6   Global Step: 37950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:20:25,891-Speed 3386.17 samples/sec   Loss 5.6236   LearningRate 0.0444   Epoch: 6   Global Step: 37960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:20:28,914-Speed 3388.44 samples/sec   Loss 5.6469   LearningRate 0.0444   Epoch: 6   Global Step: 37970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:20:31,941-Speed 3383.40 samples/sec   Loss 5.5518   LearningRate 0.0444   Epoch: 6   Global Step: 37980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:20:35,035-Speed 3310.16 samples/sec   Loss 5.7656   LearningRate 0.0443   Epoch: 6   Global Step: 37990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:20:38,069-Speed 3375.99 samples/sec   Loss 5.7858   LearningRate 0.0443   Epoch: 6   Global Step: 38000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:21:21,356-[lfw][38000]XNorm: 21.401213
Training: 2022-04-27 05:21:21,356-[lfw][38000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-27 05:21:21,357-[lfw][38000]Accuracy-Highest: 0.99817
Training: 2022-04-27 05:22:11,705-[cfp_fp][38000]XNorm: 19.202028
Training: 2022-04-27 05:22:11,706-[cfp_fp][38000]Accuracy-Flip: 0.95300+-0.01039
Training: 2022-04-27 05:22:11,706-[cfp_fp][38000]Accuracy-Highest: 0.95300
Training: 2022-04-27 05:22:55,029-[agedb_30][38000]XNorm: 21.345746
Training: 2022-04-27 05:22:55,029-[agedb_30][38000]Accuracy-Flip: 0.97300+-0.00954
Training: 2022-04-27 05:22:55,030-[agedb_30][38000]Accuracy-Highest: 0.97467
Training: 2022-04-27 05:22:58,047-Speed 73.15 samples/sec   Loss 5.7120   LearningRate 0.0443   Epoch: 6   Global Step: 38010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:01,092-Speed 3364.11 samples/sec   Loss 5.6740   LearningRate 0.0443   Epoch: 6   Global Step: 38020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:04,100-Speed 3405.03 samples/sec   Loss 5.6843   LearningRate 0.0443   Epoch: 6   Global Step: 38030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:07,105-Speed 3408.33 samples/sec   Loss 5.7002   LearningRate 0.0443   Epoch: 6   Global Step: 38040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:10,121-Speed 3395.91 samples/sec   Loss 5.5891   LearningRate 0.0443   Epoch: 6   Global Step: 38050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:23:13,125-Speed 3409.87 samples/sec   Loss 5.6677   LearningRate 0.0443   Epoch: 6   Global Step: 38060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:16,139-Speed 3397.68 samples/sec   Loss 5.6948   LearningRate 0.0443   Epoch: 6   Global Step: 38070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:19,172-Speed 3377.43 samples/sec   Loss 5.7874   LearningRate 0.0442   Epoch: 6   Global Step: 38080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:22,194-Speed 3389.63 samples/sec   Loss 5.7445   LearningRate 0.0442   Epoch: 6   Global Step: 38090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:25,216-Speed 3389.39 samples/sec   Loss 5.5880   LearningRate 0.0442   Epoch: 6   Global Step: 38100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:28,259-Speed 3366.28 samples/sec   Loss 5.6339   LearningRate 0.0442   Epoch: 6   Global Step: 38110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:31,275-Speed 3395.92 samples/sec   Loss 5.6732   LearningRate 0.0442   Epoch: 6   Global Step: 38120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:34,302-Speed 3383.39 samples/sec   Loss 5.7354   LearningRate 0.0442   Epoch: 6   Global Step: 38130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:37,330-Speed 3382.68 samples/sec   Loss 5.7032   LearningRate 0.0442   Epoch: 6   Global Step: 38140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:40,350-Speed 3391.67 samples/sec   Loss 5.6990   LearningRate 0.0442   Epoch: 6   Global Step: 38150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:43,364-Speed 3397.84 samples/sec   Loss 5.6597   LearningRate 0.0441   Epoch: 6   Global Step: 38160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:23:46,384-Speed 3391.38 samples/sec   Loss 5.6236   LearningRate 0.0441   Epoch: 6   Global Step: 38170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:23:49,386-Speed 3411.84 samples/sec   Loss 5.7098   LearningRate 0.0441   Epoch: 6   Global Step: 38180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:52,408-Speed 3389.34 samples/sec   Loss 5.6140   LearningRate 0.0441   Epoch: 6   Global Step: 38190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:55,420-Speed 3400.74 samples/sec   Loss 5.7329   LearningRate 0.0441   Epoch: 6   Global Step: 38200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:23:58,445-Speed 3386.29 samples/sec   Loss 5.7605   LearningRate 0.0441   Epoch: 6   Global Step: 38210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:24:01,471-Speed 3384.43 samples/sec   Loss 5.7490   LearningRate 0.0441   Epoch: 6   Global Step: 38220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:24:04,486-Speed 3397.01 samples/sec   Loss 5.6088   LearningRate 0.0441   Epoch: 6   Global Step: 38230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:24:07,533-Speed 3361.67 samples/sec   Loss 5.5270   LearningRate 0.0441   Epoch: 6   Global Step: 38240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:24:10,558-Speed 3386.09 samples/sec   Loss 5.7682   LearningRate 0.0440   Epoch: 6   Global Step: 38250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:24:13,583-Speed 3386.31 samples/sec   Loss 5.6151   LearningRate 0.0440   Epoch: 6   Global Step: 38260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:24:16,638-Speed 3352.53 samples/sec   Loss 5.7346   LearningRate 0.0440   Epoch: 6   Global Step: 38270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:24:19,658-Speed 3391.32 samples/sec   Loss 5.6045   LearningRate 0.0440   Epoch: 6   Global Step: 38280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:24:22,677-Speed 3392.44 samples/sec   Loss 5.7679   LearningRate 0.0440   Epoch: 6   Global Step: 38290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:24:25,699-Speed 3389.41 samples/sec   Loss 5.6161   LearningRate 0.0440   Epoch: 6   Global Step: 38300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:24:28,717-Speed 3394.04 samples/sec   Loss 5.5343   LearningRate 0.0440   Epoch: 6   Global Step: 38310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:24:31,735-Speed 3393.27 samples/sec   Loss 5.4656   LearningRate 0.0440   Epoch: 6   Global Step: 38320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:24:34,747-Speed 3400.07 samples/sec   Loss 5.5941   LearningRate 0.0439   Epoch: 6   Global Step: 38330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:24:37,757-Speed 3403.01 samples/sec   Loss 5.6093   LearningRate 0.0439   Epoch: 6   Global Step: 38340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:24:40,779-Speed 3390.20 samples/sec   Loss 5.6287   LearningRate 0.0439   Epoch: 6   Global Step: 38350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:24:43,791-Speed 3400.20 samples/sec   Loss 5.6158   LearningRate 0.0439   Epoch: 6   Global Step: 38360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:24:46,804-Speed 3399.02 samples/sec   Loss 5.6786   LearningRate 0.0439   Epoch: 6   Global Step: 38370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:24:49,803-Speed 3415.84 samples/sec   Loss 5.6930   LearningRate 0.0439   Epoch: 6   Global Step: 38380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:24:52,814-Speed 3401.28 samples/sec   Loss 5.5591   LearningRate 0.0439   Epoch: 6   Global Step: 38390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:24:55,828-Speed 3398.74 samples/sec   Loss 5.5574   LearningRate 0.0439   Epoch: 6   Global Step: 38400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:24:58,839-Speed 3401.47 samples/sec   Loss 5.6032   LearningRate 0.0439   Epoch: 6   Global Step: 38410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:01,856-Speed 3394.53 samples/sec   Loss 5.6341   LearningRate 0.0438   Epoch: 6   Global Step: 38420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:04,879-Speed 3388.01 samples/sec   Loss 5.6145   LearningRate 0.0438   Epoch: 6   Global Step: 38430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:07,894-Speed 3397.94 samples/sec   Loss 5.7074   LearningRate 0.0438   Epoch: 6   Global Step: 38440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:10,906-Speed 3400.61 samples/sec   Loss 5.7100   LearningRate 0.0438   Epoch: 6   Global Step: 38450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:13,918-Speed 3401.18 samples/sec   Loss 5.6578   LearningRate 0.0438   Epoch: 6   Global Step: 38460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:16,928-Speed 3402.15 samples/sec   Loss 5.5616   LearningRate 0.0438   Epoch: 6   Global Step: 38470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:19,965-Speed 3372.43 samples/sec   Loss 5.6416   LearningRate 0.0438   Epoch: 6   Global Step: 38480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:22,987-Speed 3388.93 samples/sec   Loss 5.6620   LearningRate 0.0438   Epoch: 6   Global Step: 38490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:26,007-Speed 3392.31 samples/sec   Loss 5.5427   LearningRate 0.0438   Epoch: 6   Global Step: 38500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:29,024-Speed 3394.61 samples/sec   Loss 5.7583   LearningRate 0.0437   Epoch: 6   Global Step: 38510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:25:32,046-Speed 3389.44 samples/sec   Loss 5.6147   LearningRate 0.0437   Epoch: 6   Global Step: 38520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:25:35,061-Speed 3396.94 samples/sec   Loss 5.6259   LearningRate 0.0437   Epoch: 6   Global Step: 38530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:25:38,059-Speed 3416.69 samples/sec   Loss 5.6267   LearningRate 0.0437   Epoch: 6   Global Step: 38540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:41,070-Speed 3402.41 samples/sec   Loss 5.5410   LearningRate 0.0437   Epoch: 6   Global Step: 38550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:44,083-Speed 3398.82 samples/sec   Loss 5.5353   LearningRate 0.0437   Epoch: 6   Global Step: 38560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:47,096-Speed 3399.62 samples/sec   Loss 5.5000   LearningRate 0.0437   Epoch: 6   Global Step: 38570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:50,116-Speed 3391.46 samples/sec   Loss 5.8100   LearningRate 0.0437   Epoch: 6   Global Step: 38580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:53,132-Speed 3395.95 samples/sec   Loss 5.5331   LearningRate 0.0436   Epoch: 6   Global Step: 38590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:56,148-Speed 3395.36 samples/sec   Loss 5.6591   LearningRate 0.0436   Epoch: 6   Global Step: 38600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:25:59,164-Speed 3396.05 samples/sec   Loss 5.7270   LearningRate 0.0436   Epoch: 6   Global Step: 38610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:26:02,179-Speed 3397.33 samples/sec   Loss 5.6232   LearningRate 0.0436   Epoch: 6   Global Step: 38620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:26:05,192-Speed 3399.31 samples/sec   Loss 5.5716   LearningRate 0.0436   Epoch: 6   Global Step: 38630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:26:08,215-Speed 3388.46 samples/sec   Loss 5.5917   LearningRate 0.0436   Epoch: 6   Global Step: 38640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:26:11,331-Speed 3286.95 samples/sec   Loss 5.7596   LearningRate 0.0436   Epoch: 6   Global Step: 38650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:26:14,343-Speed 3400.42 samples/sec   Loss 5.5312   LearningRate 0.0436   Epoch: 6   Global Step: 38660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:26:17,357-Speed 3398.06 samples/sec   Loss 5.6027   LearningRate 0.0436   Epoch: 6   Global Step: 38670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:26:20,373-Speed 3396.71 samples/sec   Loss 5.5454   LearningRate 0.0435   Epoch: 6   Global Step: 38680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:26:23,390-Speed 3394.85 samples/sec   Loss 5.6616   LearningRate 0.0435   Epoch: 6   Global Step: 38690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:26:26,388-Speed 3416.45 samples/sec   Loss 5.5064   LearningRate 0.0435   Epoch: 6   Global Step: 38700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:26:29,408-Speed 3391.91 samples/sec   Loss 5.6382   LearningRate 0.0435   Epoch: 6   Global Step: 38710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:26:32,423-Speed 3396.65 samples/sec   Loss 5.5654   LearningRate 0.0435   Epoch: 6   Global Step: 38720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:26:35,474-Speed 3357.25 samples/sec   Loss 5.6420   LearningRate 0.0435   Epoch: 6   Global Step: 38730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:26:38,495-Speed 3389.87 samples/sec   Loss 5.7341   LearningRate 0.0435   Epoch: 6   Global Step: 38740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:26:41,516-Speed 3391.09 samples/sec   Loss 5.6000   LearningRate 0.0435   Epoch: 6   Global Step: 38750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:26:44,538-Speed 3388.81 samples/sec   Loss 5.4583   LearningRate 0.0434   Epoch: 6   Global Step: 38760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:26:47,556-Speed 3393.23 samples/sec   Loss 5.6728   LearningRate 0.0434   Epoch: 6   Global Step: 38770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:26:50,580-Speed 3387.86 samples/sec   Loss 5.7631   LearningRate 0.0434   Epoch: 6   Global Step: 38780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:26:53,595-Speed 3396.54 samples/sec   Loss 5.6863   LearningRate 0.0434   Epoch: 6   Global Step: 38790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:26:56,613-Speed 3394.95 samples/sec   Loss 5.7100   LearningRate 0.0434   Epoch: 6   Global Step: 38800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:26:59,639-Speed 3384.16 samples/sec   Loss 5.6648   LearningRate 0.0434   Epoch: 6   Global Step: 38810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:27:02,648-Speed 3404.06 samples/sec   Loss 5.5873   LearningRate 0.0434   Epoch: 6   Global Step: 38820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:05,666-Speed 3393.30 samples/sec   Loss 5.5921   LearningRate 0.0434   Epoch: 6   Global Step: 38830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:08,684-Speed 3394.11 samples/sec   Loss 5.5527   LearningRate 0.0434   Epoch: 6   Global Step: 38840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:11,712-Speed 3382.68 samples/sec   Loss 5.5128   LearningRate 0.0433   Epoch: 6   Global Step: 38850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:14,742-Speed 3380.39 samples/sec   Loss 5.6681   LearningRate 0.0433   Epoch: 6   Global Step: 38860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:17,763-Speed 3390.43 samples/sec   Loss 5.5697   LearningRate 0.0433   Epoch: 6   Global Step: 38870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:20,789-Speed 3385.79 samples/sec   Loss 5.6410   LearningRate 0.0433   Epoch: 6   Global Step: 38880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:23,817-Speed 3382.70 samples/sec   Loss 5.7323   LearningRate 0.0433   Epoch: 6   Global Step: 38890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:26,936-Speed 3284.37 samples/sec   Loss 5.6644   LearningRate 0.0433   Epoch: 6   Global Step: 38900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:29,979-Speed 3365.84 samples/sec   Loss 5.4646   LearningRate 0.0433   Epoch: 6   Global Step: 38910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:33,002-Speed 3388.20 samples/sec   Loss 5.6053   LearningRate 0.0433   Epoch: 6   Global Step: 38920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:27:36,000-Speed 3415.30 samples/sec   Loss 5.6353   LearningRate 0.0433   Epoch: 6   Global Step: 38930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:39,017-Speed 3395.64 samples/sec   Loss 5.4846   LearningRate 0.0432   Epoch: 6   Global Step: 38940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:42,043-Speed 3385.12 samples/sec   Loss 5.4924   LearningRate 0.0432   Epoch: 6   Global Step: 38950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:45,067-Speed 3386.44 samples/sec   Loss 5.7338   LearningRate 0.0432   Epoch: 6   Global Step: 38960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:48,137-Speed 3336.44 samples/sec   Loss 5.6570   LearningRate 0.0432   Epoch: 6   Global Step: 38970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:51,215-Speed 3327.25 samples/sec   Loss 5.4595   LearningRate 0.0432   Epoch: 6   Global Step: 38980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:54,238-Speed 3388.61 samples/sec   Loss 5.6142   LearningRate 0.0432   Epoch: 6   Global Step: 38990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:27:57,262-Speed 3387.03 samples/sec   Loss 5.5899   LearningRate 0.0432   Epoch: 6   Global Step: 39000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:28:00,286-Speed 3387.85 samples/sec   Loss 5.5959   LearningRate 0.0432   Epoch: 6   Global Step: 39010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:28:03,311-Speed 3385.47 samples/sec   Loss 5.5384   LearningRate 0.0431   Epoch: 6   Global Step: 39020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:28:06,332-Speed 3390.53 samples/sec   Loss 5.5895   LearningRate 0.0431   Epoch: 6   Global Step: 39030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:28:09,400-Speed 3338.74 samples/sec   Loss 5.5640   LearningRate 0.0431   Epoch: 6   Global Step: 39040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:28:12,436-Speed 3372.64 samples/sec   Loss 5.6058   LearningRate 0.0431   Epoch: 6   Global Step: 39050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:28:15,561-Speed 3277.87 samples/sec   Loss 5.5486   LearningRate 0.0431   Epoch: 6   Global Step: 39060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:28:18,583-Speed 3389.55 samples/sec   Loss 5.4597   LearningRate 0.0431   Epoch: 6   Global Step: 39070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:28:21,606-Speed 3388.85 samples/sec   Loss 5.6496   LearningRate 0.0431   Epoch: 6   Global Step: 39080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:28:24,628-Speed 3389.49 samples/sec   Loss 5.6757   LearningRate 0.0431   Epoch: 6   Global Step: 39090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:28:27,647-Speed 3392.08 samples/sec   Loss 5.6598   LearningRate 0.0431   Epoch: 6   Global Step: 39100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:28:30,675-Speed 3382.62 samples/sec   Loss 5.4905   LearningRate 0.0430   Epoch: 6   Global Step: 39110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:28:33,698-Speed 3387.92 samples/sec   Loss 5.8238   LearningRate 0.0430   Epoch: 6   Global Step: 39120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:28:36,713-Speed 3397.10 samples/sec   Loss 5.5962   LearningRate 0.0430   Epoch: 6   Global Step: 39130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:28:39,716-Speed 3410.89 samples/sec   Loss 5.5194   LearningRate 0.0430   Epoch: 6   Global Step: 39140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:28:42,740-Speed 3386.51 samples/sec   Loss 5.5278   LearningRate 0.0430   Epoch: 6   Global Step: 39150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:28:45,765-Speed 3386.34 samples/sec   Loss 5.6201   LearningRate 0.0430   Epoch: 6   Global Step: 39160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:28:50,078-Speed 2375.61 samples/sec   Loss 5.5726   LearningRate 0.0430   Epoch: 6   Global Step: 39170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:28:53,112-Speed 3375.84 samples/sec   Loss 5.5631   LearningRate 0.0430   Epoch: 6   Global Step: 39180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:28:56,138-Speed 3384.55 samples/sec   Loss 5.5543   LearningRate 0.0430   Epoch: 6   Global Step: 39190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:28:59,173-Speed 3375.32 samples/sec   Loss 5.7217   LearningRate 0.0429   Epoch: 6   Global Step: 39200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:29:02,193-Speed 3390.95 samples/sec   Loss 5.5733   LearningRate 0.0429   Epoch: 6   Global Step: 39210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:29:05,220-Speed 3384.01 samples/sec   Loss 5.5694   LearningRate 0.0429   Epoch: 6   Global Step: 39220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:29:08,252-Speed 3377.91 samples/sec   Loss 5.6998   LearningRate 0.0429   Epoch: 6   Global Step: 39230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:29:11,279-Speed 3384.02 samples/sec   Loss 5.6036   LearningRate 0.0429   Epoch: 6   Global Step: 39240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:29:14,303-Speed 3387.12 samples/sec   Loss 5.6292   LearningRate 0.0429   Epoch: 6   Global Step: 39250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:29:17,336-Speed 3377.13 samples/sec   Loss 5.5478   LearningRate 0.0429   Epoch: 6   Global Step: 39260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:29:20,359-Speed 3387.71 samples/sec   Loss 5.4595   LearningRate 0.0429   Epoch: 6   Global Step: 39270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:29:23,415-Speed 3351.20 samples/sec   Loss 5.5551   LearningRate 0.0428   Epoch: 6   Global Step: 39280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:29:26,485-Speed 3339.19 samples/sec   Loss 5.6182   LearningRate 0.0428   Epoch: 6   Global Step: 39290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:29:29,528-Speed 3365.53 samples/sec   Loss 5.5561   LearningRate 0.0428   Epoch: 6   Global Step: 39300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:29:32,540-Speed 3401.23 samples/sec   Loss 5.5702   LearningRate 0.0428   Epoch: 6   Global Step: 39310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:29:35,572-Speed 3377.58 samples/sec   Loss 5.5352   LearningRate 0.0428   Epoch: 6   Global Step: 39320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:29:38,610-Speed 3371.41 samples/sec   Loss 5.5926   LearningRate 0.0428   Epoch: 6   Global Step: 39330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:29:41,643-Speed 3377.10 samples/sec   Loss 5.4477   LearningRate 0.0428   Epoch: 6   Global Step: 39340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:29:44,662-Speed 3393.29 samples/sec   Loss 5.5815   LearningRate 0.0428   Epoch: 6   Global Step: 39350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:29:47,703-Speed 3368.24 samples/sec   Loss 5.7323   LearningRate 0.0428   Epoch: 6   Global Step: 39360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:29:50,790-Speed 3317.89 samples/sec   Loss 5.5455   LearningRate 0.0427   Epoch: 6   Global Step: 39370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:29:53,823-Speed 3376.28 samples/sec   Loss 5.4992   LearningRate 0.0427   Epoch: 6   Global Step: 39380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:29:56,869-Speed 3362.83 samples/sec   Loss 5.6810   LearningRate 0.0427   Epoch: 6   Global Step: 39390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:29:59,912-Speed 3365.73 samples/sec   Loss 5.5848   LearningRate 0.0427   Epoch: 6   Global Step: 39400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:30:02,965-Speed 3354.86 samples/sec   Loss 5.5473   LearningRate 0.0427   Epoch: 6   Global Step: 39410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:30:05,991-Speed 3385.40 samples/sec   Loss 5.5512   LearningRate 0.0427   Epoch: 6   Global Step: 39420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:30:09,018-Speed 3383.33 samples/sec   Loss 5.5630   LearningRate 0.0427   Epoch: 6   Global Step: 39430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:30:12,093-Speed 3330.66 samples/sec   Loss 5.4164   LearningRate 0.0427   Epoch: 6   Global Step: 39440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:30:15,149-Speed 3352.29 samples/sec   Loss 5.4893   LearningRate 0.0427   Epoch: 6   Global Step: 39450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:30:18,176-Speed 3382.47 samples/sec   Loss 5.5979   LearningRate 0.0426   Epoch: 6   Global Step: 39460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:30:21,200-Speed 3387.43 samples/sec   Loss 5.4851   LearningRate 0.0426   Epoch: 6   Global Step: 39470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:30:24,242-Speed 3367.39 samples/sec   Loss 5.6026   LearningRate 0.0426   Epoch: 6   Global Step: 39480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:30:27,283-Speed 3367.37 samples/sec   Loss 5.5925   LearningRate 0.0426   Epoch: 6   Global Step: 39490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:30:30,304-Speed 3390.79 samples/sec   Loss 5.4942   LearningRate 0.0426   Epoch: 6   Global Step: 39500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:30:33,316-Speed 3401.26 samples/sec   Loss 5.5888   LearningRate 0.0426   Epoch: 6   Global Step: 39510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:30:36,383-Speed 3338.99 samples/sec   Loss 5.5791   LearningRate 0.0426   Epoch: 6   Global Step: 39520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:30:39,414-Speed 3379.02 samples/sec   Loss 5.6264   LearningRate 0.0426   Epoch: 6   Global Step: 39530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:30:42,452-Speed 3371.62 samples/sec   Loss 5.5402   LearningRate 0.0426   Epoch: 6   Global Step: 39540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:30:45,476-Speed 3387.61 samples/sec   Loss 5.5738   LearningRate 0.0425   Epoch: 6   Global Step: 39550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:30:48,501-Speed 3385.42 samples/sec   Loss 5.6077   LearningRate 0.0425   Epoch: 6   Global Step: 39560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:30:51,529-Speed 3382.14 samples/sec   Loss 5.5230   LearningRate 0.0425   Epoch: 6   Global Step: 39570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:30:54,560-Speed 3379.91 samples/sec   Loss 5.5419   LearningRate 0.0425   Epoch: 6   Global Step: 39580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:30:57,582-Speed 3389.30 samples/sec   Loss 5.5050   LearningRate 0.0425   Epoch: 6   Global Step: 39590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:31:00,625-Speed 3365.24 samples/sec   Loss 5.6057   LearningRate 0.0425   Epoch: 6   Global Step: 39600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:31:03,661-Speed 3373.68 samples/sec   Loss 5.5732   LearningRate 0.0425   Epoch: 6   Global Step: 39610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:31:06,691-Speed 3381.05 samples/sec   Loss 5.4860   LearningRate 0.0425   Epoch: 6   Global Step: 39620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:31:09,712-Speed 3390.06 samples/sec   Loss 5.4491   LearningRate 0.0424   Epoch: 6   Global Step: 39630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:31:12,735-Speed 3388.83 samples/sec   Loss 5.5150   LearningRate 0.0424   Epoch: 6   Global Step: 39640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:31:15,761-Speed 3384.30 samples/sec   Loss 5.5302   LearningRate 0.0424   Epoch: 6   Global Step: 39650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:31:18,771-Speed 3403.34 samples/sec   Loss 5.5761   LearningRate 0.0424   Epoch: 6   Global Step: 39660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:31:21,809-Speed 3371.01 samples/sec   Loss 5.5208   LearningRate 0.0424   Epoch: 6   Global Step: 39670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:31:24,922-Speed 3290.32 samples/sec   Loss 5.4967   LearningRate 0.0424   Epoch: 6   Global Step: 39680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:31:27,955-Speed 3377.24 samples/sec   Loss 5.6878   LearningRate 0.0424   Epoch: 6   Global Step: 39690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:31:30,981-Speed 3385.97 samples/sec   Loss 5.5165   LearningRate 0.0424   Epoch: 6   Global Step: 39700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:31:34,004-Speed 3387.56 samples/sec   Loss 5.5908   LearningRate 0.0424   Epoch: 6   Global Step: 39710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:31:37,031-Speed 3383.92 samples/sec   Loss 5.4702   LearningRate 0.0423   Epoch: 6   Global Step: 39720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:31:40,066-Speed 3374.73 samples/sec   Loss 5.6229   LearningRate 0.0423   Epoch: 6   Global Step: 39730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:31:43,085-Speed 3391.91 samples/sec   Loss 5.5897   LearningRate 0.0423   Epoch: 6   Global Step: 39740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:31:46,103-Speed 3393.70 samples/sec   Loss 5.5450   LearningRate 0.0423   Epoch: 6   Global Step: 39750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:31:49,140-Speed 3372.70 samples/sec   Loss 5.6290   LearningRate 0.0423   Epoch: 6   Global Step: 39760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:31:52,168-Speed 3382.72 samples/sec   Loss 5.4820   LearningRate 0.0423   Epoch: 6   Global Step: 39770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:31:55,190-Speed 3389.12 samples/sec   Loss 5.4478   LearningRate 0.0423   Epoch: 6   Global Step: 39780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:31:58,212-Speed 3389.20 samples/sec   Loss 5.6044   LearningRate 0.0423   Epoch: 6   Global Step: 39790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:32:01,325-Speed 3290.66 samples/sec   Loss 5.5053   LearningRate 0.0423   Epoch: 6   Global Step: 39800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:32:14,630-Speed 769.68 samples/sec   Loss 5.0398   LearningRate 0.0422   Epoch: 7   Global Step: 39810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:32:17,660-Speed 3380.45 samples/sec   Loss 4.8569   LearningRate 0.0422   Epoch: 7   Global Step: 39820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:32:20,680-Speed 3391.69 samples/sec   Loss 4.8508   LearningRate 0.0422   Epoch: 7   Global Step: 39830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:32:23,757-Speed 3329.13 samples/sec   Loss 4.7991   LearningRate 0.0422   Epoch: 7   Global Step: 39840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:32:26,799-Speed 3367.13 samples/sec   Loss 4.9876   LearningRate 0.0422   Epoch: 7   Global Step: 39850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:32:29,829-Speed 3379.76 samples/sec   Loss 5.0945   LearningRate 0.0422   Epoch: 7   Global Step: 39860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:32:32,850-Speed 3390.73 samples/sec   Loss 5.0478   LearningRate 0.0422   Epoch: 7   Global Step: 39870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:32:35,886-Speed 3374.02 samples/sec   Loss 4.9230   LearningRate 0.0422   Epoch: 7   Global Step: 39880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:32:38,916-Speed 3380.29 samples/sec   Loss 5.1305   LearningRate 0.0421   Epoch: 7   Global Step: 39890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:32:41,939-Speed 3387.31 samples/sec   Loss 4.9757   LearningRate 0.0421   Epoch: 7   Global Step: 39900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:32:44,970-Speed 3380.27 samples/sec   Loss 4.9383   LearningRate 0.0421   Epoch: 7   Global Step: 39910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:32:47,989-Speed 3391.50 samples/sec   Loss 4.9442   LearningRate 0.0421   Epoch: 7   Global Step: 39920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:32:51,013-Speed 3387.67 samples/sec   Loss 5.0124   LearningRate 0.0421   Epoch: 7   Global Step: 39930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:32:54,037-Speed 3387.48 samples/sec   Loss 5.0744   LearningRate 0.0421   Epoch: 7   Global Step: 39940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:32:57,069-Speed 3377.02 samples/sec   Loss 4.8463   LearningRate 0.0421   Epoch: 7   Global Step: 39950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:33:00,120-Speed 3357.59 samples/sec   Loss 4.9656   LearningRate 0.0421   Epoch: 7   Global Step: 39960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:33:03,191-Speed 3336.05 samples/sec   Loss 4.9603   LearningRate 0.0421   Epoch: 7   Global Step: 39970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:33:06,213-Speed 3388.26 samples/sec   Loss 5.0118   LearningRate 0.0420   Epoch: 7   Global Step: 39980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:33:09,245-Speed 3378.64 samples/sec   Loss 5.2516   LearningRate 0.0420   Epoch: 7   Global Step: 39990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:33:12,271-Speed 3384.28 samples/sec   Loss 5.0008   LearningRate 0.0420   Epoch: 7   Global Step: 40000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:33:55,883-[lfw][40000]XNorm: 21.172970
Training: 2022-04-27 05:33:55,883-[lfw][40000]Accuracy-Flip: 0.99700+-0.00306
Training: 2022-04-27 05:33:55,884-[lfw][40000]Accuracy-Highest: 0.99817
Training: 2022-04-27 05:34:46,178-[cfp_fp][40000]XNorm: 18.940806
Training: 2022-04-27 05:34:46,179-[cfp_fp][40000]Accuracy-Flip: 0.96057+-0.00920
Training: 2022-04-27 05:34:46,179-[cfp_fp][40000]Accuracy-Highest: 0.96057
Training: 2022-04-27 05:35:29,440-[agedb_30][40000]XNorm: 20.838208
Training: 2022-04-27 05:35:29,441-[agedb_30][40000]Accuracy-Flip: 0.97533+-0.00812
Training: 2022-04-27 05:35:29,441-[agedb_30][40000]Accuracy-Highest: 0.97533
Training: 2022-04-27 05:35:32,454-Speed 73.05 samples/sec   Loss 5.0428   LearningRate 0.0420   Epoch: 7   Global Step: 40010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:35:35,467-Speed 3399.76 samples/sec   Loss 5.0671   LearningRate 0.0420   Epoch: 7   Global Step: 40020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:35:38,479-Speed 3400.95 samples/sec   Loss 5.1549   LearningRate 0.0420   Epoch: 7   Global Step: 40030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:35:41,500-Speed 3390.09 samples/sec   Loss 5.1226   LearningRate 0.0420   Epoch: 7   Global Step: 40040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:35:44,509-Speed 3403.95 samples/sec   Loss 5.0886   LearningRate 0.0420   Epoch: 7   Global Step: 40050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:35:47,517-Speed 3404.14 samples/sec   Loss 5.0672   LearningRate 0.0420   Epoch: 7   Global Step: 40060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:35:50,531-Speed 3398.65 samples/sec   Loss 4.8689   LearningRate 0.0419   Epoch: 7   Global Step: 40070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:35:53,542-Speed 3401.45 samples/sec   Loss 5.0351   LearningRate 0.0419   Epoch: 7   Global Step: 40080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:35:56,559-Speed 3395.99 samples/sec   Loss 5.2905   LearningRate 0.0419   Epoch: 7   Global Step: 40090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:35:59,575-Speed 3396.16 samples/sec   Loss 5.0529   LearningRate 0.0419   Epoch: 7   Global Step: 40100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:36:02,592-Speed 3394.06 samples/sec   Loss 4.9863   LearningRate 0.0419   Epoch: 7   Global Step: 40110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:36:05,607-Speed 3397.94 samples/sec   Loss 5.1967   LearningRate 0.0419   Epoch: 7   Global Step: 40120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:36:08,605-Speed 3416.21 samples/sec   Loss 5.0403   LearningRate 0.0419   Epoch: 7   Global Step: 40130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:36:11,627-Speed 3389.03 samples/sec   Loss 5.1600   LearningRate 0.0419   Epoch: 7   Global Step: 40140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:36:14,763-Speed 3265.50 samples/sec   Loss 5.1438   LearningRate 0.0419   Epoch: 7   Global Step: 40150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:36:17,802-Speed 3370.75 samples/sec   Loss 5.0758   LearningRate 0.0418   Epoch: 7   Global Step: 40160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:36:20,817-Speed 3397.33 samples/sec   Loss 5.3032   LearningRate 0.0418   Epoch: 7   Global Step: 40170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:36:23,828-Speed 3401.41 samples/sec   Loss 5.1906   LearningRate 0.0418   Epoch: 7   Global Step: 40180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:36:26,844-Speed 3395.91 samples/sec   Loss 5.1433   LearningRate 0.0418   Epoch: 7   Global Step: 40190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:36:29,859-Speed 3397.33 samples/sec   Loss 5.0779   LearningRate 0.0418   Epoch: 7   Global Step: 40200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:36:32,875-Speed 3395.95 samples/sec   Loss 5.1724   LearningRate 0.0418   Epoch: 7   Global Step: 40210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:36:35,901-Speed 3384.65 samples/sec   Loss 5.1266   LearningRate 0.0418   Epoch: 7   Global Step: 40220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:36:38,947-Speed 3363.06 samples/sec   Loss 5.1030   LearningRate 0.0418   Epoch: 7   Global Step: 40230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:36:41,977-Speed 3381.56 samples/sec   Loss 5.0807   LearningRate 0.0418   Epoch: 7   Global Step: 40240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:36:44,994-Speed 3394.47 samples/sec   Loss 5.2011   LearningRate 0.0417   Epoch: 7   Global Step: 40250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:36:48,034-Speed 3369.15 samples/sec   Loss 5.1048   LearningRate 0.0417   Epoch: 7   Global Step: 40260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:36:51,068-Speed 3375.89 samples/sec   Loss 5.1897   LearningRate 0.0417   Epoch: 7   Global Step: 40270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:36:54,081-Speed 3399.98 samples/sec   Loss 5.2099   LearningRate 0.0417   Epoch: 7   Global Step: 40280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:36:57,093-Speed 3400.12 samples/sec   Loss 5.1365   LearningRate 0.0417   Epoch: 7   Global Step: 40290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:37:00,101-Speed 3404.53 samples/sec   Loss 5.1653   LearningRate 0.0417   Epoch: 7   Global Step: 40300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:37:03,178-Speed 3328.83 samples/sec   Loss 5.1409   LearningRate 0.0417   Epoch: 7   Global Step: 40310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:37:06,194-Speed 3396.84 samples/sec   Loss 5.2287   LearningRate 0.0417   Epoch: 7   Global Step: 40320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:37:09,189-Speed 3419.44 samples/sec   Loss 5.1428   LearningRate 0.0416   Epoch: 7   Global Step: 40330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:37:12,208-Speed 3392.65 samples/sec   Loss 5.2801   LearningRate 0.0416   Epoch: 7   Global Step: 40340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:37:15,234-Speed 3384.86 samples/sec   Loss 5.1528   LearningRate 0.0416   Epoch: 7   Global Step: 40350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:37:18,255-Speed 3390.64 samples/sec   Loss 5.1602   LearningRate 0.0416   Epoch: 7   Global Step: 40360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:37:21,270-Speed 3398.02 samples/sec   Loss 5.2740   LearningRate 0.0416   Epoch: 7   Global Step: 40370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:37:24,288-Speed 3393.80 samples/sec   Loss 5.3421   LearningRate 0.0416   Epoch: 7   Global Step: 40380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:37:27,298-Speed 3403.09 samples/sec   Loss 5.1532   LearningRate 0.0416   Epoch: 7   Global Step: 40390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:37:30,310-Speed 3400.02 samples/sec   Loss 5.2033   LearningRate 0.0416   Epoch: 7   Global Step: 40400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:37:33,321-Speed 3401.13 samples/sec   Loss 5.2207   LearningRate 0.0416   Epoch: 7   Global Step: 40410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:37:36,341-Speed 3392.06 samples/sec   Loss 5.3025   LearningRate 0.0415   Epoch: 7   Global Step: 40420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:37:39,359-Speed 3393.73 samples/sec   Loss 5.0454   LearningRate 0.0415   Epoch: 7   Global Step: 40430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:37:42,379-Speed 3392.74 samples/sec   Loss 5.3343   LearningRate 0.0415   Epoch: 7   Global Step: 40440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:37:45,387-Speed 3404.26 samples/sec   Loss 5.1913   LearningRate 0.0415   Epoch: 7   Global Step: 40450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:37:48,403-Speed 3395.88 samples/sec   Loss 5.2000   LearningRate 0.0415   Epoch: 7   Global Step: 40460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:37:51,483-Speed 3325.82 samples/sec   Loss 5.3185   LearningRate 0.0415   Epoch: 7   Global Step: 40470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:37:54,520-Speed 3372.37 samples/sec   Loss 5.2410   LearningRate 0.0415   Epoch: 7   Global Step: 40480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:37:57,531-Speed 3402.28 samples/sec   Loss 5.3269   LearningRate 0.0415   Epoch: 7   Global Step: 40490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:38:00,528-Speed 3417.00 samples/sec   Loss 5.1341   LearningRate 0.0415   Epoch: 7   Global Step: 40500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:38:03,538-Speed 3403.02 samples/sec   Loss 5.1540   LearningRate 0.0414   Epoch: 7   Global Step: 40510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:38:06,552-Speed 3398.06 samples/sec   Loss 5.1862   LearningRate 0.0414   Epoch: 7   Global Step: 40520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:38:09,572-Speed 3391.66 samples/sec   Loss 5.1996   LearningRate 0.0414   Epoch: 7   Global Step: 40530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:38:12,594-Speed 3389.13 samples/sec   Loss 5.2199   LearningRate 0.0414   Epoch: 7   Global Step: 40540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:38:15,634-Speed 3369.53 samples/sec   Loss 5.2084   LearningRate 0.0414   Epoch: 7   Global Step: 40550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:38:18,663-Speed 3381.75 samples/sec   Loss 5.2612   LearningRate 0.0414   Epoch: 7   Global Step: 40560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:38:21,675-Speed 3400.76 samples/sec   Loss 5.4461   LearningRate 0.0414   Epoch: 7   Global Step: 40570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:38:24,694-Speed 3391.91 samples/sec   Loss 5.1609   LearningRate 0.0414   Epoch: 7   Global Step: 40580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:38:27,781-Speed 3317.94 samples/sec   Loss 5.3188   LearningRate 0.0414   Epoch: 7   Global Step: 40590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:38:30,849-Speed 3339.12 samples/sec   Loss 5.1810   LearningRate 0.0413   Epoch: 7   Global Step: 40600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:38:33,860-Speed 3401.32 samples/sec   Loss 5.2394   LearningRate 0.0413   Epoch: 7   Global Step: 40610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 05:38:36,864-Speed 3409.89 samples/sec   Loss 5.2655   LearningRate 0.0413   Epoch: 7   Global Step: 40620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 05:38:39,889-Speed 3386.77 samples/sec   Loss 5.4035   LearningRate 0.0413   Epoch: 7   Global Step: 40630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:38:42,915-Speed 3384.97 samples/sec   Loss 5.4429   LearningRate 0.0413   Epoch: 7   Global Step: 40640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:38:45,927-Speed 3400.15 samples/sec   Loss 5.2148   LearningRate 0.0413   Epoch: 7   Global Step: 40650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:38:48,939-Speed 3400.67 samples/sec   Loss 5.3484   LearningRate 0.0413   Epoch: 7   Global Step: 40660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:38:51,955-Speed 3395.51 samples/sec   Loss 5.2749   LearningRate 0.0413   Epoch: 7   Global Step: 40670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:38:54,981-Speed 3385.33 samples/sec   Loss 5.3446   LearningRate 0.0413   Epoch: 7   Global Step: 40680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 05:38:57,998-Speed 3394.50 samples/sec   Loss 5.2075   LearningRate 0.0412   Epoch: 7   Global Step: 40690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 05:39:01,020-Speed 3388.59 samples/sec   Loss 5.2223   LearningRate 0.0412   Epoch: 7   Global Step: 40700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 05:39:04,085-Speed 3341.81 samples/sec   Loss 5.2295   LearningRate 0.0412   Epoch: 7   Global Step: 40710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 05:39:07,111-Speed 3385.60 samples/sec   Loss 5.3813   LearningRate 0.0412   Epoch: 7   Global Step: 40720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 05:39:10,125-Speed 3397.58 samples/sec   Loss 5.3906   LearningRate 0.0412   Epoch: 7   Global Step: 40730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:39:13,143-Speed 3394.45 samples/sec   Loss 5.2987   LearningRate 0.0412   Epoch: 7   Global Step: 40740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:39:16,166-Speed 3387.75 samples/sec   Loss 5.2909   LearningRate 0.0412   Epoch: 7   Global Step: 40750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:39:19,185-Speed 3392.84 samples/sec   Loss 5.3629   LearningRate 0.0412   Epoch: 7   Global Step: 40760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:39:22,208-Speed 3387.98 samples/sec   Loss 5.2344   LearningRate 0.0412   Epoch: 7   Global Step: 40770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:39:25,231-Speed 3389.24 samples/sec   Loss 5.3814   LearningRate 0.0411   Epoch: 7   Global Step: 40780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:39:28,257-Speed 3384.18 samples/sec   Loss 5.3018   LearningRate 0.0411   Epoch: 7   Global Step: 40790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:39:31,295-Speed 3371.42 samples/sec   Loss 5.3043   LearningRate 0.0411   Epoch: 7   Global Step: 40800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:39:34,327-Speed 3378.26 samples/sec   Loss 5.1587   LearningRate 0.0411   Epoch: 7   Global Step: 40810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:39:37,342-Speed 3397.29 samples/sec   Loss 5.3989   LearningRate 0.0411   Epoch: 7   Global Step: 40820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:39:40,360-Speed 3393.60 samples/sec   Loss 5.3110   LearningRate 0.0411   Epoch: 7   Global Step: 40830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:39:43,378-Speed 3394.59 samples/sec   Loss 5.2883   LearningRate 0.0411   Epoch: 7   Global Step: 40840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:39:46,410-Speed 3378.22 samples/sec   Loss 5.3168   LearningRate 0.0411   Epoch: 7   Global Step: 40850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:39:49,465-Speed 3352.13 samples/sec   Loss 5.4209   LearningRate 0.0410   Epoch: 7   Global Step: 40860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:39:52,502-Speed 3372.15 samples/sec   Loss 5.3123   LearningRate 0.0410   Epoch: 7   Global Step: 40870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:39:55,520-Speed 3394.49 samples/sec   Loss 5.3772   LearningRate 0.0410   Epoch: 7   Global Step: 40880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:39:58,544-Speed 3386.92 samples/sec   Loss 5.2280   LearningRate 0.0410   Epoch: 7   Global Step: 40890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:40:01,564-Speed 3392.12 samples/sec   Loss 5.4381   LearningRate 0.0410   Epoch: 7   Global Step: 40900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:40:04,588-Speed 3387.07 samples/sec   Loss 5.4348   LearningRate 0.0410   Epoch: 7   Global Step: 40910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:40:07,612-Speed 3386.86 samples/sec   Loss 5.2654   LearningRate 0.0410   Epoch: 7   Global Step: 40920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:40:10,624-Speed 3400.72 samples/sec   Loss 5.2255   LearningRate 0.0410   Epoch: 7   Global Step: 40930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:40:13,638-Speed 3397.46 samples/sec   Loss 5.3370   LearningRate 0.0410   Epoch: 7   Global Step: 40940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:40:16,657-Speed 3393.66 samples/sec   Loss 5.1888   LearningRate 0.0409   Epoch: 7   Global Step: 40950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:40:19,674-Speed 3394.23 samples/sec   Loss 5.2304   LearningRate 0.0409   Epoch: 7   Global Step: 40960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:40:22,694-Speed 3392.17 samples/sec   Loss 5.2830   LearningRate 0.0409   Epoch: 7   Global Step: 40970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:40:25,711-Speed 3394.58 samples/sec   Loss 5.2786   LearningRate 0.0409   Epoch: 7   Global Step: 40980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:40:28,724-Speed 3399.65 samples/sec   Loss 5.3427   LearningRate 0.0409   Epoch: 7   Global Step: 40990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:40:31,740-Speed 3395.92 samples/sec   Loss 5.3647   LearningRate 0.0409   Epoch: 7   Global Step: 41000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:40:34,745-Speed 3408.92 samples/sec   Loss 5.2449   LearningRate 0.0409   Epoch: 7   Global Step: 41010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:40:37,766-Speed 3389.63 samples/sec   Loss 5.2530   LearningRate 0.0409   Epoch: 7   Global Step: 41020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:40:40,790-Speed 3387.36 samples/sec   Loss 5.3864   LearningRate 0.0409   Epoch: 7   Global Step: 41030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:40:43,804-Speed 3398.16 samples/sec   Loss 5.2309   LearningRate 0.0408   Epoch: 7   Global Step: 41040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:40:46,824-Speed 3392.06 samples/sec   Loss 5.3567   LearningRate 0.0408   Epoch: 7   Global Step: 41050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:40:49,842-Speed 3393.11 samples/sec   Loss 5.2320   LearningRate 0.0408   Epoch: 7   Global Step: 41060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:40:52,859-Speed 3395.38 samples/sec   Loss 5.2927   LearningRate 0.0408   Epoch: 7   Global Step: 41070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:40:55,879-Speed 3391.35 samples/sec   Loss 5.4646   LearningRate 0.0408   Epoch: 7   Global Step: 41080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:40:58,908-Speed 3381.89 samples/sec   Loss 5.2921   LearningRate 0.0408   Epoch: 7   Global Step: 41090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:41:01,929-Speed 3389.88 samples/sec   Loss 5.3279   LearningRate 0.0408   Epoch: 7   Global Step: 41100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:41:04,951-Speed 3389.14 samples/sec   Loss 5.2276   LearningRate 0.0408   Epoch: 7   Global Step: 41110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:41:07,978-Speed 3383.83 samples/sec   Loss 5.3539   LearningRate 0.0408   Epoch: 7   Global Step: 41120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:41:10,996-Speed 3393.49 samples/sec   Loss 5.2430   LearningRate 0.0407   Epoch: 7   Global Step: 41130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:41:14,021-Speed 3385.72 samples/sec   Loss 5.2516   LearningRate 0.0407   Epoch: 7   Global Step: 41140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:41:17,053-Speed 3379.06 samples/sec   Loss 5.2645   LearningRate 0.0407   Epoch: 7   Global Step: 41150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:41:20,051-Speed 3416.09 samples/sec   Loss 5.3488   LearningRate 0.0407   Epoch: 7   Global Step: 41160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:41:23,069-Speed 3394.63 samples/sec   Loss 5.3185   LearningRate 0.0407   Epoch: 7   Global Step: 41170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:41:26,089-Speed 3391.21 samples/sec   Loss 5.2533   LearningRate 0.0407   Epoch: 7   Global Step: 41180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:41:29,112-Speed 3387.41 samples/sec   Loss 5.2918   LearningRate 0.0407   Epoch: 7   Global Step: 41190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:41:32,130-Speed 3394.10 samples/sec   Loss 5.2540   LearningRate 0.0407   Epoch: 7   Global Step: 41200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:41:35,148-Speed 3393.34 samples/sec   Loss 5.2229   LearningRate 0.0407   Epoch: 7   Global Step: 41210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:41:38,171-Speed 3388.77 samples/sec   Loss 5.4404   LearningRate 0.0406   Epoch: 7   Global Step: 41220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:41:41,197-Speed 3384.37 samples/sec   Loss 5.2475   LearningRate 0.0406   Epoch: 7   Global Step: 41230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:41:44,215-Speed 3393.12 samples/sec   Loss 5.3928   LearningRate 0.0406   Epoch: 7   Global Step: 41240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:41:47,238-Speed 3389.02 samples/sec   Loss 5.3483   LearningRate 0.0406   Epoch: 7   Global Step: 41250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:41:50,258-Speed 3391.44 samples/sec   Loss 5.3338   LearningRate 0.0406   Epoch: 7   Global Step: 41260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:41:53,257-Speed 3416.21 samples/sec   Loss 5.5069   LearningRate 0.0406   Epoch: 7   Global Step: 41270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:41:56,274-Speed 3394.63 samples/sec   Loss 5.3644   LearningRate 0.0406   Epoch: 7   Global Step: 41280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:41:59,296-Speed 3389.63 samples/sec   Loss 5.3598   LearningRate 0.0406   Epoch: 7   Global Step: 41290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:02,322-Speed 3384.56 samples/sec   Loss 5.3063   LearningRate 0.0406   Epoch: 7   Global Step: 41300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:05,346-Speed 3387.53 samples/sec   Loss 5.1989   LearningRate 0.0405   Epoch: 7   Global Step: 41310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:08,363-Speed 3394.13 samples/sec   Loss 5.4281   LearningRate 0.0405   Epoch: 7   Global Step: 41320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:11,428-Speed 3342.14 samples/sec   Loss 5.3300   LearningRate 0.0405   Epoch: 7   Global Step: 41330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:14,461-Speed 3376.41 samples/sec   Loss 5.2099   LearningRate 0.0405   Epoch: 7   Global Step: 41340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:17,486-Speed 3385.98 samples/sec   Loss 5.3388   LearningRate 0.0405   Epoch: 7   Global Step: 41350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:20,506-Speed 3392.25 samples/sec   Loss 5.2770   LearningRate 0.0405   Epoch: 7   Global Step: 41360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:23,511-Speed 3408.65 samples/sec   Loss 5.3447   LearningRate 0.0405   Epoch: 7   Global Step: 41370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:26,534-Speed 3387.34 samples/sec   Loss 5.2242   LearningRate 0.0405   Epoch: 7   Global Step: 41380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:29,555-Speed 3390.34 samples/sec   Loss 5.2952   LearningRate 0.0405   Epoch: 7   Global Step: 41390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:32,583-Speed 3383.41 samples/sec   Loss 5.2577   LearningRate 0.0404   Epoch: 7   Global Step: 41400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:35,606-Speed 3388.29 samples/sec   Loss 5.3719   LearningRate 0.0404   Epoch: 7   Global Step: 41410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:38,632-Speed 3384.75 samples/sec   Loss 5.2577   LearningRate 0.0404   Epoch: 7   Global Step: 41420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:41,745-Speed 3290.10 samples/sec   Loss 5.1217   LearningRate 0.0404   Epoch: 7   Global Step: 41430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:44,765-Speed 3391.63 samples/sec   Loss 5.2532   LearningRate 0.0404   Epoch: 7   Global Step: 41440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:47,785-Speed 3390.95 samples/sec   Loss 5.2677   LearningRate 0.0404   Epoch: 7   Global Step: 41450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:50,811-Speed 3385.09 samples/sec   Loss 5.3759   LearningRate 0.0404   Epoch: 7   Global Step: 41460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:42:53,834-Speed 3388.81 samples/sec   Loss 5.1643   LearningRate 0.0404   Epoch: 7   Global Step: 41470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:42:56,857-Speed 3387.02 samples/sec   Loss 5.3134   LearningRate 0.0404   Epoch: 7   Global Step: 41480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:42:59,881-Speed 3387.67 samples/sec   Loss 5.3088   LearningRate 0.0403   Epoch: 7   Global Step: 41490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:43:02,909-Speed 3382.36 samples/sec   Loss 5.2463   LearningRate 0.0403   Epoch: 7   Global Step: 41500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:43:05,935-Speed 3385.31 samples/sec   Loss 5.2835   LearningRate 0.0403   Epoch: 7   Global Step: 41510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:43:08,942-Speed 3405.59 samples/sec   Loss 5.2284   LearningRate 0.0403   Epoch: 7   Global Step: 41520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:43:11,969-Speed 3383.43 samples/sec   Loss 5.3292   LearningRate 0.0403   Epoch: 7   Global Step: 41530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:43:14,997-Speed 3382.95 samples/sec   Loss 5.3665   LearningRate 0.0403   Epoch: 7   Global Step: 41540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:43:18,026-Speed 3382.11 samples/sec   Loss 5.3701   LearningRate 0.0403   Epoch: 7   Global Step: 41550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:43:21,056-Speed 3380.00 samples/sec   Loss 5.3025   LearningRate 0.0403   Epoch: 7   Global Step: 41560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:43:24,086-Speed 3380.14 samples/sec   Loss 5.2401   LearningRate 0.0403   Epoch: 7   Global Step: 41570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:43:27,182-Speed 3307.94 samples/sec   Loss 5.3171   LearningRate 0.0402   Epoch: 7   Global Step: 41580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:43:30,216-Speed 3375.96 samples/sec   Loss 5.4217   LearningRate 0.0402   Epoch: 7   Global Step: 41590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:43:33,243-Speed 3384.30 samples/sec   Loss 5.4705   LearningRate 0.0402   Epoch: 7   Global Step: 41600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:43:36,275-Speed 3378.12 samples/sec   Loss 5.3292   LearningRate 0.0402   Epoch: 7   Global Step: 41610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:43:39,368-Speed 3311.77 samples/sec   Loss 5.3149   LearningRate 0.0402   Epoch: 7   Global Step: 41620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:43:42,390-Speed 3388.41 samples/sec   Loss 5.2027   LearningRate 0.0402   Epoch: 7   Global Step: 41630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:43:45,412-Speed 3389.39 samples/sec   Loss 5.1316   LearningRate 0.0402   Epoch: 7   Global Step: 41640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:43:48,435-Speed 3388.15 samples/sec   Loss 5.2285   LearningRate 0.0402   Epoch: 7   Global Step: 41650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:43:51,453-Speed 3393.64 samples/sec   Loss 5.2987   LearningRate 0.0402   Epoch: 7   Global Step: 41660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:43:54,476-Speed 3388.00 samples/sec   Loss 5.3224   LearningRate 0.0401   Epoch: 7   Global Step: 41670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:43:57,505-Speed 3381.87 samples/sec   Loss 5.2207   LearningRate 0.0401   Epoch: 7   Global Step: 41680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:00,560-Speed 3352.44 samples/sec   Loss 5.2909   LearningRate 0.0401   Epoch: 7   Global Step: 41690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:03,611-Speed 3357.12 samples/sec   Loss 5.3418   LearningRate 0.0401   Epoch: 7   Global Step: 41700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:06,637-Speed 3385.72 samples/sec   Loss 5.2066   LearningRate 0.0401   Epoch: 7   Global Step: 41710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:09,663-Speed 3384.66 samples/sec   Loss 5.2379   LearningRate 0.0401   Epoch: 7   Global Step: 41720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:12,764-Speed 3302.66 samples/sec   Loss 5.4159   LearningRate 0.0401   Epoch: 7   Global Step: 41730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:15,826-Speed 3345.03 samples/sec   Loss 5.2950   LearningRate 0.0401   Epoch: 7   Global Step: 41740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:18,852-Speed 3384.55 samples/sec   Loss 5.3261   LearningRate 0.0401   Epoch: 7   Global Step: 41750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:21,882-Speed 3380.24 samples/sec   Loss 5.2282   LearningRate 0.0400   Epoch: 7   Global Step: 41760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:44:24,891-Speed 3404.25 samples/sec   Loss 5.2437   LearningRate 0.0400   Epoch: 7   Global Step: 41770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:27,922-Speed 3378.75 samples/sec   Loss 5.2302   LearningRate 0.0400   Epoch: 7   Global Step: 41780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:30,953-Speed 3379.69 samples/sec   Loss 5.4393   LearningRate 0.0400   Epoch: 7   Global Step: 41790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:33,981-Speed 3382.51 samples/sec   Loss 5.2870   LearningRate 0.0400   Epoch: 7   Global Step: 41800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:37,009-Speed 3382.54 samples/sec   Loss 5.4606   LearningRate 0.0400   Epoch: 7   Global Step: 41810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:40,052-Speed 3365.82 samples/sec   Loss 5.4154   LearningRate 0.0400   Epoch: 7   Global Step: 41820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:43,079-Speed 3383.82 samples/sec   Loss 5.2817   LearningRate 0.0400   Epoch: 7   Global Step: 41830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:46,099-Speed 3391.49 samples/sec   Loss 5.3449   LearningRate 0.0400   Epoch: 7   Global Step: 41840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:49,126-Speed 3382.86 samples/sec   Loss 5.3883   LearningRate 0.0399   Epoch: 7   Global Step: 41850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:52,149-Speed 3388.88 samples/sec   Loss 5.2727   LearningRate 0.0399   Epoch: 7   Global Step: 41860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:44:55,171-Speed 3388.70 samples/sec   Loss 5.2848   LearningRate 0.0399   Epoch: 7   Global Step: 41870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:44:58,325-Speed 3247.88 samples/sec   Loss 5.5014   LearningRate 0.0399   Epoch: 7   Global Step: 41880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:45:01,388-Speed 3344.05 samples/sec   Loss 5.4324   LearningRate 0.0399   Epoch: 7   Global Step: 41890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:45:04,427-Speed 3370.82 samples/sec   Loss 5.3020   LearningRate 0.0399   Epoch: 7   Global Step: 41900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:45:07,450-Speed 3387.58 samples/sec   Loss 5.2393   LearningRate 0.0399   Epoch: 7   Global Step: 41910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:45:10,474-Speed 3387.49 samples/sec   Loss 5.4133   LearningRate 0.0399   Epoch: 7   Global Step: 41920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:45:13,477-Speed 3410.12 samples/sec   Loss 5.3490   LearningRate 0.0399   Epoch: 7   Global Step: 41930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:45:16,500-Speed 3388.55 samples/sec   Loss 5.3885   LearningRate 0.0398   Epoch: 7   Global Step: 41940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:45:19,526-Speed 3384.17 samples/sec   Loss 5.2427   LearningRate 0.0398   Epoch: 7   Global Step: 41950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:45:22,552-Speed 3384.94 samples/sec   Loss 5.3903   LearningRate 0.0398   Epoch: 7   Global Step: 41960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:45:25,582-Speed 3381.07 samples/sec   Loss 5.2481   LearningRate 0.0398   Epoch: 7   Global Step: 41970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:45:28,612-Speed 3379.82 samples/sec   Loss 5.2013   LearningRate 0.0398   Epoch: 7   Global Step: 41980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:45:31,634-Speed 3389.76 samples/sec   Loss 5.4006   LearningRate 0.0398   Epoch: 7   Global Step: 41990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:45:34,673-Speed 3370.01 samples/sec   Loss 5.3380   LearningRate 0.0398   Epoch: 7   Global Step: 42000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:46:18,109-[lfw][42000]XNorm: 21.286959
Training: 2022-04-27 05:46:18,110-[lfw][42000]Accuracy-Flip: 0.99750+-0.00281
Training: 2022-04-27 05:46:18,110-[lfw][42000]Accuracy-Highest: 0.99817
Training: 2022-04-27 05:47:08,563-[cfp_fp][42000]XNorm: 18.559626
Training: 2022-04-27 05:47:08,563-[cfp_fp][42000]Accuracy-Flip: 0.95900+-0.00761
Training: 2022-04-27 05:47:08,564-[cfp_fp][42000]Accuracy-Highest: 0.96057
Training: 2022-04-27 05:47:52,060-[agedb_30][42000]XNorm: 21.462667
Training: 2022-04-27 05:47:52,060-[agedb_30][42000]Accuracy-Flip: 0.97767+-0.00786
Training: 2022-04-27 05:47:52,061-[agedb_30][42000]Accuracy-Highest: 0.97767
Training: 2022-04-27 05:47:55,071-Speed 72.94 samples/sec   Loss 5.2957   LearningRate 0.0398   Epoch: 7   Global Step: 42010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:47:58,080-Speed 3404.71 samples/sec   Loss 5.2181   LearningRate 0.0398   Epoch: 7   Global Step: 42020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:48:01,082-Speed 3411.89 samples/sec   Loss 5.3289   LearningRate 0.0397   Epoch: 7   Global Step: 42030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:48:04,091-Speed 3403.88 samples/sec   Loss 5.2807   LearningRate 0.0397   Epoch: 7   Global Step: 42040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:48:07,079-Speed 3427.53 samples/sec   Loss 5.3586   LearningRate 0.0397   Epoch: 7   Global Step: 42050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:48:10,102-Speed 3388.65 samples/sec   Loss 5.3848   LearningRate 0.0397   Epoch: 7   Global Step: 42060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:48:13,120-Speed 3393.59 samples/sec   Loss 5.3208   LearningRate 0.0397   Epoch: 7   Global Step: 42070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:48:16,147-Speed 3384.18 samples/sec   Loss 5.3173   LearningRate 0.0397   Epoch: 7   Global Step: 42080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:48:19,163-Speed 3396.12 samples/sec   Loss 5.2076   LearningRate 0.0397   Epoch: 7   Global Step: 42090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:48:22,194-Speed 3388.65 samples/sec   Loss 5.2705   LearningRate 0.0397   Epoch: 7   Global Step: 42100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:48:25,208-Speed 3397.48 samples/sec   Loss 5.1888   LearningRate 0.0397   Epoch: 7   Global Step: 42110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:48:28,229-Speed 3390.64 samples/sec   Loss 5.2676   LearningRate 0.0396   Epoch: 7   Global Step: 42120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:48:31,243-Speed 3398.22 samples/sec   Loss 5.4252   LearningRate 0.0396   Epoch: 7   Global Step: 42130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:48:34,269-Speed 3385.50 samples/sec   Loss 5.3550   LearningRate 0.0396   Epoch: 7   Global Step: 42140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:48:37,291-Speed 3389.32 samples/sec   Loss 5.3062   LearningRate 0.0396   Epoch: 7   Global Step: 42150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:48:40,334-Speed 3364.93 samples/sec   Loss 5.2551   LearningRate 0.0396   Epoch: 7   Global Step: 42160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:48:43,357-Speed 3388.30 samples/sec   Loss 5.4049   LearningRate 0.0396   Epoch: 7   Global Step: 42170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:48:46,383-Speed 3385.39 samples/sec   Loss 5.2779   LearningRate 0.0396   Epoch: 7   Global Step: 42180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:48:49,409-Speed 3385.23 samples/sec   Loss 5.2735   LearningRate 0.0396   Epoch: 7   Global Step: 42190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:48:52,433-Speed 3386.82 samples/sec   Loss 5.2290   LearningRate 0.0396   Epoch: 7   Global Step: 42200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:48:55,475-Speed 3367.62 samples/sec   Loss 5.2662   LearningRate 0.0395   Epoch: 7   Global Step: 42210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:48:58,529-Speed 3353.45 samples/sec   Loss 5.2631   LearningRate 0.0395   Epoch: 7   Global Step: 42220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:49:01,616-Speed 3317.81 samples/sec   Loss 5.3379   LearningRate 0.0395   Epoch: 7   Global Step: 42230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:49:04,645-Speed 3381.71 samples/sec   Loss 5.3068   LearningRate 0.0395   Epoch: 7   Global Step: 42240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:49:07,649-Speed 3409.83 samples/sec   Loss 5.2665   LearningRate 0.0395   Epoch: 7   Global Step: 42250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:49:10,671-Speed 3389.20 samples/sec   Loss 5.3267   LearningRate 0.0395   Epoch: 7   Global Step: 42260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:49:13,691-Speed 3391.81 samples/sec   Loss 5.3318   LearningRate 0.0395   Epoch: 7   Global Step: 42270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:49:16,719-Speed 3383.07 samples/sec   Loss 5.0789   LearningRate 0.0395   Epoch: 7   Global Step: 42280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:49:19,749-Speed 3379.72 samples/sec   Loss 5.3608   LearningRate 0.0395   Epoch: 7   Global Step: 42290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:49:22,812-Speed 3344.01 samples/sec   Loss 5.1940   LearningRate 0.0394   Epoch: 7   Global Step: 42300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:49:25,832-Speed 3392.01 samples/sec   Loss 5.3070   LearningRate 0.0394   Epoch: 7   Global Step: 42310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:49:28,863-Speed 3379.23 samples/sec   Loss 5.2944   LearningRate 0.0394   Epoch: 7   Global Step: 42320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:49:31,873-Speed 3402.56 samples/sec   Loss 5.3153   LearningRate 0.0394   Epoch: 7   Global Step: 42330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:49:34,886-Speed 3399.25 samples/sec   Loss 5.4645   LearningRate 0.0394   Epoch: 7   Global Step: 42340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:49:37,898-Speed 3401.01 samples/sec   Loss 5.4354   LearningRate 0.0394   Epoch: 7   Global Step: 42350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:49:40,919-Speed 3390.46 samples/sec   Loss 5.3576   LearningRate 0.0394   Epoch: 7   Global Step: 42360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:49:43,927-Speed 3405.58 samples/sec   Loss 5.1967   LearningRate 0.0394   Epoch: 7   Global Step: 42370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:49:46,937-Speed 3403.64 samples/sec   Loss 5.3703   LearningRate 0.0394   Epoch: 7   Global Step: 42380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:49:49,953-Speed 3395.20 samples/sec   Loss 5.3835   LearningRate 0.0393   Epoch: 7   Global Step: 42390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:49:52,974-Speed 3390.62 samples/sec   Loss 5.2729   LearningRate 0.0393   Epoch: 7   Global Step: 42400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:49:55,992-Speed 3393.73 samples/sec   Loss 5.5140   LearningRate 0.0393   Epoch: 7   Global Step: 42410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:49:59,018-Speed 3385.11 samples/sec   Loss 5.2309   LearningRate 0.0393   Epoch: 7   Global Step: 42420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:50:02,030-Speed 3400.54 samples/sec   Loss 5.2104   LearningRate 0.0393   Epoch: 7   Global Step: 42430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:50:05,042-Speed 3400.36 samples/sec   Loss 5.2971   LearningRate 0.0393   Epoch: 7   Global Step: 42440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:50:08,054-Speed 3400.68 samples/sec   Loss 5.2109   LearningRate 0.0393   Epoch: 7   Global Step: 42450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:50:11,070-Speed 3396.01 samples/sec   Loss 5.2110   LearningRate 0.0393   Epoch: 7   Global Step: 42460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:50:14,129-Speed 3349.03 samples/sec   Loss 5.1676   LearningRate 0.0393   Epoch: 7   Global Step: 42470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:50:17,176-Speed 3360.43 samples/sec   Loss 5.1136   LearningRate 0.0392   Epoch: 7   Global Step: 42480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:50:20,168-Speed 3424.24 samples/sec   Loss 5.4679   LearningRate 0.0392   Epoch: 7   Global Step: 42490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:50:23,182-Speed 3397.81 samples/sec   Loss 5.3530   LearningRate 0.0392   Epoch: 7   Global Step: 42500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:50:26,191-Speed 3404.45 samples/sec   Loss 5.2280   LearningRate 0.0392   Epoch: 7   Global Step: 42510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:50:29,207-Speed 3395.73 samples/sec   Loss 5.1045   LearningRate 0.0392   Epoch: 7   Global Step: 42520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:50:32,226-Speed 3391.95 samples/sec   Loss 5.1556   LearningRate 0.0392   Epoch: 7   Global Step: 42530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:50:35,232-Speed 3407.38 samples/sec   Loss 5.2879   LearningRate 0.0392   Epoch: 7   Global Step: 42540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:50:38,260-Speed 3382.98 samples/sec   Loss 5.2637   LearningRate 0.0392   Epoch: 7   Global Step: 42550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:50:41,264-Speed 3409.71 samples/sec   Loss 5.2281   LearningRate 0.0392   Epoch: 7   Global Step: 42560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:50:44,273-Speed 3404.02 samples/sec   Loss 5.1288   LearningRate 0.0391   Epoch: 7   Global Step: 42570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:50:47,289-Speed 3396.19 samples/sec   Loss 5.2725   LearningRate 0.0391   Epoch: 7   Global Step: 42580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:50:50,298-Speed 3403.07 samples/sec   Loss 5.2355   LearningRate 0.0391   Epoch: 7   Global Step: 42590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:50:53,311-Speed 3399.65 samples/sec   Loss 5.2119   LearningRate 0.0391   Epoch: 7   Global Step: 42600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:50:56,319-Speed 3406.42 samples/sec   Loss 5.2045   LearningRate 0.0391   Epoch: 7   Global Step: 42610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:50:59,341-Speed 3389.07 samples/sec   Loss 5.3271   LearningRate 0.0391   Epoch: 7   Global Step: 42620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:51:02,365-Speed 3387.08 samples/sec   Loss 5.2736   LearningRate 0.0391   Epoch: 7   Global Step: 42630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:51:05,375-Speed 3403.05 samples/sec   Loss 5.2486   LearningRate 0.0391   Epoch: 7   Global Step: 42640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:51:08,365-Speed 3425.28 samples/sec   Loss 5.3191   LearningRate 0.0391   Epoch: 7   Global Step: 42650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:51:11,378-Speed 3399.65 samples/sec   Loss 5.2043   LearningRate 0.0390   Epoch: 7   Global Step: 42660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:51:14,387-Speed 3403.40 samples/sec   Loss 5.2337   LearningRate 0.0390   Epoch: 7   Global Step: 42670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:51:17,398-Speed 3401.57 samples/sec   Loss 5.4323   LearningRate 0.0390   Epoch: 7   Global Step: 42680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:51:20,406-Speed 3405.77 samples/sec   Loss 5.2526   LearningRate 0.0390   Epoch: 7   Global Step: 42690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:51:23,418-Speed 3400.27 samples/sec   Loss 5.4054   LearningRate 0.0390   Epoch: 7   Global Step: 42700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:51:26,444-Speed 3384.74 samples/sec   Loss 5.1991   LearningRate 0.0390   Epoch: 7   Global Step: 42710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:51:29,450-Speed 3406.86 samples/sec   Loss 5.3375   LearningRate 0.0390   Epoch: 7   Global Step: 42720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:51:32,458-Speed 3405.44 samples/sec   Loss 5.2327   LearningRate 0.0390   Epoch: 7   Global Step: 42730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:51:35,466-Speed 3404.90 samples/sec   Loss 5.1729   LearningRate 0.0390   Epoch: 7   Global Step: 42740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:51:38,477-Speed 3401.99 samples/sec   Loss 5.3363   LearningRate 0.0389   Epoch: 7   Global Step: 42750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:51:41,487-Speed 3403.39 samples/sec   Loss 5.3317   LearningRate 0.0389   Epoch: 7   Global Step: 42760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:51:44,494-Speed 3405.71 samples/sec   Loss 5.1145   LearningRate 0.0389   Epoch: 7   Global Step: 42770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:51:47,509-Speed 3397.46 samples/sec   Loss 5.2596   LearningRate 0.0389   Epoch: 7   Global Step: 42780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:51:50,542-Speed 3377.33 samples/sec   Loss 5.2833   LearningRate 0.0389   Epoch: 7   Global Step: 42790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:51:53,564-Speed 3388.67 samples/sec   Loss 5.2073   LearningRate 0.0389   Epoch: 7   Global Step: 42800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:51:56,577-Speed 3399.98 samples/sec   Loss 5.3041   LearningRate 0.0389   Epoch: 7   Global Step: 42810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:51:59,585-Speed 3404.52 samples/sec   Loss 5.2779   LearningRate 0.0389   Epoch: 7   Global Step: 42820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:52:02,599-Speed 3398.06 samples/sec   Loss 5.2097   LearningRate 0.0389   Epoch: 7   Global Step: 42830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:52:05,611-Speed 3400.78 samples/sec   Loss 5.2681   LearningRate 0.0388   Epoch: 7   Global Step: 42840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:52:08,621-Speed 3403.97 samples/sec   Loss 5.1583   LearningRate 0.0388   Epoch: 7   Global Step: 42850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:52:11,685-Speed 3341.94 samples/sec   Loss 5.3946   LearningRate 0.0388   Epoch: 7   Global Step: 42860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:52:14,683-Speed 3416.73 samples/sec   Loss 5.2311   LearningRate 0.0388   Epoch: 7   Global Step: 42870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 05:52:17,706-Speed 3387.65 samples/sec   Loss 5.1706   LearningRate 0.0388   Epoch: 7   Global Step: 42880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 05:52:20,728-Speed 3390.17 samples/sec   Loss 5.4517   LearningRate 0.0388   Epoch: 7   Global Step: 42890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 05:52:23,765-Speed 3372.96 samples/sec   Loss 5.1993   LearningRate 0.0388   Epoch: 7   Global Step: 42900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 05:52:26,780-Speed 3396.88 samples/sec   Loss 5.3851   LearningRate 0.0388   Epoch: 7   Global Step: 42910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 05:52:29,790-Speed 3402.45 samples/sec   Loss 5.2239   LearningRate 0.0388   Epoch: 7   Global Step: 42920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 05:52:32,801-Speed 3401.96 samples/sec   Loss 5.2969   LearningRate 0.0387   Epoch: 7   Global Step: 42930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 05:52:35,828-Speed 3383.92 samples/sec   Loss 5.2097   LearningRate 0.0387   Epoch: 7   Global Step: 42940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 05:52:38,845-Speed 3395.46 samples/sec   Loss 5.2847   LearningRate 0.0387   Epoch: 7   Global Step: 42950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 05:52:41,886-Speed 3368.36 samples/sec   Loss 5.2673   LearningRate 0.0387   Epoch: 7   Global Step: 42960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 05:52:44,899-Speed 3399.91 samples/sec   Loss 5.2113   LearningRate 0.0387   Epoch: 7   Global Step: 42970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:52:47,937-Speed 3370.91 samples/sec   Loss 5.1904   LearningRate 0.0387   Epoch: 7   Global Step: 42980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:52:50,953-Speed 3396.81 samples/sec   Loss 5.2037   LearningRate 0.0387   Epoch: 7   Global Step: 42990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:52:53,969-Speed 3396.12 samples/sec   Loss 5.4139   LearningRate 0.0387   Epoch: 7   Global Step: 43000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:52:56,981-Speed 3399.99 samples/sec   Loss 5.2287   LearningRate 0.0387   Epoch: 7   Global Step: 43010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:52:59,991-Speed 3403.03 samples/sec   Loss 5.1582   LearningRate 0.0387   Epoch: 7   Global Step: 43020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:53:03,059-Speed 3338.62 samples/sec   Loss 5.2885   LearningRate 0.0386   Epoch: 7   Global Step: 43030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:53:06,076-Speed 3394.62 samples/sec   Loss 5.2639   LearningRate 0.0386   Epoch: 7   Global Step: 43040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:53:09,094-Speed 3392.98 samples/sec   Loss 5.2455   LearningRate 0.0386   Epoch: 7   Global Step: 43050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:53:12,106-Speed 3401.05 samples/sec   Loss 5.3608   LearningRate 0.0386   Epoch: 7   Global Step: 43060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:53:15,122-Speed 3395.39 samples/sec   Loss 5.2535   LearningRate 0.0386   Epoch: 7   Global Step: 43070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:53:18,145-Speed 3389.34 samples/sec   Loss 5.3424   LearningRate 0.0386   Epoch: 7   Global Step: 43080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:53:21,161-Speed 3395.62 samples/sec   Loss 5.2124   LearningRate 0.0386   Epoch: 7   Global Step: 43090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:53:24,181-Speed 3391.87 samples/sec   Loss 5.1572   LearningRate 0.0386   Epoch: 7   Global Step: 43100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:53:27,239-Speed 3349.61 samples/sec   Loss 5.1750   LearningRate 0.0386   Epoch: 7   Global Step: 43110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:53:30,263-Speed 3387.00 samples/sec   Loss 5.2803   LearningRate 0.0385   Epoch: 7   Global Step: 43120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:53:33,285-Speed 3388.98 samples/sec   Loss 5.1307   LearningRate 0.0385   Epoch: 7   Global Step: 43130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:53:36,300-Speed 3396.15 samples/sec   Loss 5.3335   LearningRate 0.0385   Epoch: 7   Global Step: 43140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:53:39,293-Speed 3422.52 samples/sec   Loss 5.1343   LearningRate 0.0385   Epoch: 7   Global Step: 43150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:53:42,308-Speed 3397.92 samples/sec   Loss 5.3566   LearningRate 0.0385   Epoch: 7   Global Step: 43160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:53:45,324-Speed 3395.57 samples/sec   Loss 5.2668   LearningRate 0.0385   Epoch: 7   Global Step: 43170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:53:48,361-Speed 3372.67 samples/sec   Loss 5.2982   LearningRate 0.0385   Epoch: 7   Global Step: 43180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:53:51,374-Speed 3399.07 samples/sec   Loss 5.3144   LearningRate 0.0385   Epoch: 7   Global Step: 43190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:53:54,460-Speed 3318.98 samples/sec   Loss 5.2466   LearningRate 0.0385   Epoch: 7   Global Step: 43200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:53:57,472-Speed 3401.39 samples/sec   Loss 5.2197   LearningRate 0.0384   Epoch: 7   Global Step: 43210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:00,488-Speed 3395.09 samples/sec   Loss 5.2331   LearningRate 0.0384   Epoch: 7   Global Step: 43220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:03,511-Speed 3388.26 samples/sec   Loss 5.4197   LearningRate 0.0384   Epoch: 7   Global Step: 43230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:06,524-Speed 3399.62 samples/sec   Loss 5.2321   LearningRate 0.0384   Epoch: 7   Global Step: 43240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:09,538-Speed 3399.01 samples/sec   Loss 5.2456   LearningRate 0.0384   Epoch: 7   Global Step: 43250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:54:12,536-Speed 3416.01 samples/sec   Loss 5.1832   LearningRate 0.0384   Epoch: 7   Global Step: 43260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:15,547-Speed 3401.26 samples/sec   Loss 5.3662   LearningRate 0.0384   Epoch: 7   Global Step: 43270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:18,560-Speed 3399.31 samples/sec   Loss 5.2218   LearningRate 0.0384   Epoch: 7   Global Step: 43280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:21,577-Speed 3395.19 samples/sec   Loss 5.3803   LearningRate 0.0384   Epoch: 7   Global Step: 43290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:24,594-Speed 3394.61 samples/sec   Loss 5.1700   LearningRate 0.0383   Epoch: 7   Global Step: 43300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:27,614-Speed 3392.46 samples/sec   Loss 5.2787   LearningRate 0.0383   Epoch: 7   Global Step: 43310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:30,629-Speed 3397.24 samples/sec   Loss 5.2091   LearningRate 0.0383   Epoch: 7   Global Step: 43320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:33,647-Speed 3393.16 samples/sec   Loss 5.1646   LearningRate 0.0383   Epoch: 7   Global Step: 43330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:36,672-Speed 3386.81 samples/sec   Loss 5.1972   LearningRate 0.0383   Epoch: 7   Global Step: 43340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:39,691-Speed 3392.60 samples/sec   Loss 5.2474   LearningRate 0.0383   Epoch: 7   Global Step: 43350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:42,705-Speed 3397.77 samples/sec   Loss 5.1451   LearningRate 0.0383   Epoch: 7   Global Step: 43360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:54:45,704-Speed 3415.73 samples/sec   Loss 5.1991   LearningRate 0.0383   Epoch: 7   Global Step: 43370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:48,735-Speed 3379.02 samples/sec   Loss 5.1814   LearningRate 0.0383   Epoch: 7   Global Step: 43380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:51,758-Speed 3387.81 samples/sec   Loss 5.2288   LearningRate 0.0382   Epoch: 7   Global Step: 43390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:54,783-Speed 3386.00 samples/sec   Loss 5.1599   LearningRate 0.0382   Epoch: 7   Global Step: 43400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:54:57,803-Speed 3391.40 samples/sec   Loss 5.2812   LearningRate 0.0382   Epoch: 7   Global Step: 43410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:00,834-Speed 3379.02 samples/sec   Loss 5.2537   LearningRate 0.0382   Epoch: 7   Global Step: 43420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:03,879-Speed 3363.51 samples/sec   Loss 5.2344   LearningRate 0.0382   Epoch: 7   Global Step: 43430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:06,896-Speed 3395.24 samples/sec   Loss 5.1352   LearningRate 0.0382   Epoch: 7   Global Step: 43440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:09,923-Speed 3383.70 samples/sec   Loss 5.2507   LearningRate 0.0382   Epoch: 7   Global Step: 43450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:12,998-Speed 3331.27 samples/sec   Loss 5.1207   LearningRate 0.0382   Epoch: 7   Global Step: 43460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:16,012-Speed 3398.01 samples/sec   Loss 5.1133   LearningRate 0.0382   Epoch: 7   Global Step: 43470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:55:19,016-Speed 3409.47 samples/sec   Loss 5.0983   LearningRate 0.0382   Epoch: 7   Global Step: 43480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:22,032-Speed 3396.02 samples/sec   Loss 5.1884   LearningRate 0.0381   Epoch: 7   Global Step: 43490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:25,068-Speed 3373.08 samples/sec   Loss 5.0631   LearningRate 0.0381   Epoch: 7   Global Step: 43500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:28,102-Speed 3375.93 samples/sec   Loss 5.3145   LearningRate 0.0381   Epoch: 7   Global Step: 43510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:31,132-Speed 3381.35 samples/sec   Loss 5.1278   LearningRate 0.0381   Epoch: 7   Global Step: 43520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:34,152-Speed 3390.86 samples/sec   Loss 5.0955   LearningRate 0.0381   Epoch: 7   Global Step: 43530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:37,179-Speed 3384.42 samples/sec   Loss 5.1985   LearningRate 0.0381   Epoch: 7   Global Step: 43540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:40,201-Speed 3389.01 samples/sec   Loss 5.2508   LearningRate 0.0381   Epoch: 7   Global Step: 43550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:43,228-Speed 3383.59 samples/sec   Loss 5.1944   LearningRate 0.0381   Epoch: 7   Global Step: 43560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:46,250-Speed 3389.54 samples/sec   Loss 5.2313   LearningRate 0.0381   Epoch: 7   Global Step: 43570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:55:49,291-Speed 3367.77 samples/sec   Loss 5.2341   LearningRate 0.0380   Epoch: 7   Global Step: 43580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:55:52,315-Speed 3387.29 samples/sec   Loss 5.0491   LearningRate 0.0380   Epoch: 7   Global Step: 43590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:55:55,340-Speed 3386.06 samples/sec   Loss 5.2910   LearningRate 0.0380   Epoch: 7   Global Step: 43600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:55:58,342-Speed 3411.32 samples/sec   Loss 5.1951   LearningRate 0.0380   Epoch: 7   Global Step: 43610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:01,364-Speed 3390.06 samples/sec   Loss 5.2752   LearningRate 0.0380   Epoch: 7   Global Step: 43620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:04,390-Speed 3384.32 samples/sec   Loss 5.1749   LearningRate 0.0380   Epoch: 7   Global Step: 43630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:07,412-Speed 3389.69 samples/sec   Loss 5.1061   LearningRate 0.0380   Epoch: 7   Global Step: 43640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:10,430-Speed 3393.50 samples/sec   Loss 5.1638   LearningRate 0.0380   Epoch: 7   Global Step: 43650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:13,454-Speed 3386.87 samples/sec   Loss 5.2046   LearningRate 0.0380   Epoch: 7   Global Step: 43660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:16,472-Speed 3393.49 samples/sec   Loss 5.1137   LearningRate 0.0379   Epoch: 7   Global Step: 43670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:19,493-Speed 3391.57 samples/sec   Loss 5.1575   LearningRate 0.0379   Epoch: 7   Global Step: 43680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:22,515-Speed 3388.57 samples/sec   Loss 5.1266   LearningRate 0.0379   Epoch: 7   Global Step: 43690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:25,585-Speed 3336.23 samples/sec   Loss 5.2980   LearningRate 0.0379   Epoch: 7   Global Step: 43700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:28,831-Speed 3155.94 samples/sec   Loss 5.1948   LearningRate 0.0379   Epoch: 7   Global Step: 43710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:56:31,836-Speed 3408.15 samples/sec   Loss 5.2958   LearningRate 0.0379   Epoch: 7   Global Step: 43720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:34,863-Speed 3383.76 samples/sec   Loss 5.1913   LearningRate 0.0379   Epoch: 7   Global Step: 43730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:37,883-Speed 3391.82 samples/sec   Loss 5.1602   LearningRate 0.0379   Epoch: 7   Global Step: 43740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:40,925-Speed 3367.37 samples/sec   Loss 5.1690   LearningRate 0.0379   Epoch: 7   Global Step: 43750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:43,951-Speed 3385.48 samples/sec   Loss 5.0702   LearningRate 0.0378   Epoch: 7   Global Step: 43760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:46,980-Speed 3380.60 samples/sec   Loss 5.2455   LearningRate 0.0378   Epoch: 7   Global Step: 43770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:50,017-Speed 3372.53 samples/sec   Loss 5.2350   LearningRate 0.0378   Epoch: 7   Global Step: 43780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:53,049-Speed 3378.97 samples/sec   Loss 5.1801   LearningRate 0.0378   Epoch: 7   Global Step: 43790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:56,070-Speed 3390.72 samples/sec   Loss 5.2637   LearningRate 0.0378   Epoch: 7   Global Step: 43800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:56:59,096-Speed 3384.58 samples/sec   Loss 5.1943   LearningRate 0.0378   Epoch: 7   Global Step: 43810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:57:02,125-Speed 3380.79 samples/sec   Loss 5.1500   LearningRate 0.0378   Epoch: 7   Global Step: 43820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:57:05,218-Speed 3312.02 samples/sec   Loss 5.1465   LearningRate 0.0378   Epoch: 7   Global Step: 43830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:57:08,238-Speed 3391.49 samples/sec   Loss 5.0586   LearningRate 0.0378   Epoch: 7   Global Step: 43840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:57:11,258-Speed 3391.65 samples/sec   Loss 5.1166   LearningRate 0.0377   Epoch: 7   Global Step: 43850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:57:14,263-Speed 3408.05 samples/sec   Loss 5.1890   LearningRate 0.0377   Epoch: 7   Global Step: 43860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:57:17,291-Speed 3382.90 samples/sec   Loss 5.2393   LearningRate 0.0377   Epoch: 7   Global Step: 43870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:57:20,313-Speed 3389.62 samples/sec   Loss 5.2218   LearningRate 0.0377   Epoch: 7   Global Step: 43880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:57:23,345-Speed 3378.05 samples/sec   Loss 5.1940   LearningRate 0.0377   Epoch: 7   Global Step: 43890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:57:26,400-Speed 3352.92 samples/sec   Loss 5.1100   LearningRate 0.0377   Epoch: 7   Global Step: 43900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:57:29,429-Speed 3380.76 samples/sec   Loss 5.1151   LearningRate 0.0377   Epoch: 7   Global Step: 43910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:57:32,459-Speed 3381.88 samples/sec   Loss 5.2035   LearningRate 0.0377   Epoch: 7   Global Step: 43920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:57:35,484-Speed 3386.17 samples/sec   Loss 5.1044   LearningRate 0.0377   Epoch: 7   Global Step: 43930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:57:38,505-Speed 3390.20 samples/sec   Loss 5.2978   LearningRate 0.0377   Epoch: 7   Global Step: 43940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:57:41,529-Speed 3387.08 samples/sec   Loss 5.1828   LearningRate 0.0376   Epoch: 7   Global Step: 43950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 05:57:44,555-Speed 3385.14 samples/sec   Loss 5.1867   LearningRate 0.0376   Epoch: 7   Global Step: 43960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:57:47,590-Speed 3374.77 samples/sec   Loss 5.2787   LearningRate 0.0376   Epoch: 7   Global Step: 43970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:57:50,616-Speed 3385.54 samples/sec   Loss 5.1413   LearningRate 0.0376   Epoch: 7   Global Step: 43980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:57:53,640-Speed 3386.89 samples/sec   Loss 5.2026   LearningRate 0.0376   Epoch: 7   Global Step: 43990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:57:56,663-Speed 3388.66 samples/sec   Loss 5.1870   LearningRate 0.0376   Epoch: 7   Global Step: 44000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 05:58:40,391-[lfw][44000]XNorm: 21.992907
Training: 2022-04-27 05:58:40,391-[lfw][44000]Accuracy-Flip: 0.99733+-0.00260
Training: 2022-04-27 05:58:40,392-[lfw][44000]Accuracy-Highest: 0.99817
Training: 2022-04-27 05:59:30,615-[cfp_fp][44000]XNorm: 19.642103
Training: 2022-04-27 05:59:30,615-[cfp_fp][44000]Accuracy-Flip: 0.95957+-0.00878
Training: 2022-04-27 05:59:30,616-[cfp_fp][44000]Accuracy-Highest: 0.96057
Training: 2022-04-27 06:00:14,008-[agedb_30][44000]XNorm: 21.939295
Training: 2022-04-27 06:00:14,009-[agedb_30][44000]Accuracy-Flip: 0.97600+-0.00797
Training: 2022-04-27 06:00:14,009-[agedb_30][44000]Accuracy-Highest: 0.97767
Training: 2022-04-27 06:00:17,036-Speed 72.95 samples/sec   Loss 5.2943   LearningRate 0.0376   Epoch: 7   Global Step: 44010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:00:20,047-Speed 3401.34 samples/sec   Loss 5.1001   LearningRate 0.0376   Epoch: 7   Global Step: 44020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:00:23,060-Speed 3399.93 samples/sec   Loss 5.1796   LearningRate 0.0376   Epoch: 7   Global Step: 44030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:00:26,072-Speed 3400.36 samples/sec   Loss 5.1729   LearningRate 0.0375   Epoch: 7   Global Step: 44040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:00:29,081-Speed 3404.12 samples/sec   Loss 5.2761   LearningRate 0.0375   Epoch: 7   Global Step: 44050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:00:32,057-Speed 3441.59 samples/sec   Loss 5.2462   LearningRate 0.0375   Epoch: 7   Global Step: 44060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:00:35,065-Speed 3405.26 samples/sec   Loss 5.1321   LearningRate 0.0375   Epoch: 7   Global Step: 44070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:00:38,085-Speed 3391.81 samples/sec   Loss 5.1142   LearningRate 0.0375   Epoch: 7   Global Step: 44080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:00:41,110-Speed 3385.60 samples/sec   Loss 5.2094   LearningRate 0.0375   Epoch: 7   Global Step: 44090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:00:44,129-Speed 3392.79 samples/sec   Loss 5.1286   LearningRate 0.0375   Epoch: 7   Global Step: 44100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:00:47,147-Speed 3394.02 samples/sec   Loss 5.1344   LearningRate 0.0375   Epoch: 7   Global Step: 44110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:00:50,164-Speed 3394.44 samples/sec   Loss 5.1609   LearningRate 0.0375   Epoch: 7   Global Step: 44120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:00:53,183-Speed 3393.14 samples/sec   Loss 5.3268   LearningRate 0.0374   Epoch: 7   Global Step: 44130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:00:56,191-Speed 3405.13 samples/sec   Loss 5.1899   LearningRate 0.0374   Epoch: 7   Global Step: 44140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:00:59,208-Speed 3395.08 samples/sec   Loss 5.1545   LearningRate 0.0374   Epoch: 7   Global Step: 44150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:01:02,228-Speed 3390.67 samples/sec   Loss 5.1715   LearningRate 0.0374   Epoch: 7   Global Step: 44160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:01:05,225-Speed 3417.76 samples/sec   Loss 5.1624   LearningRate 0.0374   Epoch: 7   Global Step: 44170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:01:08,238-Speed 3400.26 samples/sec   Loss 5.2789   LearningRate 0.0374   Epoch: 7   Global Step: 44180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:01:11,258-Speed 3391.52 samples/sec   Loss 5.0771   LearningRate 0.0374   Epoch: 7   Global Step: 44190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:01:14,271-Speed 3399.48 samples/sec   Loss 5.1544   LearningRate 0.0374   Epoch: 7   Global Step: 44200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:01:17,288-Speed 3394.71 samples/sec   Loss 5.1295   LearningRate 0.0374   Epoch: 7   Global Step: 44210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:01:20,306-Speed 3393.23 samples/sec   Loss 5.0400   LearningRate 0.0374   Epoch: 7   Global Step: 44220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:01:23,327-Speed 3390.28 samples/sec   Loss 5.1682   LearningRate 0.0373   Epoch: 7   Global Step: 44230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:01:26,343-Speed 3396.43 samples/sec   Loss 5.0599   LearningRate 0.0373   Epoch: 7   Global Step: 44240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:01:29,354-Speed 3401.41 samples/sec   Loss 5.1046   LearningRate 0.0373   Epoch: 7   Global Step: 44250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:01:32,367-Speed 3399.65 samples/sec   Loss 5.0439   LearningRate 0.0373   Epoch: 7   Global Step: 44260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:01:35,358-Speed 3424.27 samples/sec   Loss 5.1357   LearningRate 0.0373   Epoch: 7   Global Step: 44270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:01:38,391-Speed 3377.10 samples/sec   Loss 5.2062   LearningRate 0.0373   Epoch: 7   Global Step: 44280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:01:41,417-Speed 3384.92 samples/sec   Loss 5.2435   LearningRate 0.0373   Epoch: 7   Global Step: 44290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:01:44,433-Speed 3397.46 samples/sec   Loss 5.1052   LearningRate 0.0373   Epoch: 7   Global Step: 44300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:01:47,486-Speed 3354.62 samples/sec   Loss 5.3187   LearningRate 0.0373   Epoch: 7   Global Step: 44310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:01:50,515-Speed 3381.09 samples/sec   Loss 5.2257   LearningRate 0.0372   Epoch: 7   Global Step: 44320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:01:53,527-Speed 3399.90 samples/sec   Loss 5.0862   LearningRate 0.0372   Epoch: 7   Global Step: 44330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:01:56,542-Speed 3397.38 samples/sec   Loss 5.1461   LearningRate 0.0372   Epoch: 7   Global Step: 44340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:01:59,553-Speed 3401.43 samples/sec   Loss 5.2677   LearningRate 0.0372   Epoch: 7   Global Step: 44350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:02:02,572-Speed 3392.65 samples/sec   Loss 5.1379   LearningRate 0.0372   Epoch: 7   Global Step: 44360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:02:05,590-Speed 3394.03 samples/sec   Loss 5.0131   LearningRate 0.0372   Epoch: 7   Global Step: 44370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:02:08,610-Speed 3391.49 samples/sec   Loss 5.1827   LearningRate 0.0372   Epoch: 7   Global Step: 44380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:02:11,628-Speed 3394.12 samples/sec   Loss 5.1461   LearningRate 0.0372   Epoch: 7   Global Step: 44390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:02:14,649-Speed 3390.81 samples/sec   Loss 5.0919   LearningRate 0.0372   Epoch: 7   Global Step: 44400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:02:17,742-Speed 3311.03 samples/sec   Loss 5.1413   LearningRate 0.0371   Epoch: 7   Global Step: 44410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:02:20,758-Speed 3396.63 samples/sec   Loss 5.1569   LearningRate 0.0371   Epoch: 7   Global Step: 44420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:02:23,777-Speed 3391.65 samples/sec   Loss 5.1446   LearningRate 0.0371   Epoch: 7   Global Step: 44430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:02:26,797-Speed 3391.79 samples/sec   Loss 5.1139   LearningRate 0.0371   Epoch: 7   Global Step: 44440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:02:29,819-Speed 3389.54 samples/sec   Loss 5.1929   LearningRate 0.0371   Epoch: 7   Global Step: 44450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:02:32,838-Speed 3392.33 samples/sec   Loss 5.1333   LearningRate 0.0371   Epoch: 7   Global Step: 44460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:02:35,860-Speed 3389.19 samples/sec   Loss 5.0637   LearningRate 0.0371   Epoch: 7   Global Step: 44470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:02:38,886-Speed 3385.99 samples/sec   Loss 5.1160   LearningRate 0.0371   Epoch: 7   Global Step: 44480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:02:41,904-Speed 3393.28 samples/sec   Loss 5.1803   LearningRate 0.0371   Epoch: 7   Global Step: 44490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:02:44,922-Speed 3394.25 samples/sec   Loss 5.1473   LearningRate 0.0371   Epoch: 7   Global Step: 44500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:02:47,934-Speed 3399.82 samples/sec   Loss 5.1951   LearningRate 0.0370   Epoch: 7   Global Step: 44510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:02:50,955-Speed 3390.47 samples/sec   Loss 5.1931   LearningRate 0.0370   Epoch: 7   Global Step: 44520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:02:53,967-Speed 3400.44 samples/sec   Loss 5.2572   LearningRate 0.0370   Epoch: 7   Global Step: 44530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:02:56,984-Speed 3394.76 samples/sec   Loss 5.0949   LearningRate 0.0370   Epoch: 7   Global Step: 44540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:03:00,002-Speed 3394.79 samples/sec   Loss 5.1713   LearningRate 0.0370   Epoch: 7   Global Step: 44550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:03:03,039-Speed 3372.68 samples/sec   Loss 5.1500   LearningRate 0.0370   Epoch: 7   Global Step: 44560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:03:06,038-Speed 3414.58 samples/sec   Loss 5.1091   LearningRate 0.0370   Epoch: 7   Global Step: 44570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:03:09,034-Speed 3419.17 samples/sec   Loss 5.2268   LearningRate 0.0370   Epoch: 7   Global Step: 44580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:03:12,051-Speed 3394.61 samples/sec   Loss 5.3248   LearningRate 0.0370   Epoch: 7   Global Step: 44590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:03:15,066-Speed 3397.11 samples/sec   Loss 5.1258   LearningRate 0.0369   Epoch: 7   Global Step: 44600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:03:18,090-Speed 3387.35 samples/sec   Loss 5.1012   LearningRate 0.0369   Epoch: 7   Global Step: 44610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:03:21,103-Speed 3398.63 samples/sec   Loss 5.0809   LearningRate 0.0369   Epoch: 7   Global Step: 44620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:03:24,118-Speed 3397.47 samples/sec   Loss 5.1867   LearningRate 0.0369   Epoch: 7   Global Step: 44630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:03:27,245-Speed 3275.19 samples/sec   Loss 5.0877   LearningRate 0.0369   Epoch: 7   Global Step: 44640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:03:30,278-Speed 3377.90 samples/sec   Loss 5.0171   LearningRate 0.0369   Epoch: 7   Global Step: 44650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:03:33,296-Speed 3393.70 samples/sec   Loss 5.1001   LearningRate 0.0369   Epoch: 7   Global Step: 44660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:03:36,315-Speed 3391.99 samples/sec   Loss 5.2856   LearningRate 0.0369   Epoch: 7   Global Step: 44670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:03:39,339-Speed 3387.30 samples/sec   Loss 4.9894   LearningRate 0.0369   Epoch: 7   Global Step: 44680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:03:42,356-Speed 3395.52 samples/sec   Loss 5.2187   LearningRate 0.0368   Epoch: 7   Global Step: 44690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:03:45,370-Speed 3397.33 samples/sec   Loss 5.1631   LearningRate 0.0368   Epoch: 7   Global Step: 44700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:03:48,399-Speed 3381.93 samples/sec   Loss 5.1768   LearningRate 0.0368   Epoch: 7   Global Step: 44710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:03:51,412-Speed 3399.06 samples/sec   Loss 5.1500   LearningRate 0.0368   Epoch: 7   Global Step: 44720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:03:54,432-Speed 3391.59 samples/sec   Loss 5.1149   LearningRate 0.0368   Epoch: 7   Global Step: 44730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:03:57,473-Speed 3368.54 samples/sec   Loss 5.1409   LearningRate 0.0368   Epoch: 7   Global Step: 44740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:04:00,487-Speed 3398.74 samples/sec   Loss 5.1379   LearningRate 0.0368   Epoch: 7   Global Step: 44750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:04:03,487-Speed 3413.80 samples/sec   Loss 5.0979   LearningRate 0.0368   Epoch: 7   Global Step: 44760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:04:06,483-Speed 3418.90 samples/sec   Loss 5.1158   LearningRate 0.0368   Epoch: 7   Global Step: 44770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:04:09,496-Speed 3399.31 samples/sec   Loss 5.2472   LearningRate 0.0368   Epoch: 7   Global Step: 44780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:04:12,515-Speed 3392.66 samples/sec   Loss 5.1892   LearningRate 0.0367   Epoch: 7   Global Step: 44790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:04:15,529-Speed 3397.74 samples/sec   Loss 5.0913   LearningRate 0.0367   Epoch: 7   Global Step: 44800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:04:18,545-Speed 3396.78 samples/sec   Loss 5.0259   LearningRate 0.0367   Epoch: 7   Global Step: 44810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:04:21,580-Speed 3374.31 samples/sec   Loss 5.1515   LearningRate 0.0367   Epoch: 7   Global Step: 44820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:04:24,611-Speed 3378.99 samples/sec   Loss 5.1160   LearningRate 0.0367   Epoch: 7   Global Step: 44830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:04:27,633-Speed 3389.50 samples/sec   Loss 5.1227   LearningRate 0.0367   Epoch: 7   Global Step: 44840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:04:30,655-Speed 3389.45 samples/sec   Loss 5.1131   LearningRate 0.0367   Epoch: 7   Global Step: 44850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:04:33,679-Speed 3387.09 samples/sec   Loss 5.1573   LearningRate 0.0367   Epoch: 7   Global Step: 44860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:04:36,699-Speed 3391.23 samples/sec   Loss 5.1525   LearningRate 0.0367   Epoch: 7   Global Step: 44870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:04:39,731-Speed 3378.82 samples/sec   Loss 5.0834   LearningRate 0.0366   Epoch: 7   Global Step: 44880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:04:42,755-Speed 3386.11 samples/sec   Loss 5.0935   LearningRate 0.0366   Epoch: 7   Global Step: 44890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:04:45,773-Speed 3394.78 samples/sec   Loss 5.1032   LearningRate 0.0366   Epoch: 7   Global Step: 44900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:04:48,795-Speed 3388.87 samples/sec   Loss 5.1401   LearningRate 0.0366   Epoch: 7   Global Step: 44910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:04:51,811-Speed 3395.72 samples/sec   Loss 5.1022   LearningRate 0.0366   Epoch: 7   Global Step: 44920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:04:54,829-Speed 3394.49 samples/sec   Loss 5.0420   LearningRate 0.0366   Epoch: 7   Global Step: 44930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:04:57,850-Speed 3389.88 samples/sec   Loss 5.1015   LearningRate 0.0366   Epoch: 7   Global Step: 44940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:05:00,868-Speed 3394.00 samples/sec   Loss 5.0721   LearningRate 0.0366   Epoch: 7   Global Step: 44950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:05:03,890-Speed 3388.82 samples/sec   Loss 5.1045   LearningRate 0.0366   Epoch: 7   Global Step: 44960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:05:06,921-Speed 3379.93 samples/sec   Loss 5.0340   LearningRate 0.0365   Epoch: 7   Global Step: 44970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:05:09,919-Speed 3416.52 samples/sec   Loss 5.1233   LearningRate 0.0365   Epoch: 7   Global Step: 44980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:05:12,945-Speed 3384.55 samples/sec   Loss 5.1434   LearningRate 0.0365   Epoch: 7   Global Step: 44990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:05:15,967-Speed 3388.74 samples/sec   Loss 5.1459   LearningRate 0.0365   Epoch: 7   Global Step: 45000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:05:18,984-Speed 3395.53 samples/sec   Loss 5.0206   LearningRate 0.0365   Epoch: 7   Global Step: 45010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:05:22,008-Speed 3386.51 samples/sec   Loss 4.9999   LearningRate 0.0365   Epoch: 7   Global Step: 45020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:05:25,010-Speed 3411.66 samples/sec   Loss 5.0458   LearningRate 0.0365   Epoch: 7   Global Step: 45030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:05:28,038-Speed 3383.32 samples/sec   Loss 5.0015   LearningRate 0.0365   Epoch: 7   Global Step: 45040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:05:31,054-Speed 3395.47 samples/sec   Loss 5.1155   LearningRate 0.0365   Epoch: 7   Global Step: 45050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:05:34,071-Speed 3395.12 samples/sec   Loss 5.1101   LearningRate 0.0365   Epoch: 7   Global Step: 45060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:05:37,087-Speed 3396.27 samples/sec   Loss 5.1242   LearningRate 0.0364   Epoch: 7   Global Step: 45070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:05:40,109-Speed 3389.20 samples/sec   Loss 5.3144   LearningRate 0.0364   Epoch: 7   Global Step: 45080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:05:43,139-Speed 3380.16 samples/sec   Loss 5.0760   LearningRate 0.0364   Epoch: 7   Global Step: 45090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:05:46,155-Speed 3396.56 samples/sec   Loss 5.0076   LearningRate 0.0364   Epoch: 7   Global Step: 45100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:05:49,182-Speed 3383.88 samples/sec   Loss 5.0280   LearningRate 0.0364   Epoch: 7   Global Step: 45110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:05:52,205-Speed 3388.21 samples/sec   Loss 5.1835   LearningRate 0.0364   Epoch: 7   Global Step: 45120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:05:55,221-Speed 3395.93 samples/sec   Loss 5.1563   LearningRate 0.0364   Epoch: 7   Global Step: 45130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:05:58,240-Speed 3392.19 samples/sec   Loss 5.0587   LearningRate 0.0364   Epoch: 7   Global Step: 45140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:01,256-Speed 3395.65 samples/sec   Loss 4.9998   LearningRate 0.0364   Epoch: 7   Global Step: 45150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:04,280-Speed 3388.15 samples/sec   Loss 5.1000   LearningRate 0.0363   Epoch: 7   Global Step: 45160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:07,299-Speed 3392.12 samples/sec   Loss 5.0428   LearningRate 0.0363   Epoch: 7   Global Step: 45170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:10,326-Speed 3384.17 samples/sec   Loss 5.0318   LearningRate 0.0363   Epoch: 7   Global Step: 45180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:13,346-Speed 3391.33 samples/sec   Loss 5.1522   LearningRate 0.0363   Epoch: 7   Global Step: 45190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:16,363-Speed 3395.34 samples/sec   Loss 5.1943   LearningRate 0.0363   Epoch: 7   Global Step: 45200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:19,383-Speed 3391.39 samples/sec   Loss 5.0892   LearningRate 0.0363   Epoch: 7   Global Step: 45210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:22,403-Speed 3391.40 samples/sec   Loss 5.1568   LearningRate 0.0363   Epoch: 7   Global Step: 45220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:25,426-Speed 3387.92 samples/sec   Loss 5.1201   LearningRate 0.0363   Epoch: 7   Global Step: 45230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:06:28,444-Speed 3393.58 samples/sec   Loss 5.0337   LearningRate 0.0363   Epoch: 7   Global Step: 45240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:06:31,450-Speed 3407.91 samples/sec   Loss 5.1607   LearningRate 0.0363   Epoch: 7   Global Step: 45250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:34,464-Speed 3397.75 samples/sec   Loss 4.9249   LearningRate 0.0362   Epoch: 7   Global Step: 45260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:37,489-Speed 3385.89 samples/sec   Loss 5.0041   LearningRate 0.0362   Epoch: 7   Global Step: 45270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:40,513-Speed 3388.15 samples/sec   Loss 5.2072   LearningRate 0.0362   Epoch: 7   Global Step: 45280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:43,539-Speed 3385.31 samples/sec   Loss 4.8852   LearningRate 0.0362   Epoch: 7   Global Step: 45290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:46,561-Speed 3388.93 samples/sec   Loss 5.1858   LearningRate 0.0362   Epoch: 7   Global Step: 45300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:49,582-Speed 3391.41 samples/sec   Loss 4.9577   LearningRate 0.0362   Epoch: 7   Global Step: 45310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:52,605-Speed 3387.59 samples/sec   Loss 5.1564   LearningRate 0.0362   Epoch: 7   Global Step: 45320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:55,630-Speed 3386.08 samples/sec   Loss 5.0257   LearningRate 0.0362   Epoch: 7   Global Step: 45330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:06:58,656-Speed 3385.29 samples/sec   Loss 5.2622   LearningRate 0.0362   Epoch: 7   Global Step: 45340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:07:01,678-Speed 3388.85 samples/sec   Loss 5.0674   LearningRate 0.0361   Epoch: 7   Global Step: 45350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:07:04,726-Speed 3360.44 samples/sec   Loss 5.0424   LearningRate 0.0361   Epoch: 7   Global Step: 45360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:07:07,749-Speed 3388.18 samples/sec   Loss 5.0985   LearningRate 0.0361   Epoch: 7   Global Step: 45370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:07:10,768-Speed 3392.76 samples/sec   Loss 5.0484   LearningRate 0.0361   Epoch: 7   Global Step: 45380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:07:13,791-Speed 3388.28 samples/sec   Loss 4.9343   LearningRate 0.0361   Epoch: 7   Global Step: 45390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:07:16,813-Speed 3389.93 samples/sec   Loss 5.0107   LearningRate 0.0361   Epoch: 7   Global Step: 45400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:07:19,815-Speed 3410.97 samples/sec   Loss 5.0910   LearningRate 0.0361   Epoch: 7   Global Step: 45410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:07:22,843-Speed 3382.45 samples/sec   Loss 5.0725   LearningRate 0.0361   Epoch: 7   Global Step: 45420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:07:25,871-Speed 3383.04 samples/sec   Loss 4.9279   LearningRate 0.0361   Epoch: 7   Global Step: 45430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:07:28,891-Speed 3391.24 samples/sec   Loss 4.9904   LearningRate 0.0361   Epoch: 7   Global Step: 45440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:07:31,921-Speed 3380.49 samples/sec   Loss 5.1112   LearningRate 0.0360   Epoch: 7   Global Step: 45450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:07:34,949-Speed 3383.14 samples/sec   Loss 5.0433   LearningRate 0.0360   Epoch: 7   Global Step: 45460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:07:37,972-Speed 3388.30 samples/sec   Loss 5.1433   LearningRate 0.0360   Epoch: 7   Global Step: 45470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:07:41,092-Speed 3282.22 samples/sec   Loss 5.1529   LearningRate 0.0360   Epoch: 7   Global Step: 45480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:07:54,287-Speed 776.11 samples/sec   Loss 5.0772   LearningRate 0.0360   Epoch: 8   Global Step: 45490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:07:57,316-Speed 3381.84 samples/sec   Loss 4.4966   LearningRate 0.0360   Epoch: 8   Global Step: 45500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:08:00,377-Speed 3346.44 samples/sec   Loss 4.4197   LearningRate 0.0360   Epoch: 8   Global Step: 45510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:08:03,415-Speed 3372.00 samples/sec   Loss 4.4040   LearningRate 0.0360   Epoch: 8   Global Step: 45520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:08:06,440-Speed 3385.15 samples/sec   Loss 4.5389   LearningRate 0.0360   Epoch: 8   Global Step: 45530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:08:09,479-Speed 3370.23 samples/sec   Loss 4.3896   LearningRate 0.0359   Epoch: 8   Global Step: 45540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:08:12,504-Speed 3387.02 samples/sec   Loss 4.4930   LearningRate 0.0359   Epoch: 8   Global Step: 45550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:08:15,524-Speed 3390.99 samples/sec   Loss 4.6510   LearningRate 0.0359   Epoch: 8   Global Step: 45560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:08:18,543-Speed 3392.93 samples/sec   Loss 4.4506   LearningRate 0.0359   Epoch: 8   Global Step: 45570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:08:21,562-Speed 3392.38 samples/sec   Loss 4.5857   LearningRate 0.0359   Epoch: 8   Global Step: 45580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:08:24,583-Speed 3390.89 samples/sec   Loss 4.5107   LearningRate 0.0359   Epoch: 8   Global Step: 45590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:08:27,604-Speed 3390.49 samples/sec   Loss 4.5890   LearningRate 0.0359   Epoch: 8   Global Step: 45600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:08:30,628-Speed 3387.18 samples/sec   Loss 4.5959   LearningRate 0.0359   Epoch: 8   Global Step: 45610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:08:33,648-Speed 3390.72 samples/sec   Loss 4.5256   LearningRate 0.0359   Epoch: 8   Global Step: 45620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:08:36,708-Speed 3347.54 samples/sec   Loss 4.5092   LearningRate 0.0359   Epoch: 8   Global Step: 45630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:08:39,732-Speed 3386.63 samples/sec   Loss 4.5112   LearningRate 0.0358   Epoch: 8   Global Step: 45640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:08:42,768-Speed 3374.68 samples/sec   Loss 4.5772   LearningRate 0.0358   Epoch: 8   Global Step: 45650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:08:45,808-Speed 3368.74 samples/sec   Loss 4.5435   LearningRate 0.0358   Epoch: 8   Global Step: 45660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:08:48,833-Speed 3385.78 samples/sec   Loss 4.5448   LearningRate 0.0358   Epoch: 8   Global Step: 45670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:08:51,858-Speed 3386.21 samples/sec   Loss 4.5815   LearningRate 0.0358   Epoch: 8   Global Step: 45680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:08:54,865-Speed 3406.35 samples/sec   Loss 4.5909   LearningRate 0.0358   Epoch: 8   Global Step: 45690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:08:57,905-Speed 3368.58 samples/sec   Loss 4.6412   LearningRate 0.0358   Epoch: 8   Global Step: 45700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:00,944-Speed 3370.70 samples/sec   Loss 4.6180   LearningRate 0.0358   Epoch: 8   Global Step: 45710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:03,977-Speed 3377.63 samples/sec   Loss 4.6817   LearningRate 0.0358   Epoch: 8   Global Step: 45720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:07,030-Speed 3354.87 samples/sec   Loss 4.5726   LearningRate 0.0357   Epoch: 8   Global Step: 45730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:10,066-Speed 3373.20 samples/sec   Loss 4.6022   LearningRate 0.0357   Epoch: 8   Global Step: 45740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:13,133-Speed 3339.50 samples/sec   Loss 4.5230   LearningRate 0.0357   Epoch: 8   Global Step: 45750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:16,165-Speed 3378.73 samples/sec   Loss 4.4701   LearningRate 0.0357   Epoch: 8   Global Step: 45760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:19,193-Speed 3382.70 samples/sec   Loss 4.5775   LearningRate 0.0357   Epoch: 8   Global Step: 45770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:22,223-Speed 3379.16 samples/sec   Loss 4.6327   LearningRate 0.0357   Epoch: 8   Global Step: 45780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:25,272-Speed 3360.13 samples/sec   Loss 4.6381   LearningRate 0.0357   Epoch: 8   Global Step: 45790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:09:28,359-Speed 3317.13 samples/sec   Loss 4.6373   LearningRate 0.0357   Epoch: 8   Global Step: 45800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:09:31,391-Speed 3379.17 samples/sec   Loss 4.6187   LearningRate 0.0357   Epoch: 8   Global Step: 45810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:09:34,398-Speed 3406.30 samples/sec   Loss 4.4914   LearningRate 0.0357   Epoch: 8   Global Step: 45820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:37,446-Speed 3360.24 samples/sec   Loss 4.7367   LearningRate 0.0356   Epoch: 8   Global Step: 45830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:40,481-Speed 3374.40 samples/sec   Loss 4.6371   LearningRate 0.0356   Epoch: 8   Global Step: 45840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:43,509-Speed 3382.22 samples/sec   Loss 4.8325   LearningRate 0.0356   Epoch: 8   Global Step: 45850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:46,543-Speed 3376.31 samples/sec   Loss 4.6312   LearningRate 0.0356   Epoch: 8   Global Step: 45860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:49,575-Speed 3378.30 samples/sec   Loss 4.6023   LearningRate 0.0356   Epoch: 8   Global Step: 45870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:52,609-Speed 3375.66 samples/sec   Loss 4.7779   LearningRate 0.0356   Epoch: 8   Global Step: 45880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:55,644-Speed 3374.54 samples/sec   Loss 4.6945   LearningRate 0.0356   Epoch: 8   Global Step: 45890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:09:58,673-Speed 3381.80 samples/sec   Loss 4.6404   LearningRate 0.0356   Epoch: 8   Global Step: 45900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:10:01,707-Speed 3376.37 samples/sec   Loss 4.6914   LearningRate 0.0356   Epoch: 8   Global Step: 45910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:10:04,736-Speed 3381.53 samples/sec   Loss 4.7571   LearningRate 0.0355   Epoch: 8   Global Step: 45920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:10:07,759-Speed 3387.94 samples/sec   Loss 4.6747   LearningRate 0.0355   Epoch: 8   Global Step: 45930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:10:10,781-Speed 3389.19 samples/sec   Loss 4.7402   LearningRate 0.0355   Epoch: 8   Global Step: 45940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:10:13,819-Speed 3371.48 samples/sec   Loss 4.6161   LearningRate 0.0355   Epoch: 8   Global Step: 45950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:10:16,873-Speed 3353.62 samples/sec   Loss 4.7663   LearningRate 0.0355   Epoch: 8   Global Step: 45960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:10:19,911-Speed 3370.68 samples/sec   Loss 4.7509   LearningRate 0.0355   Epoch: 8   Global Step: 45970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:10:22,939-Speed 3382.78 samples/sec   Loss 4.6178   LearningRate 0.0355   Epoch: 8   Global Step: 45980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:10:25,967-Speed 3383.21 samples/sec   Loss 4.8002   LearningRate 0.0355   Epoch: 8   Global Step: 45990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:10:28,994-Speed 3383.80 samples/sec   Loss 4.6673   LearningRate 0.0355   Epoch: 8   Global Step: 46000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:11:12,874-[lfw][46000]XNorm: 23.118855
Training: 2022-04-27 06:11:12,875-[lfw][46000]Accuracy-Flip: 0.99750+-0.00281
Training: 2022-04-27 06:11:12,875-[lfw][46000]Accuracy-Highest: 0.99817
Training: 2022-04-27 06:12:03,785-[cfp_fp][46000]XNorm: 20.791686
Training: 2022-04-27 06:12:03,786-[cfp_fp][46000]Accuracy-Flip: 0.95686+-0.01188
Training: 2022-04-27 06:12:03,786-[cfp_fp][46000]Accuracy-Highest: 0.96057
Training: 2022-04-27 06:12:47,698-[agedb_30][46000]XNorm: 22.902561
Training: 2022-04-27 06:12:47,699-[agedb_30][46000]Accuracy-Flip: 0.97433+-0.00700
Training: 2022-04-27 06:12:47,699-[agedb_30][46000]Accuracy-Highest: 0.97767
Training: 2022-04-27 06:12:50,734-Speed 72.25 samples/sec   Loss 4.7250   LearningRate 0.0355   Epoch: 8   Global Step: 46010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:12:53,741-Speed 3406.18 samples/sec   Loss 4.7951   LearningRate 0.0354   Epoch: 8   Global Step: 46020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:12:56,749-Speed 3405.51 samples/sec   Loss 4.5494   LearningRate 0.0354   Epoch: 8   Global Step: 46030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:12:59,760-Speed 3401.78 samples/sec   Loss 4.6590   LearningRate 0.0354   Epoch: 8   Global Step: 46040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:13:02,777-Speed 3394.70 samples/sec   Loss 4.7455   LearningRate 0.0354   Epoch: 8   Global Step: 46050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:13:05,795-Speed 3393.07 samples/sec   Loss 4.6984   LearningRate 0.0354   Epoch: 8   Global Step: 46060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:13:08,824-Speed 3381.81 samples/sec   Loss 4.7139   LearningRate 0.0354   Epoch: 8   Global Step: 46070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:13:11,846-Speed 3389.00 samples/sec   Loss 4.7503   LearningRate 0.0354   Epoch: 8   Global Step: 46080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:13:14,876-Speed 3379.81 samples/sec   Loss 4.6169   LearningRate 0.0354   Epoch: 8   Global Step: 46090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:13:17,898-Speed 3390.53 samples/sec   Loss 4.7400   LearningRate 0.0354   Epoch: 8   Global Step: 46100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:13:20,914-Speed 3395.94 samples/sec   Loss 4.7357   LearningRate 0.0353   Epoch: 8   Global Step: 46110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:13:23,936-Speed 3389.47 samples/sec   Loss 4.7998   LearningRate 0.0353   Epoch: 8   Global Step: 46120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:13:26,954-Speed 3393.52 samples/sec   Loss 4.5651   LearningRate 0.0353   Epoch: 8   Global Step: 46130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:13:29,971-Speed 3394.97 samples/sec   Loss 4.7330   LearningRate 0.0353   Epoch: 8   Global Step: 46140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:13:32,992-Speed 3390.15 samples/sec   Loss 4.7007   LearningRate 0.0353   Epoch: 8   Global Step: 46150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:13:36,026-Speed 3375.78 samples/sec   Loss 4.8386   LearningRate 0.0353   Epoch: 8   Global Step: 46160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:13:39,055-Speed 3381.11 samples/sec   Loss 4.8025   LearningRate 0.0353   Epoch: 8   Global Step: 46170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:13:42,076-Speed 3390.40 samples/sec   Loss 4.5824   LearningRate 0.0353   Epoch: 8   Global Step: 46180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:13:45,092-Speed 3396.42 samples/sec   Loss 4.6176   LearningRate 0.0353   Epoch: 8   Global Step: 46190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:13:48,114-Speed 3389.91 samples/sec   Loss 4.7136   LearningRate 0.0353   Epoch: 8   Global Step: 46200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:13:51,183-Speed 3337.01 samples/sec   Loss 4.7062   LearningRate 0.0352   Epoch: 8   Global Step: 46210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:13:54,198-Speed 3396.57 samples/sec   Loss 4.8154   LearningRate 0.0352   Epoch: 8   Global Step: 46220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:13:57,216-Speed 3393.79 samples/sec   Loss 4.7322   LearningRate 0.0352   Epoch: 8   Global Step: 46230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:14:00,231-Speed 3397.34 samples/sec   Loss 4.7865   LearningRate 0.0352   Epoch: 8   Global Step: 46240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:14:03,246-Speed 3396.88 samples/sec   Loss 4.7873   LearningRate 0.0352   Epoch: 8   Global Step: 46250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:14:06,244-Speed 3416.54 samples/sec   Loss 4.8397   LearningRate 0.0352   Epoch: 8   Global Step: 46260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:09,256-Speed 3400.14 samples/sec   Loss 4.6468   LearningRate 0.0352   Epoch: 8   Global Step: 46270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:12,322-Speed 3340.97 samples/sec   Loss 4.7559   LearningRate 0.0352   Epoch: 8   Global Step: 46280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:15,485-Speed 3238.97 samples/sec   Loss 4.7137   LearningRate 0.0352   Epoch: 8   Global Step: 46290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:18,526-Speed 3367.97 samples/sec   Loss 4.8398   LearningRate 0.0351   Epoch: 8   Global Step: 46300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:21,539-Speed 3399.35 samples/sec   Loss 4.8855   LearningRate 0.0351   Epoch: 8   Global Step: 46310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:24,553-Speed 3397.84 samples/sec   Loss 4.7841   LearningRate 0.0351   Epoch: 8   Global Step: 46320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:27,567-Speed 3397.92 samples/sec   Loss 4.6847   LearningRate 0.0351   Epoch: 8   Global Step: 46330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:30,587-Speed 3391.47 samples/sec   Loss 4.7929   LearningRate 0.0351   Epoch: 8   Global Step: 46340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:33,600-Speed 3399.41 samples/sec   Loss 4.7822   LearningRate 0.0351   Epoch: 8   Global Step: 46350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:36,598-Speed 3416.18 samples/sec   Loss 4.8397   LearningRate 0.0351   Epoch: 8   Global Step: 46360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:39,620-Speed 3390.43 samples/sec   Loss 4.6641   LearningRate 0.0351   Epoch: 8   Global Step: 46370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:42,635-Speed 3396.62 samples/sec   Loss 4.8398   LearningRate 0.0351   Epoch: 8   Global Step: 46380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:45,652-Speed 3394.86 samples/sec   Loss 4.7205   LearningRate 0.0351   Epoch: 8   Global Step: 46390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:48,671-Speed 3393.26 samples/sec   Loss 4.7172   LearningRate 0.0350   Epoch: 8   Global Step: 46400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:51,685-Speed 3398.01 samples/sec   Loss 4.7686   LearningRate 0.0350   Epoch: 8   Global Step: 46410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:54,714-Speed 3381.51 samples/sec   Loss 4.8397   LearningRate 0.0350   Epoch: 8   Global Step: 46420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:14:57,728-Speed 3397.55 samples/sec   Loss 4.8008   LearningRate 0.0350   Epoch: 8   Global Step: 46430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:15:00,755-Speed 3385.17 samples/sec   Loss 4.8169   LearningRate 0.0350   Epoch: 8   Global Step: 46440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:15:03,775-Speed 3392.01 samples/sec   Loss 4.7772   LearningRate 0.0350   Epoch: 8   Global Step: 46450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:15:06,804-Speed 3381.17 samples/sec   Loss 4.6895   LearningRate 0.0350   Epoch: 8   Global Step: 46460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:15:09,809-Speed 3408.47 samples/sec   Loss 4.7561   LearningRate 0.0350   Epoch: 8   Global Step: 46470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:15:12,833-Speed 3387.10 samples/sec   Loss 4.8481   LearningRate 0.0350   Epoch: 8   Global Step: 46480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:15:15,847-Speed 3398.32 samples/sec   Loss 4.7991   LearningRate 0.0350   Epoch: 8   Global Step: 46490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:15:18,868-Speed 3390.20 samples/sec   Loss 4.7659   LearningRate 0.0349   Epoch: 8   Global Step: 46500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:15:21,882-Speed 3398.57 samples/sec   Loss 4.7290   LearningRate 0.0349   Epoch: 8   Global Step: 46510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:15:24,896-Speed 3397.79 samples/sec   Loss 4.8323   LearningRate 0.0349   Epoch: 8   Global Step: 46520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:15:27,917-Speed 3390.46 samples/sec   Loss 4.7320   LearningRate 0.0349   Epoch: 8   Global Step: 46530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:15:30,940-Speed 3388.35 samples/sec   Loss 4.8321   LearningRate 0.0349   Epoch: 8   Global Step: 46540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:15:33,974-Speed 3375.89 samples/sec   Loss 4.7893   LearningRate 0.0349   Epoch: 8   Global Step: 46550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:15:36,995-Speed 3390.88 samples/sec   Loss 4.8224   LearningRate 0.0349   Epoch: 8   Global Step: 46560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:15:40,011-Speed 3396.18 samples/sec   Loss 4.8651   LearningRate 0.0349   Epoch: 8   Global Step: 46570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:15:43,028-Speed 3394.21 samples/sec   Loss 4.7901   LearningRate 0.0349   Epoch: 8   Global Step: 46580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:15:46,044-Speed 3395.77 samples/sec   Loss 4.6614   LearningRate 0.0348   Epoch: 8   Global Step: 46590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:15:49,065-Speed 3390.41 samples/sec   Loss 4.9282   LearningRate 0.0348   Epoch: 8   Global Step: 46600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:15:52,086-Speed 3390.33 samples/sec   Loss 4.8761   LearningRate 0.0348   Epoch: 8   Global Step: 46610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:15:55,106-Speed 3391.86 samples/sec   Loss 4.7597   LearningRate 0.0348   Epoch: 8   Global Step: 46620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:15:58,126-Speed 3391.29 samples/sec   Loss 4.8391   LearningRate 0.0348   Epoch: 8   Global Step: 46630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:16:01,165-Speed 3371.15 samples/sec   Loss 4.7151   LearningRate 0.0348   Epoch: 8   Global Step: 46640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:16:04,185-Speed 3391.81 samples/sec   Loss 4.7956   LearningRate 0.0348   Epoch: 8   Global Step: 46650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:16:07,201-Speed 3395.19 samples/sec   Loss 4.7884   LearningRate 0.0348   Epoch: 8   Global Step: 46660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:16:10,216-Speed 3396.86 samples/sec   Loss 4.7377   LearningRate 0.0348   Epoch: 8   Global Step: 46670   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-27 06:16:13,224-Speed 3405.64 samples/sec   Loss 4.7551   LearningRate 0.0348   Epoch: 8   Global Step: 46680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:16:16,269-Speed 3363.62 samples/sec   Loss 4.7393   LearningRate 0.0347   Epoch: 8   Global Step: 46690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:16:19,290-Speed 3390.33 samples/sec   Loss 4.8593   LearningRate 0.0347   Epoch: 8   Global Step: 46700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:16:22,294-Speed 3409.23 samples/sec   Loss 4.6263   LearningRate 0.0347   Epoch: 8   Global Step: 46710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:16:25,316-Speed 3389.41 samples/sec   Loss 4.7867   LearningRate 0.0347   Epoch: 8   Global Step: 46720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:16:28,331-Speed 3397.82 samples/sec   Loss 4.9114   LearningRate 0.0347   Epoch: 8   Global Step: 46730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:16:31,347-Speed 3395.61 samples/sec   Loss 4.7684   LearningRate 0.0347   Epoch: 8   Global Step: 46740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:16:34,375-Speed 3382.26 samples/sec   Loss 4.8525   LearningRate 0.0347   Epoch: 8   Global Step: 46750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:16:37,406-Speed 3379.57 samples/sec   Loss 4.7904   LearningRate 0.0347   Epoch: 8   Global Step: 46760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:16:40,444-Speed 3370.59 samples/sec   Loss 4.7980   LearningRate 0.0347   Epoch: 8   Global Step: 46770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:16:43,471-Speed 3383.81 samples/sec   Loss 4.9024   LearningRate 0.0346   Epoch: 8   Global Step: 46780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:16:46,496-Speed 3386.07 samples/sec   Loss 4.8915   LearningRate 0.0346   Epoch: 8   Global Step: 46790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:16:49,518-Speed 3389.17 samples/sec   Loss 4.8860   LearningRate 0.0346   Epoch: 8   Global Step: 46800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:16:52,544-Speed 3385.18 samples/sec   Loss 4.7821   LearningRate 0.0346   Epoch: 8   Global Step: 46810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:16:55,563-Speed 3392.35 samples/sec   Loss 4.8769   LearningRate 0.0346   Epoch: 8   Global Step: 46820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:16:58,599-Speed 3374.31 samples/sec   Loss 4.7758   LearningRate 0.0346   Epoch: 8   Global Step: 46830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:17:01,605-Speed 3407.06 samples/sec   Loss 4.8896   LearningRate 0.0346   Epoch: 8   Global Step: 46840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:04,624-Speed 3392.50 samples/sec   Loss 4.8353   LearningRate 0.0346   Epoch: 8   Global Step: 46850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:07,642-Speed 3393.73 samples/sec   Loss 4.8833   LearningRate 0.0346   Epoch: 8   Global Step: 46860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:10,664-Speed 3390.03 samples/sec   Loss 4.8047   LearningRate 0.0346   Epoch: 8   Global Step: 46870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:13,683-Speed 3391.89 samples/sec   Loss 4.8645   LearningRate 0.0345   Epoch: 8   Global Step: 46880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:16,723-Speed 3369.98 samples/sec   Loss 4.7939   LearningRate 0.0345   Epoch: 8   Global Step: 46890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:19,741-Speed 3393.78 samples/sec   Loss 4.7492   LearningRate 0.0345   Epoch: 8   Global Step: 46900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:22,764-Speed 3388.30 samples/sec   Loss 4.7189   LearningRate 0.0345   Epoch: 8   Global Step: 46910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:25,802-Speed 3371.00 samples/sec   Loss 4.7710   LearningRate 0.0345   Epoch: 8   Global Step: 46920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:28,824-Speed 3389.52 samples/sec   Loss 4.8124   LearningRate 0.0345   Epoch: 8   Global Step: 46930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:31,825-Speed 3412.60 samples/sec   Loss 4.8261   LearningRate 0.0345   Epoch: 8   Global Step: 46940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:34,863-Speed 3371.38 samples/sec   Loss 4.7894   LearningRate 0.0345   Epoch: 8   Global Step: 46950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:37,885-Speed 3388.99 samples/sec   Loss 4.7925   LearningRate 0.0345   Epoch: 8   Global Step: 46960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:40,904-Speed 3393.58 samples/sec   Loss 4.7607   LearningRate 0.0345   Epoch: 8   Global Step: 46970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:43,933-Speed 3380.96 samples/sec   Loss 4.8426   LearningRate 0.0344   Epoch: 8   Global Step: 46980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:46,954-Speed 3390.98 samples/sec   Loss 4.9154   LearningRate 0.0344   Epoch: 8   Global Step: 46990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:49,976-Speed 3389.44 samples/sec   Loss 4.8233   LearningRate 0.0344   Epoch: 8   Global Step: 47000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:53,001-Speed 3386.68 samples/sec   Loss 4.8432   LearningRate 0.0344   Epoch: 8   Global Step: 47010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:56,023-Speed 3388.60 samples/sec   Loss 4.7831   LearningRate 0.0344   Epoch: 8   Global Step: 47020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:17:59,052-Speed 3381.78 samples/sec   Loss 4.9477   LearningRate 0.0344   Epoch: 8   Global Step: 47030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:18:02,050-Speed 3415.75 samples/sec   Loss 4.8510   LearningRate 0.0344   Epoch: 8   Global Step: 47040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:18:05,071-Speed 3390.27 samples/sec   Loss 4.8035   LearningRate 0.0344   Epoch: 8   Global Step: 47050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:18:08,102-Speed 3379.60 samples/sec   Loss 4.8002   LearningRate 0.0344   Epoch: 8   Global Step: 47060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:18:11,134-Speed 3377.93 samples/sec   Loss 4.7421   LearningRate 0.0343   Epoch: 8   Global Step: 47070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:18:14,163-Speed 3381.55 samples/sec   Loss 5.0155   LearningRate 0.0343   Epoch: 8   Global Step: 47080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:18:17,190-Speed 3383.57 samples/sec   Loss 4.9281   LearningRate 0.0343   Epoch: 8   Global Step: 47090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:18:20,214-Speed 3387.70 samples/sec   Loss 4.7878   LearningRate 0.0343   Epoch: 8   Global Step: 47100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:18:23,236-Speed 3388.94 samples/sec   Loss 5.0283   LearningRate 0.0343   Epoch: 8   Global Step: 47110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:18:26,262-Speed 3385.09 samples/sec   Loss 4.8428   LearningRate 0.0343   Epoch: 8   Global Step: 47120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:18:29,286-Speed 3386.76 samples/sec   Loss 4.8222   LearningRate 0.0343   Epoch: 8   Global Step: 47130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:18:32,306-Speed 3390.89 samples/sec   Loss 4.8882   LearningRate 0.0343   Epoch: 8   Global Step: 47140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:18:35,330-Speed 3387.35 samples/sec   Loss 4.7973   LearningRate 0.0343   Epoch: 8   Global Step: 47150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:18:38,357-Speed 3384.02 samples/sec   Loss 4.8289   LearningRate 0.0343   Epoch: 8   Global Step: 47160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:18:41,388-Speed 3378.81 samples/sec   Loss 4.8268   LearningRate 0.0342   Epoch: 8   Global Step: 47170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:18:44,413-Speed 3385.88 samples/sec   Loss 4.8214   LearningRate 0.0342   Epoch: 8   Global Step: 47180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:18:47,444-Speed 3380.14 samples/sec   Loss 4.8230   LearningRate 0.0342   Epoch: 8   Global Step: 47190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:18:50,472-Speed 3382.37 samples/sec   Loss 4.8960   LearningRate 0.0342   Epoch: 8   Global Step: 47200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:18:53,497-Speed 3386.16 samples/sec   Loss 4.9256   LearningRate 0.0342   Epoch: 8   Global Step: 47210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:18:56,526-Speed 3381.06 samples/sec   Loss 4.7586   LearningRate 0.0342   Epoch: 8   Global Step: 47220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:18:59,562-Speed 3373.46 samples/sec   Loss 4.7973   LearningRate 0.0342   Epoch: 8   Global Step: 47230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:19:02,594-Speed 3378.08 samples/sec   Loss 4.7834   LearningRate 0.0342   Epoch: 8   Global Step: 47240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:19:05,618-Speed 3386.89 samples/sec   Loss 4.8609   LearningRate 0.0342   Epoch: 8   Global Step: 47250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:19:08,640-Speed 3389.16 samples/sec   Loss 4.8556   LearningRate 0.0342   Epoch: 8   Global Step: 47260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:19:11,669-Speed 3381.80 samples/sec   Loss 4.7174   LearningRate 0.0341   Epoch: 8   Global Step: 47270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:19:14,703-Speed 3376.27 samples/sec   Loss 4.7351   LearningRate 0.0341   Epoch: 8   Global Step: 47280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:19:17,733-Speed 3379.51 samples/sec   Loss 4.9060   LearningRate 0.0341   Epoch: 8   Global Step: 47290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:19:20,760-Speed 3384.16 samples/sec   Loss 4.8989   LearningRate 0.0341   Epoch: 8   Global Step: 47300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:19:23,813-Speed 3354.31 samples/sec   Loss 4.8559   LearningRate 0.0341   Epoch: 8   Global Step: 47310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:19:26,849-Speed 3374.11 samples/sec   Loss 4.9856   LearningRate 0.0341   Epoch: 8   Global Step: 47320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:19:29,943-Speed 3310.85 samples/sec   Loss 4.7822   LearningRate 0.0341   Epoch: 8   Global Step: 47330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:19:32,975-Speed 3377.88 samples/sec   Loss 4.8126   LearningRate 0.0341   Epoch: 8   Global Step: 47340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:19:35,998-Speed 3387.43 samples/sec   Loss 4.8426   LearningRate 0.0341   Epoch: 8   Global Step: 47350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:19:39,019-Speed 3390.73 samples/sec   Loss 4.9461   LearningRate 0.0341   Epoch: 8   Global Step: 47360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:19:42,048-Speed 3381.90 samples/sec   Loss 4.7726   LearningRate 0.0340   Epoch: 8   Global Step: 47370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:19:45,066-Speed 3393.40 samples/sec   Loss 4.8044   LearningRate 0.0340   Epoch: 8   Global Step: 47380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:19:48,094-Speed 3382.70 samples/sec   Loss 4.8305   LearningRate 0.0340   Epoch: 8   Global Step: 47390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:19:51,122-Speed 3382.93 samples/sec   Loss 4.9256   LearningRate 0.0340   Epoch: 8   Global Step: 47400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:19:54,153-Speed 3379.09 samples/sec   Loss 4.8226   LearningRate 0.0340   Epoch: 8   Global Step: 47410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:19:57,175-Speed 3389.59 samples/sec   Loss 4.7749   LearningRate 0.0340   Epoch: 8   Global Step: 47420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:20:00,209-Speed 3375.61 samples/sec   Loss 4.7565   LearningRate 0.0340   Epoch: 8   Global Step: 47430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:20:03,245-Speed 3374.26 samples/sec   Loss 4.8498   LearningRate 0.0340   Epoch: 8   Global Step: 47440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:20:06,268-Speed 3387.18 samples/sec   Loss 4.7748   LearningRate 0.0340   Epoch: 8   Global Step: 47450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:20:09,291-Speed 3388.98 samples/sec   Loss 4.8300   LearningRate 0.0339   Epoch: 8   Global Step: 47460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:20:12,318-Speed 3382.70 samples/sec   Loss 4.8542   LearningRate 0.0339   Epoch: 8   Global Step: 47470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:20:15,350-Speed 3378.19 samples/sec   Loss 4.9235   LearningRate 0.0339   Epoch: 8   Global Step: 47480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:20:18,383-Speed 3377.71 samples/sec   Loss 4.9030   LearningRate 0.0339   Epoch: 8   Global Step: 47490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:20:21,411-Speed 3382.12 samples/sec   Loss 4.7931   LearningRate 0.0339   Epoch: 8   Global Step: 47500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:20:24,444-Speed 3376.78 samples/sec   Loss 4.7438   LearningRate 0.0339   Epoch: 8   Global Step: 47510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:20:27,464-Speed 3391.70 samples/sec   Loss 4.8226   LearningRate 0.0339   Epoch: 8   Global Step: 47520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:20:30,489-Speed 3385.68 samples/sec   Loss 4.8273   LearningRate 0.0339   Epoch: 8   Global Step: 47530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:20:33,512-Speed 3388.77 samples/sec   Loss 4.7734   LearningRate 0.0339   Epoch: 8   Global Step: 47540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:20:36,537-Speed 3385.59 samples/sec   Loss 4.8034   LearningRate 0.0339   Epoch: 8   Global Step: 47550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:20:39,557-Speed 3391.66 samples/sec   Loss 5.0144   LearningRate 0.0338   Epoch: 8   Global Step: 47560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:20:42,586-Speed 3380.84 samples/sec   Loss 4.8389   LearningRate 0.0338   Epoch: 8   Global Step: 47570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:20:45,641-Speed 3353.74 samples/sec   Loss 4.7879   LearningRate 0.0338   Epoch: 8   Global Step: 47580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:20:48,669-Speed 3381.94 samples/sec   Loss 4.8649   LearningRate 0.0338   Epoch: 8   Global Step: 47590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:20:51,696-Speed 3383.25 samples/sec   Loss 4.7930   LearningRate 0.0338   Epoch: 8   Global Step: 47600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:20:54,726-Speed 3380.08 samples/sec   Loss 4.8867   LearningRate 0.0338   Epoch: 8   Global Step: 47610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:20:57,748-Speed 3390.02 samples/sec   Loss 4.9077   LearningRate 0.0338   Epoch: 8   Global Step: 47620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:21:00,775-Speed 3383.91 samples/sec   Loss 5.0172   LearningRate 0.0338   Epoch: 8   Global Step: 47630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:21:03,810-Speed 3374.33 samples/sec   Loss 4.8605   LearningRate 0.0338   Epoch: 8   Global Step: 47640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:21:06,840-Speed 3380.33 samples/sec   Loss 4.8614   LearningRate 0.0338   Epoch: 8   Global Step: 47650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:21:09,867-Speed 3384.12 samples/sec   Loss 4.7660   LearningRate 0.0337   Epoch: 8   Global Step: 47660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:21:12,900-Speed 3376.47 samples/sec   Loss 4.8332   LearningRate 0.0337   Epoch: 8   Global Step: 47670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:21:15,916-Speed 3396.76 samples/sec   Loss 4.8318   LearningRate 0.0337   Epoch: 8   Global Step: 47680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:21:18,950-Speed 3375.68 samples/sec   Loss 4.7433   LearningRate 0.0337   Epoch: 8   Global Step: 47690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:21:21,987-Speed 3372.36 samples/sec   Loss 4.8721   LearningRate 0.0337   Epoch: 8   Global Step: 47700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:21:25,043-Speed 3351.47 samples/sec   Loss 4.9758   LearningRate 0.0337   Epoch: 8   Global Step: 47710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:21:28,058-Speed 3397.40 samples/sec   Loss 4.7061   LearningRate 0.0337   Epoch: 8   Global Step: 47720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:21:31,092-Speed 3375.62 samples/sec   Loss 4.7730   LearningRate 0.0337   Epoch: 8   Global Step: 47730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:21:34,115-Speed 3388.19 samples/sec   Loss 4.8337   LearningRate 0.0337   Epoch: 8   Global Step: 47740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:21:37,142-Speed 3383.68 samples/sec   Loss 4.9189   LearningRate 0.0337   Epoch: 8   Global Step: 47750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:21:40,172-Speed 3380.74 samples/sec   Loss 4.7730   LearningRate 0.0336   Epoch: 8   Global Step: 47760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:21:43,230-Speed 3349.18 samples/sec   Loss 4.8492   LearningRate 0.0336   Epoch: 8   Global Step: 47770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:21:46,524-Speed 3109.42 samples/sec   Loss 4.9662   LearningRate 0.0336   Epoch: 8   Global Step: 47780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:21:49,561-Speed 3372.27 samples/sec   Loss 4.8313   LearningRate 0.0336   Epoch: 8   Global Step: 47790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:21:52,594-Speed 3377.24 samples/sec   Loss 4.7942   LearningRate 0.0336   Epoch: 8   Global Step: 47800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:21:55,628-Speed 3375.33 samples/sec   Loss 4.8597   LearningRate 0.0336   Epoch: 8   Global Step: 47810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:21:58,656-Speed 3383.02 samples/sec   Loss 4.8036   LearningRate 0.0336   Epoch: 8   Global Step: 47820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:01,686-Speed 3380.26 samples/sec   Loss 4.8270   LearningRate 0.0336   Epoch: 8   Global Step: 47830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:04,713-Speed 3383.78 samples/sec   Loss 4.8489   LearningRate 0.0336   Epoch: 8   Global Step: 47840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:07,746-Speed 3376.87 samples/sec   Loss 4.8760   LearningRate 0.0336   Epoch: 8   Global Step: 47850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:10,789-Speed 3365.80 samples/sec   Loss 4.8593   LearningRate 0.0335   Epoch: 8   Global Step: 47860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:13,824-Speed 3375.14 samples/sec   Loss 4.8315   LearningRate 0.0335   Epoch: 8   Global Step: 47870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:16,865-Speed 3368.59 samples/sec   Loss 4.7942   LearningRate 0.0335   Epoch: 8   Global Step: 47880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:19,894-Speed 3381.18 samples/sec   Loss 4.7924   LearningRate 0.0335   Epoch: 8   Global Step: 47890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:22,949-Speed 3352.52 samples/sec   Loss 4.7469   LearningRate 0.0335   Epoch: 8   Global Step: 47900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:26,036-Speed 3318.46 samples/sec   Loss 4.8281   LearningRate 0.0335   Epoch: 8   Global Step: 47910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:29,065-Speed 3381.75 samples/sec   Loss 4.8939   LearningRate 0.0335   Epoch: 8   Global Step: 47920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:22:32,100-Speed 3374.26 samples/sec   Loss 4.6642   LearningRate 0.0335   Epoch: 8   Global Step: 47930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:35,137-Speed 3372.07 samples/sec   Loss 4.9551   LearningRate 0.0335   Epoch: 8   Global Step: 47940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:38,208-Speed 3335.62 samples/sec   Loss 4.7621   LearningRate 0.0334   Epoch: 8   Global Step: 47950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:41,254-Speed 3362.71 samples/sec   Loss 4.8003   LearningRate 0.0334   Epoch: 8   Global Step: 47960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:44,285-Speed 3378.98 samples/sec   Loss 4.8443   LearningRate 0.0334   Epoch: 8   Global Step: 47970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:47,322-Speed 3372.52 samples/sec   Loss 4.8526   LearningRate 0.0334   Epoch: 8   Global Step: 47980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:50,354-Speed 3378.49 samples/sec   Loss 4.8608   LearningRate 0.0334   Epoch: 8   Global Step: 47990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:22:53,385-Speed 3379.12 samples/sec   Loss 4.8231   LearningRate 0.0334   Epoch: 8   Global Step: 48000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:23:37,027-[lfw][48000]XNorm: 20.882258
Training: 2022-04-27 06:23:37,028-[lfw][48000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-27 06:23:37,028-[lfw][48000]Accuracy-Highest: 0.99817
Training: 2022-04-27 06:24:27,354-[cfp_fp][48000]XNorm: 18.910019
Training: 2022-04-27 06:24:27,354-[cfp_fp][48000]Accuracy-Flip: 0.96414+-0.00849
Training: 2022-04-27 06:24:27,355-[cfp_fp][48000]Accuracy-Highest: 0.96414
Training: 2022-04-27 06:25:10,627-[agedb_30][48000]XNorm: 21.358148
Training: 2022-04-27 06:25:10,628-[agedb_30][48000]Accuracy-Flip: 0.97583+-0.00883
Training: 2022-04-27 06:25:10,629-[agedb_30][48000]Accuracy-Highest: 0.97767
Training: 2022-04-27 06:25:13,643-Speed 73.01 samples/sec   Loss 4.8041   LearningRate 0.0334   Epoch: 8   Global Step: 48010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:25:16,650-Speed 3405.66 samples/sec   Loss 4.9085   LearningRate 0.0334   Epoch: 8   Global Step: 48020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:25:19,662-Speed 3400.89 samples/sec   Loss 4.7279   LearningRate 0.0334   Epoch: 8   Global Step: 48030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:25:22,696-Speed 3375.65 samples/sec   Loss 4.6427   LearningRate 0.0334   Epoch: 8   Global Step: 48040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:25:25,706-Speed 3402.87 samples/sec   Loss 4.6844   LearningRate 0.0333   Epoch: 8   Global Step: 48050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:25:28,719-Speed 3399.02 samples/sec   Loss 4.8449   LearningRate 0.0333   Epoch: 8   Global Step: 48060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:25:31,730-Speed 3401.76 samples/sec   Loss 4.8948   LearningRate 0.0333   Epoch: 8   Global Step: 48070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:25:34,742-Speed 3400.69 samples/sec   Loss 4.7774   LearningRate 0.0333   Epoch: 8   Global Step: 48080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:25:37,748-Speed 3407.63 samples/sec   Loss 4.7377   LearningRate 0.0333   Epoch: 8   Global Step: 48090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:25:40,767-Speed 3393.01 samples/sec   Loss 4.8427   LearningRate 0.0333   Epoch: 8   Global Step: 48100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:25:43,785-Speed 3393.23 samples/sec   Loss 4.7641   LearningRate 0.0333   Epoch: 8   Global Step: 48110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:25:46,807-Speed 3388.98 samples/sec   Loss 4.8331   LearningRate 0.0333   Epoch: 8   Global Step: 48120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:25:49,830-Speed 3387.99 samples/sec   Loss 4.8132   LearningRate 0.0333   Epoch: 8   Global Step: 48130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:25:52,861-Speed 3379.35 samples/sec   Loss 4.8386   LearningRate 0.0333   Epoch: 8   Global Step: 48140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:25:55,888-Speed 3384.33 samples/sec   Loss 4.7358   LearningRate 0.0332   Epoch: 8   Global Step: 48150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:25:58,923-Speed 3374.94 samples/sec   Loss 4.7395   LearningRate 0.0332   Epoch: 8   Global Step: 48160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:26:01,959-Speed 3372.75 samples/sec   Loss 4.7458   LearningRate 0.0332   Epoch: 8   Global Step: 48170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:26:04,979-Speed 3391.82 samples/sec   Loss 4.9683   LearningRate 0.0332   Epoch: 8   Global Step: 48180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:26:08,017-Speed 3372.01 samples/sec   Loss 5.0165   LearningRate 0.0332   Epoch: 8   Global Step: 48190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:26:11,047-Speed 3380.37 samples/sec   Loss 4.7544   LearningRate 0.0332   Epoch: 8   Global Step: 48200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:26:14,076-Speed 3381.20 samples/sec   Loss 4.9025   LearningRate 0.0332   Epoch: 8   Global Step: 48210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:26:17,107-Speed 3379.55 samples/sec   Loss 4.8440   LearningRate 0.0332   Epoch: 8   Global Step: 48220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:26:20,130-Speed 3388.09 samples/sec   Loss 4.8590   LearningRate 0.0332   Epoch: 8   Global Step: 48230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:26:23,173-Speed 3365.85 samples/sec   Loss 4.8387   LearningRate 0.0332   Epoch: 8   Global Step: 48240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:26:26,213-Speed 3369.30 samples/sec   Loss 4.8580   LearningRate 0.0331   Epoch: 8   Global Step: 48250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:26:29,230-Speed 3394.58 samples/sec   Loss 4.8183   LearningRate 0.0331   Epoch: 8   Global Step: 48260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:26:32,238-Speed 3404.80 samples/sec   Loss 4.8211   LearningRate 0.0331   Epoch: 8   Global Step: 48270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:26:35,252-Speed 3399.40 samples/sec   Loss 4.7864   LearningRate 0.0331   Epoch: 8   Global Step: 48280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:26:38,295-Speed 3364.84 samples/sec   Loss 4.7926   LearningRate 0.0331   Epoch: 8   Global Step: 48290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:26:41,315-Speed 3392.08 samples/sec   Loss 4.8484   LearningRate 0.0331   Epoch: 8   Global Step: 48300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:26:44,330-Speed 3396.79 samples/sec   Loss 4.8009   LearningRate 0.0331   Epoch: 8   Global Step: 48310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:26:47,339-Speed 3403.50 samples/sec   Loss 4.8516   LearningRate 0.0331   Epoch: 8   Global Step: 48320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:26:50,351-Speed 3400.69 samples/sec   Loss 4.8549   LearningRate 0.0331   Epoch: 8   Global Step: 48330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:26:53,345-Speed 3420.79 samples/sec   Loss 4.7229   LearningRate 0.0331   Epoch: 8   Global Step: 48340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:26:56,361-Speed 3396.49 samples/sec   Loss 4.7837   LearningRate 0.0330   Epoch: 8   Global Step: 48350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:26:59,415-Speed 3353.53 samples/sec   Loss 4.6946   LearningRate 0.0330   Epoch: 8   Global Step: 48360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:27:02,471-Speed 3351.29 samples/sec   Loss 4.8831   LearningRate 0.0330   Epoch: 8   Global Step: 48370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:27:05,484-Speed 3399.79 samples/sec   Loss 4.8546   LearningRate 0.0330   Epoch: 8   Global Step: 48380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:27:08,495-Speed 3402.05 samples/sec   Loss 4.8652   LearningRate 0.0330   Epoch: 8   Global Step: 48390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:27:11,507-Speed 3400.46 samples/sec   Loss 4.8407   LearningRate 0.0330   Epoch: 8   Global Step: 48400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:27:14,522-Speed 3397.27 samples/sec   Loss 4.8139   LearningRate 0.0330   Epoch: 8   Global Step: 48410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:27:17,531-Speed 3403.11 samples/sec   Loss 4.7016   LearningRate 0.0330   Epoch: 8   Global Step: 48420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:27:20,540-Speed 3404.74 samples/sec   Loss 4.8673   LearningRate 0.0330   Epoch: 8   Global Step: 48430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 06:27:23,552-Speed 3400.79 samples/sec   Loss 4.8383   LearningRate 0.0330   Epoch: 8   Global Step: 48440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:27:26,585-Speed 3376.61 samples/sec   Loss 4.8428   LearningRate 0.0329   Epoch: 8   Global Step: 48450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:27:29,589-Speed 3409.44 samples/sec   Loss 4.7441   LearningRate 0.0329   Epoch: 8   Global Step: 48460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:27:32,599-Speed 3402.56 samples/sec   Loss 4.8107   LearningRate 0.0329   Epoch: 8   Global Step: 48470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:27:35,612-Speed 3400.05 samples/sec   Loss 4.7637   LearningRate 0.0329   Epoch: 8   Global Step: 48480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:27:38,627-Speed 3396.67 samples/sec   Loss 4.7374   LearningRate 0.0329   Epoch: 8   Global Step: 48490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:27:41,637-Speed 3402.75 samples/sec   Loss 4.8213   LearningRate 0.0329   Epoch: 8   Global Step: 48500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:27:44,647-Speed 3402.99 samples/sec   Loss 4.8372   LearningRate 0.0329   Epoch: 8   Global Step: 48510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:27:47,661-Speed 3398.02 samples/sec   Loss 4.8149   LearningRate 0.0329   Epoch: 8   Global Step: 48520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:27:50,678-Speed 3395.25 samples/sec   Loss 4.8261   LearningRate 0.0329   Epoch: 8   Global Step: 48530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:27:53,696-Speed 3394.18 samples/sec   Loss 4.7824   LearningRate 0.0329   Epoch: 8   Global Step: 48540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:27:56,709-Speed 3398.58 samples/sec   Loss 4.9195   LearningRate 0.0328   Epoch: 8   Global Step: 48550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:27:59,719-Speed 3403.24 samples/sec   Loss 4.8994   LearningRate 0.0328   Epoch: 8   Global Step: 48560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:28:02,745-Speed 3385.30 samples/sec   Loss 4.7955   LearningRate 0.0328   Epoch: 8   Global Step: 48570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:28:05,739-Speed 3421.17 samples/sec   Loss 4.7281   LearningRate 0.0328   Epoch: 8   Global Step: 48580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:28:08,753-Speed 3397.09 samples/sec   Loss 4.7913   LearningRate 0.0328   Epoch: 8   Global Step: 48590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:28:11,808-Speed 3353.60 samples/sec   Loss 4.7609   LearningRate 0.0328   Epoch: 8   Global Step: 48600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:28:14,905-Speed 3306.30 samples/sec   Loss 4.6911   LearningRate 0.0328   Epoch: 8   Global Step: 48610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:28:17,935-Speed 3380.43 samples/sec   Loss 4.8425   LearningRate 0.0328   Epoch: 8   Global Step: 48620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:28:20,949-Speed 3398.54 samples/sec   Loss 4.8620   LearningRate 0.0328   Epoch: 8   Global Step: 48630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:28:23,992-Speed 3366.00 samples/sec   Loss 4.7975   LearningRate 0.0328   Epoch: 8   Global Step: 48640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:28:27,010-Speed 3394.17 samples/sec   Loss 4.7880   LearningRate 0.0327   Epoch: 8   Global Step: 48650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:28:30,027-Speed 3394.68 samples/sec   Loss 4.8811   LearningRate 0.0327   Epoch: 8   Global Step: 48660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:28:33,043-Speed 3396.32 samples/sec   Loss 4.7324   LearningRate 0.0327   Epoch: 8   Global Step: 48670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:28:36,121-Speed 3327.80 samples/sec   Loss 4.8460   LearningRate 0.0327   Epoch: 8   Global Step: 48680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:28:39,163-Speed 3366.01 samples/sec   Loss 4.9830   LearningRate 0.0327   Epoch: 8   Global Step: 48690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:28:42,183-Speed 3392.17 samples/sec   Loss 4.5895   LearningRate 0.0327   Epoch: 8   Global Step: 48700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:28:45,188-Speed 3408.49 samples/sec   Loss 4.7491   LearningRate 0.0327   Epoch: 8   Global Step: 48710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:28:48,205-Speed 3395.33 samples/sec   Loss 4.8440   LearningRate 0.0327   Epoch: 8   Global Step: 48720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:28:51,215-Speed 3402.53 samples/sec   Loss 4.7299   LearningRate 0.0327   Epoch: 8   Global Step: 48730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:28:54,227-Speed 3400.75 samples/sec   Loss 4.7640   LearningRate 0.0327   Epoch: 8   Global Step: 48740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:28:57,243-Speed 3395.77 samples/sec   Loss 4.7407   LearningRate 0.0326   Epoch: 8   Global Step: 48750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:29:00,275-Speed 3377.61 samples/sec   Loss 4.6928   LearningRate 0.0326   Epoch: 8   Global Step: 48760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:29:03,291-Speed 3396.73 samples/sec   Loss 4.7325   LearningRate 0.0326   Epoch: 8   Global Step: 48770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:29:06,300-Speed 3403.79 samples/sec   Loss 4.7766   LearningRate 0.0326   Epoch: 8   Global Step: 48780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:29:09,321-Speed 3390.23 samples/sec   Loss 4.8664   LearningRate 0.0326   Epoch: 8   Global Step: 48790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:29:12,337-Speed 3396.51 samples/sec   Loss 4.7851   LearningRate 0.0326   Epoch: 8   Global Step: 48800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:29:15,359-Speed 3388.58 samples/sec   Loss 4.7981   LearningRate 0.0326   Epoch: 8   Global Step: 48810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:29:18,372-Speed 3400.13 samples/sec   Loss 4.8202   LearningRate 0.0326   Epoch: 8   Global Step: 48820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:29:21,388-Speed 3395.57 samples/sec   Loss 4.8316   LearningRate 0.0326   Epoch: 8   Global Step: 48830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:29:24,405-Speed 3394.76 samples/sec   Loss 4.7582   LearningRate 0.0325   Epoch: 8   Global Step: 48840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:29:27,429-Speed 3387.77 samples/sec   Loss 4.8365   LearningRate 0.0325   Epoch: 8   Global Step: 48850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:29:30,446-Speed 3394.66 samples/sec   Loss 4.8131   LearningRate 0.0325   Epoch: 8   Global Step: 48860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:29:33,460-Speed 3398.11 samples/sec   Loss 4.8028   LearningRate 0.0325   Epoch: 8   Global Step: 48870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:29:36,475-Speed 3397.25 samples/sec   Loss 4.7390   LearningRate 0.0325   Epoch: 8   Global Step: 48880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:29:39,475-Speed 3413.67 samples/sec   Loss 4.8415   LearningRate 0.0325   Epoch: 8   Global Step: 48890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:29:42,499-Speed 3387.80 samples/sec   Loss 4.7887   LearningRate 0.0325   Epoch: 8   Global Step: 48900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:29:45,515-Speed 3395.98 samples/sec   Loss 4.8556   LearningRate 0.0325   Epoch: 8   Global Step: 48910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:29:48,533-Speed 3393.49 samples/sec   Loss 4.7578   LearningRate 0.0325   Epoch: 8   Global Step: 48920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:29:51,590-Speed 3350.34 samples/sec   Loss 4.8804   LearningRate 0.0325   Epoch: 8   Global Step: 48930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:29:54,604-Speed 3398.51 samples/sec   Loss 4.7342   LearningRate 0.0324   Epoch: 8   Global Step: 48940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:29:57,623-Speed 3392.88 samples/sec   Loss 4.8432   LearningRate 0.0324   Epoch: 8   Global Step: 48950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:30:00,639-Speed 3396.12 samples/sec   Loss 4.8629   LearningRate 0.0324   Epoch: 8   Global Step: 48960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:30:03,654-Speed 3396.88 samples/sec   Loss 4.7835   LearningRate 0.0324   Epoch: 8   Global Step: 48970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:30:06,668-Speed 3398.17 samples/sec   Loss 4.8631   LearningRate 0.0324   Epoch: 8   Global Step: 48980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:30:09,687-Speed 3392.37 samples/sec   Loss 4.6737   LearningRate 0.0324   Epoch: 8   Global Step: 48990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:30:12,689-Speed 3412.27 samples/sec   Loss 4.6552   LearningRate 0.0324   Epoch: 8   Global Step: 49000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:30:15,701-Speed 3400.41 samples/sec   Loss 4.9467   LearningRate 0.0324   Epoch: 8   Global Step: 49010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:30:18,721-Speed 3391.87 samples/sec   Loss 4.7715   LearningRate 0.0324   Epoch: 8   Global Step: 49020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:30:21,745-Speed 3386.93 samples/sec   Loss 4.6908   LearningRate 0.0324   Epoch: 8   Global Step: 49030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:30:24,766-Speed 3390.71 samples/sec   Loss 4.7301   LearningRate 0.0323   Epoch: 8   Global Step: 49040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:30:27,788-Speed 3389.52 samples/sec   Loss 4.7190   LearningRate 0.0323   Epoch: 8   Global Step: 49050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:30:30,804-Speed 3395.80 samples/sec   Loss 4.6993   LearningRate 0.0323   Epoch: 8   Global Step: 49060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:30:33,820-Speed 3395.87 samples/sec   Loss 4.7347   LearningRate 0.0323   Epoch: 8   Global Step: 49070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:30:36,841-Speed 3390.20 samples/sec   Loss 4.7769   LearningRate 0.0323   Epoch: 8   Global Step: 49080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:30:39,916-Speed 3330.99 samples/sec   Loss 4.7853   LearningRate 0.0323   Epoch: 8   Global Step: 49090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:30:42,948-Speed 3378.94 samples/sec   Loss 4.7374   LearningRate 0.0323   Epoch: 8   Global Step: 49100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:30:45,963-Speed 3397.13 samples/sec   Loss 4.6284   LearningRate 0.0323   Epoch: 8   Global Step: 49110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:30:48,978-Speed 3397.15 samples/sec   Loss 4.6635   LearningRate 0.0323   Epoch: 8   Global Step: 49120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:30:51,995-Speed 3394.35 samples/sec   Loss 4.9213   LearningRate 0.0323   Epoch: 8   Global Step: 49130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:30:55,012-Speed 3395.48 samples/sec   Loss 4.7323   LearningRate 0.0322   Epoch: 8   Global Step: 49140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:30:58,035-Speed 3387.34 samples/sec   Loss 4.7680   LearningRate 0.0322   Epoch: 8   Global Step: 49150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:31:01,060-Speed 3386.04 samples/sec   Loss 4.7858   LearningRate 0.0322   Epoch: 8   Global Step: 49160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:31:04,069-Speed 3404.43 samples/sec   Loss 4.6681   LearningRate 0.0322   Epoch: 8   Global Step: 49170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:31:07,089-Speed 3391.55 samples/sec   Loss 4.8283   LearningRate 0.0322   Epoch: 8   Global Step: 49180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:31:10,106-Speed 3394.79 samples/sec   Loss 4.8086   LearningRate 0.0322   Epoch: 8   Global Step: 49190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:31:13,123-Speed 3395.31 samples/sec   Loss 4.7775   LearningRate 0.0322   Epoch: 8   Global Step: 49200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:31:16,137-Speed 3398.26 samples/sec   Loss 4.8285   LearningRate 0.0322   Epoch: 8   Global Step: 49210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:31:19,155-Speed 3393.68 samples/sec   Loss 4.7540   LearningRate 0.0322   Epoch: 8   Global Step: 49220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:31:22,171-Speed 3395.59 samples/sec   Loss 4.6592   LearningRate 0.0322   Epoch: 8   Global Step: 49230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:31:25,192-Speed 3390.16 samples/sec   Loss 4.7869   LearningRate 0.0321   Epoch: 8   Global Step: 49240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:31:28,216-Speed 3387.73 samples/sec   Loss 4.8014   LearningRate 0.0321   Epoch: 8   Global Step: 49250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:31:31,240-Speed 3386.53 samples/sec   Loss 4.8310   LearningRate 0.0321   Epoch: 8   Global Step: 49260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:31:34,271-Speed 3380.02 samples/sec   Loss 4.7806   LearningRate 0.0321   Epoch: 8   Global Step: 49270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:31:37,287-Speed 3395.74 samples/sec   Loss 4.7900   LearningRate 0.0321   Epoch: 8   Global Step: 49280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:31:40,306-Speed 3392.68 samples/sec   Loss 4.7026   LearningRate 0.0321   Epoch: 8   Global Step: 49290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:31:43,334-Speed 3382.02 samples/sec   Loss 4.7559   LearningRate 0.0321   Epoch: 8   Global Step: 49300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:31:46,372-Speed 3371.65 samples/sec   Loss 4.7013   LearningRate 0.0321   Epoch: 8   Global Step: 49310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:31:49,400-Speed 3382.98 samples/sec   Loss 4.6581   LearningRate 0.0321   Epoch: 8   Global Step: 49320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:31:52,399-Speed 3414.83 samples/sec   Loss 4.8434   LearningRate 0.0321   Epoch: 8   Global Step: 49330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:31:55,418-Speed 3393.06 samples/sec   Loss 4.8152   LearningRate 0.0321   Epoch: 8   Global Step: 49340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:31:58,436-Speed 3394.25 samples/sec   Loss 4.7830   LearningRate 0.0320   Epoch: 8   Global Step: 49350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:01,529-Speed 3311.74 samples/sec   Loss 4.7346   LearningRate 0.0320   Epoch: 8   Global Step: 49360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:04,544-Speed 3396.44 samples/sec   Loss 4.7142   LearningRate 0.0320   Epoch: 8   Global Step: 49370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:07,566-Speed 3389.95 samples/sec   Loss 4.6754   LearningRate 0.0320   Epoch: 8   Global Step: 49380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:10,581-Speed 3396.98 samples/sec   Loss 4.6946   LearningRate 0.0320   Epoch: 8   Global Step: 49390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:13,599-Speed 3394.01 samples/sec   Loss 4.6930   LearningRate 0.0320   Epoch: 8   Global Step: 49400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:16,616-Speed 3394.41 samples/sec   Loss 4.6472   LearningRate 0.0320   Epoch: 8   Global Step: 49410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:19,636-Speed 3391.72 samples/sec   Loss 4.6986   LearningRate 0.0320   Epoch: 8   Global Step: 49420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:22,653-Speed 3395.03 samples/sec   Loss 4.6635   LearningRate 0.0320   Epoch: 8   Global Step: 49430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:32:25,654-Speed 3412.96 samples/sec   Loss 4.6296   LearningRate 0.0320   Epoch: 8   Global Step: 49440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:28,671-Speed 3394.69 samples/sec   Loss 4.8343   LearningRate 0.0319   Epoch: 8   Global Step: 49450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:31,686-Speed 3396.86 samples/sec   Loss 4.7525   LearningRate 0.0319   Epoch: 8   Global Step: 49460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:34,706-Speed 3392.19 samples/sec   Loss 4.6857   LearningRate 0.0319   Epoch: 8   Global Step: 49470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:37,750-Speed 3363.87 samples/sec   Loss 4.7087   LearningRate 0.0319   Epoch: 8   Global Step: 49480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:40,805-Speed 3353.47 samples/sec   Loss 4.7053   LearningRate 0.0319   Epoch: 8   Global Step: 49490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:43,825-Speed 3391.24 samples/sec   Loss 4.8096   LearningRate 0.0319   Epoch: 8   Global Step: 49500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:46,845-Speed 3391.65 samples/sec   Loss 4.7256   LearningRate 0.0319   Epoch: 8   Global Step: 49510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:49,863-Speed 3393.52 samples/sec   Loss 4.6613   LearningRate 0.0319   Epoch: 8   Global Step: 49520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:52,882-Speed 3392.93 samples/sec   Loss 4.7200   LearningRate 0.0319   Epoch: 8   Global Step: 49530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:32:55,900-Speed 3393.37 samples/sec   Loss 4.7488   LearningRate 0.0319   Epoch: 8   Global Step: 49540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:32:58,939-Speed 3371.33 samples/sec   Loss 4.8223   LearningRate 0.0318   Epoch: 8   Global Step: 49550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:33:01,957-Speed 3393.62 samples/sec   Loss 4.7936   LearningRate 0.0318   Epoch: 8   Global Step: 49560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:33:04,982-Speed 3385.70 samples/sec   Loss 4.6257   LearningRate 0.0318   Epoch: 8   Global Step: 49570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:33:07,980-Speed 3416.36 samples/sec   Loss 4.7419   LearningRate 0.0318   Epoch: 8   Global Step: 49580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:33:11,004-Speed 3386.66 samples/sec   Loss 4.8226   LearningRate 0.0318   Epoch: 8   Global Step: 49590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:33:14,045-Speed 3367.54 samples/sec   Loss 4.7822   LearningRate 0.0318   Epoch: 8   Global Step: 49600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:33:17,066-Speed 3390.87 samples/sec   Loss 4.7667   LearningRate 0.0318   Epoch: 8   Global Step: 49610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:33:20,087-Speed 3391.11 samples/sec   Loss 4.6418   LearningRate 0.0318   Epoch: 8   Global Step: 49620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:33:23,135-Speed 3360.13 samples/sec   Loss 4.6904   LearningRate 0.0318   Epoch: 8   Global Step: 49630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:33:26,172-Speed 3372.59 samples/sec   Loss 4.7090   LearningRate 0.0318   Epoch: 8   Global Step: 49640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:33:29,231-Speed 3349.20 samples/sec   Loss 4.7554   LearningRate 0.0317   Epoch: 8   Global Step: 49650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:33:32,262-Speed 3379.27 samples/sec   Loss 4.7888   LearningRate 0.0317   Epoch: 8   Global Step: 49660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:33:35,286-Speed 3386.86 samples/sec   Loss 4.7196   LearningRate 0.0317   Epoch: 8   Global Step: 49670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:33:38,305-Speed 3392.28 samples/sec   Loss 4.6182   LearningRate 0.0317   Epoch: 8   Global Step: 49680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:33:41,326-Speed 3390.90 samples/sec   Loss 4.6235   LearningRate 0.0317   Epoch: 8   Global Step: 49690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:33:44,359-Speed 3376.58 samples/sec   Loss 4.7472   LearningRate 0.0317   Epoch: 8   Global Step: 49700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:33:47,396-Speed 3372.80 samples/sec   Loss 4.7528   LearningRate 0.0317   Epoch: 8   Global Step: 49710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:33:50,523-Speed 3275.46 samples/sec   Loss 4.7061   LearningRate 0.0317   Epoch: 8   Global Step: 49720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:33:53,525-Speed 3411.97 samples/sec   Loss 4.7834   LearningRate 0.0317   Epoch: 8   Global Step: 49730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:33:56,545-Speed 3391.13 samples/sec   Loss 4.7384   LearningRate 0.0317   Epoch: 8   Global Step: 49740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:33:59,599-Speed 3353.71 samples/sec   Loss 4.6823   LearningRate 0.0316   Epoch: 8   Global Step: 49750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:02,656-Speed 3350.56 samples/sec   Loss 4.6377   LearningRate 0.0316   Epoch: 8   Global Step: 49760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:05,682-Speed 3384.87 samples/sec   Loss 4.6674   LearningRate 0.0316   Epoch: 8   Global Step: 49770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:08,703-Speed 3389.63 samples/sec   Loss 4.8502   LearningRate 0.0316   Epoch: 8   Global Step: 49780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:11,847-Speed 3258.54 samples/sec   Loss 4.6325   LearningRate 0.0316   Epoch: 8   Global Step: 49790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:14,933-Speed 3319.05 samples/sec   Loss 4.7713   LearningRate 0.0316   Epoch: 8   Global Step: 49800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:17,964-Speed 3379.81 samples/sec   Loss 4.6500   LearningRate 0.0316   Epoch: 8   Global Step: 49810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:20,990-Speed 3384.18 samples/sec   Loss 4.7359   LearningRate 0.0316   Epoch: 8   Global Step: 49820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:24,009-Speed 3392.18 samples/sec   Loss 4.6891   LearningRate 0.0316   Epoch: 8   Global Step: 49830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:34:27,029-Speed 3391.39 samples/sec   Loss 4.7573   LearningRate 0.0316   Epoch: 8   Global Step: 49840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:34:30,036-Speed 3406.48 samples/sec   Loss 4.6573   LearningRate 0.0315   Epoch: 8   Global Step: 49850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:33,054-Speed 3393.84 samples/sec   Loss 4.7032   LearningRate 0.0315   Epoch: 8   Global Step: 49860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:36,085-Speed 3378.91 samples/sec   Loss 4.6792   LearningRate 0.0315   Epoch: 8   Global Step: 49870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:39,130-Speed 3363.59 samples/sec   Loss 4.6271   LearningRate 0.0315   Epoch: 8   Global Step: 49880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:42,156-Speed 3384.55 samples/sec   Loss 4.7237   LearningRate 0.0315   Epoch: 8   Global Step: 49890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:45,180-Speed 3387.72 samples/sec   Loss 4.7349   LearningRate 0.0315   Epoch: 8   Global Step: 49900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:48,213-Speed 3377.50 samples/sec   Loss 4.8406   LearningRate 0.0315   Epoch: 8   Global Step: 49910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:51,242-Speed 3381.69 samples/sec   Loss 4.6009   LearningRate 0.0315   Epoch: 8   Global Step: 49920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:54,271-Speed 3380.91 samples/sec   Loss 4.8316   LearningRate 0.0315   Epoch: 8   Global Step: 49930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:34:57,293-Speed 3389.56 samples/sec   Loss 4.7187   LearningRate 0.0315   Epoch: 8   Global Step: 49940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:35:00,314-Speed 3390.05 samples/sec   Loss 4.7797   LearningRate 0.0314   Epoch: 8   Global Step: 49950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:35:03,353-Speed 3370.49 samples/sec   Loss 4.6880   LearningRate 0.0314   Epoch: 8   Global Step: 49960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:35:06,372-Speed 3391.83 samples/sec   Loss 4.6964   LearningRate 0.0314   Epoch: 8   Global Step: 49970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:35:09,395-Speed 3387.98 samples/sec   Loss 4.6692   LearningRate 0.0314   Epoch: 8   Global Step: 49980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:35:12,421-Speed 3385.70 samples/sec   Loss 4.5567   LearningRate 0.0314   Epoch: 8   Global Step: 49990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:35:15,445-Speed 3387.62 samples/sec   Loss 4.6905   LearningRate 0.0314   Epoch: 8   Global Step: 50000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:35:58,954-[lfw][50000]XNorm: 21.875312
Training: 2022-04-27 06:35:58,955-[lfw][50000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-27 06:35:58,955-[lfw][50000]Accuracy-Highest: 0.99817
Training: 2022-04-27 06:36:49,337-[cfp_fp][50000]XNorm: 19.527867
Training: 2022-04-27 06:36:49,337-[cfp_fp][50000]Accuracy-Flip: 0.95843+-0.00895
Training: 2022-04-27 06:36:49,338-[cfp_fp][50000]Accuracy-Highest: 0.96414
Training: 2022-04-27 06:37:32,889-[agedb_30][50000]XNorm: 21.759556
Training: 2022-04-27 06:37:32,889-[agedb_30][50000]Accuracy-Flip: 0.97567+-0.00814
Training: 2022-04-27 06:37:32,890-[agedb_30][50000]Accuracy-Highest: 0.97767
Training: 2022-04-27 06:37:35,905-Speed 72.90 samples/sec   Loss 4.6234   LearningRate 0.0314   Epoch: 8   Global Step: 50010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:37:38,912-Speed 3406.70 samples/sec   Loss 4.6668   LearningRate 0.0314   Epoch: 8   Global Step: 50020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:37:41,923-Speed 3401.48 samples/sec   Loss 4.6183   LearningRate 0.0314   Epoch: 8   Global Step: 50030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:37:44,942-Speed 3392.14 samples/sec   Loss 4.7282   LearningRate 0.0314   Epoch: 8   Global Step: 50040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:37:47,965-Speed 3388.77 samples/sec   Loss 4.7514   LearningRate 0.0313   Epoch: 8   Global Step: 50050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:37:51,011-Speed 3362.24 samples/sec   Loss 4.7674   LearningRate 0.0313   Epoch: 8   Global Step: 50060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:37:54,048-Speed 3374.05 samples/sec   Loss 4.6597   LearningRate 0.0313   Epoch: 8   Global Step: 50070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:37:57,066-Speed 3392.98 samples/sec   Loss 4.6274   LearningRate 0.0313   Epoch: 8   Global Step: 50080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:38:00,094-Speed 3383.73 samples/sec   Loss 4.7048   LearningRate 0.0313   Epoch: 8   Global Step: 50090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:38:03,131-Speed 3372.39 samples/sec   Loss 4.6003   LearningRate 0.0313   Epoch: 8   Global Step: 50100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:38:06,249-Speed 3284.24 samples/sec   Loss 4.7412   LearningRate 0.0313   Epoch: 8   Global Step: 50110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:38:09,254-Speed 3409.49 samples/sec   Loss 4.6597   LearningRate 0.0313   Epoch: 8   Global Step: 50120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:12,275-Speed 3390.48 samples/sec   Loss 4.6976   LearningRate 0.0313   Epoch: 8   Global Step: 50130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:15,295-Speed 3390.75 samples/sec   Loss 4.7655   LearningRate 0.0313   Epoch: 8   Global Step: 50140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:18,321-Speed 3385.11 samples/sec   Loss 4.6773   LearningRate 0.0312   Epoch: 8   Global Step: 50150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:21,352-Speed 3379.38 samples/sec   Loss 4.6825   LearningRate 0.0312   Epoch: 8   Global Step: 50160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:24,380-Speed 3381.52 samples/sec   Loss 4.7678   LearningRate 0.0312   Epoch: 8   Global Step: 50170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:27,402-Speed 3390.11 samples/sec   Loss 4.6719   LearningRate 0.0312   Epoch: 8   Global Step: 50180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:30,425-Speed 3388.22 samples/sec   Loss 4.7778   LearningRate 0.0312   Epoch: 8   Global Step: 50190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:33,450-Speed 3386.15 samples/sec   Loss 4.7049   LearningRate 0.0312   Epoch: 8   Global Step: 50200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:36,477-Speed 3383.85 samples/sec   Loss 4.8028   LearningRate 0.0312   Epoch: 8   Global Step: 50210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:39,517-Speed 3368.73 samples/sec   Loss 4.6759   LearningRate 0.0312   Epoch: 8   Global Step: 50220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:38:42,527-Speed 3403.50 samples/sec   Loss 4.6698   LearningRate 0.0312   Epoch: 8   Global Step: 50230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:45,574-Speed 3361.48 samples/sec   Loss 4.6847   LearningRate 0.0312   Epoch: 8   Global Step: 50240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:48,606-Speed 3377.96 samples/sec   Loss 4.6167   LearningRate 0.0312   Epoch: 8   Global Step: 50250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:51,641-Speed 3374.90 samples/sec   Loss 4.7121   LearningRate 0.0311   Epoch: 8   Global Step: 50260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:54,682-Speed 3367.33 samples/sec   Loss 4.5617   LearningRate 0.0311   Epoch: 8   Global Step: 50270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:38:57,731-Speed 3360.46 samples/sec   Loss 4.7345   LearningRate 0.0311   Epoch: 8   Global Step: 50280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:00,778-Speed 3360.52 samples/sec   Loss 4.7167   LearningRate 0.0311   Epoch: 8   Global Step: 50290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:03,831-Speed 3354.91 samples/sec   Loss 4.6163   LearningRate 0.0311   Epoch: 8   Global Step: 50300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:06,861-Speed 3380.64 samples/sec   Loss 4.8213   LearningRate 0.0311   Epoch: 8   Global Step: 50310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:09,891-Speed 3380.62 samples/sec   Loss 4.6092   LearningRate 0.0311   Epoch: 8   Global Step: 50320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:12,916-Speed 3385.46 samples/sec   Loss 4.5648   LearningRate 0.0311   Epoch: 8   Global Step: 50330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:39:15,932-Speed 3396.21 samples/sec   Loss 4.6186   LearningRate 0.0311   Epoch: 8   Global Step: 50340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:18,977-Speed 3363.83 samples/sec   Loss 4.6097   LearningRate 0.0311   Epoch: 8   Global Step: 50350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:22,004-Speed 3383.03 samples/sec   Loss 4.6959   LearningRate 0.0310   Epoch: 8   Global Step: 50360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:25,035-Speed 3380.13 samples/sec   Loss 4.7267   LearningRate 0.0310   Epoch: 8   Global Step: 50370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:28,064-Speed 3381.00 samples/sec   Loss 4.6863   LearningRate 0.0310   Epoch: 8   Global Step: 50380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:31,092-Speed 3382.90 samples/sec   Loss 4.6314   LearningRate 0.0310   Epoch: 8   Global Step: 50390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:34,112-Speed 3391.18 samples/sec   Loss 4.5820   LearningRate 0.0310   Epoch: 8   Global Step: 50400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:37,139-Speed 3382.99 samples/sec   Loss 4.6503   LearningRate 0.0310   Epoch: 8   Global Step: 50410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:40,161-Speed 3390.14 samples/sec   Loss 4.6698   LearningRate 0.0310   Epoch: 8   Global Step: 50420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:43,187-Speed 3383.87 samples/sec   Loss 4.6380   LearningRate 0.0310   Epoch: 8   Global Step: 50430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:39:46,208-Speed 3391.34 samples/sec   Loss 4.6122   LearningRate 0.0310   Epoch: 8   Global Step: 50440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:39:49,238-Speed 3380.35 samples/sec   Loss 4.7358   LearningRate 0.0310   Epoch: 8   Global Step: 50450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:39:52,272-Speed 3375.90 samples/sec   Loss 4.6766   LearningRate 0.0309   Epoch: 8   Global Step: 50460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:39:55,295-Speed 3387.46 samples/sec   Loss 4.7118   LearningRate 0.0309   Epoch: 8   Global Step: 50470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:39:58,322-Speed 3383.79 samples/sec   Loss 4.6797   LearningRate 0.0309   Epoch: 8   Global Step: 50480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 06:40:01,351-Speed 3381.24 samples/sec   Loss 4.7154   LearningRate 0.0309   Epoch: 8   Global Step: 50490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:40:04,377-Speed 3385.59 samples/sec   Loss 4.6413   LearningRate 0.0309   Epoch: 8   Global Step: 50500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:40:07,398-Speed 3390.22 samples/sec   Loss 4.6428   LearningRate 0.0309   Epoch: 8   Global Step: 50510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:40:10,427-Speed 3381.70 samples/sec   Loss 4.7082   LearningRate 0.0309   Epoch: 8   Global Step: 50520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:40:13,452-Speed 3385.94 samples/sec   Loss 4.6826   LearningRate 0.0309   Epoch: 8   Global Step: 50530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:40:16,529-Speed 3328.78 samples/sec   Loss 4.8028   LearningRate 0.0309   Epoch: 8   Global Step: 50540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 06:40:19,553-Speed 3386.94 samples/sec   Loss 4.7434   LearningRate 0.0309   Epoch: 8   Global Step: 50550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:40:22,574-Speed 3390.68 samples/sec   Loss 4.6249   LearningRate 0.0308   Epoch: 8   Global Step: 50560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:40:25,593-Speed 3391.49 samples/sec   Loss 4.5369   LearningRate 0.0308   Epoch: 8   Global Step: 50570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:40:28,615-Speed 3389.59 samples/sec   Loss 4.6986   LearningRate 0.0308   Epoch: 8   Global Step: 50580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:40:31,635-Speed 3391.39 samples/sec   Loss 4.6292   LearningRate 0.0308   Epoch: 8   Global Step: 50590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:40:34,665-Speed 3380.95 samples/sec   Loss 4.6159   LearningRate 0.0308   Epoch: 8   Global Step: 50600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:40:37,701-Speed 3372.65 samples/sec   Loss 4.6270   LearningRate 0.0308   Epoch: 8   Global Step: 50610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:40:40,713-Speed 3402.37 samples/sec   Loss 4.8218   LearningRate 0.0308   Epoch: 8   Global Step: 50620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:40:43,740-Speed 3383.55 samples/sec   Loss 4.5479   LearningRate 0.0308   Epoch: 8   Global Step: 50630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:40:46,762-Speed 3389.48 samples/sec   Loss 4.6720   LearningRate 0.0308   Epoch: 8   Global Step: 50640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:40:49,797-Speed 3374.43 samples/sec   Loss 4.6844   LearningRate 0.0308   Epoch: 8   Global Step: 50650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:40:52,821-Speed 3386.53 samples/sec   Loss 4.6535   LearningRate 0.0307   Epoch: 8   Global Step: 50660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:40:55,860-Speed 3370.34 samples/sec   Loss 4.5804   LearningRate 0.0307   Epoch: 8   Global Step: 50670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:40:58,908-Speed 3360.38 samples/sec   Loss 4.7277   LearningRate 0.0307   Epoch: 8   Global Step: 50680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:01,934-Speed 3385.14 samples/sec   Loss 4.6006   LearningRate 0.0307   Epoch: 8   Global Step: 50690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:04,964-Speed 3380.23 samples/sec   Loss 4.7511   LearningRate 0.0307   Epoch: 8   Global Step: 50700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:07,996-Speed 3377.86 samples/sec   Loss 4.7078   LearningRate 0.0307   Epoch: 8   Global Step: 50710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:11,002-Speed 3407.63 samples/sec   Loss 4.7479   LearningRate 0.0307   Epoch: 8   Global Step: 50720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:14,028-Speed 3384.71 samples/sec   Loss 4.6523   LearningRate 0.0307   Epoch: 8   Global Step: 50730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:17,055-Speed 3384.56 samples/sec   Loss 4.6479   LearningRate 0.0307   Epoch: 8   Global Step: 50740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:20,080-Speed 3386.04 samples/sec   Loss 4.5551   LearningRate 0.0307   Epoch: 8   Global Step: 50750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:23,109-Speed 3381.32 samples/sec   Loss 4.5644   LearningRate 0.0307   Epoch: 8   Global Step: 50760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:26,132-Speed 3388.04 samples/sec   Loss 4.6623   LearningRate 0.0306   Epoch: 8   Global Step: 50770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:29,158-Speed 3384.64 samples/sec   Loss 4.6471   LearningRate 0.0306   Epoch: 8   Global Step: 50780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:32,185-Speed 3383.63 samples/sec   Loss 4.6611   LearningRate 0.0306   Epoch: 8   Global Step: 50790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:35,229-Speed 3365.28 samples/sec   Loss 4.5634   LearningRate 0.0306   Epoch: 8   Global Step: 50800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:38,263-Speed 3375.09 samples/sec   Loss 4.5877   LearningRate 0.0306   Epoch: 8   Global Step: 50810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:41,294-Speed 3380.14 samples/sec   Loss 4.7116   LearningRate 0.0306   Epoch: 8   Global Step: 50820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:41:44,299-Speed 3408.73 samples/sec   Loss 4.6015   LearningRate 0.0306   Epoch: 8   Global Step: 50830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:47,327-Speed 3382.30 samples/sec   Loss 4.6773   LearningRate 0.0306   Epoch: 8   Global Step: 50840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:50,360-Speed 3377.09 samples/sec   Loss 4.6843   LearningRate 0.0306   Epoch: 8   Global Step: 50850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:53,384-Speed 3386.82 samples/sec   Loss 4.5665   LearningRate 0.0306   Epoch: 8   Global Step: 50860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:56,408-Speed 3386.23 samples/sec   Loss 4.5831   LearningRate 0.0305   Epoch: 8   Global Step: 50870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:41:59,431-Speed 3388.80 samples/sec   Loss 4.5979   LearningRate 0.0305   Epoch: 8   Global Step: 50880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:02,453-Speed 3389.32 samples/sec   Loss 4.7281   LearningRate 0.0305   Epoch: 8   Global Step: 50890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:05,478-Speed 3385.70 samples/sec   Loss 4.6249   LearningRate 0.0305   Epoch: 8   Global Step: 50900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:08,506-Speed 3382.37 samples/sec   Loss 4.8049   LearningRate 0.0305   Epoch: 8   Global Step: 50910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:11,594-Speed 3317.60 samples/sec   Loss 4.6824   LearningRate 0.0305   Epoch: 8   Global Step: 50920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:14,652-Speed 3350.26 samples/sec   Loss 4.5421   LearningRate 0.0305   Epoch: 8   Global Step: 50930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:42:17,687-Speed 3374.75 samples/sec   Loss 4.5820   LearningRate 0.0305   Epoch: 8   Global Step: 50940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:42:20,704-Speed 3394.75 samples/sec   Loss 4.6236   LearningRate 0.0305   Epoch: 8   Global Step: 50950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:23,730-Speed 3383.99 samples/sec   Loss 4.6677   LearningRate 0.0305   Epoch: 8   Global Step: 50960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:26,751-Speed 3390.14 samples/sec   Loss 4.6285   LearningRate 0.0304   Epoch: 8   Global Step: 50970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:29,781-Speed 3380.19 samples/sec   Loss 4.7144   LearningRate 0.0304   Epoch: 8   Global Step: 50980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:32,812-Speed 3380.21 samples/sec   Loss 4.5745   LearningRate 0.0304   Epoch: 8   Global Step: 50990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:35,928-Speed 3287.54 samples/sec   Loss 4.5936   LearningRate 0.0304   Epoch: 8   Global Step: 51000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:38,963-Speed 3374.71 samples/sec   Loss 4.4620   LearningRate 0.0304   Epoch: 8   Global Step: 51010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:41,995-Speed 3377.08 samples/sec   Loss 4.5882   LearningRate 0.0304   Epoch: 8   Global Step: 51020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:45,021-Speed 3385.24 samples/sec   Loss 4.5034   LearningRate 0.0304   Epoch: 8   Global Step: 51030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:48,045-Speed 3386.98 samples/sec   Loss 4.4007   LearningRate 0.0304   Epoch: 8   Global Step: 51040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:51,073-Speed 3383.29 samples/sec   Loss 4.5396   LearningRate 0.0304   Epoch: 8   Global Step: 51050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:42:54,103-Speed 3380.01 samples/sec   Loss 4.6591   LearningRate 0.0304   Epoch: 8   Global Step: 51060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:42:57,131-Speed 3382.25 samples/sec   Loss 4.7308   LearningRate 0.0304   Epoch: 8   Global Step: 51070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:43:00,167-Speed 3373.51 samples/sec   Loss 4.6166   LearningRate 0.0303   Epoch: 8   Global Step: 51080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:43:03,214-Speed 3363.03 samples/sec   Loss 4.6559   LearningRate 0.0303   Epoch: 8   Global Step: 51090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:43:06,241-Speed 3382.73 samples/sec   Loss 4.6858   LearningRate 0.0303   Epoch: 8   Global Step: 51100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:43:09,287-Speed 3363.25 samples/sec   Loss 4.5644   LearningRate 0.0303   Epoch: 8   Global Step: 51110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:43:12,316-Speed 3381.18 samples/sec   Loss 4.5872   LearningRate 0.0303   Epoch: 8   Global Step: 51120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:43:15,341-Speed 3385.98 samples/sec   Loss 4.5695   LearningRate 0.0303   Epoch: 8   Global Step: 51130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:43:18,373-Speed 3377.59 samples/sec   Loss 4.6471   LearningRate 0.0303   Epoch: 8   Global Step: 51140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:43:21,405-Speed 3378.20 samples/sec   Loss 4.4873   LearningRate 0.0303   Epoch: 8   Global Step: 51150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:43:24,528-Speed 3279.88 samples/sec   Loss 4.6184   LearningRate 0.0303   Epoch: 8   Global Step: 51160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:43:27,788-Speed 3141.50 samples/sec   Loss 4.6812   LearningRate 0.0303   Epoch: 8   Global Step: 51170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:43:41,358-Speed 754.69 samples/sec   Loss 4.2444   LearningRate 0.0302   Epoch: 9   Global Step: 51180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:43:44,389-Speed 3378.97 samples/sec   Loss 3.9491   LearningRate 0.0302   Epoch: 9   Global Step: 51190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:43:47,420-Speed 3380.21 samples/sec   Loss 3.9316   LearningRate 0.0302   Epoch: 9   Global Step: 51200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:43:50,447-Speed 3382.62 samples/sec   Loss 3.9513   LearningRate 0.0302   Epoch: 9   Global Step: 51210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:43:53,483-Speed 3374.14 samples/sec   Loss 4.0236   LearningRate 0.0302   Epoch: 9   Global Step: 51220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:43:56,488-Speed 3408.14 samples/sec   Loss 3.9332   LearningRate 0.0302   Epoch: 9   Global Step: 51230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:43:59,521-Speed 3377.69 samples/sec   Loss 4.0510   LearningRate 0.0302   Epoch: 9   Global Step: 51240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:44:02,548-Speed 3383.27 samples/sec   Loss 3.9494   LearningRate 0.0302   Epoch: 9   Global Step: 51250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:44:05,578-Speed 3381.00 samples/sec   Loss 4.0751   LearningRate 0.0302   Epoch: 9   Global Step: 51260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:44:08,616-Speed 3372.02 samples/sec   Loss 4.1238   LearningRate 0.0302   Epoch: 9   Global Step: 51270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:44:11,807-Speed 3209.66 samples/sec   Loss 4.0182   LearningRate 0.0301   Epoch: 9   Global Step: 51280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:44:14,821-Speed 3398.34 samples/sec   Loss 4.0171   LearningRate 0.0301   Epoch: 9   Global Step: 51290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:44:17,857-Speed 3372.85 samples/sec   Loss 3.8868   LearningRate 0.0301   Epoch: 9   Global Step: 51300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:44:20,890-Speed 3377.57 samples/sec   Loss 4.1247   LearningRate 0.0301   Epoch: 9   Global Step: 51310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:44:23,937-Speed 3361.71 samples/sec   Loss 4.1162   LearningRate 0.0301   Epoch: 9   Global Step: 51320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:44:27,001-Speed 3342.62 samples/sec   Loss 4.1366   LearningRate 0.0301   Epoch: 9   Global Step: 51330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:44:30,064-Speed 3343.58 samples/sec   Loss 4.0292   LearningRate 0.0301   Epoch: 9   Global Step: 51340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:44:33,099-Speed 3374.68 samples/sec   Loss 4.0876   LearningRate 0.0301   Epoch: 9   Global Step: 51350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:44:36,161-Speed 3344.92 samples/sec   Loss 3.9444   LearningRate 0.0301   Epoch: 9   Global Step: 51360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:44:39,199-Speed 3372.14 samples/sec   Loss 4.1814   LearningRate 0.0301   Epoch: 9   Global Step: 51370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:44:42,234-Speed 3374.42 samples/sec   Loss 4.1649   LearningRate 0.0301   Epoch: 9   Global Step: 51380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:44:45,272-Speed 3371.39 samples/sec   Loss 4.1309   LearningRate 0.0300   Epoch: 9   Global Step: 51390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:44:48,310-Speed 3370.88 samples/sec   Loss 4.0589   LearningRate 0.0300   Epoch: 9   Global Step: 51400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:44:51,351-Speed 3368.55 samples/sec   Loss 4.1332   LearningRate 0.0300   Epoch: 9   Global Step: 51410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:44:54,396-Speed 3363.86 samples/sec   Loss 4.1146   LearningRate 0.0300   Epoch: 9   Global Step: 51420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:44:57,440-Speed 3365.00 samples/sec   Loss 4.0904   LearningRate 0.0300   Epoch: 9   Global Step: 51430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:45:00,477-Speed 3372.25 samples/sec   Loss 4.2602   LearningRate 0.0300   Epoch: 9   Global Step: 51440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:45:03,526-Speed 3359.65 samples/sec   Loss 4.3176   LearningRate 0.0300   Epoch: 9   Global Step: 51450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:45:06,558-Speed 3377.32 samples/sec   Loss 4.0927   LearningRate 0.0300   Epoch: 9   Global Step: 51460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:45:09,590-Speed 3377.97 samples/sec   Loss 4.1932   LearningRate 0.0300   Epoch: 9   Global Step: 51470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:45:12,619-Speed 3381.63 samples/sec   Loss 4.0898   LearningRate 0.0300   Epoch: 9   Global Step: 51480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:45:15,651-Speed 3377.99 samples/sec   Loss 4.1638   LearningRate 0.0299   Epoch: 9   Global Step: 51490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:45:18,681-Speed 3380.90 samples/sec   Loss 4.0821   LearningRate 0.0299   Epoch: 9   Global Step: 51500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:45:21,707-Speed 3384.89 samples/sec   Loss 4.1142   LearningRate 0.0299   Epoch: 9   Global Step: 51510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:45:24,751-Speed 3364.70 samples/sec   Loss 4.1846   LearningRate 0.0299   Epoch: 9   Global Step: 51520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:45:27,781-Speed 3379.87 samples/sec   Loss 4.1571   LearningRate 0.0299   Epoch: 9   Global Step: 51530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:45:30,806-Speed 3386.74 samples/sec   Loss 4.1864   LearningRate 0.0299   Epoch: 9   Global Step: 51540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:45:33,836-Speed 3379.89 samples/sec   Loss 4.2325   LearningRate 0.0299   Epoch: 9   Global Step: 51550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:45:36,846-Speed 3402.63 samples/sec   Loss 4.2154   LearningRate 0.0299   Epoch: 9   Global Step: 51560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:45:39,873-Speed 3383.40 samples/sec   Loss 4.1080   LearningRate 0.0299   Epoch: 9   Global Step: 51570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:45:42,894-Speed 3390.79 samples/sec   Loss 4.3035   LearningRate 0.0299   Epoch: 9   Global Step: 51580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:45:45,929-Speed 3375.34 samples/sec   Loss 4.0810   LearningRate 0.0298   Epoch: 9   Global Step: 51590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:45:48,969-Speed 3369.27 samples/sec   Loss 4.1443   LearningRate 0.0298   Epoch: 9   Global Step: 51600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:45:51,991-Speed 3389.33 samples/sec   Loss 4.1801   LearningRate 0.0298   Epoch: 9   Global Step: 51610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:45:55,012-Speed 3389.55 samples/sec   Loss 4.2547   LearningRate 0.0298   Epoch: 9   Global Step: 51620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:45:58,042-Speed 3380.63 samples/sec   Loss 4.2537   LearningRate 0.0298   Epoch: 9   Global Step: 51630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:46:01,068-Speed 3384.86 samples/sec   Loss 4.1832   LearningRate 0.0298   Epoch: 9   Global Step: 51640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:46:04,088-Speed 3391.39 samples/sec   Loss 4.2986   LearningRate 0.0298   Epoch: 9   Global Step: 51650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:46:07,113-Speed 3386.09 samples/sec   Loss 4.3070   LearningRate 0.0298   Epoch: 9   Global Step: 51660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:46:10,144-Speed 3379.06 samples/sec   Loss 4.1961   LearningRate 0.0298   Epoch: 9   Global Step: 51670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:46:13,216-Speed 3335.00 samples/sec   Loss 4.3117   LearningRate 0.0298   Epoch: 9   Global Step: 51680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:46:16,259-Speed 3365.04 samples/sec   Loss 4.2540   LearningRate 0.0298   Epoch: 9   Global Step: 51690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:46:19,284-Speed 3386.28 samples/sec   Loss 4.3453   LearningRate 0.0297   Epoch: 9   Global Step: 51700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:46:22,320-Speed 3373.55 samples/sec   Loss 4.3162   LearningRate 0.0297   Epoch: 9   Global Step: 51710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:46:25,342-Speed 3389.11 samples/sec   Loss 4.3037   LearningRate 0.0297   Epoch: 9   Global Step: 51720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:46:28,366-Speed 3386.71 samples/sec   Loss 4.1786   LearningRate 0.0297   Epoch: 9   Global Step: 51730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:46:31,392-Speed 3385.16 samples/sec   Loss 4.2063   LearningRate 0.0297   Epoch: 9   Global Step: 51740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:46:34,455-Speed 3344.11 samples/sec   Loss 4.3147   LearningRate 0.0297   Epoch: 9   Global Step: 51750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:46:37,576-Speed 3281.57 samples/sec   Loss 4.2571   LearningRate 0.0297   Epoch: 9   Global Step: 51760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:46:40,604-Speed 3383.08 samples/sec   Loss 4.3321   LearningRate 0.0297   Epoch: 9   Global Step: 51770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:46:43,610-Speed 3406.90 samples/sec   Loss 4.2197   LearningRate 0.0297   Epoch: 9   Global Step: 51780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:46:46,638-Speed 3382.44 samples/sec   Loss 4.3020   LearningRate 0.0297   Epoch: 9   Global Step: 51790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:46:49,660-Speed 3389.26 samples/sec   Loss 4.1520   LearningRate 0.0296   Epoch: 9   Global Step: 51800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:46:52,685-Speed 3386.27 samples/sec   Loss 4.1581   LearningRate 0.0296   Epoch: 9   Global Step: 51810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:46:55,711-Speed 3384.72 samples/sec   Loss 4.3755   LearningRate 0.0296   Epoch: 9   Global Step: 51820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:46:58,743-Speed 3377.28 samples/sec   Loss 4.2931   LearningRate 0.0296   Epoch: 9   Global Step: 51830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:47:01,767-Speed 3387.86 samples/sec   Loss 4.2173   LearningRate 0.0296   Epoch: 9   Global Step: 51840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:47:04,800-Speed 3376.88 samples/sec   Loss 4.2718   LearningRate 0.0296   Epoch: 9   Global Step: 51850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:47:07,826-Speed 3384.85 samples/sec   Loss 4.2098   LearningRate 0.0296   Epoch: 9   Global Step: 51860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:47:10,862-Speed 3373.66 samples/sec   Loss 4.2455   LearningRate 0.0296   Epoch: 9   Global Step: 51870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:47:13,897-Speed 3374.98 samples/sec   Loss 4.1613   LearningRate 0.0296   Epoch: 9   Global Step: 51880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:47:16,920-Speed 3387.46 samples/sec   Loss 4.1937   LearningRate 0.0296   Epoch: 9   Global Step: 51890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:47:19,945-Speed 3386.19 samples/sec   Loss 4.1868   LearningRate 0.0296   Epoch: 9   Global Step: 51900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:47:22,977-Speed 3378.73 samples/sec   Loss 4.3283   LearningRate 0.0295   Epoch: 9   Global Step: 51910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:47:26,016-Speed 3371.43 samples/sec   Loss 4.3324   LearningRate 0.0295   Epoch: 9   Global Step: 51920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:47:29,081-Speed 3341.85 samples/sec   Loss 4.3743   LearningRate 0.0295   Epoch: 9   Global Step: 51930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:47:32,083-Speed 3411.14 samples/sec   Loss 4.3347   LearningRate 0.0295   Epoch: 9   Global Step: 51940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:47:35,124-Speed 3368.21 samples/sec   Loss 4.3834   LearningRate 0.0295   Epoch: 9   Global Step: 51950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:47:38,147-Speed 3388.72 samples/sec   Loss 4.2834   LearningRate 0.0295   Epoch: 9   Global Step: 51960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:47:41,167-Speed 3390.95 samples/sec   Loss 4.3462   LearningRate 0.0295   Epoch: 9   Global Step: 51970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:47:44,191-Speed 3387.50 samples/sec   Loss 4.3469   LearningRate 0.0295   Epoch: 9   Global Step: 51980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:47:47,214-Speed 3387.50 samples/sec   Loss 4.2689   LearningRate 0.0295   Epoch: 9   Global Step: 51990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:47:50,264-Speed 3358.52 samples/sec   Loss 4.4010   LearningRate 0.0295   Epoch: 9   Global Step: 52000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:48:33,509-[lfw][52000]XNorm: 21.903268
Training: 2022-04-27 06:48:33,509-[lfw][52000]Accuracy-Flip: 0.99700+-0.00277
Training: 2022-04-27 06:48:33,510-[lfw][52000]Accuracy-Highest: 0.99817
Training: 2022-04-27 06:49:23,737-[cfp_fp][52000]XNorm: 19.901404
Training: 2022-04-27 06:49:23,738-[cfp_fp][52000]Accuracy-Flip: 0.96114+-0.01006
Training: 2022-04-27 06:49:23,738-[cfp_fp][52000]Accuracy-Highest: 0.96414
Training: 2022-04-27 06:50:06,894-[agedb_30][52000]XNorm: 22.019052
Training: 2022-04-27 06:50:06,894-[agedb_30][52000]Accuracy-Flip: 0.97733+-0.00727
Training: 2022-04-27 06:50:06,895-[agedb_30][52000]Accuracy-Highest: 0.97767
Training: 2022-04-27 06:50:09,907-Speed 73.33 samples/sec   Loss 4.2374   LearningRate 0.0294   Epoch: 9   Global Step: 52010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:50:12,930-Speed 3387.37 samples/sec   Loss 4.3676   LearningRate 0.0294   Epoch: 9   Global Step: 52020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:50:15,943-Speed 3399.94 samples/sec   Loss 4.3944   LearningRate 0.0294   Epoch: 9   Global Step: 52030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:50:18,954-Speed 3401.63 samples/sec   Loss 4.2378   LearningRate 0.0294   Epoch: 9   Global Step: 52040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:50:21,950-Speed 3418.91 samples/sec   Loss 4.2509   LearningRate 0.0294   Epoch: 9   Global Step: 52050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:50:24,963-Speed 3398.99 samples/sec   Loss 4.3423   LearningRate 0.0294   Epoch: 9   Global Step: 52060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:50:27,996-Speed 3377.08 samples/sec   Loss 4.1853   LearningRate 0.0294   Epoch: 9   Global Step: 52070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:50:31,024-Speed 3382.74 samples/sec   Loss 4.3436   LearningRate 0.0294   Epoch: 9   Global Step: 52080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:50:34,051-Speed 3383.31 samples/sec   Loss 4.3415   LearningRate 0.0294   Epoch: 9   Global Step: 52090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:50:37,069-Speed 3394.39 samples/sec   Loss 4.4282   LearningRate 0.0294   Epoch: 9   Global Step: 52100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:50:40,094-Speed 3385.51 samples/sec   Loss 4.2253   LearningRate 0.0294   Epoch: 9   Global Step: 52110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:50:43,111-Speed 3394.74 samples/sec   Loss 4.4339   LearningRate 0.0293   Epoch: 9   Global Step: 52120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:50:46,126-Speed 3397.47 samples/sec   Loss 4.3372   LearningRate 0.0293   Epoch: 9   Global Step: 52130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:50:49,147-Speed 3390.74 samples/sec   Loss 4.1987   LearningRate 0.0293   Epoch: 9   Global Step: 52140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:50:52,161-Speed 3398.33 samples/sec   Loss 4.2445   LearningRate 0.0293   Epoch: 9   Global Step: 52150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:50:55,180-Speed 3391.97 samples/sec   Loss 4.1878   LearningRate 0.0293   Epoch: 9   Global Step: 52160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:50:58,196-Speed 3396.83 samples/sec   Loss 4.4711   LearningRate 0.0293   Epoch: 9   Global Step: 52170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:01,208-Speed 3400.09 samples/sec   Loss 4.2545   LearningRate 0.0293   Epoch: 9   Global Step: 52180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:04,227-Speed 3392.66 samples/sec   Loss 4.3952   LearningRate 0.0293   Epoch: 9   Global Step: 52190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:07,240-Speed 3399.56 samples/sec   Loss 4.3538   LearningRate 0.0293   Epoch: 9   Global Step: 52200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:10,252-Speed 3400.16 samples/sec   Loss 4.2654   LearningRate 0.0293   Epoch: 9   Global Step: 52210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:13,267-Speed 3397.94 samples/sec   Loss 4.2496   LearningRate 0.0292   Epoch: 9   Global Step: 52220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:16,282-Speed 3397.39 samples/sec   Loss 4.2416   LearningRate 0.0292   Epoch: 9   Global Step: 52230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:19,304-Speed 3389.34 samples/sec   Loss 4.3083   LearningRate 0.0292   Epoch: 9   Global Step: 52240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:22,317-Speed 3398.34 samples/sec   Loss 4.3677   LearningRate 0.0292   Epoch: 9   Global Step: 52250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:25,350-Speed 3377.51 samples/sec   Loss 4.3335   LearningRate 0.0292   Epoch: 9   Global Step: 52260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:51:28,386-Speed 3373.61 samples/sec   Loss 4.3961   LearningRate 0.0292   Epoch: 9   Global Step: 52270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:51:31,402-Speed 3396.01 samples/sec   Loss 4.4022   LearningRate 0.0292   Epoch: 9   Global Step: 52280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:51:34,410-Speed 3404.94 samples/sec   Loss 4.3228   LearningRate 0.0292   Epoch: 9   Global Step: 52290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:37,440-Speed 3380.87 samples/sec   Loss 4.3257   LearningRate 0.0292   Epoch: 9   Global Step: 52300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:40,463-Speed 3387.85 samples/sec   Loss 4.3801   LearningRate 0.0292   Epoch: 9   Global Step: 52310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:43,487-Speed 3386.57 samples/sec   Loss 4.4441   LearningRate 0.0292   Epoch: 9   Global Step: 52320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:46,539-Speed 3356.47 samples/sec   Loss 4.3150   LearningRate 0.0291   Epoch: 9   Global Step: 52330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:49,565-Speed 3385.31 samples/sec   Loss 4.4193   LearningRate 0.0291   Epoch: 9   Global Step: 52340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:52,597-Speed 3377.79 samples/sec   Loss 4.4371   LearningRate 0.0291   Epoch: 9   Global Step: 52350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:55,619-Speed 3389.62 samples/sec   Loss 4.4522   LearningRate 0.0291   Epoch: 9   Global Step: 52360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:51:58,652-Speed 3377.46 samples/sec   Loss 4.3051   LearningRate 0.0291   Epoch: 9   Global Step: 52370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:52:01,671-Speed 3392.14 samples/sec   Loss 4.3131   LearningRate 0.0291   Epoch: 9   Global Step: 52380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:52:04,668-Speed 3417.63 samples/sec   Loss 4.4158   LearningRate 0.0291   Epoch: 9   Global Step: 52390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:52:07,681-Speed 3400.08 samples/sec   Loss 4.3667   LearningRate 0.0291   Epoch: 9   Global Step: 52400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:52:10,701-Speed 3391.56 samples/sec   Loss 4.4086   LearningRate 0.0291   Epoch: 9   Global Step: 52410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:52:13,734-Speed 3376.76 samples/sec   Loss 4.3680   LearningRate 0.0291   Epoch: 9   Global Step: 52420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:52:16,763-Speed 3381.72 samples/sec   Loss 4.2563   LearningRate 0.0290   Epoch: 9   Global Step: 52430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:52:19,777-Speed 3397.90 samples/sec   Loss 4.3706   LearningRate 0.0290   Epoch: 9   Global Step: 52440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:52:22,797-Speed 3392.48 samples/sec   Loss 4.4327   LearningRate 0.0290   Epoch: 9   Global Step: 52450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:52:25,818-Speed 3390.16 samples/sec   Loss 4.2886   LearningRate 0.0290   Epoch: 9   Global Step: 52460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:52:28,831-Speed 3399.16 samples/sec   Loss 4.4183   LearningRate 0.0290   Epoch: 9   Global Step: 52470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:52:31,857-Speed 3384.64 samples/sec   Loss 4.4325   LearningRate 0.0290   Epoch: 9   Global Step: 52480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:52:34,875-Speed 3394.52 samples/sec   Loss 4.3549   LearningRate 0.0290   Epoch: 9   Global Step: 52490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:52:37,887-Speed 3399.58 samples/sec   Loss 4.4503   LearningRate 0.0290   Epoch: 9   Global Step: 52500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:52:40,905-Speed 3394.04 samples/sec   Loss 4.3763   LearningRate 0.0290   Epoch: 9   Global Step: 52510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:52:43,925-Speed 3391.86 samples/sec   Loss 4.3093   LearningRate 0.0290   Epoch: 9   Global Step: 52520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:52:46,954-Speed 3381.57 samples/sec   Loss 4.2112   LearningRate 0.0290   Epoch: 9   Global Step: 52530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:52:49,980-Speed 3384.69 samples/sec   Loss 4.3244   LearningRate 0.0289   Epoch: 9   Global Step: 52540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:52:53,001-Speed 3390.51 samples/sec   Loss 4.2966   LearningRate 0.0289   Epoch: 9   Global Step: 52550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:52:56,024-Speed 3388.05 samples/sec   Loss 4.4148   LearningRate 0.0289   Epoch: 9   Global Step: 52560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:52:59,075-Speed 3356.30 samples/sec   Loss 4.3399   LearningRate 0.0289   Epoch: 9   Global Step: 52570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:53:02,076-Speed 3414.06 samples/sec   Loss 4.5215   LearningRate 0.0289   Epoch: 9   Global Step: 52580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:53:05,130-Speed 3353.92 samples/sec   Loss 4.3615   LearningRate 0.0289   Epoch: 9   Global Step: 52590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:53:08,149-Speed 3391.83 samples/sec   Loss 4.3311   LearningRate 0.0289   Epoch: 9   Global Step: 52600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:53:11,168-Speed 3392.60 samples/sec   Loss 4.4031   LearningRate 0.0289   Epoch: 9   Global Step: 52610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:53:14,186-Speed 3394.55 samples/sec   Loss 4.3658   LearningRate 0.0289   Epoch: 9   Global Step: 52620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:53:17,206-Speed 3391.72 samples/sec   Loss 4.3588   LearningRate 0.0289   Epoch: 9   Global Step: 52630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:53:20,225-Speed 3392.78 samples/sec   Loss 4.3728   LearningRate 0.0288   Epoch: 9   Global Step: 52640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:53:23,247-Speed 3388.20 samples/sec   Loss 4.2872   LearningRate 0.0288   Epoch: 9   Global Step: 52650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:53:26,295-Speed 3360.80 samples/sec   Loss 4.2443   LearningRate 0.0288   Epoch: 9   Global Step: 52660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:53:29,310-Speed 3397.07 samples/sec   Loss 4.4560   LearningRate 0.0288   Epoch: 9   Global Step: 52670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:53:32,325-Speed 3398.05 samples/sec   Loss 4.3554   LearningRate 0.0288   Epoch: 9   Global Step: 52680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:53:35,343-Speed 3393.76 samples/sec   Loss 4.4101   LearningRate 0.0288   Epoch: 9   Global Step: 52690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:53:38,358-Speed 3396.71 samples/sec   Loss 4.2598   LearningRate 0.0288   Epoch: 9   Global Step: 52700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:53:41,375-Speed 3394.68 samples/sec   Loss 4.4112   LearningRate 0.0288   Epoch: 9   Global Step: 52710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:53:44,375-Speed 3414.25 samples/sec   Loss 4.4563   LearningRate 0.0288   Epoch: 9   Global Step: 52720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:53:47,392-Speed 3394.94 samples/sec   Loss 4.3732   LearningRate 0.0288   Epoch: 9   Global Step: 52730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:53:50,443-Speed 3356.99 samples/sec   Loss 4.3742   LearningRate 0.0288   Epoch: 9   Global Step: 52740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:53:53,454-Speed 3401.43 samples/sec   Loss 4.4620   LearningRate 0.0287   Epoch: 9   Global Step: 52750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:53:56,473-Speed 3393.18 samples/sec   Loss 4.2995   LearningRate 0.0287   Epoch: 9   Global Step: 52760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:53:59,506-Speed 3376.93 samples/sec   Loss 4.2298   LearningRate 0.0287   Epoch: 9   Global Step: 52770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:54:02,529-Speed 3388.35 samples/sec   Loss 4.4234   LearningRate 0.0287   Epoch: 9   Global Step: 52780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:54:05,547-Speed 3393.63 samples/sec   Loss 4.4280   LearningRate 0.0287   Epoch: 9   Global Step: 52790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:54:08,565-Speed 3393.55 samples/sec   Loss 4.3669   LearningRate 0.0287   Epoch: 9   Global Step: 52800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:54:11,602-Speed 3373.22 samples/sec   Loss 4.4404   LearningRate 0.0287   Epoch: 9   Global Step: 52810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:54:14,656-Speed 3353.24 samples/sec   Loss 4.2111   LearningRate 0.0287   Epoch: 9   Global Step: 52820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:54:17,676-Speed 3391.47 samples/sec   Loss 4.3757   LearningRate 0.0287   Epoch: 9   Global Step: 52830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:54:20,693-Speed 3395.79 samples/sec   Loss 4.3875   LearningRate 0.0287   Epoch: 9   Global Step: 52840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:54:23,714-Speed 3389.58 samples/sec   Loss 4.3108   LearningRate 0.0287   Epoch: 9   Global Step: 52850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:54:26,733-Speed 3393.11 samples/sec   Loss 4.3022   LearningRate 0.0286   Epoch: 9   Global Step: 52860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:54:29,751-Speed 3393.99 samples/sec   Loss 4.3955   LearningRate 0.0286   Epoch: 9   Global Step: 52870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:54:32,774-Speed 3387.47 samples/sec   Loss 4.3262   LearningRate 0.0286   Epoch: 9   Global Step: 52880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:54:35,799-Speed 3386.01 samples/sec   Loss 4.2047   LearningRate 0.0286   Epoch: 9   Global Step: 52890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:54:38,829-Speed 3381.13 samples/sec   Loss 4.3353   LearningRate 0.0286   Epoch: 9   Global Step: 52900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:54:41,855-Speed 3384.47 samples/sec   Loss 4.5362   LearningRate 0.0286   Epoch: 9   Global Step: 52910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:54:44,885-Speed 3380.16 samples/sec   Loss 4.3583   LearningRate 0.0286   Epoch: 9   Global Step: 52920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:54:47,928-Speed 3365.44 samples/sec   Loss 4.3489   LearningRate 0.0286   Epoch: 9   Global Step: 52930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:54:50,949-Speed 3391.17 samples/sec   Loss 4.3882   LearningRate 0.0286   Epoch: 9   Global Step: 52940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:54:53,955-Speed 3406.95 samples/sec   Loss 4.3880   LearningRate 0.0286   Epoch: 9   Global Step: 52950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:54:56,987-Speed 3378.07 samples/sec   Loss 4.3780   LearningRate 0.0285   Epoch: 9   Global Step: 52960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:55:00,025-Speed 3372.44 samples/sec   Loss 4.3893   LearningRate 0.0285   Epoch: 9   Global Step: 52970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:55:03,052-Speed 3383.84 samples/sec   Loss 4.3330   LearningRate 0.0285   Epoch: 9   Global Step: 52980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:55:06,073-Speed 3389.83 samples/sec   Loss 4.3230   LearningRate 0.0285   Epoch: 9   Global Step: 52990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:55:09,094-Speed 3390.43 samples/sec   Loss 4.3250   LearningRate 0.0285   Epoch: 9   Global Step: 53000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:55:12,116-Speed 3389.20 samples/sec   Loss 4.4346   LearningRate 0.0285   Epoch: 9   Global Step: 53010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:55:15,141-Speed 3386.23 samples/sec   Loss 4.3854   LearningRate 0.0285   Epoch: 9   Global Step: 53020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:55:18,160-Speed 3392.16 samples/sec   Loss 4.3507   LearningRate 0.0285   Epoch: 9   Global Step: 53030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:55:21,179-Speed 3393.57 samples/sec   Loss 4.3689   LearningRate 0.0285   Epoch: 9   Global Step: 53040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:55:24,218-Speed 3370.19 samples/sec   Loss 4.3781   LearningRate 0.0285   Epoch: 9   Global Step: 53050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:55:27,228-Speed 3403.06 samples/sec   Loss 4.3407   LearningRate 0.0285   Epoch: 9   Global Step: 53060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:55:30,233-Speed 3407.89 samples/sec   Loss 4.3553   LearningRate 0.0284   Epoch: 9   Global Step: 53070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:55:33,254-Speed 3390.81 samples/sec   Loss 4.2953   LearningRate 0.0284   Epoch: 9   Global Step: 53080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:55:36,280-Speed 3384.80 samples/sec   Loss 4.3938   LearningRate 0.0284   Epoch: 9   Global Step: 53090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:55:39,302-Speed 3388.31 samples/sec   Loss 4.6086   LearningRate 0.0284   Epoch: 9   Global Step: 53100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:55:42,328-Speed 3385.05 samples/sec   Loss 4.3966   LearningRate 0.0284   Epoch: 9   Global Step: 53110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:55:45,351-Speed 3388.84 samples/sec   Loss 4.2987   LearningRate 0.0284   Epoch: 9   Global Step: 53120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:55:48,383-Speed 3378.23 samples/sec   Loss 4.3395   LearningRate 0.0284   Epoch: 9   Global Step: 53130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:55:51,414-Speed 3378.62 samples/sec   Loss 4.2881   LearningRate 0.0284   Epoch: 9   Global Step: 53140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:55:54,446-Speed 3378.22 samples/sec   Loss 4.3306   LearningRate 0.0284   Epoch: 9   Global Step: 53150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:55:57,488-Speed 3367.04 samples/sec   Loss 4.4146   LearningRate 0.0284   Epoch: 9   Global Step: 53160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 06:56:00,545-Speed 3350.60 samples/sec   Loss 4.3530   LearningRate 0.0284   Epoch: 9   Global Step: 53170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:03,570-Speed 3386.26 samples/sec   Loss 4.3009   LearningRate 0.0283   Epoch: 9   Global Step: 53180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:06,597-Speed 3383.46 samples/sec   Loss 4.3934   LearningRate 0.0283   Epoch: 9   Global Step: 53190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:09,632-Speed 3375.38 samples/sec   Loss 4.3763   LearningRate 0.0283   Epoch: 9   Global Step: 53200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:12,704-Speed 3334.44 samples/sec   Loss 4.3837   LearningRate 0.0283   Epoch: 9   Global Step: 53210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:15,732-Speed 3382.20 samples/sec   Loss 4.4175   LearningRate 0.0283   Epoch: 9   Global Step: 53220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:18,757-Speed 3386.23 samples/sec   Loss 4.2644   LearningRate 0.0283   Epoch: 9   Global Step: 53230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:21,784-Speed 3382.71 samples/sec   Loss 4.3891   LearningRate 0.0283   Epoch: 9   Global Step: 53240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:24,812-Speed 3382.84 samples/sec   Loss 4.4065   LearningRate 0.0283   Epoch: 9   Global Step: 53250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:27,835-Speed 3388.79 samples/sec   Loss 4.4310   LearningRate 0.0283   Epoch: 9   Global Step: 53260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:30,869-Speed 3375.63 samples/sec   Loss 4.4201   LearningRate 0.0283   Epoch: 9   Global Step: 53270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:56:33,899-Speed 3379.99 samples/sec   Loss 4.2845   LearningRate 0.0282   Epoch: 9   Global Step: 53280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:56:36,907-Speed 3405.49 samples/sec   Loss 4.3841   LearningRate 0.0282   Epoch: 9   Global Step: 53290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:39,928-Speed 3390.87 samples/sec   Loss 4.3909   LearningRate 0.0282   Epoch: 9   Global Step: 53300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:42,949-Speed 3389.88 samples/sec   Loss 4.2943   LearningRate 0.0282   Epoch: 9   Global Step: 53310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:45,992-Speed 3365.88 samples/sec   Loss 4.3961   LearningRate 0.0282   Epoch: 9   Global Step: 53320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:49,016-Speed 3387.35 samples/sec   Loss 4.3702   LearningRate 0.0282   Epoch: 9   Global Step: 53330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:52,044-Speed 3382.32 samples/sec   Loss 4.4420   LearningRate 0.0282   Epoch: 9   Global Step: 53340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:55,069-Speed 3386.84 samples/sec   Loss 4.2676   LearningRate 0.0282   Epoch: 9   Global Step: 53350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:56:58,098-Speed 3381.02 samples/sec   Loss 4.3537   LearningRate 0.0282   Epoch: 9   Global Step: 53360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:57:01,123-Speed 3386.15 samples/sec   Loss 4.3899   LearningRate 0.0282   Epoch: 9   Global Step: 53370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:57:04,158-Speed 3373.95 samples/sec   Loss 4.2935   LearningRate 0.0282   Epoch: 9   Global Step: 53380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:57:07,167-Speed 3404.75 samples/sec   Loss 4.3133   LearningRate 0.0281   Epoch: 9   Global Step: 53390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:57:10,199-Speed 3377.60 samples/sec   Loss 4.2814   LearningRate 0.0281   Epoch: 9   Global Step: 53400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:57:13,349-Speed 3251.84 samples/sec   Loss 4.3382   LearningRate 0.0281   Epoch: 9   Global Step: 53410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:57:16,450-Speed 3303.24 samples/sec   Loss 4.4204   LearningRate 0.0281   Epoch: 9   Global Step: 53420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:57:19,480-Speed 3379.43 samples/sec   Loss 4.3067   LearningRate 0.0281   Epoch: 9   Global Step: 53430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:57:22,507-Speed 3384.10 samples/sec   Loss 4.3478   LearningRate 0.0281   Epoch: 9   Global Step: 53440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:57:25,565-Speed 3349.78 samples/sec   Loss 4.4179   LearningRate 0.0281   Epoch: 9   Global Step: 53450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:57:28,608-Speed 3365.24 samples/sec   Loss 4.2670   LearningRate 0.0281   Epoch: 9   Global Step: 53460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:57:31,639-Speed 3379.46 samples/sec   Loss 4.3672   LearningRate 0.0281   Epoch: 9   Global Step: 53470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:57:34,660-Speed 3390.22 samples/sec   Loss 4.3792   LearningRate 0.0281   Epoch: 9   Global Step: 53480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:57:37,691-Speed 3379.45 samples/sec   Loss 4.2931   LearningRate 0.0281   Epoch: 9   Global Step: 53490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:57:40,713-Speed 3389.44 samples/sec   Loss 4.4596   LearningRate 0.0280   Epoch: 9   Global Step: 53500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:57:43,753-Speed 3370.80 samples/sec   Loss 4.2959   LearningRate 0.0280   Epoch: 9   Global Step: 53510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:57:46,799-Speed 3362.84 samples/sec   Loss 4.3929   LearningRate 0.0280   Epoch: 9   Global Step: 53520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:57:49,841-Speed 3366.03 samples/sec   Loss 4.3248   LearningRate 0.0280   Epoch: 9   Global Step: 53530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:57:52,895-Speed 3353.81 samples/sec   Loss 4.3701   LearningRate 0.0280   Epoch: 9   Global Step: 53540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:57:55,918-Speed 3388.38 samples/sec   Loss 4.3664   LearningRate 0.0280   Epoch: 9   Global Step: 53550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:57:58,947-Speed 3381.58 samples/sec   Loss 4.3561   LearningRate 0.0280   Epoch: 9   Global Step: 53560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:58:01,979-Speed 3378.58 samples/sec   Loss 4.2904   LearningRate 0.0280   Epoch: 9   Global Step: 53570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:58:04,988-Speed 3404.39 samples/sec   Loss 4.3477   LearningRate 0.0280   Epoch: 9   Global Step: 53580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:58:08,061-Speed 3332.51 samples/sec   Loss 4.3104   LearningRate 0.0280   Epoch: 9   Global Step: 53590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:58:11,084-Speed 3388.05 samples/sec   Loss 4.4070   LearningRate 0.0279   Epoch: 9   Global Step: 53600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:58:14,111-Speed 3383.79 samples/sec   Loss 4.3763   LearningRate 0.0279   Epoch: 9   Global Step: 53610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:58:17,141-Speed 3380.18 samples/sec   Loss 4.2558   LearningRate 0.0279   Epoch: 9   Global Step: 53620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:58:20,177-Speed 3374.32 samples/sec   Loss 4.2618   LearningRate 0.0279   Epoch: 9   Global Step: 53630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:58:23,203-Speed 3384.11 samples/sec   Loss 4.4127   LearningRate 0.0279   Epoch: 9   Global Step: 53640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:58:26,236-Speed 3376.65 samples/sec   Loss 4.3034   LearningRate 0.0279   Epoch: 9   Global Step: 53650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:58:29,262-Speed 3385.88 samples/sec   Loss 4.4528   LearningRate 0.0279   Epoch: 9   Global Step: 53660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:58:32,289-Speed 3385.05 samples/sec   Loss 4.2986   LearningRate 0.0279   Epoch: 9   Global Step: 53670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:58:35,318-Speed 3381.17 samples/sec   Loss 4.3308   LearningRate 0.0279   Epoch: 9   Global Step: 53680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:58:38,346-Speed 3383.16 samples/sec   Loss 4.3597   LearningRate 0.0279   Epoch: 9   Global Step: 53690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:58:41,368-Speed 3389.25 samples/sec   Loss 4.3236   LearningRate 0.0279   Epoch: 9   Global Step: 53700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:58:44,390-Speed 3388.67 samples/sec   Loss 4.3179   LearningRate 0.0278   Epoch: 9   Global Step: 53710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:58:47,434-Speed 3364.80 samples/sec   Loss 4.4990   LearningRate 0.0278   Epoch: 9   Global Step: 53720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:58:50,459-Speed 3386.02 samples/sec   Loss 4.3893   LearningRate 0.0278   Epoch: 9   Global Step: 53730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:58:53,483-Speed 3387.43 samples/sec   Loss 4.3631   LearningRate 0.0278   Epoch: 9   Global Step: 53740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:58:56,507-Speed 3386.90 samples/sec   Loss 4.2816   LearningRate 0.0278   Epoch: 9   Global Step: 53750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:58:59,536-Speed 3381.03 samples/sec   Loss 4.4005   LearningRate 0.0278   Epoch: 9   Global Step: 53760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:59:02,576-Speed 3369.28 samples/sec   Loss 4.3501   LearningRate 0.0278   Epoch: 9   Global Step: 53770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:59:05,578-Speed 3411.94 samples/sec   Loss 4.3953   LearningRate 0.0278   Epoch: 9   Global Step: 53780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:59:08,610-Speed 3378.64 samples/sec   Loss 4.3797   LearningRate 0.0278   Epoch: 9   Global Step: 53790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:59:11,617-Speed 3405.62 samples/sec   Loss 4.4130   LearningRate 0.0278   Epoch: 9   Global Step: 53800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:59:14,641-Speed 3387.22 samples/sec   Loss 4.3759   LearningRate 0.0278   Epoch: 9   Global Step: 53810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:59:17,666-Speed 3385.53 samples/sec   Loss 4.4386   LearningRate 0.0277   Epoch: 9   Global Step: 53820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:59:20,712-Speed 3362.73 samples/sec   Loss 4.3209   LearningRate 0.0277   Epoch: 9   Global Step: 53830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:59:23,775-Speed 3344.10 samples/sec   Loss 4.2804   LearningRate 0.0277   Epoch: 9   Global Step: 53840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:59:26,807-Speed 3378.28 samples/sec   Loss 4.4310   LearningRate 0.0277   Epoch: 9   Global Step: 53850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:59:29,839-Speed 3378.43 samples/sec   Loss 4.4246   LearningRate 0.0277   Epoch: 9   Global Step: 53860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:59:32,863-Speed 3387.12 samples/sec   Loss 4.3882   LearningRate 0.0277   Epoch: 9   Global Step: 53870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:59:35,888-Speed 3385.95 samples/sec   Loss 4.2767   LearningRate 0.0277   Epoch: 9   Global Step: 53880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:59:38,934-Speed 3362.71 samples/sec   Loss 4.3118   LearningRate 0.0277   Epoch: 9   Global Step: 53890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:59:41,958-Speed 3386.26 samples/sec   Loss 4.3524   LearningRate 0.0277   Epoch: 9   Global Step: 53900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 06:59:44,970-Speed 3401.23 samples/sec   Loss 4.3272   LearningRate 0.0277   Epoch: 9   Global Step: 53910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:59:48,005-Speed 3374.65 samples/sec   Loss 4.3962   LearningRate 0.0277   Epoch: 9   Global Step: 53920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:59:51,060-Speed 3352.70 samples/sec   Loss 4.3802   LearningRate 0.0276   Epoch: 9   Global Step: 53930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:59:54,085-Speed 3386.47 samples/sec   Loss 4.2410   LearningRate 0.0276   Epoch: 9   Global Step: 53940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 06:59:57,104-Speed 3392.12 samples/sec   Loss 4.3804   LearningRate 0.0276   Epoch: 9   Global Step: 53950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:00:00,129-Speed 3386.30 samples/sec   Loss 4.2374   LearningRate 0.0276   Epoch: 9   Global Step: 53960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:00:03,158-Speed 3381.55 samples/sec   Loss 4.3346   LearningRate 0.0276   Epoch: 9   Global Step: 53970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:00:06,180-Speed 3388.62 samples/sec   Loss 4.2922   LearningRate 0.0276   Epoch: 9   Global Step: 53980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:00:09,204-Speed 3387.64 samples/sec   Loss 4.4302   LearningRate 0.0276   Epoch: 9   Global Step: 53990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:00:12,246-Speed 3367.27 samples/sec   Loss 4.1646   LearningRate 0.0276   Epoch: 9   Global Step: 54000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:00:55,857-[lfw][54000]XNorm: 21.680687
Training: 2022-04-27 07:00:55,857-[lfw][54000]Accuracy-Flip: 0.99750+-0.00281
Training: 2022-04-27 07:00:55,858-[lfw][54000]Accuracy-Highest: 0.99817
Training: 2022-04-27 07:01:46,613-[cfp_fp][54000]XNorm: 19.846197
Training: 2022-04-27 07:01:46,614-[cfp_fp][54000]Accuracy-Flip: 0.96171+-0.00805
Training: 2022-04-27 07:01:46,614-[cfp_fp][54000]Accuracy-Highest: 0.96414
Training: 2022-04-27 07:02:30,355-[agedb_30][54000]XNorm: 22.034334
Training: 2022-04-27 07:02:30,356-[agedb_30][54000]Accuracy-Flip: 0.97667+-0.00957
Training: 2022-04-27 07:02:30,356-[agedb_30][54000]Accuracy-Highest: 0.97767
Training: 2022-04-27 07:02:33,380-Speed 72.56 samples/sec   Loss 4.2218   LearningRate 0.0276   Epoch: 9   Global Step: 54010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:02:36,453-Speed 3332.90 samples/sec   Loss 4.3346   LearningRate 0.0276   Epoch: 9   Global Step: 54020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:02:39,461-Speed 3406.59 samples/sec   Loss 4.3016   LearningRate 0.0276   Epoch: 9   Global Step: 54030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:02:42,469-Speed 3404.17 samples/sec   Loss 4.3237   LearningRate 0.0275   Epoch: 9   Global Step: 54040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:02:45,480-Speed 3401.60 samples/sec   Loss 4.3612   LearningRate 0.0275   Epoch: 9   Global Step: 54050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:02:48,494-Speed 3398.34 samples/sec   Loss 4.3521   LearningRate 0.0275   Epoch: 9   Global Step: 54060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:02:51,496-Speed 3412.16 samples/sec   Loss 4.3628   LearningRate 0.0275   Epoch: 9   Global Step: 54070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:02:54,511-Speed 3396.80 samples/sec   Loss 4.2999   LearningRate 0.0275   Epoch: 9   Global Step: 54080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:02:57,525-Speed 3398.26 samples/sec   Loss 4.1906   LearningRate 0.0275   Epoch: 9   Global Step: 54090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:00,570-Speed 3363.68 samples/sec   Loss 4.4074   LearningRate 0.0275   Epoch: 9   Global Step: 54100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:03,596-Speed 3384.62 samples/sec   Loss 4.3339   LearningRate 0.0275   Epoch: 9   Global Step: 54110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:06,616-Speed 3392.02 samples/sec   Loss 4.2942   LearningRate 0.0275   Epoch: 9   Global Step: 54120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:09,638-Speed 3389.49 samples/sec   Loss 4.3163   LearningRate 0.0275   Epoch: 9   Global Step: 54130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:12,716-Speed 3327.66 samples/sec   Loss 4.3055   LearningRate 0.0274   Epoch: 9   Global Step: 54140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:15,742-Speed 3384.36 samples/sec   Loss 4.2419   LearningRate 0.0274   Epoch: 9   Global Step: 54150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:18,766-Speed 3386.58 samples/sec   Loss 4.5051   LearningRate 0.0274   Epoch: 9   Global Step: 54160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:21,772-Speed 3408.31 samples/sec   Loss 4.1993   LearningRate 0.0274   Epoch: 9   Global Step: 54170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:24,797-Speed 3385.52 samples/sec   Loss 4.3109   LearningRate 0.0274   Epoch: 9   Global Step: 54180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:27,817-Speed 3391.70 samples/sec   Loss 4.3283   LearningRate 0.0274   Epoch: 9   Global Step: 54190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:30,848-Speed 3379.35 samples/sec   Loss 4.3771   LearningRate 0.0274   Epoch: 9   Global Step: 54200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:33,866-Speed 3394.41 samples/sec   Loss 4.3270   LearningRate 0.0274   Epoch: 9   Global Step: 54210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:36,893-Speed 3383.53 samples/sec   Loss 4.3254   LearningRate 0.0274   Epoch: 9   Global Step: 54220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:39,908-Speed 3397.09 samples/sec   Loss 4.2814   LearningRate 0.0274   Epoch: 9   Global Step: 54230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:42,923-Speed 3396.81 samples/sec   Loss 4.4420   LearningRate 0.0274   Epoch: 9   Global Step: 54240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:45,935-Speed 3400.44 samples/sec   Loss 4.3677   LearningRate 0.0273   Epoch: 9   Global Step: 54250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:48,977-Speed 3366.50 samples/sec   Loss 4.3619   LearningRate 0.0273   Epoch: 9   Global Step: 54260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:03:52,080-Speed 3301.01 samples/sec   Loss 4.3800   LearningRate 0.0273   Epoch: 9   Global Step: 54270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:03:55,093-Speed 3399.79 samples/sec   Loss 4.3213   LearningRate 0.0273   Epoch: 9   Global Step: 54280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:03:58,113-Speed 3391.30 samples/sec   Loss 4.2484   LearningRate 0.0273   Epoch: 9   Global Step: 54290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:04:01,127-Speed 3399.21 samples/sec   Loss 4.4379   LearningRate 0.0273   Epoch: 9   Global Step: 54300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:04:04,138-Speed 3401.26 samples/sec   Loss 4.3128   LearningRate 0.0273   Epoch: 9   Global Step: 54310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:04:07,146-Speed 3405.49 samples/sec   Loss 4.2982   LearningRate 0.0273   Epoch: 9   Global Step: 54320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:04:10,156-Speed 3402.03 samples/sec   Loss 4.4614   LearningRate 0.0273   Epoch: 9   Global Step: 54330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:04:13,176-Speed 3392.24 samples/sec   Loss 4.3644   LearningRate 0.0273   Epoch: 9   Global Step: 54340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:04:16,188-Speed 3400.61 samples/sec   Loss 4.3456   LearningRate 0.0273   Epoch: 9   Global Step: 54350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:04:19,204-Speed 3394.94 samples/sec   Loss 4.2833   LearningRate 0.0272   Epoch: 9   Global Step: 54360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:04:22,214-Speed 3402.91 samples/sec   Loss 4.3343   LearningRate 0.0272   Epoch: 9   Global Step: 54370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:04:25,211-Speed 3417.53 samples/sec   Loss 4.2058   LearningRate 0.0272   Epoch: 9   Global Step: 54380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:04:28,226-Speed 3397.56 samples/sec   Loss 4.2919   LearningRate 0.0272   Epoch: 9   Global Step: 54390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:04:31,245-Speed 3392.78 samples/sec   Loss 4.2817   LearningRate 0.0272   Epoch: 9   Global Step: 54400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:04:34,255-Speed 3402.90 samples/sec   Loss 4.4432   LearningRate 0.0272   Epoch: 9   Global Step: 54410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:04:37,265-Speed 3402.64 samples/sec   Loss 4.2983   LearningRate 0.0272   Epoch: 9   Global Step: 54420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:04:40,278-Speed 3399.74 samples/sec   Loss 4.3382   LearningRate 0.0272   Epoch: 9   Global Step: 54430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:04:43,287-Speed 3403.08 samples/sec   Loss 4.2588   LearningRate 0.0272   Epoch: 9   Global Step: 54440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:04:46,321-Speed 3375.80 samples/sec   Loss 4.3614   LearningRate 0.0272   Epoch: 9   Global Step: 54450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:04:49,340-Speed 3393.48 samples/sec   Loss 4.3979   LearningRate 0.0272   Epoch: 9   Global Step: 54460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:04:52,356-Speed 3395.51 samples/sec   Loss 4.3758   LearningRate 0.0271   Epoch: 9   Global Step: 54470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:04:55,367-Speed 3402.77 samples/sec   Loss 4.3697   LearningRate 0.0271   Epoch: 9   Global Step: 54480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:04:58,374-Speed 3405.18 samples/sec   Loss 4.3013   LearningRate 0.0271   Epoch: 9   Global Step: 54490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:05:01,389-Speed 3397.82 samples/sec   Loss 4.3869   LearningRate 0.0271   Epoch: 9   Global Step: 54500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:05:04,402-Speed 3399.68 samples/sec   Loss 4.4337   LearningRate 0.0271   Epoch: 9   Global Step: 54510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:05:07,421-Speed 3392.33 samples/sec   Loss 4.2639   LearningRate 0.0271   Epoch: 9   Global Step: 54520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:05:10,434-Speed 3399.15 samples/sec   Loss 4.3624   LearningRate 0.0271   Epoch: 9   Global Step: 54530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:05:13,450-Speed 3396.30 samples/sec   Loss 4.3707   LearningRate 0.0271   Epoch: 9   Global Step: 54540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:05:16,463-Speed 3399.06 samples/sec   Loss 4.3165   LearningRate 0.0271   Epoch: 9   Global Step: 54550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:05:19,482-Speed 3393.12 samples/sec   Loss 4.2707   LearningRate 0.0271   Epoch: 9   Global Step: 54560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:05:22,494-Speed 3400.21 samples/sec   Loss 4.2454   LearningRate 0.0271   Epoch: 9   Global Step: 54570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:05:25,497-Speed 3411.02 samples/sec   Loss 4.2217   LearningRate 0.0270   Epoch: 9   Global Step: 54580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:05:28,570-Speed 3332.89 samples/sec   Loss 4.3156   LearningRate 0.0270   Epoch: 9   Global Step: 54590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:05:31,584-Speed 3398.84 samples/sec   Loss 4.4154   LearningRate 0.0270   Epoch: 9   Global Step: 54600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:05:34,608-Speed 3386.52 samples/sec   Loss 4.3240   LearningRate 0.0270   Epoch: 9   Global Step: 54610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:05:37,624-Speed 3396.19 samples/sec   Loss 4.3467   LearningRate 0.0270   Epoch: 9   Global Step: 54620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:05:40,640-Speed 3395.04 samples/sec   Loss 4.1863   LearningRate 0.0270   Epoch: 9   Global Step: 54630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:05:43,638-Speed 3416.86 samples/sec   Loss 4.2313   LearningRate 0.0270   Epoch: 9   Global Step: 54640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:05:46,682-Speed 3364.53 samples/sec   Loss 4.2856   LearningRate 0.0270   Epoch: 9   Global Step: 54650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:05:49,697-Speed 3397.51 samples/sec   Loss 4.3286   LearningRate 0.0270   Epoch: 9   Global Step: 54660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:05:52,711-Speed 3397.98 samples/sec   Loss 4.3271   LearningRate 0.0270   Epoch: 9   Global Step: 54670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:05:55,740-Speed 3382.73 samples/sec   Loss 4.3501   LearningRate 0.0270   Epoch: 9   Global Step: 54680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:05:58,758-Speed 3393.78 samples/sec   Loss 4.3860   LearningRate 0.0269   Epoch: 9   Global Step: 54690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:06:01,776-Speed 3393.06 samples/sec   Loss 4.3512   LearningRate 0.0269   Epoch: 9   Global Step: 54700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:06:04,794-Speed 3393.83 samples/sec   Loss 4.3096   LearningRate 0.0269   Epoch: 9   Global Step: 54710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:06:07,814-Speed 3391.90 samples/sec   Loss 4.1899   LearningRate 0.0269   Epoch: 9   Global Step: 54720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:06:10,839-Speed 3385.27 samples/sec   Loss 4.2933   LearningRate 0.0269   Epoch: 9   Global Step: 54730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:06:13,858-Speed 3393.27 samples/sec   Loss 4.3062   LearningRate 0.0269   Epoch: 9   Global Step: 54740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:06:16,872-Speed 3398.74 samples/sec   Loss 4.4593   LearningRate 0.0269   Epoch: 9   Global Step: 54750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:06:19,890-Speed 3394.06 samples/sec   Loss 4.2972   LearningRate 0.0269   Epoch: 9   Global Step: 54760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:06:22,905-Speed 3396.61 samples/sec   Loss 4.3656   LearningRate 0.0269   Epoch: 9   Global Step: 54770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:06:25,930-Speed 3385.83 samples/sec   Loss 4.2560   LearningRate 0.0269   Epoch: 9   Global Step: 54780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:06:28,946-Speed 3396.63 samples/sec   Loss 4.2182   LearningRate 0.0269   Epoch: 9   Global Step: 54790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:06:31,963-Speed 3394.67 samples/sec   Loss 4.3718   LearningRate 0.0268   Epoch: 9   Global Step: 54800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:06:34,991-Speed 3382.93 samples/sec   Loss 4.2160   LearningRate 0.0268   Epoch: 9   Global Step: 54810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:06:38,008-Speed 3394.70 samples/sec   Loss 4.2980   LearningRate 0.0268   Epoch: 9   Global Step: 54820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:06:41,027-Speed 3392.94 samples/sec   Loss 4.3150   LearningRate 0.0268   Epoch: 9   Global Step: 54830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:06:44,051-Speed 3387.28 samples/sec   Loss 4.2575   LearningRate 0.0268   Epoch: 9   Global Step: 54840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:06:47,053-Speed 3411.92 samples/sec   Loss 4.1367   LearningRate 0.0268   Epoch: 9   Global Step: 54850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:06:50,070-Speed 3394.36 samples/sec   Loss 4.2408   LearningRate 0.0268   Epoch: 9   Global Step: 54860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:06:53,087-Speed 3394.60 samples/sec   Loss 4.3069   LearningRate 0.0268   Epoch: 9   Global Step: 54870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:06:56,105-Speed 3393.92 samples/sec   Loss 4.4732   LearningRate 0.0268   Epoch: 9   Global Step: 54880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:06:59,121-Speed 3396.01 samples/sec   Loss 4.2248   LearningRate 0.0268   Epoch: 9   Global Step: 54890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:02,150-Speed 3382.12 samples/sec   Loss 4.4146   LearningRate 0.0268   Epoch: 9   Global Step: 54900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:05,179-Speed 3380.97 samples/sec   Loss 4.3455   LearningRate 0.0267   Epoch: 9   Global Step: 54910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:08,198-Speed 3392.50 samples/sec   Loss 4.2570   LearningRate 0.0267   Epoch: 9   Global Step: 54920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:11,225-Speed 3383.90 samples/sec   Loss 4.2080   LearningRate 0.0267   Epoch: 9   Global Step: 54930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:14,241-Speed 3395.87 samples/sec   Loss 4.2680   LearningRate 0.0267   Epoch: 9   Global Step: 54940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:17,242-Speed 3413.87 samples/sec   Loss 4.3176   LearningRate 0.0267   Epoch: 9   Global Step: 54950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:20,268-Speed 3384.75 samples/sec   Loss 4.2830   LearningRate 0.0267   Epoch: 9   Global Step: 54960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:23,294-Speed 3384.63 samples/sec   Loss 4.2258   LearningRate 0.0267   Epoch: 9   Global Step: 54970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:26,312-Speed 3392.98 samples/sec   Loss 4.2287   LearningRate 0.0267   Epoch: 9   Global Step: 54980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:29,332-Speed 3393.46 samples/sec   Loss 4.3132   LearningRate 0.0267   Epoch: 9   Global Step: 54990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:32,355-Speed 3388.24 samples/sec   Loss 4.3151   LearningRate 0.0267   Epoch: 9   Global Step: 55000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:35,384-Speed 3381.66 samples/sec   Loss 4.2482   LearningRate 0.0267   Epoch: 9   Global Step: 55010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:38,401-Speed 3394.07 samples/sec   Loss 4.2392   LearningRate 0.0266   Epoch: 9   Global Step: 55020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:41,424-Speed 3388.65 samples/sec   Loss 4.2233   LearningRate 0.0266   Epoch: 9   Global Step: 55030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:44,437-Speed 3399.41 samples/sec   Loss 4.3369   LearningRate 0.0266   Epoch: 9   Global Step: 55040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:07:47,475-Speed 3371.46 samples/sec   Loss 4.2513   LearningRate 0.0266   Epoch: 9   Global Step: 55050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:07:50,498-Speed 3388.45 samples/sec   Loss 4.2596   LearningRate 0.0266   Epoch: 9   Global Step: 55060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:07:53,541-Speed 3365.35 samples/sec   Loss 4.3361   LearningRate 0.0266   Epoch: 9   Global Step: 55070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:07:56,562-Speed 3390.45 samples/sec   Loss 4.3442   LearningRate 0.0266   Epoch: 9   Global Step: 55080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:07:59,584-Speed 3389.87 samples/sec   Loss 4.2456   LearningRate 0.0266   Epoch: 9   Global Step: 55090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:08:02,601-Speed 3394.80 samples/sec   Loss 4.2497   LearningRate 0.0266   Epoch: 9   Global Step: 55100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:08:05,622-Speed 3389.73 samples/sec   Loss 4.2890   LearningRate 0.0266   Epoch: 9   Global Step: 55110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:08:08,640-Speed 3394.62 samples/sec   Loss 4.3624   LearningRate 0.0266   Epoch: 9   Global Step: 55120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:08:11,679-Speed 3369.68 samples/sec   Loss 4.3754   LearningRate 0.0265   Epoch: 9   Global Step: 55130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:08:14,698-Speed 3392.79 samples/sec   Loss 4.2531   LearningRate 0.0265   Epoch: 9   Global Step: 55140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:08:17,701-Speed 3411.14 samples/sec   Loss 4.3536   LearningRate 0.0265   Epoch: 9   Global Step: 55150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:08:20,723-Speed 3389.30 samples/sec   Loss 4.3402   LearningRate 0.0265   Epoch: 9   Global Step: 55160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:08:23,743-Speed 3390.73 samples/sec   Loss 4.4419   LearningRate 0.0265   Epoch: 9   Global Step: 55170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:08:27,694-Speed 2592.38 samples/sec   Loss 4.2562   LearningRate 0.0265   Epoch: 9   Global Step: 55180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:08:30,721-Speed 3388.26 samples/sec   Loss 4.3396   LearningRate 0.0265   Epoch: 9   Global Step: 55190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:08:33,745-Speed 3386.81 samples/sec   Loss 4.1546   LearningRate 0.0265   Epoch: 9   Global Step: 55200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:08:36,822-Speed 3328.89 samples/sec   Loss 4.2811   LearningRate 0.0265   Epoch: 9   Global Step: 55210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:08:39,862-Speed 3368.85 samples/sec   Loss 4.2222   LearningRate 0.0265   Epoch: 9   Global Step: 55220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:08:42,890-Speed 3382.90 samples/sec   Loss 4.2332   LearningRate 0.0265   Epoch: 9   Global Step: 55230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:08:45,914-Speed 3387.28 samples/sec   Loss 4.2173   LearningRate 0.0264   Epoch: 9   Global Step: 55240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:08:48,936-Speed 3389.19 samples/sec   Loss 4.2598   LearningRate 0.0264   Epoch: 9   Global Step: 55250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:08:51,947-Speed 3401.87 samples/sec   Loss 4.1608   LearningRate 0.0264   Epoch: 9   Global Step: 55260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:08:54,968-Speed 3389.81 samples/sec   Loss 4.3696   LearningRate 0.0264   Epoch: 9   Global Step: 55270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:08:58,047-Speed 3326.63 samples/sec   Loss 4.2030   LearningRate 0.0264   Epoch: 9   Global Step: 55280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:09:01,102-Speed 3352.45 samples/sec   Loss 4.1526   LearningRate 0.0264   Epoch: 9   Global Step: 55290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:09:04,110-Speed 3405.27 samples/sec   Loss 4.2407   LearningRate 0.0264   Epoch: 9   Global Step: 55300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:09:07,135-Speed 3386.24 samples/sec   Loss 4.3148   LearningRate 0.0264   Epoch: 9   Global Step: 55310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:09:10,154-Speed 3392.51 samples/sec   Loss 4.1949   LearningRate 0.0264   Epoch: 9   Global Step: 55320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:09:13,187-Speed 3376.90 samples/sec   Loss 4.2433   LearningRate 0.0264   Epoch: 9   Global Step: 55330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:09:16,207-Speed 3391.60 samples/sec   Loss 4.3172   LearningRate 0.0264   Epoch: 9   Global Step: 55340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:09:19,228-Speed 3390.26 samples/sec   Loss 4.5410   LearningRate 0.0263   Epoch: 9   Global Step: 55350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:09:22,245-Speed 3395.13 samples/sec   Loss 4.3292   LearningRate 0.0263   Epoch: 9   Global Step: 55360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:09:25,265-Speed 3391.95 samples/sec   Loss 4.3261   LearningRate 0.0263   Epoch: 9   Global Step: 55370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:09:28,283-Speed 3393.34 samples/sec   Loss 4.2334   LearningRate 0.0263   Epoch: 9   Global Step: 55380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:09:31,302-Speed 3393.22 samples/sec   Loss 4.4073   LearningRate 0.0263   Epoch: 9   Global Step: 55390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:09:34,330-Speed 3382.11 samples/sec   Loss 4.2172   LearningRate 0.0263   Epoch: 9   Global Step: 55400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:09:37,353-Speed 3388.61 samples/sec   Loss 4.2347   LearningRate 0.0263   Epoch: 9   Global Step: 55410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:09:40,370-Speed 3394.77 samples/sec   Loss 4.2259   LearningRate 0.0263   Epoch: 9   Global Step: 55420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:09:43,388-Speed 3393.08 samples/sec   Loss 4.2778   LearningRate 0.0263   Epoch: 9   Global Step: 55430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:09:46,418-Speed 3381.04 samples/sec   Loss 4.4070   LearningRate 0.0263   Epoch: 9   Global Step: 55440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:09:49,438-Speed 3390.97 samples/sec   Loss 4.1997   LearningRate 0.0263   Epoch: 9   Global Step: 55450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:09:52,462-Speed 3388.49 samples/sec   Loss 4.2478   LearningRate 0.0262   Epoch: 9   Global Step: 55460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:09:55,482-Speed 3390.59 samples/sec   Loss 4.1931   LearningRate 0.0262   Epoch: 9   Global Step: 55470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:09:58,506-Speed 3387.74 samples/sec   Loss 4.1862   LearningRate 0.0262   Epoch: 9   Global Step: 55480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:10:01,527-Speed 3389.54 samples/sec   Loss 4.3900   LearningRate 0.0262   Epoch: 9   Global Step: 55490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:10:04,549-Speed 3390.21 samples/sec   Loss 4.2472   LearningRate 0.0262   Epoch: 9   Global Step: 55500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:10:07,568-Speed 3391.78 samples/sec   Loss 4.3728   LearningRate 0.0262   Epoch: 9   Global Step: 55510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:10:10,589-Speed 3391.19 samples/sec   Loss 4.2294   LearningRate 0.0262   Epoch: 9   Global Step: 55520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:10:13,611-Speed 3388.65 samples/sec   Loss 4.2370   LearningRate 0.0262   Epoch: 9   Global Step: 55530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:10:16,630-Speed 3393.34 samples/sec   Loss 4.2422   LearningRate 0.0262   Epoch: 9   Global Step: 55540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:10:19,645-Speed 3397.14 samples/sec   Loss 4.2133   LearningRate 0.0262   Epoch: 9   Global Step: 55550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:10:22,665-Speed 3391.03 samples/sec   Loss 4.2878   LearningRate 0.0262   Epoch: 9   Global Step: 55560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:10:25,687-Speed 3389.32 samples/sec   Loss 4.3010   LearningRate 0.0261   Epoch: 9   Global Step: 55570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:10:28,708-Speed 3390.15 samples/sec   Loss 4.2868   LearningRate 0.0261   Epoch: 9   Global Step: 55580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:10:31,729-Speed 3390.12 samples/sec   Loss 4.3082   LearningRate 0.0261   Epoch: 9   Global Step: 55590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:10:34,762-Speed 3377.37 samples/sec   Loss 4.1932   LearningRate 0.0261   Epoch: 9   Global Step: 55600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:10:37,808-Speed 3362.27 samples/sec   Loss 4.2725   LearningRate 0.0261   Epoch: 9   Global Step: 55610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:10:40,840-Speed 3379.65 samples/sec   Loss 4.3660   LearningRate 0.0261   Epoch: 9   Global Step: 55620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:10:43,875-Speed 3374.94 samples/sec   Loss 4.2369   LearningRate 0.0261   Epoch: 9   Global Step: 55630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:10:46,901-Speed 3384.70 samples/sec   Loss 4.2026   LearningRate 0.0261   Epoch: 9   Global Step: 55640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:10:49,925-Speed 3386.24 samples/sec   Loss 4.2348   LearningRate 0.0261   Epoch: 9   Global Step: 55650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:10:52,948-Speed 3388.09 samples/sec   Loss 4.2045   LearningRate 0.0261   Epoch: 9   Global Step: 55660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:10:55,968-Speed 3392.34 samples/sec   Loss 4.2598   LearningRate 0.0261   Epoch: 9   Global Step: 55670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:10:58,997-Speed 3381.19 samples/sec   Loss 4.1216   LearningRate 0.0260   Epoch: 9   Global Step: 55680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:11:02,046-Speed 3359.05 samples/sec   Loss 4.2848   LearningRate 0.0260   Epoch: 9   Global Step: 55690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:05,073-Speed 3384.13 samples/sec   Loss 4.1652   LearningRate 0.0260   Epoch: 9   Global Step: 55700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:08,100-Speed 3383.61 samples/sec   Loss 4.2009   LearningRate 0.0260   Epoch: 9   Global Step: 55710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:11,123-Speed 3388.29 samples/sec   Loss 4.2483   LearningRate 0.0260   Epoch: 9   Global Step: 55720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:14,163-Speed 3369.12 samples/sec   Loss 4.1949   LearningRate 0.0260   Epoch: 9   Global Step: 55730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:17,184-Speed 3390.73 samples/sec   Loss 4.0933   LearningRate 0.0260   Epoch: 9   Global Step: 55740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:20,220-Speed 3372.85 samples/sec   Loss 4.2545   LearningRate 0.0260   Epoch: 9   Global Step: 55750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:23,249-Speed 3382.02 samples/sec   Loss 4.2018   LearningRate 0.0260   Epoch: 9   Global Step: 55760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:26,411-Speed 3238.40 samples/sec   Loss 4.2323   LearningRate 0.0260   Epoch: 9   Global Step: 55770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:29,434-Speed 3388.54 samples/sec   Loss 4.2454   LearningRate 0.0260   Epoch: 9   Global Step: 55780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:32,441-Speed 3405.85 samples/sec   Loss 4.2969   LearningRate 0.0259   Epoch: 9   Global Step: 55790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:35,466-Speed 3385.99 samples/sec   Loss 4.2035   LearningRate 0.0259   Epoch: 9   Global Step: 55800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:38,491-Speed 3386.90 samples/sec   Loss 4.2964   LearningRate 0.0259   Epoch: 9   Global Step: 55810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:41,516-Speed 3386.13 samples/sec   Loss 4.2012   LearningRate 0.0259   Epoch: 9   Global Step: 55820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:44,544-Speed 3381.75 samples/sec   Loss 4.3337   LearningRate 0.0259   Epoch: 9   Global Step: 55830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:47,577-Speed 3377.16 samples/sec   Loss 4.3152   LearningRate 0.0259   Epoch: 9   Global Step: 55840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:50,617-Speed 3369.24 samples/sec   Loss 4.3306   LearningRate 0.0259   Epoch: 9   Global Step: 55850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:53,649-Speed 3379.17 samples/sec   Loss 4.3810   LearningRate 0.0259   Epoch: 9   Global Step: 55860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:56,672-Speed 3387.38 samples/sec   Loss 4.2192   LearningRate 0.0259   Epoch: 9   Global Step: 55870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:11:59,697-Speed 3385.99 samples/sec   Loss 4.2443   LearningRate 0.0259   Epoch: 9   Global Step: 55880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:12:02,730-Speed 3377.16 samples/sec   Loss 4.3484   LearningRate 0.0259   Epoch: 9   Global Step: 55890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:12:05,767-Speed 3373.22 samples/sec   Loss 4.4010   LearningRate 0.0259   Epoch: 9   Global Step: 55900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:12:08,790-Speed 3388.26 samples/sec   Loss 4.2443   LearningRate 0.0258   Epoch: 9   Global Step: 55910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:12:11,792-Speed 3412.19 samples/sec   Loss 4.1650   LearningRate 0.0258   Epoch: 9   Global Step: 55920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:12:14,820-Speed 3381.53 samples/sec   Loss 4.2602   LearningRate 0.0258   Epoch: 9   Global Step: 55930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:12:17,852-Speed 3378.61 samples/sec   Loss 4.2007   LearningRate 0.0258   Epoch: 9   Global Step: 55940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:12:20,875-Speed 3388.13 samples/sec   Loss 4.2805   LearningRate 0.0258   Epoch: 9   Global Step: 55950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:12:23,899-Speed 3386.45 samples/sec   Loss 4.2498   LearningRate 0.0258   Epoch: 9   Global Step: 55960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:12:26,927-Speed 3382.61 samples/sec   Loss 4.1429   LearningRate 0.0258   Epoch: 9   Global Step: 55970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:12:29,955-Speed 3382.80 samples/sec   Loss 4.2260   LearningRate 0.0258   Epoch: 9   Global Step: 55980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:12:32,979-Speed 3386.48 samples/sec   Loss 4.1779   LearningRate 0.0258   Epoch: 9   Global Step: 55990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:12:36,010-Speed 3380.45 samples/sec   Loss 4.1384   LearningRate 0.0258   Epoch: 9   Global Step: 56000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:13:19,670-[lfw][56000]XNorm: 21.520633
Training: 2022-04-27 07:13:19,671-[lfw][56000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-04-27 07:13:19,671-[lfw][56000]Accuracy-Highest: 0.99817
Training: 2022-04-27 07:14:10,094-[cfp_fp][56000]XNorm: 19.650348
Training: 2022-04-27 07:14:10,094-[cfp_fp][56000]Accuracy-Flip: 0.96643+-0.00877
Training: 2022-04-27 07:14:10,095-[cfp_fp][56000]Accuracy-Highest: 0.96643
Training: 2022-04-27 07:14:53,410-[agedb_30][56000]XNorm: 21.535365
Training: 2022-04-27 07:14:53,411-[agedb_30][56000]Accuracy-Flip: 0.97483+-0.00758
Training: 2022-04-27 07:14:53,411-[agedb_30][56000]Accuracy-Highest: 0.97767
Training: 2022-04-27 07:14:56,430-Speed 72.92 samples/sec   Loss 4.1655   LearningRate 0.0258   Epoch: 9   Global Step: 56010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:14:59,438-Speed 3404.60 samples/sec   Loss 4.3177   LearningRate 0.0257   Epoch: 9   Global Step: 56020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:15:02,447-Speed 3404.14 samples/sec   Loss 4.2868   LearningRate 0.0257   Epoch: 9   Global Step: 56030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:05,459-Speed 3400.16 samples/sec   Loss 4.3269   LearningRate 0.0257   Epoch: 9   Global Step: 56040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:08,474-Speed 3397.05 samples/sec   Loss 4.2151   LearningRate 0.0257   Epoch: 9   Global Step: 56050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:11,490-Speed 3395.61 samples/sec   Loss 4.1578   LearningRate 0.0257   Epoch: 9   Global Step: 56060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:14,514-Speed 3387.11 samples/sec   Loss 4.2625   LearningRate 0.0257   Epoch: 9   Global Step: 56070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:17,530-Speed 3396.64 samples/sec   Loss 4.3349   LearningRate 0.0257   Epoch: 9   Global Step: 56080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:20,543-Speed 3398.82 samples/sec   Loss 4.2505   LearningRate 0.0257   Epoch: 9   Global Step: 56090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:23,572-Speed 3382.06 samples/sec   Loss 4.3015   LearningRate 0.0257   Epoch: 9   Global Step: 56100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:26,601-Speed 3380.98 samples/sec   Loss 4.2353   LearningRate 0.0257   Epoch: 9   Global Step: 56110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:29,626-Speed 3385.86 samples/sec   Loss 4.1429   LearningRate 0.0257   Epoch: 9   Global Step: 56120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:32,629-Speed 3411.23 samples/sec   Loss 4.1938   LearningRate 0.0256   Epoch: 9   Global Step: 56130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:35,645-Speed 3396.16 samples/sec   Loss 4.2319   LearningRate 0.0256   Epoch: 9   Global Step: 56140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:38,671-Speed 3384.02 samples/sec   Loss 4.2353   LearningRate 0.0256   Epoch: 9   Global Step: 56150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:41,697-Speed 3385.46 samples/sec   Loss 4.2183   LearningRate 0.0256   Epoch: 9   Global Step: 56160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:44,720-Speed 3387.66 samples/sec   Loss 4.1463   LearningRate 0.0256   Epoch: 9   Global Step: 56170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:47,760-Speed 3369.19 samples/sec   Loss 4.3002   LearningRate 0.0256   Epoch: 9   Global Step: 56180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:50,796-Speed 3373.50 samples/sec   Loss 4.1777   LearningRate 0.0256   Epoch: 9   Global Step: 56190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:53,829-Speed 3376.96 samples/sec   Loss 4.3062   LearningRate 0.0256   Epoch: 9   Global Step: 56200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:56,874-Speed 3364.42 samples/sec   Loss 4.2422   LearningRate 0.0256   Epoch: 9   Global Step: 56210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:15:59,931-Speed 3350.52 samples/sec   Loss 4.1530   LearningRate 0.0256   Epoch: 9   Global Step: 56220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:02,949-Speed 3393.75 samples/sec   Loss 4.2227   LearningRate 0.0256   Epoch: 9   Global Step: 56230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:05,977-Speed 3381.99 samples/sec   Loss 4.2943   LearningRate 0.0255   Epoch: 9   Global Step: 56240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:09,015-Speed 3371.52 samples/sec   Loss 4.1586   LearningRate 0.0255   Epoch: 9   Global Step: 56250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:12,045-Speed 3380.56 samples/sec   Loss 4.3724   LearningRate 0.0255   Epoch: 9   Global Step: 56260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:15,209-Speed 3236.48 samples/sec   Loss 4.1935   LearningRate 0.0255   Epoch: 9   Global Step: 56270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:18,283-Speed 3332.73 samples/sec   Loss 4.3450   LearningRate 0.0255   Epoch: 9   Global Step: 56280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:21,303-Speed 3391.13 samples/sec   Loss 4.2210   LearningRate 0.0255   Epoch: 9   Global Step: 56290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:24,322-Speed 3392.35 samples/sec   Loss 4.1472   LearningRate 0.0255   Epoch: 9   Global Step: 56300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:27,353-Speed 3380.11 samples/sec   Loss 4.2186   LearningRate 0.0255   Epoch: 9   Global Step: 56310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:30,368-Speed 3396.31 samples/sec   Loss 4.2245   LearningRate 0.0255   Epoch: 9   Global Step: 56320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:33,387-Speed 3392.85 samples/sec   Loss 4.1886   LearningRate 0.0255   Epoch: 9   Global Step: 56330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:16:36,409-Speed 3389.30 samples/sec   Loss 4.3447   LearningRate 0.0255   Epoch: 9   Global Step: 56340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:16:39,425-Speed 3396.02 samples/sec   Loss 4.2733   LearningRate 0.0255   Epoch: 9   Global Step: 56350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:16:42,446-Speed 3390.23 samples/sec   Loss 4.1211   LearningRate 0.0254   Epoch: 9   Global Step: 56360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:16:45,437-Speed 3425.01 samples/sec   Loss 4.1355   LearningRate 0.0254   Epoch: 9   Global Step: 56370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:48,451-Speed 3398.16 samples/sec   Loss 4.1605   LearningRate 0.0254   Epoch: 9   Global Step: 56380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:51,464-Speed 3399.11 samples/sec   Loss 4.3242   LearningRate 0.0254   Epoch: 9   Global Step: 56390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:54,475-Speed 3401.68 samples/sec   Loss 4.1900   LearningRate 0.0254   Epoch: 9   Global Step: 56400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:16:57,489-Speed 3398.55 samples/sec   Loss 4.1354   LearningRate 0.0254   Epoch: 9   Global Step: 56410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:00,507-Speed 3395.09 samples/sec   Loss 4.2134   LearningRate 0.0254   Epoch: 9   Global Step: 56420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:03,592-Speed 3319.29 samples/sec   Loss 4.1850   LearningRate 0.0254   Epoch: 9   Global Step: 56430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:06,614-Speed 3390.33 samples/sec   Loss 4.1727   LearningRate 0.0254   Epoch: 9   Global Step: 56440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:09,631-Speed 3393.78 samples/sec   Loss 4.2710   LearningRate 0.0254   Epoch: 9   Global Step: 56450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:12,641-Speed 3403.40 samples/sec   Loss 4.1351   LearningRate 0.0254   Epoch: 9   Global Step: 56460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:15,635-Speed 3421.28 samples/sec   Loss 4.2731   LearningRate 0.0253   Epoch: 9   Global Step: 56470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:18,643-Speed 3404.89 samples/sec   Loss 4.3015   LearningRate 0.0253   Epoch: 9   Global Step: 56480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:21,650-Speed 3405.92 samples/sec   Loss 4.1572   LearningRate 0.0253   Epoch: 9   Global Step: 56490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:24,666-Speed 3396.32 samples/sec   Loss 4.2169   LearningRate 0.0253   Epoch: 9   Global Step: 56500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:27,702-Speed 3373.23 samples/sec   Loss 4.0963   LearningRate 0.0253   Epoch: 9   Global Step: 56510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:30,712-Speed 3403.29 samples/sec   Loss 4.0989   LearningRate 0.0253   Epoch: 9   Global Step: 56520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:33,719-Speed 3405.80 samples/sec   Loss 4.1234   LearningRate 0.0253   Epoch: 9   Global Step: 56530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:36,735-Speed 3396.57 samples/sec   Loss 4.4039   LearningRate 0.0253   Epoch: 9   Global Step: 56540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:39,749-Speed 3398.61 samples/sec   Loss 4.1947   LearningRate 0.0253   Epoch: 9   Global Step: 56550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:42,758-Speed 3403.42 samples/sec   Loss 4.2255   LearningRate 0.0253   Epoch: 9   Global Step: 56560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:45,753-Speed 3419.14 samples/sec   Loss 4.1678   LearningRate 0.0253   Epoch: 9   Global Step: 56570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:48,770-Speed 3395.84 samples/sec   Loss 4.3078   LearningRate 0.0252   Epoch: 9   Global Step: 56580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:51,785-Speed 3397.17 samples/sec   Loss 4.1917   LearningRate 0.0252   Epoch: 9   Global Step: 56590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:54,795-Speed 3402.79 samples/sec   Loss 4.3007   LearningRate 0.0252   Epoch: 9   Global Step: 56600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:17:57,807-Speed 3400.41 samples/sec   Loss 4.2258   LearningRate 0.0252   Epoch: 9   Global Step: 56610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:00,823-Speed 3396.11 samples/sec   Loss 4.2284   LearningRate 0.0252   Epoch: 9   Global Step: 56620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:03,843-Speed 3390.53 samples/sec   Loss 4.2570   LearningRate 0.0252   Epoch: 9   Global Step: 56630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:06,856-Speed 3400.33 samples/sec   Loss 4.2265   LearningRate 0.0252   Epoch: 9   Global Step: 56640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:09,871-Speed 3396.80 samples/sec   Loss 4.2004   LearningRate 0.0252   Epoch: 9   Global Step: 56650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:12,891-Speed 3392.12 samples/sec   Loss 4.2698   LearningRate 0.0252   Epoch: 9   Global Step: 56660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:15,887-Speed 3417.87 samples/sec   Loss 4.1600   LearningRate 0.0252   Epoch: 9   Global Step: 56670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:18,896-Speed 3404.71 samples/sec   Loss 4.1167   LearningRate 0.0252   Epoch: 9   Global Step: 56680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:21,910-Speed 3398.13 samples/sec   Loss 4.1383   LearningRate 0.0251   Epoch: 9   Global Step: 56690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:24,957-Speed 3361.29 samples/sec   Loss 4.0728   LearningRate 0.0251   Epoch: 9   Global Step: 56700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:27,971-Speed 3397.90 samples/sec   Loss 4.1141   LearningRate 0.0251   Epoch: 9   Global Step: 56710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:30,992-Speed 3389.88 samples/sec   Loss 4.2662   LearningRate 0.0251   Epoch: 9   Global Step: 56720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:34,018-Speed 3385.79 samples/sec   Loss 4.1999   LearningRate 0.0251   Epoch: 9   Global Step: 56730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:37,109-Speed 3313.16 samples/sec   Loss 4.0538   LearningRate 0.0251   Epoch: 9   Global Step: 56740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:40,131-Speed 3389.14 samples/sec   Loss 4.1648   LearningRate 0.0251   Epoch: 9   Global Step: 56750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:43,144-Speed 3400.17 samples/sec   Loss 4.3162   LearningRate 0.0251   Epoch: 9   Global Step: 56760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:46,160-Speed 3395.19 samples/sec   Loss 4.2573   LearningRate 0.0251   Epoch: 9   Global Step: 56770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:18:49,193-Speed 3378.01 samples/sec   Loss 4.3455   LearningRate 0.0251   Epoch: 9   Global Step: 56780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:18:52,218-Speed 3384.84 samples/sec   Loss 4.0309   LearningRate 0.0251   Epoch: 9   Global Step: 56790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:18:55,222-Speed 3410.02 samples/sec   Loss 4.2170   LearningRate 0.0251   Epoch: 9   Global Step: 56800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:18:58,243-Speed 3389.75 samples/sec   Loss 4.1434   LearningRate 0.0250   Epoch: 9   Global Step: 56810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:19:01,258-Speed 3398.66 samples/sec   Loss 4.2457   LearningRate 0.0250   Epoch: 9   Global Step: 56820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:19:04,273-Speed 3397.14 samples/sec   Loss 4.1371   LearningRate 0.0250   Epoch: 9   Global Step: 56830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:19:07,287-Speed 3399.00 samples/sec   Loss 4.1496   LearningRate 0.0250   Epoch: 9   Global Step: 56840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:19:10,437-Speed 3250.65 samples/sec   Loss 4.0531   LearningRate 0.0250   Epoch: 9   Global Step: 56850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:19:13,447-Speed 3403.46 samples/sec   Loss 4.0980   LearningRate 0.0250   Epoch: 9   Global Step: 56860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:19:26,777-Speed 768.25 samples/sec   Loss 3.6710   LearningRate 0.0250   Epoch: 10   Global Step: 56870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:19:29,799-Speed 3389.34 samples/sec   Loss 3.5931   LearningRate 0.0250   Epoch: 10   Global Step: 56880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:19:32,977-Speed 3223.15 samples/sec   Loss 3.6001   LearningRate 0.0250   Epoch: 10   Global Step: 56890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:19:36,009-Speed 3378.93 samples/sec   Loss 3.6642   LearningRate 0.0250   Epoch: 10   Global Step: 56900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:19:39,025-Speed 3396.19 samples/sec   Loss 3.6384   LearningRate 0.0250   Epoch: 10   Global Step: 56910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:19:42,044-Speed 3393.04 samples/sec   Loss 3.5950   LearningRate 0.0249   Epoch: 10   Global Step: 56920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:19:45,065-Speed 3390.19 samples/sec   Loss 3.5259   LearningRate 0.0249   Epoch: 10   Global Step: 56930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:19:48,079-Speed 3397.86 samples/sec   Loss 3.5942   LearningRate 0.0249   Epoch: 10   Global Step: 56940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:19:51,111-Speed 3378.55 samples/sec   Loss 3.6945   LearningRate 0.0249   Epoch: 10   Global Step: 56950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:19:54,134-Speed 3387.89 samples/sec   Loss 3.5189   LearningRate 0.0249   Epoch: 10   Global Step: 56960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:19:57,158-Speed 3387.41 samples/sec   Loss 3.5289   LearningRate 0.0249   Epoch: 10   Global Step: 56970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:00,189-Speed 3379.06 samples/sec   Loss 3.5888   LearningRate 0.0249   Epoch: 10   Global Step: 56980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:03,218-Speed 3381.63 samples/sec   Loss 3.6231   LearningRate 0.0249   Epoch: 10   Global Step: 56990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:06,246-Speed 3382.74 samples/sec   Loss 3.6097   LearningRate 0.0249   Epoch: 10   Global Step: 57000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:09,260-Speed 3397.99 samples/sec   Loss 3.5491   LearningRate 0.0249   Epoch: 10   Global Step: 57010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:12,298-Speed 3372.20 samples/sec   Loss 3.7801   LearningRate 0.0249   Epoch: 10   Global Step: 57020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:15,399-Speed 3302.72 samples/sec   Loss 3.6134   LearningRate 0.0249   Epoch: 10   Global Step: 57030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:18,414-Speed 3397.25 samples/sec   Loss 3.6262   LearningRate 0.0248   Epoch: 10   Global Step: 57040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:20:21,433-Speed 3392.73 samples/sec   Loss 3.5046   LearningRate 0.0248   Epoch: 10   Global Step: 57050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:20:24,465-Speed 3378.95 samples/sec   Loss 3.6727   LearningRate 0.0248   Epoch: 10   Global Step: 57060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:20:27,461-Speed 3417.73 samples/sec   Loss 3.5903   LearningRate 0.0248   Epoch: 10   Global Step: 57070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:30,489-Speed 3382.42 samples/sec   Loss 3.6560   LearningRate 0.0248   Epoch: 10   Global Step: 57080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:33,510-Speed 3391.34 samples/sec   Loss 3.7150   LearningRate 0.0248   Epoch: 10   Global Step: 57090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:36,532-Speed 3389.67 samples/sec   Loss 3.7081   LearningRate 0.0248   Epoch: 10   Global Step: 57100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:39,551-Speed 3391.63 samples/sec   Loss 3.7203   LearningRate 0.0248   Epoch: 10   Global Step: 57110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:42,579-Speed 3383.53 samples/sec   Loss 3.7847   LearningRate 0.0248   Epoch: 10   Global Step: 57120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:45,600-Speed 3389.69 samples/sec   Loss 3.5511   LearningRate 0.0248   Epoch: 10   Global Step: 57130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:48,627-Speed 3383.80 samples/sec   Loss 3.7173   LearningRate 0.0248   Epoch: 10   Global Step: 57140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:51,651-Speed 3386.57 samples/sec   Loss 3.8399   LearningRate 0.0247   Epoch: 10   Global Step: 57150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:54,681-Speed 3381.10 samples/sec   Loss 3.7592   LearningRate 0.0247   Epoch: 10   Global Step: 57160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:20:57,705-Speed 3386.29 samples/sec   Loss 3.6525   LearningRate 0.0247   Epoch: 10   Global Step: 57170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:21:00,741-Speed 3374.25 samples/sec   Loss 3.7871   LearningRate 0.0247   Epoch: 10   Global Step: 57180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:21:03,757-Speed 3396.05 samples/sec   Loss 3.5862   LearningRate 0.0247   Epoch: 10   Global Step: 57190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:21:06,774-Speed 3395.07 samples/sec   Loss 3.6031   LearningRate 0.0247   Epoch: 10   Global Step: 57200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:21:09,834-Speed 3346.81 samples/sec   Loss 3.6934   LearningRate 0.0247   Epoch: 10   Global Step: 57210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:21:12,850-Speed 3396.03 samples/sec   Loss 3.8042   LearningRate 0.0247   Epoch: 10   Global Step: 57220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:21:15,854-Speed 3409.99 samples/sec   Loss 3.7546   LearningRate 0.0247   Epoch: 10   Global Step: 57230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:21:18,869-Speed 3397.25 samples/sec   Loss 3.7583   LearningRate 0.0247   Epoch: 10   Global Step: 57240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:21:21,897-Speed 3382.06 samples/sec   Loss 3.8238   LearningRate 0.0247   Epoch: 10   Global Step: 57250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:21:24,920-Speed 3388.07 samples/sec   Loss 3.7368   LearningRate 0.0246   Epoch: 10   Global Step: 57260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:21:27,941-Speed 3390.74 samples/sec   Loss 3.7554   LearningRate 0.0246   Epoch: 10   Global Step: 57270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:21:30,955-Speed 3397.95 samples/sec   Loss 3.7267   LearningRate 0.0246   Epoch: 10   Global Step: 57280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:21:33,973-Speed 3394.62 samples/sec   Loss 3.7638   LearningRate 0.0246   Epoch: 10   Global Step: 57290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:21:36,989-Speed 3396.12 samples/sec   Loss 3.7057   LearningRate 0.0246   Epoch: 10   Global Step: 57300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:21:40,009-Speed 3390.58 samples/sec   Loss 3.6932   LearningRate 0.0246   Epoch: 10   Global Step: 57310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:21:43,041-Speed 3378.33 samples/sec   Loss 3.7624   LearningRate 0.0246   Epoch: 10   Global Step: 57320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:21:46,066-Speed 3385.55 samples/sec   Loss 3.7485   LearningRate 0.0246   Epoch: 10   Global Step: 57330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:21:49,067-Speed 3412.77 samples/sec   Loss 3.7592   LearningRate 0.0246   Epoch: 10   Global Step: 57340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:21:52,094-Speed 3383.94 samples/sec   Loss 3.8219   LearningRate 0.0246   Epoch: 10   Global Step: 57350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:21:55,137-Speed 3366.30 samples/sec   Loss 3.6456   LearningRate 0.0246   Epoch: 10   Global Step: 57360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:21:58,163-Speed 3384.84 samples/sec   Loss 3.8465   LearningRate 0.0246   Epoch: 10   Global Step: 57370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:01,182-Speed 3392.75 samples/sec   Loss 3.8003   LearningRate 0.0245   Epoch: 10   Global Step: 57380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:04,200-Speed 3393.14 samples/sec   Loss 3.6786   LearningRate 0.0245   Epoch: 10   Global Step: 57390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:07,217-Speed 3395.94 samples/sec   Loss 3.8703   LearningRate 0.0245   Epoch: 10   Global Step: 57400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:10,257-Speed 3369.64 samples/sec   Loss 3.6507   LearningRate 0.0245   Epoch: 10   Global Step: 57410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:13,297-Speed 3369.52 samples/sec   Loss 3.8378   LearningRate 0.0245   Epoch: 10   Global Step: 57420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:16,352-Speed 3351.78 samples/sec   Loss 3.9238   LearningRate 0.0245   Epoch: 10   Global Step: 57430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:19,364-Speed 3401.63 samples/sec   Loss 3.7609   LearningRate 0.0245   Epoch: 10   Global Step: 57440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:22,388-Speed 3386.77 samples/sec   Loss 3.8485   LearningRate 0.0245   Epoch: 10   Global Step: 57450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:25,440-Speed 3356.35 samples/sec   Loss 3.8813   LearningRate 0.0245   Epoch: 10   Global Step: 57460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:28,468-Speed 3382.68 samples/sec   Loss 3.8146   LearningRate 0.0245   Epoch: 10   Global Step: 57470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:31,483-Speed 3396.55 samples/sec   Loss 3.8829   LearningRate 0.0245   Epoch: 10   Global Step: 57480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:34,500-Speed 3394.60 samples/sec   Loss 3.8182   LearningRate 0.0244   Epoch: 10   Global Step: 57490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:37,519-Speed 3393.00 samples/sec   Loss 3.8364   LearningRate 0.0244   Epoch: 10   Global Step: 57500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:40,537-Speed 3394.37 samples/sec   Loss 3.8467   LearningRate 0.0244   Epoch: 10   Global Step: 57510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:43,582-Speed 3363.46 samples/sec   Loss 3.8647   LearningRate 0.0244   Epoch: 10   Global Step: 57520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:46,616-Speed 3376.56 samples/sec   Loss 3.7684   LearningRate 0.0244   Epoch: 10   Global Step: 57530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:22:49,652-Speed 3373.37 samples/sec   Loss 3.7908   LearningRate 0.0244   Epoch: 10   Global Step: 57540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:22:52,676-Speed 3386.67 samples/sec   Loss 3.7933   LearningRate 0.0244   Epoch: 10   Global Step: 57550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:22:55,708-Speed 3379.55 samples/sec   Loss 3.8344   LearningRate 0.0244   Epoch: 10   Global Step: 57560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:22:58,730-Speed 3388.43 samples/sec   Loss 3.8245   LearningRate 0.0244   Epoch: 10   Global Step: 57570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:23:01,757-Speed 3383.98 samples/sec   Loss 3.7720   LearningRate 0.0244   Epoch: 10   Global Step: 57580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:23:04,760-Speed 3410.80 samples/sec   Loss 3.7501   LearningRate 0.0244   Epoch: 10   Global Step: 57590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:07,778-Speed 3393.58 samples/sec   Loss 3.7392   LearningRate 0.0244   Epoch: 10   Global Step: 57600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:10,798-Speed 3391.84 samples/sec   Loss 3.8521   LearningRate 0.0243   Epoch: 10   Global Step: 57610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:13,820-Speed 3390.02 samples/sec   Loss 3.7059   LearningRate 0.0243   Epoch: 10   Global Step: 57620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:16,845-Speed 3385.35 samples/sec   Loss 3.8626   LearningRate 0.0243   Epoch: 10   Global Step: 57630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:19,881-Speed 3373.68 samples/sec   Loss 3.8308   LearningRate 0.0243   Epoch: 10   Global Step: 57640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:22,900-Speed 3392.67 samples/sec   Loss 3.6884   LearningRate 0.0243   Epoch: 10   Global Step: 57650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:25,924-Speed 3387.04 samples/sec   Loss 3.8071   LearningRate 0.0243   Epoch: 10   Global Step: 57660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:28,957-Speed 3376.45 samples/sec   Loss 3.8670   LearningRate 0.0243   Epoch: 10   Global Step: 57670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:31,983-Speed 3385.39 samples/sec   Loss 3.8982   LearningRate 0.0243   Epoch: 10   Global Step: 57680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:35,016-Speed 3377.19 samples/sec   Loss 3.8372   LearningRate 0.0243   Epoch: 10   Global Step: 57690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:23:38,054-Speed 3370.85 samples/sec   Loss 3.7561   LearningRate 0.0243   Epoch: 10   Global Step: 57700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:23:41,058-Speed 3411.69 samples/sec   Loss 3.9922   LearningRate 0.0243   Epoch: 10   Global Step: 57710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:44,088-Speed 3380.63 samples/sec   Loss 3.9160   LearningRate 0.0242   Epoch: 10   Global Step: 57720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:47,119-Speed 3380.08 samples/sec   Loss 3.8566   LearningRate 0.0242   Epoch: 10   Global Step: 57730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:50,156-Speed 3372.83 samples/sec   Loss 3.8199   LearningRate 0.0242   Epoch: 10   Global Step: 57740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:53,182-Speed 3383.88 samples/sec   Loss 3.8199   LearningRate 0.0242   Epoch: 10   Global Step: 57750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:56,206-Speed 3387.61 samples/sec   Loss 3.9197   LearningRate 0.0242   Epoch: 10   Global Step: 57760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:23:59,234-Speed 3383.00 samples/sec   Loss 3.9020   LearningRate 0.0242   Epoch: 10   Global Step: 57770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:02,258-Speed 3386.32 samples/sec   Loss 3.9375   LearningRate 0.0242   Epoch: 10   Global Step: 57780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:05,288-Speed 3380.68 samples/sec   Loss 3.9105   LearningRate 0.0242   Epoch: 10   Global Step: 57790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:08,329-Speed 3368.51 samples/sec   Loss 3.8574   LearningRate 0.0242   Epoch: 10   Global Step: 57800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:11,397-Speed 3339.08 samples/sec   Loss 3.8713   LearningRate 0.0242   Epoch: 10   Global Step: 57810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:24:14,464-Speed 3338.78 samples/sec   Loss 3.8520   LearningRate 0.0242   Epoch: 10   Global Step: 57820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:24:17,469-Speed 3408.99 samples/sec   Loss 3.9924   LearningRate 0.0242   Epoch: 10   Global Step: 57830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:20,497-Speed 3382.09 samples/sec   Loss 3.8992   LearningRate 0.0241   Epoch: 10   Global Step: 57840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:23,528-Speed 3380.18 samples/sec   Loss 3.8412   LearningRate 0.0241   Epoch: 10   Global Step: 57850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:26,557-Speed 3380.82 samples/sec   Loss 3.7888   LearningRate 0.0241   Epoch: 10   Global Step: 57860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:29,578-Speed 3390.82 samples/sec   Loss 3.8404   LearningRate 0.0241   Epoch: 10   Global Step: 57870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:32,599-Speed 3390.23 samples/sec   Loss 3.8542   LearningRate 0.0241   Epoch: 10   Global Step: 57880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:35,639-Speed 3369.97 samples/sec   Loss 3.8602   LearningRate 0.0241   Epoch: 10   Global Step: 57890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:38,671-Speed 3377.63 samples/sec   Loss 3.8687   LearningRate 0.0241   Epoch: 10   Global Step: 57900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:41,695-Speed 3387.26 samples/sec   Loss 3.7574   LearningRate 0.0241   Epoch: 10   Global Step: 57910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:44,716-Speed 3389.38 samples/sec   Loss 4.0000   LearningRate 0.0241   Epoch: 10   Global Step: 57920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:47,738-Speed 3389.24 samples/sec   Loss 3.9402   LearningRate 0.0241   Epoch: 10   Global Step: 57930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:24:50,764-Speed 3384.75 samples/sec   Loss 3.9081   LearningRate 0.0241   Epoch: 10   Global Step: 57940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:24:53,770-Speed 3408.40 samples/sec   Loss 3.7971   LearningRate 0.0241   Epoch: 10   Global Step: 57950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:56,800-Speed 3380.37 samples/sec   Loss 3.9199   LearningRate 0.0240   Epoch: 10   Global Step: 57960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:24:59,837-Speed 3372.16 samples/sec   Loss 3.7663   LearningRate 0.0240   Epoch: 10   Global Step: 57970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:25:02,872-Speed 3374.51 samples/sec   Loss 3.8508   LearningRate 0.0240   Epoch: 10   Global Step: 57980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:25:05,899-Speed 3384.38 samples/sec   Loss 3.8429   LearningRate 0.0240   Epoch: 10   Global Step: 57990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:25:08,933-Speed 3375.11 samples/sec   Loss 3.8825   LearningRate 0.0240   Epoch: 10   Global Step: 58000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:25:52,680-[lfw][58000]XNorm: 22.443554
Training: 2022-04-27 07:25:52,680-[lfw][58000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-04-27 07:25:52,681-[lfw][58000]Accuracy-Highest: 0.99817
Training: 2022-04-27 07:26:43,631-[cfp_fp][58000]XNorm: 20.331181
Training: 2022-04-27 07:26:43,632-[cfp_fp][58000]Accuracy-Flip: 0.96743+-0.00887
Training: 2022-04-27 07:26:43,632-[cfp_fp][58000]Accuracy-Highest: 0.96743
Training: 2022-04-27 07:27:27,177-[agedb_30][58000]XNorm: 22.511959
Training: 2022-04-27 07:27:27,178-[agedb_30][58000]Accuracy-Flip: 0.97633+-0.00632
Training: 2022-04-27 07:27:27,178-[agedb_30][58000]Accuracy-Highest: 0.97767
Training: 2022-04-27 07:27:30,191-Speed 72.49 samples/sec   Loss 3.8454   LearningRate 0.0240   Epoch: 10   Global Step: 58010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:27:33,197-Speed 3406.96 samples/sec   Loss 3.8498   LearningRate 0.0240   Epoch: 10   Global Step: 58020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:27:36,206-Speed 3404.41 samples/sec   Loss 3.8957   LearningRate 0.0240   Epoch: 10   Global Step: 58030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:27:39,215-Speed 3403.54 samples/sec   Loss 3.8618   LearningRate 0.0240   Epoch: 10   Global Step: 58040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:27:42,227-Speed 3400.44 samples/sec   Loss 3.8590   LearningRate 0.0240   Epoch: 10   Global Step: 58050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:27:45,236-Speed 3404.37 samples/sec   Loss 3.7996   LearningRate 0.0240   Epoch: 10   Global Step: 58060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:27:48,247-Speed 3401.19 samples/sec   Loss 3.7528   LearningRate 0.0239   Epoch: 10   Global Step: 58070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:27:51,268-Speed 3390.23 samples/sec   Loss 3.8961   LearningRate 0.0239   Epoch: 10   Global Step: 58080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:27:54,280-Speed 3400.80 samples/sec   Loss 3.8900   LearningRate 0.0239   Epoch: 10   Global Step: 58090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:27:57,276-Speed 3418.44 samples/sec   Loss 3.7875   LearningRate 0.0239   Epoch: 10   Global Step: 58100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:00,297-Speed 3390.62 samples/sec   Loss 3.9066   LearningRate 0.0239   Epoch: 10   Global Step: 58110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:03,314-Speed 3395.21 samples/sec   Loss 3.7729   LearningRate 0.0239   Epoch: 10   Global Step: 58120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:06,332-Speed 3393.33 samples/sec   Loss 3.7660   LearningRate 0.0239   Epoch: 10   Global Step: 58130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:09,352-Speed 3391.62 samples/sec   Loss 3.9529   LearningRate 0.0239   Epoch: 10   Global Step: 58140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:12,385-Speed 3376.96 samples/sec   Loss 3.8812   LearningRate 0.0239   Epoch: 10   Global Step: 58150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:15,422-Speed 3372.88 samples/sec   Loss 3.7812   LearningRate 0.0239   Epoch: 10   Global Step: 58160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:18,457-Speed 3374.60 samples/sec   Loss 3.8370   LearningRate 0.0239   Epoch: 10   Global Step: 58170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:21,481-Speed 3387.70 samples/sec   Loss 4.0274   LearningRate 0.0239   Epoch: 10   Global Step: 58180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:24,508-Speed 3383.23 samples/sec   Loss 3.8645   LearningRate 0.0238   Epoch: 10   Global Step: 58190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:27,534-Speed 3385.26 samples/sec   Loss 3.8003   LearningRate 0.0238   Epoch: 10   Global Step: 58200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:28:30,548-Speed 3397.87 samples/sec   Loss 3.9761   LearningRate 0.0238   Epoch: 10   Global Step: 58210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:33,583-Speed 3374.20 samples/sec   Loss 3.9520   LearningRate 0.0238   Epoch: 10   Global Step: 58220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:36,619-Speed 3373.94 samples/sec   Loss 3.8896   LearningRate 0.0238   Epoch: 10   Global Step: 58230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:39,656-Speed 3373.37 samples/sec   Loss 3.7861   LearningRate 0.0238   Epoch: 10   Global Step: 58240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:42,689-Speed 3376.29 samples/sec   Loss 3.9602   LearningRate 0.0238   Epoch: 10   Global Step: 58250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:45,719-Speed 3380.03 samples/sec   Loss 3.9764   LearningRate 0.0238   Epoch: 10   Global Step: 58260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:48,746-Speed 3384.22 samples/sec   Loss 3.9816   LearningRate 0.0238   Epoch: 10   Global Step: 58270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:51,769-Speed 3388.09 samples/sec   Loss 3.8462   LearningRate 0.0238   Epoch: 10   Global Step: 58280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:54,790-Speed 3390.16 samples/sec   Loss 3.9275   LearningRate 0.0238   Epoch: 10   Global Step: 58290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:28:57,813-Speed 3388.93 samples/sec   Loss 3.9100   LearningRate 0.0237   Epoch: 10   Global Step: 58300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:29:00,839-Speed 3383.73 samples/sec   Loss 3.8812   LearningRate 0.0237   Epoch: 10   Global Step: 58310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:29:03,866-Speed 3384.44 samples/sec   Loss 3.9235   LearningRate 0.0237   Epoch: 10   Global Step: 58320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:29:06,890-Speed 3386.46 samples/sec   Loss 3.9676   LearningRate 0.0237   Epoch: 10   Global Step: 58330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:29:09,910-Speed 3391.90 samples/sec   Loss 3.9506   LearningRate 0.0237   Epoch: 10   Global Step: 58340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:29:12,928-Speed 3394.30 samples/sec   Loss 4.0070   LearningRate 0.0237   Epoch: 10   Global Step: 58350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:29:15,928-Speed 3413.64 samples/sec   Loss 3.9957   LearningRate 0.0237   Epoch: 10   Global Step: 58360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:29:18,942-Speed 3398.02 samples/sec   Loss 3.9265   LearningRate 0.0237   Epoch: 10   Global Step: 58370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:29:21,962-Speed 3392.43 samples/sec   Loss 4.0346   LearningRate 0.0237   Epoch: 10   Global Step: 58380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:29:25,005-Speed 3365.57 samples/sec   Loss 3.8256   LearningRate 0.0237   Epoch: 10   Global Step: 58390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:29:28,019-Speed 3397.76 samples/sec   Loss 4.0262   LearningRate 0.0237   Epoch: 10   Global Step: 58400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:29:31,033-Speed 3397.85 samples/sec   Loss 3.9396   LearningRate 0.0237   Epoch: 10   Global Step: 58410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:29:34,053-Speed 3392.65 samples/sec   Loss 3.8503   LearningRate 0.0236   Epoch: 10   Global Step: 58420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:29:37,085-Speed 3377.71 samples/sec   Loss 3.9422   LearningRate 0.0236   Epoch: 10   Global Step: 58430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:29:40,122-Speed 3373.07 samples/sec   Loss 3.7877   LearningRate 0.0236   Epoch: 10   Global Step: 58440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:29:43,140-Speed 3393.87 samples/sec   Loss 3.9148   LearningRate 0.0236   Epoch: 10   Global Step: 58450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:29:46,153-Speed 3398.90 samples/sec   Loss 3.9460   LearningRate 0.0236   Epoch: 10   Global Step: 58460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:29:49,170-Speed 3395.09 samples/sec   Loss 3.7469   LearningRate 0.0236   Epoch: 10   Global Step: 58470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:29:52,193-Speed 3387.87 samples/sec   Loss 3.9413   LearningRate 0.0236   Epoch: 10   Global Step: 58480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:29:55,203-Speed 3402.87 samples/sec   Loss 3.8676   LearningRate 0.0236   Epoch: 10   Global Step: 58490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:29:58,216-Speed 3399.78 samples/sec   Loss 3.9209   LearningRate 0.0236   Epoch: 10   Global Step: 58500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:30:01,211-Speed 3418.85 samples/sec   Loss 4.0160   LearningRate 0.0236   Epoch: 10   Global Step: 58510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:30:04,271-Speed 3347.43 samples/sec   Loss 3.8813   LearningRate 0.0236   Epoch: 10   Global Step: 58520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:30:07,350-Speed 3327.37 samples/sec   Loss 3.8512   LearningRate 0.0236   Epoch: 10   Global Step: 58530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:30:10,360-Speed 3402.26 samples/sec   Loss 3.8578   LearningRate 0.0235   Epoch: 10   Global Step: 58540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:30:13,371-Speed 3402.18 samples/sec   Loss 3.8619   LearningRate 0.0235   Epoch: 10   Global Step: 58550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:30:16,386-Speed 3396.56 samples/sec   Loss 3.9260   LearningRate 0.0235   Epoch: 10   Global Step: 58560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:30:19,397-Speed 3401.79 samples/sec   Loss 3.8692   LearningRate 0.0235   Epoch: 10   Global Step: 58570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:30:22,407-Speed 3403.15 samples/sec   Loss 3.8273   LearningRate 0.0235   Epoch: 10   Global Step: 58580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:30:25,421-Speed 3397.45 samples/sec   Loss 4.0182   LearningRate 0.0235   Epoch: 10   Global Step: 58590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:30:28,438-Speed 3395.83 samples/sec   Loss 3.9549   LearningRate 0.0235   Epoch: 10   Global Step: 58600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:30:31,456-Speed 3393.39 samples/sec   Loss 4.0055   LearningRate 0.0235   Epoch: 10   Global Step: 58610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:30:34,498-Speed 3366.94 samples/sec   Loss 3.8427   LearningRate 0.0235   Epoch: 10   Global Step: 58620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:30:37,513-Speed 3397.24 samples/sec   Loss 3.8059   LearningRate 0.0235   Epoch: 10   Global Step: 58630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:30:40,525-Speed 3400.83 samples/sec   Loss 3.8776   LearningRate 0.0235   Epoch: 10   Global Step: 58640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:30:43,536-Speed 3401.88 samples/sec   Loss 3.8557   LearningRate 0.0235   Epoch: 10   Global Step: 58650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:30:46,551-Speed 3396.89 samples/sec   Loss 3.9053   LearningRate 0.0234   Epoch: 10   Global Step: 58660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:30:49,572-Speed 3389.85 samples/sec   Loss 3.9244   LearningRate 0.0234   Epoch: 10   Global Step: 58670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:30:52,584-Speed 3401.53 samples/sec   Loss 3.8866   LearningRate 0.0234   Epoch: 10   Global Step: 58680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:30:55,579-Speed 3418.95 samples/sec   Loss 3.8687   LearningRate 0.0234   Epoch: 10   Global Step: 58690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:30:58,599-Speed 3391.86 samples/sec   Loss 3.9559   LearningRate 0.0234   Epoch: 10   Global Step: 58700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:31:01,614-Speed 3396.93 samples/sec   Loss 3.9364   LearningRate 0.0234   Epoch: 10   Global Step: 58710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:31:04,638-Speed 3387.49 samples/sec   Loss 3.9923   LearningRate 0.0234   Epoch: 10   Global Step: 58720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:31:07,649-Speed 3401.95 samples/sec   Loss 3.8781   LearningRate 0.0234   Epoch: 10   Global Step: 58730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:31:10,662-Speed 3398.93 samples/sec   Loss 3.8414   LearningRate 0.0234   Epoch: 10   Global Step: 58740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:31:13,677-Speed 3396.94 samples/sec   Loss 3.8797   LearningRate 0.0234   Epoch: 10   Global Step: 58750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:31:16,708-Speed 3379.96 samples/sec   Loss 3.8890   LearningRate 0.0234   Epoch: 10   Global Step: 58760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:31:19,719-Speed 3401.38 samples/sec   Loss 3.9640   LearningRate 0.0233   Epoch: 10   Global Step: 58770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:31:22,743-Speed 3386.70 samples/sec   Loss 3.9472   LearningRate 0.0233   Epoch: 10   Global Step: 58780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:31:25,778-Speed 3375.42 samples/sec   Loss 3.8635   LearningRate 0.0233   Epoch: 10   Global Step: 58790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:31:28,795-Speed 3394.53 samples/sec   Loss 3.9716   LearningRate 0.0233   Epoch: 10   Global Step: 58800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:31:31,814-Speed 3392.58 samples/sec   Loss 3.8958   LearningRate 0.0233   Epoch: 10   Global Step: 58810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:31:34,830-Speed 3396.67 samples/sec   Loss 3.9130   LearningRate 0.0233   Epoch: 10   Global Step: 58820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:31:37,860-Speed 3380.19 samples/sec   Loss 3.9130   LearningRate 0.0233   Epoch: 10   Global Step: 58830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:31:40,876-Speed 3396.46 samples/sec   Loss 4.0724   LearningRate 0.0233   Epoch: 10   Global Step: 58840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:31:43,893-Speed 3394.68 samples/sec   Loss 3.9181   LearningRate 0.0233   Epoch: 10   Global Step: 58850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:31:46,913-Speed 3391.02 samples/sec   Loss 3.9711   LearningRate 0.0233   Epoch: 10   Global Step: 58860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:31:49,928-Speed 3397.94 samples/sec   Loss 3.8197   LearningRate 0.0233   Epoch: 10   Global Step: 58870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:31:52,949-Speed 3389.94 samples/sec   Loss 3.8556   LearningRate 0.0233   Epoch: 10   Global Step: 58880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:31:55,972-Speed 3388.10 samples/sec   Loss 3.9991   LearningRate 0.0232   Epoch: 10   Global Step: 58890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:31:58,990-Speed 3393.90 samples/sec   Loss 4.0184   LearningRate 0.0232   Epoch: 10   Global Step: 58900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:02,006-Speed 3396.54 samples/sec   Loss 4.0002   LearningRate 0.0232   Epoch: 10   Global Step: 58910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:05,022-Speed 3395.73 samples/sec   Loss 3.8057   LearningRate 0.0232   Epoch: 10   Global Step: 58920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:08,033-Speed 3401.36 samples/sec   Loss 3.9633   LearningRate 0.0232   Epoch: 10   Global Step: 58930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:11,048-Speed 3397.16 samples/sec   Loss 3.9554   LearningRate 0.0232   Epoch: 10   Global Step: 58940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:14,067-Speed 3393.31 samples/sec   Loss 3.8334   LearningRate 0.0232   Epoch: 10   Global Step: 58950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:17,091-Speed 3386.47 samples/sec   Loss 3.8517   LearningRate 0.0232   Epoch: 10   Global Step: 58960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:32:20,099-Speed 3405.68 samples/sec   Loss 3.7908   LearningRate 0.0232   Epoch: 10   Global Step: 58970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:23,122-Speed 3387.77 samples/sec   Loss 3.8686   LearningRate 0.0232   Epoch: 10   Global Step: 58980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:26,138-Speed 3396.31 samples/sec   Loss 3.8723   LearningRate 0.0232   Epoch: 10   Global Step: 58990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:29,151-Speed 3399.66 samples/sec   Loss 3.8794   LearningRate 0.0232   Epoch: 10   Global Step: 59000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:32,166-Speed 3397.02 samples/sec   Loss 3.7364   LearningRate 0.0231   Epoch: 10   Global Step: 59010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:35,183-Speed 3394.51 samples/sec   Loss 3.9656   LearningRate 0.0231   Epoch: 10   Global Step: 59020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:38,199-Speed 3395.36 samples/sec   Loss 3.8572   LearningRate 0.0231   Epoch: 10   Global Step: 59030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:41,234-Speed 3375.63 samples/sec   Loss 3.8344   LearningRate 0.0231   Epoch: 10   Global Step: 59040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:44,249-Speed 3396.23 samples/sec   Loss 3.8042   LearningRate 0.0231   Epoch: 10   Global Step: 59050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:47,270-Speed 3390.80 samples/sec   Loss 3.8752   LearningRate 0.0231   Epoch: 10   Global Step: 59060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:50,294-Speed 3387.94 samples/sec   Loss 3.9551   LearningRate 0.0231   Epoch: 10   Global Step: 59070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:32:53,297-Speed 3410.01 samples/sec   Loss 3.9593   LearningRate 0.0231   Epoch: 10   Global Step: 59080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:56,311-Speed 3399.17 samples/sec   Loss 3.9394   LearningRate 0.0231   Epoch: 10   Global Step: 59090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:32:59,328-Speed 3394.31 samples/sec   Loss 3.8818   LearningRate 0.0231   Epoch: 10   Global Step: 59100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:02,349-Speed 3390.97 samples/sec   Loss 3.9967   LearningRate 0.0231   Epoch: 10   Global Step: 59110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:05,374-Speed 3385.04 samples/sec   Loss 4.0276   LearningRate 0.0231   Epoch: 10   Global Step: 59120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:08,394-Speed 3392.16 samples/sec   Loss 3.8408   LearningRate 0.0230   Epoch: 10   Global Step: 59130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:11,407-Speed 3398.79 samples/sec   Loss 3.8674   LearningRate 0.0230   Epoch: 10   Global Step: 59140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:14,428-Speed 3390.00 samples/sec   Loss 3.9477   LearningRate 0.0230   Epoch: 10   Global Step: 59150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:17,443-Speed 3399.16 samples/sec   Loss 4.0706   LearningRate 0.0230   Epoch: 10   Global Step: 59160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:20,456-Speed 3399.76 samples/sec   Loss 3.9135   LearningRate 0.0230   Epoch: 10   Global Step: 59170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:23,479-Speed 3387.75 samples/sec   Loss 3.7901   LearningRate 0.0230   Epoch: 10   Global Step: 59180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:33:26,585-Speed 3298.13 samples/sec   Loss 3.9167   LearningRate 0.0230   Epoch: 10   Global Step: 59190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:33:29,583-Speed 3416.29 samples/sec   Loss 3.9282   LearningRate 0.0230   Epoch: 10   Global Step: 59200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:32,599-Speed 3395.95 samples/sec   Loss 3.9249   LearningRate 0.0230   Epoch: 10   Global Step: 59210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:35,614-Speed 3397.03 samples/sec   Loss 3.9037   LearningRate 0.0230   Epoch: 10   Global Step: 59220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:38,636-Speed 3389.10 samples/sec   Loss 3.9620   LearningRate 0.0230   Epoch: 10   Global Step: 59230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:41,651-Speed 3397.13 samples/sec   Loss 3.9576   LearningRate 0.0230   Epoch: 10   Global Step: 59240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:44,670-Speed 3392.90 samples/sec   Loss 3.8748   LearningRate 0.0229   Epoch: 10   Global Step: 59250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:47,690-Speed 3392.23 samples/sec   Loss 3.8965   LearningRate 0.0229   Epoch: 10   Global Step: 59260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:50,708-Speed 3393.57 samples/sec   Loss 3.8528   LearningRate 0.0229   Epoch: 10   Global Step: 59270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:53,731-Speed 3387.86 samples/sec   Loss 4.0503   LearningRate 0.0229   Epoch: 10   Global Step: 59280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:56,749-Speed 3393.38 samples/sec   Loss 3.8477   LearningRate 0.0229   Epoch: 10   Global Step: 59290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:33:59,765-Speed 3395.97 samples/sec   Loss 3.9517   LearningRate 0.0229   Epoch: 10   Global Step: 59300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:34:02,787-Speed 3389.07 samples/sec   Loss 4.0532   LearningRate 0.0229   Epoch: 10   Global Step: 59310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:34:05,788-Speed 3413.34 samples/sec   Loss 3.8582   LearningRate 0.0229   Epoch: 10   Global Step: 59320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:08,811-Speed 3388.32 samples/sec   Loss 3.8914   LearningRate 0.0229   Epoch: 10   Global Step: 59330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:11,835-Speed 3386.68 samples/sec   Loss 3.8525   LearningRate 0.0229   Epoch: 10   Global Step: 59340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:14,855-Speed 3391.59 samples/sec   Loss 3.8873   LearningRate 0.0229   Epoch: 10   Global Step: 59350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:17,873-Speed 3394.51 samples/sec   Loss 3.9611   LearningRate 0.0228   Epoch: 10   Global Step: 59360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:20,891-Speed 3393.44 samples/sec   Loss 3.8893   LearningRate 0.0228   Epoch: 10   Global Step: 59370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:23,910-Speed 3392.81 samples/sec   Loss 3.9572   LearningRate 0.0228   Epoch: 10   Global Step: 59380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:26,932-Speed 3388.86 samples/sec   Loss 3.8928   LearningRate 0.0228   Epoch: 10   Global Step: 59390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:29,953-Speed 3390.72 samples/sec   Loss 3.8508   LearningRate 0.0228   Epoch: 10   Global Step: 59400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:32,976-Speed 3388.04 samples/sec   Loss 3.9355   LearningRate 0.0228   Epoch: 10   Global Step: 59410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:35,997-Speed 3389.91 samples/sec   Loss 3.9308   LearningRate 0.0228   Epoch: 10   Global Step: 59420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:34:39,021-Speed 3387.20 samples/sec   Loss 3.8803   LearningRate 0.0228   Epoch: 10   Global Step: 59430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:34:42,024-Speed 3410.82 samples/sec   Loss 3.8533   LearningRate 0.0228   Epoch: 10   Global Step: 59440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:45,046-Speed 3390.31 samples/sec   Loss 3.8537   LearningRate 0.0228   Epoch: 10   Global Step: 59450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:48,069-Speed 3387.59 samples/sec   Loss 3.8861   LearningRate 0.0228   Epoch: 10   Global Step: 59460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:51,095-Speed 3384.86 samples/sec   Loss 3.9574   LearningRate 0.0228   Epoch: 10   Global Step: 59470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:54,116-Speed 3390.27 samples/sec   Loss 3.9049   LearningRate 0.0227   Epoch: 10   Global Step: 59480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:34:57,135-Speed 3392.37 samples/sec   Loss 3.9145   LearningRate 0.0227   Epoch: 10   Global Step: 59490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:35:00,178-Speed 3365.85 samples/sec   Loss 3.8521   LearningRate 0.0227   Epoch: 10   Global Step: 59500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:35:03,218-Speed 3368.60 samples/sec   Loss 3.8634   LearningRate 0.0227   Epoch: 10   Global Step: 59510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:35:06,246-Speed 3382.79 samples/sec   Loss 3.9530   LearningRate 0.0227   Epoch: 10   Global Step: 59520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:35:09,270-Speed 3387.62 samples/sec   Loss 3.8332   LearningRate 0.0227   Epoch: 10   Global Step: 59530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:35:12,296-Speed 3384.88 samples/sec   Loss 3.8993   LearningRate 0.0227   Epoch: 10   Global Step: 59540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:35:15,319-Speed 3388.01 samples/sec   Loss 3.8415   LearningRate 0.0227   Epoch: 10   Global Step: 59550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:35:18,343-Speed 3387.17 samples/sec   Loss 4.0042   LearningRate 0.0227   Epoch: 10   Global Step: 59560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:35:21,350-Speed 3406.19 samples/sec   Loss 3.9068   LearningRate 0.0227   Epoch: 10   Global Step: 59570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:35:24,375-Speed 3385.74 samples/sec   Loss 3.8282   LearningRate 0.0227   Epoch: 10   Global Step: 59580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:35:27,386-Speed 3401.32 samples/sec   Loss 3.9000   LearningRate 0.0227   Epoch: 10   Global Step: 59590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:35:30,428-Speed 3367.10 samples/sec   Loss 3.8632   LearningRate 0.0226   Epoch: 10   Global Step: 59600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:35:33,454-Speed 3384.46 samples/sec   Loss 3.8158   LearningRate 0.0226   Epoch: 10   Global Step: 59610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:35:36,481-Speed 3384.09 samples/sec   Loss 3.8306   LearningRate 0.0226   Epoch: 10   Global Step: 59620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:35:39,501-Speed 3391.62 samples/sec   Loss 3.8756   LearningRate 0.0226   Epoch: 10   Global Step: 59630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:35:42,537-Speed 3374.26 samples/sec   Loss 3.8348   LearningRate 0.0226   Epoch: 10   Global Step: 59640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:35:45,563-Speed 3385.09 samples/sec   Loss 3.8663   LearningRate 0.0226   Epoch: 10   Global Step: 59650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:35:48,593-Speed 3379.16 samples/sec   Loss 3.8640   LearningRate 0.0226   Epoch: 10   Global Step: 59660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:35:51,623-Speed 3380.99 samples/sec   Loss 3.8460   LearningRate 0.0226   Epoch: 10   Global Step: 59670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:35:54,653-Speed 3380.13 samples/sec   Loss 3.9213   LearningRate 0.0226   Epoch: 10   Global Step: 59680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 07:35:57,682-Speed 3381.25 samples/sec   Loss 3.8545   LearningRate 0.0226   Epoch: 10   Global Step: 59690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:00,718-Speed 3374.24 samples/sec   Loss 3.8529   LearningRate 0.0226   Epoch: 10   Global Step: 59700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:03,743-Speed 3385.19 samples/sec   Loss 3.9357   LearningRate 0.0226   Epoch: 10   Global Step: 59710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:06,772-Speed 3381.95 samples/sec   Loss 3.8500   LearningRate 0.0225   Epoch: 10   Global Step: 59720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:09,793-Speed 3390.38 samples/sec   Loss 3.8728   LearningRate 0.0225   Epoch: 10   Global Step: 59730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:12,819-Speed 3384.60 samples/sec   Loss 3.8765   LearningRate 0.0225   Epoch: 10   Global Step: 59740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:15,841-Speed 3390.01 samples/sec   Loss 3.9396   LearningRate 0.0225   Epoch: 10   Global Step: 59750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:18,867-Speed 3384.55 samples/sec   Loss 3.9261   LearningRate 0.0225   Epoch: 10   Global Step: 59760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:21,893-Speed 3384.67 samples/sec   Loss 3.8890   LearningRate 0.0225   Epoch: 10   Global Step: 59770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:24,926-Speed 3376.43 samples/sec   Loss 3.9090   LearningRate 0.0225   Epoch: 10   Global Step: 59780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:27,951-Speed 3386.98 samples/sec   Loss 3.9473   LearningRate 0.0225   Epoch: 10   Global Step: 59790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:36:30,976-Speed 3385.25 samples/sec   Loss 3.9697   LearningRate 0.0225   Epoch: 10   Global Step: 59800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:36:34,016-Speed 3369.44 samples/sec   Loss 3.9342   LearningRate 0.0225   Epoch: 10   Global Step: 59810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:36:37,023-Speed 3405.59 samples/sec   Loss 3.8504   LearningRate 0.0225   Epoch: 10   Global Step: 59820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:40,067-Speed 3365.18 samples/sec   Loss 3.8272   LearningRate 0.0225   Epoch: 10   Global Step: 59830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:43,095-Speed 3382.52 samples/sec   Loss 3.7874   LearningRate 0.0224   Epoch: 10   Global Step: 59840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:46,120-Speed 3386.16 samples/sec   Loss 3.8010   LearningRate 0.0224   Epoch: 10   Global Step: 59850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:49,145-Speed 3385.40 samples/sec   Loss 3.9303   LearningRate 0.0224   Epoch: 10   Global Step: 59860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:52,169-Speed 3387.96 samples/sec   Loss 3.8951   LearningRate 0.0224   Epoch: 10   Global Step: 59870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:55,206-Speed 3372.47 samples/sec   Loss 3.7908   LearningRate 0.0224   Epoch: 10   Global Step: 59880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:36:58,230-Speed 3387.13 samples/sec   Loss 3.8393   LearningRate 0.0224   Epoch: 10   Global Step: 59890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:37:01,266-Speed 3373.51 samples/sec   Loss 3.8642   LearningRate 0.0224   Epoch: 10   Global Step: 59900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:37:04,291-Speed 3386.16 samples/sec   Loss 3.8864   LearningRate 0.0224   Epoch: 10   Global Step: 59910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:37:07,300-Speed 3403.19 samples/sec   Loss 4.0176   LearningRate 0.0224   Epoch: 10   Global Step: 59920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:37:10,327-Speed 3383.77 samples/sec   Loss 3.7647   LearningRate 0.0224   Epoch: 10   Global Step: 59930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:37:13,353-Speed 3384.20 samples/sec   Loss 3.9385   LearningRate 0.0224   Epoch: 10   Global Step: 59940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:37:16,378-Speed 3386.63 samples/sec   Loss 3.8881   LearningRate 0.0224   Epoch: 10   Global Step: 59950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:37:19,402-Speed 3387.01 samples/sec   Loss 3.9566   LearningRate 0.0223   Epoch: 10   Global Step: 59960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:37:22,430-Speed 3382.57 samples/sec   Loss 3.8550   LearningRate 0.0223   Epoch: 10   Global Step: 59970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:37:25,516-Speed 3319.58 samples/sec   Loss 3.7486   LearningRate 0.0223   Epoch: 10   Global Step: 59980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:37:28,569-Speed 3354.60 samples/sec   Loss 3.8915   LearningRate 0.0223   Epoch: 10   Global Step: 59990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:37:31,595-Speed 3384.30 samples/sec   Loss 3.8998   LearningRate 0.0223   Epoch: 10   Global Step: 60000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:38:15,326-[lfw][60000]XNorm: 22.612268
Training: 2022-04-27 07:38:15,327-[lfw][60000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-04-27 07:38:15,327-[lfw][60000]Accuracy-Highest: 0.99817
Training: 2022-04-27 07:39:06,236-[cfp_fp][60000]XNorm: 20.646764
Training: 2022-04-27 07:39:06,237-[cfp_fp][60000]Accuracy-Flip: 0.97100+-0.00930
Training: 2022-04-27 07:39:06,237-[cfp_fp][60000]Accuracy-Highest: 0.97100
Training: 2022-04-27 07:39:49,518-[agedb_30][60000]XNorm: 22.595011
Training: 2022-04-27 07:39:49,519-[agedb_30][60000]Accuracy-Flip: 0.97500+-0.00830
Training: 2022-04-27 07:39:49,519-[agedb_30][60000]Accuracy-Highest: 0.97767
Training: 2022-04-27 07:39:52,544-Speed 72.65 samples/sec   Loss 3.7668   LearningRate 0.0223   Epoch: 10   Global Step: 60010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:39:55,548-Speed 3408.75 samples/sec   Loss 3.8990   LearningRate 0.0223   Epoch: 10   Global Step: 60020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:39:58,557-Speed 3404.68 samples/sec   Loss 3.9034   LearningRate 0.0223   Epoch: 10   Global Step: 60030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:40:01,564-Speed 3405.42 samples/sec   Loss 3.7317   LearningRate 0.0223   Epoch: 10   Global Step: 60040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:40:04,581-Speed 3395.47 samples/sec   Loss 3.8061   LearningRate 0.0223   Epoch: 10   Global Step: 60050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:40:07,594-Speed 3399.04 samples/sec   Loss 3.8756   LearningRate 0.0223   Epoch: 10   Global Step: 60060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:40:10,624-Speed 3380.92 samples/sec   Loss 3.8022   LearningRate 0.0223   Epoch: 10   Global Step: 60070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:40:13,662-Speed 3371.41 samples/sec   Loss 3.8833   LearningRate 0.0222   Epoch: 10   Global Step: 60080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:40:16,677-Speed 3397.24 samples/sec   Loss 3.7955   LearningRate 0.0222   Epoch: 10   Global Step: 60090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:40:19,673-Speed 3418.65 samples/sec   Loss 3.9954   LearningRate 0.0222   Epoch: 10   Global Step: 60100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:40:22,692-Speed 3391.78 samples/sec   Loss 3.7805   LearningRate 0.0222   Epoch: 10   Global Step: 60110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:40:25,705-Speed 3400.57 samples/sec   Loss 3.8500   LearningRate 0.0222   Epoch: 10   Global Step: 60120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:40:28,718-Speed 3398.64 samples/sec   Loss 3.8510   LearningRate 0.0222   Epoch: 10   Global Step: 60130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:40:31,731-Speed 3399.35 samples/sec   Loss 3.9704   LearningRate 0.0222   Epoch: 10   Global Step: 60140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:40:34,759-Speed 3383.01 samples/sec   Loss 3.7537   LearningRate 0.0222   Epoch: 10   Global Step: 60150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:40:37,785-Speed 3384.48 samples/sec   Loss 4.0206   LearningRate 0.0222   Epoch: 10   Global Step: 60160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:40:40,827-Speed 3367.72 samples/sec   Loss 3.8718   LearningRate 0.0222   Epoch: 10   Global Step: 60170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:40:43,840-Speed 3398.86 samples/sec   Loss 3.9862   LearningRate 0.0222   Epoch: 10   Global Step: 60180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:40:46,856-Speed 3397.09 samples/sec   Loss 3.9536   LearningRate 0.0222   Epoch: 10   Global Step: 60190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:40:49,881-Speed 3385.05 samples/sec   Loss 3.9151   LearningRate 0.0221   Epoch: 10   Global Step: 60200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:40:52,899-Speed 3393.91 samples/sec   Loss 3.9747   LearningRate 0.0221   Epoch: 10   Global Step: 60210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:40:55,924-Speed 3386.62 samples/sec   Loss 3.9086   LearningRate 0.0221   Epoch: 10   Global Step: 60220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:40:58,987-Speed 3343.16 samples/sec   Loss 3.8821   LearningRate 0.0221   Epoch: 10   Global Step: 60230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:41:02,010-Speed 3388.69 samples/sec   Loss 3.8480   LearningRate 0.0221   Epoch: 10   Global Step: 60240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:41:05,038-Speed 3382.61 samples/sec   Loss 3.8725   LearningRate 0.0221   Epoch: 10   Global Step: 60250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:41:08,039-Speed 3412.60 samples/sec   Loss 3.8178   LearningRate 0.0221   Epoch: 10   Global Step: 60260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:41:11,054-Speed 3397.51 samples/sec   Loss 3.8767   LearningRate 0.0221   Epoch: 10   Global Step: 60270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:41:14,076-Speed 3389.53 samples/sec   Loss 3.8270   LearningRate 0.0221   Epoch: 10   Global Step: 60280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:41:17,099-Speed 3388.18 samples/sec   Loss 3.9652   LearningRate 0.0221   Epoch: 10   Global Step: 60290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:41:20,118-Speed 3392.37 samples/sec   Loss 4.0179   LearningRate 0.0221   Epoch: 10   Global Step: 60300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:41:23,132-Speed 3398.19 samples/sec   Loss 3.7339   LearningRate 0.0221   Epoch: 10   Global Step: 60310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:41:26,163-Speed 3379.63 samples/sec   Loss 3.8362   LearningRate 0.0221   Epoch: 10   Global Step: 60320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:41:29,183-Speed 3390.84 samples/sec   Loss 3.8498   LearningRate 0.0220   Epoch: 10   Global Step: 60330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:41:32,205-Speed 3390.29 samples/sec   Loss 3.8319   LearningRate 0.0220   Epoch: 10   Global Step: 60340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:41:35,246-Speed 3367.95 samples/sec   Loss 3.8489   LearningRate 0.0220   Epoch: 10   Global Step: 60350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 07:41:38,262-Speed 3395.59 samples/sec   Loss 4.0056   LearningRate 0.0220   Epoch: 10   Global Step: 60360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-27 07:41:41,280-Speed 3393.86 samples/sec   Loss 3.8149   LearningRate 0.0220   Epoch: 10   Global Step: 60370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:41:44,279-Speed 3415.84 samples/sec   Loss 3.8222   LearningRate 0.0220   Epoch: 10   Global Step: 60380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:41:47,303-Speed 3387.07 samples/sec   Loss 3.8170   LearningRate 0.0220   Epoch: 10   Global Step: 60390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:41:50,325-Speed 3389.45 samples/sec   Loss 3.7780   LearningRate 0.0220   Epoch: 10   Global Step: 60400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:41:53,345-Speed 3391.05 samples/sec   Loss 3.7261   LearningRate 0.0220   Epoch: 10   Global Step: 60410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:41:56,373-Speed 3382.35 samples/sec   Loss 3.8417   LearningRate 0.0220   Epoch: 10   Global Step: 60420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:41:59,403-Speed 3380.51 samples/sec   Loss 3.8729   LearningRate 0.0220   Epoch: 10   Global Step: 60430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:02,419-Speed 3396.36 samples/sec   Loss 3.7559   LearningRate 0.0220   Epoch: 10   Global Step: 60440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:05,441-Speed 3388.93 samples/sec   Loss 4.0111   LearningRate 0.0219   Epoch: 10   Global Step: 60450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:08,453-Speed 3400.06 samples/sec   Loss 3.8377   LearningRate 0.0219   Epoch: 10   Global Step: 60460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:11,470-Speed 3395.58 samples/sec   Loss 4.0651   LearningRate 0.0219   Epoch: 10   Global Step: 60470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:14,489-Speed 3392.35 samples/sec   Loss 3.7765   LearningRate 0.0219   Epoch: 10   Global Step: 60480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:17,512-Speed 3388.49 samples/sec   Loss 3.9519   LearningRate 0.0219   Epoch: 10   Global Step: 60490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:20,525-Speed 3399.36 samples/sec   Loss 3.7927   LearningRate 0.0219   Epoch: 10   Global Step: 60500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:23,552-Speed 3383.59 samples/sec   Loss 3.7887   LearningRate 0.0219   Epoch: 10   Global Step: 60510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:26,568-Speed 3396.05 samples/sec   Loss 3.7123   LearningRate 0.0219   Epoch: 10   Global Step: 60520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:29,585-Speed 3395.16 samples/sec   Loss 3.8550   LearningRate 0.0219   Epoch: 10   Global Step: 60530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:32,608-Speed 3388.55 samples/sec   Loss 3.8861   LearningRate 0.0219   Epoch: 10   Global Step: 60540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:35,640-Speed 3377.84 samples/sec   Loss 3.9415   LearningRate 0.0219   Epoch: 10   Global Step: 60550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:38,664-Speed 3387.43 samples/sec   Loss 3.8027   LearningRate 0.0219   Epoch: 10   Global Step: 60560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:41,681-Speed 3394.15 samples/sec   Loss 3.8249   LearningRate 0.0218   Epoch: 10   Global Step: 60570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:44,707-Speed 3385.36 samples/sec   Loss 3.8392   LearningRate 0.0218   Epoch: 10   Global Step: 60580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:42:47,729-Speed 3388.98 samples/sec   Loss 3.8165   LearningRate 0.0218   Epoch: 10   Global Step: 60590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:42:50,750-Speed 3390.11 samples/sec   Loss 3.9009   LearningRate 0.0218   Epoch: 10   Global Step: 60600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:42:53,752-Speed 3411.82 samples/sec   Loss 3.8560   LearningRate 0.0218   Epoch: 10   Global Step: 60610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:56,769-Speed 3395.68 samples/sec   Loss 3.7796   LearningRate 0.0218   Epoch: 10   Global Step: 60620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:42:59,797-Speed 3381.82 samples/sec   Loss 3.8844   LearningRate 0.0218   Epoch: 10   Global Step: 60630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:02,822-Speed 3386.61 samples/sec   Loss 3.8849   LearningRate 0.0218   Epoch: 10   Global Step: 60640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:05,856-Speed 3375.07 samples/sec   Loss 3.9268   LearningRate 0.0218   Epoch: 10   Global Step: 60650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:08,880-Speed 3387.38 samples/sec   Loss 3.8695   LearningRate 0.0218   Epoch: 10   Global Step: 60660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:11,898-Speed 3394.34 samples/sec   Loss 3.8109   LearningRate 0.0218   Epoch: 10   Global Step: 60670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:14,916-Speed 3393.61 samples/sec   Loss 3.8844   LearningRate 0.0218   Epoch: 10   Global Step: 60680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:17,935-Speed 3393.06 samples/sec   Loss 3.7881   LearningRate 0.0217   Epoch: 10   Global Step: 60690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:20,952-Speed 3394.42 samples/sec   Loss 3.7662   LearningRate 0.0217   Epoch: 10   Global Step: 60700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:23,970-Speed 3393.67 samples/sec   Loss 3.8680   LearningRate 0.0217   Epoch: 10   Global Step: 60710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:43:26,993-Speed 3388.28 samples/sec   Loss 3.8226   LearningRate 0.0217   Epoch: 10   Global Step: 60720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:30,008-Speed 3396.71 samples/sec   Loss 3.7919   LearningRate 0.0217   Epoch: 10   Global Step: 60730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:33,026-Speed 3394.17 samples/sec   Loss 3.7610   LearningRate 0.0217   Epoch: 10   Global Step: 60740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:36,053-Speed 3383.41 samples/sec   Loss 3.9512   LearningRate 0.0217   Epoch: 10   Global Step: 60750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:39,069-Speed 3395.98 samples/sec   Loss 3.8744   LearningRate 0.0217   Epoch: 10   Global Step: 60760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:42,088-Speed 3393.03 samples/sec   Loss 3.8084   LearningRate 0.0217   Epoch: 10   Global Step: 60770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:45,110-Speed 3389.42 samples/sec   Loss 3.9749   LearningRate 0.0217   Epoch: 10   Global Step: 60780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:48,124-Speed 3398.10 samples/sec   Loss 3.7852   LearningRate 0.0217   Epoch: 10   Global Step: 60790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:51,160-Speed 3373.73 samples/sec   Loss 3.9183   LearningRate 0.0217   Epoch: 10   Global Step: 60800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:54,176-Speed 3395.50 samples/sec   Loss 3.7733   LearningRate 0.0216   Epoch: 10   Global Step: 60810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:43:57,205-Speed 3382.00 samples/sec   Loss 3.7855   LearningRate 0.0216   Epoch: 10   Global Step: 60820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:44:00,226-Speed 3389.76 samples/sec   Loss 3.7790   LearningRate 0.0216   Epoch: 10   Global Step: 60830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:44:03,246-Speed 3392.76 samples/sec   Loss 3.9233   LearningRate 0.0216   Epoch: 10   Global Step: 60840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:44:06,272-Speed 3383.93 samples/sec   Loss 3.8334   LearningRate 0.0216   Epoch: 10   Global Step: 60850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:44:09,291-Speed 3393.47 samples/sec   Loss 3.7236   LearningRate 0.0216   Epoch: 10   Global Step: 60860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:44:12,313-Speed 3388.65 samples/sec   Loss 3.8079   LearningRate 0.0216   Epoch: 10   Global Step: 60870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:44:15,351-Speed 3371.80 samples/sec   Loss 3.8241   LearningRate 0.0216   Epoch: 10   Global Step: 60880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:44:18,376-Speed 3384.96 samples/sec   Loss 3.8240   LearningRate 0.0216   Epoch: 10   Global Step: 60890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:44:21,394-Speed 3394.89 samples/sec   Loss 3.9562   LearningRate 0.0216   Epoch: 10   Global Step: 60900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:44:24,411-Speed 3394.57 samples/sec   Loss 3.9157   LearningRate 0.0216   Epoch: 10   Global Step: 60910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:44:27,435-Speed 3386.47 samples/sec   Loss 3.7068   LearningRate 0.0216   Epoch: 10   Global Step: 60920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:44:30,454-Speed 3392.98 samples/sec   Loss 3.8143   LearningRate 0.0215   Epoch: 10   Global Step: 60930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:44:33,471-Speed 3395.81 samples/sec   Loss 3.7759   LearningRate 0.0215   Epoch: 10   Global Step: 60940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:44:36,492-Speed 3389.43 samples/sec   Loss 3.8631   LearningRate 0.0215   Epoch: 10   Global Step: 60950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:44:39,515-Speed 3388.39 samples/sec   Loss 3.8064   LearningRate 0.0215   Epoch: 10   Global Step: 60960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:44:42,536-Speed 3391.15 samples/sec   Loss 3.9672   LearningRate 0.0215   Epoch: 10   Global Step: 60970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:44:45,573-Speed 3372.47 samples/sec   Loss 3.7307   LearningRate 0.0215   Epoch: 10   Global Step: 60980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:44:48,595-Speed 3388.68 samples/sec   Loss 3.9150   LearningRate 0.0215   Epoch: 10   Global Step: 60990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:44:51,622-Speed 3384.19 samples/sec   Loss 3.8938   LearningRate 0.0215   Epoch: 10   Global Step: 61000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:44:54,645-Speed 3388.28 samples/sec   Loss 3.8666   LearningRate 0.0215   Epoch: 10   Global Step: 61010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:44:57,650-Speed 3408.23 samples/sec   Loss 3.7899   LearningRate 0.0215   Epoch: 10   Global Step: 61020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:45:00,669-Speed 3392.93 samples/sec   Loss 3.7804   LearningRate 0.0215   Epoch: 10   Global Step: 61030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:45:03,693-Speed 3386.60 samples/sec   Loss 3.9148   LearningRate 0.0215   Epoch: 10   Global Step: 61040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:45:06,718-Speed 3386.44 samples/sec   Loss 4.0718   LearningRate 0.0215   Epoch: 10   Global Step: 61050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:45:09,738-Speed 3391.32 samples/sec   Loss 3.7491   LearningRate 0.0214   Epoch: 10   Global Step: 61060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:45:12,776-Speed 3370.70 samples/sec   Loss 3.7158   LearningRate 0.0214   Epoch: 10   Global Step: 61070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:45:15,797-Speed 3390.64 samples/sec   Loss 3.9446   LearningRate 0.0214   Epoch: 10   Global Step: 61080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:45:18,823-Speed 3384.76 samples/sec   Loss 3.8696   LearningRate 0.0214   Epoch: 10   Global Step: 61090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:45:21,850-Speed 3383.78 samples/sec   Loss 3.7441   LearningRate 0.0214   Epoch: 10   Global Step: 61100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:45:24,877-Speed 3383.89 samples/sec   Loss 3.7785   LearningRate 0.0214   Epoch: 10   Global Step: 61110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:45:27,894-Speed 3395.05 samples/sec   Loss 3.8427   LearningRate 0.0214   Epoch: 10   Global Step: 61120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:45:30,919-Speed 3385.99 samples/sec   Loss 3.9272   LearningRate 0.0214   Epoch: 10   Global Step: 61130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:45:33,939-Speed 3390.95 samples/sec   Loss 3.9062   LearningRate 0.0214   Epoch: 10   Global Step: 61140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:45:36,965-Speed 3385.36 samples/sec   Loss 3.8356   LearningRate 0.0214   Epoch: 10   Global Step: 61150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:45:39,985-Speed 3390.83 samples/sec   Loss 3.9093   LearningRate 0.0214   Epoch: 10   Global Step: 61160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:45:43,011-Speed 3385.38 samples/sec   Loss 3.7703   LearningRate 0.0214   Epoch: 10   Global Step: 61170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:45:46,038-Speed 3383.49 samples/sec   Loss 3.8328   LearningRate 0.0213   Epoch: 10   Global Step: 61180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:45:49,062-Speed 3386.10 samples/sec   Loss 3.8764   LearningRate 0.0213   Epoch: 10   Global Step: 61190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:45:52,092-Speed 3382.13 samples/sec   Loss 3.7074   LearningRate 0.0213   Epoch: 10   Global Step: 61200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:45:55,114-Speed 3388.63 samples/sec   Loss 3.7718   LearningRate 0.0213   Epoch: 10   Global Step: 61210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:45:58,138-Speed 3387.02 samples/sec   Loss 3.6873   LearningRate 0.0213   Epoch: 10   Global Step: 61220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:01,163-Speed 3386.10 samples/sec   Loss 3.8604   LearningRate 0.0213   Epoch: 10   Global Step: 61230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:04,192-Speed 3382.04 samples/sec   Loss 3.9360   LearningRate 0.0213   Epoch: 10   Global Step: 61240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:07,219-Speed 3382.85 samples/sec   Loss 3.7157   LearningRate 0.0213   Epoch: 10   Global Step: 61250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:10,253-Speed 3376.49 samples/sec   Loss 3.8579   LearningRate 0.0213   Epoch: 10   Global Step: 61260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:13,291-Speed 3370.69 samples/sec   Loss 3.8946   LearningRate 0.0213   Epoch: 10   Global Step: 61270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:16,321-Speed 3380.26 samples/sec   Loss 3.7829   LearningRate 0.0213   Epoch: 10   Global Step: 61280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:19,349-Speed 3383.85 samples/sec   Loss 3.9771   LearningRate 0.0213   Epoch: 10   Global Step: 61290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:22,390-Speed 3367.20 samples/sec   Loss 3.8960   LearningRate 0.0212   Epoch: 10   Global Step: 61300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:25,422-Speed 3378.82 samples/sec   Loss 3.6986   LearningRate 0.0212   Epoch: 10   Global Step: 61310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:28,453-Speed 3378.75 samples/sec   Loss 3.7547   LearningRate 0.0212   Epoch: 10   Global Step: 61320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:46:31,482-Speed 3381.13 samples/sec   Loss 3.8664   LearningRate 0.0212   Epoch: 10   Global Step: 61330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:46:34,509-Speed 3384.55 samples/sec   Loss 3.7552   LearningRate 0.0212   Epoch: 10   Global Step: 61340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:46:37,526-Speed 3394.43 samples/sec   Loss 3.7690   LearningRate 0.0212   Epoch: 10   Global Step: 61350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:40,553-Speed 3383.84 samples/sec   Loss 3.8087   LearningRate 0.0212   Epoch: 10   Global Step: 61360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:43,575-Speed 3388.81 samples/sec   Loss 3.7342   LearningRate 0.0212   Epoch: 10   Global Step: 61370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:46,606-Speed 3379.97 samples/sec   Loss 3.8944   LearningRate 0.0212   Epoch: 10   Global Step: 61380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:49,636-Speed 3380.12 samples/sec   Loss 3.9031   LearningRate 0.0212   Epoch: 10   Global Step: 61390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:52,662-Speed 3385.56 samples/sec   Loss 3.7237   LearningRate 0.0212   Epoch: 10   Global Step: 61400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:55,694-Speed 3377.94 samples/sec   Loss 3.7693   LearningRate 0.0212   Epoch: 10   Global Step: 61410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:46:58,715-Speed 3390.08 samples/sec   Loss 3.6368   LearningRate 0.0212   Epoch: 10   Global Step: 61420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:01,744-Speed 3381.59 samples/sec   Loss 3.8594   LearningRate 0.0211   Epoch: 10   Global Step: 61430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:04,794-Speed 3358.23 samples/sec   Loss 3.8115   LearningRate 0.0211   Epoch: 10   Global Step: 61440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:07,816-Speed 3388.78 samples/sec   Loss 3.8546   LearningRate 0.0211   Epoch: 10   Global Step: 61450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:47:10,837-Speed 3389.92 samples/sec   Loss 3.9443   LearningRate 0.0211   Epoch: 10   Global Step: 61460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:47:13,860-Speed 3388.42 samples/sec   Loss 3.8635   LearningRate 0.0211   Epoch: 10   Global Step: 61470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:47:16,872-Speed 3400.67 samples/sec   Loss 3.7792   LearningRate 0.0211   Epoch: 10   Global Step: 61480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:19,904-Speed 3379.38 samples/sec   Loss 3.8911   LearningRate 0.0211   Epoch: 10   Global Step: 61490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:22,929-Speed 3385.14 samples/sec   Loss 3.7879   LearningRate 0.0211   Epoch: 10   Global Step: 61500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:25,961-Speed 3378.43 samples/sec   Loss 3.8662   LearningRate 0.0211   Epoch: 10   Global Step: 61510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:28,986-Speed 3385.25 samples/sec   Loss 3.7207   LearningRate 0.0211   Epoch: 10   Global Step: 61520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:32,019-Speed 3377.11 samples/sec   Loss 3.7553   LearningRate 0.0211   Epoch: 10   Global Step: 61530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:35,058-Speed 3370.73 samples/sec   Loss 3.8099   LearningRate 0.0211   Epoch: 10   Global Step: 61540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:38,090-Speed 3378.13 samples/sec   Loss 3.8381   LearningRate 0.0210   Epoch: 10   Global Step: 61550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:41,122-Speed 3378.72 samples/sec   Loss 3.7424   LearningRate 0.0210   Epoch: 10   Global Step: 61560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:44,151-Speed 3381.36 samples/sec   Loss 3.8487   LearningRate 0.0210   Epoch: 10   Global Step: 61570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:47,175-Speed 3386.07 samples/sec   Loss 3.8419   LearningRate 0.0210   Epoch: 10   Global Step: 61580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:47:50,204-Speed 3381.44 samples/sec   Loss 3.7879   LearningRate 0.0210   Epoch: 10   Global Step: 61590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:47:53,215-Speed 3401.91 samples/sec   Loss 3.8378   LearningRate 0.0210   Epoch: 10   Global Step: 61600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:56,244-Speed 3381.88 samples/sec   Loss 3.8074   LearningRate 0.0210   Epoch: 10   Global Step: 61610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:47:59,274-Speed 3379.57 samples/sec   Loss 3.8358   LearningRate 0.0210   Epoch: 10   Global Step: 61620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:02,307-Speed 3377.18 samples/sec   Loss 3.6841   LearningRate 0.0210   Epoch: 10   Global Step: 61630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:05,332-Speed 3385.61 samples/sec   Loss 3.8945   LearningRate 0.0210   Epoch: 10   Global Step: 61640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:08,359-Speed 3384.83 samples/sec   Loss 3.8963   LearningRate 0.0210   Epoch: 10   Global Step: 61650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:11,390-Speed 3378.32 samples/sec   Loss 3.6400   LearningRate 0.0210   Epoch: 10   Global Step: 61660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:14,420-Speed 3381.36 samples/sec   Loss 3.6984   LearningRate 0.0209   Epoch: 10   Global Step: 61670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:17,450-Speed 3379.99 samples/sec   Loss 3.8067   LearningRate 0.0209   Epoch: 10   Global Step: 61680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:20,479-Speed 3380.80 samples/sec   Loss 3.7793   LearningRate 0.0209   Epoch: 10   Global Step: 61690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:23,506-Speed 3383.77 samples/sec   Loss 3.7874   LearningRate 0.0209   Epoch: 10   Global Step: 61700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:48:26,517-Speed 3401.75 samples/sec   Loss 3.8134   LearningRate 0.0209   Epoch: 10   Global Step: 61710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:29,550-Speed 3377.18 samples/sec   Loss 3.7575   LearningRate 0.0209   Epoch: 10   Global Step: 61720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:32,579-Speed 3381.67 samples/sec   Loss 3.7263   LearningRate 0.0209   Epoch: 10   Global Step: 61730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:35,606-Speed 3383.40 samples/sec   Loss 3.7534   LearningRate 0.0209   Epoch: 10   Global Step: 61740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:38,630-Speed 3386.79 samples/sec   Loss 3.8225   LearningRate 0.0209   Epoch: 10   Global Step: 61750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:41,661-Speed 3380.13 samples/sec   Loss 3.8908   LearningRate 0.0209   Epoch: 10   Global Step: 61760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:44,686-Speed 3384.98 samples/sec   Loss 3.7132   LearningRate 0.0209   Epoch: 10   Global Step: 61770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:47,735-Speed 3360.37 samples/sec   Loss 3.8565   LearningRate 0.0209   Epoch: 10   Global Step: 61780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:50,767-Speed 3377.45 samples/sec   Loss 3.6829   LearningRate 0.0209   Epoch: 10   Global Step: 61790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:53,804-Speed 3372.12 samples/sec   Loss 3.8195   LearningRate 0.0208   Epoch: 10   Global Step: 61800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:48:56,832-Speed 3382.96 samples/sec   Loss 3.8683   LearningRate 0.0208   Epoch: 10   Global Step: 61810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:48:59,845-Speed 3399.81 samples/sec   Loss 3.7303   LearningRate 0.0208   Epoch: 10   Global Step: 61820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:02,874-Speed 3382.13 samples/sec   Loss 3.6995   LearningRate 0.0208   Epoch: 10   Global Step: 61830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:05,904-Speed 3379.47 samples/sec   Loss 3.7766   LearningRate 0.0208   Epoch: 10   Global Step: 61840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:08,927-Speed 3388.36 samples/sec   Loss 3.7710   LearningRate 0.0208   Epoch: 10   Global Step: 61850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:11,969-Speed 3367.22 samples/sec   Loss 3.7800   LearningRate 0.0208   Epoch: 10   Global Step: 61860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:14,991-Speed 3388.39 samples/sec   Loss 3.7434   LearningRate 0.0208   Epoch: 10   Global Step: 61870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:18,016-Speed 3386.96 samples/sec   Loss 3.7315   LearningRate 0.0208   Epoch: 10   Global Step: 61880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:21,038-Speed 3388.67 samples/sec   Loss 3.9349   LearningRate 0.0208   Epoch: 10   Global Step: 61890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:24,074-Speed 3373.30 samples/sec   Loss 3.7081   LearningRate 0.0208   Epoch: 10   Global Step: 61900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:27,102-Speed 3382.81 samples/sec   Loss 3.7553   LearningRate 0.0208   Epoch: 10   Global Step: 61910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:30,109-Speed 3407.18 samples/sec   Loss 3.7804   LearningRate 0.0207   Epoch: 10   Global Step: 61920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:33,138-Speed 3381.34 samples/sec   Loss 3.8382   LearningRate 0.0207   Epoch: 10   Global Step: 61930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:36,165-Speed 3382.72 samples/sec   Loss 3.6931   LearningRate 0.0207   Epoch: 10   Global Step: 61940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:39,189-Speed 3387.72 samples/sec   Loss 3.8754   LearningRate 0.0207   Epoch: 10   Global Step: 61950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:42,213-Speed 3386.21 samples/sec   Loss 3.8765   LearningRate 0.0207   Epoch: 10   Global Step: 61960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:49:45,237-Speed 3387.30 samples/sec   Loss 3.7583   LearningRate 0.0207   Epoch: 10   Global Step: 61970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:49:48,323-Speed 3319.72 samples/sec   Loss 3.7544   LearningRate 0.0207   Epoch: 10   Global Step: 61980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:49:51,447-Speed 3278.55 samples/sec   Loss 3.8248   LearningRate 0.0207   Epoch: 10   Global Step: 61990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:49:54,514-Speed 3338.98 samples/sec   Loss 3.6825   LearningRate 0.0207   Epoch: 10   Global Step: 62000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:50:37,725-[lfw][62000]XNorm: 23.436172
Training: 2022-04-27 07:50:37,725-[lfw][62000]Accuracy-Flip: 0.99800+-0.00277
Training: 2022-04-27 07:50:37,726-[lfw][62000]Accuracy-Highest: 0.99817
Training: 2022-04-27 07:51:28,095-[cfp_fp][62000]XNorm: 21.172719
Training: 2022-04-27 07:51:28,095-[cfp_fp][62000]Accuracy-Flip: 0.97243+-0.00688
Training: 2022-04-27 07:51:28,096-[cfp_fp][62000]Accuracy-Highest: 0.97243
Training: 2022-04-27 07:52:11,293-[agedb_30][62000]XNorm: 23.276430
Training: 2022-04-27 07:52:11,294-[agedb_30][62000]Accuracy-Flip: 0.97600+-0.00754
Training: 2022-04-27 07:52:11,294-[agedb_30][62000]Accuracy-Highest: 0.97767
Training: 2022-04-27 07:52:14,320-Speed 73.25 samples/sec   Loss 3.7732   LearningRate 0.0207   Epoch: 10   Global Step: 62010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:52:17,319-Speed 3415.25 samples/sec   Loss 3.8230   LearningRate 0.0207   Epoch: 10   Global Step: 62020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:52:20,333-Speed 3398.70 samples/sec   Loss 3.7938   LearningRate 0.0207   Epoch: 10   Global Step: 62030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:52:23,343-Speed 3402.08 samples/sec   Loss 3.7792   LearningRate 0.0207   Epoch: 10   Global Step: 62040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:52:26,355-Speed 3400.25 samples/sec   Loss 3.7866   LearningRate 0.0206   Epoch: 10   Global Step: 62050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:52:29,370-Speed 3397.65 samples/sec   Loss 3.7990   LearningRate 0.0206   Epoch: 10   Global Step: 62060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:52:32,378-Speed 3405.45 samples/sec   Loss 3.8182   LearningRate 0.0206   Epoch: 10   Global Step: 62070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:52:35,411-Speed 3376.01 samples/sec   Loss 3.7185   LearningRate 0.0206   Epoch: 10   Global Step: 62080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:52:38,424-Speed 3399.20 samples/sec   Loss 3.7235   LearningRate 0.0206   Epoch: 10   Global Step: 62090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:52:41,456-Speed 3379.00 samples/sec   Loss 3.8276   LearningRate 0.0206   Epoch: 10   Global Step: 62100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:52:44,474-Speed 3393.50 samples/sec   Loss 3.7101   LearningRate 0.0206   Epoch: 10   Global Step: 62110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:52:47,490-Speed 3396.64 samples/sec   Loss 3.7740   LearningRate 0.0206   Epoch: 10   Global Step: 62120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:52:50,508-Speed 3392.99 samples/sec   Loss 3.8108   LearningRate 0.0206   Epoch: 10   Global Step: 62130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:52:53,528-Speed 3391.86 samples/sec   Loss 3.7677   LearningRate 0.0206   Epoch: 10   Global Step: 62140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:52:56,593-Speed 3341.61 samples/sec   Loss 3.7842   LearningRate 0.0206   Epoch: 10   Global Step: 62150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:52:59,616-Speed 3388.51 samples/sec   Loss 3.7305   LearningRate 0.0206   Epoch: 10   Global Step: 62160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:02,654-Speed 3371.61 samples/sec   Loss 3.8499   LearningRate 0.0205   Epoch: 10   Global Step: 62170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:53:05,678-Speed 3386.49 samples/sec   Loss 3.7167   LearningRate 0.0205   Epoch: 10   Global Step: 62180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:08,708-Speed 3380.32 samples/sec   Loss 3.6843   LearningRate 0.0205   Epoch: 10   Global Step: 62190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:11,738-Speed 3380.95 samples/sec   Loss 3.7741   LearningRate 0.0205   Epoch: 10   Global Step: 62200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:14,793-Speed 3352.77 samples/sec   Loss 3.7577   LearningRate 0.0205   Epoch: 10   Global Step: 62210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:17,822-Speed 3381.47 samples/sec   Loss 3.8189   LearningRate 0.0205   Epoch: 10   Global Step: 62220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:20,848-Speed 3384.58 samples/sec   Loss 3.7015   LearningRate 0.0205   Epoch: 10   Global Step: 62230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:23,898-Speed 3358.08 samples/sec   Loss 3.8283   LearningRate 0.0205   Epoch: 10   Global Step: 62240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:26,925-Speed 3383.60 samples/sec   Loss 3.7482   LearningRate 0.0205   Epoch: 10   Global Step: 62250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:29,949-Speed 3387.19 samples/sec   Loss 3.7481   LearningRate 0.0205   Epoch: 10   Global Step: 62260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:32,978-Speed 3381.66 samples/sec   Loss 3.7854   LearningRate 0.0205   Epoch: 10   Global Step: 62270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:35,999-Speed 3389.62 samples/sec   Loss 3.7112   LearningRate 0.0205   Epoch: 10   Global Step: 62280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:53:39,019-Speed 3392.09 samples/sec   Loss 3.7271   LearningRate 0.0205   Epoch: 10   Global Step: 62290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:53:42,037-Speed 3393.89 samples/sec   Loss 3.8560   LearningRate 0.0204   Epoch: 10   Global Step: 62300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:53:45,041-Speed 3409.12 samples/sec   Loss 3.7716   LearningRate 0.0204   Epoch: 10   Global Step: 62310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:48,122-Speed 3325.05 samples/sec   Loss 3.6914   LearningRate 0.0204   Epoch: 10   Global Step: 62320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:51,137-Speed 3396.79 samples/sec   Loss 3.6915   LearningRate 0.0204   Epoch: 10   Global Step: 62330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:54,151-Speed 3398.54 samples/sec   Loss 3.8926   LearningRate 0.0204   Epoch: 10   Global Step: 62340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:53:57,162-Speed 3400.82 samples/sec   Loss 3.7414   LearningRate 0.0204   Epoch: 10   Global Step: 62350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:00,182-Speed 3392.18 samples/sec   Loss 3.7475   LearningRate 0.0204   Epoch: 10   Global Step: 62360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:03,209-Speed 3383.27 samples/sec   Loss 3.7895   LearningRate 0.0204   Epoch: 10   Global Step: 62370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:06,223-Speed 3398.89 samples/sec   Loss 3.7209   LearningRate 0.0204   Epoch: 10   Global Step: 62380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:09,238-Speed 3397.50 samples/sec   Loss 3.7185   LearningRate 0.0204   Epoch: 10   Global Step: 62390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:12,251-Speed 3398.25 samples/sec   Loss 3.7766   LearningRate 0.0204   Epoch: 10   Global Step: 62400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:15,278-Speed 3384.57 samples/sec   Loss 3.7262   LearningRate 0.0204   Epoch: 10   Global Step: 62410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:54:18,275-Speed 3416.95 samples/sec   Loss 3.9117   LearningRate 0.0203   Epoch: 10   Global Step: 62420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:21,297-Speed 3389.77 samples/sec   Loss 3.7204   LearningRate 0.0203   Epoch: 10   Global Step: 62430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:24,312-Speed 3396.32 samples/sec   Loss 3.7384   LearningRate 0.0203   Epoch: 10   Global Step: 62440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:27,323-Speed 3401.82 samples/sec   Loss 3.6436   LearningRate 0.0203   Epoch: 10   Global Step: 62450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:30,335-Speed 3400.99 samples/sec   Loss 3.7846   LearningRate 0.0203   Epoch: 10   Global Step: 62460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:33,353-Speed 3393.66 samples/sec   Loss 3.6555   LearningRate 0.0203   Epoch: 10   Global Step: 62470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:36,370-Speed 3395.45 samples/sec   Loss 3.8557   LearningRate 0.0203   Epoch: 10   Global Step: 62480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:39,380-Speed 3402.17 samples/sec   Loss 3.7243   LearningRate 0.0203   Epoch: 10   Global Step: 62490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:42,392-Speed 3400.86 samples/sec   Loss 3.8155   LearningRate 0.0203   Epoch: 10   Global Step: 62500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:45,404-Speed 3400.30 samples/sec   Loss 3.7403   LearningRate 0.0203   Epoch: 10   Global Step: 62510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:54:48,398-Speed 3420.61 samples/sec   Loss 3.7612   LearningRate 0.0203   Epoch: 10   Global Step: 62520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:54:51,413-Speed 3397.49 samples/sec   Loss 3.8102   LearningRate 0.0203   Epoch: 10   Global Step: 62530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:54:54,504-Speed 3313.54 samples/sec   Loss 3.5876   LearningRate 0.0203   Epoch: 10   Global Step: 62540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:07,887-Speed 765.22 samples/sec   Loss 3.5473   LearningRate 0.0202   Epoch: 11   Global Step: 62550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:10,896-Speed 3404.54 samples/sec   Loss 3.1289   LearningRate 0.0202   Epoch: 11   Global Step: 62560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:13,907-Speed 3401.24 samples/sec   Loss 3.1046   LearningRate 0.0202   Epoch: 11   Global Step: 62570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:16,951-Speed 3364.41 samples/sec   Loss 3.1197   LearningRate 0.0202   Epoch: 11   Global Step: 62580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:19,962-Speed 3402.33 samples/sec   Loss 3.0536   LearningRate 0.0202   Epoch: 11   Global Step: 62590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:22,997-Speed 3374.36 samples/sec   Loss 3.0800   LearningRate 0.0202   Epoch: 11   Global Step: 62600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:26,016-Speed 3392.43 samples/sec   Loss 3.1922   LearningRate 0.0202   Epoch: 11   Global Step: 62610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:29,030-Speed 3398.49 samples/sec   Loss 3.1146   LearningRate 0.0202   Epoch: 11   Global Step: 62620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:55:32,037-Speed 3405.62 samples/sec   Loss 3.1629   LearningRate 0.0202   Epoch: 11   Global Step: 62630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:55:35,070-Speed 3377.68 samples/sec   Loss 3.2593   LearningRate 0.0202   Epoch: 11   Global Step: 62640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:55:38,062-Speed 3423.39 samples/sec   Loss 3.2655   LearningRate 0.0202   Epoch: 11   Global Step: 62650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:41,075-Speed 3399.16 samples/sec   Loss 3.1322   LearningRate 0.0202   Epoch: 11   Global Step: 62660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:44,147-Speed 3333.61 samples/sec   Loss 3.1653   LearningRate 0.0202   Epoch: 11   Global Step: 62670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:47,159-Speed 3400.75 samples/sec   Loss 3.1240   LearningRate 0.0201   Epoch: 11   Global Step: 62680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:50,180-Speed 3390.42 samples/sec   Loss 3.2560   LearningRate 0.0201   Epoch: 11   Global Step: 62690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:53,191-Speed 3401.31 samples/sec   Loss 3.2585   LearningRate 0.0201   Epoch: 11   Global Step: 62700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:56,207-Speed 3396.80 samples/sec   Loss 3.1824   LearningRate 0.0201   Epoch: 11   Global Step: 62710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:55:59,224-Speed 3394.47 samples/sec   Loss 3.2148   LearningRate 0.0201   Epoch: 11   Global Step: 62720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:56:02,239-Speed 3397.12 samples/sec   Loss 3.1168   LearningRate 0.0201   Epoch: 11   Global Step: 62730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:56:05,260-Speed 3390.81 samples/sec   Loss 3.1792   LearningRate 0.0201   Epoch: 11   Global Step: 62740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:56:08,278-Speed 3393.68 samples/sec   Loss 3.2078   LearningRate 0.0201   Epoch: 11   Global Step: 62750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:11,308-Speed 3380.62 samples/sec   Loss 3.2176   LearningRate 0.0201   Epoch: 11   Global Step: 62760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:14,336-Speed 3382.61 samples/sec   Loss 3.2045   LearningRate 0.0201   Epoch: 11   Global Step: 62770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:17,352-Speed 3395.56 samples/sec   Loss 3.2517   LearningRate 0.0201   Epoch: 11   Global Step: 62780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:20,372-Speed 3391.45 samples/sec   Loss 3.1792   LearningRate 0.0201   Epoch: 11   Global Step: 62790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:23,387-Speed 3396.70 samples/sec   Loss 3.1961   LearningRate 0.0200   Epoch: 11   Global Step: 62800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:26,467-Speed 3326.14 samples/sec   Loss 3.2413   LearningRate 0.0200   Epoch: 11   Global Step: 62810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:29,497-Speed 3380.33 samples/sec   Loss 3.1004   LearningRate 0.0200   Epoch: 11   Global Step: 62820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:32,539-Speed 3367.48 samples/sec   Loss 3.1933   LearningRate 0.0200   Epoch: 11   Global Step: 62830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:35,573-Speed 3375.49 samples/sec   Loss 3.3266   LearningRate 0.0200   Epoch: 11   Global Step: 62840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:38,605-Speed 3378.49 samples/sec   Loss 3.2982   LearningRate 0.0200   Epoch: 11   Global Step: 62850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:56:41,830-Speed 3175.83 samples/sec   Loss 3.2976   LearningRate 0.0200   Epoch: 11   Global Step: 62860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:44,855-Speed 3385.41 samples/sec   Loss 3.2370   LearningRate 0.0200   Epoch: 11   Global Step: 62870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:47,889-Speed 3376.16 samples/sec   Loss 3.2307   LearningRate 0.0200   Epoch: 11   Global Step: 62880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:50,907-Speed 3393.63 samples/sec   Loss 3.1985   LearningRate 0.0200   Epoch: 11   Global Step: 62890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:53,929-Speed 3390.17 samples/sec   Loss 3.2149   LearningRate 0.0200   Epoch: 11   Global Step: 62900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:56,947-Speed 3393.33 samples/sec   Loss 3.2211   LearningRate 0.0200   Epoch: 11   Global Step: 62910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:56:59,988-Speed 3367.52 samples/sec   Loss 3.2819   LearningRate 0.0200   Epoch: 11   Global Step: 62920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:57:03,033-Speed 3364.31 samples/sec   Loss 3.2723   LearningRate 0.0199   Epoch: 11   Global Step: 62930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:57:06,054-Speed 3390.04 samples/sec   Loss 3.2939   LearningRate 0.0199   Epoch: 11   Global Step: 62940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:57:09,085-Speed 3379.21 samples/sec   Loss 3.1091   LearningRate 0.0199   Epoch: 11   Global Step: 62950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:57:12,107-Speed 3389.12 samples/sec   Loss 3.2856   LearningRate 0.0199   Epoch: 11   Global Step: 62960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:57:15,137-Speed 3380.17 samples/sec   Loss 3.2208   LearningRate 0.0199   Epoch: 11   Global Step: 62970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:57:18,153-Speed 3395.91 samples/sec   Loss 3.3291   LearningRate 0.0199   Epoch: 11   Global Step: 62980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:57:21,178-Speed 3386.93 samples/sec   Loss 3.2748   LearningRate 0.0199   Epoch: 11   Global Step: 62990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:57:24,216-Speed 3370.85 samples/sec   Loss 3.1623   LearningRate 0.0199   Epoch: 11   Global Step: 63000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:57:27,222-Speed 3407.41 samples/sec   Loss 3.2349   LearningRate 0.0199   Epoch: 11   Global Step: 63010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:57:30,252-Speed 3380.21 samples/sec   Loss 3.2783   LearningRate 0.0199   Epoch: 11   Global Step: 63020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:57:33,271-Speed 3392.35 samples/sec   Loss 3.3263   LearningRate 0.0199   Epoch: 11   Global Step: 63030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:57:36,299-Speed 3383.30 samples/sec   Loss 3.3572   LearningRate 0.0199   Epoch: 11   Global Step: 63040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:57:39,322-Speed 3387.25 samples/sec   Loss 3.1999   LearningRate 0.0199   Epoch: 11   Global Step: 63050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:57:42,340-Speed 3393.68 samples/sec   Loss 3.3614   LearningRate 0.0198   Epoch: 11   Global Step: 63060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:57:45,357-Speed 3395.45 samples/sec   Loss 3.3152   LearningRate 0.0198   Epoch: 11   Global Step: 63070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:57:48,391-Speed 3375.94 samples/sec   Loss 3.3675   LearningRate 0.0198   Epoch: 11   Global Step: 63080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:57:51,438-Speed 3361.49 samples/sec   Loss 3.3673   LearningRate 0.0198   Epoch: 11   Global Step: 63090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:57:54,456-Speed 3393.92 samples/sec   Loss 3.3208   LearningRate 0.0198   Epoch: 11   Global Step: 63100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:57:57,474-Speed 3393.50 samples/sec   Loss 3.2518   LearningRate 0.0198   Epoch: 11   Global Step: 63110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:58:00,492-Speed 3394.08 samples/sec   Loss 3.3787   LearningRate 0.0198   Epoch: 11   Global Step: 63120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:58:03,510-Speed 3393.83 samples/sec   Loss 3.2248   LearningRate 0.0198   Epoch: 11   Global Step: 63130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:58:06,528-Speed 3393.23 samples/sec   Loss 3.3407   LearningRate 0.0198   Epoch: 11   Global Step: 63140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:58:09,553-Speed 3385.85 samples/sec   Loss 3.4516   LearningRate 0.0198   Epoch: 11   Global Step: 63150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:58:12,627-Speed 3332.32 samples/sec   Loss 3.2256   LearningRate 0.0198   Epoch: 11   Global Step: 63160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:58:15,647-Speed 3391.96 samples/sec   Loss 3.3308   LearningRate 0.0198   Epoch: 11   Global Step: 63170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:58:18,665-Speed 3393.58 samples/sec   Loss 3.3203   LearningRate 0.0198   Epoch: 11   Global Step: 63180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:58:21,684-Speed 3392.32 samples/sec   Loss 3.2190   LearningRate 0.0197   Epoch: 11   Global Step: 63190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:58:24,715-Speed 3379.40 samples/sec   Loss 3.2484   LearningRate 0.0197   Epoch: 11   Global Step: 63200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:58:27,736-Speed 3390.45 samples/sec   Loss 3.3674   LearningRate 0.0197   Epoch: 11   Global Step: 63210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:58:30,757-Speed 3389.86 samples/sec   Loss 3.2977   LearningRate 0.0197   Epoch: 11   Global Step: 63220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:58:33,790-Speed 3376.71 samples/sec   Loss 3.4232   LearningRate 0.0197   Epoch: 11   Global Step: 63230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:58:36,812-Speed 3390.47 samples/sec   Loss 3.3603   LearningRate 0.0197   Epoch: 11   Global Step: 63240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:58:39,833-Speed 3391.42 samples/sec   Loss 3.3494   LearningRate 0.0197   Epoch: 11   Global Step: 63250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:58:42,855-Speed 3389.28 samples/sec   Loss 3.3561   LearningRate 0.0197   Epoch: 11   Global Step: 63260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:58:45,899-Speed 3364.40 samples/sec   Loss 3.3299   LearningRate 0.0197   Epoch: 11   Global Step: 63270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:58:48,929-Speed 3380.27 samples/sec   Loss 3.3891   LearningRate 0.0197   Epoch: 11   Global Step: 63280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:58:51,935-Speed 3406.71 samples/sec   Loss 3.3943   LearningRate 0.0197   Epoch: 11   Global Step: 63290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:58:54,952-Speed 3395.68 samples/sec   Loss 3.3838   LearningRate 0.0197   Epoch: 11   Global Step: 63300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:58:57,971-Speed 3392.35 samples/sec   Loss 3.4993   LearningRate 0.0196   Epoch: 11   Global Step: 63310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:01,094-Speed 3279.66 samples/sec   Loss 3.2212   LearningRate 0.0196   Epoch: 11   Global Step: 63320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:04,137-Speed 3366.33 samples/sec   Loss 3.4607   LearningRate 0.0196   Epoch: 11   Global Step: 63330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:07,155-Speed 3393.41 samples/sec   Loss 3.4488   LearningRate 0.0196   Epoch: 11   Global Step: 63340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:10,191-Speed 3373.81 samples/sec   Loss 3.2850   LearningRate 0.0196   Epoch: 11   Global Step: 63350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:13,213-Speed 3389.14 samples/sec   Loss 3.3526   LearningRate 0.0196   Epoch: 11   Global Step: 63360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:16,245-Speed 3378.53 samples/sec   Loss 3.3664   LearningRate 0.0196   Epoch: 11   Global Step: 63370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:19,273-Speed 3382.09 samples/sec   Loss 3.2073   LearningRate 0.0196   Epoch: 11   Global Step: 63380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:22,290-Speed 3394.56 samples/sec   Loss 3.2868   LearningRate 0.0196   Epoch: 11   Global Step: 63390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 07:59:25,309-Speed 3393.63 samples/sec   Loss 3.2843   LearningRate 0.0196   Epoch: 11   Global Step: 63400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:28,339-Speed 3379.94 samples/sec   Loss 3.3097   LearningRate 0.0196   Epoch: 11   Global Step: 63410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:31,364-Speed 3385.60 samples/sec   Loss 3.3947   LearningRate 0.0196   Epoch: 11   Global Step: 63420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:34,389-Speed 3386.24 samples/sec   Loss 3.2192   LearningRate 0.0196   Epoch: 11   Global Step: 63430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:37,409-Speed 3391.92 samples/sec   Loss 3.3444   LearningRate 0.0195   Epoch: 11   Global Step: 63440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:40,431-Speed 3389.17 samples/sec   Loss 3.4113   LearningRate 0.0195   Epoch: 11   Global Step: 63450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:43,454-Speed 3388.39 samples/sec   Loss 3.3516   LearningRate 0.0195   Epoch: 11   Global Step: 63460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:46,475-Speed 3390.29 samples/sec   Loss 3.3270   LearningRate 0.0195   Epoch: 11   Global Step: 63470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:49,498-Speed 3388.02 samples/sec   Loss 3.3147   LearningRate 0.0195   Epoch: 11   Global Step: 63480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 07:59:52,514-Speed 3395.26 samples/sec   Loss 3.2654   LearningRate 0.0195   Epoch: 11   Global Step: 63490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:59:55,540-Speed 3385.23 samples/sec   Loss 3.3780   LearningRate 0.0195   Epoch: 11   Global Step: 63500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 07:59:58,564-Speed 3386.88 samples/sec   Loss 3.3786   LearningRate 0.0195   Epoch: 11   Global Step: 63510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:01,586-Speed 3389.08 samples/sec   Loss 3.4400   LearningRate 0.0195   Epoch: 11   Global Step: 63520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:04,614-Speed 3383.44 samples/sec   Loss 3.4781   LearningRate 0.0195   Epoch: 11   Global Step: 63530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:07,637-Speed 3387.55 samples/sec   Loss 3.4296   LearningRate 0.0195   Epoch: 11   Global Step: 63540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:10,658-Speed 3390.33 samples/sec   Loss 3.4116   LearningRate 0.0195   Epoch: 11   Global Step: 63550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:13,680-Speed 3389.71 samples/sec   Loss 3.3740   LearningRate 0.0195   Epoch: 11   Global Step: 63560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:16,705-Speed 3385.77 samples/sec   Loss 3.3809   LearningRate 0.0194   Epoch: 11   Global Step: 63570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:19,722-Speed 3394.88 samples/sec   Loss 3.2573   LearningRate 0.0194   Epoch: 11   Global Step: 63580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:22,742-Speed 3391.42 samples/sec   Loss 3.4146   LearningRate 0.0194   Epoch: 11   Global Step: 63590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:00:25,779-Speed 3372.52 samples/sec   Loss 3.4550   LearningRate 0.0194   Epoch: 11   Global Step: 63600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:00:28,792-Speed 3399.37 samples/sec   Loss 3.4038   LearningRate 0.0194   Epoch: 11   Global Step: 63610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:31,815-Speed 3388.55 samples/sec   Loss 3.4442   LearningRate 0.0194   Epoch: 11   Global Step: 63620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:34,843-Speed 3382.72 samples/sec   Loss 3.3287   LearningRate 0.0194   Epoch: 11   Global Step: 63630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:37,988-Speed 3256.46 samples/sec   Loss 3.3781   LearningRate 0.0194   Epoch: 11   Global Step: 63640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:41,017-Speed 3380.60 samples/sec   Loss 3.3400   LearningRate 0.0194   Epoch: 11   Global Step: 63650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:44,039-Speed 3389.70 samples/sec   Loss 3.3826   LearningRate 0.0194   Epoch: 11   Global Step: 63660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:47,063-Speed 3387.05 samples/sec   Loss 3.4226   LearningRate 0.0194   Epoch: 11   Global Step: 63670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:50,089-Speed 3384.34 samples/sec   Loss 3.4221   LearningRate 0.0194   Epoch: 11   Global Step: 63680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:53,112-Speed 3388.68 samples/sec   Loss 3.4352   LearningRate 0.0194   Epoch: 11   Global Step: 63690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:56,132-Speed 3392.75 samples/sec   Loss 3.4534   LearningRate 0.0193   Epoch: 11   Global Step: 63700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:00:59,159-Speed 3383.71 samples/sec   Loss 3.4379   LearningRate 0.0193   Epoch: 11   Global Step: 63710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:02,189-Speed 3380.38 samples/sec   Loss 3.3729   LearningRate 0.0193   Epoch: 11   Global Step: 63720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:05,208-Speed 3392.17 samples/sec   Loss 3.4560   LearningRate 0.0193   Epoch: 11   Global Step: 63730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:08,229-Speed 3389.70 samples/sec   Loss 3.3741   LearningRate 0.0193   Epoch: 11   Global Step: 63740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:11,251-Speed 3390.12 samples/sec   Loss 3.3978   LearningRate 0.0193   Epoch: 11   Global Step: 63750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:14,281-Speed 3380.31 samples/sec   Loss 3.4548   LearningRate 0.0193   Epoch: 11   Global Step: 63760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:17,302-Speed 3389.98 samples/sec   Loss 3.4295   LearningRate 0.0193   Epoch: 11   Global Step: 63770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:20,321-Speed 3392.45 samples/sec   Loss 3.4013   LearningRate 0.0193   Epoch: 11   Global Step: 63780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:23,348-Speed 3384.54 samples/sec   Loss 3.4471   LearningRate 0.0193   Epoch: 11   Global Step: 63790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:26,374-Speed 3384.26 samples/sec   Loss 3.3389   LearningRate 0.0193   Epoch: 11   Global Step: 63800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:29,395-Speed 3390.77 samples/sec   Loss 3.3812   LearningRate 0.0193   Epoch: 11   Global Step: 63810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:01:32,410-Speed 3396.75 samples/sec   Loss 3.3366   LearningRate 0.0193   Epoch: 11   Global Step: 63820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:01:35,412-Speed 3411.39 samples/sec   Loss 3.4842   LearningRate 0.0192   Epoch: 11   Global Step: 63830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:38,435-Speed 3388.28 samples/sec   Loss 3.3165   LearningRate 0.0192   Epoch: 11   Global Step: 63840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:41,461-Speed 3385.51 samples/sec   Loss 3.3325   LearningRate 0.0192   Epoch: 11   Global Step: 63850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:44,480-Speed 3391.85 samples/sec   Loss 3.3726   LearningRate 0.0192   Epoch: 11   Global Step: 63860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:47,528-Speed 3360.69 samples/sec   Loss 3.4189   LearningRate 0.0192   Epoch: 11   Global Step: 63870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:50,552-Speed 3387.50 samples/sec   Loss 3.4534   LearningRate 0.0192   Epoch: 11   Global Step: 63880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:53,576-Speed 3388.57 samples/sec   Loss 3.3889   LearningRate 0.0192   Epoch: 11   Global Step: 63890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:56,598-Speed 3389.25 samples/sec   Loss 3.3781   LearningRate 0.0192   Epoch: 11   Global Step: 63900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:01:59,624-Speed 3384.72 samples/sec   Loss 3.3787   LearningRate 0.0192   Epoch: 11   Global Step: 63910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:02:02,651-Speed 3383.61 samples/sec   Loss 3.3459   LearningRate 0.0192   Epoch: 11   Global Step: 63920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:02:05,676-Speed 3386.41 samples/sec   Loss 3.3584   LearningRate 0.0192   Epoch: 11   Global Step: 63930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:02:08,691-Speed 3396.64 samples/sec   Loss 3.5367   LearningRate 0.0192   Epoch: 11   Global Step: 63940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:02:11,711-Speed 3391.87 samples/sec   Loss 3.2836   LearningRate 0.0192   Epoch: 11   Global Step: 63950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:02:14,732-Speed 3389.81 samples/sec   Loss 3.3991   LearningRate 0.0191   Epoch: 11   Global Step: 63960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:02:17,766-Speed 3376.59 samples/sec   Loss 3.4144   LearningRate 0.0191   Epoch: 11   Global Step: 63970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:02:20,792-Speed 3384.82 samples/sec   Loss 3.4707   LearningRate 0.0191   Epoch: 11   Global Step: 63980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:02:23,832-Speed 3369.22 samples/sec   Loss 3.4198   LearningRate 0.0191   Epoch: 11   Global Step: 63990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:02:26,866-Speed 3375.60 samples/sec   Loss 3.3848   LearningRate 0.0191   Epoch: 11   Global Step: 64000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:03:10,168-[lfw][64000]XNorm: 21.939068
Training: 2022-04-27 08:03:10,169-[lfw][64000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-04-27 08:03:10,169-[lfw][64000]Accuracy-Highest: 0.99817
Training: 2022-04-27 08:04:00,531-[cfp_fp][64000]XNorm: 20.491285
Training: 2022-04-27 08:04:00,532-[cfp_fp][64000]Accuracy-Flip: 0.97400+-0.01008
Training: 2022-04-27 08:04:00,532-[cfp_fp][64000]Accuracy-Highest: 0.97400
Training: 2022-04-27 08:04:44,220-[agedb_30][64000]XNorm: 22.025909
Training: 2022-04-27 08:04:44,221-[agedb_30][64000]Accuracy-Flip: 0.97683+-0.00765
Training: 2022-04-27 08:04:44,221-[agedb_30][64000]Accuracy-Highest: 0.97767
Training: 2022-04-27 08:04:47,246-Speed 72.95 samples/sec   Loss 3.4210   LearningRate 0.0191   Epoch: 11   Global Step: 64010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:04:50,251-Speed 3408.32 samples/sec   Loss 3.4782   LearningRate 0.0191   Epoch: 11   Global Step: 64020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:04:53,261-Speed 3403.01 samples/sec   Loss 3.4240   LearningRate 0.0191   Epoch: 11   Global Step: 64030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:04:56,270-Speed 3403.97 samples/sec   Loss 3.4292   LearningRate 0.0191   Epoch: 11   Global Step: 64040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:04:59,285-Speed 3397.46 samples/sec   Loss 3.5166   LearningRate 0.0191   Epoch: 11   Global Step: 64050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:02,308-Speed 3387.70 samples/sec   Loss 3.4186   LearningRate 0.0191   Epoch: 11   Global Step: 64060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:05,327-Speed 3392.57 samples/sec   Loss 3.5245   LearningRate 0.0191   Epoch: 11   Global Step: 64070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:08,360-Speed 3377.73 samples/sec   Loss 3.3486   LearningRate 0.0191   Epoch: 11   Global Step: 64080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:11,375-Speed 3396.94 samples/sec   Loss 3.4325   LearningRate 0.0190   Epoch: 11   Global Step: 64090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:14,396-Speed 3390.76 samples/sec   Loss 3.3594   LearningRate 0.0190   Epoch: 11   Global Step: 64100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:17,416-Speed 3391.07 samples/sec   Loss 3.4261   LearningRate 0.0190   Epoch: 11   Global Step: 64110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:20,438-Speed 3389.21 samples/sec   Loss 3.5156   LearningRate 0.0190   Epoch: 11   Global Step: 64120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:23,466-Speed 3382.55 samples/sec   Loss 3.4605   LearningRate 0.0190   Epoch: 11   Global Step: 64130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:26,497-Speed 3379.46 samples/sec   Loss 3.4187   LearningRate 0.0190   Epoch: 11   Global Step: 64140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:29,542-Speed 3364.18 samples/sec   Loss 3.5057   LearningRate 0.0190   Epoch: 11   Global Step: 64150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:05:32,550-Speed 3404.59 samples/sec   Loss 3.3230   LearningRate 0.0190   Epoch: 11   Global Step: 64160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:35,582-Speed 3377.47 samples/sec   Loss 3.4600   LearningRate 0.0190   Epoch: 11   Global Step: 64170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:38,615-Speed 3377.22 samples/sec   Loss 3.6355   LearningRate 0.0190   Epoch: 11   Global Step: 64180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:41,638-Speed 3388.37 samples/sec   Loss 3.2765   LearningRate 0.0190   Epoch: 11   Global Step: 64190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:44,667-Speed 3381.04 samples/sec   Loss 3.4356   LearningRate 0.0190   Epoch: 11   Global Step: 64200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:47,755-Speed 3316.98 samples/sec   Loss 3.4831   LearningRate 0.0190   Epoch: 11   Global Step: 64210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:50,781-Speed 3385.63 samples/sec   Loss 3.4699   LearningRate 0.0189   Epoch: 11   Global Step: 64220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:53,810-Speed 3380.55 samples/sec   Loss 3.4709   LearningRate 0.0189   Epoch: 11   Global Step: 64230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:56,840-Speed 3380.95 samples/sec   Loss 3.4988   LearningRate 0.0189   Epoch: 11   Global Step: 64240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:05:59,877-Speed 3372.84 samples/sec   Loss 3.4143   LearningRate 0.0189   Epoch: 11   Global Step: 64250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:02,909-Speed 3377.00 samples/sec   Loss 3.4787   LearningRate 0.0189   Epoch: 11   Global Step: 64260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:06:05,945-Speed 3373.98 samples/sec   Loss 3.4285   LearningRate 0.0189   Epoch: 11   Global Step: 64270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:06:08,961-Speed 3395.79 samples/sec   Loss 3.4047   LearningRate 0.0189   Epoch: 11   Global Step: 64280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:11,998-Speed 3372.46 samples/sec   Loss 3.4251   LearningRate 0.0189   Epoch: 11   Global Step: 64290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:15,024-Speed 3385.85 samples/sec   Loss 3.5757   LearningRate 0.0189   Epoch: 11   Global Step: 64300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:18,051-Speed 3383.70 samples/sec   Loss 3.3585   LearningRate 0.0189   Epoch: 11   Global Step: 64310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:21,071-Speed 3391.10 samples/sec   Loss 3.5725   LearningRate 0.0189   Epoch: 11   Global Step: 64320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:24,093-Speed 3389.53 samples/sec   Loss 3.5105   LearningRate 0.0189   Epoch: 11   Global Step: 64330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:27,122-Speed 3380.52 samples/sec   Loss 3.4318   LearningRate 0.0189   Epoch: 11   Global Step: 64340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:30,139-Speed 3395.32 samples/sec   Loss 3.3756   LearningRate 0.0188   Epoch: 11   Global Step: 64350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:33,165-Speed 3385.12 samples/sec   Loss 3.4451   LearningRate 0.0188   Epoch: 11   Global Step: 64360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:36,219-Speed 3353.15 samples/sec   Loss 3.3952   LearningRate 0.0188   Epoch: 11   Global Step: 64370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:39,257-Speed 3371.62 samples/sec   Loss 3.4092   LearningRate 0.0188   Epoch: 11   Global Step: 64380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:06:42,258-Speed 3412.83 samples/sec   Loss 3.3987   LearningRate 0.0188   Epoch: 11   Global Step: 64390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:45,272-Speed 3399.48 samples/sec   Loss 3.4100   LearningRate 0.0188   Epoch: 11   Global Step: 64400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:48,290-Speed 3392.73 samples/sec   Loss 3.5642   LearningRate 0.0188   Epoch: 11   Global Step: 64410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:51,311-Speed 3390.53 samples/sec   Loss 3.4788   LearningRate 0.0188   Epoch: 11   Global Step: 64420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:54,336-Speed 3385.55 samples/sec   Loss 3.3885   LearningRate 0.0188   Epoch: 11   Global Step: 64430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:06:57,359-Speed 3389.07 samples/sec   Loss 3.3967   LearningRate 0.0188   Epoch: 11   Global Step: 64440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:00,383-Speed 3386.03 samples/sec   Loss 3.4664   LearningRate 0.0188   Epoch: 11   Global Step: 64450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:03,446-Speed 3344.10 samples/sec   Loss 3.4631   LearningRate 0.0188   Epoch: 11   Global Step: 64460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:06,461-Speed 3397.64 samples/sec   Loss 3.4686   LearningRate 0.0188   Epoch: 11   Global Step: 64470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:09,487-Speed 3384.99 samples/sec   Loss 3.5016   LearningRate 0.0187   Epoch: 11   Global Step: 64480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:12,481-Speed 3420.73 samples/sec   Loss 3.5109   LearningRate 0.0187   Epoch: 11   Global Step: 64490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:15,499-Speed 3393.34 samples/sec   Loss 3.4019   LearningRate 0.0187   Epoch: 11   Global Step: 64500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:18,517-Speed 3393.79 samples/sec   Loss 3.5207   LearningRate 0.0187   Epoch: 11   Global Step: 64510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:21,526-Speed 3404.61 samples/sec   Loss 3.4331   LearningRate 0.0187   Epoch: 11   Global Step: 64520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:24,546-Speed 3390.79 samples/sec   Loss 3.5217   LearningRate 0.0187   Epoch: 11   Global Step: 64530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:27,566-Speed 3392.21 samples/sec   Loss 3.3339   LearningRate 0.0187   Epoch: 11   Global Step: 64540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:30,585-Speed 3392.05 samples/sec   Loss 3.5686   LearningRate 0.0187   Epoch: 11   Global Step: 64550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:33,596-Speed 3402.37 samples/sec   Loss 3.4936   LearningRate 0.0187   Epoch: 11   Global Step: 64560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:36,634-Speed 3370.73 samples/sec   Loss 3.4961   LearningRate 0.0187   Epoch: 11   Global Step: 64570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:39,647-Speed 3400.15 samples/sec   Loss 3.4380   LearningRate 0.0187   Epoch: 11   Global Step: 64580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:42,673-Speed 3384.33 samples/sec   Loss 3.4982   LearningRate 0.0187   Epoch: 11   Global Step: 64590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:07:45,687-Speed 3398.46 samples/sec   Loss 3.4427   LearningRate 0.0187   Epoch: 11   Global Step: 64600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:07:48,684-Speed 3418.29 samples/sec   Loss 3.4104   LearningRate 0.0186   Epoch: 11   Global Step: 64610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:51,699-Speed 3396.75 samples/sec   Loss 3.4377   LearningRate 0.0186   Epoch: 11   Global Step: 64620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:54,717-Speed 3393.96 samples/sec   Loss 3.5225   LearningRate 0.0186   Epoch: 11   Global Step: 64630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:07:57,741-Speed 3386.57 samples/sec   Loss 3.4617   LearningRate 0.0186   Epoch: 11   Global Step: 64640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:08:00,754-Speed 3399.65 samples/sec   Loss 3.4230   LearningRate 0.0186   Epoch: 11   Global Step: 64650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:08:03,770-Speed 3396.57 samples/sec   Loss 3.3660   LearningRate 0.0186   Epoch: 11   Global Step: 64660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:08:06,783-Speed 3399.00 samples/sec   Loss 3.3653   LearningRate 0.0186   Epoch: 11   Global Step: 64670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:08:09,801-Speed 3394.12 samples/sec   Loss 3.5638   LearningRate 0.0186   Epoch: 11   Global Step: 64680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:08:12,828-Speed 3383.45 samples/sec   Loss 3.4554   LearningRate 0.0186   Epoch: 11   Global Step: 64690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:08:15,877-Speed 3358.97 samples/sec   Loss 3.4276   LearningRate 0.0186   Epoch: 11   Global Step: 64700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:08:18,897-Speed 3391.20 samples/sec   Loss 3.5188   LearningRate 0.0186   Epoch: 11   Global Step: 64710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:08:21,910-Speed 3399.42 samples/sec   Loss 3.4045   LearningRate 0.0186   Epoch: 11   Global Step: 64720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:08:24,929-Speed 3392.43 samples/sec   Loss 3.4417   LearningRate 0.0186   Epoch: 11   Global Step: 64730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:08:28,003-Speed 3332.11 samples/sec   Loss 3.3845   LearningRate 0.0186   Epoch: 11   Global Step: 64740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:08:31,017-Speed 3398.56 samples/sec   Loss 3.3812   LearningRate 0.0185   Epoch: 11   Global Step: 64750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:08:34,013-Speed 3419.02 samples/sec   Loss 3.4616   LearningRate 0.0185   Epoch: 11   Global Step: 64760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:08:37,026-Speed 3399.14 samples/sec   Loss 3.4914   LearningRate 0.0185   Epoch: 11   Global Step: 64770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:08:40,049-Speed 3387.58 samples/sec   Loss 3.3656   LearningRate 0.0185   Epoch: 11   Global Step: 64780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:08:43,063-Speed 3398.89 samples/sec   Loss 3.4648   LearningRate 0.0185   Epoch: 11   Global Step: 64790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:08:46,091-Speed 3382.67 samples/sec   Loss 3.4156   LearningRate 0.0185   Epoch: 11   Global Step: 64800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:08:49,120-Speed 3381.15 samples/sec   Loss 3.4891   LearningRate 0.0185   Epoch: 11   Global Step: 64810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:08:52,138-Speed 3393.50 samples/sec   Loss 3.4752   LearningRate 0.0185   Epoch: 11   Global Step: 64820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:08:55,142-Speed 3410.04 samples/sec   Loss 3.5096   LearningRate 0.0185   Epoch: 11   Global Step: 64830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:08:58,162-Speed 3391.60 samples/sec   Loss 3.4554   LearningRate 0.0185   Epoch: 11   Global Step: 64840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:09:01,187-Speed 3386.61 samples/sec   Loss 3.4821   LearningRate 0.0185   Epoch: 11   Global Step: 64850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:09:04,223-Speed 3373.45 samples/sec   Loss 3.5652   LearningRate 0.0185   Epoch: 11   Global Step: 64860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:09:07,242-Speed 3391.92 samples/sec   Loss 3.3746   LearningRate 0.0185   Epoch: 11   Global Step: 64870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:09:10,261-Speed 3393.17 samples/sec   Loss 3.4499   LearningRate 0.0184   Epoch: 11   Global Step: 64880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:09:13,286-Speed 3385.08 samples/sec   Loss 3.4800   LearningRate 0.0184   Epoch: 11   Global Step: 64890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:09:16,309-Speed 3388.64 samples/sec   Loss 3.5469   LearningRate 0.0184   Epoch: 11   Global Step: 64900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:09:19,336-Speed 3384.26 samples/sec   Loss 3.4200   LearningRate 0.0184   Epoch: 11   Global Step: 64910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:09:22,404-Speed 3337.78 samples/sec   Loss 3.3706   LearningRate 0.0184   Epoch: 11   Global Step: 64920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:09:25,602-Speed 3202.91 samples/sec   Loss 3.5145   LearningRate 0.0184   Epoch: 11   Global Step: 64930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:09:28,670-Speed 3338.44 samples/sec   Loss 3.4667   LearningRate 0.0184   Epoch: 11   Global Step: 64940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:09:31,684-Speed 3398.29 samples/sec   Loss 3.4585   LearningRate 0.0184   Epoch: 11   Global Step: 64950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:09:34,716-Speed 3378.25 samples/sec   Loss 3.4687   LearningRate 0.0184   Epoch: 11   Global Step: 64960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:09:37,752-Speed 3373.10 samples/sec   Loss 3.3989   LearningRate 0.0184   Epoch: 11   Global Step: 64970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:09:40,769-Speed 3395.02 samples/sec   Loss 3.4532   LearningRate 0.0184   Epoch: 11   Global Step: 64980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:09:43,791-Speed 3389.79 samples/sec   Loss 3.3398   LearningRate 0.0184   Epoch: 11   Global Step: 64990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:09:46,811-Speed 3390.65 samples/sec   Loss 3.4506   LearningRate 0.0184   Epoch: 11   Global Step: 65000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:09:49,871-Speed 3348.25 samples/sec   Loss 3.4219   LearningRate 0.0183   Epoch: 11   Global Step: 65010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:09:52,888-Speed 3394.69 samples/sec   Loss 3.4908   LearningRate 0.0183   Epoch: 11   Global Step: 65020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:09:55,907-Speed 3392.88 samples/sec   Loss 3.4340   LearningRate 0.0183   Epoch: 11   Global Step: 65030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:09:58,936-Speed 3381.23 samples/sec   Loss 3.5767   LearningRate 0.0183   Epoch: 11   Global Step: 65040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:10:01,965-Speed 3381.84 samples/sec   Loss 3.3924   LearningRate 0.0183   Epoch: 11   Global Step: 65050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:10:04,981-Speed 3394.99 samples/sec   Loss 3.3737   LearningRate 0.0183   Epoch: 11   Global Step: 65060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:10:07,996-Speed 3397.86 samples/sec   Loss 3.4644   LearningRate 0.0183   Epoch: 11   Global Step: 65070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:10:11,027-Speed 3378.38 samples/sec   Loss 3.4166   LearningRate 0.0183   Epoch: 11   Global Step: 65080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:10:14,043-Speed 3397.30 samples/sec   Loss 3.3208   LearningRate 0.0183   Epoch: 11   Global Step: 65090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:10:17,043-Speed 3414.16 samples/sec   Loss 3.5615   LearningRate 0.0183   Epoch: 11   Global Step: 65100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:10:20,070-Speed 3383.76 samples/sec   Loss 3.5209   LearningRate 0.0183   Epoch: 11   Global Step: 65110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:10:23,108-Speed 3371.04 samples/sec   Loss 3.4330   LearningRate 0.0183   Epoch: 11   Global Step: 65120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:10:26,128-Speed 3391.86 samples/sec   Loss 3.4963   LearningRate 0.0183   Epoch: 11   Global Step: 65130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:10:29,152-Speed 3386.48 samples/sec   Loss 3.3363   LearningRate 0.0182   Epoch: 11   Global Step: 65140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:10:32,169-Speed 3394.96 samples/sec   Loss 3.3569   LearningRate 0.0182   Epoch: 11   Global Step: 65150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:10:35,186-Speed 3395.25 samples/sec   Loss 3.4424   LearningRate 0.0182   Epoch: 11   Global Step: 65160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:10:38,216-Speed 3380.15 samples/sec   Loss 3.3864   LearningRate 0.0182   Epoch: 11   Global Step: 65170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:10:41,246-Speed 3379.70 samples/sec   Loss 3.5329   LearningRate 0.0182   Epoch: 11   Global Step: 65180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:10:44,262-Speed 3396.92 samples/sec   Loss 3.5377   LearningRate 0.0182   Epoch: 11   Global Step: 65190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:10:47,267-Speed 3408.01 samples/sec   Loss 3.4704   LearningRate 0.0182   Epoch: 11   Global Step: 65200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:10:50,284-Speed 3394.89 samples/sec   Loss 3.4394   LearningRate 0.0182   Epoch: 11   Global Step: 65210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:10:53,307-Speed 3388.25 samples/sec   Loss 3.3760   LearningRate 0.0182   Epoch: 11   Global Step: 65220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:10:56,333-Speed 3385.36 samples/sec   Loss 3.5616   LearningRate 0.0182   Epoch: 11   Global Step: 65230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:10:59,395-Speed 3344.14 samples/sec   Loss 3.5576   LearningRate 0.0182   Epoch: 11   Global Step: 65240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:02,446-Speed 3357.12 samples/sec   Loss 3.4528   LearningRate 0.0182   Epoch: 11   Global Step: 65250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:05,472-Speed 3385.24 samples/sec   Loss 3.5467   LearningRate 0.0182   Epoch: 11   Global Step: 65260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:08,519-Speed 3360.87 samples/sec   Loss 3.3904   LearningRate 0.0182   Epoch: 11   Global Step: 65270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:11,538-Speed 3393.08 samples/sec   Loss 3.4772   LearningRate 0.0181   Epoch: 11   Global Step: 65280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:14,589-Speed 3356.91 samples/sec   Loss 3.3551   LearningRate 0.0181   Epoch: 11   Global Step: 65290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:17,609-Speed 3391.32 samples/sec   Loss 3.5364   LearningRate 0.0181   Epoch: 11   Global Step: 65300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:11:20,632-Speed 3388.30 samples/sec   Loss 3.4532   LearningRate 0.0181   Epoch: 11   Global Step: 65310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:11:23,676-Speed 3364.80 samples/sec   Loss 3.4342   LearningRate 0.0181   Epoch: 11   Global Step: 65320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:26,818-Speed 3260.01 samples/sec   Loss 3.4332   LearningRate 0.0181   Epoch: 11   Global Step: 65330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:29,852-Speed 3375.51 samples/sec   Loss 3.4100   LearningRate 0.0181   Epoch: 11   Global Step: 65340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:32,876-Speed 3387.13 samples/sec   Loss 3.4764   LearningRate 0.0181   Epoch: 11   Global Step: 65350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:35,892-Speed 3397.86 samples/sec   Loss 3.3992   LearningRate 0.0181   Epoch: 11   Global Step: 65360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:38,928-Speed 3372.68 samples/sec   Loss 3.5752   LearningRate 0.0181   Epoch: 11   Global Step: 65370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:41,947-Speed 3393.36 samples/sec   Loss 3.3902   LearningRate 0.0181   Epoch: 11   Global Step: 65380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:44,978-Speed 3379.59 samples/sec   Loss 3.5452   LearningRate 0.0181   Epoch: 11   Global Step: 65390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:48,000-Speed 3388.85 samples/sec   Loss 3.3700   LearningRate 0.0181   Epoch: 11   Global Step: 65400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:51,031-Speed 3379.46 samples/sec   Loss 3.4515   LearningRate 0.0180   Epoch: 11   Global Step: 65410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:11:54,056-Speed 3386.19 samples/sec   Loss 3.4369   LearningRate 0.0180   Epoch: 11   Global Step: 65420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:11:57,082-Speed 3384.07 samples/sec   Loss 3.3743   LearningRate 0.0180   Epoch: 11   Global Step: 65430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:12:00,102-Speed 3391.53 samples/sec   Loss 3.3900   LearningRate 0.0180   Epoch: 11   Global Step: 65440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:12:03,108-Speed 3406.89 samples/sec   Loss 3.5109   LearningRate 0.0180   Epoch: 11   Global Step: 65450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:06,130-Speed 3389.62 samples/sec   Loss 3.4523   LearningRate 0.0180   Epoch: 11   Global Step: 65460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:09,148-Speed 3393.82 samples/sec   Loss 3.4967   LearningRate 0.0180   Epoch: 11   Global Step: 65470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:12,168-Speed 3392.38 samples/sec   Loss 3.4280   LearningRate 0.0180   Epoch: 11   Global Step: 65480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:15,204-Speed 3372.60 samples/sec   Loss 3.3916   LearningRate 0.0180   Epoch: 11   Global Step: 65490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:18,233-Speed 3381.30 samples/sec   Loss 3.3491   LearningRate 0.0180   Epoch: 11   Global Step: 65500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:21,251-Speed 3393.76 samples/sec   Loss 3.4870   LearningRate 0.0180   Epoch: 11   Global Step: 65510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:24,287-Speed 3374.46 samples/sec   Loss 3.4487   LearningRate 0.0180   Epoch: 11   Global Step: 65520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:27,309-Speed 3388.52 samples/sec   Loss 3.4025   LearningRate 0.0180   Epoch: 11   Global Step: 65530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:30,328-Speed 3393.34 samples/sec   Loss 3.5256   LearningRate 0.0179   Epoch: 11   Global Step: 65540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:33,352-Speed 3387.23 samples/sec   Loss 3.5010   LearningRate 0.0179   Epoch: 11   Global Step: 65550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:12:36,378-Speed 3385.00 samples/sec   Loss 3.4879   LearningRate 0.0179   Epoch: 11   Global Step: 65560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:12:39,390-Speed 3401.19 samples/sec   Loss 3.4180   LearningRate 0.0179   Epoch: 11   Global Step: 65570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:42,411-Speed 3390.50 samples/sec   Loss 3.4313   LearningRate 0.0179   Epoch: 11   Global Step: 65580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:45,433-Speed 3388.84 samples/sec   Loss 3.2947   LearningRate 0.0179   Epoch: 11   Global Step: 65590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:48,457-Speed 3386.86 samples/sec   Loss 3.3678   LearningRate 0.0179   Epoch: 11   Global Step: 65600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:51,481-Speed 3387.42 samples/sec   Loss 3.4665   LearningRate 0.0179   Epoch: 11   Global Step: 65610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:12:54,490-Speed 3403.50 samples/sec   Loss 3.4612   LearningRate 0.0179   Epoch: 11   Global Step: 65620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:12:57,513-Speed 3388.36 samples/sec   Loss 3.4294   LearningRate 0.0179   Epoch: 11   Global Step: 65630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:13:00,536-Speed 3388.66 samples/sec   Loss 3.4425   LearningRate 0.0179   Epoch: 11   Global Step: 65640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:13:03,571-Speed 3374.08 samples/sec   Loss 3.4962   LearningRate 0.0179   Epoch: 11   Global Step: 65650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:13:06,611-Speed 3369.65 samples/sec   Loss 3.3940   LearningRate 0.0179   Epoch: 11   Global Step: 65660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:13:09,630-Speed 3392.55 samples/sec   Loss 3.5260   LearningRate 0.0179   Epoch: 11   Global Step: 65670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:13:12,672-Speed 3367.60 samples/sec   Loss 3.4316   LearningRate 0.0178   Epoch: 11   Global Step: 65680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:13:15,699-Speed 3382.86 samples/sec   Loss 3.3968   LearningRate 0.0178   Epoch: 11   Global Step: 65690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:13:18,720-Speed 3390.78 samples/sec   Loss 3.5084   LearningRate 0.0178   Epoch: 11   Global Step: 65700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:13:21,741-Speed 3390.44 samples/sec   Loss 3.4902   LearningRate 0.0178   Epoch: 11   Global Step: 65710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:13:24,763-Speed 3389.83 samples/sec   Loss 3.3969   LearningRate 0.0178   Epoch: 11   Global Step: 65720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:13:27,786-Speed 3387.74 samples/sec   Loss 3.3948   LearningRate 0.0178   Epoch: 11   Global Step: 65730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:13:30,808-Speed 3389.12 samples/sec   Loss 3.4558   LearningRate 0.0178   Epoch: 11   Global Step: 65740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:13:33,834-Speed 3384.79 samples/sec   Loss 3.4086   LearningRate 0.0178   Epoch: 11   Global Step: 65750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:13:36,862-Speed 3382.85 samples/sec   Loss 3.3664   LearningRate 0.0178   Epoch: 11   Global Step: 65760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:13:39,883-Speed 3389.78 samples/sec   Loss 3.4947   LearningRate 0.0178   Epoch: 11   Global Step: 65770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:13:42,905-Speed 3390.03 samples/sec   Loss 3.5046   LearningRate 0.0178   Epoch: 11   Global Step: 65780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:13:45,931-Speed 3384.36 samples/sec   Loss 3.5199   LearningRate 0.0178   Epoch: 11   Global Step: 65790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:13:48,956-Speed 3386.39 samples/sec   Loss 3.4377   LearningRate 0.0178   Epoch: 11   Global Step: 65800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:13:51,982-Speed 3384.93 samples/sec   Loss 3.3442   LearningRate 0.0177   Epoch: 11   Global Step: 65810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:13:55,014-Speed 3377.72 samples/sec   Loss 3.4356   LearningRate 0.0177   Epoch: 11   Global Step: 65820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:13:58,036-Speed 3389.53 samples/sec   Loss 3.4360   LearningRate 0.0177   Epoch: 11   Global Step: 65830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:14:01,061-Speed 3387.72 samples/sec   Loss 3.4667   LearningRate 0.0177   Epoch: 11   Global Step: 65840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:14:04,084-Speed 3388.28 samples/sec   Loss 3.5611   LearningRate 0.0177   Epoch: 11   Global Step: 65850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:14:07,109-Speed 3385.83 samples/sec   Loss 3.5120   LearningRate 0.0177   Epoch: 11   Global Step: 65860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:14:10,113-Speed 3409.61 samples/sec   Loss 3.4025   LearningRate 0.0177   Epoch: 11   Global Step: 65870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:14:13,144-Speed 3379.94 samples/sec   Loss 3.2493   LearningRate 0.0177   Epoch: 11   Global Step: 65880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:14:16,167-Speed 3388.35 samples/sec   Loss 3.5670   LearningRate 0.0177   Epoch: 11   Global Step: 65890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:14:19,186-Speed 3392.36 samples/sec   Loss 3.4389   LearningRate 0.0177   Epoch: 11   Global Step: 65900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:14:22,208-Speed 3389.81 samples/sec   Loss 3.3970   LearningRate 0.0177   Epoch: 11   Global Step: 65910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:14:25,239-Speed 3378.82 samples/sec   Loss 3.4251   LearningRate 0.0177   Epoch: 11   Global Step: 65920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:14:28,269-Speed 3379.75 samples/sec   Loss 3.2980   LearningRate 0.0177   Epoch: 11   Global Step: 65930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:14:31,290-Speed 3390.81 samples/sec   Loss 3.4584   LearningRate 0.0177   Epoch: 11   Global Step: 65940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:14:34,311-Speed 3390.87 samples/sec   Loss 3.4434   LearningRate 0.0176   Epoch: 11   Global Step: 65950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:14:37,334-Speed 3388.00 samples/sec   Loss 3.3593   LearningRate 0.0176   Epoch: 11   Global Step: 65960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:14:40,359-Speed 3386.17 samples/sec   Loss 3.4714   LearningRate 0.0176   Epoch: 11   Global Step: 65970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:14:43,379-Speed 3391.52 samples/sec   Loss 3.3870   LearningRate 0.0176   Epoch: 11   Global Step: 65980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:14:46,402-Speed 3387.85 samples/sec   Loss 3.4181   LearningRate 0.0176   Epoch: 11   Global Step: 65990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:14:49,424-Speed 3389.65 samples/sec   Loss 3.3979   LearningRate 0.0176   Epoch: 11   Global Step: 66000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:15:32,974-[lfw][66000]XNorm: 23.685091
Training: 2022-04-27 08:15:32,975-[lfw][66000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-27 08:15:32,975-[lfw][66000]Accuracy-Highest: 0.99817
Training: 2022-04-27 08:16:23,612-[cfp_fp][66000]XNorm: 21.866582
Training: 2022-04-27 08:16:23,612-[cfp_fp][66000]Accuracy-Flip: 0.97529+-0.00676
Training: 2022-04-27 08:16:23,613-[cfp_fp][66000]Accuracy-Highest: 0.97529
Training: 2022-04-27 08:17:07,162-[agedb_30][66000]XNorm: 23.529873
Training: 2022-04-27 08:17:07,162-[agedb_30][66000]Accuracy-Flip: 0.97883+-0.00637
Training: 2022-04-27 08:17:07,163-[agedb_30][66000]Accuracy-Highest: 0.97883
Training: 2022-04-27 08:17:10,179-Speed 72.75 samples/sec   Loss 3.3346   LearningRate 0.0176   Epoch: 11   Global Step: 66010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:17:13,183-Speed 3409.43 samples/sec   Loss 3.3210   LearningRate 0.0176   Epoch: 11   Global Step: 66020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:17:16,194-Speed 3402.06 samples/sec   Loss 3.4365   LearningRate 0.0176   Epoch: 11   Global Step: 66030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:17:19,203-Speed 3403.96 samples/sec   Loss 3.4645   LearningRate 0.0176   Epoch: 11   Global Step: 66040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:17:22,247-Speed 3364.51 samples/sec   Loss 3.4769   LearningRate 0.0176   Epoch: 11   Global Step: 66050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:17:25,266-Speed 3392.84 samples/sec   Loss 3.3127   LearningRate 0.0176   Epoch: 11   Global Step: 66060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:17:28,286-Speed 3391.35 samples/sec   Loss 3.3632   LearningRate 0.0176   Epoch: 11   Global Step: 66070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:17:31,300-Speed 3398.24 samples/sec   Loss 3.3079   LearningRate 0.0175   Epoch: 11   Global Step: 66080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:17:34,321-Speed 3390.28 samples/sec   Loss 3.4729   LearningRate 0.0175   Epoch: 11   Global Step: 66090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:17:37,344-Speed 3388.51 samples/sec   Loss 3.3793   LearningRate 0.0175   Epoch: 11   Global Step: 66100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:17:40,362-Speed 3394.24 samples/sec   Loss 3.4864   LearningRate 0.0175   Epoch: 11   Global Step: 66110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:17:43,378-Speed 3394.87 samples/sec   Loss 3.4623   LearningRate 0.0175   Epoch: 11   Global Step: 66120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:17:46,400-Speed 3389.38 samples/sec   Loss 3.4698   LearningRate 0.0175   Epoch: 11   Global Step: 66130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:17:49,408-Speed 3405.57 samples/sec   Loss 3.4513   LearningRate 0.0175   Epoch: 11   Global Step: 66140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:17:52,430-Speed 3389.29 samples/sec   Loss 3.3297   LearningRate 0.0175   Epoch: 11   Global Step: 66150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:17:55,449-Speed 3392.49 samples/sec   Loss 3.4600   LearningRate 0.0175   Epoch: 11   Global Step: 66160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:17:58,466-Speed 3394.94 samples/sec   Loss 3.5585   LearningRate 0.0175   Epoch: 11   Global Step: 66170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:18:01,480-Speed 3398.13 samples/sec   Loss 3.4961   LearningRate 0.0175   Epoch: 11   Global Step: 66180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:18:04,496-Speed 3396.30 samples/sec   Loss 3.3245   LearningRate 0.0175   Epoch: 11   Global Step: 66190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:18:07,519-Speed 3388.05 samples/sec   Loss 3.4449   LearningRate 0.0175   Epoch: 11   Global Step: 66200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:18:10,545-Speed 3384.77 samples/sec   Loss 3.4612   LearningRate 0.0175   Epoch: 11   Global Step: 66210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:18:13,579-Speed 3375.88 samples/sec   Loss 3.4349   LearningRate 0.0174   Epoch: 11   Global Step: 66220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:18:16,597-Speed 3393.56 samples/sec   Loss 3.3983   LearningRate 0.0174   Epoch: 11   Global Step: 66230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:18:19,617-Speed 3391.48 samples/sec   Loss 3.4708   LearningRate 0.0174   Epoch: 11   Global Step: 66240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:18:22,636-Speed 3393.35 samples/sec   Loss 3.2947   LearningRate 0.0174   Epoch: 11   Global Step: 66250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:18:25,655-Speed 3392.31 samples/sec   Loss 3.4058   LearningRate 0.0174   Epoch: 11   Global Step: 66260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:18:28,674-Speed 3392.77 samples/sec   Loss 3.4960   LearningRate 0.0174   Epoch: 11   Global Step: 66270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:18:31,705-Speed 3379.18 samples/sec   Loss 3.2654   LearningRate 0.0174   Epoch: 11   Global Step: 66280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:18:34,734-Speed 3381.24 samples/sec   Loss 3.4727   LearningRate 0.0174   Epoch: 11   Global Step: 66290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:18:37,753-Speed 3393.24 samples/sec   Loss 3.3767   LearningRate 0.0174   Epoch: 11   Global Step: 66300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:18:40,774-Speed 3390.02 samples/sec   Loss 3.3324   LearningRate 0.0174   Epoch: 11   Global Step: 66310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:18:43,798-Speed 3386.49 samples/sec   Loss 3.4343   LearningRate 0.0174   Epoch: 11   Global Step: 66320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:18:46,815-Speed 3395.37 samples/sec   Loss 3.3663   LearningRate 0.0174   Epoch: 11   Global Step: 66330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:18:49,815-Speed 3413.82 samples/sec   Loss 3.4367   LearningRate 0.0174   Epoch: 11   Global Step: 66340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:18:52,838-Speed 3388.25 samples/sec   Loss 3.3043   LearningRate 0.0174   Epoch: 11   Global Step: 66350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:18:55,857-Speed 3393.07 samples/sec   Loss 3.3487   LearningRate 0.0173   Epoch: 11   Global Step: 66360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:18:58,873-Speed 3395.20 samples/sec   Loss 3.3750   LearningRate 0.0173   Epoch: 11   Global Step: 66370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:19:01,894-Speed 3391.25 samples/sec   Loss 3.4233   LearningRate 0.0173   Epoch: 11   Global Step: 66380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:19:04,914-Speed 3390.77 samples/sec   Loss 3.4509   LearningRate 0.0173   Epoch: 11   Global Step: 66390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:19:07,935-Speed 3390.93 samples/sec   Loss 3.3017   LearningRate 0.0173   Epoch: 11   Global Step: 66400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:19:10,963-Speed 3382.52 samples/sec   Loss 3.3640   LearningRate 0.0173   Epoch: 11   Global Step: 66410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:19:13,984-Speed 3390.48 samples/sec   Loss 3.4920   LearningRate 0.0173   Epoch: 11   Global Step: 66420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:19:17,000-Speed 3395.46 samples/sec   Loss 3.3615   LearningRate 0.0173   Epoch: 11   Global Step: 66430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:19:20,017-Speed 3395.84 samples/sec   Loss 3.3945   LearningRate 0.0173   Epoch: 11   Global Step: 66440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:19:23,035-Speed 3392.83 samples/sec   Loss 3.5175   LearningRate 0.0173   Epoch: 11   Global Step: 66450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:19:26,054-Speed 3392.53 samples/sec   Loss 3.4143   LearningRate 0.0173   Epoch: 11   Global Step: 66460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:19:29,072-Speed 3394.12 samples/sec   Loss 3.4368   LearningRate 0.0173   Epoch: 11   Global Step: 66470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:19:32,089-Speed 3395.46 samples/sec   Loss 3.3322   LearningRate 0.0173   Epoch: 11   Global Step: 66480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:19:35,118-Speed 3381.06 samples/sec   Loss 3.3719   LearningRate 0.0172   Epoch: 11   Global Step: 66490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:19:38,129-Speed 3401.45 samples/sec   Loss 3.4837   LearningRate 0.0172   Epoch: 11   Global Step: 66500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:19:41,147-Speed 3394.29 samples/sec   Loss 3.5140   LearningRate 0.0172   Epoch: 11   Global Step: 66510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:19:44,158-Speed 3400.86 samples/sec   Loss 3.4506   LearningRate 0.0172   Epoch: 11   Global Step: 66520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:19:47,234-Speed 3330.42 samples/sec   Loss 3.3794   LearningRate 0.0172   Epoch: 11   Global Step: 66530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:19:50,245-Speed 3402.03 samples/sec   Loss 3.3138   LearningRate 0.0172   Epoch: 11   Global Step: 66540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:19:53,257-Speed 3400.26 samples/sec   Loss 3.4003   LearningRate 0.0172   Epoch: 11   Global Step: 66550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:19:56,272-Speed 3397.13 samples/sec   Loss 3.2916   LearningRate 0.0172   Epoch: 11   Global Step: 66560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:19:59,289-Speed 3394.58 samples/sec   Loss 3.3176   LearningRate 0.0172   Epoch: 11   Global Step: 66570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:02,312-Speed 3387.87 samples/sec   Loss 3.3654   LearningRate 0.0172   Epoch: 11   Global Step: 66580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:05,328-Speed 3396.34 samples/sec   Loss 3.3565   LearningRate 0.0172   Epoch: 11   Global Step: 66590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:08,346-Speed 3393.35 samples/sec   Loss 3.4398   LearningRate 0.0172   Epoch: 11   Global Step: 66600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:11,362-Speed 3396.12 samples/sec   Loss 3.4847   LearningRate 0.0172   Epoch: 11   Global Step: 66610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:14,444-Speed 3323.66 samples/sec   Loss 3.4349   LearningRate 0.0172   Epoch: 11   Global Step: 66620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:17,464-Speed 3391.41 samples/sec   Loss 3.2691   LearningRate 0.0171   Epoch: 11   Global Step: 66630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:20,478-Speed 3397.89 samples/sec   Loss 3.3479   LearningRate 0.0171   Epoch: 11   Global Step: 66640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:20:23,498-Speed 3391.42 samples/sec   Loss 3.3211   LearningRate 0.0171   Epoch: 11   Global Step: 66650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:20:26,520-Speed 3390.02 samples/sec   Loss 3.3832   LearningRate 0.0171   Epoch: 11   Global Step: 66660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:29,545-Speed 3385.46 samples/sec   Loss 3.4093   LearningRate 0.0171   Epoch: 11   Global Step: 66670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:32,563-Speed 3394.39 samples/sec   Loss 3.3747   LearningRate 0.0171   Epoch: 11   Global Step: 66680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:35,587-Speed 3386.46 samples/sec   Loss 3.3384   LearningRate 0.0171   Epoch: 11   Global Step: 66690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:38,610-Speed 3388.24 samples/sec   Loss 3.3380   LearningRate 0.0171   Epoch: 11   Global Step: 66700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:41,628-Speed 3393.99 samples/sec   Loss 3.3985   LearningRate 0.0171   Epoch: 11   Global Step: 66710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:44,650-Speed 3389.49 samples/sec   Loss 3.3848   LearningRate 0.0171   Epoch: 11   Global Step: 66720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:47,672-Speed 3389.23 samples/sec   Loss 3.3498   LearningRate 0.0171   Epoch: 11   Global Step: 66730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:50,694-Speed 3389.27 samples/sec   Loss 3.3804   LearningRate 0.0171   Epoch: 11   Global Step: 66740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:53,714-Speed 3391.41 samples/sec   Loss 3.5045   LearningRate 0.0171   Epoch: 11   Global Step: 66750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:20:56,731-Speed 3394.43 samples/sec   Loss 3.3818   LearningRate 0.0171   Epoch: 11   Global Step: 66760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:20:59,771-Speed 3369.70 samples/sec   Loss 3.4732   LearningRate 0.0170   Epoch: 11   Global Step: 66770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:21:02,802-Speed 3379.38 samples/sec   Loss 3.4360   LearningRate 0.0170   Epoch: 11   Global Step: 66780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:21:05,823-Speed 3390.28 samples/sec   Loss 3.3163   LearningRate 0.0170   Epoch: 11   Global Step: 66790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:21:08,833-Speed 3402.76 samples/sec   Loss 3.4476   LearningRate 0.0170   Epoch: 11   Global Step: 66800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:11,859-Speed 3384.64 samples/sec   Loss 3.4322   LearningRate 0.0170   Epoch: 11   Global Step: 66810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:14,880-Speed 3390.57 samples/sec   Loss 3.3209   LearningRate 0.0170   Epoch: 11   Global Step: 66820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:17,906-Speed 3384.42 samples/sec   Loss 3.4116   LearningRate 0.0170   Epoch: 11   Global Step: 66830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:20,924-Speed 3394.03 samples/sec   Loss 3.4411   LearningRate 0.0170   Epoch: 11   Global Step: 66840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:23,946-Speed 3388.61 samples/sec   Loss 3.4672   LearningRate 0.0170   Epoch: 11   Global Step: 66850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:26,983-Speed 3372.84 samples/sec   Loss 3.3101   LearningRate 0.0170   Epoch: 11   Global Step: 66860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:30,004-Speed 3391.26 samples/sec   Loss 3.3071   LearningRate 0.0170   Epoch: 11   Global Step: 66870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:33,033-Speed 3381.37 samples/sec   Loss 3.3916   LearningRate 0.0170   Epoch: 11   Global Step: 66880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:36,054-Speed 3389.84 samples/sec   Loss 3.3933   LearningRate 0.0170   Epoch: 11   Global Step: 66890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:39,081-Speed 3384.22 samples/sec   Loss 3.3400   LearningRate 0.0170   Epoch: 11   Global Step: 66900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:21:42,082-Speed 3412.78 samples/sec   Loss 3.3826   LearningRate 0.0169   Epoch: 11   Global Step: 66910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:45,102-Speed 3390.97 samples/sec   Loss 3.3333   LearningRate 0.0169   Epoch: 11   Global Step: 66920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:48,134-Speed 3378.42 samples/sec   Loss 3.3568   LearningRate 0.0169   Epoch: 11   Global Step: 66930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:51,169-Speed 3374.82 samples/sec   Loss 3.3193   LearningRate 0.0169   Epoch: 11   Global Step: 66940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:54,192-Speed 3387.51 samples/sec   Loss 3.4486   LearningRate 0.0169   Epoch: 11   Global Step: 66950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:21:57,212-Speed 3392.50 samples/sec   Loss 3.3912   LearningRate 0.0169   Epoch: 11   Global Step: 66960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:22:00,233-Speed 3390.10 samples/sec   Loss 3.3876   LearningRate 0.0169   Epoch: 11   Global Step: 66970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:22:03,257-Speed 3387.92 samples/sec   Loss 3.4256   LearningRate 0.0169   Epoch: 11   Global Step: 66980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:22:06,280-Speed 3387.68 samples/sec   Loss 3.3882   LearningRate 0.0169   Epoch: 11   Global Step: 66990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:22:09,307-Speed 3383.12 samples/sec   Loss 3.3222   LearningRate 0.0169   Epoch: 11   Global Step: 67000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:22:12,307-Speed 3414.46 samples/sec   Loss 3.3743   LearningRate 0.0169   Epoch: 11   Global Step: 67010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:22:15,333-Speed 3384.44 samples/sec   Loss 3.4086   LearningRate 0.0169   Epoch: 11   Global Step: 67020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:22:18,353-Speed 3391.97 samples/sec   Loss 3.3278   LearningRate 0.0169   Epoch: 11   Global Step: 67030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:22:21,379-Speed 3385.04 samples/sec   Loss 3.5178   LearningRate 0.0168   Epoch: 11   Global Step: 67040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:22:24,409-Speed 3379.67 samples/sec   Loss 3.3449   LearningRate 0.0168   Epoch: 11   Global Step: 67050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:22:27,429-Speed 3392.48 samples/sec   Loss 3.3462   LearningRate 0.0168   Epoch: 11   Global Step: 67060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:22:30,530-Speed 3303.16 samples/sec   Loss 3.4362   LearningRate 0.0168   Epoch: 11   Global Step: 67070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:22:33,556-Speed 3384.10 samples/sec   Loss 3.4962   LearningRate 0.0168   Epoch: 11   Global Step: 67080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:22:36,585-Speed 3382.07 samples/sec   Loss 3.3801   LearningRate 0.0168   Epoch: 11   Global Step: 67090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:22:39,661-Speed 3329.10 samples/sec   Loss 3.4329   LearningRate 0.0168   Epoch: 11   Global Step: 67100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:22:42,689-Speed 3383.08 samples/sec   Loss 3.4756   LearningRate 0.0168   Epoch: 11   Global Step: 67110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:22:45,712-Speed 3387.85 samples/sec   Loss 3.3394   LearningRate 0.0168   Epoch: 11   Global Step: 67120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:22:48,740-Speed 3383.41 samples/sec   Loss 3.3270   LearningRate 0.0168   Epoch: 11   Global Step: 67130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:22:51,761-Speed 3389.90 samples/sec   Loss 3.3908   LearningRate 0.0168   Epoch: 11   Global Step: 67140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:22:54,790-Speed 3382.00 samples/sec   Loss 3.2892   LearningRate 0.0168   Epoch: 11   Global Step: 67150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:22:57,830-Speed 3368.42 samples/sec   Loss 3.3477   LearningRate 0.0168   Epoch: 11   Global Step: 67160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:00,854-Speed 3387.00 samples/sec   Loss 3.3742   LearningRate 0.0168   Epoch: 11   Global Step: 67170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:03,886-Speed 3378.44 samples/sec   Loss 3.3982   LearningRate 0.0167   Epoch: 11   Global Step: 67180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:06,906-Speed 3392.09 samples/sec   Loss 3.3198   LearningRate 0.0167   Epoch: 11   Global Step: 67190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:09,934-Speed 3381.50 samples/sec   Loss 3.3959   LearningRate 0.0167   Epoch: 11   Global Step: 67200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:12,961-Speed 3384.59 samples/sec   Loss 3.4501   LearningRate 0.0167   Epoch: 11   Global Step: 67210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:23:15,981-Speed 3391.29 samples/sec   Loss 3.2135   LearningRate 0.0167   Epoch: 11   Global Step: 67220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:23:19,009-Speed 3382.33 samples/sec   Loss 3.3142   LearningRate 0.0167   Epoch: 11   Global Step: 67230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:23:22,024-Speed 3397.54 samples/sec   Loss 3.4198   LearningRate 0.0167   Epoch: 11   Global Step: 67240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:25,056-Speed 3378.18 samples/sec   Loss 3.5283   LearningRate 0.0167   Epoch: 11   Global Step: 67250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:28,132-Speed 3329.13 samples/sec   Loss 3.3344   LearningRate 0.0167   Epoch: 11   Global Step: 67260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:31,167-Speed 3375.41 samples/sec   Loss 3.5074   LearningRate 0.0167   Epoch: 11   Global Step: 67270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:34,192-Speed 3385.90 samples/sec   Loss 3.3564   LearningRate 0.0167   Epoch: 11   Global Step: 67280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:37,230-Speed 3371.57 samples/sec   Loss 3.4072   LearningRate 0.0167   Epoch: 11   Global Step: 67290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:40,253-Speed 3387.66 samples/sec   Loss 3.2399   LearningRate 0.0167   Epoch: 11   Global Step: 67300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:43,273-Speed 3391.28 samples/sec   Loss 3.3224   LearningRate 0.0167   Epoch: 11   Global Step: 67310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:46,298-Speed 3386.85 samples/sec   Loss 3.4250   LearningRate 0.0166   Epoch: 11   Global Step: 67320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:49,348-Speed 3357.69 samples/sec   Loss 3.3594   LearningRate 0.0166   Epoch: 11   Global Step: 67330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:52,374-Speed 3385.02 samples/sec   Loss 3.4164   LearningRate 0.0166   Epoch: 11   Global Step: 67340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:23:55,381-Speed 3406.04 samples/sec   Loss 3.3787   LearningRate 0.0166   Epoch: 11   Global Step: 67350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:23:58,434-Speed 3355.52 samples/sec   Loss 3.3751   LearningRate 0.0166   Epoch: 11   Global Step: 67360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:24:01,464-Speed 3380.29 samples/sec   Loss 3.4173   LearningRate 0.0166   Epoch: 11   Global Step: 67370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:24:04,487-Speed 3387.29 samples/sec   Loss 3.3658   LearningRate 0.0166   Epoch: 11   Global Step: 67380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:24:07,520-Speed 3377.57 samples/sec   Loss 3.3979   LearningRate 0.0166   Epoch: 11   Global Step: 67390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:24:10,562-Speed 3367.06 samples/sec   Loss 3.3110   LearningRate 0.0166   Epoch: 11   Global Step: 67400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:24:13,586-Speed 3386.93 samples/sec   Loss 3.2086   LearningRate 0.0166   Epoch: 11   Global Step: 67410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:24:16,612-Speed 3384.70 samples/sec   Loss 3.2803   LearningRate 0.0166   Epoch: 11   Global Step: 67420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:24:19,637-Speed 3386.17 samples/sec   Loss 3.3638   LearningRate 0.0166   Epoch: 11   Global Step: 67430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:24:22,682-Speed 3363.62 samples/sec   Loss 3.3456   LearningRate 0.0166   Epoch: 11   Global Step: 67440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:24:25,672-Speed 3425.21 samples/sec   Loss 3.4739   LearningRate 0.0166   Epoch: 11   Global Step: 67450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:24:28,706-Speed 3376.24 samples/sec   Loss 3.3060   LearningRate 0.0165   Epoch: 11   Global Step: 67460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:24:31,737-Speed 3379.27 samples/sec   Loss 3.3969   LearningRate 0.0165   Epoch: 11   Global Step: 67470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:24:34,763-Speed 3384.65 samples/sec   Loss 3.3815   LearningRate 0.0165   Epoch: 11   Global Step: 67480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:24:37,784-Speed 3390.18 samples/sec   Loss 3.3357   LearningRate 0.0165   Epoch: 11   Global Step: 67490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:24:40,811-Speed 3384.24 samples/sec   Loss 3.2427   LearningRate 0.0165   Epoch: 11   Global Step: 67500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:24:43,831-Speed 3391.39 samples/sec   Loss 3.3742   LearningRate 0.0165   Epoch: 11   Global Step: 67510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:24:46,883-Speed 3356.03 samples/sec   Loss 3.2680   LearningRate 0.0165   Epoch: 11   Global Step: 67520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:24:49,904-Speed 3390.44 samples/sec   Loss 3.3558   LearningRate 0.0165   Epoch: 11   Global Step: 67530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:24:52,929-Speed 3386.46 samples/sec   Loss 3.4339   LearningRate 0.0165   Epoch: 11   Global Step: 67540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:24:55,951-Speed 3389.03 samples/sec   Loss 3.3837   LearningRate 0.0165   Epoch: 11   Global Step: 67550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:24:58,977-Speed 3384.51 samples/sec   Loss 3.3473   LearningRate 0.0165   Epoch: 11   Global Step: 67560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:02,016-Speed 3370.77 samples/sec   Loss 3.3748   LearningRate 0.0165   Epoch: 11   Global Step: 67570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:05,040-Speed 3386.68 samples/sec   Loss 3.3780   LearningRate 0.0165   Epoch: 11   Global Step: 67580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:08,063-Speed 3388.52 samples/sec   Loss 3.3998   LearningRate 0.0165   Epoch: 11   Global Step: 67590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:11,088-Speed 3387.68 samples/sec   Loss 3.2757   LearningRate 0.0164   Epoch: 11   Global Step: 67600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:14,110-Speed 3388.54 samples/sec   Loss 3.3297   LearningRate 0.0164   Epoch: 11   Global Step: 67610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:17,132-Speed 3389.25 samples/sec   Loss 3.3020   LearningRate 0.0164   Epoch: 11   Global Step: 67620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:20,153-Speed 3390.98 samples/sec   Loss 3.3579   LearningRate 0.0164   Epoch: 11   Global Step: 67630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:23,176-Speed 3387.38 samples/sec   Loss 3.3758   LearningRate 0.0164   Epoch: 11   Global Step: 67640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:26,204-Speed 3382.51 samples/sec   Loss 3.3781   LearningRate 0.0164   Epoch: 11   Global Step: 67650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:25:29,235-Speed 3379.72 samples/sec   Loss 3.2800   LearningRate 0.0164   Epoch: 11   Global Step: 67660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:25:32,261-Speed 3384.70 samples/sec   Loss 3.4153   LearningRate 0.0164   Epoch: 11   Global Step: 67670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:25:35,272-Speed 3401.34 samples/sec   Loss 3.3082   LearningRate 0.0164   Epoch: 11   Global Step: 67680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:38,300-Speed 3382.41 samples/sec   Loss 3.3089   LearningRate 0.0164   Epoch: 11   Global Step: 67690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:41,329-Speed 3381.60 samples/sec   Loss 3.4552   LearningRate 0.0164   Epoch: 11   Global Step: 67700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:44,353-Speed 3387.26 samples/sec   Loss 3.2924   LearningRate 0.0164   Epoch: 11   Global Step: 67710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:47,379-Speed 3385.24 samples/sec   Loss 3.3534   LearningRate 0.0164   Epoch: 11   Global Step: 67720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:50,417-Speed 3371.41 samples/sec   Loss 3.2419   LearningRate 0.0164   Epoch: 11   Global Step: 67730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:53,448-Speed 3378.53 samples/sec   Loss 3.4036   LearningRate 0.0163   Epoch: 11   Global Step: 67740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:56,477-Speed 3381.00 samples/sec   Loss 3.4112   LearningRate 0.0163   Epoch: 11   Global Step: 67750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:25:59,509-Speed 3378.66 samples/sec   Loss 3.2303   LearningRate 0.0163   Epoch: 11   Global Step: 67760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:26:02,540-Speed 3380.02 samples/sec   Loss 3.3828   LearningRate 0.0163   Epoch: 11   Global Step: 67770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:26:05,570-Speed 3379.58 samples/sec   Loss 3.3256   LearningRate 0.0163   Epoch: 11   Global Step: 67780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:26:08,597-Speed 3384.37 samples/sec   Loss 3.4438   LearningRate 0.0163   Epoch: 11   Global Step: 67790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:26:11,632-Speed 3373.92 samples/sec   Loss 3.2518   LearningRate 0.0163   Epoch: 11   Global Step: 67800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:26:14,658-Speed 3384.61 samples/sec   Loss 3.2360   LearningRate 0.0163   Epoch: 11   Global Step: 67810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:26:17,700-Speed 3367.80 samples/sec   Loss 3.5287   LearningRate 0.0163   Epoch: 11   Global Step: 67820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:26:20,731-Speed 3378.89 samples/sec   Loss 3.3873   LearningRate 0.0163   Epoch: 11   Global Step: 67830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:26:23,745-Speed 3398.65 samples/sec   Loss 3.3616   LearningRate 0.0163   Epoch: 11   Global Step: 67840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:26:26,779-Speed 3375.51 samples/sec   Loss 3.3628   LearningRate 0.0163   Epoch: 11   Global Step: 67850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:26:29,803-Speed 3387.70 samples/sec   Loss 3.4400   LearningRate 0.0163   Epoch: 11   Global Step: 67860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:26:32,825-Speed 3389.87 samples/sec   Loss 3.3060   LearningRate 0.0163   Epoch: 11   Global Step: 67870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:26:35,853-Speed 3381.43 samples/sec   Loss 3.4042   LearningRate 0.0162   Epoch: 11   Global Step: 67880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:26:38,904-Speed 3357.01 samples/sec   Loss 3.3699   LearningRate 0.0162   Epoch: 11   Global Step: 67890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:26:41,934-Speed 3380.63 samples/sec   Loss 3.2219   LearningRate 0.0162   Epoch: 11   Global Step: 67900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:26:44,964-Speed 3380.25 samples/sec   Loss 3.4291   LearningRate 0.0162   Epoch: 11   Global Step: 67910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:26:48,003-Speed 3370.12 samples/sec   Loss 3.3905   LearningRate 0.0162   Epoch: 11   Global Step: 67920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:26:51,026-Speed 3388.86 samples/sec   Loss 3.3550   LearningRate 0.0162   Epoch: 11   Global Step: 67930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:26:54,065-Speed 3369.88 samples/sec   Loss 3.3510   LearningRate 0.0162   Epoch: 11   Global Step: 67940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:26:57,159-Speed 3310.84 samples/sec   Loss 3.4221   LearningRate 0.0162   Epoch: 11   Global Step: 67950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:27:00,190-Speed 3379.46 samples/sec   Loss 3.3226   LearningRate 0.0162   Epoch: 11   Global Step: 67960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:27:03,220-Speed 3379.96 samples/sec   Loss 3.4101   LearningRate 0.0162   Epoch: 11   Global Step: 67970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:27:06,250-Speed 3379.77 samples/sec   Loss 3.2079   LearningRate 0.0162   Epoch: 11   Global Step: 67980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:27:09,276-Speed 3384.75 samples/sec   Loss 3.3844   LearningRate 0.0162   Epoch: 11   Global Step: 67990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:27:12,300-Speed 3387.81 samples/sec   Loss 3.3353   LearningRate 0.0162   Epoch: 11   Global Step: 68000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:27:55,883-[lfw][68000]XNorm: 22.379005
Training: 2022-04-27 08:27:55,883-[lfw][68000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-27 08:27:55,884-[lfw][68000]Accuracy-Highest: 0.99817
Training: 2022-04-27 08:28:46,686-[cfp_fp][68000]XNorm: 21.270001
Training: 2022-04-27 08:28:46,686-[cfp_fp][68000]Accuracy-Flip: 0.97186+-0.00729
Training: 2022-04-27 08:28:46,687-[cfp_fp][68000]Accuracy-Highest: 0.97529
Training: 2022-04-27 08:29:30,153-[agedb_30][68000]XNorm: 22.876901
Training: 2022-04-27 08:29:30,154-[agedb_30][68000]Accuracy-Flip: 0.97950+-0.00587
Training: 2022-04-27 08:29:30,154-[agedb_30][68000]Accuracy-Highest: 0.97950
Training: 2022-04-27 08:29:33,171-Speed 72.69 samples/sec   Loss 3.3702   LearningRate 0.0162   Epoch: 11   Global Step: 68010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:29:36,174-Speed 3410.33 samples/sec   Loss 3.3172   LearningRate 0.0161   Epoch: 11   Global Step: 68020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:29:39,177-Speed 3410.97 samples/sec   Loss 3.3371   LearningRate 0.0161   Epoch: 11   Global Step: 68030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:29:42,184-Speed 3406.40 samples/sec   Loss 3.2081   LearningRate 0.0161   Epoch: 11   Global Step: 68040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:29:45,200-Speed 3395.61 samples/sec   Loss 3.3177   LearningRate 0.0161   Epoch: 11   Global Step: 68050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:29:48,214-Speed 3398.33 samples/sec   Loss 3.3769   LearningRate 0.0161   Epoch: 11   Global Step: 68060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:29:51,230-Speed 3396.59 samples/sec   Loss 3.2798   LearningRate 0.0161   Epoch: 11   Global Step: 68070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:29:54,245-Speed 3396.37 samples/sec   Loss 3.2475   LearningRate 0.0161   Epoch: 11   Global Step: 68080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:29:57,266-Speed 3390.99 samples/sec   Loss 3.3300   LearningRate 0.0161   Epoch: 11   Global Step: 68090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:30:00,273-Speed 3405.96 samples/sec   Loss 3.1644   LearningRate 0.0161   Epoch: 11   Global Step: 68100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:03,295-Speed 3389.07 samples/sec   Loss 3.3767   LearningRate 0.0161   Epoch: 11   Global Step: 68110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:06,318-Speed 3387.94 samples/sec   Loss 3.2945   LearningRate 0.0161   Epoch: 11   Global Step: 68120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:09,360-Speed 3367.10 samples/sec   Loss 3.2932   LearningRate 0.0161   Epoch: 11   Global Step: 68130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:12,392-Speed 3379.32 samples/sec   Loss 3.3454   LearningRate 0.0161   Epoch: 11   Global Step: 68140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:15,424-Speed 3377.19 samples/sec   Loss 3.3205   LearningRate 0.0161   Epoch: 11   Global Step: 68150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:18,445-Speed 3390.61 samples/sec   Loss 3.3089   LearningRate 0.0161   Epoch: 11   Global Step: 68160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:21,470-Speed 3385.53 samples/sec   Loss 3.1783   LearningRate 0.0160   Epoch: 11   Global Step: 68170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:24,492-Speed 3390.21 samples/sec   Loss 3.3599   LearningRate 0.0160   Epoch: 11   Global Step: 68180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:27,511-Speed 3392.35 samples/sec   Loss 3.3319   LearningRate 0.0160   Epoch: 11   Global Step: 68190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:30,533-Speed 3388.47 samples/sec   Loss 3.4040   LearningRate 0.0160   Epoch: 11   Global Step: 68200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:30:33,546-Speed 3399.64 samples/sec   Loss 3.4516   LearningRate 0.0160   Epoch: 11   Global Step: 68210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:36,573-Speed 3384.28 samples/sec   Loss 3.3043   LearningRate 0.0160   Epoch: 11   Global Step: 68220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:39,674-Speed 3302.43 samples/sec   Loss 3.3284   LearningRate 0.0160   Epoch: 11   Global Step: 68230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:53,066-Speed 764.72 samples/sec   Loss 2.8906   LearningRate 0.0160   Epoch: 12   Global Step: 68240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:56,101-Speed 3375.33 samples/sec   Loss 2.6391   LearningRate 0.0160   Epoch: 12   Global Step: 68250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:30:59,111-Speed 3402.64 samples/sec   Loss 2.6966   LearningRate 0.0160   Epoch: 12   Global Step: 68260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:31:02,140-Speed 3381.84 samples/sec   Loss 2.7196   LearningRate 0.0160   Epoch: 12   Global Step: 68270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:31:05,159-Speed 3393.04 samples/sec   Loss 2.6933   LearningRate 0.0160   Epoch: 12   Global Step: 68280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:31:08,180-Speed 3390.51 samples/sec   Loss 2.6670   LearningRate 0.0160   Epoch: 12   Global Step: 68290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:31:11,206-Speed 3384.27 samples/sec   Loss 2.7023   LearningRate 0.0160   Epoch: 12   Global Step: 68300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:31:14,224-Speed 3394.70 samples/sec   Loss 2.7314   LearningRate 0.0159   Epoch: 12   Global Step: 68310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:31:17,233-Speed 3403.46 samples/sec   Loss 2.7915   LearningRate 0.0159   Epoch: 12   Global Step: 68320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:31:20,244-Speed 3401.68 samples/sec   Loss 2.6779   LearningRate 0.0159   Epoch: 12   Global Step: 68330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:31:23,263-Speed 3392.37 samples/sec   Loss 2.7151   LearningRate 0.0159   Epoch: 12   Global Step: 68340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:31:26,316-Speed 3355.98 samples/sec   Loss 2.6300   LearningRate 0.0159   Epoch: 12   Global Step: 68350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:31:29,333-Speed 3394.57 samples/sec   Loss 2.7133   LearningRate 0.0159   Epoch: 12   Global Step: 68360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:31:32,334-Speed 3411.97 samples/sec   Loss 2.8186   LearningRate 0.0159   Epoch: 12   Global Step: 68370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:31:35,384-Speed 3359.34 samples/sec   Loss 2.7797   LearningRate 0.0159   Epoch: 12   Global Step: 68380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:31:38,400-Speed 3395.81 samples/sec   Loss 2.8686   LearningRate 0.0159   Epoch: 12   Global Step: 68390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:31:41,403-Speed 3410.54 samples/sec   Loss 2.7509   LearningRate 0.0159   Epoch: 12   Global Step: 68400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:31:44,413-Speed 3402.62 samples/sec   Loss 2.8202   LearningRate 0.0159   Epoch: 12   Global Step: 68410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:31:47,475-Speed 3345.57 samples/sec   Loss 2.7698   LearningRate 0.0159   Epoch: 12   Global Step: 68420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:31:50,533-Speed 3349.39 samples/sec   Loss 2.7210   LearningRate 0.0159   Epoch: 12   Global Step: 68430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:31:53,595-Speed 3344.04 samples/sec   Loss 2.8631   LearningRate 0.0159   Epoch: 12   Global Step: 68440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:31:56,605-Speed 3403.39 samples/sec   Loss 2.7102   LearningRate 0.0158   Epoch: 12   Global Step: 68450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:31:59,616-Speed 3401.85 samples/sec   Loss 2.7892   LearningRate 0.0158   Epoch: 12   Global Step: 68460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:32:02,614-Speed 3416.61 samples/sec   Loss 2.7480   LearningRate 0.0158   Epoch: 12   Global Step: 68470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:32:05,641-Speed 3383.32 samples/sec   Loss 2.7210   LearningRate 0.0158   Epoch: 12   Global Step: 68480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:32:08,653-Speed 3400.47 samples/sec   Loss 2.7783   LearningRate 0.0158   Epoch: 12   Global Step: 68490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:32:11,666-Speed 3399.67 samples/sec   Loss 2.7672   LearningRate 0.0158   Epoch: 12   Global Step: 68500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:32:14,681-Speed 3398.12 samples/sec   Loss 2.7803   LearningRate 0.0158   Epoch: 12   Global Step: 68510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:32:17,716-Speed 3374.04 samples/sec   Loss 2.6687   LearningRate 0.0158   Epoch: 12   Global Step: 68520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:32:20,732-Speed 3395.93 samples/sec   Loss 2.7307   LearningRate 0.0158   Epoch: 12   Global Step: 68530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:32:23,748-Speed 3395.58 samples/sec   Loss 2.8233   LearningRate 0.0158   Epoch: 12   Global Step: 68540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:32:26,763-Speed 3398.36 samples/sec   Loss 2.7349   LearningRate 0.0158   Epoch: 12   Global Step: 68550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:32:29,780-Speed 3394.39 samples/sec   Loss 2.9215   LearningRate 0.0158   Epoch: 12   Global Step: 68560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:32:32,807-Speed 3383.89 samples/sec   Loss 2.7536   LearningRate 0.0158   Epoch: 12   Global Step: 68570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:32:35,872-Speed 3341.51 samples/sec   Loss 2.8362   LearningRate 0.0158   Epoch: 12   Global Step: 68580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:32:38,885-Speed 3399.76 samples/sec   Loss 2.8543   LearningRate 0.0157   Epoch: 12   Global Step: 68590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:32:41,905-Speed 3391.03 samples/sec   Loss 2.7979   LearningRate 0.0157   Epoch: 12   Global Step: 68600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:32:44,926-Speed 3390.93 samples/sec   Loss 2.8277   LearningRate 0.0157   Epoch: 12   Global Step: 68610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:32:47,948-Speed 3388.68 samples/sec   Loss 2.8039   LearningRate 0.0157   Epoch: 12   Global Step: 68620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:32:50,949-Speed 3413.47 samples/sec   Loss 2.8073   LearningRate 0.0157   Epoch: 12   Global Step: 68630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:32:53,964-Speed 3397.05 samples/sec   Loss 2.9534   LearningRate 0.0157   Epoch: 12   Global Step: 68640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:32:56,987-Speed 3388.67 samples/sec   Loss 2.7494   LearningRate 0.0157   Epoch: 12   Global Step: 68650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:33:00,014-Speed 3383.33 samples/sec   Loss 2.8443   LearningRate 0.0157   Epoch: 12   Global Step: 68660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:33:03,036-Speed 3389.36 samples/sec   Loss 2.9545   LearningRate 0.0157   Epoch: 12   Global Step: 68670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:33:06,062-Speed 3384.91 samples/sec   Loss 2.8741   LearningRate 0.0157   Epoch: 12   Global Step: 68680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:33:09,080-Speed 3393.38 samples/sec   Loss 2.8273   LearningRate 0.0157   Epoch: 12   Global Step: 68690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:33:12,095-Speed 3397.42 samples/sec   Loss 2.9421   LearningRate 0.0157   Epoch: 12   Global Step: 68700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:33:15,111-Speed 3396.38 samples/sec   Loss 2.7945   LearningRate 0.0157   Epoch: 12   Global Step: 68710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:33:18,125-Speed 3398.37 samples/sec   Loss 2.8377   LearningRate 0.0157   Epoch: 12   Global Step: 68720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:33:21,137-Speed 3400.63 samples/sec   Loss 2.7301   LearningRate 0.0157   Epoch: 12   Global Step: 68730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:33:24,171-Speed 3374.84 samples/sec   Loss 2.8459   LearningRate 0.0156   Epoch: 12   Global Step: 68740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:33:27,207-Speed 3374.18 samples/sec   Loss 2.8955   LearningRate 0.0156   Epoch: 12   Global Step: 68750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:33:30,247-Speed 3368.60 samples/sec   Loss 2.8931   LearningRate 0.0156   Epoch: 12   Global Step: 68760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:33:33,263-Speed 3396.07 samples/sec   Loss 2.8192   LearningRate 0.0156   Epoch: 12   Global Step: 68770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:33:36,294-Speed 3378.89 samples/sec   Loss 2.9418   LearningRate 0.0156   Epoch: 12   Global Step: 68780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:33:39,313-Speed 3393.59 samples/sec   Loss 2.8436   LearningRate 0.0156   Epoch: 12   Global Step: 68790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:33:42,336-Speed 3388.05 samples/sec   Loss 2.8552   LearningRate 0.0156   Epoch: 12   Global Step: 68800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:33:45,358-Speed 3389.11 samples/sec   Loss 2.9611   LearningRate 0.0156   Epoch: 12   Global Step: 68810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:33:48,372-Speed 3398.38 samples/sec   Loss 2.8706   LearningRate 0.0156   Epoch: 12   Global Step: 68820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:33:51,392-Speed 3392.12 samples/sec   Loss 2.8263   LearningRate 0.0156   Epoch: 12   Global Step: 68830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:33:54,407-Speed 3396.88 samples/sec   Loss 2.9706   LearningRate 0.0156   Epoch: 12   Global Step: 68840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:33:57,436-Speed 3380.96 samples/sec   Loss 2.8946   LearningRate 0.0156   Epoch: 12   Global Step: 68850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:34:00,456-Speed 3391.44 samples/sec   Loss 2.8579   LearningRate 0.0156   Epoch: 12   Global Step: 68860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:34:03,481-Speed 3386.28 samples/sec   Loss 2.9203   LearningRate 0.0156   Epoch: 12   Global Step: 68870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:34:06,501-Speed 3391.21 samples/sec   Loss 2.8691   LearningRate 0.0155   Epoch: 12   Global Step: 68880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:34:09,525-Speed 3387.43 samples/sec   Loss 2.9902   LearningRate 0.0155   Epoch: 12   Global Step: 68890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:34:12,548-Speed 3387.54 samples/sec   Loss 2.9430   LearningRate 0.0155   Epoch: 12   Global Step: 68900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:34:15,572-Speed 3387.49 samples/sec   Loss 2.7942   LearningRate 0.0155   Epoch: 12   Global Step: 68910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:34:18,590-Speed 3393.71 samples/sec   Loss 2.9153   LearningRate 0.0155   Epoch: 12   Global Step: 68920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:34:21,602-Speed 3400.70 samples/sec   Loss 2.9602   LearningRate 0.0155   Epoch: 12   Global Step: 68930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:34:24,621-Speed 3392.73 samples/sec   Loss 2.8985   LearningRate 0.0155   Epoch: 12   Global Step: 68940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:34:27,634-Speed 3399.25 samples/sec   Loss 2.8272   LearningRate 0.0155   Epoch: 12   Global Step: 68950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:34:30,647-Speed 3398.98 samples/sec   Loss 2.9287   LearningRate 0.0155   Epoch: 12   Global Step: 68960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:34:33,682-Speed 3375.54 samples/sec   Loss 2.8855   LearningRate 0.0155   Epoch: 12   Global Step: 68970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:34:36,723-Speed 3368.05 samples/sec   Loss 2.8697   LearningRate 0.0155   Epoch: 12   Global Step: 68980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:34:39,737-Speed 3397.88 samples/sec   Loss 2.9272   LearningRate 0.0155   Epoch: 12   Global Step: 68990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:34:42,756-Speed 3393.23 samples/sec   Loss 2.8038   LearningRate 0.0155   Epoch: 12   Global Step: 69000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:34:45,781-Speed 3385.15 samples/sec   Loss 2.9353   LearningRate 0.0155   Epoch: 12   Global Step: 69010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:34:48,798-Speed 3395.13 samples/sec   Loss 2.8408   LearningRate 0.0155   Epoch: 12   Global Step: 69020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:34:51,815-Speed 3395.01 samples/sec   Loss 2.8736   LearningRate 0.0154   Epoch: 12   Global Step: 69030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:34:54,833-Speed 3394.11 samples/sec   Loss 2.8371   LearningRate 0.0154   Epoch: 12   Global Step: 69040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:34:57,855-Speed 3389.30 samples/sec   Loss 2.9553   LearningRate 0.0154   Epoch: 12   Global Step: 69050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:00,881-Speed 3383.73 samples/sec   Loss 2.9095   LearningRate 0.0154   Epoch: 12   Global Step: 69060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:03,929-Speed 3360.83 samples/sec   Loss 2.8833   LearningRate 0.0154   Epoch: 12   Global Step: 69070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:06,954-Speed 3385.81 samples/sec   Loss 3.0126   LearningRate 0.0154   Epoch: 12   Global Step: 69080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:09,980-Speed 3385.78 samples/sec   Loss 3.0224   LearningRate 0.0154   Epoch: 12   Global Step: 69090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:12,999-Speed 3391.91 samples/sec   Loss 2.9118   LearningRate 0.0154   Epoch: 12   Global Step: 69100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:16,031-Speed 3378.00 samples/sec   Loss 2.9507   LearningRate 0.0154   Epoch: 12   Global Step: 69110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:19,054-Speed 3388.20 samples/sec   Loss 2.9247   LearningRate 0.0154   Epoch: 12   Global Step: 69120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:22,081-Speed 3384.15 samples/sec   Loss 3.0166   LearningRate 0.0154   Epoch: 12   Global Step: 69130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:25,101-Speed 3391.60 samples/sec   Loss 2.9326   LearningRate 0.0154   Epoch: 12   Global Step: 69140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:28,184-Speed 3321.78 samples/sec   Loss 2.8806   LearningRate 0.0154   Epoch: 12   Global Step: 69150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:35:31,185-Speed 3412.62 samples/sec   Loss 2.8610   LearningRate 0.0154   Epoch: 12   Global Step: 69160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:34,202-Speed 3395.58 samples/sec   Loss 2.9163   LearningRate 0.0153   Epoch: 12   Global Step: 69170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:37,222-Speed 3391.52 samples/sec   Loss 2.8866   LearningRate 0.0153   Epoch: 12   Global Step: 69180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:40,258-Speed 3373.19 samples/sec   Loss 2.8600   LearningRate 0.0153   Epoch: 12   Global Step: 69190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:43,296-Speed 3372.44 samples/sec   Loss 2.7646   LearningRate 0.0153   Epoch: 12   Global Step: 69200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:46,321-Speed 3385.59 samples/sec   Loss 2.8803   LearningRate 0.0153   Epoch: 12   Global Step: 69210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:49,357-Speed 3373.27 samples/sec   Loss 2.9149   LearningRate 0.0153   Epoch: 12   Global Step: 69220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:52,378-Speed 3390.13 samples/sec   Loss 2.9208   LearningRate 0.0153   Epoch: 12   Global Step: 69230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:55,398-Speed 3391.62 samples/sec   Loss 2.9092   LearningRate 0.0153   Epoch: 12   Global Step: 69240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:35:58,421-Speed 3387.80 samples/sec   Loss 2.9775   LearningRate 0.0153   Epoch: 12   Global Step: 69250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:01,464-Speed 3366.28 samples/sec   Loss 2.8944   LearningRate 0.0153   Epoch: 12   Global Step: 69260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:36:04,477-Speed 3399.38 samples/sec   Loss 2.9035   LearningRate 0.0153   Epoch: 12   Global Step: 69270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:07,498-Speed 3391.05 samples/sec   Loss 2.8886   LearningRate 0.0153   Epoch: 12   Global Step: 69280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:10,566-Speed 3338.29 samples/sec   Loss 2.9042   LearningRate 0.0153   Epoch: 12   Global Step: 69290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:13,590-Speed 3387.28 samples/sec   Loss 2.9637   LearningRate 0.0153   Epoch: 12   Global Step: 69300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:16,616-Speed 3384.30 samples/sec   Loss 3.0239   LearningRate 0.0153   Epoch: 12   Global Step: 69310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:19,633-Speed 3395.20 samples/sec   Loss 2.9796   LearningRate 0.0152   Epoch: 12   Global Step: 69320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:22,664-Speed 3379.28 samples/sec   Loss 3.0048   LearningRate 0.0152   Epoch: 12   Global Step: 69330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:25,685-Speed 3390.51 samples/sec   Loss 2.8677   LearningRate 0.0152   Epoch: 12   Global Step: 69340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:28,706-Speed 3390.58 samples/sec   Loss 2.9528   LearningRate 0.0152   Epoch: 12   Global Step: 69350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:31,730-Speed 3387.08 samples/sec   Loss 2.9660   LearningRate 0.0152   Epoch: 12   Global Step: 69360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:34,766-Speed 3372.61 samples/sec   Loss 2.9115   LearningRate 0.0152   Epoch: 12   Global Step: 69370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:36:37,819-Speed 3355.79 samples/sec   Loss 2.9767   LearningRate 0.0152   Epoch: 12   Global Step: 69380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:36:40,851-Speed 3377.57 samples/sec   Loss 2.9475   LearningRate 0.0152   Epoch: 12   Global Step: 69390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:36:43,858-Speed 3406.16 samples/sec   Loss 2.8955   LearningRate 0.0152   Epoch: 12   Global Step: 69400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:46,887-Speed 3381.03 samples/sec   Loss 2.9508   LearningRate 0.0152   Epoch: 12   Global Step: 69410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:49,948-Speed 3346.41 samples/sec   Loss 3.0545   LearningRate 0.0152   Epoch: 12   Global Step: 69420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:52,967-Speed 3392.85 samples/sec   Loss 2.9622   LearningRate 0.0152   Epoch: 12   Global Step: 69430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:55,986-Speed 3392.62 samples/sec   Loss 2.9645   LearningRate 0.0152   Epoch: 12   Global Step: 69440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:36:59,007-Speed 3390.12 samples/sec   Loss 2.9542   LearningRate 0.0152   Epoch: 12   Global Step: 69450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:02,037-Speed 3380.25 samples/sec   Loss 2.9494   LearningRate 0.0151   Epoch: 12   Global Step: 69460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:05,087-Speed 3358.32 samples/sec   Loss 2.9143   LearningRate 0.0151   Epoch: 12   Global Step: 69470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:08,121-Speed 3376.25 samples/sec   Loss 2.9291   LearningRate 0.0151   Epoch: 12   Global Step: 69480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:11,148-Speed 3383.47 samples/sec   Loss 2.9520   LearningRate 0.0151   Epoch: 12   Global Step: 69490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:14,181-Speed 3377.57 samples/sec   Loss 2.8318   LearningRate 0.0151   Epoch: 12   Global Step: 69500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:37:17,191-Speed 3402.21 samples/sec   Loss 2.9046   LearningRate 0.0151   Epoch: 12   Global Step: 69510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:20,210-Speed 3393.03 samples/sec   Loss 2.9580   LearningRate 0.0151   Epoch: 12   Global Step: 69520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:23,238-Speed 3383.12 samples/sec   Loss 2.8556   LearningRate 0.0151   Epoch: 12   Global Step: 69530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:26,260-Speed 3388.93 samples/sec   Loss 3.0425   LearningRate 0.0151   Epoch: 12   Global Step: 69540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:29,282-Speed 3389.33 samples/sec   Loss 2.9451   LearningRate 0.0151   Epoch: 12   Global Step: 69550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:32,300-Speed 3393.15 samples/sec   Loss 2.9904   LearningRate 0.0151   Epoch: 12   Global Step: 69560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:35,328-Speed 3382.94 samples/sec   Loss 2.8999   LearningRate 0.0151   Epoch: 12   Global Step: 69570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:38,363-Speed 3374.73 samples/sec   Loss 2.9934   LearningRate 0.0151   Epoch: 12   Global Step: 69580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:41,390-Speed 3383.07 samples/sec   Loss 2.9356   LearningRate 0.0151   Epoch: 12   Global Step: 69590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:44,412-Speed 3389.10 samples/sec   Loss 3.0342   LearningRate 0.0151   Epoch: 12   Global Step: 69600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:47,434-Speed 3390.32 samples/sec   Loss 3.0176   LearningRate 0.0150   Epoch: 12   Global Step: 69610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:37:50,527-Speed 3311.57 samples/sec   Loss 3.0811   LearningRate 0.0150   Epoch: 12   Global Step: 69620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:53,555-Speed 3382.58 samples/sec   Loss 2.9070   LearningRate 0.0150   Epoch: 12   Global Step: 69630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:56,584-Speed 3381.16 samples/sec   Loss 3.0134   LearningRate 0.0150   Epoch: 12   Global Step: 69640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:37:59,605-Speed 3390.37 samples/sec   Loss 2.9500   LearningRate 0.0150   Epoch: 12   Global Step: 69650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:02,628-Speed 3387.65 samples/sec   Loss 3.0333   LearningRate 0.0150   Epoch: 12   Global Step: 69660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:05,650-Speed 3390.96 samples/sec   Loss 2.9589   LearningRate 0.0150   Epoch: 12   Global Step: 69670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:08,670-Speed 3390.99 samples/sec   Loss 2.9688   LearningRate 0.0150   Epoch: 12   Global Step: 69680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:11,710-Speed 3369.06 samples/sec   Loss 2.9899   LearningRate 0.0150   Epoch: 12   Global Step: 69690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:14,749-Speed 3370.34 samples/sec   Loss 3.0149   LearningRate 0.0150   Epoch: 12   Global Step: 69700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:17,772-Speed 3389.16 samples/sec   Loss 2.9683   LearningRate 0.0150   Epoch: 12   Global Step: 69710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:20,793-Speed 3390.29 samples/sec   Loss 2.9777   LearningRate 0.0150   Epoch: 12   Global Step: 69720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:38:23,826-Speed 3376.87 samples/sec   Loss 3.0046   LearningRate 0.0150   Epoch: 12   Global Step: 69730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:38:26,832-Speed 3406.70 samples/sec   Loss 2.9200   LearningRate 0.0150   Epoch: 12   Global Step: 69740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:29,857-Speed 3386.78 samples/sec   Loss 2.8952   LearningRate 0.0149   Epoch: 12   Global Step: 69750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:32,879-Speed 3388.43 samples/sec   Loss 3.0575   LearningRate 0.0149   Epoch: 12   Global Step: 69760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:35,988-Speed 3295.26 samples/sec   Loss 3.0682   LearningRate 0.0149   Epoch: 12   Global Step: 69770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:39,017-Speed 3380.67 samples/sec   Loss 2.9610   LearningRate 0.0149   Epoch: 12   Global Step: 69780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:42,050-Speed 3377.37 samples/sec   Loss 2.9542   LearningRate 0.0149   Epoch: 12   Global Step: 69790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:45,076-Speed 3385.50 samples/sec   Loss 3.0253   LearningRate 0.0149   Epoch: 12   Global Step: 69800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:48,100-Speed 3386.66 samples/sec   Loss 2.9055   LearningRate 0.0149   Epoch: 12   Global Step: 69810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:51,122-Speed 3388.94 samples/sec   Loss 2.9553   LearningRate 0.0149   Epoch: 12   Global Step: 69820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:54,141-Speed 3392.86 samples/sec   Loss 2.9653   LearningRate 0.0149   Epoch: 12   Global Step: 69830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:38:57,171-Speed 3380.48 samples/sec   Loss 2.9707   LearningRate 0.0149   Epoch: 12   Global Step: 69840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:39:00,310-Speed 3262.83 samples/sec   Loss 2.9749   LearningRate 0.0149   Epoch: 12   Global Step: 69850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:39:03,386-Speed 3329.87 samples/sec   Loss 2.9620   LearningRate 0.0149   Epoch: 12   Global Step: 69860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:39:06,424-Speed 3371.03 samples/sec   Loss 2.9800   LearningRate 0.0149   Epoch: 12   Global Step: 69870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:39:09,447-Speed 3388.37 samples/sec   Loss 2.9425   LearningRate 0.0149   Epoch: 12   Global Step: 69880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:39:12,485-Speed 3371.36 samples/sec   Loss 2.9277   LearningRate 0.0149   Epoch: 12   Global Step: 69890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:39:15,513-Speed 3382.66 samples/sec   Loss 3.0090   LearningRate 0.0148   Epoch: 12   Global Step: 69900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:39:18,519-Speed 3406.95 samples/sec   Loss 2.9337   LearningRate 0.0148   Epoch: 12   Global Step: 69910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:39:21,545-Speed 3384.72 samples/sec   Loss 2.8932   LearningRate 0.0148   Epoch: 12   Global Step: 69920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 08:39:24,580-Speed 3375.41 samples/sec   Loss 3.0361   LearningRate 0.0148   Epoch: 12   Global Step: 69930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:39:27,630-Speed 3358.47 samples/sec   Loss 2.9571   LearningRate 0.0148   Epoch: 12   Global Step: 69940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:39:30,660-Speed 3379.24 samples/sec   Loss 2.9061   LearningRate 0.0148   Epoch: 12   Global Step: 69950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:39:33,689-Speed 3381.53 samples/sec   Loss 3.0292   LearningRate 0.0148   Epoch: 12   Global Step: 69960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:39:36,730-Speed 3369.04 samples/sec   Loss 2.9082   LearningRate 0.0148   Epoch: 12   Global Step: 69970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:39:39,761-Speed 3379.01 samples/sec   Loss 2.9864   LearningRate 0.0148   Epoch: 12   Global Step: 69980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:39:42,786-Speed 3386.14 samples/sec   Loss 2.9400   LearningRate 0.0148   Epoch: 12   Global Step: 69990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:39:45,837-Speed 3356.61 samples/sec   Loss 2.9113   LearningRate 0.0148   Epoch: 12   Global Step: 70000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:40:29,074-[lfw][70000]XNorm: 21.786232
Training: 2022-04-27 08:40:29,075-[lfw][70000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-27 08:40:29,076-[lfw][70000]Accuracy-Highest: 0.99817
Training: 2022-04-27 08:41:19,765-[cfp_fp][70000]XNorm: 20.163254
Training: 2022-04-27 08:41:19,766-[cfp_fp][70000]Accuracy-Flip: 0.97286+-0.01048
Training: 2022-04-27 08:41:19,766-[cfp_fp][70000]Accuracy-Highest: 0.97529
Training: 2022-04-27 08:42:03,105-[agedb_30][70000]XNorm: 22.169804
Training: 2022-04-27 08:42:03,106-[agedb_30][70000]Accuracy-Flip: 0.97650+-0.00713
Training: 2022-04-27 08:42:03,106-[agedb_30][70000]Accuracy-Highest: 0.97950
Training: 2022-04-27 08:42:06,117-Speed 73.00 samples/sec   Loss 3.0755   LearningRate 0.0148   Epoch: 12   Global Step: 70010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:42:09,133-Speed 3396.29 samples/sec   Loss 2.9348   LearningRate 0.0148   Epoch: 12   Global Step: 70020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:42:12,137-Speed 3409.69 samples/sec   Loss 2.9863   LearningRate 0.0148   Epoch: 12   Global Step: 70030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:42:15,147-Speed 3402.84 samples/sec   Loss 3.0182   LearningRate 0.0148   Epoch: 12   Global Step: 70040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:42:18,177-Speed 3379.83 samples/sec   Loss 2.9671   LearningRate 0.0147   Epoch: 12   Global Step: 70050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:42:21,202-Speed 3386.31 samples/sec   Loss 2.9982   LearningRate 0.0147   Epoch: 12   Global Step: 70060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-27 08:42:24,191-Speed 3427.08 samples/sec   Loss 2.9643   LearningRate 0.0147   Epoch: 12   Global Step: 70070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:42:27,206-Speed 3396.29 samples/sec   Loss 2.9391   LearningRate 0.0147   Epoch: 12   Global Step: 70080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:42:30,225-Speed 3392.68 samples/sec   Loss 3.0053   LearningRate 0.0147   Epoch: 12   Global Step: 70090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:42:33,244-Speed 3392.85 samples/sec   Loss 3.0922   LearningRate 0.0147   Epoch: 12   Global Step: 70100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:42:36,266-Speed 3389.86 samples/sec   Loss 2.9610   LearningRate 0.0147   Epoch: 12   Global Step: 70110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:42:39,285-Speed 3393.27 samples/sec   Loss 3.1107   LearningRate 0.0147   Epoch: 12   Global Step: 70120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:42:42,306-Speed 3389.59 samples/sec   Loss 2.9213   LearningRate 0.0147   Epoch: 12   Global Step: 70130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 08:42:45,334-Speed 3383.09 samples/sec   Loss 2.9504   LearningRate 0.0147   Epoch: 12   Global Step: 70140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:42:48,397-Speed 3343.79 samples/sec   Loss 2.9613   LearningRate 0.0147   Epoch: 12   Global Step: 70150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:42:51,433-Speed 3373.72 samples/sec   Loss 2.9641   LearningRate 0.0147   Epoch: 12   Global Step: 70160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:42:54,464-Speed 3378.89 samples/sec   Loss 2.9847   LearningRate 0.0147   Epoch: 12   Global Step: 70170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:42:57,494-Speed 3380.16 samples/sec   Loss 3.0807   LearningRate 0.0147   Epoch: 12   Global Step: 70180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:00,527-Speed 3377.74 samples/sec   Loss 3.0753   LearningRate 0.0147   Epoch: 12   Global Step: 70190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:03,597-Speed 3335.89 samples/sec   Loss 3.0317   LearningRate 0.0146   Epoch: 12   Global Step: 70200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:06,619-Speed 3388.78 samples/sec   Loss 2.9754   LearningRate 0.0146   Epoch: 12   Global Step: 70210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:09,645-Speed 3384.87 samples/sec   Loss 2.9868   LearningRate 0.0146   Epoch: 12   Global Step: 70220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:12,667-Speed 3389.21 samples/sec   Loss 2.9660   LearningRate 0.0146   Epoch: 12   Global Step: 70230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:15,687-Speed 3391.95 samples/sec   Loss 3.0555   LearningRate 0.0146   Epoch: 12   Global Step: 70240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:18,708-Speed 3390.96 samples/sec   Loss 2.9860   LearningRate 0.0146   Epoch: 12   Global Step: 70250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:21,727-Speed 3391.96 samples/sec   Loss 2.9238   LearningRate 0.0146   Epoch: 12   Global Step: 70260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:24,753-Speed 3385.25 samples/sec   Loss 3.0357   LearningRate 0.0146   Epoch: 12   Global Step: 70270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:43:27,779-Speed 3384.55 samples/sec   Loss 2.9661   LearningRate 0.0146   Epoch: 12   Global Step: 70280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:43:30,776-Speed 3418.24 samples/sec   Loss 2.9053   LearningRate 0.0146   Epoch: 12   Global Step: 70290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:33,791-Speed 3396.51 samples/sec   Loss 2.9835   LearningRate 0.0146   Epoch: 12   Global Step: 70300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:36,816-Speed 3386.10 samples/sec   Loss 2.9525   LearningRate 0.0146   Epoch: 12   Global Step: 70310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:39,828-Speed 3399.96 samples/sec   Loss 3.0130   LearningRate 0.0146   Epoch: 12   Global Step: 70320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:42,847-Speed 3393.49 samples/sec   Loss 2.9591   LearningRate 0.0146   Epoch: 12   Global Step: 70330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:45,861-Speed 3397.61 samples/sec   Loss 3.1279   LearningRate 0.0146   Epoch: 12   Global Step: 70340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:48,871-Speed 3402.68 samples/sec   Loss 3.0067   LearningRate 0.0145   Epoch: 12   Global Step: 70350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:51,882-Speed 3402.34 samples/sec   Loss 3.0225   LearningRate 0.0145   Epoch: 12   Global Step: 70360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:54,890-Speed 3404.47 samples/sec   Loss 2.9934   LearningRate 0.0145   Epoch: 12   Global Step: 70370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:43:57,901-Speed 3401.91 samples/sec   Loss 2.8937   LearningRate 0.0145   Epoch: 12   Global Step: 70380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:44:00,912-Speed 3401.78 samples/sec   Loss 2.9831   LearningRate 0.0145   Epoch: 12   Global Step: 70390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:44:03,925-Speed 3399.42 samples/sec   Loss 2.8897   LearningRate 0.0145   Epoch: 12   Global Step: 70400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:44:06,914-Speed 3425.96 samples/sec   Loss 2.9877   LearningRate 0.0145   Epoch: 12   Global Step: 70410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:44:09,937-Speed 3389.32 samples/sec   Loss 2.8955   LearningRate 0.0145   Epoch: 12   Global Step: 70420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:44:12,952-Speed 3396.82 samples/sec   Loss 2.8363   LearningRate 0.0145   Epoch: 12   Global Step: 70430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:44:15,981-Speed 3381.46 samples/sec   Loss 3.0332   LearningRate 0.0145   Epoch: 12   Global Step: 70440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:44:18,989-Speed 3404.74 samples/sec   Loss 2.9248   LearningRate 0.0145   Epoch: 12   Global Step: 70450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:44:21,999-Speed 3403.55 samples/sec   Loss 3.0021   LearningRate 0.0145   Epoch: 12   Global Step: 70460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:44:25,008-Speed 3403.34 samples/sec   Loss 2.9024   LearningRate 0.0145   Epoch: 12   Global Step: 70470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:44:28,053-Speed 3364.27 samples/sec   Loss 2.9232   LearningRate 0.0145   Epoch: 12   Global Step: 70480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:44:31,063-Speed 3402.24 samples/sec   Loss 2.8825   LearningRate 0.0145   Epoch: 12   Global Step: 70490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:44:34,058-Speed 3419.68 samples/sec   Loss 3.1152   LearningRate 0.0144   Epoch: 12   Global Step: 70500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:44:37,083-Speed 3385.41 samples/sec   Loss 2.9028   LearningRate 0.0144   Epoch: 12   Global Step: 70510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:44:40,102-Speed 3393.17 samples/sec   Loss 3.0528   LearningRate 0.0144   Epoch: 12   Global Step: 70520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:44:43,118-Speed 3396.04 samples/sec   Loss 3.0642   LearningRate 0.0144   Epoch: 12   Global Step: 70530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:44:46,140-Speed 3389.80 samples/sec   Loss 3.0241   LearningRate 0.0144   Epoch: 12   Global Step: 70540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:44:49,151-Speed 3401.22 samples/sec   Loss 3.0396   LearningRate 0.0144   Epoch: 12   Global Step: 70550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:44:52,163-Speed 3400.99 samples/sec   Loss 2.9291   LearningRate 0.0144   Epoch: 12   Global Step: 70560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:44:55,174-Speed 3401.60 samples/sec   Loss 3.0364   LearningRate 0.0144   Epoch: 12   Global Step: 70570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:44:58,195-Speed 3390.03 samples/sec   Loss 3.0446   LearningRate 0.0144   Epoch: 12   Global Step: 70580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:45:01,241-Speed 3362.89 samples/sec   Loss 3.0092   LearningRate 0.0144   Epoch: 12   Global Step: 70590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:45:04,253-Speed 3399.86 samples/sec   Loss 2.8565   LearningRate 0.0144   Epoch: 12   Global Step: 70600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:07,265-Speed 3400.49 samples/sec   Loss 3.0092   LearningRate 0.0144   Epoch: 12   Global Step: 70610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:10,281-Speed 3396.72 samples/sec   Loss 3.0704   LearningRate 0.0144   Epoch: 12   Global Step: 70620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:13,293-Speed 3400.09 samples/sec   Loss 3.0433   LearningRate 0.0144   Epoch: 12   Global Step: 70630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:16,306-Speed 3399.79 samples/sec   Loss 2.9770   LearningRate 0.0144   Epoch: 12   Global Step: 70640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:19,321-Speed 3396.75 samples/sec   Loss 3.0028   LearningRate 0.0143   Epoch: 12   Global Step: 70650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:22,337-Speed 3396.15 samples/sec   Loss 2.9616   LearningRate 0.0143   Epoch: 12   Global Step: 70660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:25,357-Speed 3391.14 samples/sec   Loss 2.9238   LearningRate 0.0143   Epoch: 12   Global Step: 70670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:28,373-Speed 3396.82 samples/sec   Loss 3.0049   LearningRate 0.0143   Epoch: 12   Global Step: 70680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:31,385-Speed 3400.62 samples/sec   Loss 2.9895   LearningRate 0.0143   Epoch: 12   Global Step: 70690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:34,401-Speed 3395.76 samples/sec   Loss 3.0348   LearningRate 0.0143   Epoch: 12   Global Step: 70700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:45:37,419-Speed 3393.08 samples/sec   Loss 3.0291   LearningRate 0.0143   Epoch: 12   Global Step: 70710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:45:40,427-Speed 3405.97 samples/sec   Loss 3.0131   LearningRate 0.0143   Epoch: 12   Global Step: 70720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:43,437-Speed 3402.57 samples/sec   Loss 3.0264   LearningRate 0.0143   Epoch: 12   Global Step: 70730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:46,457-Speed 3392.00 samples/sec   Loss 3.0129   LearningRate 0.0143   Epoch: 12   Global Step: 70740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:49,476-Speed 3392.63 samples/sec   Loss 2.9210   LearningRate 0.0143   Epoch: 12   Global Step: 70750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:52,490-Speed 3397.62 samples/sec   Loss 2.9508   LearningRate 0.0143   Epoch: 12   Global Step: 70760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:55,504-Speed 3398.48 samples/sec   Loss 2.9477   LearningRate 0.0143   Epoch: 12   Global Step: 70770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:45:58,517-Speed 3398.67 samples/sec   Loss 3.0852   LearningRate 0.0143   Epoch: 12   Global Step: 70780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:46:01,538-Speed 3390.59 samples/sec   Loss 2.9993   LearningRate 0.0143   Epoch: 12   Global Step: 70790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:46:04,558-Speed 3390.99 samples/sec   Loss 2.8712   LearningRate 0.0142   Epoch: 12   Global Step: 70800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:46:07,570-Speed 3401.25 samples/sec   Loss 3.0042   LearningRate 0.0142   Epoch: 12   Global Step: 70810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:46:10,586-Speed 3395.69 samples/sec   Loss 3.0788   LearningRate 0.0142   Epoch: 12   Global Step: 70820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:46:13,615-Speed 3381.92 samples/sec   Loss 3.0204   LearningRate 0.0142   Epoch: 12   Global Step: 70830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:46:16,629-Speed 3399.21 samples/sec   Loss 3.0087   LearningRate 0.0142   Epoch: 12   Global Step: 70840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:46:19,644-Speed 3396.18 samples/sec   Loss 2.9713   LearningRate 0.0142   Epoch: 12   Global Step: 70850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:46:22,661-Speed 3395.10 samples/sec   Loss 2.9859   LearningRate 0.0142   Epoch: 12   Global Step: 70860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:46:25,662-Speed 3412.92 samples/sec   Loss 2.9094   LearningRate 0.0142   Epoch: 12   Global Step: 70870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:46:28,674-Speed 3400.22 samples/sec   Loss 2.9526   LearningRate 0.0142   Epoch: 12   Global Step: 70880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:46:31,695-Speed 3390.93 samples/sec   Loss 2.9163   LearningRate 0.0142   Epoch: 12   Global Step: 70890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:46:34,724-Speed 3381.71 samples/sec   Loss 2.9847   LearningRate 0.0142   Epoch: 12   Global Step: 70900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:46:37,739-Speed 3396.88 samples/sec   Loss 3.0371   LearningRate 0.0142   Epoch: 12   Global Step: 70910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:46:40,754-Speed 3397.12 samples/sec   Loss 3.1249   LearningRate 0.0142   Epoch: 12   Global Step: 70920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:46:43,768-Speed 3397.90 samples/sec   Loss 2.9246   LearningRate 0.0142   Epoch: 12   Global Step: 70930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:46:46,800-Speed 3378.42 samples/sec   Loss 3.0665   LearningRate 0.0142   Epoch: 12   Global Step: 70940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:46:49,865-Speed 3342.39 samples/sec   Loss 3.0100   LearningRate 0.0141   Epoch: 12   Global Step: 70950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:46:52,884-Speed 3392.36 samples/sec   Loss 2.9569   LearningRate 0.0141   Epoch: 12   Global Step: 70960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:46:55,915-Speed 3379.16 samples/sec   Loss 3.0189   LearningRate 0.0141   Epoch: 12   Global Step: 70970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:46:58,964-Speed 3359.42 samples/sec   Loss 2.9410   LearningRate 0.0141   Epoch: 12   Global Step: 70980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:47:01,984-Speed 3391.29 samples/sec   Loss 2.9373   LearningRate 0.0141   Epoch: 12   Global Step: 70990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:47:05,039-Speed 3352.33 samples/sec   Loss 2.8762   LearningRate 0.0141   Epoch: 12   Global Step: 71000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:47:08,072-Speed 3378.03 samples/sec   Loss 2.8992   LearningRate 0.0141   Epoch: 12   Global Step: 71010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:47:11,072-Speed 3413.46 samples/sec   Loss 3.0703   LearningRate 0.0141   Epoch: 12   Global Step: 71020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:14,107-Speed 3374.82 samples/sec   Loss 2.9355   LearningRate 0.0141   Epoch: 12   Global Step: 71030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:17,120-Speed 3399.21 samples/sec   Loss 2.9525   LearningRate 0.0141   Epoch: 12   Global Step: 71040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:20,135-Speed 3397.79 samples/sec   Loss 3.0576   LearningRate 0.0141   Epoch: 12   Global Step: 71050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:23,183-Speed 3359.44 samples/sec   Loss 2.8769   LearningRate 0.0141   Epoch: 12   Global Step: 71060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:26,229-Speed 3363.13 samples/sec   Loss 3.0718   LearningRate 0.0141   Epoch: 12   Global Step: 71070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:29,250-Speed 3391.12 samples/sec   Loss 3.0119   LearningRate 0.0141   Epoch: 12   Global Step: 71080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:32,266-Speed 3396.18 samples/sec   Loss 2.9719   LearningRate 0.0141   Epoch: 12   Global Step: 71090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:35,334-Speed 3338.15 samples/sec   Loss 2.9936   LearningRate 0.0140   Epoch: 12   Global Step: 71100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:38,351-Speed 3395.43 samples/sec   Loss 2.9588   LearningRate 0.0140   Epoch: 12   Global Step: 71110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:41,349-Speed 3415.80 samples/sec   Loss 2.9380   LearningRate 0.0140   Epoch: 12   Global Step: 71120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:44,371-Speed 3388.82 samples/sec   Loss 3.0038   LearningRate 0.0140   Epoch: 12   Global Step: 71130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:47,390-Speed 3393.35 samples/sec   Loss 3.1440   LearningRate 0.0140   Epoch: 12   Global Step: 71140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:50,417-Speed 3382.96 samples/sec   Loss 3.0325   LearningRate 0.0140   Epoch: 12   Global Step: 71150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:53,442-Speed 3386.09 samples/sec   Loss 2.9148   LearningRate 0.0140   Epoch: 12   Global Step: 71160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:56,463-Speed 3390.41 samples/sec   Loss 3.0601   LearningRate 0.0140   Epoch: 12   Global Step: 71170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:47:59,499-Speed 3374.15 samples/sec   Loss 3.0690   LearningRate 0.0140   Epoch: 12   Global Step: 71180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:48:03,007-Speed 2919.70 samples/sec   Loss 2.9328   LearningRate 0.0140   Epoch: 12   Global Step: 71190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:48:06,332-Speed 3080.24 samples/sec   Loss 3.0142   LearningRate 0.0140   Epoch: 12   Global Step: 71200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:48:09,358-Speed 3384.54 samples/sec   Loss 3.0577   LearningRate 0.0140   Epoch: 12   Global Step: 71210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:48:12,462-Speed 3299.62 samples/sec   Loss 2.9218   LearningRate 0.0140   Epoch: 12   Global Step: 71220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:48:15,515-Speed 3355.56 samples/sec   Loss 3.0894   LearningRate 0.0140   Epoch: 12   Global Step: 71230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:48:18,544-Speed 3380.59 samples/sec   Loss 2.9275   LearningRate 0.0140   Epoch: 12   Global Step: 71240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:48:21,558-Speed 3398.76 samples/sec   Loss 3.0397   LearningRate 0.0139   Epoch: 12   Global Step: 71250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:48:24,573-Speed 3396.78 samples/sec   Loss 2.9891   LearningRate 0.0139   Epoch: 12   Global Step: 71260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:48:27,589-Speed 3396.50 samples/sec   Loss 2.9284   LearningRate 0.0139   Epoch: 12   Global Step: 71270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:48:30,603-Speed 3398.47 samples/sec   Loss 3.0608   LearningRate 0.0139   Epoch: 12   Global Step: 71280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:48:33,618-Speed 3396.65 samples/sec   Loss 3.0124   LearningRate 0.0139   Epoch: 12   Global Step: 71290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:48:36,642-Speed 3387.41 samples/sec   Loss 2.9867   LearningRate 0.0139   Epoch: 12   Global Step: 71300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:48:39,660-Speed 3394.15 samples/sec   Loss 2.9471   LearningRate 0.0139   Epoch: 12   Global Step: 71310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:48:42,682-Speed 3389.01 samples/sec   Loss 2.9521   LearningRate 0.0139   Epoch: 12   Global Step: 71320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:48:45,705-Speed 3387.51 samples/sec   Loss 3.0029   LearningRate 0.0139   Epoch: 12   Global Step: 71330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:48:48,733-Speed 3382.22 samples/sec   Loss 2.9968   LearningRate 0.0139   Epoch: 12   Global Step: 71340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:48:51,756-Speed 3388.67 samples/sec   Loss 3.0603   LearningRate 0.0139   Epoch: 12   Global Step: 71350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:48:54,773-Speed 3395.52 samples/sec   Loss 3.0013   LearningRate 0.0139   Epoch: 12   Global Step: 71360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:48:57,793-Speed 3391.41 samples/sec   Loss 2.9905   LearningRate 0.0139   Epoch: 12   Global Step: 71370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:49:00,802-Speed 3403.98 samples/sec   Loss 3.1291   LearningRate 0.0139   Epoch: 12   Global Step: 71380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:03,828-Speed 3384.46 samples/sec   Loss 2.9547   LearningRate 0.0139   Epoch: 12   Global Step: 71390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:06,859-Speed 3378.78 samples/sec   Loss 3.0144   LearningRate 0.0138   Epoch: 12   Global Step: 71400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:09,892-Speed 3377.24 samples/sec   Loss 2.9717   LearningRate 0.0138   Epoch: 12   Global Step: 71410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:12,925-Speed 3377.19 samples/sec   Loss 3.0302   LearningRate 0.0138   Epoch: 12   Global Step: 71420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:15,952-Speed 3383.58 samples/sec   Loss 2.9250   LearningRate 0.0138   Epoch: 12   Global Step: 71430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:18,968-Speed 3396.18 samples/sec   Loss 2.9279   LearningRate 0.0138   Epoch: 12   Global Step: 71440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:21,986-Speed 3394.61 samples/sec   Loss 2.9565   LearningRate 0.0138   Epoch: 12   Global Step: 71450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:25,007-Speed 3390.42 samples/sec   Loss 2.9326   LearningRate 0.0138   Epoch: 12   Global Step: 71460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:28,027-Speed 3391.24 samples/sec   Loss 2.9207   LearningRate 0.0138   Epoch: 12   Global Step: 71470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:31,049-Speed 3388.76 samples/sec   Loss 3.1577   LearningRate 0.0138   Epoch: 12   Global Step: 71480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:49:34,058-Speed 3403.62 samples/sec   Loss 3.0114   LearningRate 0.0138   Epoch: 12   Global Step: 71490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:37,081-Speed 3388.28 samples/sec   Loss 2.9297   LearningRate 0.0138   Epoch: 12   Global Step: 71500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:40,096-Speed 3398.07 samples/sec   Loss 2.9366   LearningRate 0.0138   Epoch: 12   Global Step: 71510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:43,116-Speed 3391.93 samples/sec   Loss 2.9441   LearningRate 0.0138   Epoch: 12   Global Step: 71520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:46,133-Speed 3395.23 samples/sec   Loss 3.0375   LearningRate 0.0138   Epoch: 12   Global Step: 71530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:49,182-Speed 3358.42 samples/sec   Loss 3.0414   LearningRate 0.0138   Epoch: 12   Global Step: 71540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:52,202-Speed 3392.08 samples/sec   Loss 2.9293   LearningRate 0.0138   Epoch: 12   Global Step: 71550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:55,217-Speed 3396.58 samples/sec   Loss 3.0096   LearningRate 0.0137   Epoch: 12   Global Step: 71560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:49:58,237-Speed 3391.71 samples/sec   Loss 2.8658   LearningRate 0.0137   Epoch: 12   Global Step: 71570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:01,257-Speed 3391.59 samples/sec   Loss 2.9103   LearningRate 0.0137   Epoch: 12   Global Step: 71580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:04,275-Speed 3394.00 samples/sec   Loss 3.0797   LearningRate 0.0137   Epoch: 12   Global Step: 71590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:50:07,288-Speed 3398.82 samples/sec   Loss 2.9102   LearningRate 0.0137   Epoch: 12   Global Step: 71600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:10,331-Speed 3366.06 samples/sec   Loss 3.0505   LearningRate 0.0137   Epoch: 12   Global Step: 71610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:13,407-Speed 3329.47 samples/sec   Loss 3.0175   LearningRate 0.0137   Epoch: 12   Global Step: 71620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:16,572-Speed 3236.97 samples/sec   Loss 2.9774   LearningRate 0.0137   Epoch: 12   Global Step: 71630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:19,590-Speed 3393.29 samples/sec   Loss 2.8842   LearningRate 0.0137   Epoch: 12   Global Step: 71640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:22,610-Speed 3392.30 samples/sec   Loss 3.0111   LearningRate 0.0137   Epoch: 12   Global Step: 71650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:25,636-Speed 3383.97 samples/sec   Loss 3.0372   LearningRate 0.0137   Epoch: 12   Global Step: 71660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:28,667-Speed 3379.96 samples/sec   Loss 3.0502   LearningRate 0.0137   Epoch: 12   Global Step: 71670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:31,687-Speed 3390.50 samples/sec   Loss 3.0522   LearningRate 0.0137   Epoch: 12   Global Step: 71680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:34,715-Speed 3382.40 samples/sec   Loss 3.0511   LearningRate 0.0137   Epoch: 12   Global Step: 71690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:37,730-Speed 3396.89 samples/sec   Loss 3.0050   LearningRate 0.0137   Epoch: 12   Global Step: 71700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:40,752-Speed 3389.90 samples/sec   Loss 3.0415   LearningRate 0.0136   Epoch: 12   Global Step: 71710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:43,776-Speed 3387.73 samples/sec   Loss 3.0627   LearningRate 0.0136   Epoch: 12   Global Step: 71720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:46,797-Speed 3390.07 samples/sec   Loss 2.8825   LearningRate 0.0136   Epoch: 12   Global Step: 71730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:49,819-Speed 3389.45 samples/sec   Loss 3.0109   LearningRate 0.0136   Epoch: 12   Global Step: 71740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:52,840-Speed 3389.63 samples/sec   Loss 2.9249   LearningRate 0.0136   Epoch: 12   Global Step: 71750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:55,869-Speed 3381.34 samples/sec   Loss 2.9515   LearningRate 0.0136   Epoch: 12   Global Step: 71760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:50:58,892-Speed 3388.63 samples/sec   Loss 3.0029   LearningRate 0.0136   Epoch: 12   Global Step: 71770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:01,928-Speed 3373.53 samples/sec   Loss 3.0484   LearningRate 0.0136   Epoch: 12   Global Step: 71780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:04,961-Speed 3376.56 samples/sec   Loss 2.9873   LearningRate 0.0136   Epoch: 12   Global Step: 71790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:07,964-Speed 3411.31 samples/sec   Loss 2.9760   LearningRate 0.0136   Epoch: 12   Global Step: 71800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:10,990-Speed 3385.15 samples/sec   Loss 2.9444   LearningRate 0.0136   Epoch: 12   Global Step: 71810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:14,021-Speed 3378.54 samples/sec   Loss 2.9574   LearningRate 0.0136   Epoch: 12   Global Step: 71820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:17,046-Speed 3385.96 samples/sec   Loss 2.9334   LearningRate 0.0136   Epoch: 12   Global Step: 71830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:20,072-Speed 3385.34 samples/sec   Loss 2.9169   LearningRate 0.0136   Epoch: 12   Global Step: 71840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:23,110-Speed 3371.29 samples/sec   Loss 2.9639   LearningRate 0.0136   Epoch: 12   Global Step: 71850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:26,143-Speed 3376.38 samples/sec   Loss 2.9595   LearningRate 0.0135   Epoch: 12   Global Step: 71860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:29,168-Speed 3386.20 samples/sec   Loss 2.9943   LearningRate 0.0135   Epoch: 12   Global Step: 71870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:32,192-Speed 3387.56 samples/sec   Loss 3.0210   LearningRate 0.0135   Epoch: 12   Global Step: 71880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:35,214-Speed 3389.48 samples/sec   Loss 2.9866   LearningRate 0.0135   Epoch: 12   Global Step: 71890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:38,251-Speed 3372.27 samples/sec   Loss 3.0346   LearningRate 0.0135   Epoch: 12   Global Step: 71900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:51:41,255-Speed 3409.40 samples/sec   Loss 3.0235   LearningRate 0.0135   Epoch: 12   Global Step: 71910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:44,277-Speed 3389.54 samples/sec   Loss 3.0138   LearningRate 0.0135   Epoch: 12   Global Step: 71920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:47,300-Speed 3388.13 samples/sec   Loss 2.8917   LearningRate 0.0135   Epoch: 12   Global Step: 71930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:50,321-Speed 3390.77 samples/sec   Loss 3.0403   LearningRate 0.0135   Epoch: 12   Global Step: 71940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:53,361-Speed 3368.78 samples/sec   Loss 2.8835   LearningRate 0.0135   Epoch: 12   Global Step: 71950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:56,385-Speed 3387.10 samples/sec   Loss 2.9847   LearningRate 0.0135   Epoch: 12   Global Step: 71960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:51:59,407-Speed 3389.01 samples/sec   Loss 2.9743   LearningRate 0.0135   Epoch: 12   Global Step: 71970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:52:02,430-Speed 3387.57 samples/sec   Loss 2.9027   LearningRate 0.0135   Epoch: 12   Global Step: 71980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:52:05,454-Speed 3387.89 samples/sec   Loss 2.9736   LearningRate 0.0135   Epoch: 12   Global Step: 71990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:52:08,481-Speed 3383.50 samples/sec   Loss 2.9344   LearningRate 0.0135   Epoch: 12   Global Step: 72000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:52:51,790-[lfw][72000]XNorm: 22.158012
Training: 2022-04-27 08:52:51,791-[lfw][72000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-04-27 08:52:51,791-[lfw][72000]Accuracy-Highest: 0.99817
Training: 2022-04-27 08:53:42,115-[cfp_fp][72000]XNorm: 20.789949
Training: 2022-04-27 08:53:42,116-[cfp_fp][72000]Accuracy-Flip: 0.97557+-0.00875
Training: 2022-04-27 08:53:42,116-[cfp_fp][72000]Accuracy-Highest: 0.97557
Training: 2022-04-27 08:54:25,649-[agedb_30][72000]XNorm: 22.255885
Training: 2022-04-27 08:54:25,650-[agedb_30][72000]Accuracy-Flip: 0.97800+-0.00690
Training: 2022-04-27 08:54:25,650-[agedb_30][72000]Accuracy-Highest: 0.97950
Training: 2022-04-27 08:54:28,668-Speed 73.05 samples/sec   Loss 3.0417   LearningRate 0.0135   Epoch: 12   Global Step: 72010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:54:31,677-Speed 3404.22 samples/sec   Loss 3.0046   LearningRate 0.0134   Epoch: 12   Global Step: 72020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:54:34,689-Speed 3400.69 samples/sec   Loss 2.9336   LearningRate 0.0134   Epoch: 12   Global Step: 72030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:54:37,690-Speed 3412.86 samples/sec   Loss 2.9366   LearningRate 0.0134   Epoch: 12   Global Step: 72040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:54:40,711-Speed 3389.58 samples/sec   Loss 2.9592   LearningRate 0.0134   Epoch: 12   Global Step: 72050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:54:43,745-Speed 3375.57 samples/sec   Loss 2.9353   LearningRate 0.0134   Epoch: 12   Global Step: 72060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:54:46,757-Speed 3400.39 samples/sec   Loss 2.9072   LearningRate 0.0134   Epoch: 12   Global Step: 72070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:54:49,772-Speed 3397.92 samples/sec   Loss 2.8902   LearningRate 0.0134   Epoch: 12   Global Step: 72080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:54:52,795-Speed 3388.13 samples/sec   Loss 2.9547   LearningRate 0.0134   Epoch: 12   Global Step: 72090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:54:55,815-Speed 3392.14 samples/sec   Loss 2.9920   LearningRate 0.0134   Epoch: 12   Global Step: 72100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:54:58,839-Speed 3386.76 samples/sec   Loss 2.8781   LearningRate 0.0134   Epoch: 12   Global Step: 72110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:55:01,862-Speed 3387.95 samples/sec   Loss 2.8874   LearningRate 0.0134   Epoch: 12   Global Step: 72120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:55:04,892-Speed 3380.38 samples/sec   Loss 2.9221   LearningRate 0.0134   Epoch: 12   Global Step: 72130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:55:07,906-Speed 3398.19 samples/sec   Loss 3.0439   LearningRate 0.0134   Epoch: 12   Global Step: 72140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:55:10,908-Speed 3411.48 samples/sec   Loss 2.9778   LearningRate 0.0134   Epoch: 12   Global Step: 72150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:55:13,930-Speed 3389.02 samples/sec   Loss 2.9649   LearningRate 0.0134   Epoch: 12   Global Step: 72160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:55:16,958-Speed 3383.29 samples/sec   Loss 2.9213   LearningRate 0.0133   Epoch: 12   Global Step: 72170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:55:19,975-Speed 3394.57 samples/sec   Loss 3.0227   LearningRate 0.0133   Epoch: 12   Global Step: 72180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:55:23,061-Speed 3319.41 samples/sec   Loss 2.9675   LearningRate 0.0133   Epoch: 12   Global Step: 72190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:55:26,088-Speed 3383.87 samples/sec   Loss 2.8613   LearningRate 0.0133   Epoch: 12   Global Step: 72200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:55:29,119-Speed 3379.42 samples/sec   Loss 2.9441   LearningRate 0.0133   Epoch: 12   Global Step: 72210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:55:32,141-Speed 3388.84 samples/sec   Loss 3.0157   LearningRate 0.0133   Epoch: 12   Global Step: 72220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:55:35,168-Speed 3383.40 samples/sec   Loss 3.0637   LearningRate 0.0133   Epoch: 12   Global Step: 72230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:55:38,199-Speed 3379.50 samples/sec   Loss 2.9006   LearningRate 0.0133   Epoch: 12   Global Step: 72240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:55:41,221-Speed 3389.32 samples/sec   Loss 3.0498   LearningRate 0.0133   Epoch: 12   Global Step: 72250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:55:44,252-Speed 3378.90 samples/sec   Loss 2.9198   LearningRate 0.0133   Epoch: 12   Global Step: 72260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:55:47,318-Speed 3340.66 samples/sec   Loss 2.9474   LearningRate 0.0133   Epoch: 12   Global Step: 72270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:55:50,377-Speed 3349.81 samples/sec   Loss 2.8934   LearningRate 0.0133   Epoch: 12   Global Step: 72280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:55:53,404-Speed 3383.95 samples/sec   Loss 2.9373   LearningRate 0.0133   Epoch: 12   Global Step: 72290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:55:56,428-Speed 3386.57 samples/sec   Loss 2.8655   LearningRate 0.0133   Epoch: 12   Global Step: 72300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:55:59,452-Speed 3387.05 samples/sec   Loss 2.9988   LearningRate 0.0133   Epoch: 12   Global Step: 72310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:02,476-Speed 3386.65 samples/sec   Loss 3.0289   LearningRate 0.0133   Epoch: 12   Global Step: 72320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:05,503-Speed 3384.50 samples/sec   Loss 2.9863   LearningRate 0.0132   Epoch: 12   Global Step: 72330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:08,525-Speed 3389.02 samples/sec   Loss 2.9773   LearningRate 0.0132   Epoch: 12   Global Step: 72340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:11,551-Speed 3384.17 samples/sec   Loss 2.9201   LearningRate 0.0132   Epoch: 12   Global Step: 72350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:56:14,583-Speed 3378.43 samples/sec   Loss 2.9123   LearningRate 0.0132   Epoch: 12   Global Step: 72360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:56:17,587-Speed 3410.01 samples/sec   Loss 2.9157   LearningRate 0.0132   Epoch: 12   Global Step: 72370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:20,608-Speed 3391.16 samples/sec   Loss 2.9752   LearningRate 0.0132   Epoch: 12   Global Step: 72380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:23,677-Speed 3337.11 samples/sec   Loss 2.8976   LearningRate 0.0132   Epoch: 12   Global Step: 72390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:26,726-Speed 3359.57 samples/sec   Loss 2.9724   LearningRate 0.0132   Epoch: 12   Global Step: 72400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:29,751-Speed 3385.72 samples/sec   Loss 2.9064   LearningRate 0.0132   Epoch: 12   Global Step: 72410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:32,771-Speed 3390.63 samples/sec   Loss 2.9512   LearningRate 0.0132   Epoch: 12   Global Step: 72420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:35,792-Speed 3390.98 samples/sec   Loss 2.8686   LearningRate 0.0132   Epoch: 12   Global Step: 72430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:38,814-Speed 3388.83 samples/sec   Loss 2.8654   LearningRate 0.0132   Epoch: 12   Global Step: 72440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:41,838-Speed 3387.54 samples/sec   Loss 2.9493   LearningRate 0.0132   Epoch: 12   Global Step: 72450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:44,858-Speed 3391.47 samples/sec   Loss 2.9905   LearningRate 0.0132   Epoch: 12   Global Step: 72460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:47,876-Speed 3393.32 samples/sec   Loss 2.8669   LearningRate 0.0132   Epoch: 12   Global Step: 72470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:56:50,897-Speed 3390.97 samples/sec   Loss 2.9673   LearningRate 0.0132   Epoch: 12   Global Step: 72480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:56:53,900-Speed 3410.57 samples/sec   Loss 2.8717   LearningRate 0.0131   Epoch: 12   Global Step: 72490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:56,917-Speed 3394.59 samples/sec   Loss 2.9620   LearningRate 0.0131   Epoch: 12   Global Step: 72500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:56:59,958-Speed 3368.33 samples/sec   Loss 2.9365   LearningRate 0.0131   Epoch: 12   Global Step: 72510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:57:02,991-Speed 3377.44 samples/sec   Loss 2.9956   LearningRate 0.0131   Epoch: 12   Global Step: 72520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:57:06,012-Speed 3389.86 samples/sec   Loss 2.8465   LearningRate 0.0131   Epoch: 12   Global Step: 72530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:57:09,038-Speed 3384.83 samples/sec   Loss 2.9552   LearningRate 0.0131   Epoch: 12   Global Step: 72540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:57:12,061-Speed 3388.30 samples/sec   Loss 2.9269   LearningRate 0.0131   Epoch: 12   Global Step: 72550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:57:15,059-Speed 3416.02 samples/sec   Loss 2.8960   LearningRate 0.0131   Epoch: 12   Global Step: 72560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:57:18,078-Speed 3392.36 samples/sec   Loss 2.9222   LearningRate 0.0131   Epoch: 12   Global Step: 72570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:57:21,100-Speed 3390.17 samples/sec   Loss 2.9589   LearningRate 0.0131   Epoch: 12   Global Step: 72580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:57:24,124-Speed 3387.10 samples/sec   Loss 2.9127   LearningRate 0.0131   Epoch: 12   Global Step: 72590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:57:27,147-Speed 3388.36 samples/sec   Loss 2.8460   LearningRate 0.0131   Epoch: 12   Global Step: 72600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:57:30,175-Speed 3381.60 samples/sec   Loss 2.9347   LearningRate 0.0131   Epoch: 12   Global Step: 72610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:57:33,204-Speed 3381.91 samples/sec   Loss 2.9632   LearningRate 0.0131   Epoch: 12   Global Step: 72620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:57:36,247-Speed 3364.89 samples/sec   Loss 2.9449   LearningRate 0.0131   Epoch: 12   Global Step: 72630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:57:39,268-Speed 3391.54 samples/sec   Loss 2.9280   LearningRate 0.0130   Epoch: 12   Global Step: 72640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:57:42,285-Speed 3395.31 samples/sec   Loss 3.0044   LearningRate 0.0130   Epoch: 12   Global Step: 72650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:57:45,321-Speed 3372.86 samples/sec   Loss 2.8817   LearningRate 0.0130   Epoch: 12   Global Step: 72660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:57:48,339-Speed 3393.88 samples/sec   Loss 2.8744   LearningRate 0.0130   Epoch: 12   Global Step: 72670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:57:51,358-Speed 3392.89 samples/sec   Loss 2.8848   LearningRate 0.0130   Epoch: 12   Global Step: 72680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:57:54,377-Speed 3392.35 samples/sec   Loss 2.9969   LearningRate 0.0130   Epoch: 12   Global Step: 72690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:57:57,409-Speed 3377.92 samples/sec   Loss 3.0340   LearningRate 0.0130   Epoch: 12   Global Step: 72700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:00,439-Speed 3380.96 samples/sec   Loss 2.9694   LearningRate 0.0130   Epoch: 12   Global Step: 72710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:03,465-Speed 3384.18 samples/sec   Loss 2.9211   LearningRate 0.0130   Epoch: 12   Global Step: 72720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:06,492-Speed 3383.63 samples/sec   Loss 2.8591   LearningRate 0.0130   Epoch: 12   Global Step: 72730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:09,520-Speed 3382.88 samples/sec   Loss 2.9554   LearningRate 0.0130   Epoch: 12   Global Step: 72740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:12,596-Speed 3329.47 samples/sec   Loss 2.9081   LearningRate 0.0130   Epoch: 12   Global Step: 72750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:15,623-Speed 3384.26 samples/sec   Loss 3.0438   LearningRate 0.0130   Epoch: 12   Global Step: 72760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:58:18,627-Speed 3409.65 samples/sec   Loss 2.8704   LearningRate 0.0130   Epoch: 12   Global Step: 72770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:21,648-Speed 3389.74 samples/sec   Loss 2.8690   LearningRate 0.0130   Epoch: 12   Global Step: 72780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:24,706-Speed 3349.41 samples/sec   Loss 2.9705   LearningRate 0.0130   Epoch: 12   Global Step: 72790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:27,739-Speed 3377.27 samples/sec   Loss 2.8748   LearningRate 0.0129   Epoch: 12   Global Step: 72800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:30,763-Speed 3387.24 samples/sec   Loss 2.9610   LearningRate 0.0129   Epoch: 12   Global Step: 72810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:33,785-Speed 3389.37 samples/sec   Loss 2.9684   LearningRate 0.0129   Epoch: 12   Global Step: 72820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:36,817-Speed 3378.57 samples/sec   Loss 2.9279   LearningRate 0.0129   Epoch: 12   Global Step: 72830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:39,839-Speed 3389.26 samples/sec   Loss 2.9099   LearningRate 0.0129   Epoch: 12   Global Step: 72840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:42,861-Speed 3388.80 samples/sec   Loss 2.8181   LearningRate 0.0129   Epoch: 12   Global Step: 72850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:45,881-Speed 3391.43 samples/sec   Loss 2.8713   LearningRate 0.0129   Epoch: 12   Global Step: 72860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:48,905-Speed 3386.95 samples/sec   Loss 2.8996   LearningRate 0.0129   Epoch: 12   Global Step: 72870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:58:51,932-Speed 3383.94 samples/sec   Loss 2.9933   LearningRate 0.0129   Epoch: 12   Global Step: 72880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 08:58:54,940-Speed 3404.37 samples/sec   Loss 2.9494   LearningRate 0.0129   Epoch: 12   Global Step: 72890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:58:58,036-Speed 3308.48 samples/sec   Loss 2.8497   LearningRate 0.0129   Epoch: 12   Global Step: 72900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:59:01,087-Speed 3357.90 samples/sec   Loss 3.0402   LearningRate 0.0129   Epoch: 12   Global Step: 72910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:59:04,210-Speed 3279.63 samples/sec   Loss 3.0286   LearningRate 0.0129   Epoch: 12   Global Step: 72920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:59:07,236-Speed 3384.86 samples/sec   Loss 2.9189   LearningRate 0.0129   Epoch: 12   Global Step: 72930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:59:10,260-Speed 3386.53 samples/sec   Loss 2.9943   LearningRate 0.0129   Epoch: 12   Global Step: 72940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:59:13,287-Speed 3384.04 samples/sec   Loss 2.8651   LearningRate 0.0129   Epoch: 12   Global Step: 72950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:59:16,298-Speed 3401.33 samples/sec   Loss 2.8662   LearningRate 0.0128   Epoch: 12   Global Step: 72960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:59:19,325-Speed 3383.42 samples/sec   Loss 3.0109   LearningRate 0.0128   Epoch: 12   Global Step: 72970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:59:22,355-Speed 3380.39 samples/sec   Loss 3.0068   LearningRate 0.0128   Epoch: 12   Global Step: 72980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:59:25,393-Speed 3372.31 samples/sec   Loss 3.0098   LearningRate 0.0128   Epoch: 12   Global Step: 72990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:59:28,423-Speed 3380.98 samples/sec   Loss 2.9869   LearningRate 0.0128   Epoch: 12   Global Step: 73000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:59:31,451-Speed 3382.83 samples/sec   Loss 3.0532   LearningRate 0.0128   Epoch: 12   Global Step: 73010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:59:34,484-Speed 3376.69 samples/sec   Loss 2.9699   LearningRate 0.0128   Epoch: 12   Global Step: 73020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:59:37,507-Speed 3387.69 samples/sec   Loss 2.9548   LearningRate 0.0128   Epoch: 12   Global Step: 73030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:59:40,533-Speed 3385.64 samples/sec   Loss 2.9927   LearningRate 0.0128   Epoch: 12   Global Step: 73040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:59:43,560-Speed 3383.02 samples/sec   Loss 2.8180   LearningRate 0.0128   Epoch: 12   Global Step: 73050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 08:59:46,609-Speed 3358.92 samples/sec   Loss 2.9191   LearningRate 0.0128   Epoch: 12   Global Step: 73060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:59:49,707-Speed 3306.79 samples/sec   Loss 3.0338   LearningRate 0.0128   Epoch: 12   Global Step: 73070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:59:52,789-Speed 3323.25 samples/sec   Loss 2.9421   LearningRate 0.0128   Epoch: 12   Global Step: 73080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:59:55,811-Speed 3388.77 samples/sec   Loss 2.9367   LearningRate 0.0128   Epoch: 12   Global Step: 73090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 08:59:58,836-Speed 3386.89 samples/sec   Loss 2.8891   LearningRate 0.0128   Epoch: 12   Global Step: 73100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:01,858-Speed 3388.91 samples/sec   Loss 2.8427   LearningRate 0.0128   Epoch: 12   Global Step: 73110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:04,920-Speed 3344.90 samples/sec   Loss 3.0667   LearningRate 0.0127   Epoch: 12   Global Step: 73120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:07,950-Speed 3379.82 samples/sec   Loss 2.8976   LearningRate 0.0127   Epoch: 12   Global Step: 73130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:10,984-Speed 3376.08 samples/sec   Loss 2.9693   LearningRate 0.0127   Epoch: 12   Global Step: 73140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:14,018-Speed 3376.10 samples/sec   Loss 2.9801   LearningRate 0.0127   Epoch: 12   Global Step: 73150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:17,100-Speed 3322.40 samples/sec   Loss 2.9294   LearningRate 0.0127   Epoch: 12   Global Step: 73160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:20,132-Speed 3378.32 samples/sec   Loss 2.8829   LearningRate 0.0127   Epoch: 12   Global Step: 73170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:23,157-Speed 3386.17 samples/sec   Loss 2.9208   LearningRate 0.0127   Epoch: 12   Global Step: 73180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:26,177-Speed 3392.23 samples/sec   Loss 2.7978   LearningRate 0.0127   Epoch: 12   Global Step: 73190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:29,197-Speed 3390.84 samples/sec   Loss 2.9562   LearningRate 0.0127   Epoch: 12   Global Step: 73200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:32,220-Speed 3388.58 samples/sec   Loss 2.9643   LearningRate 0.0127   Epoch: 12   Global Step: 73210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:35,249-Speed 3381.13 samples/sec   Loss 2.8365   LearningRate 0.0127   Epoch: 12   Global Step: 73220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:38,326-Speed 3329.22 samples/sec   Loss 2.8939   LearningRate 0.0127   Epoch: 12   Global Step: 73230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:41,354-Speed 3382.74 samples/sec   Loss 2.8007   LearningRate 0.0127   Epoch: 12   Global Step: 73240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:44,375-Speed 3389.95 samples/sec   Loss 2.8758   LearningRate 0.0127   Epoch: 12   Global Step: 73250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:47,440-Speed 3341.72 samples/sec   Loss 2.9350   LearningRate 0.0127   Epoch: 12   Global Step: 73260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:00:50,490-Speed 3359.00 samples/sec   Loss 2.8800   LearningRate 0.0127   Epoch: 12   Global Step: 73270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:00:53,499-Speed 3403.39 samples/sec   Loss 2.9448   LearningRate 0.0126   Epoch: 12   Global Step: 73280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:56,519-Speed 3391.01 samples/sec   Loss 2.8822   LearningRate 0.0126   Epoch: 12   Global Step: 73290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:00:59,554-Speed 3375.36 samples/sec   Loss 2.9363   LearningRate 0.0126   Epoch: 12   Global Step: 73300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:02,579-Speed 3386.16 samples/sec   Loss 2.9328   LearningRate 0.0126   Epoch: 12   Global Step: 73310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:05,607-Speed 3382.97 samples/sec   Loss 2.8604   LearningRate 0.0126   Epoch: 12   Global Step: 73320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:08,651-Speed 3364.36 samples/sec   Loss 2.8660   LearningRate 0.0126   Epoch: 12   Global Step: 73330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:11,680-Speed 3380.99 samples/sec   Loss 3.0384   LearningRate 0.0126   Epoch: 12   Global Step: 73340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:14,703-Speed 3388.40 samples/sec   Loss 2.9014   LearningRate 0.0126   Epoch: 12   Global Step: 73350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:17,733-Speed 3380.35 samples/sec   Loss 2.8864   LearningRate 0.0126   Epoch: 12   Global Step: 73360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:20,762-Speed 3382.02 samples/sec   Loss 2.8674   LearningRate 0.0126   Epoch: 12   Global Step: 73370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:23,865-Speed 3300.80 samples/sec   Loss 2.8950   LearningRate 0.0126   Epoch: 12   Global Step: 73380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:26,905-Speed 3368.40 samples/sec   Loss 2.8909   LearningRate 0.0126   Epoch: 12   Global Step: 73390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:29,932-Speed 3383.64 samples/sec   Loss 2.8958   LearningRate 0.0126   Epoch: 12   Global Step: 73400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:32,959-Speed 3384.18 samples/sec   Loss 2.8766   LearningRate 0.0126   Epoch: 12   Global Step: 73410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:36,008-Speed 3359.37 samples/sec   Loss 2.9006   LearningRate 0.0126   Epoch: 12   Global Step: 73420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:39,038-Speed 3380.02 samples/sec   Loss 2.8586   LearningRate 0.0126   Epoch: 12   Global Step: 73430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:42,068-Speed 3381.01 samples/sec   Loss 2.9212   LearningRate 0.0125   Epoch: 12   Global Step: 73440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:45,094-Speed 3384.86 samples/sec   Loss 2.8678   LearningRate 0.0125   Epoch: 12   Global Step: 73450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:48,123-Speed 3381.45 samples/sec   Loss 2.8719   LearningRate 0.0125   Epoch: 12   Global Step: 73460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:51,183-Speed 3346.54 samples/sec   Loss 2.8746   LearningRate 0.0125   Epoch: 12   Global Step: 73470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:01:54,215-Speed 3378.20 samples/sec   Loss 2.7850   LearningRate 0.0125   Epoch: 12   Global Step: 73480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:01:57,225-Speed 3402.17 samples/sec   Loss 2.8934   LearningRate 0.0125   Epoch: 12   Global Step: 73490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:02:00,262-Speed 3372.90 samples/sec   Loss 2.9791   LearningRate 0.0125   Epoch: 12   Global Step: 73500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:02:03,287-Speed 3386.51 samples/sec   Loss 2.7747   LearningRate 0.0125   Epoch: 12   Global Step: 73510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:02:06,331-Speed 3364.61 samples/sec   Loss 2.9994   LearningRate 0.0125   Epoch: 12   Global Step: 73520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:02:09,380-Speed 3359.06 samples/sec   Loss 2.8507   LearningRate 0.0125   Epoch: 12   Global Step: 73530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:02:12,413-Speed 3376.61 samples/sec   Loss 2.8774   LearningRate 0.0125   Epoch: 12   Global Step: 73540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:02:15,423-Speed 3402.80 samples/sec   Loss 2.9192   LearningRate 0.0125   Epoch: 12   Global Step: 73550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:02:18,452-Speed 3382.25 samples/sec   Loss 2.7602   LearningRate 0.0125   Epoch: 12   Global Step: 73560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:02:21,488-Speed 3373.37 samples/sec   Loss 2.8998   LearningRate 0.0125   Epoch: 12   Global Step: 73570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:02:24,517-Speed 3380.85 samples/sec   Loss 2.8270   LearningRate 0.0125   Epoch: 12   Global Step: 73580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:02:27,548-Speed 3379.50 samples/sec   Loss 2.9970   LearningRate 0.0125   Epoch: 12   Global Step: 73590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:02:30,575-Speed 3383.30 samples/sec   Loss 2.7373   LearningRate 0.0124   Epoch: 12   Global Step: 73600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:02:33,613-Speed 3371.51 samples/sec   Loss 2.8123   LearningRate 0.0124   Epoch: 12   Global Step: 73610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:02:36,645-Speed 3378.16 samples/sec   Loss 2.8558   LearningRate 0.0124   Epoch: 12   Global Step: 73620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:02:39,673-Speed 3383.03 samples/sec   Loss 2.7808   LearningRate 0.0124   Epoch: 12   Global Step: 73630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:02:42,699-Speed 3384.73 samples/sec   Loss 2.8472   LearningRate 0.0124   Epoch: 12   Global Step: 73640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:02:45,728-Speed 3381.22 samples/sec   Loss 2.8752   LearningRate 0.0124   Epoch: 12   Global Step: 73650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:02:48,752-Speed 3387.83 samples/sec   Loss 2.7922   LearningRate 0.0124   Epoch: 12   Global Step: 73660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:02:51,786-Speed 3375.70 samples/sec   Loss 2.8463   LearningRate 0.0124   Epoch: 12   Global Step: 73670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:02:54,810-Speed 3386.87 samples/sec   Loss 2.9139   LearningRate 0.0124   Epoch: 12   Global Step: 73680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:02:57,879-Speed 3337.21 samples/sec   Loss 2.7771   LearningRate 0.0124   Epoch: 12   Global Step: 73690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:00,930-Speed 3357.96 samples/sec   Loss 2.7920   LearningRate 0.0124   Epoch: 12   Global Step: 73700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:03,966-Speed 3373.98 samples/sec   Loss 2.7743   LearningRate 0.0124   Epoch: 12   Global Step: 73710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:06,993-Speed 3383.34 samples/sec   Loss 2.9315   LearningRate 0.0124   Epoch: 12   Global Step: 73720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:10,022-Speed 3382.02 samples/sec   Loss 2.8681   LearningRate 0.0124   Epoch: 12   Global Step: 73730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:13,050-Speed 3381.80 samples/sec   Loss 2.9373   LearningRate 0.0124   Epoch: 12   Global Step: 73740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:16,079-Speed 3382.43 samples/sec   Loss 2.8042   LearningRate 0.0124   Epoch: 12   Global Step: 73750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:19,111-Speed 3378.64 samples/sec   Loss 2.8214   LearningRate 0.0123   Epoch: 12   Global Step: 73760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:22,138-Speed 3383.21 samples/sec   Loss 2.8484   LearningRate 0.0123   Epoch: 12   Global Step: 73770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:25,168-Speed 3380.02 samples/sec   Loss 2.9432   LearningRate 0.0123   Epoch: 12   Global Step: 73780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:28,199-Speed 3379.83 samples/sec   Loss 2.8418   LearningRate 0.0123   Epoch: 12   Global Step: 73790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:31,228-Speed 3381.45 samples/sec   Loss 2.7803   LearningRate 0.0123   Epoch: 12   Global Step: 73800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:34,256-Speed 3382.88 samples/sec   Loss 2.8536   LearningRate 0.0123   Epoch: 12   Global Step: 73810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:37,285-Speed 3381.08 samples/sec   Loss 2.8954   LearningRate 0.0123   Epoch: 12   Global Step: 73820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:40,315-Speed 3381.04 samples/sec   Loss 2.9475   LearningRate 0.0123   Epoch: 12   Global Step: 73830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:43,344-Speed 3380.35 samples/sec   Loss 2.7333   LearningRate 0.0123   Epoch: 12   Global Step: 73840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:46,427-Speed 3322.08 samples/sec   Loss 2.8988   LearningRate 0.0123   Epoch: 12   Global Step: 73850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:03:49,446-Speed 3393.58 samples/sec   Loss 2.8565   LearningRate 0.0123   Epoch: 12   Global Step: 73860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:52,495-Speed 3359.45 samples/sec   Loss 2.8528   LearningRate 0.0123   Epoch: 12   Global Step: 73870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:55,529-Speed 3376.16 samples/sec   Loss 2.9475   LearningRate 0.0123   Epoch: 12   Global Step: 73880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:03:58,557-Speed 3382.77 samples/sec   Loss 2.7411   LearningRate 0.0123   Epoch: 12   Global Step: 73890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:04:01,585-Speed 3383.04 samples/sec   Loss 2.8188   LearningRate 0.0123   Epoch: 12   Global Step: 73900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:04:04,744-Speed 3242.47 samples/sec   Loss 2.7688   LearningRate 0.0123   Epoch: 12   Global Step: 73910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:04:18,626-Speed 737.69 samples/sec   Loss 2.5737   LearningRate 0.0122   Epoch: 13   Global Step: 73920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:04:21,650-Speed 3388.20 samples/sec   Loss 2.2306   LearningRate 0.0122   Epoch: 13   Global Step: 73930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:04:24,695-Speed 3363.59 samples/sec   Loss 2.2511   LearningRate 0.0122   Epoch: 13   Global Step: 73940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:04:27,752-Speed 3350.14 samples/sec   Loss 2.2807   LearningRate 0.0122   Epoch: 13   Global Step: 73950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:04:30,771-Speed 3393.55 samples/sec   Loss 2.3043   LearningRate 0.0122   Epoch: 13   Global Step: 73960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:04:33,791-Speed 3390.57 samples/sec   Loss 2.3033   LearningRate 0.0122   Epoch: 13   Global Step: 73970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:04:36,808-Speed 3395.78 samples/sec   Loss 2.2224   LearningRate 0.0122   Epoch: 13   Global Step: 73980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:04:39,835-Speed 3383.43 samples/sec   Loss 2.2244   LearningRate 0.0122   Epoch: 13   Global Step: 73990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:04:42,856-Speed 3391.00 samples/sec   Loss 2.3194   LearningRate 0.0122   Epoch: 13   Global Step: 74000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:05:26,291-[lfw][74000]XNorm: 22.218615
Training: 2022-04-27 09:05:26,292-[lfw][74000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-27 09:05:26,292-[lfw][74000]Accuracy-Highest: 0.99817
Training: 2022-04-27 09:06:17,167-[cfp_fp][74000]XNorm: 20.875479
Training: 2022-04-27 09:06:17,168-[cfp_fp][74000]Accuracy-Flip: 0.97429+-0.00769
Training: 2022-04-27 09:06:17,168-[cfp_fp][74000]Accuracy-Highest: 0.97557
Training: 2022-04-27 09:07:00,602-[agedb_30][74000]XNorm: 22.516372
Training: 2022-04-27 09:07:00,603-[agedb_30][74000]Accuracy-Flip: 0.98100+-0.00700
Training: 2022-04-27 09:07:00,603-[agedb_30][74000]Accuracy-Highest: 0.98100
Training: 2022-04-27 09:07:03,644-Speed 72.73 samples/sec   Loss 2.2675   LearningRate 0.0122   Epoch: 13   Global Step: 74010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:06,651-Speed 3405.72 samples/sec   Loss 2.3848   LearningRate 0.0122   Epoch: 13   Global Step: 74020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:09,658-Speed 3405.68 samples/sec   Loss 2.3096   LearningRate 0.0122   Epoch: 13   Global Step: 74030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:12,671-Speed 3399.38 samples/sec   Loss 2.3462   LearningRate 0.0122   Epoch: 13   Global Step: 74040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:15,692-Speed 3390.88 samples/sec   Loss 2.2541   LearningRate 0.0122   Epoch: 13   Global Step: 74050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:18,703-Speed 3401.28 samples/sec   Loss 2.3619   LearningRate 0.0122   Epoch: 13   Global Step: 74060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:21,728-Speed 3386.14 samples/sec   Loss 2.2892   LearningRate 0.0122   Epoch: 13   Global Step: 74070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:24,734-Speed 3407.48 samples/sec   Loss 2.2911   LearningRate 0.0122   Epoch: 13   Global Step: 74080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:27,756-Speed 3389.09 samples/sec   Loss 2.2837   LearningRate 0.0121   Epoch: 13   Global Step: 74090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:30,774-Speed 3393.43 samples/sec   Loss 2.3875   LearningRate 0.0121   Epoch: 13   Global Step: 74100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:33,788-Speed 3398.39 samples/sec   Loss 2.2875   LearningRate 0.0121   Epoch: 13   Global Step: 74110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:36,937-Speed 3252.96 samples/sec   Loss 2.3015   LearningRate 0.0121   Epoch: 13   Global Step: 74120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:39,953-Speed 3395.24 samples/sec   Loss 2.3470   LearningRate 0.0121   Epoch: 13   Global Step: 74130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:42,969-Speed 3396.26 samples/sec   Loss 2.2526   LearningRate 0.0121   Epoch: 13   Global Step: 74140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:45,996-Speed 3383.78 samples/sec   Loss 2.3743   LearningRate 0.0121   Epoch: 13   Global Step: 74150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:49,032-Speed 3374.39 samples/sec   Loss 2.3086   LearningRate 0.0121   Epoch: 13   Global Step: 74160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:52,057-Speed 3385.91 samples/sec   Loss 2.3707   LearningRate 0.0121   Epoch: 13   Global Step: 74170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:07:55,072-Speed 3396.93 samples/sec   Loss 2.3725   LearningRate 0.0121   Epoch: 13   Global Step: 74180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:07:58,088-Speed 3396.06 samples/sec   Loss 2.3318   LearningRate 0.0121   Epoch: 13   Global Step: 74190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:08:01,084-Speed 3417.88 samples/sec   Loss 2.3869   LearningRate 0.0121   Epoch: 13   Global Step: 74200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:04,106-Speed 3389.80 samples/sec   Loss 2.2941   LearningRate 0.0121   Epoch: 13   Global Step: 74210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:07,127-Speed 3390.34 samples/sec   Loss 2.3784   LearningRate 0.0121   Epoch: 13   Global Step: 74220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:10,143-Speed 3396.11 samples/sec   Loss 2.2992   LearningRate 0.0121   Epoch: 13   Global Step: 74230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:13,160-Speed 3394.67 samples/sec   Loss 2.3051   LearningRate 0.0121   Epoch: 13   Global Step: 74240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:16,194-Speed 3375.83 samples/sec   Loss 2.3933   LearningRate 0.0120   Epoch: 13   Global Step: 74250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:19,222-Speed 3382.95 samples/sec   Loss 2.3296   LearningRate 0.0120   Epoch: 13   Global Step: 74260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:22,238-Speed 3396.55 samples/sec   Loss 2.3804   LearningRate 0.0120   Epoch: 13   Global Step: 74270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:25,252-Speed 3397.38 samples/sec   Loss 2.3862   LearningRate 0.0120   Epoch: 13   Global Step: 74280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:28,267-Speed 3397.68 samples/sec   Loss 2.3515   LearningRate 0.0120   Epoch: 13   Global Step: 74290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:31,265-Speed 3416.73 samples/sec   Loss 2.4436   LearningRate 0.0120   Epoch: 13   Global Step: 74300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:34,282-Speed 3394.41 samples/sec   Loss 2.4174   LearningRate 0.0120   Epoch: 13   Global Step: 74310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:37,323-Speed 3367.88 samples/sec   Loss 2.4148   LearningRate 0.0120   Epoch: 13   Global Step: 74320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:40,340-Speed 3394.56 samples/sec   Loss 2.3879   LearningRate 0.0120   Epoch: 13   Global Step: 74330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:43,355-Speed 3398.59 samples/sec   Loss 2.3777   LearningRate 0.0120   Epoch: 13   Global Step: 74340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:46,369-Speed 3397.95 samples/sec   Loss 2.4062   LearningRate 0.0120   Epoch: 13   Global Step: 74350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:49,387-Speed 3393.13 samples/sec   Loss 2.3218   LearningRate 0.0120   Epoch: 13   Global Step: 74360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:52,408-Speed 3390.82 samples/sec   Loss 2.3809   LearningRate 0.0120   Epoch: 13   Global Step: 74370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:55,436-Speed 3382.05 samples/sec   Loss 2.3621   LearningRate 0.0120   Epoch: 13   Global Step: 74380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:08:58,465-Speed 3382.05 samples/sec   Loss 2.4255   LearningRate 0.0120   Epoch: 13   Global Step: 74390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:09:01,488-Speed 3387.89 samples/sec   Loss 2.3319   LearningRate 0.0120   Epoch: 13   Global Step: 74400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:09:04,503-Speed 3396.57 samples/sec   Loss 2.4230   LearningRate 0.0119   Epoch: 13   Global Step: 74410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:09:07,525-Speed 3390.32 samples/sec   Loss 2.4330   LearningRate 0.0119   Epoch: 13   Global Step: 74420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:09:10,542-Speed 3394.05 samples/sec   Loss 2.3540   LearningRate 0.0119   Epoch: 13   Global Step: 74430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:09:13,560-Speed 3394.61 samples/sec   Loss 2.4290   LearningRate 0.0119   Epoch: 13   Global Step: 74440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:09:16,561-Speed 3413.09 samples/sec   Loss 2.3743   LearningRate 0.0119   Epoch: 13   Global Step: 74450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:09:19,578-Speed 3394.73 samples/sec   Loss 2.4023   LearningRate 0.0119   Epoch: 13   Global Step: 74460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:09:22,596-Speed 3393.55 samples/sec   Loss 2.3750   LearningRate 0.0119   Epoch: 13   Global Step: 74470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:09:25,670-Speed 3332.66 samples/sec   Loss 2.3885   LearningRate 0.0119   Epoch: 13   Global Step: 74480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:09:28,746-Speed 3329.41 samples/sec   Loss 2.5126   LearningRate 0.0119   Epoch: 13   Global Step: 74490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:09:31,761-Speed 3396.32 samples/sec   Loss 2.3732   LearningRate 0.0119   Epoch: 13   Global Step: 74500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:09:34,794-Speed 3376.88 samples/sec   Loss 2.3727   LearningRate 0.0119   Epoch: 13   Global Step: 74510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:09:37,813-Speed 3393.25 samples/sec   Loss 2.3249   LearningRate 0.0119   Epoch: 13   Global Step: 74520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:09:40,837-Speed 3386.98 samples/sec   Loss 2.4289   LearningRate 0.0119   Epoch: 13   Global Step: 74530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:09:43,854-Speed 3395.84 samples/sec   Loss 2.4537   LearningRate 0.0119   Epoch: 13   Global Step: 74540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:09:46,878-Speed 3386.85 samples/sec   Loss 2.3363   LearningRate 0.0119   Epoch: 13   Global Step: 74550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:09:49,896-Speed 3393.80 samples/sec   Loss 2.4068   LearningRate 0.0119   Epoch: 13   Global Step: 74560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:09:52,920-Speed 3386.27 samples/sec   Loss 2.3877   LearningRate 0.0119   Epoch: 13   Global Step: 74570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:09:55,938-Speed 3393.73 samples/sec   Loss 2.4066   LearningRate 0.0118   Epoch: 13   Global Step: 74580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:09:58,959-Speed 3390.46 samples/sec   Loss 2.4550   LearningRate 0.0118   Epoch: 13   Global Step: 74590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:01,986-Speed 3384.14 samples/sec   Loss 2.4509   LearningRate 0.0118   Epoch: 13   Global Step: 74600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:05,003-Speed 3395.14 samples/sec   Loss 2.4504   LearningRate 0.0118   Epoch: 13   Global Step: 74610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:08,017-Speed 3398.15 samples/sec   Loss 2.3937   LearningRate 0.0118   Epoch: 13   Global Step: 74620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:11,036-Speed 3392.97 samples/sec   Loss 2.4541   LearningRate 0.0118   Epoch: 13   Global Step: 74630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:14,058-Speed 3388.38 samples/sec   Loss 2.4327   LearningRate 0.0118   Epoch: 13   Global Step: 74640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:17,086-Speed 3382.49 samples/sec   Loss 2.4355   LearningRate 0.0118   Epoch: 13   Global Step: 74650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:20,105-Speed 3392.36 samples/sec   Loss 2.4371   LearningRate 0.0118   Epoch: 13   Global Step: 74660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:23,128-Speed 3388.15 samples/sec   Loss 2.3990   LearningRate 0.0118   Epoch: 13   Global Step: 74670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:10:26,131-Speed 3411.76 samples/sec   Loss 2.3167   LearningRate 0.0118   Epoch: 13   Global Step: 74680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:29,152-Speed 3389.77 samples/sec   Loss 2.4250   LearningRate 0.0118   Epoch: 13   Global Step: 74690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:32,167-Speed 3397.98 samples/sec   Loss 2.3634   LearningRate 0.0118   Epoch: 13   Global Step: 74700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:35,201-Speed 3376.56 samples/sec   Loss 2.4990   LearningRate 0.0118   Epoch: 13   Global Step: 74710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:38,243-Speed 3366.51 samples/sec   Loss 2.4382   LearningRate 0.0118   Epoch: 13   Global Step: 74720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:41,260-Speed 3394.48 samples/sec   Loss 2.4370   LearningRate 0.0118   Epoch: 13   Global Step: 74730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:44,284-Speed 3387.06 samples/sec   Loss 2.4515   LearningRate 0.0117   Epoch: 13   Global Step: 74740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:47,300-Speed 3396.55 samples/sec   Loss 2.4608   LearningRate 0.0117   Epoch: 13   Global Step: 74750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:50,318-Speed 3393.23 samples/sec   Loss 2.5139   LearningRate 0.0117   Epoch: 13   Global Step: 74760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:53,342-Speed 3387.55 samples/sec   Loss 2.4445   LearningRate 0.0117   Epoch: 13   Global Step: 74770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:56,349-Speed 3407.34 samples/sec   Loss 2.4881   LearningRate 0.0117   Epoch: 13   Global Step: 74780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:10:59,388-Speed 3370.46 samples/sec   Loss 2.4894   LearningRate 0.0117   Epoch: 13   Global Step: 74790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:02,414-Speed 3385.32 samples/sec   Loss 2.3523   LearningRate 0.0117   Epoch: 13   Global Step: 74800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:05,437-Speed 3388.21 samples/sec   Loss 2.3954   LearningRate 0.0117   Epoch: 13   Global Step: 74810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:08,458-Speed 3389.79 samples/sec   Loss 2.3770   LearningRate 0.0117   Epoch: 13   Global Step: 74820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:11,492-Speed 3375.91 samples/sec   Loss 2.5121   LearningRate 0.0117   Epoch: 13   Global Step: 74830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:14,516-Speed 3387.93 samples/sec   Loss 2.3935   LearningRate 0.0117   Epoch: 13   Global Step: 74840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:17,537-Speed 3390.41 samples/sec   Loss 2.5553   LearningRate 0.0117   Epoch: 13   Global Step: 74850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:20,561-Speed 3386.41 samples/sec   Loss 2.3168   LearningRate 0.0117   Epoch: 13   Global Step: 74860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:23,586-Speed 3385.67 samples/sec   Loss 2.4057   LearningRate 0.0117   Epoch: 13   Global Step: 74870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:26,595-Speed 3404.11 samples/sec   Loss 2.4432   LearningRate 0.0117   Epoch: 13   Global Step: 74880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:29,616-Speed 3390.26 samples/sec   Loss 2.5158   LearningRate 0.0117   Epoch: 13   Global Step: 74890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:32,651-Speed 3375.60 samples/sec   Loss 2.4981   LearningRate 0.0117   Epoch: 13   Global Step: 74900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:35,683-Speed 3378.05 samples/sec   Loss 2.3952   LearningRate 0.0116   Epoch: 13   Global Step: 74910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:38,701-Speed 3393.35 samples/sec   Loss 2.5538   LearningRate 0.0116   Epoch: 13   Global Step: 74920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:41,722-Speed 3389.80 samples/sec   Loss 2.4045   LearningRate 0.0116   Epoch: 13   Global Step: 74930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:44,743-Speed 3391.02 samples/sec   Loss 2.4736   LearningRate 0.0116   Epoch: 13   Global Step: 74940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:47,776-Speed 3376.95 samples/sec   Loss 2.5266   LearningRate 0.0116   Epoch: 13   Global Step: 74950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:50,825-Speed 3359.36 samples/sec   Loss 2.4229   LearningRate 0.0116   Epoch: 13   Global Step: 74960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:53,847-Speed 3390.55 samples/sec   Loss 2.4122   LearningRate 0.0116   Epoch: 13   Global Step: 74970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:11:56,872-Speed 3386.06 samples/sec   Loss 2.4712   LearningRate 0.0116   Epoch: 13   Global Step: 74980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:11:59,893-Speed 3390.06 samples/sec   Loss 2.4944   LearningRate 0.0116   Epoch: 13   Global Step: 74990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:12:02,919-Speed 3385.30 samples/sec   Loss 2.4428   LearningRate 0.0116   Epoch: 13   Global Step: 75000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:12:05,939-Speed 3391.10 samples/sec   Loss 2.4833   LearningRate 0.0116   Epoch: 13   Global Step: 75010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:12:08,938-Speed 3414.96 samples/sec   Loss 2.3871   LearningRate 0.0116   Epoch: 13   Global Step: 75020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:12:11,946-Speed 3405.84 samples/sec   Loss 2.4166   LearningRate 0.0116   Epoch: 13   Global Step: 75030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:12:14,971-Speed 3385.06 samples/sec   Loss 2.4905   LearningRate 0.0116   Epoch: 13   Global Step: 75040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:12:17,993-Speed 3389.66 samples/sec   Loss 2.4312   LearningRate 0.0116   Epoch: 13   Global Step: 75050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:12:21,011-Speed 3393.34 samples/sec   Loss 2.4670   LearningRate 0.0116   Epoch: 13   Global Step: 75060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:12:24,030-Speed 3393.64 samples/sec   Loss 2.6106   LearningRate 0.0116   Epoch: 13   Global Step: 75070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:12:27,049-Speed 3392.15 samples/sec   Loss 2.4800   LearningRate 0.0115   Epoch: 13   Global Step: 75080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:12:30,074-Speed 3385.49 samples/sec   Loss 2.4003   LearningRate 0.0115   Epoch: 13   Global Step: 75090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:12:33,100-Speed 3385.21 samples/sec   Loss 2.4378   LearningRate 0.0115   Epoch: 13   Global Step: 75100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:12:36,124-Speed 3387.01 samples/sec   Loss 2.4825   LearningRate 0.0115   Epoch: 13   Global Step: 75110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:12:39,155-Speed 3378.84 samples/sec   Loss 2.5329   LearningRate 0.0115   Epoch: 13   Global Step: 75120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:12:42,172-Speed 3395.28 samples/sec   Loss 2.3318   LearningRate 0.0115   Epoch: 13   Global Step: 75130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:12:45,189-Speed 3394.50 samples/sec   Loss 2.4932   LearningRate 0.0115   Epoch: 13   Global Step: 75140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:12:48,190-Speed 3413.45 samples/sec   Loss 2.5572   LearningRate 0.0115   Epoch: 13   Global Step: 75150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:12:51,208-Speed 3393.93 samples/sec   Loss 2.4391   LearningRate 0.0115   Epoch: 13   Global Step: 75160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:12:54,229-Speed 3390.75 samples/sec   Loss 2.5146   LearningRate 0.0115   Epoch: 13   Global Step: 75170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:12:57,254-Speed 3385.81 samples/sec   Loss 2.5640   LearningRate 0.0115   Epoch: 13   Global Step: 75180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:13:00,306-Speed 3355.12 samples/sec   Loss 2.5221   LearningRate 0.0115   Epoch: 13   Global Step: 75190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:13:03,329-Speed 3388.62 samples/sec   Loss 2.4516   LearningRate 0.0115   Epoch: 13   Global Step: 75200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:13:06,352-Speed 3387.88 samples/sec   Loss 2.5264   LearningRate 0.0115   Epoch: 13   Global Step: 75210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:13:09,374-Speed 3389.34 samples/sec   Loss 2.3430   LearningRate 0.0115   Epoch: 13   Global Step: 75220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:13:12,400-Speed 3384.11 samples/sec   Loss 2.4496   LearningRate 0.0115   Epoch: 13   Global Step: 75230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:13:15,420-Speed 3391.75 samples/sec   Loss 2.5229   LearningRate 0.0114   Epoch: 13   Global Step: 75240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:13:18,446-Speed 3385.11 samples/sec   Loss 2.5636   LearningRate 0.0114   Epoch: 13   Global Step: 75250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:13:21,469-Speed 3388.76 samples/sec   Loss 2.5147   LearningRate 0.0114   Epoch: 13   Global Step: 75260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:13:24,492-Speed 3388.08 samples/sec   Loss 2.5084   LearningRate 0.0114   Epoch: 13   Global Step: 75270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:13:27,516-Speed 3387.43 samples/sec   Loss 2.4506   LearningRate 0.0114   Epoch: 13   Global Step: 75280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:13:30,540-Speed 3386.89 samples/sec   Loss 2.4400   LearningRate 0.0114   Epoch: 13   Global Step: 75290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:13:33,561-Speed 3390.10 samples/sec   Loss 2.4713   LearningRate 0.0114   Epoch: 13   Global Step: 75300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:13:36,584-Speed 3387.89 samples/sec   Loss 2.5145   LearningRate 0.0114   Epoch: 13   Global Step: 75310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:13:39,617-Speed 3376.53 samples/sec   Loss 2.5194   LearningRate 0.0114   Epoch: 13   Global Step: 75320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:13:42,747-Speed 3272.43 samples/sec   Loss 2.4713   LearningRate 0.0114   Epoch: 13   Global Step: 75330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:13:45,788-Speed 3368.60 samples/sec   Loss 2.5856   LearningRate 0.0114   Epoch: 13   Global Step: 75340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:13:48,811-Speed 3388.74 samples/sec   Loss 2.5200   LearningRate 0.0114   Epoch: 13   Global Step: 75350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:13:51,835-Speed 3386.25 samples/sec   Loss 2.3983   LearningRate 0.0114   Epoch: 13   Global Step: 75360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:13:54,826-Speed 3425.16 samples/sec   Loss 2.5248   LearningRate 0.0114   Epoch: 13   Global Step: 75370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:13:57,847-Speed 3390.20 samples/sec   Loss 2.3963   LearningRate 0.0114   Epoch: 13   Global Step: 75380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:14:00,871-Speed 3386.27 samples/sec   Loss 2.5011   LearningRate 0.0114   Epoch: 13   Global Step: 75390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:14:03,897-Speed 3385.04 samples/sec   Loss 2.5229   LearningRate 0.0114   Epoch: 13   Global Step: 75400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:14:06,922-Speed 3385.94 samples/sec   Loss 2.4343   LearningRate 0.0113   Epoch: 13   Global Step: 75410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:14:09,998-Speed 3330.40 samples/sec   Loss 2.5299   LearningRate 0.0113   Epoch: 13   Global Step: 75420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:14:13,039-Speed 3368.09 samples/sec   Loss 2.4728   LearningRate 0.0113   Epoch: 13   Global Step: 75430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:14:16,073-Speed 3375.29 samples/sec   Loss 2.4913   LearningRate 0.0113   Epoch: 13   Global Step: 75440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:14:19,095-Speed 3389.37 samples/sec   Loss 2.4395   LearningRate 0.0113   Epoch: 13   Global Step: 75450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:14:22,122-Speed 3383.82 samples/sec   Loss 2.5433   LearningRate 0.0113   Epoch: 13   Global Step: 75460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:14:25,150-Speed 3382.85 samples/sec   Loss 2.5456   LearningRate 0.0113   Epoch: 13   Global Step: 75470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:14:28,176-Speed 3384.89 samples/sec   Loss 2.5378   LearningRate 0.0113   Epoch: 13   Global Step: 75480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:14:31,197-Speed 3389.44 samples/sec   Loss 2.4089   LearningRate 0.0113   Epoch: 13   Global Step: 75490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:14:34,229-Speed 3377.96 samples/sec   Loss 2.5118   LearningRate 0.0113   Epoch: 13   Global Step: 75500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:14:37,253-Speed 3388.06 samples/sec   Loss 2.5640   LearningRate 0.0113   Epoch: 13   Global Step: 75510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:14:40,274-Speed 3390.48 samples/sec   Loss 2.5216   LearningRate 0.0113   Epoch: 13   Global Step: 75520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:14:43,298-Speed 3387.44 samples/sec   Loss 2.6048   LearningRate 0.0113   Epoch: 13   Global Step: 75530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:14:46,326-Speed 3381.74 samples/sec   Loss 2.4682   LearningRate 0.0113   Epoch: 13   Global Step: 75540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:14:49,365-Speed 3370.96 samples/sec   Loss 2.5552   LearningRate 0.0113   Epoch: 13   Global Step: 75550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:14:52,399-Speed 3375.83 samples/sec   Loss 2.4319   LearningRate 0.0113   Epoch: 13   Global Step: 75560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:14:55,405-Speed 3408.69 samples/sec   Loss 2.4857   LearningRate 0.0113   Epoch: 13   Global Step: 75570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:14:58,436-Speed 3379.27 samples/sec   Loss 2.5148   LearningRate 0.0112   Epoch: 13   Global Step: 75580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:01,460-Speed 3386.23 samples/sec   Loss 2.5783   LearningRate 0.0112   Epoch: 13   Global Step: 75590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:04,486-Speed 3384.74 samples/sec   Loss 2.5401   LearningRate 0.0112   Epoch: 13   Global Step: 75600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:07,507-Speed 3391.67 samples/sec   Loss 2.5878   LearningRate 0.0112   Epoch: 13   Global Step: 75610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:10,554-Speed 3360.78 samples/sec   Loss 2.5117   LearningRate 0.0112   Epoch: 13   Global Step: 75620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:13,576-Speed 3389.00 samples/sec   Loss 2.4223   LearningRate 0.0112   Epoch: 13   Global Step: 75630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:16,602-Speed 3385.00 samples/sec   Loss 2.4793   LearningRate 0.0112   Epoch: 13   Global Step: 75640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:19,630-Speed 3382.86 samples/sec   Loss 2.5403   LearningRate 0.0112   Epoch: 13   Global Step: 75650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:22,664-Speed 3375.60 samples/sec   Loss 2.5220   LearningRate 0.0112   Epoch: 13   Global Step: 75660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:25,753-Speed 3315.87 samples/sec   Loss 2.4904   LearningRate 0.0112   Epoch: 13   Global Step: 75670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:15:28,762-Speed 3404.71 samples/sec   Loss 2.4221   LearningRate 0.0112   Epoch: 13   Global Step: 75680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:31,784-Speed 3388.63 samples/sec   Loss 2.4194   LearningRate 0.0112   Epoch: 13   Global Step: 75690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:34,808-Speed 3387.21 samples/sec   Loss 2.6107   LearningRate 0.0112   Epoch: 13   Global Step: 75700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:37,828-Speed 3392.22 samples/sec   Loss 2.5043   LearningRate 0.0112   Epoch: 13   Global Step: 75710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:40,852-Speed 3386.59 samples/sec   Loss 2.5626   LearningRate 0.0112   Epoch: 13   Global Step: 75720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:43,876-Speed 3386.87 samples/sec   Loss 2.4940   LearningRate 0.0112   Epoch: 13   Global Step: 75730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:46,896-Speed 3391.51 samples/sec   Loss 2.4832   LearningRate 0.0112   Epoch: 13   Global Step: 75740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:49,932-Speed 3373.42 samples/sec   Loss 2.5242   LearningRate 0.0111   Epoch: 13   Global Step: 75750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:52,952-Speed 3391.84 samples/sec   Loss 2.5616   LearningRate 0.0111   Epoch: 13   Global Step: 75760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:55,988-Speed 3373.59 samples/sec   Loss 2.5794   LearningRate 0.0111   Epoch: 13   Global Step: 75770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:15:59,016-Speed 3383.64 samples/sec   Loss 2.6332   LearningRate 0.0111   Epoch: 13   Global Step: 75780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:16:02,023-Speed 3405.31 samples/sec   Loss 2.5984   LearningRate 0.0111   Epoch: 13   Global Step: 75790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:05,045-Speed 3389.40 samples/sec   Loss 2.4985   LearningRate 0.0111   Epoch: 13   Global Step: 75800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:08,073-Speed 3382.59 samples/sec   Loss 2.5487   LearningRate 0.0111   Epoch: 13   Global Step: 75810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:11,112-Speed 3370.37 samples/sec   Loss 2.6098   LearningRate 0.0111   Epoch: 13   Global Step: 75820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:14,140-Speed 3382.50 samples/sec   Loss 2.6042   LearningRate 0.0111   Epoch: 13   Global Step: 75830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:17,166-Speed 3384.40 samples/sec   Loss 2.5321   LearningRate 0.0111   Epoch: 13   Global Step: 75840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:20,213-Speed 3362.29 samples/sec   Loss 2.5620   LearningRate 0.0111   Epoch: 13   Global Step: 75850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:23,233-Speed 3391.11 samples/sec   Loss 2.5256   LearningRate 0.0111   Epoch: 13   Global Step: 75860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:26,261-Speed 3383.02 samples/sec   Loss 2.4587   LearningRate 0.0111   Epoch: 13   Global Step: 75870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:29,284-Speed 3388.03 samples/sec   Loss 2.5102   LearningRate 0.0111   Epoch: 13   Global Step: 75880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:32,307-Speed 3387.79 samples/sec   Loss 2.4867   LearningRate 0.0111   Epoch: 13   Global Step: 75890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:16:35,324-Speed 3395.27 samples/sec   Loss 2.5573   LearningRate 0.0111   Epoch: 13   Global Step: 75900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:38,366-Speed 3367.26 samples/sec   Loss 2.5537   LearningRate 0.0111   Epoch: 13   Global Step: 75910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:41,386-Speed 3391.17 samples/sec   Loss 2.6193   LearningRate 0.0110   Epoch: 13   Global Step: 75920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:44,413-Speed 3383.55 samples/sec   Loss 2.6272   LearningRate 0.0110   Epoch: 13   Global Step: 75930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:47,436-Speed 3387.82 samples/sec   Loss 2.4678   LearningRate 0.0110   Epoch: 13   Global Step: 75940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:50,468-Speed 3378.15 samples/sec   Loss 2.3707   LearningRate 0.0110   Epoch: 13   Global Step: 75950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:53,509-Speed 3368.87 samples/sec   Loss 2.5463   LearningRate 0.0110   Epoch: 13   Global Step: 75960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:56,539-Speed 3380.31 samples/sec   Loss 2.4780   LearningRate 0.0110   Epoch: 13   Global Step: 75970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:16:59,564-Speed 3385.56 samples/sec   Loss 2.4456   LearningRate 0.0110   Epoch: 13   Global Step: 75980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:17:02,594-Speed 3380.28 samples/sec   Loss 2.3603   LearningRate 0.0110   Epoch: 13   Global Step: 75990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:17:05,607-Speed 3399.48 samples/sec   Loss 2.5215   LearningRate 0.0110   Epoch: 13   Global Step: 76000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:17:49,111-[lfw][76000]XNorm: 20.858766
Training: 2022-04-27 09:17:49,111-[lfw][76000]Accuracy-Flip: 0.99783+-0.00308
Training: 2022-04-27 09:17:49,112-[lfw][76000]Accuracy-Highest: 0.99817
Training: 2022-04-27 09:18:39,696-[cfp_fp][76000]XNorm: 19.454355
Training: 2022-04-27 09:18:39,697-[cfp_fp][76000]Accuracy-Flip: 0.97743+-0.00654
Training: 2022-04-27 09:18:39,697-[cfp_fp][76000]Accuracy-Highest: 0.97743
Training: 2022-04-27 09:19:23,208-[agedb_30][76000]XNorm: 21.044185
Training: 2022-04-27 09:19:23,209-[agedb_30][76000]Accuracy-Flip: 0.98100+-0.00688
Training: 2022-04-27 09:19:23,209-[agedb_30][76000]Accuracy-Highest: 0.98100
Training: 2022-04-27 09:19:26,237-Speed 72.82 samples/sec   Loss 2.5592   LearningRate 0.0110   Epoch: 13   Global Step: 76010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:19:29,248-Speed 3401.10 samples/sec   Loss 2.4819   LearningRate 0.0110   Epoch: 13   Global Step: 76020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:19:32,256-Speed 3405.42 samples/sec   Loss 2.5111   LearningRate 0.0110   Epoch: 13   Global Step: 76030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:19:35,267-Speed 3401.85 samples/sec   Loss 2.5450   LearningRate 0.0110   Epoch: 13   Global Step: 76040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:19:38,277-Speed 3402.48 samples/sec   Loss 2.5311   LearningRate 0.0110   Epoch: 13   Global Step: 76050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:19:41,303-Speed 3384.08 samples/sec   Loss 2.3555   LearningRate 0.0110   Epoch: 13   Global Step: 76060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:19:44,299-Speed 3418.84 samples/sec   Loss 2.4443   LearningRate 0.0110   Epoch: 13   Global Step: 76070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:19:47,322-Speed 3388.44 samples/sec   Loss 2.3866   LearningRate 0.0110   Epoch: 13   Global Step: 76080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:19:50,344-Speed 3389.05 samples/sec   Loss 2.5319   LearningRate 0.0109   Epoch: 13   Global Step: 76090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:19:53,361-Speed 3395.72 samples/sec   Loss 2.5646   LearningRate 0.0109   Epoch: 13   Global Step: 76100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:19:56,381-Speed 3391.08 samples/sec   Loss 2.5499   LearningRate 0.0109   Epoch: 13   Global Step: 76110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:19:59,407-Speed 3384.31 samples/sec   Loss 2.6061   LearningRate 0.0109   Epoch: 13   Global Step: 76120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:20:02,429-Speed 3389.03 samples/sec   Loss 2.5249   LearningRate 0.0109   Epoch: 13   Global Step: 76130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:20:05,449-Speed 3392.17 samples/sec   Loss 2.6559   LearningRate 0.0109   Epoch: 13   Global Step: 76140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:20:08,469-Speed 3391.10 samples/sec   Loss 2.4444   LearningRate 0.0109   Epoch: 13   Global Step: 76150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:20:11,511-Speed 3367.70 samples/sec   Loss 2.5493   LearningRate 0.0109   Epoch: 13   Global Step: 76160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:20:14,561-Speed 3358.42 samples/sec   Loss 2.4853   LearningRate 0.0109   Epoch: 13   Global Step: 76170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:20:17,597-Speed 3373.13 samples/sec   Loss 2.5593   LearningRate 0.0109   Epoch: 13   Global Step: 76180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:20:20,637-Speed 3369.70 samples/sec   Loss 2.5386   LearningRate 0.0109   Epoch: 13   Global Step: 76190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:20:23,662-Speed 3385.21 samples/sec   Loss 2.4525   LearningRate 0.0109   Epoch: 13   Global Step: 76200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:20:26,691-Speed 3382.03 samples/sec   Loss 2.5234   LearningRate 0.0109   Epoch: 13   Global Step: 76210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:20:29,707-Speed 3396.09 samples/sec   Loss 2.5757   LearningRate 0.0109   Epoch: 13   Global Step: 76220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:20:32,722-Speed 3396.90 samples/sec   Loss 2.5156   LearningRate 0.0109   Epoch: 13   Global Step: 76230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:20:35,740-Speed 3393.37 samples/sec   Loss 2.4442   LearningRate 0.0109   Epoch: 13   Global Step: 76240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:20:38,775-Speed 3375.52 samples/sec   Loss 2.5249   LearningRate 0.0109   Epoch: 13   Global Step: 76250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:20:41,794-Speed 3392.55 samples/sec   Loss 2.4971   LearningRate 0.0109   Epoch: 13   Global Step: 76260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:20:44,813-Speed 3392.47 samples/sec   Loss 2.5443   LearningRate 0.0108   Epoch: 13   Global Step: 76270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:20:47,808-Speed 3419.41 samples/sec   Loss 2.5261   LearningRate 0.0108   Epoch: 13   Global Step: 76280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:20:50,827-Speed 3392.43 samples/sec   Loss 2.5952   LearningRate 0.0108   Epoch: 13   Global Step: 76290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:20:53,848-Speed 3390.47 samples/sec   Loss 2.5643   LearningRate 0.0108   Epoch: 13   Global Step: 76300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:20:56,861-Speed 3399.98 samples/sec   Loss 2.5903   LearningRate 0.0108   Epoch: 13   Global Step: 76310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:20:59,879-Speed 3393.92 samples/sec   Loss 2.5520   LearningRate 0.0108   Epoch: 13   Global Step: 76320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:02,903-Speed 3386.08 samples/sec   Loss 2.5386   LearningRate 0.0108   Epoch: 13   Global Step: 76330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:05,916-Speed 3399.95 samples/sec   Loss 2.5556   LearningRate 0.0108   Epoch: 13   Global Step: 76340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:08,928-Speed 3401.30 samples/sec   Loss 2.4987   LearningRate 0.0108   Epoch: 13   Global Step: 76350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:11,941-Speed 3398.97 samples/sec   Loss 2.5869   LearningRate 0.0108   Epoch: 13   Global Step: 76360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:14,953-Speed 3399.93 samples/sec   Loss 2.5645   LearningRate 0.0108   Epoch: 13   Global Step: 76370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:18,230-Speed 3126.27 samples/sec   Loss 2.4947   LearningRate 0.0108   Epoch: 13   Global Step: 76380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:21,306-Speed 3330.01 samples/sec   Loss 2.4578   LearningRate 0.0108   Epoch: 13   Global Step: 76390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:24,334-Speed 3381.72 samples/sec   Loss 2.4658   LearningRate 0.0108   Epoch: 13   Global Step: 76400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:27,360-Speed 3385.49 samples/sec   Loss 2.4972   LearningRate 0.0108   Epoch: 13   Global Step: 76410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:30,410-Speed 3357.60 samples/sec   Loss 2.5032   LearningRate 0.0108   Epoch: 13   Global Step: 76420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:33,424-Speed 3398.06 samples/sec   Loss 2.6158   LearningRate 0.0108   Epoch: 13   Global Step: 76430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:36,447-Speed 3388.44 samples/sec   Loss 2.5160   LearningRate 0.0107   Epoch: 13   Global Step: 76440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:39,459-Speed 3400.54 samples/sec   Loss 2.6182   LearningRate 0.0107   Epoch: 13   Global Step: 76450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:42,474-Speed 3397.74 samples/sec   Loss 2.4419   LearningRate 0.0107   Epoch: 13   Global Step: 76460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:45,482-Speed 3404.46 samples/sec   Loss 2.5038   LearningRate 0.0107   Epoch: 13   Global Step: 76470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:48,494-Speed 3400.74 samples/sec   Loss 2.4255   LearningRate 0.0107   Epoch: 13   Global Step: 76480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:21:51,490-Speed 3418.68 samples/sec   Loss 2.6022   LearningRate 0.0107   Epoch: 13   Global Step: 76490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:54,495-Speed 3408.14 samples/sec   Loss 2.5416   LearningRate 0.0107   Epoch: 13   Global Step: 76500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:21:57,513-Speed 3394.49 samples/sec   Loss 2.5339   LearningRate 0.0107   Epoch: 13   Global Step: 76510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:00,518-Speed 3408.04 samples/sec   Loss 2.4447   LearningRate 0.0107   Epoch: 13   Global Step: 76520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:03,539-Speed 3390.35 samples/sec   Loss 2.5541   LearningRate 0.0107   Epoch: 13   Global Step: 76530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:06,551-Speed 3400.15 samples/sec   Loss 2.6210   LearningRate 0.0107   Epoch: 13   Global Step: 76540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:09,569-Speed 3394.73 samples/sec   Loss 2.4532   LearningRate 0.0107   Epoch: 13   Global Step: 76550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:12,584-Speed 3397.12 samples/sec   Loss 2.5190   LearningRate 0.0107   Epoch: 13   Global Step: 76560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:15,594-Speed 3401.83 samples/sec   Loss 2.5072   LearningRate 0.0107   Epoch: 13   Global Step: 76570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:18,612-Speed 3394.50 samples/sec   Loss 2.4817   LearningRate 0.0107   Epoch: 13   Global Step: 76580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:21,623-Speed 3401.56 samples/sec   Loss 2.5470   LearningRate 0.0107   Epoch: 13   Global Step: 76590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:22:24,650-Speed 3382.86 samples/sec   Loss 2.5031   LearningRate 0.0107   Epoch: 13   Global Step: 76600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:27,663-Speed 3399.85 samples/sec   Loss 2.5615   LearningRate 0.0106   Epoch: 13   Global Step: 76610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:30,683-Speed 3391.77 samples/sec   Loss 2.5177   LearningRate 0.0106   Epoch: 13   Global Step: 76620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:33,700-Speed 3395.28 samples/sec   Loss 2.5341   LearningRate 0.0106   Epoch: 13   Global Step: 76630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:36,725-Speed 3385.96 samples/sec   Loss 2.6267   LearningRate 0.0106   Epoch: 13   Global Step: 76640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:39,757-Speed 3377.98 samples/sec   Loss 2.4804   LearningRate 0.0106   Epoch: 13   Global Step: 76650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:42,786-Speed 3381.24 samples/sec   Loss 2.5836   LearningRate 0.0106   Epoch: 13   Global Step: 76660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:45,803-Speed 3394.75 samples/sec   Loss 2.6087   LearningRate 0.0106   Epoch: 13   Global Step: 76670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:48,825-Speed 3388.90 samples/sec   Loss 2.5054   LearningRate 0.0106   Epoch: 13   Global Step: 76680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:51,846-Speed 3391.21 samples/sec   Loss 2.6444   LearningRate 0.0106   Epoch: 13   Global Step: 76690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:22:54,862-Speed 3395.25 samples/sec   Loss 2.5543   LearningRate 0.0106   Epoch: 13   Global Step: 76700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:22:57,861-Speed 3416.32 samples/sec   Loss 2.4423   LearningRate 0.0106   Epoch: 13   Global Step: 76710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:00,901-Speed 3369.23 samples/sec   Loss 2.4899   LearningRate 0.0106   Epoch: 13   Global Step: 76720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:03,921-Speed 3390.88 samples/sec   Loss 2.6166   LearningRate 0.0106   Epoch: 13   Global Step: 76730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:06,941-Speed 3391.81 samples/sec   Loss 2.5000   LearningRate 0.0106   Epoch: 13   Global Step: 76740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:09,958-Speed 3394.71 samples/sec   Loss 2.5264   LearningRate 0.0106   Epoch: 13   Global Step: 76750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:12,978-Speed 3392.09 samples/sec   Loss 2.4733   LearningRate 0.0106   Epoch: 13   Global Step: 76760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:16,004-Speed 3383.82 samples/sec   Loss 2.4864   LearningRate 0.0106   Epoch: 13   Global Step: 76770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:19,026-Speed 3389.28 samples/sec   Loss 2.5989   LearningRate 0.0106   Epoch: 13   Global Step: 76780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:22,051-Speed 3386.75 samples/sec   Loss 2.4592   LearningRate 0.0105   Epoch: 13   Global Step: 76790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:25,079-Speed 3381.91 samples/sec   Loss 2.5299   LearningRate 0.0105   Epoch: 13   Global Step: 76800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:28,081-Speed 3412.52 samples/sec   Loss 2.4891   LearningRate 0.0105   Epoch: 13   Global Step: 76810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:31,102-Speed 3390.60 samples/sec   Loss 2.5020   LearningRate 0.0105   Epoch: 13   Global Step: 76820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:34,120-Speed 3393.86 samples/sec   Loss 2.6191   LearningRate 0.0105   Epoch: 13   Global Step: 76830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:37,139-Speed 3392.14 samples/sec   Loss 2.5419   LearningRate 0.0105   Epoch: 13   Global Step: 76840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:40,158-Speed 3392.51 samples/sec   Loss 2.4875   LearningRate 0.0105   Epoch: 13   Global Step: 76850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:23:43,162-Speed 3409.74 samples/sec   Loss 2.4513   LearningRate 0.0105   Epoch: 13   Global Step: 76860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:23:46,201-Speed 3370.34 samples/sec   Loss 2.4681   LearningRate 0.0105   Epoch: 13   Global Step: 76870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:23:49,227-Speed 3384.62 samples/sec   Loss 2.4528   LearningRate 0.0105   Epoch: 13   Global Step: 76880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:23:52,252-Speed 3385.65 samples/sec   Loss 2.4999   LearningRate 0.0105   Epoch: 13   Global Step: 76890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:23:55,268-Speed 3396.83 samples/sec   Loss 2.4642   LearningRate 0.0105   Epoch: 13   Global Step: 76900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:23:58,283-Speed 3396.56 samples/sec   Loss 2.4649   LearningRate 0.0105   Epoch: 13   Global Step: 76910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:24:01,300-Speed 3395.40 samples/sec   Loss 2.5258   LearningRate 0.0105   Epoch: 13   Global Step: 76920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:24:04,320-Speed 3391.60 samples/sec   Loss 2.4820   LearningRate 0.0105   Epoch: 13   Global Step: 76930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:24:07,336-Speed 3395.05 samples/sec   Loss 2.5036   LearningRate 0.0105   Epoch: 13   Global Step: 76940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:24:10,355-Speed 3393.09 samples/sec   Loss 2.5041   LearningRate 0.0105   Epoch: 13   Global Step: 76950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:24:13,371-Speed 3395.83 samples/sec   Loss 2.5142   LearningRate 0.0104   Epoch: 13   Global Step: 76960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:24:16,400-Speed 3381.02 samples/sec   Loss 2.4668   LearningRate 0.0104   Epoch: 13   Global Step: 76970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:24:19,414-Speed 3399.08 samples/sec   Loss 2.5322   LearningRate 0.0104   Epoch: 13   Global Step: 76980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:24:22,432-Speed 3393.98 samples/sec   Loss 2.4525   LearningRate 0.0104   Epoch: 13   Global Step: 76990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:24:25,451-Speed 3393.07 samples/sec   Loss 2.3872   LearningRate 0.0104   Epoch: 13   Global Step: 77000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:24:28,472-Speed 3397.90 samples/sec   Loss 2.5065   LearningRate 0.0104   Epoch: 13   Global Step: 77010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:24:31,471-Speed 3414.56 samples/sec   Loss 2.6494   LearningRate 0.0104   Epoch: 13   Global Step: 77020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:24:34,493-Speed 3389.00 samples/sec   Loss 2.5562   LearningRate 0.0104   Epoch: 13   Global Step: 77030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:24:37,515-Speed 3389.33 samples/sec   Loss 2.6125   LearningRate 0.0104   Epoch: 13   Global Step: 77040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:24:40,535-Speed 3391.97 samples/sec   Loss 2.5615   LearningRate 0.0104   Epoch: 13   Global Step: 77050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:24:43,553-Speed 3393.83 samples/sec   Loss 2.4890   LearningRate 0.0104   Epoch: 13   Global Step: 77060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:24:46,570-Speed 3394.75 samples/sec   Loss 2.4876   LearningRate 0.0104   Epoch: 13   Global Step: 77070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:24:49,591-Speed 3391.27 samples/sec   Loss 2.6119   LearningRate 0.0104   Epoch: 13   Global Step: 77080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:24:52,611-Speed 3391.64 samples/sec   Loss 2.4858   LearningRate 0.0104   Epoch: 13   Global Step: 77090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:24:55,631-Speed 3390.86 samples/sec   Loss 2.6040   LearningRate 0.0104   Epoch: 13   Global Step: 77100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:24:58,663-Speed 3378.38 samples/sec   Loss 2.5236   LearningRate 0.0104   Epoch: 13   Global Step: 77110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:25:01,685-Speed 3389.03 samples/sec   Loss 2.4984   LearningRate 0.0104   Epoch: 13   Global Step: 77120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:25:04,714-Speed 3380.83 samples/sec   Loss 2.5972   LearningRate 0.0104   Epoch: 13   Global Step: 77130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:25:07,735-Speed 3390.64 samples/sec   Loss 2.5000   LearningRate 0.0103   Epoch: 13   Global Step: 77140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:25:10,757-Speed 3389.83 samples/sec   Loss 2.4500   LearningRate 0.0103   Epoch: 13   Global Step: 77150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:25:13,792-Speed 3374.80 samples/sec   Loss 2.4612   LearningRate 0.0103   Epoch: 13   Global Step: 77160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:25:16,822-Speed 3380.82 samples/sec   Loss 2.5323   LearningRate 0.0103   Epoch: 13   Global Step: 77170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:25:19,846-Speed 3386.48 samples/sec   Loss 2.5507   LearningRate 0.0103   Epoch: 13   Global Step: 77180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:25:22,864-Speed 3393.51 samples/sec   Loss 2.4870   LearningRate 0.0103   Epoch: 13   Global Step: 77190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:25:25,882-Speed 3394.35 samples/sec   Loss 2.5978   LearningRate 0.0103   Epoch: 13   Global Step: 77200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:25:28,904-Speed 3389.16 samples/sec   Loss 2.4869   LearningRate 0.0103   Epoch: 13   Global Step: 77210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:25:31,921-Speed 3395.24 samples/sec   Loss 2.4450   LearningRate 0.0103   Epoch: 13   Global Step: 77220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:25:34,938-Speed 3394.37 samples/sec   Loss 2.5075   LearningRate 0.0103   Epoch: 13   Global Step: 77230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:25:37,946-Speed 3404.72 samples/sec   Loss 2.5160   LearningRate 0.0103   Epoch: 13   Global Step: 77240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:25:40,965-Speed 3392.91 samples/sec   Loss 2.4945   LearningRate 0.0103   Epoch: 13   Global Step: 77250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:25:43,982-Speed 3395.50 samples/sec   Loss 2.4369   LearningRate 0.0103   Epoch: 13   Global Step: 77260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:25:47,013-Speed 3379.06 samples/sec   Loss 2.4955   LearningRate 0.0103   Epoch: 13   Global Step: 77270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:25:50,035-Speed 3388.79 samples/sec   Loss 2.5404   LearningRate 0.0103   Epoch: 13   Global Step: 77280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:25:53,074-Speed 3370.26 samples/sec   Loss 2.5035   LearningRate 0.0103   Epoch: 13   Global Step: 77290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:25:56,091-Speed 3394.78 samples/sec   Loss 2.5036   LearningRate 0.0103   Epoch: 13   Global Step: 77300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:25:59,118-Speed 3384.34 samples/sec   Loss 2.5253   LearningRate 0.0103   Epoch: 13   Global Step: 77310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:26:02,150-Speed 3377.62 samples/sec   Loss 2.5421   LearningRate 0.0102   Epoch: 13   Global Step: 77320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:26:05,172-Speed 3389.45 samples/sec   Loss 2.5745   LearningRate 0.0102   Epoch: 13   Global Step: 77330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:26:08,191-Speed 3392.77 samples/sec   Loss 2.3931   LearningRate 0.0102   Epoch: 13   Global Step: 77340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:26:11,215-Speed 3387.86 samples/sec   Loss 2.4834   LearningRate 0.0102   Epoch: 13   Global Step: 77350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:26:14,240-Speed 3385.42 samples/sec   Loss 2.5764   LearningRate 0.0102   Epoch: 13   Global Step: 77360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:26:17,273-Speed 3377.03 samples/sec   Loss 2.5277   LearningRate 0.0102   Epoch: 13   Global Step: 77370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:26:20,294-Speed 3389.98 samples/sec   Loss 2.4239   LearningRate 0.0102   Epoch: 13   Global Step: 77380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:26:23,320-Speed 3384.55 samples/sec   Loss 2.5277   LearningRate 0.0102   Epoch: 13   Global Step: 77390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:26:26,344-Speed 3387.47 samples/sec   Loss 2.4818   LearningRate 0.0102   Epoch: 13   Global Step: 77400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:26:29,362-Speed 3393.22 samples/sec   Loss 2.4825   LearningRate 0.0102   Epoch: 13   Global Step: 77410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:26:32,385-Speed 3389.27 samples/sec   Loss 2.5048   LearningRate 0.0102   Epoch: 13   Global Step: 77420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:26:35,408-Speed 3387.86 samples/sec   Loss 2.4625   LearningRate 0.0102   Epoch: 13   Global Step: 77430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:26:38,429-Speed 3390.88 samples/sec   Loss 2.5170   LearningRate 0.0102   Epoch: 13   Global Step: 77440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:26:41,451-Speed 3389.56 samples/sec   Loss 2.4921   LearningRate 0.0102   Epoch: 13   Global Step: 77450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:26:44,473-Speed 3389.07 samples/sec   Loss 2.5219   LearningRate 0.0102   Epoch: 13   Global Step: 77460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:26:47,523-Speed 3357.53 samples/sec   Loss 2.5573   LearningRate 0.0102   Epoch: 13   Global Step: 77470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:26:50,545-Speed 3389.50 samples/sec   Loss 2.6183   LearningRate 0.0102   Epoch: 13   Global Step: 77480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:26:53,575-Speed 3380.06 samples/sec   Loss 2.4852   LearningRate 0.0101   Epoch: 13   Global Step: 77490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:26:56,598-Speed 3388.83 samples/sec   Loss 2.5829   LearningRate 0.0101   Epoch: 13   Global Step: 77500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:26:59,638-Speed 3369.05 samples/sec   Loss 2.4771   LearningRate 0.0101   Epoch: 13   Global Step: 77510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:02,676-Speed 3371.96 samples/sec   Loss 2.4952   LearningRate 0.0101   Epoch: 13   Global Step: 77520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:05,710-Speed 3375.54 samples/sec   Loss 2.5325   LearningRate 0.0101   Epoch: 13   Global Step: 77530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:08,737-Speed 3384.05 samples/sec   Loss 2.4255   LearningRate 0.0101   Epoch: 13   Global Step: 77540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:11,767-Speed 3379.90 samples/sec   Loss 2.5015   LearningRate 0.0101   Epoch: 13   Global Step: 77550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:14,791-Speed 3386.94 samples/sec   Loss 2.6187   LearningRate 0.0101   Epoch: 13   Global Step: 77560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:27:17,818-Speed 3384.10 samples/sec   Loss 2.4621   LearningRate 0.0101   Epoch: 13   Global Step: 77570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:27:20,849-Speed 3378.78 samples/sec   Loss 2.4999   LearningRate 0.0101   Epoch: 13   Global Step: 77580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:27:23,865-Speed 3396.40 samples/sec   Loss 2.5553   LearningRate 0.0101   Epoch: 13   Global Step: 77590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:26,928-Speed 3343.82 samples/sec   Loss 2.6183   LearningRate 0.0101   Epoch: 13   Global Step: 77600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:29,950-Speed 3389.41 samples/sec   Loss 2.6086   LearningRate 0.0101   Epoch: 13   Global Step: 77610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:32,975-Speed 3386.19 samples/sec   Loss 2.5613   LearningRate 0.0101   Epoch: 13   Global Step: 77620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:35,998-Speed 3388.36 samples/sec   Loss 2.5984   LearningRate 0.0101   Epoch: 13   Global Step: 77630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:39,023-Speed 3385.26 samples/sec   Loss 2.4268   LearningRate 0.0101   Epoch: 13   Global Step: 77640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:42,050-Speed 3383.52 samples/sec   Loss 2.5257   LearningRate 0.0101   Epoch: 13   Global Step: 77650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:45,075-Speed 3385.96 samples/sec   Loss 2.6097   LearningRate 0.0101   Epoch: 13   Global Step: 77660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:48,098-Speed 3387.86 samples/sec   Loss 2.5861   LearningRate 0.0100   Epoch: 13   Global Step: 77670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:51,132-Speed 3376.19 samples/sec   Loss 2.5107   LearningRate 0.0100   Epoch: 13   Global Step: 77680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:54,142-Speed 3403.15 samples/sec   Loss 2.5126   LearningRate 0.0100   Epoch: 13   Global Step: 77690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:27:57,165-Speed 3388.50 samples/sec   Loss 2.5060   LearningRate 0.0100   Epoch: 13   Global Step: 77700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:28:00,190-Speed 3385.67 samples/sec   Loss 2.5380   LearningRate 0.0100   Epoch: 13   Global Step: 77710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:28:03,213-Speed 3388.23 samples/sec   Loss 2.4633   LearningRate 0.0100   Epoch: 13   Global Step: 77720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:28:06,239-Speed 3384.44 samples/sec   Loss 2.5090   LearningRate 0.0100   Epoch: 13   Global Step: 77730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:28:09,265-Speed 3384.60 samples/sec   Loss 2.5498   LearningRate 0.0100   Epoch: 13   Global Step: 77740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:28:12,292-Speed 3384.07 samples/sec   Loss 2.5463   LearningRate 0.0100   Epoch: 13   Global Step: 77750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:28:15,320-Speed 3382.24 samples/sec   Loss 2.4815   LearningRate 0.0100   Epoch: 13   Global Step: 77760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:28:18,346-Speed 3384.90 samples/sec   Loss 2.5746   LearningRate 0.0100   Epoch: 13   Global Step: 77770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:28:21,370-Speed 3387.08 samples/sec   Loss 2.4957   LearningRate 0.0100   Epoch: 13   Global Step: 77780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:28:24,379-Speed 3404.08 samples/sec   Loss 2.3712   LearningRate 0.0100   Epoch: 13   Global Step: 77790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:28:27,403-Speed 3387.20 samples/sec   Loss 2.5207   LearningRate 0.0100   Epoch: 13   Global Step: 77800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:28:30,425-Speed 3388.94 samples/sec   Loss 2.4576   LearningRate 0.0100   Epoch: 13   Global Step: 77810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:28:33,451-Speed 3384.89 samples/sec   Loss 2.5073   LearningRate 0.0100   Epoch: 13   Global Step: 77820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:28:36,486-Speed 3374.98 samples/sec   Loss 2.5970   LearningRate 0.0100   Epoch: 13   Global Step: 77830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:28:39,506-Speed 3391.52 samples/sec   Loss 2.5251   LearningRate 0.0100   Epoch: 13   Global Step: 77840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:28:42,536-Speed 3380.08 samples/sec   Loss 2.5129   LearningRate 0.0099   Epoch: 13   Global Step: 77850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:28:45,565-Speed 3381.38 samples/sec   Loss 2.5000   LearningRate 0.0099   Epoch: 13   Global Step: 77860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:28:48,589-Speed 3388.04 samples/sec   Loss 2.4224   LearningRate 0.0099   Epoch: 13   Global Step: 77870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:28:51,619-Speed 3380.32 samples/sec   Loss 2.5094   LearningRate 0.0099   Epoch: 13   Global Step: 77880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:28:54,636-Speed 3394.51 samples/sec   Loss 2.5566   LearningRate 0.0099   Epoch: 13   Global Step: 77890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:28:57,655-Speed 3392.67 samples/sec   Loss 2.4916   LearningRate 0.0099   Epoch: 13   Global Step: 77900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:29:00,678-Speed 3388.53 samples/sec   Loss 2.4763   LearningRate 0.0099   Epoch: 13   Global Step: 77910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:29:03,705-Speed 3383.26 samples/sec   Loss 2.5360   LearningRate 0.0099   Epoch: 13   Global Step: 77920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:29:06,732-Speed 3384.14 samples/sec   Loss 2.4288   LearningRate 0.0099   Epoch: 13   Global Step: 77930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:29:09,758-Speed 3384.18 samples/sec   Loss 2.4396   LearningRate 0.0099   Epoch: 13   Global Step: 77940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:29:12,783-Speed 3386.18 samples/sec   Loss 2.6653   LearningRate 0.0099   Epoch: 13   Global Step: 77950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:29:15,818-Speed 3374.57 samples/sec   Loss 2.6831   LearningRate 0.0099   Epoch: 13   Global Step: 77960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:29:18,845-Speed 3384.19 samples/sec   Loss 2.5136   LearningRate 0.0099   Epoch: 13   Global Step: 77970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:29:21,878-Speed 3377.51 samples/sec   Loss 2.4685   LearningRate 0.0099   Epoch: 13   Global Step: 77980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:29:24,908-Speed 3380.11 samples/sec   Loss 2.4816   LearningRate 0.0099   Epoch: 13   Global Step: 77990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:29:27,923-Speed 3396.25 samples/sec   Loss 2.5324   LearningRate 0.0099   Epoch: 13   Global Step: 78000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:30:11,379-[lfw][78000]XNorm: 22.533113
Training: 2022-04-27 09:30:11,380-[lfw][78000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-27 09:30:11,380-[lfw][78000]Accuracy-Highest: 0.99817
Training: 2022-04-27 09:31:01,690-[cfp_fp][78000]XNorm: 20.898850
Training: 2022-04-27 09:31:01,690-[cfp_fp][78000]Accuracy-Flip: 0.97714+-0.00616
Training: 2022-04-27 09:31:01,691-[cfp_fp][78000]Accuracy-Highest: 0.97743
Training: 2022-04-27 09:31:45,393-[agedb_30][78000]XNorm: 22.560032
Training: 2022-04-27 09:31:45,393-[agedb_30][78000]Accuracy-Flip: 0.98100+-0.00731
Training: 2022-04-27 09:31:45,394-[agedb_30][78000]Accuracy-Highest: 0.98100
Training: 2022-04-27 09:31:48,396-Speed 72.90 samples/sec   Loss 2.5301   LearningRate 0.0099   Epoch: 13   Global Step: 78010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:31:51,401-Speed 3408.95 samples/sec   Loss 2.4558   LearningRate 0.0099   Epoch: 13   Global Step: 78020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:31:54,398-Speed 3416.81 samples/sec   Loss 2.5126   LearningRate 0.0098   Epoch: 13   Global Step: 78030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:31:57,399-Speed 3414.18 samples/sec   Loss 2.3468   LearningRate 0.0098   Epoch: 13   Global Step: 78040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:00,402-Speed 3410.58 samples/sec   Loss 2.5474   LearningRate 0.0098   Epoch: 13   Global Step: 78050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:03,405-Speed 3410.22 samples/sec   Loss 2.5329   LearningRate 0.0098   Epoch: 13   Global Step: 78060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:06,409-Speed 3410.12 samples/sec   Loss 2.4708   LearningRate 0.0098   Epoch: 13   Global Step: 78070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:09,419-Speed 3403.15 samples/sec   Loss 2.5197   LearningRate 0.0098   Epoch: 13   Global Step: 78080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:12,431-Speed 3400.38 samples/sec   Loss 2.6007   LearningRate 0.0098   Epoch: 13   Global Step: 78090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:15,441-Speed 3404.03 samples/sec   Loss 2.4447   LearningRate 0.0098   Epoch: 13   Global Step: 78100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:32:18,468-Speed 3382.55 samples/sec   Loss 2.4797   LearningRate 0.0098   Epoch: 13   Global Step: 78110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:32:21,480-Speed 3400.76 samples/sec   Loss 2.4897   LearningRate 0.0098   Epoch: 13   Global Step: 78120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:32:24,482-Speed 3412.25 samples/sec   Loss 2.5003   LearningRate 0.0098   Epoch: 13   Global Step: 78130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:27,495-Speed 3399.06 samples/sec   Loss 2.5064   LearningRate 0.0098   Epoch: 13   Global Step: 78140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:30,509-Speed 3398.26 samples/sec   Loss 2.4305   LearningRate 0.0098   Epoch: 13   Global Step: 78150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:33,525-Speed 3396.44 samples/sec   Loss 2.4584   LearningRate 0.0098   Epoch: 13   Global Step: 78160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:36,541-Speed 3396.00 samples/sec   Loss 2.4605   LearningRate 0.0098   Epoch: 13   Global Step: 78170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:39,562-Speed 3390.37 samples/sec   Loss 2.5795   LearningRate 0.0098   Epoch: 13   Global Step: 78180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:42,606-Speed 3364.61 samples/sec   Loss 2.4921   LearningRate 0.0098   Epoch: 13   Global Step: 78190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:45,628-Speed 3389.75 samples/sec   Loss 2.5340   LearningRate 0.0098   Epoch: 13   Global Step: 78200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:48,654-Speed 3384.10 samples/sec   Loss 2.4632   LearningRate 0.0098   Epoch: 13   Global Step: 78210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:51,673-Speed 3392.99 samples/sec   Loss 2.5062   LearningRate 0.0097   Epoch: 13   Global Step: 78220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:54,679-Speed 3406.75 samples/sec   Loss 2.3637   LearningRate 0.0097   Epoch: 13   Global Step: 78230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:32:57,705-Speed 3384.91 samples/sec   Loss 2.3453   LearningRate 0.0097   Epoch: 13   Global Step: 78240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:33:00,728-Speed 3388.49 samples/sec   Loss 2.4546   LearningRate 0.0097   Epoch: 13   Global Step: 78250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:33:03,777-Speed 3360.00 samples/sec   Loss 2.4324   LearningRate 0.0097   Epoch: 13   Global Step: 78260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:33:06,797-Speed 3391.15 samples/sec   Loss 2.4616   LearningRate 0.0097   Epoch: 13   Global Step: 78270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:33:09,828-Speed 3379.04 samples/sec   Loss 2.4883   LearningRate 0.0097   Epoch: 13   Global Step: 78280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:33:12,845-Speed 3395.26 samples/sec   Loss 2.4049   LearningRate 0.0097   Epoch: 13   Global Step: 78290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:33:15,843-Speed 3416.68 samples/sec   Loss 2.4311   LearningRate 0.0097   Epoch: 13   Global Step: 78300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:33:18,874-Speed 3378.45 samples/sec   Loss 2.5332   LearningRate 0.0097   Epoch: 13   Global Step: 78310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:33:21,886-Speed 3400.13 samples/sec   Loss 2.5283   LearningRate 0.0097   Epoch: 13   Global Step: 78320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:33:24,902-Speed 3396.73 samples/sec   Loss 2.5119   LearningRate 0.0097   Epoch: 13   Global Step: 78330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:33:27,935-Speed 3376.64 samples/sec   Loss 2.4874   LearningRate 0.0097   Epoch: 13   Global Step: 78340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:33:30,947-Speed 3400.62 samples/sec   Loss 2.5417   LearningRate 0.0097   Epoch: 13   Global Step: 78350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:33:33,957-Speed 3402.73 samples/sec   Loss 2.5069   LearningRate 0.0097   Epoch: 13   Global Step: 78360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:33:36,970-Speed 3399.59 samples/sec   Loss 2.4379   LearningRate 0.0097   Epoch: 13   Global Step: 78370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:33:39,980-Speed 3402.78 samples/sec   Loss 2.5746   LearningRate 0.0097   Epoch: 13   Global Step: 78380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:33:42,994-Speed 3398.56 samples/sec   Loss 2.4387   LearningRate 0.0097   Epoch: 13   Global Step: 78390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:33:46,093-Speed 3305.53 samples/sec   Loss 2.5095   LearningRate 0.0096   Epoch: 13   Global Step: 78400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:33:49,105-Speed 3399.99 samples/sec   Loss 2.4623   LearningRate 0.0096   Epoch: 13   Global Step: 78410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:33:52,133-Speed 3382.41 samples/sec   Loss 2.4583   LearningRate 0.0096   Epoch: 13   Global Step: 78420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:33:55,144-Speed 3401.65 samples/sec   Loss 2.4643   LearningRate 0.0096   Epoch: 13   Global Step: 78430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:33:58,151-Speed 3406.95 samples/sec   Loss 2.5224   LearningRate 0.0096   Epoch: 13   Global Step: 78440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:34:01,158-Speed 3405.83 samples/sec   Loss 2.4166   LearningRate 0.0096   Epoch: 13   Global Step: 78450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:34:04,148-Speed 3425.79 samples/sec   Loss 2.5451   LearningRate 0.0096   Epoch: 13   Global Step: 78460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:34:07,153-Speed 3408.50 samples/sec   Loss 2.5509   LearningRate 0.0096   Epoch: 13   Global Step: 78470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:34:10,160-Speed 3406.48 samples/sec   Loss 2.4809   LearningRate 0.0096   Epoch: 13   Global Step: 78480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:34:13,167-Speed 3405.72 samples/sec   Loss 2.4782   LearningRate 0.0096   Epoch: 13   Global Step: 78490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:34:16,172-Speed 3408.42 samples/sec   Loss 2.2830   LearningRate 0.0096   Epoch: 13   Global Step: 78500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:34:19,182-Speed 3403.03 samples/sec   Loss 2.4631   LearningRate 0.0096   Epoch: 13   Global Step: 78510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:34:22,187-Speed 3407.51 samples/sec   Loss 2.5124   LearningRate 0.0096   Epoch: 13   Global Step: 78520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:34:25,195-Speed 3406.14 samples/sec   Loss 2.4724   LearningRate 0.0096   Epoch: 13   Global Step: 78530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:34:28,200-Speed 3407.83 samples/sec   Loss 2.4704   LearningRate 0.0096   Epoch: 13   Global Step: 78540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:34:31,211-Speed 3402.49 samples/sec   Loss 2.3979   LearningRate 0.0096   Epoch: 13   Global Step: 78550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 09:34:34,235-Speed 3386.47 samples/sec   Loss 2.5431   LearningRate 0.0096   Epoch: 13   Global Step: 78560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:34:37,246-Speed 3401.71 samples/sec   Loss 2.3483   LearningRate 0.0096   Epoch: 13   Global Step: 78570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:34:40,251-Speed 3408.88 samples/sec   Loss 2.4588   LearningRate 0.0095   Epoch: 13   Global Step: 78580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:34:43,266-Speed 3396.70 samples/sec   Loss 2.4925   LearningRate 0.0095   Epoch: 13   Global Step: 78590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:34:46,277-Speed 3401.86 samples/sec   Loss 2.4754   LearningRate 0.0095   Epoch: 13   Global Step: 78600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:34:49,288-Speed 3400.75 samples/sec   Loss 2.4784   LearningRate 0.0095   Epoch: 13   Global Step: 78610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:34:52,295-Speed 3407.01 samples/sec   Loss 2.5894   LearningRate 0.0095   Epoch: 13   Global Step: 78620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:34:55,301-Speed 3406.90 samples/sec   Loss 2.3509   LearningRate 0.0095   Epoch: 13   Global Step: 78630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:34:58,316-Speed 3397.33 samples/sec   Loss 2.5824   LearningRate 0.0095   Epoch: 13   Global Step: 78640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:01,331-Speed 3397.11 samples/sec   Loss 2.4659   LearningRate 0.0095   Epoch: 13   Global Step: 78650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:04,343-Speed 3401.13 samples/sec   Loss 2.5222   LearningRate 0.0095   Epoch: 13   Global Step: 78660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:35:07,332-Speed 3425.90 samples/sec   Loss 2.5925   LearningRate 0.0095   Epoch: 13   Global Step: 78670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:10,349-Speed 3395.58 samples/sec   Loss 2.4070   LearningRate 0.0095   Epoch: 13   Global Step: 78680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:13,356-Speed 3405.46 samples/sec   Loss 2.4530   LearningRate 0.0095   Epoch: 13   Global Step: 78690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:16,374-Speed 3394.10 samples/sec   Loss 2.4421   LearningRate 0.0095   Epoch: 13   Global Step: 78700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:19,386-Speed 3400.32 samples/sec   Loss 2.4324   LearningRate 0.0095   Epoch: 13   Global Step: 78710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:22,404-Speed 3394.30 samples/sec   Loss 2.4620   LearningRate 0.0095   Epoch: 13   Global Step: 78720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:25,415-Speed 3402.05 samples/sec   Loss 2.5016   LearningRate 0.0095   Epoch: 13   Global Step: 78730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:28,430-Speed 3397.17 samples/sec   Loss 2.5608   LearningRate 0.0095   Epoch: 13   Global Step: 78740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:31,446-Speed 3396.89 samples/sec   Loss 2.4176   LearningRate 0.0095   Epoch: 13   Global Step: 78750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:34,477-Speed 3378.87 samples/sec   Loss 2.5457   LearningRate 0.0095   Epoch: 13   Global Step: 78760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:37,487-Speed 3402.32 samples/sec   Loss 2.3442   LearningRate 0.0094   Epoch: 13   Global Step: 78770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:35:40,480-Speed 3421.69 samples/sec   Loss 2.5066   LearningRate 0.0094   Epoch: 13   Global Step: 78780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:43,493-Speed 3400.15 samples/sec   Loss 2.5276   LearningRate 0.0094   Epoch: 13   Global Step: 78790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:46,515-Speed 3390.32 samples/sec   Loss 2.5978   LearningRate 0.0094   Epoch: 13   Global Step: 78800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:49,530-Speed 3397.51 samples/sec   Loss 2.3397   LearningRate 0.0094   Epoch: 13   Global Step: 78810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:52,543-Speed 3399.35 samples/sec   Loss 2.4566   LearningRate 0.0094   Epoch: 13   Global Step: 78820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:55,554-Speed 3401.46 samples/sec   Loss 2.4104   LearningRate 0.0094   Epoch: 13   Global Step: 78830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:35:58,563-Speed 3404.27 samples/sec   Loss 2.4643   LearningRate 0.0094   Epoch: 13   Global Step: 78840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:01,581-Speed 3393.12 samples/sec   Loss 2.4006   LearningRate 0.0094   Epoch: 13   Global Step: 78850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:04,598-Speed 3394.59 samples/sec   Loss 2.3691   LearningRate 0.0094   Epoch: 13   Global Step: 78860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:07,612-Speed 3398.11 samples/sec   Loss 2.4419   LearningRate 0.0094   Epoch: 13   Global Step: 78870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:10,612-Speed 3414.83 samples/sec   Loss 2.5202   LearningRate 0.0094   Epoch: 13   Global Step: 78880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:13,623-Speed 3401.06 samples/sec   Loss 2.4789   LearningRate 0.0094   Epoch: 13   Global Step: 78890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:16,634-Speed 3402.42 samples/sec   Loss 2.5910   LearningRate 0.0094   Epoch: 13   Global Step: 78900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:19,647-Speed 3399.84 samples/sec   Loss 2.4574   LearningRate 0.0094   Epoch: 13   Global Step: 78910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:22,657-Speed 3402.67 samples/sec   Loss 2.3921   LearningRate 0.0094   Epoch: 13   Global Step: 78920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:25,669-Speed 3399.97 samples/sec   Loss 2.4315   LearningRate 0.0094   Epoch: 13   Global Step: 78930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:28,688-Speed 3392.68 samples/sec   Loss 2.3757   LearningRate 0.0094   Epoch: 13   Global Step: 78940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:31,702-Speed 3398.57 samples/sec   Loss 2.3655   LearningRate 0.0093   Epoch: 13   Global Step: 78950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:34,728-Speed 3383.83 samples/sec   Loss 2.4031   LearningRate 0.0093   Epoch: 13   Global Step: 78960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:37,795-Speed 3340.40 samples/sec   Loss 2.4326   LearningRate 0.0093   Epoch: 13   Global Step: 78970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:40,791-Speed 3418.69 samples/sec   Loss 2.4526   LearningRate 0.0093   Epoch: 13   Global Step: 78980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:43,800-Speed 3404.05 samples/sec   Loss 2.4119   LearningRate 0.0093   Epoch: 13   Global Step: 78990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:46,817-Speed 3394.61 samples/sec   Loss 2.3664   LearningRate 0.0093   Epoch: 13   Global Step: 79000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:49,832-Speed 3397.60 samples/sec   Loss 2.4493   LearningRate 0.0093   Epoch: 13   Global Step: 79010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:52,845-Speed 3398.97 samples/sec   Loss 2.5594   LearningRate 0.0093   Epoch: 13   Global Step: 79020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:55,857-Speed 3400.19 samples/sec   Loss 2.3652   LearningRate 0.0093   Epoch: 13   Global Step: 79030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:36:58,876-Speed 3392.68 samples/sec   Loss 2.4188   LearningRate 0.0093   Epoch: 13   Global Step: 79040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:01,892-Speed 3396.95 samples/sec   Loss 2.5237   LearningRate 0.0093   Epoch: 13   Global Step: 79050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:04,915-Speed 3387.63 samples/sec   Loss 2.4065   LearningRate 0.0093   Epoch: 13   Global Step: 79060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:07,930-Speed 3397.41 samples/sec   Loss 2.4454   LearningRate 0.0093   Epoch: 13   Global Step: 79070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:10,930-Speed 3413.91 samples/sec   Loss 2.4293   LearningRate 0.0093   Epoch: 13   Global Step: 79080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:13,942-Speed 3400.52 samples/sec   Loss 2.5147   LearningRate 0.0093   Epoch: 13   Global Step: 79090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:16,957-Speed 3397.85 samples/sec   Loss 2.4526   LearningRate 0.0093   Epoch: 13   Global Step: 79100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:19,973-Speed 3396.48 samples/sec   Loss 2.4799   LearningRate 0.0093   Epoch: 13   Global Step: 79110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:22,990-Speed 3395.71 samples/sec   Loss 2.5020   LearningRate 0.0093   Epoch: 13   Global Step: 79120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:26,015-Speed 3385.28 samples/sec   Loss 2.5551   LearningRate 0.0093   Epoch: 13   Global Step: 79130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:29,034-Speed 3392.50 samples/sec   Loss 2.3178   LearningRate 0.0092   Epoch: 13   Global Step: 79140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:32,047-Speed 3400.68 samples/sec   Loss 2.4113   LearningRate 0.0092   Epoch: 13   Global Step: 79150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:35,061-Speed 3397.72 samples/sec   Loss 2.5158   LearningRate 0.0092   Epoch: 13   Global Step: 79160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:38,079-Speed 3394.06 samples/sec   Loss 2.4260   LearningRate 0.0092   Epoch: 13   Global Step: 79170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:41,089-Speed 3402.33 samples/sec   Loss 2.5049   LearningRate 0.0092   Epoch: 13   Global Step: 79180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:44,113-Speed 3387.10 samples/sec   Loss 2.5310   LearningRate 0.0092   Epoch: 13   Global Step: 79190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:47,136-Speed 3388.13 samples/sec   Loss 2.3514   LearningRate 0.0092   Epoch: 13   Global Step: 79200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:50,164-Speed 3382.79 samples/sec   Loss 2.4694   LearningRate 0.0092   Epoch: 13   Global Step: 79210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:53,182-Speed 3393.25 samples/sec   Loss 2.3709   LearningRate 0.0092   Epoch: 13   Global Step: 79220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:56,204-Speed 3389.52 samples/sec   Loss 2.4413   LearningRate 0.0092   Epoch: 13   Global Step: 79230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:37:59,223-Speed 3392.60 samples/sec   Loss 2.4715   LearningRate 0.0092   Epoch: 13   Global Step: 79240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:02,243-Speed 3391.84 samples/sec   Loss 2.4688   LearningRate 0.0092   Epoch: 13   Global Step: 79250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:05,261-Speed 3394.23 samples/sec   Loss 2.4669   LearningRate 0.0092   Epoch: 13   Global Step: 79260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:08,279-Speed 3393.42 samples/sec   Loss 2.4341   LearningRate 0.0092   Epoch: 13   Global Step: 79270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:11,304-Speed 3385.42 samples/sec   Loss 2.6048   LearningRate 0.0092   Epoch: 13   Global Step: 79280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:38:14,320-Speed 3396.40 samples/sec   Loss 2.4978   LearningRate 0.0092   Epoch: 13   Global Step: 79290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:17,347-Speed 3384.05 samples/sec   Loss 2.4320   LearningRate 0.0092   Epoch: 13   Global Step: 79300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:20,370-Speed 3387.53 samples/sec   Loss 2.4441   LearningRate 0.0092   Epoch: 13   Global Step: 79310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:23,391-Speed 3390.72 samples/sec   Loss 2.3809   LearningRate 0.0092   Epoch: 13   Global Step: 79320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:26,421-Speed 3380.40 samples/sec   Loss 2.4553   LearningRate 0.0091   Epoch: 13   Global Step: 79330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:29,438-Speed 3395.57 samples/sec   Loss 2.4345   LearningRate 0.0091   Epoch: 13   Global Step: 79340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:32,454-Speed 3395.84 samples/sec   Loss 2.4923   LearningRate 0.0091   Epoch: 13   Global Step: 79350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:35,468-Speed 3397.35 samples/sec   Loss 2.4303   LearningRate 0.0091   Epoch: 13   Global Step: 79360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:38,521-Speed 3354.93 samples/sec   Loss 2.3641   LearningRate 0.0091   Epoch: 13   Global Step: 79370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:41,535-Speed 3398.57 samples/sec   Loss 2.4349   LearningRate 0.0091   Epoch: 13   Global Step: 79380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:44,541-Speed 3406.98 samples/sec   Loss 2.3865   LearningRate 0.0091   Epoch: 13   Global Step: 79390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:47,563-Speed 3389.92 samples/sec   Loss 2.4964   LearningRate 0.0091   Epoch: 13   Global Step: 79400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:50,582-Speed 3392.36 samples/sec   Loss 2.3520   LearningRate 0.0091   Epoch: 13   Global Step: 79410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:53,604-Speed 3388.86 samples/sec   Loss 2.4587   LearningRate 0.0091   Epoch: 13   Global Step: 79420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:56,622-Speed 3394.54 samples/sec   Loss 2.4949   LearningRate 0.0091   Epoch: 13   Global Step: 79430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:38:59,644-Speed 3389.46 samples/sec   Loss 2.4636   LearningRate 0.0091   Epoch: 13   Global Step: 79440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:02,679-Speed 3374.52 samples/sec   Loss 2.3760   LearningRate 0.0091   Epoch: 13   Global Step: 79450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:05,695-Speed 3395.05 samples/sec   Loss 2.3919   LearningRate 0.0091   Epoch: 13   Global Step: 79460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:08,715-Speed 3393.14 samples/sec   Loss 2.4845   LearningRate 0.0091   Epoch: 13   Global Step: 79470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:11,751-Speed 3374.29 samples/sec   Loss 2.3599   LearningRate 0.0091   Epoch: 13   Global Step: 79480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:14,788-Speed 3371.84 samples/sec   Loss 2.3498   LearningRate 0.0091   Epoch: 13   Global Step: 79490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:39:17,789-Speed 3412.45 samples/sec   Loss 2.3930   LearningRate 0.0091   Epoch: 13   Global Step: 79500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:20,818-Speed 3382.32 samples/sec   Loss 2.5285   LearningRate 0.0090   Epoch: 13   Global Step: 79510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:23,856-Speed 3371.96 samples/sec   Loss 2.4718   LearningRate 0.0090   Epoch: 13   Global Step: 79520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:26,926-Speed 3336.62 samples/sec   Loss 2.3940   LearningRate 0.0090   Epoch: 13   Global Step: 79530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:29,954-Speed 3382.78 samples/sec   Loss 2.4401   LearningRate 0.0090   Epoch: 13   Global Step: 79540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:32,970-Speed 3395.57 samples/sec   Loss 2.2820   LearningRate 0.0090   Epoch: 13   Global Step: 79550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:35,991-Speed 3389.90 samples/sec   Loss 2.3888   LearningRate 0.0090   Epoch: 13   Global Step: 79560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:39,020-Speed 3382.47 samples/sec   Loss 2.3352   LearningRate 0.0090   Epoch: 13   Global Step: 79570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:42,040-Speed 3391.07 samples/sec   Loss 2.4811   LearningRate 0.0090   Epoch: 13   Global Step: 79580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:45,104-Speed 3343.50 samples/sec   Loss 2.4439   LearningRate 0.0090   Epoch: 13   Global Step: 79590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:39:48,187-Speed 3321.60 samples/sec   Loss 2.4263   LearningRate 0.0090   Epoch: 13   Global Step: 79600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:40:01,603-Speed 763.34 samples/sec   Loss 2.0940   LearningRate 0.0090   Epoch: 14   Global Step: 79610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:40:04,621-Speed 3393.39 samples/sec   Loss 1.9365   LearningRate 0.0090   Epoch: 14   Global Step: 79620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:40:07,654-Speed 3377.79 samples/sec   Loss 1.9344   LearningRate 0.0090   Epoch: 14   Global Step: 79630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:40:10,677-Speed 3387.99 samples/sec   Loss 1.8368   LearningRate 0.0090   Epoch: 14   Global Step: 79640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:40:13,720-Speed 3365.88 samples/sec   Loss 1.7792   LearningRate 0.0090   Epoch: 14   Global Step: 79650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:40:16,809-Speed 3316.09 samples/sec   Loss 1.8465   LearningRate 0.0090   Epoch: 14   Global Step: 79660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:40:19,831-Speed 3388.84 samples/sec   Loss 1.8926   LearningRate 0.0090   Epoch: 14   Global Step: 79670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:40:22,855-Speed 3387.12 samples/sec   Loss 1.7856   LearningRate 0.0090   Epoch: 14   Global Step: 79680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:40:25,882-Speed 3383.46 samples/sec   Loss 1.8748   LearningRate 0.0090   Epoch: 14   Global Step: 79690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:40:28,906-Speed 3386.91 samples/sec   Loss 1.8453   LearningRate 0.0089   Epoch: 14   Global Step: 79700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-27 09:40:31,915-Speed 3404.15 samples/sec   Loss 1.7992   LearningRate 0.0089   Epoch: 14   Global Step: 79710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 09:40:34,951-Speed 3373.75 samples/sec   Loss 1.9051   LearningRate 0.0089   Epoch: 14   Global Step: 79720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:40:37,995-Speed 3365.11 samples/sec   Loss 1.8749   LearningRate 0.0089   Epoch: 14   Global Step: 79730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:40:41,041-Speed 3362.67 samples/sec   Loss 1.8135   LearningRate 0.0089   Epoch: 14   Global Step: 79740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:40:44,069-Speed 3382.46 samples/sec   Loss 1.9596   LearningRate 0.0089   Epoch: 14   Global Step: 79750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:40:47,094-Speed 3385.97 samples/sec   Loss 1.7432   LearningRate 0.0089   Epoch: 14   Global Step: 79760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:40:50,134-Speed 3369.24 samples/sec   Loss 1.8393   LearningRate 0.0089   Epoch: 14   Global Step: 79770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:40:53,264-Speed 3272.79 samples/sec   Loss 1.8289   LearningRate 0.0089   Epoch: 14   Global Step: 79780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:40:56,301-Speed 3372.64 samples/sec   Loss 1.9121   LearningRate 0.0089   Epoch: 14   Global Step: 79790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:40:59,334-Speed 3377.00 samples/sec   Loss 1.9114   LearningRate 0.0089   Epoch: 14   Global Step: 79800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:02,370-Speed 3372.70 samples/sec   Loss 1.9952   LearningRate 0.0089   Epoch: 14   Global Step: 79810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 09:41:05,392-Speed 3389.62 samples/sec   Loss 1.8481   LearningRate 0.0089   Epoch: 14   Global Step: 79820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:08,427-Speed 3374.96 samples/sec   Loss 1.8922   LearningRate 0.0089   Epoch: 14   Global Step: 79830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:11,470-Speed 3366.54 samples/sec   Loss 1.8985   LearningRate 0.0089   Epoch: 14   Global Step: 79840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:14,512-Speed 3366.44 samples/sec   Loss 1.8984   LearningRate 0.0089   Epoch: 14   Global Step: 79850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:17,544-Speed 3378.33 samples/sec   Loss 1.7908   LearningRate 0.0089   Epoch: 14   Global Step: 79860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:20,601-Speed 3350.44 samples/sec   Loss 1.8521   LearningRate 0.0089   Epoch: 14   Global Step: 79870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:23,635-Speed 3375.50 samples/sec   Loss 1.9535   LearningRate 0.0089   Epoch: 14   Global Step: 79880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:26,669-Speed 3376.19 samples/sec   Loss 1.9467   LearningRate 0.0088   Epoch: 14   Global Step: 79890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:29,700-Speed 3378.84 samples/sec   Loss 1.9674   LearningRate 0.0088   Epoch: 14   Global Step: 79900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:32,731-Speed 3379.72 samples/sec   Loss 1.9422   LearningRate 0.0088   Epoch: 14   Global Step: 79910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:35,767-Speed 3373.46 samples/sec   Loss 1.8881   LearningRate 0.0088   Epoch: 14   Global Step: 79920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:38,795-Speed 3383.70 samples/sec   Loss 1.9220   LearningRate 0.0088   Epoch: 14   Global Step: 79930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:41,842-Speed 3361.82 samples/sec   Loss 1.8215   LearningRate 0.0088   Epoch: 14   Global Step: 79940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:44,885-Speed 3365.74 samples/sec   Loss 2.0358   LearningRate 0.0088   Epoch: 14   Global Step: 79950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:47,906-Speed 3390.23 samples/sec   Loss 1.9711   LearningRate 0.0088   Epoch: 14   Global Step: 79960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:50,917-Speed 3400.60 samples/sec   Loss 2.0016   LearningRate 0.0088   Epoch: 14   Global Step: 79970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:53,935-Speed 3395.16 samples/sec   Loss 1.9994   LearningRate 0.0088   Epoch: 14   Global Step: 79980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:56,949-Speed 3398.44 samples/sec   Loss 1.9045   LearningRate 0.0088   Epoch: 14   Global Step: 79990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:41:59,962-Speed 3398.73 samples/sec   Loss 1.8309   LearningRate 0.0088   Epoch: 14   Global Step: 80000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:42:43,356-[lfw][80000]XNorm: 22.465401
Training: 2022-04-27 09:42:43,356-[lfw][80000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-27 09:42:43,357-[lfw][80000]Accuracy-Highest: 0.99817
Training: 2022-04-27 09:43:33,766-[cfp_fp][80000]XNorm: 21.072669
Training: 2022-04-27 09:43:33,766-[cfp_fp][80000]Accuracy-Flip: 0.97643+-0.00572
Training: 2022-04-27 09:43:33,767-[cfp_fp][80000]Accuracy-Highest: 0.97743
Training: 2022-04-27 09:44:17,117-[agedb_30][80000]XNorm: 22.450177
Training: 2022-04-27 09:44:17,118-[agedb_30][80000]Accuracy-Flip: 0.97883+-0.00738
Training: 2022-04-27 09:44:17,119-[agedb_30][80000]Accuracy-Highest: 0.98100
Training: 2022-04-27 09:44:20,123-Speed 73.06 samples/sec   Loss 1.9434   LearningRate 0.0088   Epoch: 14   Global Step: 80010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:44:23,114-Speed 3424.98 samples/sec   Loss 1.9406   LearningRate 0.0088   Epoch: 14   Global Step: 80020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 09:44:26,106-Speed 3422.23 samples/sec   Loss 1.9628   LearningRate 0.0088   Epoch: 14   Global Step: 80030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 09:44:29,085-Speed 3439.41 samples/sec   Loss 1.8724   LearningRate 0.0088   Epoch: 14   Global Step: 80040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:44:32,083-Speed 3416.13 samples/sec   Loss 1.9819   LearningRate 0.0088   Epoch: 14   Global Step: 80050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:44:35,084-Speed 3412.21 samples/sec   Loss 1.9977   LearningRate 0.0088   Epoch: 14   Global Step: 80060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:44:38,180-Speed 3308.49 samples/sec   Loss 1.8972   LearningRate 0.0088   Epoch: 14   Global Step: 80070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:44:41,239-Speed 3348.12 samples/sec   Loss 1.9475   LearningRate 0.0088   Epoch: 14   Global Step: 80080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:44:44,241-Speed 3412.20 samples/sec   Loss 1.9179   LearningRate 0.0087   Epoch: 14   Global Step: 80090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:44:47,239-Speed 3416.91 samples/sec   Loss 1.9597   LearningRate 0.0087   Epoch: 14   Global Step: 80100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:44:50,238-Speed 3415.07 samples/sec   Loss 1.8815   LearningRate 0.0087   Epoch: 14   Global Step: 80110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:44:53,235-Speed 3417.66 samples/sec   Loss 1.9165   LearningRate 0.0087   Epoch: 14   Global Step: 80120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:44:56,241-Speed 3407.20 samples/sec   Loss 1.9339   LearningRate 0.0087   Epoch: 14   Global Step: 80130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:44:59,282-Speed 3367.65 samples/sec   Loss 1.9232   LearningRate 0.0087   Epoch: 14   Global Step: 80140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:45:02,339-Speed 3351.25 samples/sec   Loss 1.9110   LearningRate 0.0087   Epoch: 14   Global Step: 80150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:45:05,357-Speed 3393.49 samples/sec   Loss 1.8592   LearningRate 0.0087   Epoch: 14   Global Step: 80160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:45:08,364-Speed 3405.49 samples/sec   Loss 2.0290   LearningRate 0.0087   Epoch: 14   Global Step: 80170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:45:11,371-Speed 3406.12 samples/sec   Loss 1.9256   LearningRate 0.0087   Epoch: 14   Global Step: 80180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:45:14,389-Speed 3393.98 samples/sec   Loss 2.0819   LearningRate 0.0087   Epoch: 14   Global Step: 80190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:45:17,393-Speed 3409.94 samples/sec   Loss 1.9724   LearningRate 0.0087   Epoch: 14   Global Step: 80200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:45:20,398-Speed 3408.53 samples/sec   Loss 1.9971   LearningRate 0.0087   Epoch: 14   Global Step: 80210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:45:23,403-Speed 3408.90 samples/sec   Loss 1.9486   LearningRate 0.0087   Epoch: 14   Global Step: 80220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:45:26,409-Speed 3407.39 samples/sec   Loss 1.9061   LearningRate 0.0087   Epoch: 14   Global Step: 80230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:45:29,454-Speed 3363.53 samples/sec   Loss 1.9547   LearningRate 0.0087   Epoch: 14   Global Step: 80240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:45:32,459-Speed 3408.68 samples/sec   Loss 2.0480   LearningRate 0.0087   Epoch: 14   Global Step: 80250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:45:35,484-Speed 3384.73 samples/sec   Loss 1.9744   LearningRate 0.0087   Epoch: 14   Global Step: 80260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:45:38,486-Speed 3412.13 samples/sec   Loss 1.8846   LearningRate 0.0087   Epoch: 14   Global Step: 80270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:45:41,489-Speed 3410.85 samples/sec   Loss 1.9745   LearningRate 0.0086   Epoch: 14   Global Step: 80280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:45:44,492-Speed 3410.81 samples/sec   Loss 1.9972   LearningRate 0.0086   Epoch: 14   Global Step: 80290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:45:47,501-Speed 3404.05 samples/sec   Loss 1.8467   LearningRate 0.0086   Epoch: 14   Global Step: 80300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:45:50,509-Speed 3405.61 samples/sec   Loss 1.9484   LearningRate 0.0086   Epoch: 14   Global Step: 80310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:45:53,518-Speed 3404.04 samples/sec   Loss 1.9422   LearningRate 0.0086   Epoch: 14   Global Step: 80320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:45:56,523-Speed 3407.88 samples/sec   Loss 1.9639   LearningRate 0.0086   Epoch: 14   Global Step: 80330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 09:45:59,521-Speed 3416.35 samples/sec   Loss 1.9617   LearningRate 0.0086   Epoch: 14   Global Step: 80340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:46:02,527-Speed 3407.91 samples/sec   Loss 1.9612   LearningRate 0.0086   Epoch: 14   Global Step: 80350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:46:05,537-Speed 3401.84 samples/sec   Loss 2.0858   LearningRate 0.0086   Epoch: 14   Global Step: 80360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:46:08,545-Speed 3405.06 samples/sec   Loss 2.0233   LearningRate 0.0086   Epoch: 14   Global Step: 80370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:46:11,554-Speed 3404.21 samples/sec   Loss 1.9843   LearningRate 0.0086   Epoch: 14   Global Step: 80380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:46:14,601-Speed 3362.38 samples/sec   Loss 1.8959   LearningRate 0.0086   Epoch: 14   Global Step: 80390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:46:17,607-Speed 3407.51 samples/sec   Loss 2.0126   LearningRate 0.0086   Epoch: 14   Global Step: 80400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:46:20,609-Speed 3411.66 samples/sec   Loss 2.0605   LearningRate 0.0086   Epoch: 14   Global Step: 80410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:46:23,619-Speed 3402.30 samples/sec   Loss 1.9224   LearningRate 0.0086   Epoch: 14   Global Step: 80420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:46:26,622-Speed 3410.33 samples/sec   Loss 1.8699   LearningRate 0.0086   Epoch: 14   Global Step: 80430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:46:29,608-Speed 3430.07 samples/sec   Loss 1.9585   LearningRate 0.0086   Epoch: 14   Global Step: 80440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:46:32,613-Speed 3409.10 samples/sec   Loss 2.0607   LearningRate 0.0086   Epoch: 14   Global Step: 80450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:46:35,675-Speed 3345.22 samples/sec   Loss 2.0040   LearningRate 0.0086   Epoch: 14   Global Step: 80460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:46:38,755-Speed 3324.60 samples/sec   Loss 1.9448   LearningRate 0.0085   Epoch: 14   Global Step: 80470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:46:41,763-Speed 3405.74 samples/sec   Loss 1.9477   LearningRate 0.0085   Epoch: 14   Global Step: 80480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:46:44,769-Speed 3407.59 samples/sec   Loss 1.9702   LearningRate 0.0085   Epoch: 14   Global Step: 80490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:46:47,776-Speed 3405.59 samples/sec   Loss 2.0023   LearningRate 0.0085   Epoch: 14   Global Step: 80500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:46:50,780-Speed 3409.23 samples/sec   Loss 1.9975   LearningRate 0.0085   Epoch: 14   Global Step: 80510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:46:53,791-Speed 3402.72 samples/sec   Loss 2.0581   LearningRate 0.0085   Epoch: 14   Global Step: 80520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:46:56,796-Speed 3408.12 samples/sec   Loss 2.0030   LearningRate 0.0085   Epoch: 14   Global Step: 80530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:46:59,803-Speed 3405.42 samples/sec   Loss 1.9996   LearningRate 0.0085   Epoch: 14   Global Step: 80540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:47:02,845-Speed 3368.03 samples/sec   Loss 2.0257   LearningRate 0.0085   Epoch: 14   Global Step: 80550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:47:05,852-Speed 3407.39 samples/sec   Loss 2.0274   LearningRate 0.0085   Epoch: 14   Global Step: 80560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:47:08,861-Speed 3403.20 samples/sec   Loss 2.0649   LearningRate 0.0085   Epoch: 14   Global Step: 80570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:47:11,869-Speed 3404.98 samples/sec   Loss 1.9576   LearningRate 0.0085   Epoch: 14   Global Step: 80580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:47:14,877-Speed 3404.67 samples/sec   Loss 1.9408   LearningRate 0.0085   Epoch: 14   Global Step: 80590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:47:17,889-Speed 3401.40 samples/sec   Loss 1.8832   LearningRate 0.0085   Epoch: 14   Global Step: 80600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:47:20,898-Speed 3403.59 samples/sec   Loss 2.0886   LearningRate 0.0085   Epoch: 14   Global Step: 80610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:47:23,914-Speed 3396.86 samples/sec   Loss 2.0263   LearningRate 0.0085   Epoch: 14   Global Step: 80620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:47:26,920-Speed 3407.19 samples/sec   Loss 2.0058   LearningRate 0.0085   Epoch: 14   Global Step: 80630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:47:29,933-Speed 3399.47 samples/sec   Loss 1.9363   LearningRate 0.0085   Epoch: 14   Global Step: 80640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:47:32,943-Speed 3402.72 samples/sec   Loss 2.0095   LearningRate 0.0085   Epoch: 14   Global Step: 80650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:47:35,936-Speed 3421.80 samples/sec   Loss 1.9484   LearningRate 0.0085   Epoch: 14   Global Step: 80660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:47:38,942-Speed 3407.16 samples/sec   Loss 1.9753   LearningRate 0.0084   Epoch: 14   Global Step: 80670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:47:41,950-Speed 3405.37 samples/sec   Loss 2.0445   LearningRate 0.0084   Epoch: 14   Global Step: 80680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:47:44,965-Speed 3396.23 samples/sec   Loss 1.9796   LearningRate 0.0084   Epoch: 14   Global Step: 80690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:47:47,974-Speed 3404.44 samples/sec   Loss 2.0021   LearningRate 0.0084   Epoch: 14   Global Step: 80700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:47:51,025-Speed 3356.80 samples/sec   Loss 2.0371   LearningRate 0.0084   Epoch: 14   Global Step: 80710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:47:54,041-Speed 3396.70 samples/sec   Loss 1.9981   LearningRate 0.0084   Epoch: 14   Global Step: 80720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:47:57,055-Speed 3398.19 samples/sec   Loss 2.0026   LearningRate 0.0084   Epoch: 14   Global Step: 80730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:48:00,068-Speed 3399.21 samples/sec   Loss 2.0531   LearningRate 0.0084   Epoch: 14   Global Step: 80740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:48:03,085-Speed 3395.15 samples/sec   Loss 2.0460   LearningRate 0.0084   Epoch: 14   Global Step: 80750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:48:06,098-Speed 3399.73 samples/sec   Loss 1.9859   LearningRate 0.0084   Epoch: 14   Global Step: 80760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:48:09,111-Speed 3399.23 samples/sec   Loss 2.0514   LearningRate 0.0084   Epoch: 14   Global Step: 80770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:48:12,127-Speed 3396.06 samples/sec   Loss 2.0269   LearningRate 0.0084   Epoch: 14   Global Step: 80780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:48:15,167-Speed 3369.12 samples/sec   Loss 2.0321   LearningRate 0.0084   Epoch: 14   Global Step: 80790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:48:18,181-Speed 3398.23 samples/sec   Loss 2.0487   LearningRate 0.0084   Epoch: 14   Global Step: 80800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:48:21,196-Speed 3396.32 samples/sec   Loss 2.0214   LearningRate 0.0084   Epoch: 14   Global Step: 80810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:48:24,214-Speed 3394.69 samples/sec   Loss 1.9795   LearningRate 0.0084   Epoch: 14   Global Step: 80820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:48:27,228-Speed 3398.40 samples/sec   Loss 2.0728   LearningRate 0.0084   Epoch: 14   Global Step: 80830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:48:30,246-Speed 3393.86 samples/sec   Loss 2.0451   LearningRate 0.0084   Epoch: 14   Global Step: 80840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:48:33,257-Speed 3400.95 samples/sec   Loss 2.0664   LearningRate 0.0084   Epoch: 14   Global Step: 80850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:48:36,282-Speed 3386.62 samples/sec   Loss 1.9635   LearningRate 0.0083   Epoch: 14   Global Step: 80860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:48:39,305-Speed 3388.30 samples/sec   Loss 1.9544   LearningRate 0.0083   Epoch: 14   Global Step: 80870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:48:42,325-Speed 3391.02 samples/sec   Loss 1.9681   LearningRate 0.0083   Epoch: 14   Global Step: 80880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:48:45,336-Speed 3402.18 samples/sec   Loss 1.9991   LearningRate 0.0083   Epoch: 14   Global Step: 80890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:48:48,331-Speed 3419.62 samples/sec   Loss 2.0415   LearningRate 0.0083   Epoch: 14   Global Step: 80900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:48:51,342-Speed 3402.02 samples/sec   Loss 2.0413   LearningRate 0.0083   Epoch: 14   Global Step: 80910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:48:54,357-Speed 3396.91 samples/sec   Loss 2.0509   LearningRate 0.0083   Epoch: 14   Global Step: 80920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:48:57,372-Speed 3397.56 samples/sec   Loss 1.9788   LearningRate 0.0083   Epoch: 14   Global Step: 80930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:00,397-Speed 3385.74 samples/sec   Loss 2.0778   LearningRate 0.0083   Epoch: 14   Global Step: 80940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:03,419-Speed 3389.29 samples/sec   Loss 2.0419   LearningRate 0.0083   Epoch: 14   Global Step: 80950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:06,437-Speed 3393.26 samples/sec   Loss 2.0297   LearningRate 0.0083   Epoch: 14   Global Step: 80960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:09,449-Speed 3400.74 samples/sec   Loss 2.1074   LearningRate 0.0083   Epoch: 14   Global Step: 80970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:12,462-Speed 3399.69 samples/sec   Loss 1.9975   LearningRate 0.0083   Epoch: 14   Global Step: 80980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:15,478-Speed 3395.52 samples/sec   Loss 1.9996   LearningRate 0.0083   Epoch: 14   Global Step: 80990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:18,511-Speed 3377.73 samples/sec   Loss 2.1104   LearningRate 0.0083   Epoch: 14   Global Step: 81000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 09:49:21,562-Speed 3357.02 samples/sec   Loss 2.0211   LearningRate 0.0083   Epoch: 14   Global Step: 81010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:24,602-Speed 3369.08 samples/sec   Loss 2.0827   LearningRate 0.0083   Epoch: 14   Global Step: 81020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:27,624-Speed 3388.47 samples/sec   Loss 2.0261   LearningRate 0.0083   Epoch: 14   Global Step: 81030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:30,647-Speed 3388.97 samples/sec   Loss 2.0559   LearningRate 0.0083   Epoch: 14   Global Step: 81040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:33,667-Speed 3391.45 samples/sec   Loss 2.0984   LearningRate 0.0083   Epoch: 14   Global Step: 81050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:36,681-Speed 3397.85 samples/sec   Loss 1.9321   LearningRate 0.0082   Epoch: 14   Global Step: 81060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:39,702-Speed 3390.08 samples/sec   Loss 2.0212   LearningRate 0.0082   Epoch: 14   Global Step: 81070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:42,717-Speed 3397.14 samples/sec   Loss 2.0092   LearningRate 0.0082   Epoch: 14   Global Step: 81080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:45,757-Speed 3370.85 samples/sec   Loss 2.0621   LearningRate 0.0082   Epoch: 14   Global Step: 81090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:48,782-Speed 3385.82 samples/sec   Loss 2.0555   LearningRate 0.0082   Epoch: 14   Global Step: 81100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:51,800-Speed 3394.02 samples/sec   Loss 1.9974   LearningRate 0.0082   Epoch: 14   Global Step: 81110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 09:49:54,791-Speed 3424.73 samples/sec   Loss 2.1081   LearningRate 0.0082   Epoch: 14   Global Step: 81120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:49:57,830-Speed 3370.05 samples/sec   Loss 2.0113   LearningRate 0.0082   Epoch: 14   Global Step: 81130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:50:00,975-Speed 3256.66 samples/sec   Loss 2.0831   LearningRate 0.0082   Epoch: 14   Global Step: 81140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:50:03,992-Speed 3394.57 samples/sec   Loss 2.0291   LearningRate 0.0082   Epoch: 14   Global Step: 81150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:50:07,009-Speed 3394.73 samples/sec   Loss 2.0354   LearningRate 0.0082   Epoch: 14   Global Step: 81160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:50:10,044-Speed 3375.44 samples/sec   Loss 1.9418   LearningRate 0.0082   Epoch: 14   Global Step: 81170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:50:13,124-Speed 3325.74 samples/sec   Loss 2.0596   LearningRate 0.0082   Epoch: 14   Global Step: 81180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:50:16,158-Speed 3375.59 samples/sec   Loss 1.9799   LearningRate 0.0082   Epoch: 14   Global Step: 81190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:50:19,154-Speed 3419.23 samples/sec   Loss 2.0498   LearningRate 0.0082   Epoch: 14   Global Step: 81200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:50:22,187-Speed 3377.11 samples/sec   Loss 2.0089   LearningRate 0.0082   Epoch: 14   Global Step: 81210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:50:25,205-Speed 3393.34 samples/sec   Loss 1.9744   LearningRate 0.0082   Epoch: 14   Global Step: 81220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:50:28,220-Speed 3396.34 samples/sec   Loss 2.0097   LearningRate 0.0082   Epoch: 14   Global Step: 81230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:50:31,243-Speed 3388.53 samples/sec   Loss 2.0442   LearningRate 0.0082   Epoch: 14   Global Step: 81240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:50:34,267-Speed 3386.64 samples/sec   Loss 2.0648   LearningRate 0.0082   Epoch: 14   Global Step: 81250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:50:37,294-Speed 3384.67 samples/sec   Loss 2.0398   LearningRate 0.0081   Epoch: 14   Global Step: 81260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:50:40,312-Speed 3393.88 samples/sec   Loss 1.9887   LearningRate 0.0081   Epoch: 14   Global Step: 81270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:50:43,324-Speed 3399.82 samples/sec   Loss 2.0714   LearningRate 0.0081   Epoch: 14   Global Step: 81280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:50:46,341-Speed 3395.18 samples/sec   Loss 2.0519   LearningRate 0.0081   Epoch: 14   Global Step: 81290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:50:49,360-Speed 3393.15 samples/sec   Loss 2.0305   LearningRate 0.0081   Epoch: 14   Global Step: 81300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:50:52,376-Speed 3396.11 samples/sec   Loss 2.1088   LearningRate 0.0081   Epoch: 14   Global Step: 81310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:50:55,392-Speed 3395.56 samples/sec   Loss 2.0256   LearningRate 0.0081   Epoch: 14   Global Step: 81320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:50:58,412-Speed 3393.14 samples/sec   Loss 2.0314   LearningRate 0.0081   Epoch: 14   Global Step: 81330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:01,437-Speed 3385.26 samples/sec   Loss 1.9573   LearningRate 0.0081   Epoch: 14   Global Step: 81340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:04,461-Speed 3387.29 samples/sec   Loss 2.0663   LearningRate 0.0081   Epoch: 14   Global Step: 81350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:07,480-Speed 3393.19 samples/sec   Loss 2.0217   LearningRate 0.0081   Epoch: 14   Global Step: 81360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:10,497-Speed 3394.64 samples/sec   Loss 1.9618   LearningRate 0.0081   Epoch: 14   Global Step: 81370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:13,513-Speed 3396.39 samples/sec   Loss 1.9674   LearningRate 0.0081   Epoch: 14   Global Step: 81380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:16,537-Speed 3386.18 samples/sec   Loss 1.9882   LearningRate 0.0081   Epoch: 14   Global Step: 81390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:19,539-Speed 3412.32 samples/sec   Loss 2.0479   LearningRate 0.0081   Epoch: 14   Global Step: 81400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:22,559-Speed 3390.87 samples/sec   Loss 1.9294   LearningRate 0.0081   Epoch: 14   Global Step: 81410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:25,583-Speed 3387.54 samples/sec   Loss 2.0601   LearningRate 0.0081   Epoch: 14   Global Step: 81420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:28,599-Speed 3396.02 samples/sec   Loss 2.0736   LearningRate 0.0081   Epoch: 14   Global Step: 81430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:31,617-Speed 3393.69 samples/sec   Loss 2.1134   LearningRate 0.0081   Epoch: 14   Global Step: 81440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:34,686-Speed 3337.73 samples/sec   Loss 2.0202   LearningRate 0.0081   Epoch: 14   Global Step: 81450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:37,757-Speed 3335.51 samples/sec   Loss 2.0923   LearningRate 0.0080   Epoch: 14   Global Step: 81460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:40,790-Speed 3376.61 samples/sec   Loss 2.0016   LearningRate 0.0080   Epoch: 14   Global Step: 81470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:51:43,796-Speed 3407.60 samples/sec   Loss 1.9745   LearningRate 0.0080   Epoch: 14   Global Step: 81480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:51:46,818-Speed 3388.34 samples/sec   Loss 2.0729   LearningRate 0.0080   Epoch: 14   Global Step: 81490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:51:49,838-Speed 3391.75 samples/sec   Loss 2.1272   LearningRate 0.0080   Epoch: 14   Global Step: 81500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:51:52,861-Speed 3388.11 samples/sec   Loss 2.0275   LearningRate 0.0080   Epoch: 14   Global Step: 81510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:51:55,880-Speed 3393.17 samples/sec   Loss 2.0114   LearningRate 0.0080   Epoch: 14   Global Step: 81520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:51:58,897-Speed 3394.45 samples/sec   Loss 2.1202   LearningRate 0.0080   Epoch: 14   Global Step: 81530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:52:01,917-Speed 3392.05 samples/sec   Loss 2.0211   LearningRate 0.0080   Epoch: 14   Global Step: 81540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:52:04,943-Speed 3385.21 samples/sec   Loss 2.0078   LearningRate 0.0080   Epoch: 14   Global Step: 81550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:52:07,958-Speed 3396.09 samples/sec   Loss 2.0008   LearningRate 0.0080   Epoch: 14   Global Step: 81560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:52:10,982-Speed 3386.92 samples/sec   Loss 2.1159   LearningRate 0.0080   Epoch: 14   Global Step: 81570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:52:14,010-Speed 3382.59 samples/sec   Loss 2.0313   LearningRate 0.0080   Epoch: 14   Global Step: 81580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:52:17,030-Speed 3391.31 samples/sec   Loss 2.0106   LearningRate 0.0080   Epoch: 14   Global Step: 81590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:52:20,047-Speed 3394.99 samples/sec   Loss 2.0723   LearningRate 0.0080   Epoch: 14   Global Step: 81600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:52:23,075-Speed 3383.37 samples/sec   Loss 1.9886   LearningRate 0.0080   Epoch: 14   Global Step: 81610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:52:26,108-Speed 3377.48 samples/sec   Loss 2.1065   LearningRate 0.0080   Epoch: 14   Global Step: 81620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:52:29,199-Speed 3313.34 samples/sec   Loss 2.1287   LearningRate 0.0080   Epoch: 14   Global Step: 81630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:52:32,226-Speed 3383.75 samples/sec   Loss 2.0120   LearningRate 0.0080   Epoch: 14   Global Step: 81640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:52:35,248-Speed 3389.30 samples/sec   Loss 2.0448   LearningRate 0.0080   Epoch: 14   Global Step: 81650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:52:38,271-Speed 3388.30 samples/sec   Loss 2.1682   LearningRate 0.0079   Epoch: 14   Global Step: 81660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:52:41,297-Speed 3384.85 samples/sec   Loss 2.1319   LearningRate 0.0079   Epoch: 14   Global Step: 81670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:52:44,326-Speed 3381.50 samples/sec   Loss 2.0317   LearningRate 0.0079   Epoch: 14   Global Step: 81680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 09:52:47,334-Speed 3404.30 samples/sec   Loss 1.9973   LearningRate 0.0079   Epoch: 14   Global Step: 81690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:52:50,359-Speed 3386.06 samples/sec   Loss 1.9908   LearningRate 0.0079   Epoch: 14   Global Step: 81700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:52:53,384-Speed 3385.35 samples/sec   Loss 1.9451   LearningRate 0.0079   Epoch: 14   Global Step: 81710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:52:56,407-Speed 3389.03 samples/sec   Loss 2.2059   LearningRate 0.0079   Epoch: 14   Global Step: 81720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:52:59,449-Speed 3367.31 samples/sec   Loss 2.0288   LearningRate 0.0079   Epoch: 14   Global Step: 81730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:02,476-Speed 3383.23 samples/sec   Loss 2.0126   LearningRate 0.0079   Epoch: 14   Global Step: 81740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:05,508-Speed 3378.23 samples/sec   Loss 2.0264   LearningRate 0.0079   Epoch: 14   Global Step: 81750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:08,528-Speed 3391.38 samples/sec   Loss 1.9809   LearningRate 0.0079   Epoch: 14   Global Step: 81760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:11,563-Speed 3374.61 samples/sec   Loss 1.9873   LearningRate 0.0079   Epoch: 14   Global Step: 81770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:14,585-Speed 3389.80 samples/sec   Loss 2.0560   LearningRate 0.0079   Epoch: 14   Global Step: 81780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:17,583-Speed 3415.43 samples/sec   Loss 2.0790   LearningRate 0.0079   Epoch: 14   Global Step: 81790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:20,604-Speed 3390.95 samples/sec   Loss 2.1427   LearningRate 0.0079   Epoch: 14   Global Step: 81800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:23,629-Speed 3385.76 samples/sec   Loss 2.0379   LearningRate 0.0079   Epoch: 14   Global Step: 81810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:26,662-Speed 3376.86 samples/sec   Loss 2.1545   LearningRate 0.0079   Epoch: 14   Global Step: 81820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:29,682-Speed 3392.14 samples/sec   Loss 2.1276   LearningRate 0.0079   Epoch: 14   Global Step: 81830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:32,707-Speed 3386.07 samples/sec   Loss 2.0160   LearningRate 0.0079   Epoch: 14   Global Step: 81840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:35,732-Speed 3385.41 samples/sec   Loss 2.0876   LearningRate 0.0079   Epoch: 14   Global Step: 81850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:38,755-Speed 3387.90 samples/sec   Loss 2.0890   LearningRate 0.0078   Epoch: 14   Global Step: 81860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:41,781-Speed 3384.97 samples/sec   Loss 1.8965   LearningRate 0.0078   Epoch: 14   Global Step: 81870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:44,802-Speed 3390.75 samples/sec   Loss 2.0784   LearningRate 0.0078   Epoch: 14   Global Step: 81880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:47,866-Speed 3342.10 samples/sec   Loss 2.1006   LearningRate 0.0078   Epoch: 14   Global Step: 81890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:50,964-Speed 3306.45 samples/sec   Loss 2.1823   LearningRate 0.0078   Epoch: 14   Global Step: 81900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:53,991-Speed 3383.79 samples/sec   Loss 2.0159   LearningRate 0.0078   Epoch: 14   Global Step: 81910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:53:57,020-Speed 3381.11 samples/sec   Loss 2.1118   LearningRate 0.0078   Epoch: 14   Global Step: 81920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:54:00,045-Speed 3386.06 samples/sec   Loss 2.0813   LearningRate 0.0078   Epoch: 14   Global Step: 81930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:54:03,072-Speed 3383.97 samples/sec   Loss 2.0732   LearningRate 0.0078   Epoch: 14   Global Step: 81940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:54:06,095-Speed 3388.17 samples/sec   Loss 2.0992   LearningRate 0.0078   Epoch: 14   Global Step: 81950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:54:09,119-Speed 3386.89 samples/sec   Loss 2.0959   LearningRate 0.0078   Epoch: 14   Global Step: 81960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:54:12,140-Speed 3390.30 samples/sec   Loss 2.1176   LearningRate 0.0078   Epoch: 14   Global Step: 81970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:54:15,163-Speed 3387.97 samples/sec   Loss 2.0167   LearningRate 0.0078   Epoch: 14   Global Step: 81980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:54:18,198-Speed 3375.86 samples/sec   Loss 2.1512   LearningRate 0.0078   Epoch: 14   Global Step: 81990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 09:54:21,235-Speed 3371.54 samples/sec   Loss 2.0376   LearningRate 0.0078   Epoch: 14   Global Step: 82000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 09:55:04,551-[lfw][82000]XNorm: 22.244488
Training: 2022-04-27 09:55:04,551-[lfw][82000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-27 09:55:04,552-[lfw][82000]Accuracy-Highest: 0.99817
Training: 2022-04-27 09:55:54,863-[cfp_fp][82000]XNorm: 20.730621
Training: 2022-04-27 09:55:54,864-[cfp_fp][82000]Accuracy-Flip: 0.97829+-0.00666
Training: 2022-04-27 09:55:54,864-[cfp_fp][82000]Accuracy-Highest: 0.97829
Training: 2022-04-27 09:56:38,090-[agedb_30][82000]XNorm: 22.237732
Training: 2022-04-27 09:56:38,091-[agedb_30][82000]Accuracy-Flip: 0.98133+-0.00763
Training: 2022-04-27 09:56:38,091-[agedb_30][82000]Accuracy-Highest: 0.98133
Training: 2022-04-27 09:56:41,153-Speed 73.19 samples/sec   Loss 2.1107   LearningRate 0.0078   Epoch: 14   Global Step: 82010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 09:56:44,150-Speed 3416.58 samples/sec   Loss 2.2154   LearningRate 0.0078   Epoch: 14   Global Step: 82020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:56:47,155-Speed 3408.95 samples/sec   Loss 2.1194   LearningRate 0.0078   Epoch: 14   Global Step: 82030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:56:50,163-Speed 3404.99 samples/sec   Loss 2.1513   LearningRate 0.0078   Epoch: 14   Global Step: 82040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:56:53,180-Speed 3394.50 samples/sec   Loss 2.1046   LearningRate 0.0078   Epoch: 14   Global Step: 82050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:56:56,194-Speed 3398.20 samples/sec   Loss 2.0890   LearningRate 0.0078   Epoch: 14   Global Step: 82060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:56:59,210-Speed 3396.60 samples/sec   Loss 2.0329   LearningRate 0.0077   Epoch: 14   Global Step: 82070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:02,227-Speed 3395.57 samples/sec   Loss 2.0552   LearningRate 0.0077   Epoch: 14   Global Step: 82080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:05,247-Speed 3391.60 samples/sec   Loss 1.9665   LearningRate 0.0077   Epoch: 14   Global Step: 82090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:08,270-Speed 3388.93 samples/sec   Loss 1.9738   LearningRate 0.0077   Epoch: 14   Global Step: 82100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:11,286-Speed 3395.25 samples/sec   Loss 2.1444   LearningRate 0.0077   Epoch: 14   Global Step: 82110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:14,292-Speed 3407.82 samples/sec   Loss 2.0347   LearningRate 0.0077   Epoch: 14   Global Step: 82120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:17,326-Speed 3375.88 samples/sec   Loss 2.0381   LearningRate 0.0077   Epoch: 14   Global Step: 82130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:20,350-Speed 3386.50 samples/sec   Loss 2.0813   LearningRate 0.0077   Epoch: 14   Global Step: 82140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:23,386-Speed 3374.25 samples/sec   Loss 2.0484   LearningRate 0.0077   Epoch: 14   Global Step: 82150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:26,494-Speed 3294.86 samples/sec   Loss 2.1317   LearningRate 0.0077   Epoch: 14   Global Step: 82160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:29,538-Speed 3364.51 samples/sec   Loss 2.0473   LearningRate 0.0077   Epoch: 14   Global Step: 82170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:32,555-Speed 3395.41 samples/sec   Loss 2.1067   LearningRate 0.0077   Epoch: 14   Global Step: 82180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:35,573-Speed 3393.42 samples/sec   Loss 2.1454   LearningRate 0.0077   Epoch: 14   Global Step: 82190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:38,591-Speed 3394.83 samples/sec   Loss 2.0121   LearningRate 0.0077   Epoch: 14   Global Step: 82200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:41,607-Speed 3395.68 samples/sec   Loss 1.9980   LearningRate 0.0077   Epoch: 14   Global Step: 82210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:44,622-Speed 3397.08 samples/sec   Loss 2.0370   LearningRate 0.0077   Epoch: 14   Global Step: 82220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 09:57:47,627-Speed 3407.93 samples/sec   Loss 2.0039   LearningRate 0.0077   Epoch: 14   Global Step: 82230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:50,650-Speed 3388.25 samples/sec   Loss 2.0030   LearningRate 0.0077   Epoch: 14   Global Step: 82240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:53,663-Speed 3399.12 samples/sec   Loss 2.1445   LearningRate 0.0077   Epoch: 14   Global Step: 82250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:56,677-Speed 3398.27 samples/sec   Loss 2.0683   LearningRate 0.0077   Epoch: 14   Global Step: 82260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:57:59,691-Speed 3398.86 samples/sec   Loss 1.9977   LearningRate 0.0076   Epoch: 14   Global Step: 82270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:02,704-Speed 3399.22 samples/sec   Loss 2.0523   LearningRate 0.0076   Epoch: 14   Global Step: 82280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:05,718-Speed 3398.13 samples/sec   Loss 2.1469   LearningRate 0.0076   Epoch: 14   Global Step: 82290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:08,734-Speed 3396.82 samples/sec   Loss 2.1065   LearningRate 0.0076   Epoch: 14   Global Step: 82300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:11,822-Speed 3316.27 samples/sec   Loss 2.0248   LearningRate 0.0076   Epoch: 14   Global Step: 82310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:14,837-Speed 3397.09 samples/sec   Loss 2.0081   LearningRate 0.0076   Epoch: 14   Global Step: 82320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:17,854-Speed 3394.73 samples/sec   Loss 2.1211   LearningRate 0.0076   Epoch: 14   Global Step: 82330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:20,872-Speed 3394.00 samples/sec   Loss 2.1035   LearningRate 0.0076   Epoch: 14   Global Step: 82340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:23,911-Speed 3370.49 samples/sec   Loss 2.0842   LearningRate 0.0076   Epoch: 14   Global Step: 82350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:26,947-Speed 3373.44 samples/sec   Loss 2.0375   LearningRate 0.0076   Epoch: 14   Global Step: 82360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:29,962-Speed 3396.71 samples/sec   Loss 2.1022   LearningRate 0.0076   Epoch: 14   Global Step: 82370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:32,983-Speed 3390.91 samples/sec   Loss 2.0354   LearningRate 0.0076   Epoch: 14   Global Step: 82380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:36,000-Speed 3394.50 samples/sec   Loss 2.0142   LearningRate 0.0076   Epoch: 14   Global Step: 82390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:39,155-Speed 3248.54 samples/sec   Loss 2.0559   LearningRate 0.0076   Epoch: 14   Global Step: 82400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:42,168-Speed 3399.37 samples/sec   Loss 2.0107   LearningRate 0.0076   Epoch: 14   Global Step: 82410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:45,189-Speed 3389.93 samples/sec   Loss 2.0802   LearningRate 0.0076   Epoch: 14   Global Step: 82420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:48,184-Speed 3419.48 samples/sec   Loss 2.0913   LearningRate 0.0076   Epoch: 14   Global Step: 82430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:51,203-Speed 3393.25 samples/sec   Loss 2.0358   LearningRate 0.0076   Epoch: 14   Global Step: 82440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:54,214-Speed 3401.89 samples/sec   Loss 2.0028   LearningRate 0.0076   Epoch: 14   Global Step: 82450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:58:57,224-Speed 3401.88 samples/sec   Loss 2.0618   LearningRate 0.0076   Epoch: 14   Global Step: 82460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:59:00,241-Speed 3395.62 samples/sec   Loss 1.9988   LearningRate 0.0076   Epoch: 14   Global Step: 82470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:59:03,308-Speed 3338.85 samples/sec   Loss 2.0497   LearningRate 0.0075   Epoch: 14   Global Step: 82480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:59:06,322-Speed 3398.86 samples/sec   Loss 2.0692   LearningRate 0.0075   Epoch: 14   Global Step: 82490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:59:09,338-Speed 3395.57 samples/sec   Loss 2.0533   LearningRate 0.0075   Epoch: 14   Global Step: 82500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:59:12,337-Speed 3415.08 samples/sec   Loss 2.0643   LearningRate 0.0075   Epoch: 14   Global Step: 82510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:59:15,348-Speed 3402.35 samples/sec   Loss 2.1251   LearningRate 0.0075   Epoch: 14   Global Step: 82520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:59:18,365-Speed 3394.24 samples/sec   Loss 2.0592   LearningRate 0.0075   Epoch: 14   Global Step: 82530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:59:21,376-Speed 3401.88 samples/sec   Loss 2.1173   LearningRate 0.0075   Epoch: 14   Global Step: 82540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:59:24,400-Speed 3387.92 samples/sec   Loss 2.1399   LearningRate 0.0075   Epoch: 14   Global Step: 82550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:59:27,486-Speed 3318.87 samples/sec   Loss 2.0228   LearningRate 0.0075   Epoch: 14   Global Step: 82560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:59:30,526-Speed 3368.89 samples/sec   Loss 2.1470   LearningRate 0.0075   Epoch: 14   Global Step: 82570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:59:33,549-Speed 3388.41 samples/sec   Loss 1.9672   LearningRate 0.0075   Epoch: 14   Global Step: 82580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:59:36,575-Speed 3384.19 samples/sec   Loss 2.1125   LearningRate 0.0075   Epoch: 14   Global Step: 82590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:59:39,595-Speed 3391.66 samples/sec   Loss 2.0976   LearningRate 0.0075   Epoch: 14   Global Step: 82600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 09:59:42,616-Speed 3390.06 samples/sec   Loss 2.1201   LearningRate 0.0075   Epoch: 14   Global Step: 82610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:59:45,649-Speed 3377.80 samples/sec   Loss 1.9676   LearningRate 0.0075   Epoch: 14   Global Step: 82620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:59:48,767-Speed 3285.06 samples/sec   Loss 2.0827   LearningRate 0.0075   Epoch: 14   Global Step: 82630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:59:51,823-Speed 3351.32 samples/sec   Loss 2.0599   LearningRate 0.0075   Epoch: 14   Global Step: 82640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:59:54,833-Speed 3402.22 samples/sec   Loss 2.0349   LearningRate 0.0075   Epoch: 14   Global Step: 82650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 09:59:57,865-Speed 3378.66 samples/sec   Loss 2.0157   LearningRate 0.0075   Epoch: 14   Global Step: 82660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:00,895-Speed 3380.34 samples/sec   Loss 2.0048   LearningRate 0.0075   Epoch: 14   Global Step: 82670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:03,908-Speed 3399.07 samples/sec   Loss 2.0330   LearningRate 0.0075   Epoch: 14   Global Step: 82680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:06,923-Speed 3397.32 samples/sec   Loss 2.0666   LearningRate 0.0074   Epoch: 14   Global Step: 82690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:09,942-Speed 3392.16 samples/sec   Loss 2.0337   LearningRate 0.0074   Epoch: 14   Global Step: 82700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:12,944-Speed 3411.67 samples/sec   Loss 2.0150   LearningRate 0.0074   Epoch: 14   Global Step: 82710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:15,961-Speed 3396.73 samples/sec   Loss 2.0902   LearningRate 0.0074   Epoch: 14   Global Step: 82720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:18,980-Speed 3392.32 samples/sec   Loss 2.1154   LearningRate 0.0074   Epoch: 14   Global Step: 82730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:21,995-Speed 3397.82 samples/sec   Loss 2.0868   LearningRate 0.0074   Epoch: 14   Global Step: 82740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:25,006-Speed 3400.75 samples/sec   Loss 2.1052   LearningRate 0.0074   Epoch: 14   Global Step: 82750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:28,025-Speed 3392.91 samples/sec   Loss 2.0103   LearningRate 0.0074   Epoch: 14   Global Step: 82760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:31,037-Speed 3400.17 samples/sec   Loss 2.1133   LearningRate 0.0074   Epoch: 14   Global Step: 82770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:34,075-Speed 3371.57 samples/sec   Loss 2.1116   LearningRate 0.0074   Epoch: 14   Global Step: 82780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:37,222-Speed 3254.67 samples/sec   Loss 2.0808   LearningRate 0.0074   Epoch: 14   Global Step: 82790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:40,248-Speed 3384.25 samples/sec   Loss 1.9710   LearningRate 0.0074   Epoch: 14   Global Step: 82800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:43,249-Speed 3414.15 samples/sec   Loss 2.0869   LearningRate 0.0074   Epoch: 14   Global Step: 82810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:46,262-Speed 3398.95 samples/sec   Loss 2.0619   LearningRate 0.0074   Epoch: 14   Global Step: 82820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:49,288-Speed 3384.29 samples/sec   Loss 2.2171   LearningRate 0.0074   Epoch: 14   Global Step: 82830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:52,303-Speed 3397.33 samples/sec   Loss 2.0677   LearningRate 0.0074   Epoch: 14   Global Step: 82840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:55,317-Speed 3398.41 samples/sec   Loss 2.0313   LearningRate 0.0074   Epoch: 14   Global Step: 82850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:00:58,335-Speed 3394.22 samples/sec   Loss 1.9973   LearningRate 0.0074   Epoch: 14   Global Step: 82860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:01:01,355-Speed 3391.96 samples/sec   Loss 2.0491   LearningRate 0.0074   Epoch: 14   Global Step: 82870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:01:04,365-Speed 3402.48 samples/sec   Loss 1.9843   LearningRate 0.0074   Epoch: 14   Global Step: 82880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:01:07,381-Speed 3396.13 samples/sec   Loss 1.9537   LearningRate 0.0073   Epoch: 14   Global Step: 82890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:01:10,407-Speed 3385.33 samples/sec   Loss 1.9919   LearningRate 0.0073   Epoch: 14   Global Step: 82900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:01:13,423-Speed 3395.95 samples/sec   Loss 2.1106   LearningRate 0.0073   Epoch: 14   Global Step: 82910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:01:16,447-Speed 3386.92 samples/sec   Loss 2.0916   LearningRate 0.0073   Epoch: 14   Global Step: 82920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:01:19,464-Speed 3395.40 samples/sec   Loss 2.0907   LearningRate 0.0073   Epoch: 14   Global Step: 82930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:01:22,493-Speed 3380.90 samples/sec   Loss 2.0571   LearningRate 0.0073   Epoch: 14   Global Step: 82940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:01:25,515-Speed 3389.29 samples/sec   Loss 2.1380   LearningRate 0.0073   Epoch: 14   Global Step: 82950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:01:28,539-Speed 3387.20 samples/sec   Loss 2.1601   LearningRate 0.0073   Epoch: 14   Global Step: 82960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:01:31,561-Speed 3389.41 samples/sec   Loss 1.9876   LearningRate 0.0073   Epoch: 14   Global Step: 82970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:01:34,575-Speed 3398.17 samples/sec   Loss 2.0549   LearningRate 0.0073   Epoch: 14   Global Step: 82980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:01:37,597-Speed 3389.28 samples/sec   Loss 2.0882   LearningRate 0.0073   Epoch: 14   Global Step: 82990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:01:40,618-Speed 3390.78 samples/sec   Loss 2.0051   LearningRate 0.0073   Epoch: 14   Global Step: 83000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:01:43,631-Speed 3399.80 samples/sec   Loss 1.9408   LearningRate 0.0073   Epoch: 14   Global Step: 83010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:01:46,661-Speed 3380.10 samples/sec   Loss 2.0294   LearningRate 0.0073   Epoch: 14   Global Step: 83020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:01:49,703-Speed 3367.08 samples/sec   Loss 2.0160   LearningRate 0.0073   Epoch: 14   Global Step: 83030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:01:52,743-Speed 3368.63 samples/sec   Loss 2.1284   LearningRate 0.0073   Epoch: 14   Global Step: 83040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:01:55,761-Speed 3393.71 samples/sec   Loss 2.0959   LearningRate 0.0073   Epoch: 14   Global Step: 83050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:01:58,778-Speed 3395.29 samples/sec   Loss 1.9940   LearningRate 0.0073   Epoch: 14   Global Step: 83060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:01,795-Speed 3395.06 samples/sec   Loss 2.0488   LearningRate 0.0073   Epoch: 14   Global Step: 83070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:04,811-Speed 3395.61 samples/sec   Loss 2.1249   LearningRate 0.0073   Epoch: 14   Global Step: 83080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:02:07,809-Speed 3416.42 samples/sec   Loss 2.0179   LearningRate 0.0073   Epoch: 14   Global Step: 83090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:10,842-Speed 3377.58 samples/sec   Loss 2.0474   LearningRate 0.0072   Epoch: 14   Global Step: 83100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:13,864-Speed 3389.08 samples/sec   Loss 2.0380   LearningRate 0.0072   Epoch: 14   Global Step: 83110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:16,879-Speed 3397.07 samples/sec   Loss 2.0234   LearningRate 0.0072   Epoch: 14   Global Step: 83120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:19,914-Speed 3374.01 samples/sec   Loss 2.0945   LearningRate 0.0072   Epoch: 14   Global Step: 83130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:22,938-Speed 3387.63 samples/sec   Loss 1.9932   LearningRate 0.0072   Epoch: 14   Global Step: 83140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:25,955-Speed 3394.98 samples/sec   Loss 2.0546   LearningRate 0.0072   Epoch: 14   Global Step: 83150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:28,976-Speed 3390.29 samples/sec   Loss 2.0848   LearningRate 0.0072   Epoch: 14   Global Step: 83160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:32,016-Speed 3369.56 samples/sec   Loss 2.1123   LearningRate 0.0072   Epoch: 14   Global Step: 83170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:35,092-Speed 3329.44 samples/sec   Loss 1.9734   LearningRate 0.0072   Epoch: 14   Global Step: 83180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:38,098-Speed 3407.59 samples/sec   Loss 2.0538   LearningRate 0.0072   Epoch: 14   Global Step: 83190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:41,119-Speed 3390.18 samples/sec   Loss 2.0333   LearningRate 0.0072   Epoch: 14   Global Step: 83200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:44,136-Speed 3395.13 samples/sec   Loss 2.0029   LearningRate 0.0072   Epoch: 14   Global Step: 83210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:47,154-Speed 3393.04 samples/sec   Loss 2.1004   LearningRate 0.0072   Epoch: 14   Global Step: 83220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:50,170-Speed 3395.82 samples/sec   Loss 2.0437   LearningRate 0.0072   Epoch: 14   Global Step: 83230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:53,195-Speed 3386.52 samples/sec   Loss 2.0492   LearningRate 0.0072   Epoch: 14   Global Step: 83240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:56,216-Speed 3391.22 samples/sec   Loss 2.0758   LearningRate 0.0072   Epoch: 14   Global Step: 83250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:02:59,238-Speed 3388.56 samples/sec   Loss 2.0279   LearningRate 0.0072   Epoch: 14   Global Step: 83260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:02,261-Speed 3389.53 samples/sec   Loss 2.0325   LearningRate 0.0072   Epoch: 14   Global Step: 83270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:05,303-Speed 3366.33 samples/sec   Loss 1.9805   LearningRate 0.0072   Epoch: 14   Global Step: 83280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:08,324-Speed 3390.60 samples/sec   Loss 2.0574   LearningRate 0.0072   Epoch: 14   Global Step: 83290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:03:11,334-Speed 3402.89 samples/sec   Loss 1.9659   LearningRate 0.0072   Epoch: 14   Global Step: 83300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:14,355-Speed 3391.16 samples/sec   Loss 2.1033   LearningRate 0.0072   Epoch: 14   Global Step: 83310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:17,371-Speed 3394.93 samples/sec   Loss 2.0342   LearningRate 0.0071   Epoch: 14   Global Step: 83320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:20,395-Speed 3387.21 samples/sec   Loss 2.0741   LearningRate 0.0071   Epoch: 14   Global Step: 83330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:23,413-Speed 3393.76 samples/sec   Loss 2.0274   LearningRate 0.0071   Epoch: 14   Global Step: 83340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:26,431-Speed 3394.74 samples/sec   Loss 2.0441   LearningRate 0.0071   Epoch: 14   Global Step: 83350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:29,450-Speed 3392.18 samples/sec   Loss 1.9549   LearningRate 0.0071   Epoch: 14   Global Step: 83360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:32,471-Speed 3390.26 samples/sec   Loss 1.9649   LearningRate 0.0071   Epoch: 14   Global Step: 83370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:35,493-Speed 3389.22 samples/sec   Loss 1.9921   LearningRate 0.0071   Epoch: 14   Global Step: 83380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:38,514-Speed 3390.14 samples/sec   Loss 2.1035   LearningRate 0.0071   Epoch: 14   Global Step: 83390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:41,522-Speed 3405.22 samples/sec   Loss 2.0112   LearningRate 0.0071   Epoch: 14   Global Step: 83400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:44,546-Speed 3387.34 samples/sec   Loss 2.0890   LearningRate 0.0071   Epoch: 14   Global Step: 83410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:47,572-Speed 3385.15 samples/sec   Loss 1.9989   LearningRate 0.0071   Epoch: 14   Global Step: 83420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:50,597-Speed 3386.04 samples/sec   Loss 2.0952   LearningRate 0.0071   Epoch: 14   Global Step: 83430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:53,624-Speed 3383.97 samples/sec   Loss 2.0178   LearningRate 0.0071   Epoch: 14   Global Step: 83440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:56,639-Speed 3396.70 samples/sec   Loss 1.9723   LearningRate 0.0071   Epoch: 14   Global Step: 83450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:03:59,661-Speed 3388.76 samples/sec   Loss 2.1685   LearningRate 0.0071   Epoch: 14   Global Step: 83460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:04:02,687-Speed 3385.31 samples/sec   Loss 2.0916   LearningRate 0.0071   Epoch: 14   Global Step: 83470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:04:05,705-Speed 3393.89 samples/sec   Loss 1.8465   LearningRate 0.0071   Epoch: 14   Global Step: 83480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:04:08,726-Speed 3390.87 samples/sec   Loss 2.0235   LearningRate 0.0071   Epoch: 14   Global Step: 83490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:04:11,749-Speed 3388.34 samples/sec   Loss 1.9931   LearningRate 0.0071   Epoch: 14   Global Step: 83500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:04:14,780-Speed 3378.85 samples/sec   Loss 2.0391   LearningRate 0.0071   Epoch: 14   Global Step: 83510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:04:17,805-Speed 3386.93 samples/sec   Loss 2.0262   LearningRate 0.0071   Epoch: 14   Global Step: 83520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:04:20,823-Speed 3393.25 samples/sec   Loss 2.0548   LearningRate 0.0070   Epoch: 14   Global Step: 83530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:04:23,849-Speed 3384.47 samples/sec   Loss 2.1167   LearningRate 0.0070   Epoch: 14   Global Step: 83540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:04:26,868-Speed 3393.35 samples/sec   Loss 2.1259   LearningRate 0.0070   Epoch: 14   Global Step: 83550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:04:29,855-Speed 3429.37 samples/sec   Loss 2.0646   LearningRate 0.0070   Epoch: 14   Global Step: 83560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 10:04:32,877-Speed 3389.12 samples/sec   Loss 2.0495   LearningRate 0.0070   Epoch: 14   Global Step: 83570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 10:04:35,895-Speed 3392.88 samples/sec   Loss 2.0185   LearningRate 0.0070   Epoch: 14   Global Step: 83580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 10:04:38,918-Speed 3388.12 samples/sec   Loss 1.9319   LearningRate 0.0070   Epoch: 14   Global Step: 83590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 10:04:41,938-Speed 3392.44 samples/sec   Loss 2.0178   LearningRate 0.0070   Epoch: 14   Global Step: 83600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 10:04:44,976-Speed 3371.73 samples/sec   Loss 2.0278   LearningRate 0.0070   Epoch: 14   Global Step: 83610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 10:04:47,998-Speed 3389.06 samples/sec   Loss 2.0247   LearningRate 0.0070   Epoch: 14   Global Step: 83620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 10:04:51,015-Speed 3394.78 samples/sec   Loss 1.9790   LearningRate 0.0070   Epoch: 14   Global Step: 83630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 10:04:54,032-Speed 3395.07 samples/sec   Loss 2.0595   LearningRate 0.0070   Epoch: 14   Global Step: 83640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 10:04:57,052-Speed 3390.91 samples/sec   Loss 1.9851   LearningRate 0.0070   Epoch: 14   Global Step: 83650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 10:05:00,084-Speed 3378.71 samples/sec   Loss 2.0458   LearningRate 0.0070   Epoch: 14   Global Step: 83660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:05:03,108-Speed 3386.19 samples/sec   Loss 1.9517   LearningRate 0.0070   Epoch: 14   Global Step: 83670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:05:06,140-Speed 3378.28 samples/sec   Loss 2.1095   LearningRate 0.0070   Epoch: 14   Global Step: 83680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:05:09,164-Speed 3386.86 samples/sec   Loss 2.0706   LearningRate 0.0070   Epoch: 14   Global Step: 83690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:05:12,192-Speed 3382.79 samples/sec   Loss 1.9789   LearningRate 0.0070   Epoch: 14   Global Step: 83700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:05:15,221-Speed 3382.97 samples/sec   Loss 2.0359   LearningRate 0.0070   Epoch: 14   Global Step: 83710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:05:18,245-Speed 3387.18 samples/sec   Loss 2.0494   LearningRate 0.0070   Epoch: 14   Global Step: 83720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:05:21,264-Speed 3392.40 samples/sec   Loss 2.0508   LearningRate 0.0070   Epoch: 14   Global Step: 83730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:05:24,332-Speed 3338.47 samples/sec   Loss 2.0059   LearningRate 0.0070   Epoch: 14   Global Step: 83740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:05:27,393-Speed 3345.30 samples/sec   Loss 1.9988   LearningRate 0.0069   Epoch: 14   Global Step: 83750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:05:30,415-Speed 3389.51 samples/sec   Loss 2.0398   LearningRate 0.0069   Epoch: 14   Global Step: 83760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:05:33,441-Speed 3384.90 samples/sec   Loss 2.0386   LearningRate 0.0069   Epoch: 14   Global Step: 83770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:05:36,470-Speed 3381.36 samples/sec   Loss 1.8627   LearningRate 0.0069   Epoch: 14   Global Step: 83780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:05:39,491-Speed 3391.24 samples/sec   Loss 1.9550   LearningRate 0.0069   Epoch: 14   Global Step: 83790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:05:42,508-Speed 3393.87 samples/sec   Loss 2.0473   LearningRate 0.0069   Epoch: 14   Global Step: 83800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:05:45,531-Speed 3388.55 samples/sec   Loss 2.0397   LearningRate 0.0069   Epoch: 14   Global Step: 83810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:05:48,555-Speed 3387.77 samples/sec   Loss 2.0202   LearningRate 0.0069   Epoch: 14   Global Step: 83820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:05:51,580-Speed 3385.38 samples/sec   Loss 2.0774   LearningRate 0.0069   Epoch: 14   Global Step: 83830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:05:54,607-Speed 3383.66 samples/sec   Loss 1.9777   LearningRate 0.0069   Epoch: 14   Global Step: 83840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:05:57,630-Speed 3387.46 samples/sec   Loss 1.9720   LearningRate 0.0069   Epoch: 14   Global Step: 83850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:06:00,652-Speed 3389.96 samples/sec   Loss 1.9967   LearningRate 0.0069   Epoch: 14   Global Step: 83860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:06:03,655-Speed 3409.79 samples/sec   Loss 2.0524   LearningRate 0.0069   Epoch: 14   Global Step: 83870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:06:06,679-Speed 3387.46 samples/sec   Loss 2.0426   LearningRate 0.0069   Epoch: 14   Global Step: 83880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:06:09,716-Speed 3373.00 samples/sec   Loss 2.0091   LearningRate 0.0069   Epoch: 14   Global Step: 83890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:06:12,789-Speed 3332.72 samples/sec   Loss 1.9870   LearningRate 0.0069   Epoch: 14   Global Step: 83900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:06:15,820-Speed 3379.69 samples/sec   Loss 2.0935   LearningRate 0.0069   Epoch: 14   Global Step: 83910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:06:18,829-Speed 3403.85 samples/sec   Loss 2.1489   LearningRate 0.0069   Epoch: 14   Global Step: 83920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:06:21,845-Speed 3395.40 samples/sec   Loss 2.0953   LearningRate 0.0069   Epoch: 14   Global Step: 83930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:06:24,886-Speed 3368.35 samples/sec   Loss 2.0319   LearningRate 0.0069   Epoch: 14   Global Step: 83940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:06:27,915-Speed 3381.36 samples/sec   Loss 2.0194   LearningRate 0.0069   Epoch: 14   Global Step: 83950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:06:31,049-Speed 3267.91 samples/sec   Loss 2.0613   LearningRate 0.0068   Epoch: 14   Global Step: 83960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:06:34,118-Speed 3337.96 samples/sec   Loss 1.9598   LearningRate 0.0068   Epoch: 14   Global Step: 83970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:06:37,218-Speed 3303.99 samples/sec   Loss 1.9135   LearningRate 0.0068   Epoch: 14   Global Step: 83980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:06:40,245-Speed 3384.34 samples/sec   Loss 2.1090   LearningRate 0.0068   Epoch: 14   Global Step: 83990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:06:43,273-Speed 3382.38 samples/sec   Loss 1.9963   LearningRate 0.0068   Epoch: 14   Global Step: 84000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:07:26,457-[lfw][84000]XNorm: 21.186704
Training: 2022-04-27 10:07:26,457-[lfw][84000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-27 10:07:26,458-[lfw][84000]Accuracy-Highest: 0.99817
Training: 2022-04-27 10:08:16,704-[cfp_fp][84000]XNorm: 20.323462
Training: 2022-04-27 10:08:16,704-[cfp_fp][84000]Accuracy-Flip: 0.97843+-0.00698
Training: 2022-04-27 10:08:16,705-[cfp_fp][84000]Accuracy-Highest: 0.97843
Training: 2022-04-27 10:09:00,318-[agedb_30][84000]XNorm: 21.553200
Training: 2022-04-27 10:09:00,318-[agedb_30][84000]Accuracy-Flip: 0.98050+-0.00606
Training: 2022-04-27 10:09:00,319-[agedb_30][84000]Accuracy-Highest: 0.98133
Training: 2022-04-27 10:09:03,338-Speed 73.11 samples/sec   Loss 1.9241   LearningRate 0.0068   Epoch: 14   Global Step: 84010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:09:06,342-Speed 3408.98 samples/sec   Loss 1.9987   LearningRate 0.0068   Epoch: 14   Global Step: 84020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:09,360-Speed 3393.38 samples/sec   Loss 1.9864   LearningRate 0.0068   Epoch: 14   Global Step: 84030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:12,370-Speed 3403.64 samples/sec   Loss 2.1260   LearningRate 0.0068   Epoch: 14   Global Step: 84040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:15,381-Speed 3401.28 samples/sec   Loss 2.0263   LearningRate 0.0068   Epoch: 14   Global Step: 84050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:18,389-Speed 3404.38 samples/sec   Loss 2.0006   LearningRate 0.0068   Epoch: 14   Global Step: 84060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:21,404-Speed 3397.62 samples/sec   Loss 1.9460   LearningRate 0.0068   Epoch: 14   Global Step: 84070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:24,427-Speed 3387.94 samples/sec   Loss 2.0955   LearningRate 0.0068   Epoch: 14   Global Step: 84080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:27,455-Speed 3383.40 samples/sec   Loss 2.0299   LearningRate 0.0068   Epoch: 14   Global Step: 84090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:30,469-Speed 3398.30 samples/sec   Loss 1.9939   LearningRate 0.0068   Epoch: 14   Global Step: 84100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:33,490-Speed 3390.38 samples/sec   Loss 1.9697   LearningRate 0.0068   Epoch: 14   Global Step: 84110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:36,520-Speed 3380.27 samples/sec   Loss 1.9109   LearningRate 0.0068   Epoch: 14   Global Step: 84120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:09:39,527-Speed 3405.85 samples/sec   Loss 2.0871   LearningRate 0.0068   Epoch: 14   Global Step: 84130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:42,546-Speed 3393.73 samples/sec   Loss 2.0341   LearningRate 0.0068   Epoch: 14   Global Step: 84140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:45,593-Speed 3361.83 samples/sec   Loss 2.0093   LearningRate 0.0068   Epoch: 14   Global Step: 84150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:48,677-Speed 3321.19 samples/sec   Loss 2.0739   LearningRate 0.0068   Epoch: 14   Global Step: 84160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:51,772-Speed 3308.27 samples/sec   Loss 2.0806   LearningRate 0.0068   Epoch: 14   Global Step: 84170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:54,796-Speed 3387.41 samples/sec   Loss 1.9476   LearningRate 0.0067   Epoch: 14   Global Step: 84180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:09:57,825-Speed 3382.25 samples/sec   Loss 1.9572   LearningRate 0.0067   Epoch: 14   Global Step: 84190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:00,852-Speed 3383.16 samples/sec   Loss 2.0525   LearningRate 0.0067   Epoch: 14   Global Step: 84200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:03,887-Speed 3375.19 samples/sec   Loss 1.9387   LearningRate 0.0067   Epoch: 14   Global Step: 84210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:06,914-Speed 3384.67 samples/sec   Loss 1.9795   LearningRate 0.0067   Epoch: 14   Global Step: 84220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:09,921-Speed 3405.71 samples/sec   Loss 2.0780   LearningRate 0.0067   Epoch: 14   Global Step: 84230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:12,981-Speed 3347.02 samples/sec   Loss 2.0118   LearningRate 0.0067   Epoch: 14   Global Step: 84240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:16,026-Speed 3364.39 samples/sec   Loss 2.0538   LearningRate 0.0067   Epoch: 14   Global Step: 84250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:19,049-Speed 3387.66 samples/sec   Loss 1.9968   LearningRate 0.0067   Epoch: 14   Global Step: 84260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:22,070-Speed 3391.27 samples/sec   Loss 2.0221   LearningRate 0.0067   Epoch: 14   Global Step: 84270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:25,089-Speed 3392.25 samples/sec   Loss 1.9502   LearningRate 0.0067   Epoch: 14   Global Step: 84280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:28,122-Speed 3377.41 samples/sec   Loss 2.0475   LearningRate 0.0067   Epoch: 14   Global Step: 84290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:31,143-Speed 3390.16 samples/sec   Loss 1.9867   LearningRate 0.0067   Epoch: 14   Global Step: 84300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:34,158-Speed 3396.42 samples/sec   Loss 2.0420   LearningRate 0.0067   Epoch: 14   Global Step: 84310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:37,181-Speed 3388.73 samples/sec   Loss 2.0096   LearningRate 0.0067   Epoch: 14   Global Step: 84320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:40,208-Speed 3383.97 samples/sec   Loss 1.9767   LearningRate 0.0067   Epoch: 14   Global Step: 84330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:10:43,213-Speed 3408.57 samples/sec   Loss 2.0215   LearningRate 0.0067   Epoch: 14   Global Step: 84340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:46,224-Speed 3400.56 samples/sec   Loss 1.9859   LearningRate 0.0067   Epoch: 14   Global Step: 84350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:49,242-Speed 3394.68 samples/sec   Loss 1.9729   LearningRate 0.0067   Epoch: 14   Global Step: 84360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:52,254-Speed 3399.97 samples/sec   Loss 1.9250   LearningRate 0.0067   Epoch: 14   Global Step: 84370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:55,265-Speed 3401.59 samples/sec   Loss 1.9899   LearningRate 0.0067   Epoch: 14   Global Step: 84380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:10:58,287-Speed 3389.86 samples/sec   Loss 2.0699   LearningRate 0.0067   Epoch: 14   Global Step: 84390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:01,318-Speed 3378.65 samples/sec   Loss 1.9505   LearningRate 0.0066   Epoch: 14   Global Step: 84400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:04,330-Speed 3400.87 samples/sec   Loss 2.1290   LearningRate 0.0066   Epoch: 14   Global Step: 84410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:07,350-Speed 3391.58 samples/sec   Loss 2.0123   LearningRate 0.0066   Epoch: 14   Global Step: 84420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:10,358-Speed 3405.38 samples/sec   Loss 1.9350   LearningRate 0.0066   Epoch: 14   Global Step: 84430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:13,373-Speed 3396.79 samples/sec   Loss 2.0459   LearningRate 0.0066   Epoch: 14   Global Step: 84440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:11:16,362-Speed 3426.92 samples/sec   Loss 2.0413   LearningRate 0.0066   Epoch: 14   Global Step: 84450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:19,371-Speed 3403.62 samples/sec   Loss 2.0634   LearningRate 0.0066   Epoch: 14   Global Step: 84460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:22,384-Speed 3399.29 samples/sec   Loss 2.0125   LearningRate 0.0066   Epoch: 14   Global Step: 84470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:25,407-Speed 3388.96 samples/sec   Loss 2.0256   LearningRate 0.0066   Epoch: 14   Global Step: 84480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:28,422-Speed 3397.20 samples/sec   Loss 1.9854   LearningRate 0.0066   Epoch: 14   Global Step: 84490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:31,434-Speed 3400.06 samples/sec   Loss 1.9691   LearningRate 0.0066   Epoch: 14   Global Step: 84500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:34,444-Speed 3402.01 samples/sec   Loss 1.9805   LearningRate 0.0066   Epoch: 14   Global Step: 84510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:37,452-Speed 3405.57 samples/sec   Loss 2.0963   LearningRate 0.0066   Epoch: 14   Global Step: 84520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:40,467-Speed 3397.17 samples/sec   Loss 1.9659   LearningRate 0.0066   Epoch: 14   Global Step: 84530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:43,476-Speed 3404.50 samples/sec   Loss 2.0057   LearningRate 0.0066   Epoch: 14   Global Step: 84540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:46,490-Speed 3398.48 samples/sec   Loss 1.9065   LearningRate 0.0066   Epoch: 14   Global Step: 84550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:11:49,482-Speed 3422.62 samples/sec   Loss 2.0647   LearningRate 0.0066   Epoch: 14   Global Step: 84560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:52,507-Speed 3385.44 samples/sec   Loss 1.9857   LearningRate 0.0066   Epoch: 14   Global Step: 84570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:55,534-Speed 3384.96 samples/sec   Loss 1.9850   LearningRate 0.0066   Epoch: 14   Global Step: 84580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:11:58,542-Speed 3404.94 samples/sec   Loss 1.9638   LearningRate 0.0066   Epoch: 14   Global Step: 84590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:01,559-Speed 3394.60 samples/sec   Loss 1.9735   LearningRate 0.0066   Epoch: 14   Global Step: 84600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:04,570-Speed 3401.13 samples/sec   Loss 2.0173   LearningRate 0.0066   Epoch: 14   Global Step: 84610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:07,582-Speed 3401.07 samples/sec   Loss 1.9669   LearningRate 0.0065   Epoch: 14   Global Step: 84620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:10,600-Speed 3393.24 samples/sec   Loss 2.0311   LearningRate 0.0065   Epoch: 14   Global Step: 84630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:13,629-Speed 3381.97 samples/sec   Loss 1.9994   LearningRate 0.0065   Epoch: 14   Global Step: 84640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:16,644-Speed 3396.82 samples/sec   Loss 2.0273   LearningRate 0.0065   Epoch: 14   Global Step: 84650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:19,637-Speed 3422.26 samples/sec   Loss 1.9700   LearningRate 0.0065   Epoch: 14   Global Step: 84660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:22,650-Speed 3399.46 samples/sec   Loss 2.0217   LearningRate 0.0065   Epoch: 14   Global Step: 84670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:25,663-Speed 3399.19 samples/sec   Loss 1.9693   LearningRate 0.0065   Epoch: 14   Global Step: 84680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:28,678-Speed 3396.89 samples/sec   Loss 1.8946   LearningRate 0.0065   Epoch: 14   Global Step: 84690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:31,699-Speed 3390.98 samples/sec   Loss 1.9994   LearningRate 0.0065   Epoch: 14   Global Step: 84700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:34,712-Speed 3399.04 samples/sec   Loss 1.9270   LearningRate 0.0065   Epoch: 14   Global Step: 84710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:37,735-Speed 3388.76 samples/sec   Loss 1.9178   LearningRate 0.0065   Epoch: 14   Global Step: 84720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:40,749-Speed 3398.59 samples/sec   Loss 1.9102   LearningRate 0.0065   Epoch: 14   Global Step: 84730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:43,759-Speed 3402.26 samples/sec   Loss 1.9468   LearningRate 0.0065   Epoch: 14   Global Step: 84740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:46,772-Speed 3399.36 samples/sec   Loss 2.0810   LearningRate 0.0065   Epoch: 14   Global Step: 84750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:49,788-Speed 3395.90 samples/sec   Loss 1.9649   LearningRate 0.0065   Epoch: 14   Global Step: 84760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:12:52,783-Speed 3419.48 samples/sec   Loss 1.9801   LearningRate 0.0065   Epoch: 14   Global Step: 84770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:55,799-Speed 3396.93 samples/sec   Loss 2.0313   LearningRate 0.0065   Epoch: 14   Global Step: 84780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:12:58,816-Speed 3394.54 samples/sec   Loss 1.9944   LearningRate 0.0065   Epoch: 14   Global Step: 84790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:13:01,830-Speed 3397.99 samples/sec   Loss 1.9610   LearningRate 0.0065   Epoch: 14   Global Step: 84800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:13:04,836-Speed 3407.60 samples/sec   Loss 1.9884   LearningRate 0.0065   Epoch: 14   Global Step: 84810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:13:07,851-Speed 3397.16 samples/sec   Loss 1.9061   LearningRate 0.0065   Epoch: 14   Global Step: 84820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:13:10,867-Speed 3396.67 samples/sec   Loss 1.9747   LearningRate 0.0065   Epoch: 14   Global Step: 84830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:13:13,877-Speed 3402.73 samples/sec   Loss 2.0359   LearningRate 0.0064   Epoch: 14   Global Step: 84840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:13:16,890-Speed 3399.07 samples/sec   Loss 2.0096   LearningRate 0.0064   Epoch: 14   Global Step: 84850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:13:19,907-Speed 3395.42 samples/sec   Loss 2.0456   LearningRate 0.0064   Epoch: 14   Global Step: 84860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:13:22,920-Speed 3398.66 samples/sec   Loss 2.0463   LearningRate 0.0064   Epoch: 14   Global Step: 84870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:13:25,934-Speed 3399.32 samples/sec   Loss 1.9388   LearningRate 0.0064   Epoch: 14   Global Step: 84880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:13:28,955-Speed 3389.58 samples/sec   Loss 1.9988   LearningRate 0.0064   Epoch: 14   Global Step: 84890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:13:31,975-Speed 3392.28 samples/sec   Loss 1.9368   LearningRate 0.0064   Epoch: 14   Global Step: 84900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:13:34,991-Speed 3395.76 samples/sec   Loss 2.0065   LearningRate 0.0064   Epoch: 14   Global Step: 84910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:13:38,003-Speed 3401.05 samples/sec   Loss 2.0315   LearningRate 0.0064   Epoch: 14   Global Step: 84920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:13:41,016-Speed 3399.01 samples/sec   Loss 2.0641   LearningRate 0.0064   Epoch: 14   Global Step: 84930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:13:44,025-Speed 3403.80 samples/sec   Loss 2.0283   LearningRate 0.0064   Epoch: 14   Global Step: 84940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:13:47,052-Speed 3384.16 samples/sec   Loss 1.9925   LearningRate 0.0064   Epoch: 14   Global Step: 84950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:13:50,069-Speed 3394.36 samples/sec   Loss 2.0263   LearningRate 0.0064   Epoch: 14   Global Step: 84960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:13:53,085-Speed 3395.90 samples/sec   Loss 2.0125   LearningRate 0.0064   Epoch: 14   Global Step: 84970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:13:56,094-Speed 3404.74 samples/sec   Loss 2.0517   LearningRate 0.0064   Epoch: 14   Global Step: 84980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:13:59,104-Speed 3402.89 samples/sec   Loss 1.9334   LearningRate 0.0064   Epoch: 14   Global Step: 84990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:14:02,116-Speed 3400.09 samples/sec   Loss 1.9329   LearningRate 0.0064   Epoch: 14   Global Step: 85000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:14:05,149-Speed 3377.11 samples/sec   Loss 1.9364   LearningRate 0.0064   Epoch: 14   Global Step: 85010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:14:08,150-Speed 3413.22 samples/sec   Loss 2.0254   LearningRate 0.0064   Epoch: 14   Global Step: 85020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:14:11,166-Speed 3395.94 samples/sec   Loss 1.9222   LearningRate 0.0064   Epoch: 14   Global Step: 85030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:14:14,179-Speed 3400.15 samples/sec   Loss 1.9659   LearningRate 0.0064   Epoch: 14   Global Step: 85040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:14:17,195-Speed 3395.05 samples/sec   Loss 1.9735   LearningRate 0.0064   Epoch: 14   Global Step: 85050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:14:20,211-Speed 3396.29 samples/sec   Loss 1.9769   LearningRate 0.0064   Epoch: 14   Global Step: 85060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:14:23,221-Speed 3402.60 samples/sec   Loss 1.9997   LearningRate 0.0063   Epoch: 14   Global Step: 85070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:14:26,234-Speed 3400.51 samples/sec   Loss 1.9807   LearningRate 0.0063   Epoch: 14   Global Step: 85080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:14:29,248-Speed 3397.89 samples/sec   Loss 1.9763   LearningRate 0.0063   Epoch: 14   Global Step: 85090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:14:32,260-Speed 3400.31 samples/sec   Loss 2.0154   LearningRate 0.0063   Epoch: 14   Global Step: 85100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:14:35,276-Speed 3396.04 samples/sec   Loss 1.9170   LearningRate 0.0063   Epoch: 14   Global Step: 85110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:14:38,288-Speed 3400.52 samples/sec   Loss 1.8738   LearningRate 0.0063   Epoch: 14   Global Step: 85120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:14:41,301-Speed 3398.86 samples/sec   Loss 1.8638   LearningRate 0.0063   Epoch: 14   Global Step: 85130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:14:44,315-Speed 3398.47 samples/sec   Loss 2.0134   LearningRate 0.0063   Epoch: 14   Global Step: 85140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:14:47,332-Speed 3395.70 samples/sec   Loss 1.9286   LearningRate 0.0063   Epoch: 14   Global Step: 85150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:14:50,352-Speed 3391.06 samples/sec   Loss 1.9891   LearningRate 0.0063   Epoch: 14   Global Step: 85160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:14:53,371-Speed 3392.74 samples/sec   Loss 1.9681   LearningRate 0.0063   Epoch: 14   Global Step: 85170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:14:56,384-Speed 3399.82 samples/sec   Loss 1.9172   LearningRate 0.0063   Epoch: 14   Global Step: 85180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:14:59,399-Speed 3397.02 samples/sec   Loss 1.8849   LearningRate 0.0063   Epoch: 14   Global Step: 85190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:02,423-Speed 3386.26 samples/sec   Loss 2.0098   LearningRate 0.0063   Epoch: 14   Global Step: 85200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:05,442-Speed 3393.39 samples/sec   Loss 1.8999   LearningRate 0.0063   Epoch: 14   Global Step: 85210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:08,462-Speed 3391.14 samples/sec   Loss 1.9369   LearningRate 0.0063   Epoch: 14   Global Step: 85220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:11,479-Speed 3394.45 samples/sec   Loss 2.0581   LearningRate 0.0063   Epoch: 14   Global Step: 85230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:14,493-Speed 3398.51 samples/sec   Loss 2.0384   LearningRate 0.0063   Epoch: 14   Global Step: 85240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:17,507-Speed 3398.08 samples/sec   Loss 1.8645   LearningRate 0.0063   Epoch: 14   Global Step: 85250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:20,523-Speed 3396.57 samples/sec   Loss 1.9839   LearningRate 0.0063   Epoch: 14   Global Step: 85260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:23,558-Speed 3375.17 samples/sec   Loss 1.9740   LearningRate 0.0063   Epoch: 14   Global Step: 85270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:26,754-Speed 3204.79 samples/sec   Loss 1.9393   LearningRate 0.0063   Epoch: 14   Global Step: 85280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:29,745-Speed 3423.77 samples/sec   Loss 1.9709   LearningRate 0.0063   Epoch: 14   Global Step: 85290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:46,321-Speed 617.83 samples/sec   Loss 1.4859   LearningRate 0.0062   Epoch: 15   Global Step: 85300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:49,336-Speed 3397.95 samples/sec   Loss 1.3967   LearningRate 0.0062   Epoch: 15   Global Step: 85310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:52,391-Speed 3351.84 samples/sec   Loss 1.4759   LearningRate 0.0062   Epoch: 15   Global Step: 85320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:55,418-Speed 3384.45 samples/sec   Loss 1.4253   LearningRate 0.0062   Epoch: 15   Global Step: 85330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:15:58,437-Speed 3392.24 samples/sec   Loss 1.3946   LearningRate 0.0062   Epoch: 15   Global Step: 85340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:01,450-Speed 3400.54 samples/sec   Loss 1.4735   LearningRate 0.0062   Epoch: 15   Global Step: 85350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:04,481-Speed 3378.73 samples/sec   Loss 1.4373   LearningRate 0.0062   Epoch: 15   Global Step: 85360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:07,516-Speed 3375.01 samples/sec   Loss 1.3340   LearningRate 0.0062   Epoch: 15   Global Step: 85370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:10,532-Speed 3396.25 samples/sec   Loss 1.4235   LearningRate 0.0062   Epoch: 15   Global Step: 85380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:13,639-Speed 3296.54 samples/sec   Loss 1.4158   LearningRate 0.0062   Epoch: 15   Global Step: 85390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:16:16,659-Speed 3391.06 samples/sec   Loss 1.3531   LearningRate 0.0062   Epoch: 15   Global Step: 85400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:19,685-Speed 3385.17 samples/sec   Loss 1.5333   LearningRate 0.0062   Epoch: 15   Global Step: 85410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:22,714-Speed 3381.31 samples/sec   Loss 1.4386   LearningRate 0.0062   Epoch: 15   Global Step: 85420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:25,733-Speed 3392.51 samples/sec   Loss 1.5236   LearningRate 0.0062   Epoch: 15   Global Step: 85430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:28,802-Speed 3337.40 samples/sec   Loss 1.4402   LearningRate 0.0062   Epoch: 15   Global Step: 85440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:31,818-Speed 3396.09 samples/sec   Loss 1.3959   LearningRate 0.0062   Epoch: 15   Global Step: 85450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:34,844-Speed 3384.67 samples/sec   Loss 1.5307   LearningRate 0.0062   Epoch: 15   Global Step: 85460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:37,916-Speed 3334.67 samples/sec   Loss 1.4679   LearningRate 0.0062   Epoch: 15   Global Step: 85470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:41,085-Speed 3231.59 samples/sec   Loss 1.5547   LearningRate 0.0062   Epoch: 15   Global Step: 85480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:44,098-Speed 3399.84 samples/sec   Loss 1.5495   LearningRate 0.0062   Epoch: 15   Global Step: 85490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:47,111-Speed 3400.11 samples/sec   Loss 1.4535   LearningRate 0.0062   Epoch: 15   Global Step: 85500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:50,133-Speed 3389.39 samples/sec   Loss 1.4218   LearningRate 0.0062   Epoch: 15   Global Step: 85510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:53,156-Speed 3387.05 samples/sec   Loss 1.4565   LearningRate 0.0061   Epoch: 15   Global Step: 85520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:56,177-Speed 3391.23 samples/sec   Loss 1.5860   LearningRate 0.0061   Epoch: 15   Global Step: 85530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:16:59,209-Speed 3377.13 samples/sec   Loss 1.3869   LearningRate 0.0061   Epoch: 15   Global Step: 85540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:02,291-Speed 3323.60 samples/sec   Loss 1.4996   LearningRate 0.0061   Epoch: 15   Global Step: 85550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:05,355-Speed 3342.49 samples/sec   Loss 1.4511   LearningRate 0.0061   Epoch: 15   Global Step: 85560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:08,379-Speed 3387.44 samples/sec   Loss 1.3752   LearningRate 0.0061   Epoch: 15   Global Step: 85570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:11,401-Speed 3389.03 samples/sec   Loss 1.4982   LearningRate 0.0061   Epoch: 15   Global Step: 85580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:14,432-Speed 3380.00 samples/sec   Loss 1.4488   LearningRate 0.0061   Epoch: 15   Global Step: 85590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:17,443-Speed 3401.14 samples/sec   Loss 1.4307   LearningRate 0.0061   Epoch: 15   Global Step: 85600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:20,465-Speed 3389.37 samples/sec   Loss 1.4141   LearningRate 0.0061   Epoch: 15   Global Step: 85610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:23,488-Speed 3388.42 samples/sec   Loss 1.5647   LearningRate 0.0061   Epoch: 15   Global Step: 85620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:26,515-Speed 3383.47 samples/sec   Loss 1.3850   LearningRate 0.0061   Epoch: 15   Global Step: 85630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:29,542-Speed 3383.62 samples/sec   Loss 1.4294   LearningRate 0.0061   Epoch: 15   Global Step: 85640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:32,568-Speed 3385.57 samples/sec   Loss 1.4824   LearningRate 0.0061   Epoch: 15   Global Step: 85650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:35,593-Speed 3386.05 samples/sec   Loss 1.5053   LearningRate 0.0061   Epoch: 15   Global Step: 85660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:38,618-Speed 3385.54 samples/sec   Loss 1.5893   LearningRate 0.0061   Epoch: 15   Global Step: 85670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:41,654-Speed 3373.58 samples/sec   Loss 1.4146   LearningRate 0.0061   Epoch: 15   Global Step: 85680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:44,676-Speed 3389.44 samples/sec   Loss 1.4678   LearningRate 0.0061   Epoch: 15   Global Step: 85690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:17:47,701-Speed 3385.93 samples/sec   Loss 1.4813   LearningRate 0.0061   Epoch: 15   Global Step: 85700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:17:50,692-Speed 3424.33 samples/sec   Loss 1.4777   LearningRate 0.0061   Epoch: 15   Global Step: 85710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:17:53,724-Speed 3377.99 samples/sec   Loss 1.5258   LearningRate 0.0061   Epoch: 15   Global Step: 85720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:17:56,750-Speed 3384.63 samples/sec   Loss 1.5785   LearningRate 0.0061   Epoch: 15   Global Step: 85730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:17:59,777-Speed 3383.57 samples/sec   Loss 1.5311   LearningRate 0.0061   Epoch: 15   Global Step: 85740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:18:02,797-Speed 3391.78 samples/sec   Loss 1.3717   LearningRate 0.0060   Epoch: 15   Global Step: 85750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:18:05,827-Speed 3380.52 samples/sec   Loss 1.3866   LearningRate 0.0060   Epoch: 15   Global Step: 85760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:18:08,847-Speed 3392.01 samples/sec   Loss 1.4479   LearningRate 0.0060   Epoch: 15   Global Step: 85770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:18:11,869-Speed 3389.29 samples/sec   Loss 1.4519   LearningRate 0.0060   Epoch: 15   Global Step: 85780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:18:14,896-Speed 3382.84 samples/sec   Loss 1.5298   LearningRate 0.0060   Epoch: 15   Global Step: 85790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:18:17,922-Speed 3384.97 samples/sec   Loss 1.4214   LearningRate 0.0060   Epoch: 15   Global Step: 85800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:18:20,950-Speed 3382.84 samples/sec   Loss 1.5460   LearningRate 0.0060   Epoch: 15   Global Step: 85810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:18:23,971-Speed 3390.46 samples/sec   Loss 1.4316   LearningRate 0.0060   Epoch: 15   Global Step: 85820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:18:26,989-Speed 3394.40 samples/sec   Loss 1.4934   LearningRate 0.0060   Epoch: 15   Global Step: 85830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:18:30,016-Speed 3383.56 samples/sec   Loss 1.4944   LearningRate 0.0060   Epoch: 15   Global Step: 85840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:18:33,044-Speed 3382.78 samples/sec   Loss 1.5029   LearningRate 0.0060   Epoch: 15   Global Step: 85850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:18:36,087-Speed 3365.21 samples/sec   Loss 1.4857   LearningRate 0.0060   Epoch: 15   Global Step: 85860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:18:39,140-Speed 3355.54 samples/sec   Loss 1.4712   LearningRate 0.0060   Epoch: 15   Global Step: 85870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:18:42,163-Speed 3387.60 samples/sec   Loss 1.5386   LearningRate 0.0060   Epoch: 15   Global Step: 85880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:18:45,185-Speed 3389.61 samples/sec   Loss 1.4654   LearningRate 0.0060   Epoch: 15   Global Step: 85890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:18:48,210-Speed 3385.99 samples/sec   Loss 1.4517   LearningRate 0.0060   Epoch: 15   Global Step: 85900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:18:51,243-Speed 3376.83 samples/sec   Loss 1.5516   LearningRate 0.0060   Epoch: 15   Global Step: 85910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:18:54,250-Speed 3406.12 samples/sec   Loss 1.5173   LearningRate 0.0060   Epoch: 15   Global Step: 85920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:18:57,270-Speed 3391.47 samples/sec   Loss 1.5602   LearningRate 0.0060   Epoch: 15   Global Step: 85930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:19:00,296-Speed 3384.55 samples/sec   Loss 1.5593   LearningRate 0.0060   Epoch: 15   Global Step: 85940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:19:03,322-Speed 3385.50 samples/sec   Loss 1.5660   LearningRate 0.0060   Epoch: 15   Global Step: 85950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:19:06,345-Speed 3387.96 samples/sec   Loss 1.6122   LearningRate 0.0060   Epoch: 15   Global Step: 85960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:19:09,369-Speed 3387.08 samples/sec   Loss 1.4995   LearningRate 0.0060   Epoch: 15   Global Step: 85970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:19:12,396-Speed 3382.79 samples/sec   Loss 1.4977   LearningRate 0.0060   Epoch: 15   Global Step: 85980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:19:15,424-Speed 3383.34 samples/sec   Loss 1.5299   LearningRate 0.0059   Epoch: 15   Global Step: 85990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:19:18,459-Speed 3373.99 samples/sec   Loss 1.5473   LearningRate 0.0059   Epoch: 15   Global Step: 86000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:20:01,719-[lfw][86000]XNorm: 21.542649
Training: 2022-04-27 10:20:01,720-[lfw][86000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-04-27 10:20:01,720-[lfw][86000]Accuracy-Highest: 0.99817
Training: 2022-04-27 10:20:51,877-[cfp_fp][86000]XNorm: 20.821875
Training: 2022-04-27 10:20:51,878-[cfp_fp][86000]Accuracy-Flip: 0.97843+-0.00671
Training: 2022-04-27 10:20:51,878-[cfp_fp][86000]Accuracy-Highest: 0.97843
Training: 2022-04-27 10:21:35,061-[agedb_30][86000]XNorm: 21.790368
Training: 2022-04-27 10:21:35,062-[agedb_30][86000]Accuracy-Flip: 0.98117+-0.00667
Training: 2022-04-27 10:21:35,062-[agedb_30][86000]Accuracy-Highest: 0.98133
Training: 2022-04-27 10:21:38,102-Speed 73.33 samples/sec   Loss 1.4744   LearningRate 0.0059   Epoch: 15   Global Step: 86010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:21:41,095-Speed 3422.58 samples/sec   Loss 1.5950   LearningRate 0.0059   Epoch: 15   Global Step: 86020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:21:44,107-Speed 3401.05 samples/sec   Loss 1.4545   LearningRate 0.0059   Epoch: 15   Global Step: 86030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:21:47,120-Speed 3399.73 samples/sec   Loss 1.5157   LearningRate 0.0059   Epoch: 15   Global Step: 86040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:21:50,139-Speed 3391.83 samples/sec   Loss 1.4122   LearningRate 0.0059   Epoch: 15   Global Step: 86050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:21:53,161-Speed 3388.89 samples/sec   Loss 1.4399   LearningRate 0.0059   Epoch: 15   Global Step: 86060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:21:56,178-Speed 3395.31 samples/sec   Loss 1.4436   LearningRate 0.0059   Epoch: 15   Global Step: 86070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:21:59,212-Speed 3376.16 samples/sec   Loss 1.5344   LearningRate 0.0059   Epoch: 15   Global Step: 86080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:02,226-Speed 3398.35 samples/sec   Loss 1.4951   LearningRate 0.0059   Epoch: 15   Global Step: 86090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:05,249-Speed 3387.92 samples/sec   Loss 1.5301   LearningRate 0.0059   Epoch: 15   Global Step: 86100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:08,274-Speed 3385.88 samples/sec   Loss 1.5261   LearningRate 0.0059   Epoch: 15   Global Step: 86110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:11,325-Speed 3357.32 samples/sec   Loss 1.5562   LearningRate 0.0059   Epoch: 15   Global Step: 86120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:22:14,355-Speed 3381.13 samples/sec   Loss 1.5574   LearningRate 0.0059   Epoch: 15   Global Step: 86130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:17,449-Speed 3310.16 samples/sec   Loss 1.4477   LearningRate 0.0059   Epoch: 15   Global Step: 86140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:20,476-Speed 3383.53 samples/sec   Loss 1.5243   LearningRate 0.0059   Epoch: 15   Global Step: 86150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:23,500-Speed 3387.04 samples/sec   Loss 1.5315   LearningRate 0.0059   Epoch: 15   Global Step: 86160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:26,525-Speed 3385.67 samples/sec   Loss 1.5136   LearningRate 0.0059   Epoch: 15   Global Step: 86170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:29,552-Speed 3384.21 samples/sec   Loss 1.5110   LearningRate 0.0059   Epoch: 15   Global Step: 86180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:32,578-Speed 3384.06 samples/sec   Loss 1.5144   LearningRate 0.0059   Epoch: 15   Global Step: 86190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:35,606-Speed 3383.00 samples/sec   Loss 1.5902   LearningRate 0.0059   Epoch: 15   Global Step: 86200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:38,634-Speed 3382.63 samples/sec   Loss 1.5956   LearningRate 0.0059   Epoch: 15   Global Step: 86210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:41,660-Speed 3385.12 samples/sec   Loss 1.5530   LearningRate 0.0058   Epoch: 15   Global Step: 86220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:44,679-Speed 3393.18 samples/sec   Loss 1.4270   LearningRate 0.0058   Epoch: 15   Global Step: 86230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:47,702-Speed 3387.54 samples/sec   Loss 1.5653   LearningRate 0.0058   Epoch: 15   Global Step: 86240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:50,743-Speed 3368.86 samples/sec   Loss 1.4809   LearningRate 0.0058   Epoch: 15   Global Step: 86250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:53,769-Speed 3384.71 samples/sec   Loss 1.5979   LearningRate 0.0058   Epoch: 15   Global Step: 86260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:56,796-Speed 3383.09 samples/sec   Loss 1.5790   LearningRate 0.0058   Epoch: 15   Global Step: 86270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:22:59,831-Speed 3375.09 samples/sec   Loss 1.5410   LearningRate 0.0058   Epoch: 15   Global Step: 86280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:23:03,018-Speed 3213.96 samples/sec   Loss 1.5083   LearningRate 0.0058   Epoch: 15   Global Step: 86290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:23:06,098-Speed 3325.17 samples/sec   Loss 1.5463   LearningRate 0.0058   Epoch: 15   Global Step: 86300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:23:09,124-Speed 3384.95 samples/sec   Loss 1.5634   LearningRate 0.0058   Epoch: 15   Global Step: 86310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:23:12,151-Speed 3383.65 samples/sec   Loss 1.5763   LearningRate 0.0058   Epoch: 15   Global Step: 86320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:23:15,161-Speed 3402.96 samples/sec   Loss 1.5594   LearningRate 0.0058   Epoch: 15   Global Step: 86330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:23:18,183-Speed 3389.36 samples/sec   Loss 1.6538   LearningRate 0.0058   Epoch: 15   Global Step: 86340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:23:21,204-Speed 3390.02 samples/sec   Loss 1.5677   LearningRate 0.0058   Epoch: 15   Global Step: 86350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:23:24,228-Speed 3387.47 samples/sec   Loss 1.5334   LearningRate 0.0058   Epoch: 15   Global Step: 86360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:23:27,349-Speed 3280.94 samples/sec   Loss 1.5534   LearningRate 0.0058   Epoch: 15   Global Step: 86370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:23:30,516-Speed 3233.97 samples/sec   Loss 1.6135   LearningRate 0.0058   Epoch: 15   Global Step: 86380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:23:33,540-Speed 3388.21 samples/sec   Loss 1.5656   LearningRate 0.0058   Epoch: 15   Global Step: 86390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:23:36,562-Speed 3388.78 samples/sec   Loss 1.4985   LearningRate 0.0058   Epoch: 15   Global Step: 86400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:23:39,581-Speed 3392.33 samples/sec   Loss 1.5046   LearningRate 0.0058   Epoch: 15   Global Step: 86410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:23:42,581-Speed 3415.18 samples/sec   Loss 1.5446   LearningRate 0.0058   Epoch: 15   Global Step: 86420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:23:45,604-Speed 3387.53 samples/sec   Loss 1.5667   LearningRate 0.0058   Epoch: 15   Global Step: 86430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:23:48,629-Speed 3386.27 samples/sec   Loss 1.4914   LearningRate 0.0058   Epoch: 15   Global Step: 86440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:23:51,657-Speed 3381.75 samples/sec   Loss 1.5742   LearningRate 0.0058   Epoch: 15   Global Step: 86450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:23:54,681-Speed 3387.78 samples/sec   Loss 1.4021   LearningRate 0.0057   Epoch: 15   Global Step: 86460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:23:57,744-Speed 3343.65 samples/sec   Loss 1.5065   LearningRate 0.0057   Epoch: 15   Global Step: 86470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:24:00,788-Speed 3365.33 samples/sec   Loss 1.6607   LearningRate 0.0057   Epoch: 15   Global Step: 86480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:24:03,820-Speed 3377.56 samples/sec   Loss 1.5547   LearningRate 0.0057   Epoch: 15   Global Step: 86490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:24:06,839-Speed 3392.39 samples/sec   Loss 1.5914   LearningRate 0.0057   Epoch: 15   Global Step: 86500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:24:09,859-Speed 3391.88 samples/sec   Loss 1.6592   LearningRate 0.0057   Epoch: 15   Global Step: 86510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:24:12,882-Speed 3388.57 samples/sec   Loss 1.5679   LearningRate 0.0057   Epoch: 15   Global Step: 86520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:24:15,916-Speed 3375.67 samples/sec   Loss 1.6273   LearningRate 0.0057   Epoch: 15   Global Step: 86530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:24:18,931-Speed 3397.41 samples/sec   Loss 1.4877   LearningRate 0.0057   Epoch: 15   Global Step: 86540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:24:21,955-Speed 3386.66 samples/sec   Loss 1.5322   LearningRate 0.0057   Epoch: 15   Global Step: 86550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:24:24,990-Speed 3375.35 samples/sec   Loss 1.5513   LearningRate 0.0057   Epoch: 15   Global Step: 86560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:24:28,006-Speed 3395.37 samples/sec   Loss 1.6790   LearningRate 0.0057   Epoch: 15   Global Step: 86570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:24:31,022-Speed 3396.53 samples/sec   Loss 1.5215   LearningRate 0.0057   Epoch: 15   Global Step: 86580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:24:34,041-Speed 3392.89 samples/sec   Loss 1.5916   LearningRate 0.0057   Epoch: 15   Global Step: 86590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:24:37,060-Speed 3392.93 samples/sec   Loss 1.6378   LearningRate 0.0057   Epoch: 15   Global Step: 86600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:24:40,076-Speed 3395.07 samples/sec   Loss 1.6171   LearningRate 0.0057   Epoch: 15   Global Step: 86610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:24:43,096-Speed 3391.39 samples/sec   Loss 1.5732   LearningRate 0.0057   Epoch: 15   Global Step: 86620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:24:46,100-Speed 3410.38 samples/sec   Loss 1.5993   LearningRate 0.0057   Epoch: 15   Global Step: 86630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:24:49,126-Speed 3384.60 samples/sec   Loss 1.4955   LearningRate 0.0057   Epoch: 15   Global Step: 86640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:24:52,153-Speed 3384.31 samples/sec   Loss 1.5942   LearningRate 0.0057   Epoch: 15   Global Step: 86650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:24:55,167-Speed 3397.76 samples/sec   Loss 1.5584   LearningRate 0.0057   Epoch: 15   Global Step: 86660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:24:58,185-Speed 3393.38 samples/sec   Loss 1.5902   LearningRate 0.0057   Epoch: 15   Global Step: 86670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:01,210-Speed 3386.03 samples/sec   Loss 1.4975   LearningRate 0.0057   Epoch: 15   Global Step: 86680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:04,265-Speed 3353.17 samples/sec   Loss 1.5660   LearningRate 0.0056   Epoch: 15   Global Step: 86690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:07,284-Speed 3391.78 samples/sec   Loss 1.6133   LearningRate 0.0056   Epoch: 15   Global Step: 86700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:10,316-Speed 3378.99 samples/sec   Loss 1.6046   LearningRate 0.0056   Epoch: 15   Global Step: 86710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:13,380-Speed 3342.02 samples/sec   Loss 1.5492   LearningRate 0.0056   Epoch: 15   Global Step: 86720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:16,401-Speed 3391.05 samples/sec   Loss 1.5923   LearningRate 0.0056   Epoch: 15   Global Step: 86730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:19,425-Speed 3387.17 samples/sec   Loss 1.5883   LearningRate 0.0056   Epoch: 15   Global Step: 86740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:22,453-Speed 3382.75 samples/sec   Loss 1.5417   LearningRate 0.0056   Epoch: 15   Global Step: 86750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:25,597-Speed 3257.20 samples/sec   Loss 1.5727   LearningRate 0.0056   Epoch: 15   Global Step: 86760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:28,614-Speed 3394.59 samples/sec   Loss 1.5216   LearningRate 0.0056   Epoch: 15   Global Step: 86770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:31,638-Speed 3388.11 samples/sec   Loss 1.4687   LearningRate 0.0056   Epoch: 15   Global Step: 86780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:34,654-Speed 3395.08 samples/sec   Loss 1.5749   LearningRate 0.0056   Epoch: 15   Global Step: 86790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:37,681-Speed 3383.41 samples/sec   Loss 1.5914   LearningRate 0.0056   Epoch: 15   Global Step: 86800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:40,703-Speed 3389.18 samples/sec   Loss 1.5311   LearningRate 0.0056   Epoch: 15   Global Step: 86810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:43,732-Speed 3382.34 samples/sec   Loss 1.5769   LearningRate 0.0056   Epoch: 15   Global Step: 86820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:46,757-Speed 3386.03 samples/sec   Loss 1.6556   LearningRate 0.0056   Epoch: 15   Global Step: 86830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:25:49,814-Speed 3349.91 samples/sec   Loss 1.6062   LearningRate 0.0056   Epoch: 15   Global Step: 86840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:52,977-Speed 3238.17 samples/sec   Loss 1.4731   LearningRate 0.0056   Epoch: 15   Global Step: 86850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:56,014-Speed 3372.68 samples/sec   Loss 1.6328   LearningRate 0.0056   Epoch: 15   Global Step: 86860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:25:59,040-Speed 3385.09 samples/sec   Loss 1.6599   LearningRate 0.0056   Epoch: 15   Global Step: 86870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:02,062-Speed 3388.61 samples/sec   Loss 1.5928   LearningRate 0.0056   Epoch: 15   Global Step: 86880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:05,097-Speed 3375.14 samples/sec   Loss 1.6765   LearningRate 0.0056   Epoch: 15   Global Step: 86890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:08,127-Speed 3379.94 samples/sec   Loss 1.5602   LearningRate 0.0056   Epoch: 15   Global Step: 86900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:11,147-Speed 3391.49 samples/sec   Loss 1.5812   LearningRate 0.0056   Epoch: 15   Global Step: 86910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:14,169-Speed 3390.26 samples/sec   Loss 1.6327   LearningRate 0.0056   Epoch: 15   Global Step: 86920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:17,191-Speed 3389.20 samples/sec   Loss 1.6155   LearningRate 0.0055   Epoch: 15   Global Step: 86930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:20,201-Speed 3403.71 samples/sec   Loss 1.5218   LearningRate 0.0055   Epoch: 15   Global Step: 86940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:23,235-Speed 3375.22 samples/sec   Loss 1.6235   LearningRate 0.0055   Epoch: 15   Global Step: 86950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:26,263-Speed 3383.15 samples/sec   Loss 1.5942   LearningRate 0.0055   Epoch: 15   Global Step: 86960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:29,290-Speed 3383.35 samples/sec   Loss 1.5518   LearningRate 0.0055   Epoch: 15   Global Step: 86970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:32,318-Speed 3382.47 samples/sec   Loss 1.5455   LearningRate 0.0055   Epoch: 15   Global Step: 86980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:35,347-Speed 3381.41 samples/sec   Loss 1.6409   LearningRate 0.0055   Epoch: 15   Global Step: 86990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:38,378-Speed 3378.94 samples/sec   Loss 1.5652   LearningRate 0.0055   Epoch: 15   Global Step: 87000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:41,402-Speed 3387.35 samples/sec   Loss 1.5713   LearningRate 0.0055   Epoch: 15   Global Step: 87010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:44,424-Speed 3389.08 samples/sec   Loss 1.6285   LearningRate 0.0055   Epoch: 15   Global Step: 87020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:47,453-Speed 3381.56 samples/sec   Loss 1.5093   LearningRate 0.0055   Epoch: 15   Global Step: 87030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:50,481-Speed 3381.96 samples/sec   Loss 1.5724   LearningRate 0.0055   Epoch: 15   Global Step: 87040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:26:53,491-Speed 3403.85 samples/sec   Loss 1.5701   LearningRate 0.0055   Epoch: 15   Global Step: 87050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:56,540-Speed 3359.18 samples/sec   Loss 1.6433   LearningRate 0.0055   Epoch: 15   Global Step: 87060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:26:59,574-Speed 3375.10 samples/sec   Loss 1.5937   LearningRate 0.0055   Epoch: 15   Global Step: 87070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:02,618-Speed 3364.91 samples/sec   Loss 1.5353   LearningRate 0.0055   Epoch: 15   Global Step: 87080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:05,671-Speed 3355.10 samples/sec   Loss 1.6973   LearningRate 0.0055   Epoch: 15   Global Step: 87090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:08,695-Speed 3387.04 samples/sec   Loss 1.6117   LearningRate 0.0055   Epoch: 15   Global Step: 87100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:11,792-Speed 3307.56 samples/sec   Loss 1.6089   LearningRate 0.0055   Epoch: 15   Global Step: 87110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:14,818-Speed 3384.24 samples/sec   Loss 1.5426   LearningRate 0.0055   Epoch: 15   Global Step: 87120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:17,862-Speed 3365.22 samples/sec   Loss 1.5963   LearningRate 0.0055   Epoch: 15   Global Step: 87130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:20,891-Speed 3381.31 samples/sec   Loss 1.6475   LearningRate 0.0055   Epoch: 15   Global Step: 87140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:23,899-Speed 3405.30 samples/sec   Loss 1.6090   LearningRate 0.0055   Epoch: 15   Global Step: 87150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:26,932-Speed 3376.53 samples/sec   Loss 1.6238   LearningRate 0.0055   Epoch: 15   Global Step: 87160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:29,966-Speed 3376.31 samples/sec   Loss 1.5499   LearningRate 0.0055   Epoch: 15   Global Step: 87170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:33,009-Speed 3365.59 samples/sec   Loss 1.5713   LearningRate 0.0054   Epoch: 15   Global Step: 87180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:36,035-Speed 3385.46 samples/sec   Loss 1.6522   LearningRate 0.0054   Epoch: 15   Global Step: 87190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:39,091-Speed 3351.04 samples/sec   Loss 1.6803   LearningRate 0.0054   Epoch: 15   Global Step: 87200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:43,150-Speed 2523.21 samples/sec   Loss 1.6224   LearningRate 0.0054   Epoch: 15   Global Step: 87210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:46,192-Speed 3367.07 samples/sec   Loss 1.6390   LearningRate 0.0054   Epoch: 15   Global Step: 87220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:49,281-Speed 3315.75 samples/sec   Loss 1.5524   LearningRate 0.0054   Epoch: 15   Global Step: 87230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:52,343-Speed 3345.07 samples/sec   Loss 1.5492   LearningRate 0.0054   Epoch: 15   Global Step: 87240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:27:55,380-Speed 3372.10 samples/sec   Loss 1.6383   LearningRate 0.0054   Epoch: 15   Global Step: 87250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:27:58,403-Speed 3389.26 samples/sec   Loss 1.5171   LearningRate 0.0054   Epoch: 15   Global Step: 87260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:01,430-Speed 3383.34 samples/sec   Loss 1.6026   LearningRate 0.0054   Epoch: 15   Global Step: 87270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:04,455-Speed 3385.99 samples/sec   Loss 1.6173   LearningRate 0.0054   Epoch: 15   Global Step: 87280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:07,486-Speed 3379.65 samples/sec   Loss 1.6114   LearningRate 0.0054   Epoch: 15   Global Step: 87290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:10,515-Speed 3381.17 samples/sec   Loss 1.6423   LearningRate 0.0054   Epoch: 15   Global Step: 87300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:13,542-Speed 3383.05 samples/sec   Loss 1.6154   LearningRate 0.0054   Epoch: 15   Global Step: 87310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:16,574-Speed 3378.08 samples/sec   Loss 1.5884   LearningRate 0.0054   Epoch: 15   Global Step: 87320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:19,597-Speed 3387.97 samples/sec   Loss 1.6416   LearningRate 0.0054   Epoch: 15   Global Step: 87330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:22,625-Speed 3382.79 samples/sec   Loss 1.5158   LearningRate 0.0054   Epoch: 15   Global Step: 87340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:25,658-Speed 3376.83 samples/sec   Loss 1.5815   LearningRate 0.0054   Epoch: 15   Global Step: 87350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:28,683-Speed 3386.93 samples/sec   Loss 1.5446   LearningRate 0.0054   Epoch: 15   Global Step: 87360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:28:31,690-Speed 3405.48 samples/sec   Loss 1.5387   LearningRate 0.0054   Epoch: 15   Global Step: 87370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:34,724-Speed 3376.25 samples/sec   Loss 1.6602   LearningRate 0.0054   Epoch: 15   Global Step: 87380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:37,757-Speed 3376.85 samples/sec   Loss 1.4831   LearningRate 0.0054   Epoch: 15   Global Step: 87390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:40,788-Speed 3378.53 samples/sec   Loss 1.5671   LearningRate 0.0054   Epoch: 15   Global Step: 87400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:43,816-Speed 3382.72 samples/sec   Loss 1.6067   LearningRate 0.0054   Epoch: 15   Global Step: 87410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:46,847-Speed 3379.58 samples/sec   Loss 1.5513   LearningRate 0.0053   Epoch: 15   Global Step: 87420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:49,875-Speed 3382.00 samples/sec   Loss 1.7207   LearningRate 0.0053   Epoch: 15   Global Step: 87430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:52,910-Speed 3374.85 samples/sec   Loss 1.6437   LearningRate 0.0053   Epoch: 15   Global Step: 87440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:55,940-Speed 3380.71 samples/sec   Loss 1.5862   LearningRate 0.0053   Epoch: 15   Global Step: 87450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:28:58,994-Speed 3353.80 samples/sec   Loss 1.5504   LearningRate 0.0053   Epoch: 15   Global Step: 87460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:29:02,097-Speed 3301.33 samples/sec   Loss 1.6292   LearningRate 0.0053   Epoch: 15   Global Step: 87470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:29:05,108-Speed 3401.41 samples/sec   Loss 1.4840   LearningRate 0.0053   Epoch: 15   Global Step: 87480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:29:08,134-Speed 3384.38 samples/sec   Loss 1.5584   LearningRate 0.0053   Epoch: 15   Global Step: 87490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:29:11,161-Speed 3383.71 samples/sec   Loss 1.6630   LearningRate 0.0053   Epoch: 15   Global Step: 87500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:29:14,193-Speed 3378.04 samples/sec   Loss 1.7001   LearningRate 0.0053   Epoch: 15   Global Step: 87510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:29:17,227-Speed 3376.17 samples/sec   Loss 1.5827   LearningRate 0.0053   Epoch: 15   Global Step: 87520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:29:20,259-Speed 3377.79 samples/sec   Loss 1.5978   LearningRate 0.0053   Epoch: 15   Global Step: 87530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:29:23,289-Speed 3380.86 samples/sec   Loss 1.5291   LearningRate 0.0053   Epoch: 15   Global Step: 87540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:29:26,321-Speed 3377.49 samples/sec   Loss 1.5276   LearningRate 0.0053   Epoch: 15   Global Step: 87550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:29:29,355-Speed 3376.88 samples/sec   Loss 1.5937   LearningRate 0.0053   Epoch: 15   Global Step: 87560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:29:32,379-Speed 3386.04 samples/sec   Loss 1.5851   LearningRate 0.0053   Epoch: 15   Global Step: 87570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:29:35,405-Speed 3384.76 samples/sec   Loss 1.5532   LearningRate 0.0053   Epoch: 15   Global Step: 87580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:29:38,429-Speed 3386.88 samples/sec   Loss 1.6109   LearningRate 0.0053   Epoch: 15   Global Step: 87590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:29:41,454-Speed 3386.37 samples/sec   Loss 1.6297   LearningRate 0.0053   Epoch: 15   Global Step: 87600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:29:44,485-Speed 3379.02 samples/sec   Loss 1.6277   LearningRate 0.0053   Epoch: 15   Global Step: 87610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:29:47,536-Speed 3357.17 samples/sec   Loss 1.5422   LearningRate 0.0053   Epoch: 15   Global Step: 87620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:29:50,617-Speed 3324.29 samples/sec   Loss 1.5849   LearningRate 0.0053   Epoch: 15   Global Step: 87630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:29:53,793-Speed 3225.28 samples/sec   Loss 1.5303   LearningRate 0.0053   Epoch: 15   Global Step: 87640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:29:56,820-Speed 3383.49 samples/sec   Loss 1.5705   LearningRate 0.0053   Epoch: 15   Global Step: 87650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:29:59,860-Speed 3369.82 samples/sec   Loss 1.6224   LearningRate 0.0053   Epoch: 15   Global Step: 87660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:30:02,899-Speed 3369.98 samples/sec   Loss 1.7015   LearningRate 0.0052   Epoch: 15   Global Step: 87670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:30:05,931-Speed 3377.80 samples/sec   Loss 1.5707   LearningRate 0.0052   Epoch: 15   Global Step: 87680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:30:08,935-Speed 3409.26 samples/sec   Loss 1.5940   LearningRate 0.0052   Epoch: 15   Global Step: 87690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:30:11,977-Speed 3367.20 samples/sec   Loss 1.6113   LearningRate 0.0052   Epoch: 15   Global Step: 87700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:30:15,011-Speed 3376.47 samples/sec   Loss 1.4478   LearningRate 0.0052   Epoch: 15   Global Step: 87710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:30:18,078-Speed 3339.30 samples/sec   Loss 1.5984   LearningRate 0.0052   Epoch: 15   Global Step: 87720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:30:21,102-Speed 3386.78 samples/sec   Loss 1.6082   LearningRate 0.0052   Epoch: 15   Global Step: 87730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:30:24,129-Speed 3384.17 samples/sec   Loss 1.6324   LearningRate 0.0052   Epoch: 15   Global Step: 87740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:30:27,140-Speed 3401.86 samples/sec   Loss 1.6577   LearningRate 0.0052   Epoch: 15   Global Step: 87750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:30:30,164-Speed 3386.39 samples/sec   Loss 1.5609   LearningRate 0.0052   Epoch: 15   Global Step: 87760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:30:33,188-Speed 3387.94 samples/sec   Loss 1.6255   LearningRate 0.0052   Epoch: 15   Global Step: 87770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:30:36,239-Speed 3356.90 samples/sec   Loss 1.5500   LearningRate 0.0052   Epoch: 15   Global Step: 87780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:30:39,259-Speed 3391.03 samples/sec   Loss 1.5829   LearningRate 0.0052   Epoch: 15   Global Step: 87790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:30:42,287-Speed 3382.66 samples/sec   Loss 1.5983   LearningRate 0.0052   Epoch: 15   Global Step: 87800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:30:45,311-Speed 3387.26 samples/sec   Loss 1.6840   LearningRate 0.0052   Epoch: 15   Global Step: 87810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:30:48,336-Speed 3386.57 samples/sec   Loss 1.6602   LearningRate 0.0052   Epoch: 15   Global Step: 87820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:30:51,371-Speed 3373.86 samples/sec   Loss 1.6534   LearningRate 0.0052   Epoch: 15   Global Step: 87830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:30:54,413-Speed 3367.76 samples/sec   Loss 1.6560   LearningRate 0.0052   Epoch: 15   Global Step: 87840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:30:57,443-Speed 3380.30 samples/sec   Loss 1.6310   LearningRate 0.0052   Epoch: 15   Global Step: 87850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:00,469-Speed 3384.64 samples/sec   Loss 1.5851   LearningRate 0.0052   Epoch: 15   Global Step: 87860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:03,499-Speed 3380.35 samples/sec   Loss 1.6391   LearningRate 0.0052   Epoch: 15   Global Step: 87870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:06,537-Speed 3370.88 samples/sec   Loss 1.4551   LearningRate 0.0052   Epoch: 15   Global Step: 87880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:09,568-Speed 3380.21 samples/sec   Loss 1.6156   LearningRate 0.0052   Epoch: 15   Global Step: 87890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:12,591-Speed 3388.37 samples/sec   Loss 1.6567   LearningRate 0.0052   Epoch: 15   Global Step: 87900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:15,612-Speed 3389.84 samples/sec   Loss 1.6608   LearningRate 0.0052   Epoch: 15   Global Step: 87910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:18,645-Speed 3376.53 samples/sec   Loss 1.5519   LearningRate 0.0051   Epoch: 15   Global Step: 87920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:21,668-Speed 3388.49 samples/sec   Loss 1.5856   LearningRate 0.0051   Epoch: 15   Global Step: 87930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:24,708-Speed 3368.66 samples/sec   Loss 1.5691   LearningRate 0.0051   Epoch: 15   Global Step: 87940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:27,752-Speed 3365.04 samples/sec   Loss 1.5105   LearningRate 0.0051   Epoch: 15   Global Step: 87950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:30,784-Speed 3378.09 samples/sec   Loss 1.5307   LearningRate 0.0051   Epoch: 15   Global Step: 87960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:33,852-Speed 3338.79 samples/sec   Loss 1.5671   LearningRate 0.0051   Epoch: 15   Global Step: 87970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:36,888-Speed 3373.80 samples/sec   Loss 1.5978   LearningRate 0.0051   Epoch: 15   Global Step: 87980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:39,913-Speed 3386.00 samples/sec   Loss 1.6198   LearningRate 0.0051   Epoch: 15   Global Step: 87990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:31:42,941-Speed 3382.51 samples/sec   Loss 1.6323   LearningRate 0.0051   Epoch: 15   Global Step: 88000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:32:26,129-[lfw][88000]XNorm: 21.941218
Training: 2022-04-27 10:32:26,129-[lfw][88000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-27 10:32:26,130-[lfw][88000]Accuracy-Highest: 0.99817
Training: 2022-04-27 10:33:16,244-[cfp_fp][88000]XNorm: 21.148112
Training: 2022-04-27 10:33:16,245-[cfp_fp][88000]Accuracy-Flip: 0.98257+-0.00530
Training: 2022-04-27 10:33:16,245-[cfp_fp][88000]Accuracy-Highest: 0.98257
Training: 2022-04-27 10:33:59,544-[agedb_30][88000]XNorm: 22.234802
Training: 2022-04-27 10:33:59,545-[agedb_30][88000]Accuracy-Flip: 0.97917+-0.00807
Training: 2022-04-27 10:33:59,545-[agedb_30][88000]Accuracy-Highest: 0.98133
Training: 2022-04-27 10:34:02,563-Speed 73.34 samples/sec   Loss 1.4934   LearningRate 0.0051   Epoch: 15   Global Step: 88010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:05,567-Speed 3408.71 samples/sec   Loss 1.5938   LearningRate 0.0051   Epoch: 15   Global Step: 88020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:08,583-Speed 3396.41 samples/sec   Loss 1.4990   LearningRate 0.0051   Epoch: 15   Global Step: 88030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:11,592-Speed 3403.63 samples/sec   Loss 1.5828   LearningRate 0.0051   Epoch: 15   Global Step: 88040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:14,617-Speed 3385.89 samples/sec   Loss 1.6293   LearningRate 0.0051   Epoch: 15   Global Step: 88050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:34:17,614-Speed 3418.02 samples/sec   Loss 1.6837   LearningRate 0.0051   Epoch: 15   Global Step: 88060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:20,635-Speed 3390.04 samples/sec   Loss 1.5367   LearningRate 0.0051   Epoch: 15   Global Step: 88070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:23,651-Speed 3396.53 samples/sec   Loss 1.6277   LearningRate 0.0051   Epoch: 15   Global Step: 88080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:26,665-Speed 3398.00 samples/sec   Loss 1.5439   LearningRate 0.0051   Epoch: 15   Global Step: 88090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:29,684-Speed 3392.72 samples/sec   Loss 1.7030   LearningRate 0.0051   Epoch: 15   Global Step: 88100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:32,698-Speed 3397.90 samples/sec   Loss 1.5379   LearningRate 0.0051   Epoch: 15   Global Step: 88110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:35,735-Speed 3372.69 samples/sec   Loss 1.6235   LearningRate 0.0051   Epoch: 15   Global Step: 88120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:38,750-Speed 3397.18 samples/sec   Loss 1.5665   LearningRate 0.0051   Epoch: 15   Global Step: 88130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:41,764-Speed 3398.01 samples/sec   Loss 1.5753   LearningRate 0.0051   Epoch: 15   Global Step: 88140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:44,779-Speed 3396.73 samples/sec   Loss 1.5127   LearningRate 0.0051   Epoch: 15   Global Step: 88150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:47,782-Speed 3411.07 samples/sec   Loss 1.4788   LearningRate 0.0051   Epoch: 15   Global Step: 88160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:50,797-Speed 3396.94 samples/sec   Loss 1.6584   LearningRate 0.0050   Epoch: 15   Global Step: 88170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:53,812-Speed 3397.33 samples/sec   Loss 1.5218   LearningRate 0.0050   Epoch: 15   Global Step: 88180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:56,829-Speed 3395.35 samples/sec   Loss 1.6262   LearningRate 0.0050   Epoch: 15   Global Step: 88190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:34:59,843-Speed 3398.02 samples/sec   Loss 1.5104   LearningRate 0.0050   Epoch: 15   Global Step: 88200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:35:02,868-Speed 3386.36 samples/sec   Loss 1.6174   LearningRate 0.0050   Epoch: 15   Global Step: 88210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:35:05,886-Speed 3392.80 samples/sec   Loss 1.6389   LearningRate 0.0050   Epoch: 15   Global Step: 88220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:35:08,911-Speed 3386.22 samples/sec   Loss 1.6163   LearningRate 0.0050   Epoch: 15   Global Step: 88230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:35:11,936-Speed 3386.62 samples/sec   Loss 1.7125   LearningRate 0.0050   Epoch: 15   Global Step: 88240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:35:14,949-Speed 3398.62 samples/sec   Loss 1.5865   LearningRate 0.0050   Epoch: 15   Global Step: 88250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:35:17,974-Speed 3386.21 samples/sec   Loss 1.6252   LearningRate 0.0050   Epoch: 15   Global Step: 88260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:35:20,992-Speed 3393.93 samples/sec   Loss 1.6026   LearningRate 0.0050   Epoch: 15   Global Step: 88270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:35:24,016-Speed 3387.11 samples/sec   Loss 1.4805   LearningRate 0.0050   Epoch: 15   Global Step: 88280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:35:27,028-Speed 3400.57 samples/sec   Loss 1.5606   LearningRate 0.0050   Epoch: 15   Global Step: 88290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:35:30,038-Speed 3402.00 samples/sec   Loss 1.6676   LearningRate 0.0050   Epoch: 15   Global Step: 88300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:35:33,053-Speed 3397.70 samples/sec   Loss 1.6206   LearningRate 0.0050   Epoch: 15   Global Step: 88310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:35:36,083-Speed 3380.84 samples/sec   Loss 1.5974   LearningRate 0.0050   Epoch: 15   Global Step: 88320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:35:39,096-Speed 3398.34 samples/sec   Loss 1.6643   LearningRate 0.0050   Epoch: 15   Global Step: 88330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:35:42,122-Speed 3385.41 samples/sec   Loss 1.4757   LearningRate 0.0050   Epoch: 15   Global Step: 88340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:35:45,143-Speed 3390.57 samples/sec   Loss 1.5958   LearningRate 0.0050   Epoch: 15   Global Step: 88350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:35:48,166-Speed 3387.61 samples/sec   Loss 1.5383   LearningRate 0.0050   Epoch: 15   Global Step: 88360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:35:51,180-Speed 3398.44 samples/sec   Loss 1.6458   LearningRate 0.0050   Epoch: 15   Global Step: 88370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:35:54,194-Speed 3398.58 samples/sec   Loss 1.5559   LearningRate 0.0050   Epoch: 15   Global Step: 88380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:35:57,206-Speed 3400.16 samples/sec   Loss 1.6235   LearningRate 0.0050   Epoch: 15   Global Step: 88390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:00,222-Speed 3396.45 samples/sec   Loss 1.5190   LearningRate 0.0050   Epoch: 15   Global Step: 88400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:03,218-Speed 3418.92 samples/sec   Loss 1.6227   LearningRate 0.0050   Epoch: 15   Global Step: 88410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:06,233-Speed 3397.38 samples/sec   Loss 1.6180   LearningRate 0.0049   Epoch: 15   Global Step: 88420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:09,252-Speed 3391.82 samples/sec   Loss 1.6148   LearningRate 0.0049   Epoch: 15   Global Step: 88430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:12,284-Speed 3380.06 samples/sec   Loss 1.4630   LearningRate 0.0049   Epoch: 15   Global Step: 88440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:15,323-Speed 3370.73 samples/sec   Loss 1.5377   LearningRate 0.0049   Epoch: 15   Global Step: 88450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:18,337-Speed 3397.85 samples/sec   Loss 1.6281   LearningRate 0.0049   Epoch: 15   Global Step: 88460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:21,347-Speed 3402.83 samples/sec   Loss 1.6743   LearningRate 0.0049   Epoch: 15   Global Step: 88470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:24,359-Speed 3400.25 samples/sec   Loss 1.6182   LearningRate 0.0049   Epoch: 15   Global Step: 88480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:27,473-Speed 3289.11 samples/sec   Loss 1.6027   LearningRate 0.0049   Epoch: 15   Global Step: 88490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:30,493-Speed 3391.87 samples/sec   Loss 1.5852   LearningRate 0.0049   Epoch: 15   Global Step: 88500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:33,489-Speed 3417.71 samples/sec   Loss 1.6310   LearningRate 0.0049   Epoch: 15   Global Step: 88510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:36,511-Speed 3389.81 samples/sec   Loss 1.5899   LearningRate 0.0049   Epoch: 15   Global Step: 88520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:39,540-Speed 3381.49 samples/sec   Loss 1.6472   LearningRate 0.0049   Epoch: 15   Global Step: 88530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:42,587-Speed 3362.06 samples/sec   Loss 1.5710   LearningRate 0.0049   Epoch: 15   Global Step: 88540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:45,601-Speed 3398.68 samples/sec   Loss 1.5701   LearningRate 0.0049   Epoch: 15   Global Step: 88550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:48,617-Speed 3395.94 samples/sec   Loss 1.6149   LearningRate 0.0049   Epoch: 15   Global Step: 88560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:51,646-Speed 3380.75 samples/sec   Loss 1.5975   LearningRate 0.0049   Epoch: 15   Global Step: 88570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:54,659-Speed 3399.39 samples/sec   Loss 1.5936   LearningRate 0.0049   Epoch: 15   Global Step: 88580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:36:57,671-Speed 3400.33 samples/sec   Loss 1.5810   LearningRate 0.0049   Epoch: 15   Global Step: 88590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:00,753-Speed 3323.54 samples/sec   Loss 1.5369   LearningRate 0.0049   Epoch: 15   Global Step: 88600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:03,765-Speed 3400.59 samples/sec   Loss 1.7694   LearningRate 0.0049   Epoch: 15   Global Step: 88610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:06,789-Speed 3386.31 samples/sec   Loss 1.6209   LearningRate 0.0049   Epoch: 15   Global Step: 88620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:09,822-Speed 3377.91 samples/sec   Loss 1.5191   LearningRate 0.0049   Epoch: 15   Global Step: 88630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:12,835-Speed 3399.58 samples/sec   Loss 1.5781   LearningRate 0.0049   Epoch: 15   Global Step: 88640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:15,848-Speed 3399.30 samples/sec   Loss 1.5871   LearningRate 0.0049   Epoch: 15   Global Step: 88650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:18,865-Speed 3394.66 samples/sec   Loss 1.5709   LearningRate 0.0049   Epoch: 15   Global Step: 88660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:21,879-Speed 3398.58 samples/sec   Loss 1.5176   LearningRate 0.0049   Epoch: 15   Global Step: 88670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:24,992-Speed 3289.44 samples/sec   Loss 1.6016   LearningRate 0.0048   Epoch: 15   Global Step: 88680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:28,057-Speed 3342.25 samples/sec   Loss 1.5309   LearningRate 0.0048   Epoch: 15   Global Step: 88690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:31,149-Speed 3312.67 samples/sec   Loss 1.5851   LearningRate 0.0048   Epoch: 15   Global Step: 88700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:34,146-Speed 3416.61 samples/sec   Loss 1.5036   LearningRate 0.0048   Epoch: 15   Global Step: 88710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:37,180-Speed 3377.24 samples/sec   Loss 1.5949   LearningRate 0.0048   Epoch: 15   Global Step: 88720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:40,199-Speed 3393.15 samples/sec   Loss 1.5681   LearningRate 0.0048   Epoch: 15   Global Step: 88730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:43,215-Speed 3396.38 samples/sec   Loss 1.6351   LearningRate 0.0048   Epoch: 15   Global Step: 88740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:46,230-Speed 3396.36 samples/sec   Loss 1.6005   LearningRate 0.0048   Epoch: 15   Global Step: 88750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:49,378-Speed 3253.70 samples/sec   Loss 1.6195   LearningRate 0.0048   Epoch: 15   Global Step: 88760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:52,493-Speed 3288.07 samples/sec   Loss 1.6260   LearningRate 0.0048   Epoch: 15   Global Step: 88770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:55,538-Speed 3363.94 samples/sec   Loss 1.5569   LearningRate 0.0048   Epoch: 15   Global Step: 88780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:37:58,578-Speed 3369.03 samples/sec   Loss 1.6519   LearningRate 0.0048   Epoch: 15   Global Step: 88790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:38:01,599-Speed 3390.31 samples/sec   Loss 1.5504   LearningRate 0.0048   Epoch: 15   Global Step: 88800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:38:04,606-Speed 3405.54 samples/sec   Loss 1.5817   LearningRate 0.0048   Epoch: 15   Global Step: 88810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:38:07,637-Speed 3379.98 samples/sec   Loss 1.6296   LearningRate 0.0048   Epoch: 15   Global Step: 88820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:38:10,666-Speed 3381.52 samples/sec   Loss 1.6996   LearningRate 0.0048   Epoch: 15   Global Step: 88830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:38:13,688-Speed 3389.68 samples/sec   Loss 1.5785   LearningRate 0.0048   Epoch: 15   Global Step: 88840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:38:16,758-Speed 3335.59 samples/sec   Loss 1.5560   LearningRate 0.0048   Epoch: 15   Global Step: 88850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:38:19,778-Speed 3391.43 samples/sec   Loss 1.6168   LearningRate 0.0048   Epoch: 15   Global Step: 88860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:38:22,796-Speed 3394.49 samples/sec   Loss 1.6047   LearningRate 0.0048   Epoch: 15   Global Step: 88870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:38:25,794-Speed 3416.19 samples/sec   Loss 1.6301   LearningRate 0.0048   Epoch: 15   Global Step: 88880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:38:28,811-Speed 3393.94 samples/sec   Loss 1.5607   LearningRate 0.0048   Epoch: 15   Global Step: 88890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:38:31,829-Speed 3395.31 samples/sec   Loss 1.5583   LearningRate 0.0048   Epoch: 15   Global Step: 88900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:38:34,879-Speed 3357.80 samples/sec   Loss 1.6130   LearningRate 0.0048   Epoch: 15   Global Step: 88910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:38:37,901-Speed 3389.52 samples/sec   Loss 1.5157   LearningRate 0.0048   Epoch: 15   Global Step: 88920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:38:40,922-Speed 3389.59 samples/sec   Loss 1.5620   LearningRate 0.0048   Epoch: 15   Global Step: 88930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:38:43,943-Speed 3390.33 samples/sec   Loss 1.6359   LearningRate 0.0047   Epoch: 15   Global Step: 88940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:38:46,974-Speed 3379.63 samples/sec   Loss 1.6093   LearningRate 0.0047   Epoch: 15   Global Step: 88950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:38:50,011-Speed 3372.15 samples/sec   Loss 1.6498   LearningRate 0.0047   Epoch: 15   Global Step: 88960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:38:53,036-Speed 3385.98 samples/sec   Loss 1.5594   LearningRate 0.0047   Epoch: 15   Global Step: 88970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 10:38:56,052-Speed 3396.51 samples/sec   Loss 1.5632   LearningRate 0.0047   Epoch: 15   Global Step: 88980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:38:59,068-Speed 3395.99 samples/sec   Loss 1.5704   LearningRate 0.0047   Epoch: 15   Global Step: 88990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:02,179-Speed 3291.83 samples/sec   Loss 1.6079   LearningRate 0.0047   Epoch: 15   Global Step: 89000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:05,203-Speed 3387.51 samples/sec   Loss 1.5333   LearningRate 0.0047   Epoch: 15   Global Step: 89010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:08,231-Speed 3382.54 samples/sec   Loss 1.6557   LearningRate 0.0047   Epoch: 15   Global Step: 89020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:11,248-Speed 3394.20 samples/sec   Loss 1.6109   LearningRate 0.0047   Epoch: 15   Global Step: 89030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:14,324-Speed 3330.26 samples/sec   Loss 1.4790   LearningRate 0.0047   Epoch: 15   Global Step: 89040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:17,346-Speed 3389.32 samples/sec   Loss 1.5912   LearningRate 0.0047   Epoch: 15   Global Step: 89050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:20,374-Speed 3381.76 samples/sec   Loss 1.6479   LearningRate 0.0047   Epoch: 15   Global Step: 89060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:23,481-Speed 3296.89 samples/sec   Loss 1.4959   LearningRate 0.0047   Epoch: 15   Global Step: 89070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:26,531-Speed 3358.71 samples/sec   Loss 1.6438   LearningRate 0.0047   Epoch: 15   Global Step: 89080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:29,553-Speed 3389.64 samples/sec   Loss 1.6775   LearningRate 0.0047   Epoch: 15   Global Step: 89090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:32,569-Speed 3395.52 samples/sec   Loss 1.5353   LearningRate 0.0047   Epoch: 15   Global Step: 89100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:35,588-Speed 3392.33 samples/sec   Loss 1.5693   LearningRate 0.0047   Epoch: 15   Global Step: 89110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:38,609-Speed 3390.63 samples/sec   Loss 1.5834   LearningRate 0.0047   Epoch: 15   Global Step: 89120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:41,632-Speed 3388.23 samples/sec   Loss 1.5653   LearningRate 0.0047   Epoch: 15   Global Step: 89130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:44,655-Speed 3387.91 samples/sec   Loss 1.6013   LearningRate 0.0047   Epoch: 15   Global Step: 89140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:47,678-Speed 3388.38 samples/sec   Loss 1.5908   LearningRate 0.0047   Epoch: 15   Global Step: 89150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:50,710-Speed 3377.41 samples/sec   Loss 1.5830   LearningRate 0.0047   Epoch: 15   Global Step: 89160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:53,863-Speed 3249.15 samples/sec   Loss 1.6577   LearningRate 0.0047   Epoch: 15   Global Step: 89170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:39:56,903-Speed 3369.53 samples/sec   Loss 1.4824   LearningRate 0.0047   Epoch: 15   Global Step: 89180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:39:59,967-Speed 3342.94 samples/sec   Loss 1.6118   LearningRate 0.0047   Epoch: 15   Global Step: 89190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-27 10:40:02,969-Speed 3411.46 samples/sec   Loss 1.5786   LearningRate 0.0046   Epoch: 15   Global Step: 89200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:05,988-Speed 3393.10 samples/sec   Loss 1.5660   LearningRate 0.0046   Epoch: 15   Global Step: 89210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:09,005-Speed 3394.29 samples/sec   Loss 1.5625   LearningRate 0.0046   Epoch: 15   Global Step: 89220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:12,026-Speed 3389.86 samples/sec   Loss 1.6323   LearningRate 0.0046   Epoch: 15   Global Step: 89230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:15,063-Speed 3372.70 samples/sec   Loss 1.5471   LearningRate 0.0046   Epoch: 15   Global Step: 89240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:18,094-Speed 3380.52 samples/sec   Loss 1.5823   LearningRate 0.0046   Epoch: 15   Global Step: 89250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:21,108-Speed 3397.27 samples/sec   Loss 1.7151   LearningRate 0.0046   Epoch: 15   Global Step: 89260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:24,133-Speed 3386.73 samples/sec   Loss 1.5951   LearningRate 0.0046   Epoch: 15   Global Step: 89270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:27,149-Speed 3396.13 samples/sec   Loss 1.5857   LearningRate 0.0046   Epoch: 15   Global Step: 89280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:30,165-Speed 3395.53 samples/sec   Loss 1.6314   LearningRate 0.0046   Epoch: 15   Global Step: 89290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:33,167-Speed 3411.96 samples/sec   Loss 1.5156   LearningRate 0.0046   Epoch: 15   Global Step: 89300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:36,197-Speed 3379.86 samples/sec   Loss 1.5014   LearningRate 0.0046   Epoch: 15   Global Step: 89310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:39,219-Speed 3389.11 samples/sec   Loss 1.5376   LearningRate 0.0046   Epoch: 15   Global Step: 89320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:42,240-Speed 3391.26 samples/sec   Loss 1.6009   LearningRate 0.0046   Epoch: 15   Global Step: 89330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:45,256-Speed 3395.85 samples/sec   Loss 1.5456   LearningRate 0.0046   Epoch: 15   Global Step: 89340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:48,271-Speed 3397.75 samples/sec   Loss 1.5160   LearningRate 0.0046   Epoch: 15   Global Step: 89350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:51,289-Speed 3393.41 samples/sec   Loss 1.5434   LearningRate 0.0046   Epoch: 15   Global Step: 89360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:54,312-Speed 3388.89 samples/sec   Loss 1.5408   LearningRate 0.0046   Epoch: 15   Global Step: 89370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:40:57,337-Speed 3385.11 samples/sec   Loss 1.6257   LearningRate 0.0046   Epoch: 15   Global Step: 89380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:41:00,383-Speed 3363.15 samples/sec   Loss 1.5900   LearningRate 0.0046   Epoch: 15   Global Step: 89390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:41:03,599-Speed 3184.88 samples/sec   Loss 1.5349   LearningRate 0.0046   Epoch: 15   Global Step: 89400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:41:06,619-Speed 3391.26 samples/sec   Loss 1.6348   LearningRate 0.0046   Epoch: 15   Global Step: 89410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:41:09,644-Speed 3385.24 samples/sec   Loss 1.5433   LearningRate 0.0046   Epoch: 15   Global Step: 89420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:41:12,675-Speed 3379.15 samples/sec   Loss 1.5869   LearningRate 0.0046   Epoch: 15   Global Step: 89430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:41:15,699-Speed 3387.54 samples/sec   Loss 1.6618   LearningRate 0.0046   Epoch: 15   Global Step: 89440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:41:18,717-Speed 3393.63 samples/sec   Loss 1.5495   LearningRate 0.0046   Epoch: 15   Global Step: 89450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 10:41:21,739-Speed 3389.97 samples/sec   Loss 1.5060   LearningRate 0.0046   Epoch: 15   Global Step: 89460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:41:24,778-Speed 3369.62 samples/sec   Loss 1.5827   LearningRate 0.0045   Epoch: 15   Global Step: 89470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:41:27,799-Speed 3391.23 samples/sec   Loss 1.6015   LearningRate 0.0045   Epoch: 15   Global Step: 89480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:41:30,818-Speed 3391.73 samples/sec   Loss 1.5580   LearningRate 0.0045   Epoch: 15   Global Step: 89490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:41:33,844-Speed 3385.47 samples/sec   Loss 1.6199   LearningRate 0.0045   Epoch: 15   Global Step: 89500   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:41:36,849-Speed 3407.77 samples/sec   Loss 1.5462   LearningRate 0.0045   Epoch: 15   Global Step: 89510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:41:39,869-Speed 3392.07 samples/sec   Loss 1.6482   LearningRate 0.0045   Epoch: 15   Global Step: 89520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:41:42,897-Speed 3382.52 samples/sec   Loss 1.5367   LearningRate 0.0045   Epoch: 15   Global Step: 89530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:41:45,929-Speed 3377.78 samples/sec   Loss 1.6248   LearningRate 0.0045   Epoch: 15   Global Step: 89540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:41:49,011-Speed 3323.49 samples/sec   Loss 1.5882   LearningRate 0.0045   Epoch: 15   Global Step: 89550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:41:52,018-Speed 3406.81 samples/sec   Loss 1.5237   LearningRate 0.0045   Epoch: 15   Global Step: 89560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:41:55,039-Speed 3389.69 samples/sec   Loss 1.4964   LearningRate 0.0045   Epoch: 15   Global Step: 89570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:41:58,086-Speed 3362.31 samples/sec   Loss 1.6260   LearningRate 0.0045   Epoch: 15   Global Step: 89580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:42:01,120-Speed 3375.07 samples/sec   Loss 1.5017   LearningRate 0.0045   Epoch: 15   Global Step: 89590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:42:04,159-Speed 3371.00 samples/sec   Loss 1.5960   LearningRate 0.0045   Epoch: 15   Global Step: 89600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:42:07,188-Speed 3381.47 samples/sec   Loss 1.6329   LearningRate 0.0045   Epoch: 15   Global Step: 89610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:42:10,212-Speed 3387.49 samples/sec   Loss 1.5154   LearningRate 0.0045   Epoch: 15   Global Step: 89620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:42:13,234-Speed 3388.70 samples/sec   Loss 1.5075   LearningRate 0.0045   Epoch: 15   Global Step: 89630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:42:16,253-Speed 3392.59 samples/sec   Loss 1.5227   LearningRate 0.0045   Epoch: 15   Global Step: 89640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:42:19,276-Speed 3388.46 samples/sec   Loss 1.6283   LearningRate 0.0045   Epoch: 15   Global Step: 89650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:42:22,297-Speed 3389.92 samples/sec   Loss 1.5515   LearningRate 0.0045   Epoch: 15   Global Step: 89660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:42:25,319-Speed 3389.84 samples/sec   Loss 1.5765   LearningRate 0.0045   Epoch: 15   Global Step: 89670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:42:28,347-Speed 3382.44 samples/sec   Loss 1.5833   LearningRate 0.0045   Epoch: 15   Global Step: 89680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:42:31,373-Speed 3385.06 samples/sec   Loss 1.5694   LearningRate 0.0045   Epoch: 15   Global Step: 89690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:42:34,391-Speed 3393.29 samples/sec   Loss 1.5323   LearningRate 0.0045   Epoch: 15   Global Step: 89700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:42:37,413-Speed 3389.10 samples/sec   Loss 1.4869   LearningRate 0.0045   Epoch: 15   Global Step: 89710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:42:40,474-Speed 3346.73 samples/sec   Loss 1.5506   LearningRate 0.0045   Epoch: 15   Global Step: 89720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:42:43,501-Speed 3383.93 samples/sec   Loss 1.5518   LearningRate 0.0045   Epoch: 15   Global Step: 89730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:42:46,523-Speed 3389.05 samples/sec   Loss 1.6140   LearningRate 0.0044   Epoch: 15   Global Step: 89740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:42:49,543-Speed 3390.88 samples/sec   Loss 1.4716   LearningRate 0.0044   Epoch: 15   Global Step: 89750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:42:52,565-Speed 3389.84 samples/sec   Loss 1.6395   LearningRate 0.0044   Epoch: 15   Global Step: 89760   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:42:55,568-Speed 3410.02 samples/sec   Loss 1.6047   LearningRate 0.0044   Epoch: 15   Global Step: 89770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:42:58,606-Speed 3372.30 samples/sec   Loss 1.4694   LearningRate 0.0044   Epoch: 15   Global Step: 89780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:43:01,633-Speed 3383.47 samples/sec   Loss 1.5588   LearningRate 0.0044   Epoch: 15   Global Step: 89790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:43:04,698-Speed 3341.75 samples/sec   Loss 1.5536   LearningRate 0.0044   Epoch: 15   Global Step: 89800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:43:07,725-Speed 3383.77 samples/sec   Loss 1.5227   LearningRate 0.0044   Epoch: 15   Global Step: 89810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:43:10,783-Speed 3349.10 samples/sec   Loss 1.5074   LearningRate 0.0044   Epoch: 15   Global Step: 89820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:43:13,810-Speed 3384.17 samples/sec   Loss 1.4583   LearningRate 0.0044   Epoch: 15   Global Step: 89830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:43:16,840-Speed 3379.46 samples/sec   Loss 1.5786   LearningRate 0.0044   Epoch: 15   Global Step: 89840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:43:19,863-Speed 3388.76 samples/sec   Loss 1.5845   LearningRate 0.0044   Epoch: 15   Global Step: 89850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:43:22,885-Speed 3389.33 samples/sec   Loss 1.5939   LearningRate 0.0044   Epoch: 15   Global Step: 89860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:43:25,896-Speed 3401.34 samples/sec   Loss 1.5513   LearningRate 0.0044   Epoch: 15   Global Step: 89870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:43:28,948-Speed 3355.79 samples/sec   Loss 1.5559   LearningRate 0.0044   Epoch: 15   Global Step: 89880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:43:31,975-Speed 3384.25 samples/sec   Loss 1.5765   LearningRate 0.0044   Epoch: 15   Global Step: 89890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:43:35,009-Speed 3376.43 samples/sec   Loss 1.4988   LearningRate 0.0044   Epoch: 15   Global Step: 89900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:43:38,013-Speed 3409.31 samples/sec   Loss 1.5299   LearningRate 0.0044   Epoch: 15   Global Step: 89910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:43:41,036-Speed 3387.51 samples/sec   Loss 1.5585   LearningRate 0.0044   Epoch: 15   Global Step: 89920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:43:44,061-Speed 3386.10 samples/sec   Loss 1.6002   LearningRate 0.0044   Epoch: 15   Global Step: 89930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:43:47,089-Speed 3382.32 samples/sec   Loss 1.5384   LearningRate 0.0044   Epoch: 15   Global Step: 89940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:43:50,118-Speed 3381.93 samples/sec   Loss 1.5846   LearningRate 0.0044   Epoch: 15   Global Step: 89950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:43:53,151-Speed 3376.48 samples/sec   Loss 1.6068   LearningRate 0.0044   Epoch: 15   Global Step: 89960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:43:56,175-Speed 3386.77 samples/sec   Loss 1.5463   LearningRate 0.0044   Epoch: 15   Global Step: 89970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:43:59,197-Speed 3389.69 samples/sec   Loss 1.4849   LearningRate 0.0044   Epoch: 15   Global Step: 89980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:44:02,220-Speed 3388.17 samples/sec   Loss 1.5819   LearningRate 0.0044   Epoch: 15   Global Step: 89990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:44:05,248-Speed 3382.71 samples/sec   Loss 1.4688   LearningRate 0.0044   Epoch: 15   Global Step: 90000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:44:48,746-[lfw][90000]XNorm: 21.617160
Training: 2022-04-27 10:44:48,747-[lfw][90000]Accuracy-Flip: 0.99817+-0.00252
Training: 2022-04-27 10:44:48,747-[lfw][90000]Accuracy-Highest: 0.99817
Training: 2022-04-27 10:45:39,218-[cfp_fp][90000]XNorm: 21.238212
Training: 2022-04-27 10:45:39,219-[cfp_fp][90000]Accuracy-Flip: 0.98029+-0.00753
Training: 2022-04-27 10:45:39,219-[cfp_fp][90000]Accuracy-Highest: 0.98257
Training: 2022-04-27 10:46:22,799-[agedb_30][90000]XNorm: 22.096798
Training: 2022-04-27 10:46:22,800-[agedb_30][90000]Accuracy-Flip: 0.98017+-0.00751
Training: 2022-04-27 10:46:22,801-[agedb_30][90000]Accuracy-Highest: 0.98133
Training: 2022-04-27 10:46:25,814-Speed 72.85 samples/sec   Loss 1.6027   LearningRate 0.0043   Epoch: 15   Global Step: 90010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:46:28,817-Speed 3410.53 samples/sec   Loss 1.6096   LearningRate 0.0043   Epoch: 15   Global Step: 90020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:46:31,828-Speed 3401.96 samples/sec   Loss 1.5861   LearningRate 0.0043   Epoch: 15   Global Step: 90030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:46:34,840-Speed 3400.36 samples/sec   Loss 1.4919   LearningRate 0.0043   Epoch: 15   Global Step: 90040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:46:37,849-Speed 3402.80 samples/sec   Loss 1.5338   LearningRate 0.0043   Epoch: 15   Global Step: 90050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:46:40,865-Speed 3396.85 samples/sec   Loss 1.5723   LearningRate 0.0043   Epoch: 15   Global Step: 90060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:46:43,881-Speed 3395.97 samples/sec   Loss 1.6864   LearningRate 0.0043   Epoch: 15   Global Step: 90070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:46:46,902-Speed 3389.95 samples/sec   Loss 1.5314   LearningRate 0.0043   Epoch: 15   Global Step: 90080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:46:49,927-Speed 3386.19 samples/sec   Loss 1.6422   LearningRate 0.0043   Epoch: 15   Global Step: 90090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:46:52,945-Speed 3393.80 samples/sec   Loss 1.5587   LearningRate 0.0043   Epoch: 15   Global Step: 90100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:46:55,961-Speed 3396.57 samples/sec   Loss 1.6120   LearningRate 0.0043   Epoch: 15   Global Step: 90110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:46:58,977-Speed 3396.25 samples/sec   Loss 1.4965   LearningRate 0.0043   Epoch: 15   Global Step: 90120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:47:02,015-Speed 3370.74 samples/sec   Loss 1.5256   LearningRate 0.0043   Epoch: 15   Global Step: 90130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:47:05,039-Speed 3387.62 samples/sec   Loss 1.4954   LearningRate 0.0043   Epoch: 15   Global Step: 90140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:47:08,063-Speed 3387.06 samples/sec   Loss 1.6125   LearningRate 0.0043   Epoch: 15   Global Step: 90150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:11,087-Speed 3386.52 samples/sec   Loss 1.4547   LearningRate 0.0043   Epoch: 15   Global Step: 90160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:14,116-Speed 3381.56 samples/sec   Loss 1.6163   LearningRate 0.0043   Epoch: 15   Global Step: 90170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:17,144-Speed 3382.56 samples/sec   Loss 1.5189   LearningRate 0.0043   Epoch: 15   Global Step: 90180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:20,172-Speed 3382.53 samples/sec   Loss 1.4897   LearningRate 0.0043   Epoch: 15   Global Step: 90190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:23,206-Speed 3376.31 samples/sec   Loss 1.4590   LearningRate 0.0043   Epoch: 15   Global Step: 90200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:26,350-Speed 3257.51 samples/sec   Loss 1.5721   LearningRate 0.0043   Epoch: 15   Global Step: 90210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:29,481-Speed 3270.90 samples/sec   Loss 1.5345   LearningRate 0.0043   Epoch: 15   Global Step: 90220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:32,505-Speed 3387.18 samples/sec   Loss 1.5185   LearningRate 0.0043   Epoch: 15   Global Step: 90230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:35,528-Speed 3388.65 samples/sec   Loss 1.5751   LearningRate 0.0043   Epoch: 15   Global Step: 90240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:38,553-Speed 3385.50 samples/sec   Loss 1.5056   LearningRate 0.0043   Epoch: 15   Global Step: 90250   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:47:41,555-Speed 3412.04 samples/sec   Loss 1.4816   LearningRate 0.0043   Epoch: 15   Global Step: 90260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:44,572-Speed 3394.30 samples/sec   Loss 1.5779   LearningRate 0.0043   Epoch: 15   Global Step: 90270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:47,602-Speed 3381.38 samples/sec   Loss 1.6437   LearningRate 0.0042   Epoch: 15   Global Step: 90280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:50,618-Speed 3395.90 samples/sec   Loss 1.5296   LearningRate 0.0042   Epoch: 15   Global Step: 90290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:53,638-Speed 3391.91 samples/sec   Loss 1.5329   LearningRate 0.0042   Epoch: 15   Global Step: 90300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:56,655-Speed 3394.28 samples/sec   Loss 1.6609   LearningRate 0.0042   Epoch: 15   Global Step: 90310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:47:59,679-Speed 3387.27 samples/sec   Loss 1.4752   LearningRate 0.0042   Epoch: 15   Global Step: 90320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:02,695-Speed 3395.48 samples/sec   Loss 1.6082   LearningRate 0.0042   Epoch: 15   Global Step: 90330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:05,709-Speed 3398.22 samples/sec   Loss 1.6235   LearningRate 0.0042   Epoch: 15   Global Step: 90340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:08,728-Speed 3393.13 samples/sec   Loss 1.5991   LearningRate 0.0042   Epoch: 15   Global Step: 90350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:11,808-Speed 3324.68 samples/sec   Loss 1.5314   LearningRate 0.0042   Epoch: 15   Global Step: 90360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:14,940-Speed 3271.43 samples/sec   Loss 1.5396   LearningRate 0.0042   Epoch: 15   Global Step: 90370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:17,948-Speed 3404.93 samples/sec   Loss 1.5293   LearningRate 0.0042   Epoch: 15   Global Step: 90380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:20,957-Speed 3403.51 samples/sec   Loss 1.4683   LearningRate 0.0042   Epoch: 15   Global Step: 90390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:23,968-Speed 3402.58 samples/sec   Loss 1.5035   LearningRate 0.0042   Epoch: 15   Global Step: 90400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:26,983-Speed 3396.86 samples/sec   Loss 1.5586   LearningRate 0.0042   Epoch: 15   Global Step: 90410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:29,995-Speed 3399.83 samples/sec   Loss 1.5668   LearningRate 0.0042   Epoch: 15   Global Step: 90420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:33,018-Speed 3388.46 samples/sec   Loss 1.6052   LearningRate 0.0042   Epoch: 15   Global Step: 90430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:36,041-Speed 3387.72 samples/sec   Loss 1.5386   LearningRate 0.0042   Epoch: 15   Global Step: 90440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:39,184-Speed 3258.98 samples/sec   Loss 1.5378   LearningRate 0.0042   Epoch: 15   Global Step: 90450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:42,217-Speed 3377.26 samples/sec   Loss 1.5975   LearningRate 0.0042   Epoch: 15   Global Step: 90460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:45,224-Speed 3406.47 samples/sec   Loss 1.5723   LearningRate 0.0042   Epoch: 15   Global Step: 90470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:48,233-Speed 3403.79 samples/sec   Loss 1.5486   LearningRate 0.0042   Epoch: 15   Global Step: 90480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:51,243-Speed 3402.42 samples/sec   Loss 1.4823   LearningRate 0.0042   Epoch: 15   Global Step: 90490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:54,261-Speed 3394.10 samples/sec   Loss 1.5988   LearningRate 0.0042   Epoch: 15   Global Step: 90500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:48:57,277-Speed 3396.42 samples/sec   Loss 1.5390   LearningRate 0.0042   Epoch: 15   Global Step: 90510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:00,291-Speed 3397.70 samples/sec   Loss 1.6033   LearningRate 0.0042   Epoch: 15   Global Step: 90520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:03,302-Speed 3402.26 samples/sec   Loss 1.6021   LearningRate 0.0042   Epoch: 15   Global Step: 90530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:06,339-Speed 3372.69 samples/sec   Loss 1.5335   LearningRate 0.0042   Epoch: 15   Global Step: 90540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:09,353-Speed 3397.86 samples/sec   Loss 1.5270   LearningRate 0.0042   Epoch: 15   Global Step: 90550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:12,373-Speed 3391.14 samples/sec   Loss 1.5674   LearningRate 0.0041   Epoch: 15   Global Step: 90560   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:49:15,370-Speed 3417.47 samples/sec   Loss 1.5404   LearningRate 0.0041   Epoch: 15   Global Step: 90570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:18,378-Speed 3405.57 samples/sec   Loss 1.5688   LearningRate 0.0041   Epoch: 15   Global Step: 90580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:21,387-Speed 3404.45 samples/sec   Loss 1.5961   LearningRate 0.0041   Epoch: 15   Global Step: 90590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:24,413-Speed 3384.99 samples/sec   Loss 1.6255   LearningRate 0.0041   Epoch: 15   Global Step: 90600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:27,445-Speed 3377.14 samples/sec   Loss 1.6120   LearningRate 0.0041   Epoch: 15   Global Step: 90610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:30,777-Speed 3073.52 samples/sec   Loss 1.4914   LearningRate 0.0041   Epoch: 15   Global Step: 90620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:33,793-Speed 3396.90 samples/sec   Loss 1.4572   LearningRate 0.0041   Epoch: 15   Global Step: 90630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:36,816-Speed 3387.45 samples/sec   Loss 1.5048   LearningRate 0.0041   Epoch: 15   Global Step: 90640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:39,831-Speed 3397.38 samples/sec   Loss 1.5210   LearningRate 0.0041   Epoch: 15   Global Step: 90650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:42,853-Speed 3389.23 samples/sec   Loss 1.5270   LearningRate 0.0041   Epoch: 15   Global Step: 90660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:45,865-Speed 3401.29 samples/sec   Loss 1.5486   LearningRate 0.0041   Epoch: 15   Global Step: 90670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:49:48,892-Speed 3383.58 samples/sec   Loss 1.5824   LearningRate 0.0041   Epoch: 15   Global Step: 90680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:51,910-Speed 3393.91 samples/sec   Loss 1.5434   LearningRate 0.0041   Epoch: 15   Global Step: 90690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:54,936-Speed 3384.43 samples/sec   Loss 1.5871   LearningRate 0.0041   Epoch: 15   Global Step: 90700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:49:57,965-Speed 3381.61 samples/sec   Loss 1.4611   LearningRate 0.0041   Epoch: 15   Global Step: 90710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:00,981-Speed 3395.68 samples/sec   Loss 1.4632   LearningRate 0.0041   Epoch: 15   Global Step: 90720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:03,992-Speed 3402.46 samples/sec   Loss 1.6067   LearningRate 0.0041   Epoch: 15   Global Step: 90730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:07,007-Speed 3397.42 samples/sec   Loss 1.5176   LearningRate 0.0041   Epoch: 15   Global Step: 90740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:10,069-Speed 3345.06 samples/sec   Loss 1.5961   LearningRate 0.0041   Epoch: 15   Global Step: 90750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:13,084-Speed 3396.87 samples/sec   Loss 1.5335   LearningRate 0.0041   Epoch: 15   Global Step: 90760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:16,095-Speed 3401.92 samples/sec   Loss 1.6518   LearningRate 0.0041   Epoch: 15   Global Step: 90770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:19,111-Speed 3395.71 samples/sec   Loss 1.5320   LearningRate 0.0041   Epoch: 15   Global Step: 90780   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:50:22,109-Speed 3417.62 samples/sec   Loss 1.4586   LearningRate 0.0041   Epoch: 15   Global Step: 90790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:25,125-Speed 3395.04 samples/sec   Loss 1.6333   LearningRate 0.0041   Epoch: 15   Global Step: 90800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:28,136-Speed 3401.44 samples/sec   Loss 1.5230   LearningRate 0.0041   Epoch: 15   Global Step: 90810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:31,155-Speed 3393.11 samples/sec   Loss 1.5588   LearningRate 0.0041   Epoch: 15   Global Step: 90820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:34,169-Speed 3398.27 samples/sec   Loss 1.5486   LearningRate 0.0041   Epoch: 15   Global Step: 90830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:37,184-Speed 3397.83 samples/sec   Loss 1.4725   LearningRate 0.0040   Epoch: 15   Global Step: 90840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:40,221-Speed 3371.57 samples/sec   Loss 1.5950   LearningRate 0.0040   Epoch: 15   Global Step: 90850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:43,246-Speed 3386.26 samples/sec   Loss 1.6629   LearningRate 0.0040   Epoch: 15   Global Step: 90860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:50:46,243-Speed 3417.31 samples/sec   Loss 1.5737   LearningRate 0.0040   Epoch: 15   Global Step: 90870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:50:49,306-Speed 3343.44 samples/sec   Loss 1.4966   LearningRate 0.0040   Epoch: 15   Global Step: 90880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:50:52,323-Speed 3395.29 samples/sec   Loss 1.6125   LearningRate 0.0040   Epoch: 15   Global Step: 90890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:50:55,340-Speed 3394.66 samples/sec   Loss 1.5931   LearningRate 0.0040   Epoch: 15   Global Step: 90900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:50:58,360-Speed 3392.18 samples/sec   Loss 1.4636   LearningRate 0.0040   Epoch: 15   Global Step: 90910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:51:01,401-Speed 3367.54 samples/sec   Loss 1.5840   LearningRate 0.0040   Epoch: 15   Global Step: 90920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:51:04,442-Speed 3369.27 samples/sec   Loss 1.5816   LearningRate 0.0040   Epoch: 15   Global Step: 90930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:51:07,455-Speed 3399.14 samples/sec   Loss 1.5637   LearningRate 0.0040   Epoch: 15   Global Step: 90940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:51:10,468-Speed 3398.55 samples/sec   Loss 1.5739   LearningRate 0.0040   Epoch: 15   Global Step: 90950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:51:13,484-Speed 3396.39 samples/sec   Loss 1.5486   LearningRate 0.0040   Epoch: 15   Global Step: 90960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 10:51:16,584-Speed 3304.17 samples/sec   Loss 1.5311   LearningRate 0.0040   Epoch: 15   Global Step: 90970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:51:30,847-Speed 717.98 samples/sec   Loss 1.4254   LearningRate 0.0040   Epoch: 16   Global Step: 90980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:51:33,866-Speed 3393.20 samples/sec   Loss 1.1047   LearningRate 0.0040   Epoch: 16   Global Step: 90990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:51:36,886-Speed 3391.27 samples/sec   Loss 1.1057   LearningRate 0.0040   Epoch: 16   Global Step: 91000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:51:39,920-Speed 3376.34 samples/sec   Loss 1.1322   LearningRate 0.0040   Epoch: 16   Global Step: 91010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:51:42,944-Speed 3388.00 samples/sec   Loss 1.1464   LearningRate 0.0040   Epoch: 16   Global Step: 91020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:51:45,955-Speed 3400.84 samples/sec   Loss 1.0460   LearningRate 0.0040   Epoch: 16   Global Step: 91030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:51:48,986-Speed 3379.70 samples/sec   Loss 1.1435   LearningRate 0.0040   Epoch: 16   Global Step: 91040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:51:52,056-Speed 3336.31 samples/sec   Loss 1.0344   LearningRate 0.0040   Epoch: 16   Global Step: 91050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:51:55,079-Speed 3387.95 samples/sec   Loss 1.1144   LearningRate 0.0040   Epoch: 16   Global Step: 91060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:51:58,082-Speed 3410.58 samples/sec   Loss 1.0752   LearningRate 0.0040   Epoch: 16   Global Step: 91070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:01,124-Speed 3367.71 samples/sec   Loss 1.1338   LearningRate 0.0040   Epoch: 16   Global Step: 91080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:04,142-Speed 3393.66 samples/sec   Loss 1.0081   LearningRate 0.0040   Epoch: 16   Global Step: 91090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:07,168-Speed 3384.85 samples/sec   Loss 1.1636   LearningRate 0.0040   Epoch: 16   Global Step: 91100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:10,188-Speed 3390.66 samples/sec   Loss 1.0964   LearningRate 0.0040   Epoch: 16   Global Step: 91110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:13,207-Speed 3393.08 samples/sec   Loss 1.0482   LearningRate 0.0039   Epoch: 16   Global Step: 91120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:16,235-Speed 3383.14 samples/sec   Loss 1.1196   LearningRate 0.0039   Epoch: 16   Global Step: 91130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:19,247-Speed 3400.34 samples/sec   Loss 1.0240   LearningRate 0.0039   Epoch: 16   Global Step: 91140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:22,279-Speed 3378.50 samples/sec   Loss 1.0611   LearningRate 0.0039   Epoch: 16   Global Step: 91150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:25,299-Speed 3391.11 samples/sec   Loss 1.0039   LearningRate 0.0039   Epoch: 16   Global Step: 91160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:28,324-Speed 3386.05 samples/sec   Loss 1.1084   LearningRate 0.0039   Epoch: 16   Global Step: 91170   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:52:31,340-Speed 3395.28 samples/sec   Loss 1.1061   LearningRate 0.0039   Epoch: 16   Global Step: 91180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:52:34,375-Speed 3374.91 samples/sec   Loss 1.0780   LearningRate 0.0039   Epoch: 16   Global Step: 91190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:52:37,396-Speed 3390.50 samples/sec   Loss 1.0369   LearningRate 0.0039   Epoch: 16   Global Step: 91200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:52:40,400-Speed 3409.93 samples/sec   Loss 1.1529   LearningRate 0.0039   Epoch: 16   Global Step: 91210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:43,430-Speed 3379.84 samples/sec   Loss 1.0677   LearningRate 0.0039   Epoch: 16   Global Step: 91220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:46,452-Speed 3390.01 samples/sec   Loss 1.0558   LearningRate 0.0039   Epoch: 16   Global Step: 91230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:49,477-Speed 3385.17 samples/sec   Loss 1.1891   LearningRate 0.0039   Epoch: 16   Global Step: 91240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:52,503-Speed 3385.65 samples/sec   Loss 1.1470   LearningRate 0.0039   Epoch: 16   Global Step: 91250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:55,541-Speed 3370.78 samples/sec   Loss 1.0848   LearningRate 0.0039   Epoch: 16   Global Step: 91260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:52:58,572-Speed 3379.51 samples/sec   Loss 1.0658   LearningRate 0.0039   Epoch: 16   Global Step: 91270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:01,601-Speed 3380.29 samples/sec   Loss 1.0849   LearningRate 0.0039   Epoch: 16   Global Step: 91280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:04,632-Speed 3379.68 samples/sec   Loss 1.0561   LearningRate 0.0039   Epoch: 16   Global Step: 91290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:07,662-Speed 3380.44 samples/sec   Loss 1.0914   LearningRate 0.0039   Epoch: 16   Global Step: 91300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:10,717-Speed 3353.45 samples/sec   Loss 1.0596   LearningRate 0.0039   Epoch: 16   Global Step: 91310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:53:13,722-Speed 3407.62 samples/sec   Loss 1.2266   LearningRate 0.0039   Epoch: 16   Global Step: 91320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:16,762-Speed 3369.88 samples/sec   Loss 1.0718   LearningRate 0.0039   Epoch: 16   Global Step: 91330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:19,790-Speed 3381.88 samples/sec   Loss 1.0615   LearningRate 0.0039   Epoch: 16   Global Step: 91340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:22,916-Speed 3277.13 samples/sec   Loss 1.0614   LearningRate 0.0039   Epoch: 16   Global Step: 91350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:26,052-Speed 3265.22 samples/sec   Loss 1.1071   LearningRate 0.0039   Epoch: 16   Global Step: 91360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:29,223-Speed 3230.30 samples/sec   Loss 1.0848   LearningRate 0.0039   Epoch: 16   Global Step: 91370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:32,251-Speed 3383.09 samples/sec   Loss 1.0742   LearningRate 0.0039   Epoch: 16   Global Step: 91380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:35,278-Speed 3383.68 samples/sec   Loss 1.0702   LearningRate 0.0039   Epoch: 16   Global Step: 91390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:38,300-Speed 3389.74 samples/sec   Loss 1.0992   LearningRate 0.0039   Epoch: 16   Global Step: 91400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:41,330-Speed 3380.43 samples/sec   Loss 1.1247   LearningRate 0.0038   Epoch: 16   Global Step: 91410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:44,356-Speed 3384.71 samples/sec   Loss 1.1434   LearningRate 0.0038   Epoch: 16   Global Step: 91420   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:53:47,375-Speed 3392.51 samples/sec   Loss 1.1657   LearningRate 0.0038   Epoch: 16   Global Step: 91430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:50,420-Speed 3363.78 samples/sec   Loss 1.0922   LearningRate 0.0038   Epoch: 16   Global Step: 91440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:53,489-Speed 3339.55 samples/sec   Loss 1.0924   LearningRate 0.0038   Epoch: 16   Global Step: 91450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:56,511-Speed 3389.74 samples/sec   Loss 1.0602   LearningRate 0.0038   Epoch: 16   Global Step: 91460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:53:59,533-Speed 3388.60 samples/sec   Loss 1.0817   LearningRate 0.0038   Epoch: 16   Global Step: 91470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:02,577-Speed 3365.08 samples/sec   Loss 1.1260   LearningRate 0.0038   Epoch: 16   Global Step: 91480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:05,647-Speed 3336.15 samples/sec   Loss 1.0707   LearningRate 0.0038   Epoch: 16   Global Step: 91490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:08,666-Speed 3391.82 samples/sec   Loss 1.1637   LearningRate 0.0038   Epoch: 16   Global Step: 91500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:11,697-Speed 3379.49 samples/sec   Loss 1.1043   LearningRate 0.0038   Epoch: 16   Global Step: 91510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:14,732-Speed 3375.12 samples/sec   Loss 1.0589   LearningRate 0.0038   Epoch: 16   Global Step: 91520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:17,768-Speed 3374.35 samples/sec   Loss 1.1358   LearningRate 0.0038   Epoch: 16   Global Step: 91530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:20,791-Speed 3387.78 samples/sec   Loss 1.1601   LearningRate 0.0038   Epoch: 16   Global Step: 91540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:23,818-Speed 3383.42 samples/sec   Loss 1.1392   LearningRate 0.0038   Epoch: 16   Global Step: 91550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:26,840-Speed 3389.22 samples/sec   Loss 1.0752   LearningRate 0.0038   Epoch: 16   Global Step: 91560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:29,862-Speed 3389.80 samples/sec   Loss 1.0184   LearningRate 0.0038   Epoch: 16   Global Step: 91570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:32,884-Speed 3388.35 samples/sec   Loss 1.1514   LearningRate 0.0038   Epoch: 16   Global Step: 91580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:35,909-Speed 3386.57 samples/sec   Loss 1.1135   LearningRate 0.0038   Epoch: 16   Global Step: 91590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:38,966-Speed 3349.79 samples/sec   Loss 1.1416   LearningRate 0.0038   Epoch: 16   Global Step: 91600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:41,983-Speed 3395.75 samples/sec   Loss 1.1679   LearningRate 0.0038   Epoch: 16   Global Step: 91610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:45,003-Speed 3390.88 samples/sec   Loss 1.1198   LearningRate 0.0038   Epoch: 16   Global Step: 91620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:48,028-Speed 3386.46 samples/sec   Loss 1.0263   LearningRate 0.0038   Epoch: 16   Global Step: 91630   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:54:51,034-Speed 3407.47 samples/sec   Loss 1.1589   LearningRate 0.0038   Epoch: 16   Global Step: 91640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:54,056-Speed 3389.55 samples/sec   Loss 1.2029   LearningRate 0.0038   Epoch: 16   Global Step: 91650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:54:57,079-Speed 3387.56 samples/sec   Loss 1.0880   LearningRate 0.0038   Epoch: 16   Global Step: 91660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:00,107-Speed 3382.70 samples/sec   Loss 1.1423   LearningRate 0.0038   Epoch: 16   Global Step: 91670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:03,135-Speed 3382.55 samples/sec   Loss 1.0955   LearningRate 0.0038   Epoch: 16   Global Step: 91680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:06,156-Speed 3390.00 samples/sec   Loss 1.1831   LearningRate 0.0038   Epoch: 16   Global Step: 91690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:09,182-Speed 3384.67 samples/sec   Loss 1.1776   LearningRate 0.0037   Epoch: 16   Global Step: 91700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:12,206-Speed 3388.05 samples/sec   Loss 1.0931   LearningRate 0.0037   Epoch: 16   Global Step: 91710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:15,230-Speed 3386.63 samples/sec   Loss 1.1108   LearningRate 0.0037   Epoch: 16   Global Step: 91720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:18,253-Speed 3388.19 samples/sec   Loss 1.1379   LearningRate 0.0037   Epoch: 16   Global Step: 91730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:21,278-Speed 3385.74 samples/sec   Loss 1.0672   LearningRate 0.0037   Epoch: 16   Global Step: 91740   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:55:24,289-Speed 3402.03 samples/sec   Loss 1.1473   LearningRate 0.0037   Epoch: 16   Global Step: 91750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:27,320-Speed 3378.90 samples/sec   Loss 1.1107   LearningRate 0.0037   Epoch: 16   Global Step: 91760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:30,356-Speed 3373.58 samples/sec   Loss 1.0758   LearningRate 0.0037   Epoch: 16   Global Step: 91770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:33,388-Speed 3378.29 samples/sec   Loss 1.0879   LearningRate 0.0037   Epoch: 16   Global Step: 91780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:36,424-Speed 3373.00 samples/sec   Loss 1.1325   LearningRate 0.0037   Epoch: 16   Global Step: 91790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:39,460-Speed 3373.75 samples/sec   Loss 1.1329   LearningRate 0.0037   Epoch: 16   Global Step: 91800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:42,488-Speed 3383.09 samples/sec   Loss 1.1909   LearningRate 0.0037   Epoch: 16   Global Step: 91810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:45,508-Speed 3391.85 samples/sec   Loss 1.1345   LearningRate 0.0037   Epoch: 16   Global Step: 91820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:48,534-Speed 3384.28 samples/sec   Loss 1.1129   LearningRate 0.0037   Epoch: 16   Global Step: 91830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:51,613-Speed 3327.25 samples/sec   Loss 1.1250   LearningRate 0.0037   Epoch: 16   Global Step: 91840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:55:54,653-Speed 3369.03 samples/sec   Loss 1.0962   LearningRate 0.0037   Epoch: 16   Global Step: 91850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 10:55:57,669-Speed 3396.42 samples/sec   Loss 1.1588   LearningRate 0.0037   Epoch: 16   Global Step: 91860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:56:00,699-Speed 3379.66 samples/sec   Loss 1.1393   LearningRate 0.0037   Epoch: 16   Global Step: 91870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:56:03,725-Speed 3384.15 samples/sec   Loss 1.0714   LearningRate 0.0037   Epoch: 16   Global Step: 91880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:56:06,748-Speed 3388.81 samples/sec   Loss 1.0963   LearningRate 0.0037   Epoch: 16   Global Step: 91890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:56:09,771-Speed 3388.38 samples/sec   Loss 1.1166   LearningRate 0.0037   Epoch: 16   Global Step: 91900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:56:12,800-Speed 3381.12 samples/sec   Loss 1.1368   LearningRate 0.0037   Epoch: 16   Global Step: 91910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:56:15,824-Speed 3387.33 samples/sec   Loss 1.1970   LearningRate 0.0037   Epoch: 16   Global Step: 91920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:56:18,850-Speed 3384.79 samples/sec   Loss 1.0742   LearningRate 0.0037   Epoch: 16   Global Step: 91930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:56:21,877-Speed 3383.45 samples/sec   Loss 1.1281   LearningRate 0.0037   Epoch: 16   Global Step: 91940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:56:24,899-Speed 3389.29 samples/sec   Loss 1.0851   LearningRate 0.0037   Epoch: 16   Global Step: 91950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:56:27,909-Speed 3402.51 samples/sec   Loss 1.1561   LearningRate 0.0037   Epoch: 16   Global Step: 91960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:56:30,933-Speed 3387.61 samples/sec   Loss 1.1022   LearningRate 0.0037   Epoch: 16   Global Step: 91970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:56:34,006-Speed 3332.76 samples/sec   Loss 1.1016   LearningRate 0.0037   Epoch: 16   Global Step: 91980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:56:37,061-Speed 3352.81 samples/sec   Loss 1.1326   LearningRate 0.0037   Epoch: 16   Global Step: 91990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:56:40,094-Speed 3377.51 samples/sec   Loss 1.1607   LearningRate 0.0036   Epoch: 16   Global Step: 92000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:57:23,411-[lfw][92000]XNorm: 22.736918
Training: 2022-04-27 10:57:23,411-[lfw][92000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-27 10:57:23,412-[lfw][92000]Accuracy-Highest: 0.99817
Training: 2022-04-27 10:58:13,799-[cfp_fp][92000]XNorm: 22.136400
Training: 2022-04-27 10:58:13,800-[cfp_fp][92000]Accuracy-Flip: 0.98229+-0.00633
Training: 2022-04-27 10:58:13,800-[cfp_fp][92000]Accuracy-Highest: 0.98257
Training: 2022-04-27 10:58:57,140-[agedb_30][92000]XNorm: 22.949669
Training: 2022-04-27 10:58:57,140-[agedb_30][92000]Accuracy-Flip: 0.98033+-0.00809
Training: 2022-04-27 10:58:57,141-[agedb_30][92000]Accuracy-Highest: 0.98133
Training: 2022-04-27 10:59:00,158-Speed 73.11 samples/sec   Loss 1.1203   LearningRate 0.0036   Epoch: 16   Global Step: 92010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:03,166-Speed 3404.86 samples/sec   Loss 1.1125   LearningRate 0.0036   Epoch: 16   Global Step: 92020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:06,169-Speed 3410.56 samples/sec   Loss 1.1243   LearningRate 0.0036   Epoch: 16   Global Step: 92030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:09,179-Speed 3402.85 samples/sec   Loss 1.1897   LearningRate 0.0036   Epoch: 16   Global Step: 92040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:12,187-Speed 3404.49 samples/sec   Loss 1.1279   LearningRate 0.0036   Epoch: 16   Global Step: 92050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:15,174-Speed 3429.64 samples/sec   Loss 1.1026   LearningRate 0.0036   Epoch: 16   Global Step: 92060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:18,184-Speed 3401.92 samples/sec   Loss 1.1377   LearningRate 0.0036   Epoch: 16   Global Step: 92070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:21,199-Speed 3397.77 samples/sec   Loss 1.0748   LearningRate 0.0036   Epoch: 16   Global Step: 92080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:24,215-Speed 3396.10 samples/sec   Loss 1.0761   LearningRate 0.0036   Epoch: 16   Global Step: 92090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:27,267-Speed 3355.98 samples/sec   Loss 1.1425   LearningRate 0.0036   Epoch: 16   Global Step: 92100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:30,309-Speed 3366.63 samples/sec   Loss 1.1362   LearningRate 0.0036   Epoch: 16   Global Step: 92110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:33,338-Speed 3381.38 samples/sec   Loss 1.1583   LearningRate 0.0036   Epoch: 16   Global Step: 92120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:36,358-Speed 3391.92 samples/sec   Loss 1.2198   LearningRate 0.0036   Epoch: 16   Global Step: 92130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:39,378-Speed 3391.51 samples/sec   Loss 1.1577   LearningRate 0.0036   Epoch: 16   Global Step: 92140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:42,398-Speed 3391.25 samples/sec   Loss 1.1897   LearningRate 0.0036   Epoch: 16   Global Step: 92150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:45,409-Speed 3402.01 samples/sec   Loss 1.1276   LearningRate 0.0036   Epoch: 16   Global Step: 92160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:48,440-Speed 3379.10 samples/sec   Loss 1.1908   LearningRate 0.0036   Epoch: 16   Global Step: 92170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:51,471-Speed 3379.17 samples/sec   Loss 1.0984   LearningRate 0.0036   Epoch: 16   Global Step: 92180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:54,497-Speed 3385.19 samples/sec   Loss 1.1584   LearningRate 0.0036   Epoch: 16   Global Step: 92190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 10:59:57,522-Speed 3385.51 samples/sec   Loss 1.0561   LearningRate 0.0036   Epoch: 16   Global Step: 92200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:00,593-Speed 3334.76 samples/sec   Loss 1.1178   LearningRate 0.0036   Epoch: 16   Global Step: 92210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:03,623-Speed 3380.79 samples/sec   Loss 1.1952   LearningRate 0.0036   Epoch: 16   Global Step: 92220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:06,688-Speed 3342.30 samples/sec   Loss 1.0512   LearningRate 0.0036   Epoch: 16   Global Step: 92230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:09,711-Speed 3388.51 samples/sec   Loss 1.1371   LearningRate 0.0036   Epoch: 16   Global Step: 92240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:12,744-Speed 3376.37 samples/sec   Loss 1.1759   LearningRate 0.0036   Epoch: 16   Global Step: 92250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:15,773-Speed 3381.65 samples/sec   Loss 1.1096   LearningRate 0.0036   Epoch: 16   Global Step: 92260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:00:18,789-Speed 3395.85 samples/sec   Loss 1.1165   LearningRate 0.0036   Epoch: 16   Global Step: 92270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:00:21,806-Speed 3394.86 samples/sec   Loss 1.0967   LearningRate 0.0036   Epoch: 16   Global Step: 92280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:00:24,811-Speed 3408.70 samples/sec   Loss 1.1042   LearningRate 0.0036   Epoch: 16   Global Step: 92290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:27,876-Speed 3342.20 samples/sec   Loss 1.1077   LearningRate 0.0035   Epoch: 16   Global Step: 92300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:31,031-Speed 3246.13 samples/sec   Loss 1.1988   LearningRate 0.0035   Epoch: 16   Global Step: 92310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:34,137-Speed 3297.43 samples/sec   Loss 1.1256   LearningRate 0.0035   Epoch: 16   Global Step: 92320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:37,155-Speed 3393.36 samples/sec   Loss 1.2016   LearningRate 0.0035   Epoch: 16   Global Step: 92330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:40,217-Speed 3345.37 samples/sec   Loss 1.0581   LearningRate 0.0035   Epoch: 16   Global Step: 92340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:43,253-Speed 3373.84 samples/sec   Loss 1.0629   LearningRate 0.0035   Epoch: 16   Global Step: 92350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:46,296-Speed 3365.63 samples/sec   Loss 1.1597   LearningRate 0.0035   Epoch: 16   Global Step: 92360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:49,306-Speed 3402.84 samples/sec   Loss 1.1920   LearningRate 0.0035   Epoch: 16   Global Step: 92370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:52,313-Speed 3406.59 samples/sec   Loss 1.1635   LearningRate 0.0035   Epoch: 16   Global Step: 92380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:55,304-Speed 3424.88 samples/sec   Loss 1.2393   LearningRate 0.0035   Epoch: 16   Global Step: 92390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:00:58,327-Speed 3388.03 samples/sec   Loss 1.1245   LearningRate 0.0035   Epoch: 16   Global Step: 92400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:01:01,333-Speed 3406.33 samples/sec   Loss 1.1639   LearningRate 0.0035   Epoch: 16   Global Step: 92410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:01:04,348-Speed 3397.93 samples/sec   Loss 1.2112   LearningRate 0.0035   Epoch: 16   Global Step: 92420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:01:07,359-Speed 3401.89 samples/sec   Loss 1.1122   LearningRate 0.0035   Epoch: 16   Global Step: 92430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:01:10,373-Speed 3399.03 samples/sec   Loss 1.1592   LearningRate 0.0035   Epoch: 16   Global Step: 92440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:01:13,383-Speed 3402.77 samples/sec   Loss 1.1922   LearningRate 0.0035   Epoch: 16   Global Step: 92450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:01:16,395-Speed 3400.37 samples/sec   Loss 1.1998   LearningRate 0.0035   Epoch: 16   Global Step: 92460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:01:19,402-Speed 3405.97 samples/sec   Loss 1.1460   LearningRate 0.0035   Epoch: 16   Global Step: 92470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:01:22,409-Speed 3406.38 samples/sec   Loss 1.1524   LearningRate 0.0035   Epoch: 16   Global Step: 92480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:01:25,420-Speed 3401.35 samples/sec   Loss 1.1905   LearningRate 0.0035   Epoch: 16   Global Step: 92490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:01:28,430-Speed 3402.40 samples/sec   Loss 1.1831   LearningRate 0.0035   Epoch: 16   Global Step: 92500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:01:31,436-Speed 3408.25 samples/sec   Loss 1.1015   LearningRate 0.0035   Epoch: 16   Global Step: 92510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:01:34,442-Speed 3406.53 samples/sec   Loss 1.1083   LearningRate 0.0035   Epoch: 16   Global Step: 92520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:01:37,449-Speed 3406.96 samples/sec   Loss 1.0978   LearningRate 0.0035   Epoch: 16   Global Step: 92530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:01:40,457-Speed 3405.33 samples/sec   Loss 1.2127   LearningRate 0.0035   Epoch: 16   Global Step: 92540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:01:43,474-Speed 3394.68 samples/sec   Loss 1.2073   LearningRate 0.0035   Epoch: 16   Global Step: 92550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:01:46,496-Speed 3389.18 samples/sec   Loss 1.0667   LearningRate 0.0035   Epoch: 16   Global Step: 92560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:01:49,506-Speed 3402.23 samples/sec   Loss 1.2215   LearningRate 0.0035   Epoch: 16   Global Step: 92570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:01:52,530-Speed 3387.51 samples/sec   Loss 1.1864   LearningRate 0.0035   Epoch: 16   Global Step: 92580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:01:55,541-Speed 3401.55 samples/sec   Loss 1.1413   LearningRate 0.0035   Epoch: 16   Global Step: 92590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:01:58,552-Speed 3401.42 samples/sec   Loss 1.1275   LearningRate 0.0034   Epoch: 16   Global Step: 92600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:01,562-Speed 3402.60 samples/sec   Loss 1.1353   LearningRate 0.0034   Epoch: 16   Global Step: 92610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:04,717-Speed 3247.05 samples/sec   Loss 1.1314   LearningRate 0.0034   Epoch: 16   Global Step: 92620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:07,750-Speed 3376.22 samples/sec   Loss 1.1972   LearningRate 0.0034   Epoch: 16   Global Step: 92630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:10,776-Speed 3385.50 samples/sec   Loss 1.1046   LearningRate 0.0034   Epoch: 16   Global Step: 92640   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:02:13,773-Speed 3417.37 samples/sec   Loss 1.1859   LearningRate 0.0034   Epoch: 16   Global Step: 92650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:16,929-Speed 3245.32 samples/sec   Loss 1.1623   LearningRate 0.0034   Epoch: 16   Global Step: 92660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:20,005-Speed 3329.65 samples/sec   Loss 1.1639   LearningRate 0.0034   Epoch: 16   Global Step: 92670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:23,022-Speed 3395.34 samples/sec   Loss 1.0923   LearningRate 0.0034   Epoch: 16   Global Step: 92680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:26,060-Speed 3371.75 samples/sec   Loss 1.1775   LearningRate 0.0034   Epoch: 16   Global Step: 92690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:29,069-Speed 3402.91 samples/sec   Loss 1.1520   LearningRate 0.0034   Epoch: 16   Global Step: 92700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:32,079-Speed 3402.66 samples/sec   Loss 1.2058   LearningRate 0.0034   Epoch: 16   Global Step: 92710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:35,095-Speed 3396.69 samples/sec   Loss 1.2262   LearningRate 0.0034   Epoch: 16   Global Step: 92720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:38,130-Speed 3374.18 samples/sec   Loss 1.1903   LearningRate 0.0034   Epoch: 16   Global Step: 92730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:41,198-Speed 3339.35 samples/sec   Loss 1.1672   LearningRate 0.0034   Epoch: 16   Global Step: 92740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:44,212-Speed 3398.30 samples/sec   Loss 1.0412   LearningRate 0.0034   Epoch: 16   Global Step: 92750   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:02:47,212-Speed 3413.86 samples/sec   Loss 1.1082   LearningRate 0.0034   Epoch: 16   Global Step: 92760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:50,234-Speed 3388.81 samples/sec   Loss 1.1578   LearningRate 0.0034   Epoch: 16   Global Step: 92770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:53,247-Speed 3400.09 samples/sec   Loss 1.1064   LearningRate 0.0034   Epoch: 16   Global Step: 92780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:02:56,235-Speed 3427.04 samples/sec   Loss 1.2022   LearningRate 0.0034   Epoch: 16   Global Step: 92790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:02:59,269-Speed 3376.08 samples/sec   Loss 1.2267   LearningRate 0.0034   Epoch: 16   Global Step: 92800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:02,304-Speed 3374.88 samples/sec   Loss 1.1595   LearningRate 0.0034   Epoch: 16   Global Step: 92810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:05,320-Speed 3396.64 samples/sec   Loss 1.2650   LearningRate 0.0034   Epoch: 16   Global Step: 92820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:08,331-Speed 3401.50 samples/sec   Loss 1.1962   LearningRate 0.0034   Epoch: 16   Global Step: 92830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:11,348-Speed 3394.95 samples/sec   Loss 1.1154   LearningRate 0.0034   Epoch: 16   Global Step: 92840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:14,392-Speed 3364.58 samples/sec   Loss 1.1360   LearningRate 0.0034   Epoch: 16   Global Step: 92850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:17,417-Speed 3385.97 samples/sec   Loss 1.1970   LearningRate 0.0034   Epoch: 16   Global Step: 92860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:20,434-Speed 3394.66 samples/sec   Loss 1.1981   LearningRate 0.0034   Epoch: 16   Global Step: 92870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:23,560-Speed 3276.32 samples/sec   Loss 1.1619   LearningRate 0.0034   Epoch: 16   Global Step: 92880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:26,649-Speed 3315.57 samples/sec   Loss 1.1496   LearningRate 0.0034   Epoch: 16   Global Step: 92890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:03:29,666-Speed 3396.98 samples/sec   Loss 1.1351   LearningRate 0.0034   Epoch: 16   Global Step: 92900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:03:32,663-Speed 3417.48 samples/sec   Loss 1.1591   LearningRate 0.0033   Epoch: 16   Global Step: 92910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:35,678-Speed 3397.69 samples/sec   Loss 1.1254   LearningRate 0.0033   Epoch: 16   Global Step: 92920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:38,693-Speed 3396.94 samples/sec   Loss 1.1391   LearningRate 0.0033   Epoch: 16   Global Step: 92930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:41,706-Speed 3399.46 samples/sec   Loss 1.1728   LearningRate 0.0033   Epoch: 16   Global Step: 92940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:44,722-Speed 3395.62 samples/sec   Loss 1.1919   LearningRate 0.0033   Epoch: 16   Global Step: 92950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:47,737-Speed 3396.89 samples/sec   Loss 1.1281   LearningRate 0.0033   Epoch: 16   Global Step: 92960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:50,761-Speed 3387.28 samples/sec   Loss 1.2219   LearningRate 0.0033   Epoch: 16   Global Step: 92970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:53,810-Speed 3359.04 samples/sec   Loss 1.2001   LearningRate 0.0033   Epoch: 16   Global Step: 92980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:56,842-Speed 3378.18 samples/sec   Loss 1.1672   LearningRate 0.0033   Epoch: 16   Global Step: 92990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:03:59,860-Speed 3394.48 samples/sec   Loss 1.1979   LearningRate 0.0033   Epoch: 16   Global Step: 93000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:04:02,875-Speed 3397.61 samples/sec   Loss 1.1582   LearningRate 0.0033   Epoch: 16   Global Step: 93010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:04:05,891-Speed 3395.00 samples/sec   Loss 1.1120   LearningRate 0.0033   Epoch: 16   Global Step: 93020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:04:08,911-Speed 3391.83 samples/sec   Loss 1.2155   LearningRate 0.0033   Epoch: 16   Global Step: 93030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:04:11,929-Speed 3393.30 samples/sec   Loss 1.1922   LearningRate 0.0033   Epoch: 16   Global Step: 93040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:04:14,950-Speed 3390.93 samples/sec   Loss 1.3117   LearningRate 0.0033   Epoch: 16   Global Step: 93050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:04:18,013-Speed 3343.52 samples/sec   Loss 1.1946   LearningRate 0.0033   Epoch: 16   Global Step: 93060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:04:21,029-Speed 3396.35 samples/sec   Loss 1.1727   LearningRate 0.0033   Epoch: 16   Global Step: 93070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:04:24,053-Speed 3387.27 samples/sec   Loss 1.2025   LearningRate 0.0033   Epoch: 16   Global Step: 93080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:04:27,070-Speed 3395.18 samples/sec   Loss 1.1528   LearningRate 0.0033   Epoch: 16   Global Step: 93090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:04:30,092-Speed 3388.41 samples/sec   Loss 1.1964   LearningRate 0.0033   Epoch: 16   Global Step: 93100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:04:33,110-Speed 3393.97 samples/sec   Loss 1.1587   LearningRate 0.0033   Epoch: 16   Global Step: 93110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:04:36,111-Speed 3412.77 samples/sec   Loss 1.1114   LearningRate 0.0033   Epoch: 16   Global Step: 93120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:04:39,142-Speed 3379.83 samples/sec   Loss 1.1946   LearningRate 0.0033   Epoch: 16   Global Step: 93130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:04:42,204-Speed 3344.80 samples/sec   Loss 1.2389   LearningRate 0.0033   Epoch: 16   Global Step: 93140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:04:45,224-Speed 3390.77 samples/sec   Loss 1.2368   LearningRate 0.0033   Epoch: 16   Global Step: 93150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:04:48,248-Speed 3387.12 samples/sec   Loss 1.1057   LearningRate 0.0033   Epoch: 16   Global Step: 93160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:04:51,264-Speed 3396.57 samples/sec   Loss 1.1612   LearningRate 0.0033   Epoch: 16   Global Step: 93170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:04:54,282-Speed 3393.68 samples/sec   Loss 1.1740   LearningRate 0.0033   Epoch: 16   Global Step: 93180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:04:57,300-Speed 3393.80 samples/sec   Loss 1.1890   LearningRate 0.0033   Epoch: 16   Global Step: 93190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:05:00,319-Speed 3392.89 samples/sec   Loss 1.2285   LearningRate 0.0033   Epoch: 16   Global Step: 93200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:05:03,338-Speed 3393.21 samples/sec   Loss 1.1814   LearningRate 0.0033   Epoch: 16   Global Step: 93210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:05:06,353-Speed 3397.02 samples/sec   Loss 1.2443   LearningRate 0.0032   Epoch: 16   Global Step: 93220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:05:09,369-Speed 3395.76 samples/sec   Loss 1.1362   LearningRate 0.0032   Epoch: 16   Global Step: 93230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:05:12,390-Speed 3391.01 samples/sec   Loss 1.1410   LearningRate 0.0032   Epoch: 16   Global Step: 93240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:05:15,405-Speed 3396.54 samples/sec   Loss 1.2042   LearningRate 0.0032   Epoch: 16   Global Step: 93250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:05:18,427-Speed 3389.14 samples/sec   Loss 1.2221   LearningRate 0.0032   Epoch: 16   Global Step: 93260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:05:21,447-Speed 3391.96 samples/sec   Loss 1.2105   LearningRate 0.0032   Epoch: 16   Global Step: 93270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:05:24,462-Speed 3396.80 samples/sec   Loss 1.1058   LearningRate 0.0032   Epoch: 16   Global Step: 93280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:05:27,483-Speed 3390.21 samples/sec   Loss 1.1311   LearningRate 0.0032   Epoch: 16   Global Step: 93290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:05:30,514-Speed 3379.78 samples/sec   Loss 1.2003   LearningRate 0.0032   Epoch: 16   Global Step: 93300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:05:33,534-Speed 3391.48 samples/sec   Loss 1.1816   LearningRate 0.0032   Epoch: 16   Global Step: 93310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:05:36,555-Speed 3390.46 samples/sec   Loss 1.1126   LearningRate 0.0032   Epoch: 16   Global Step: 93320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:05:39,572-Speed 3395.11 samples/sec   Loss 1.1254   LearningRate 0.0032   Epoch: 16   Global Step: 93330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:05:42,587-Speed 3396.88 samples/sec   Loss 1.1456   LearningRate 0.0032   Epoch: 16   Global Step: 93340   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:05:45,592-Speed 3408.19 samples/sec   Loss 1.1376   LearningRate 0.0032   Epoch: 16   Global Step: 93350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:05:48,608-Speed 3396.62 samples/sec   Loss 1.1631   LearningRate 0.0032   Epoch: 16   Global Step: 93360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:05:51,637-Speed 3381.17 samples/sec   Loss 1.1307   LearningRate 0.0032   Epoch: 16   Global Step: 93370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:05:54,665-Speed 3382.70 samples/sec   Loss 1.2190   LearningRate 0.0032   Epoch: 16   Global Step: 93380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:05:57,681-Speed 3396.17 samples/sec   Loss 1.1387   LearningRate 0.0032   Epoch: 16   Global Step: 93390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:00,713-Speed 3378.16 samples/sec   Loss 1.1531   LearningRate 0.0032   Epoch: 16   Global Step: 93400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:03,735-Speed 3389.21 samples/sec   Loss 1.1166   LearningRate 0.0032   Epoch: 16   Global Step: 93410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:06,752-Speed 3395.19 samples/sec   Loss 1.1848   LearningRate 0.0032   Epoch: 16   Global Step: 93420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:09,780-Speed 3382.35 samples/sec   Loss 1.1387   LearningRate 0.0032   Epoch: 16   Global Step: 93430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:12,801-Speed 3390.40 samples/sec   Loss 1.1328   LearningRate 0.0032   Epoch: 16   Global Step: 93440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:15,809-Speed 3404.33 samples/sec   Loss 1.1708   LearningRate 0.0032   Epoch: 16   Global Step: 93450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:18,832-Speed 3388.05 samples/sec   Loss 1.2067   LearningRate 0.0032   Epoch: 16   Global Step: 93460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:21,861-Speed 3381.60 samples/sec   Loss 1.1958   LearningRate 0.0032   Epoch: 16   Global Step: 93470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:24,988-Speed 3276.14 samples/sec   Loss 1.2336   LearningRate 0.0032   Epoch: 16   Global Step: 93480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:28,009-Speed 3390.74 samples/sec   Loss 1.0817   LearningRate 0.0032   Epoch: 16   Global Step: 93490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:31,033-Speed 3386.41 samples/sec   Loss 1.1647   LearningRate 0.0032   Epoch: 16   Global Step: 93500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:34,062-Speed 3381.74 samples/sec   Loss 1.0974   LearningRate 0.0032   Epoch: 16   Global Step: 93510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:37,087-Speed 3385.95 samples/sec   Loss 1.1858   LearningRate 0.0032   Epoch: 16   Global Step: 93520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:40,109-Speed 3388.44 samples/sec   Loss 1.2567   LearningRate 0.0032   Epoch: 16   Global Step: 93530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:43,133-Speed 3387.53 samples/sec   Loss 1.1120   LearningRate 0.0031   Epoch: 16   Global Step: 93540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:46,156-Speed 3387.95 samples/sec   Loss 1.1018   LearningRate 0.0031   Epoch: 16   Global Step: 93550   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:06:49,168-Speed 3400.77 samples/sec   Loss 1.2222   LearningRate 0.0031   Epoch: 16   Global Step: 93560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:52,191-Speed 3388.33 samples/sec   Loss 1.1836   LearningRate 0.0031   Epoch: 16   Global Step: 93570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:55,210-Speed 3393.23 samples/sec   Loss 1.0979   LearningRate 0.0031   Epoch: 16   Global Step: 93580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:06:58,259-Speed 3359.24 samples/sec   Loss 1.0484   LearningRate 0.0031   Epoch: 16   Global Step: 93590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:07:01,282-Speed 3388.23 samples/sec   Loss 1.1506   LearningRate 0.0031   Epoch: 16   Global Step: 93600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:04,434-Speed 3250.21 samples/sec   Loss 1.1678   LearningRate 0.0031   Epoch: 16   Global Step: 93610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:07,462-Speed 3382.88 samples/sec   Loss 1.1775   LearningRate 0.0031   Epoch: 16   Global Step: 93620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:10,489-Speed 3382.80 samples/sec   Loss 1.2001   LearningRate 0.0031   Epoch: 16   Global Step: 93630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:13,582-Speed 3311.69 samples/sec   Loss 1.1493   LearningRate 0.0031   Epoch: 16   Global Step: 93640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:16,603-Speed 3391.24 samples/sec   Loss 1.1140   LearningRate 0.0031   Epoch: 16   Global Step: 93650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:19,626-Speed 3388.19 samples/sec   Loss 1.1720   LearningRate 0.0031   Epoch: 16   Global Step: 93660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:22,645-Speed 3394.87 samples/sec   Loss 1.1196   LearningRate 0.0031   Epoch: 16   Global Step: 93670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:25,731-Speed 3318.38 samples/sec   Loss 1.1856   LearningRate 0.0031   Epoch: 16   Global Step: 93680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:28,765-Speed 3375.91 samples/sec   Loss 1.2144   LearningRate 0.0031   Epoch: 16   Global Step: 93690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:31,801-Speed 3373.81 samples/sec   Loss 1.1932   LearningRate 0.0031   Epoch: 16   Global Step: 93700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:07:34,939-Speed 3263.73 samples/sec   Loss 1.1954   LearningRate 0.0031   Epoch: 16   Global Step: 93710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:07:37,957-Speed 3394.18 samples/sec   Loss 1.1537   LearningRate 0.0031   Epoch: 16   Global Step: 93720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:40,983-Speed 3385.00 samples/sec   Loss 1.1744   LearningRate 0.0031   Epoch: 16   Global Step: 93730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:44,010-Speed 3383.69 samples/sec   Loss 1.1755   LearningRate 0.0031   Epoch: 16   Global Step: 93740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:47,036-Speed 3384.08 samples/sec   Loss 1.1780   LearningRate 0.0031   Epoch: 16   Global Step: 93750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:50,088-Speed 3357.03 samples/sec   Loss 1.2176   LearningRate 0.0031   Epoch: 16   Global Step: 93760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:53,199-Speed 3291.86 samples/sec   Loss 1.1355   LearningRate 0.0031   Epoch: 16   Global Step: 93770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:56,217-Speed 3394.19 samples/sec   Loss 1.1370   LearningRate 0.0031   Epoch: 16   Global Step: 93780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:07:59,245-Speed 3382.19 samples/sec   Loss 1.2009   LearningRate 0.0031   Epoch: 16   Global Step: 93790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:08:02,288-Speed 3365.86 samples/sec   Loss 1.1789   LearningRate 0.0031   Epoch: 16   Global Step: 93800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:08:05,316-Speed 3382.32 samples/sec   Loss 1.1049   LearningRate 0.0031   Epoch: 16   Global Step: 93810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:08:08,337-Speed 3390.81 samples/sec   Loss 1.2026   LearningRate 0.0031   Epoch: 16   Global Step: 93820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:08:11,359-Speed 3389.50 samples/sec   Loss 1.2202   LearningRate 0.0031   Epoch: 16   Global Step: 93830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:08:14,386-Speed 3383.24 samples/sec   Loss 1.1852   LearningRate 0.0031   Epoch: 16   Global Step: 93840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:08:17,413-Speed 3383.40 samples/sec   Loss 1.1864   LearningRate 0.0031   Epoch: 16   Global Step: 93850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:08:20,433-Speed 3391.26 samples/sec   Loss 1.1210   LearningRate 0.0030   Epoch: 16   Global Step: 93860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:08:23,456-Speed 3388.95 samples/sec   Loss 1.1676   LearningRate 0.0030   Epoch: 16   Global Step: 93870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:08:26,485-Speed 3381.20 samples/sec   Loss 1.1928   LearningRate 0.0030   Epoch: 16   Global Step: 93880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:08:29,514-Speed 3381.43 samples/sec   Loss 1.2186   LearningRate 0.0030   Epoch: 16   Global Step: 93890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:08:32,551-Speed 3372.92 samples/sec   Loss 1.1794   LearningRate 0.0030   Epoch: 16   Global Step: 93900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:08:35,672-Speed 3281.70 samples/sec   Loss 1.2145   LearningRate 0.0030   Epoch: 16   Global Step: 93910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:08:38,728-Speed 3351.30 samples/sec   Loss 1.2014   LearningRate 0.0030   Epoch: 16   Global Step: 93920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:08:41,754-Speed 3384.33 samples/sec   Loss 1.1532   LearningRate 0.0030   Epoch: 16   Global Step: 93930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:08:44,780-Speed 3385.80 samples/sec   Loss 1.1907   LearningRate 0.0030   Epoch: 16   Global Step: 93940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:08:47,803-Speed 3387.50 samples/sec   Loss 1.2013   LearningRate 0.0030   Epoch: 16   Global Step: 93950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:08:50,822-Speed 3393.23 samples/sec   Loss 1.2338   LearningRate 0.0030   Epoch: 16   Global Step: 93960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:08:53,899-Speed 3327.81 samples/sec   Loss 1.1824   LearningRate 0.0030   Epoch: 16   Global Step: 93970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:08:56,928-Speed 3381.43 samples/sec   Loss 1.1851   LearningRate 0.0030   Epoch: 16   Global Step: 93980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:08:59,987-Speed 3349.11 samples/sec   Loss 1.2136   LearningRate 0.0030   Epoch: 16   Global Step: 93990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:09:03,012-Speed 3386.07 samples/sec   Loss 1.1268   LearningRate 0.0030   Epoch: 16   Global Step: 94000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:09:46,430-[lfw][94000]XNorm: 21.774760
Training: 2022-04-27 11:09:46,431-[lfw][94000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-04-27 11:09:46,431-[lfw][94000]Accuracy-Highest: 0.99817
Training: 2022-04-27 11:10:36,982-[cfp_fp][94000]XNorm: 21.304880
Training: 2022-04-27 11:10:36,983-[cfp_fp][94000]Accuracy-Flip: 0.98386+-0.00670
Training: 2022-04-27 11:10:36,983-[cfp_fp][94000]Accuracy-Highest: 0.98386
Training: 2022-04-27 11:11:20,791-[agedb_30][94000]XNorm: 22.298897
Training: 2022-04-27 11:11:20,791-[agedb_30][94000]Accuracy-Flip: 0.98133+-0.00849
Training: 2022-04-27 11:11:20,792-[agedb_30][94000]Accuracy-Highest: 0.98133
Training: 2022-04-27 11:11:23,815-Speed 72.73 samples/sec   Loss 1.1243   LearningRate 0.0030   Epoch: 16   Global Step: 94010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:11:26,829-Speed 3397.59 samples/sec   Loss 1.1874   LearningRate 0.0030   Epoch: 16   Global Step: 94020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:11:29,840-Speed 3402.55 samples/sec   Loss 1.1513   LearningRate 0.0030   Epoch: 16   Global Step: 94030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:11:32,848-Speed 3405.28 samples/sec   Loss 1.0855   LearningRate 0.0030   Epoch: 16   Global Step: 94040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:11:35,871-Speed 3387.65 samples/sec   Loss 1.1185   LearningRate 0.0030   Epoch: 16   Global Step: 94050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:11:38,885-Speed 3398.85 samples/sec   Loss 1.2069   LearningRate 0.0030   Epoch: 16   Global Step: 94060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:11:41,883-Speed 3416.47 samples/sec   Loss 1.0700   LearningRate 0.0030   Epoch: 16   Global Step: 94070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:11:44,907-Speed 3386.91 samples/sec   Loss 1.0960   LearningRate 0.0030   Epoch: 16   Global Step: 94080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:11:47,925-Speed 3393.45 samples/sec   Loss 1.1690   LearningRate 0.0030   Epoch: 16   Global Step: 94090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:11:50,943-Speed 3394.16 samples/sec   Loss 1.2167   LearningRate 0.0030   Epoch: 16   Global Step: 94100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:11:54,010-Speed 3340.00 samples/sec   Loss 1.0889   LearningRate 0.0030   Epoch: 16   Global Step: 94110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:11:57,022-Speed 3400.32 samples/sec   Loss 1.1402   LearningRate 0.0030   Epoch: 16   Global Step: 94120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:12:00,035-Speed 3398.71 samples/sec   Loss 1.1065   LearningRate 0.0030   Epoch: 16   Global Step: 94130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:12:03,051-Speed 3397.12 samples/sec   Loss 1.1517   LearningRate 0.0030   Epoch: 16   Global Step: 94140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:12:06,077-Speed 3384.42 samples/sec   Loss 1.1602   LearningRate 0.0030   Epoch: 16   Global Step: 94150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:12:09,091-Speed 3398.59 samples/sec   Loss 1.1298   LearningRate 0.0030   Epoch: 16   Global Step: 94160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:12:12,109-Speed 3393.39 samples/sec   Loss 1.0889   LearningRate 0.0030   Epoch: 16   Global Step: 94170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:12:15,142-Speed 3377.63 samples/sec   Loss 1.1267   LearningRate 0.0030   Epoch: 16   Global Step: 94180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:12:18,172-Speed 3380.34 samples/sec   Loss 1.2176   LearningRate 0.0029   Epoch: 16   Global Step: 94190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:12:21,212-Speed 3368.77 samples/sec   Loss 1.1535   LearningRate 0.0029   Epoch: 16   Global Step: 94200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:12:24,274-Speed 3345.11 samples/sec   Loss 1.0837   LearningRate 0.0029   Epoch: 16   Global Step: 94210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:12:27,295-Speed 3390.16 samples/sec   Loss 1.2037   LearningRate 0.0029   Epoch: 16   Global Step: 94220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:12:30,314-Speed 3392.84 samples/sec   Loss 1.2052   LearningRate 0.0029   Epoch: 16   Global Step: 94230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:12:33,330-Speed 3396.16 samples/sec   Loss 1.1710   LearningRate 0.0029   Epoch: 16   Global Step: 94240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:12:36,341-Speed 3401.80 samples/sec   Loss 1.2269   LearningRate 0.0029   Epoch: 16   Global Step: 94250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:12:39,363-Speed 3388.83 samples/sec   Loss 1.1556   LearningRate 0.0029   Epoch: 16   Global Step: 94260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:12:42,383-Speed 3391.67 samples/sec   Loss 1.1632   LearningRate 0.0029   Epoch: 16   Global Step: 94270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:12:45,404-Speed 3390.03 samples/sec   Loss 1.1930   LearningRate 0.0029   Epoch: 16   Global Step: 94280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:12:48,429-Speed 3386.11 samples/sec   Loss 1.1865   LearningRate 0.0029   Epoch: 16   Global Step: 94290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:12:51,454-Speed 3386.38 samples/sec   Loss 1.1033   LearningRate 0.0029   Epoch: 16   Global Step: 94300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:12:54,484-Speed 3380.57 samples/sec   Loss 1.2663   LearningRate 0.0029   Epoch: 16   Global Step: 94310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:12:57,506-Speed 3389.13 samples/sec   Loss 1.2414   LearningRate 0.0029   Epoch: 16   Global Step: 94320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:13:00,529-Speed 3387.11 samples/sec   Loss 1.1621   LearningRate 0.0029   Epoch: 16   Global Step: 94330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:13:03,554-Speed 3386.72 samples/sec   Loss 1.1576   LearningRate 0.0029   Epoch: 16   Global Step: 94340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:13:06,575-Speed 3389.39 samples/sec   Loss 1.1477   LearningRate 0.0029   Epoch: 16   Global Step: 94350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:13:09,608-Speed 3377.37 samples/sec   Loss 1.1269   LearningRate 0.0029   Epoch: 16   Global Step: 94360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:13:12,631-Speed 3388.35 samples/sec   Loss 1.2605   LearningRate 0.0029   Epoch: 16   Global Step: 94370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:13:15,650-Speed 3393.46 samples/sec   Loss 1.2093   LearningRate 0.0029   Epoch: 16   Global Step: 94380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:13:18,671-Speed 3389.45 samples/sec   Loss 1.1738   LearningRate 0.0029   Epoch: 16   Global Step: 94390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:13:21,692-Speed 3390.33 samples/sec   Loss 1.2203   LearningRate 0.0029   Epoch: 16   Global Step: 94400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:13:24,724-Speed 3378.41 samples/sec   Loss 1.2269   LearningRate 0.0029   Epoch: 16   Global Step: 94410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:13:27,742-Speed 3393.33 samples/sec   Loss 1.1232   LearningRate 0.0029   Epoch: 16   Global Step: 94420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:13:30,760-Speed 3393.67 samples/sec   Loss 1.1818   LearningRate 0.0029   Epoch: 16   Global Step: 94430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:13:33,778-Speed 3393.79 samples/sec   Loss 1.2082   LearningRate 0.0029   Epoch: 16   Global Step: 94440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:13:36,806-Speed 3383.31 samples/sec   Loss 1.2430   LearningRate 0.0029   Epoch: 16   Global Step: 94450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:13:39,787-Speed 3435.70 samples/sec   Loss 1.1780   LearningRate 0.0029   Epoch: 16   Global Step: 94460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:13:42,804-Speed 3394.82 samples/sec   Loss 1.1930   LearningRate 0.0029   Epoch: 16   Global Step: 94470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:13:45,832-Speed 3382.65 samples/sec   Loss 1.2915   LearningRate 0.0029   Epoch: 16   Global Step: 94480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:13:48,856-Speed 3387.58 samples/sec   Loss 1.2803   LearningRate 0.0029   Epoch: 16   Global Step: 94490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:13:51,933-Speed 3328.86 samples/sec   Loss 1.0839   LearningRate 0.0029   Epoch: 16   Global Step: 94500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:13:54,948-Speed 3396.55 samples/sec   Loss 1.1607   LearningRate 0.0029   Epoch: 16   Global Step: 94510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:13:57,969-Speed 3390.04 samples/sec   Loss 1.1311   LearningRate 0.0029   Epoch: 16   Global Step: 94520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:14:00,990-Speed 3390.57 samples/sec   Loss 1.1640   LearningRate 0.0028   Epoch: 16   Global Step: 94530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:14:04,008-Speed 3394.20 samples/sec   Loss 1.1677   LearningRate 0.0028   Epoch: 16   Global Step: 94540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:14:07,029-Speed 3390.63 samples/sec   Loss 1.1921   LearningRate 0.0028   Epoch: 16   Global Step: 94550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:14:10,049-Speed 3391.29 samples/sec   Loss 1.2988   LearningRate 0.0028   Epoch: 16   Global Step: 94560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:14:13,191-Speed 3260.10 samples/sec   Loss 1.1410   LearningRate 0.0028   Epoch: 16   Global Step: 94570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:14:16,208-Speed 3394.33 samples/sec   Loss 1.1311   LearningRate 0.0028   Epoch: 16   Global Step: 94580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:14:19,223-Speed 3396.65 samples/sec   Loss 1.1712   LearningRate 0.0028   Epoch: 16   Global Step: 94590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:14:22,239-Speed 3396.12 samples/sec   Loss 1.1729   LearningRate 0.0028   Epoch: 16   Global Step: 94600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:14:25,257-Speed 3394.07 samples/sec   Loss 1.2323   LearningRate 0.0028   Epoch: 16   Global Step: 94610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:14:28,286-Speed 3381.34 samples/sec   Loss 1.1112   LearningRate 0.0028   Epoch: 16   Global Step: 94620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:14:31,309-Speed 3388.63 samples/sec   Loss 1.1430   LearningRate 0.0028   Epoch: 16   Global Step: 94630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:14:34,316-Speed 3406.01 samples/sec   Loss 1.0799   LearningRate 0.0028   Epoch: 16   Global Step: 94640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:14:37,359-Speed 3366.29 samples/sec   Loss 1.1497   LearningRate 0.0028   Epoch: 16   Global Step: 94650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:14:40,484-Speed 3277.20 samples/sec   Loss 1.1541   LearningRate 0.0028   Epoch: 16   Global Step: 94660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:14:43,504-Speed 3391.06 samples/sec   Loss 1.1613   LearningRate 0.0028   Epoch: 16   Global Step: 94670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:14:46,538-Speed 3376.66 samples/sec   Loss 1.2218   LearningRate 0.0028   Epoch: 16   Global Step: 94680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:14:49,561-Speed 3388.01 samples/sec   Loss 1.1757   LearningRate 0.0028   Epoch: 16   Global Step: 94690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:14:52,589-Speed 3382.65 samples/sec   Loss 1.1310   LearningRate 0.0028   Epoch: 16   Global Step: 94700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:14:55,611-Speed 3388.77 samples/sec   Loss 1.1157   LearningRate 0.0028   Epoch: 16   Global Step: 94710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:14:58,630-Speed 3392.63 samples/sec   Loss 1.1763   LearningRate 0.0028   Epoch: 16   Global Step: 94720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:15:01,656-Speed 3384.72 samples/sec   Loss 1.1600   LearningRate 0.0028   Epoch: 16   Global Step: 94730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:15:04,749-Speed 3311.87 samples/sec   Loss 1.1840   LearningRate 0.0028   Epoch: 16   Global Step: 94740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:15:07,776-Speed 3384.03 samples/sec   Loss 1.1353   LearningRate 0.0028   Epoch: 16   Global Step: 94750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:15:10,860-Speed 3320.31 samples/sec   Loss 1.1508   LearningRate 0.0028   Epoch: 16   Global Step: 94760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:15:13,967-Speed 3296.45 samples/sec   Loss 1.1987   LearningRate 0.0028   Epoch: 16   Global Step: 94770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:15:16,999-Speed 3377.89 samples/sec   Loss 1.1397   LearningRate 0.0028   Epoch: 16   Global Step: 94780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:15:20,027-Speed 3383.41 samples/sec   Loss 1.1703   LearningRate 0.0028   Epoch: 16   Global Step: 94790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:15:23,053-Speed 3384.78 samples/sec   Loss 1.1701   LearningRate 0.0028   Epoch: 16   Global Step: 94800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:15:26,075-Speed 3388.87 samples/sec   Loss 1.1397   LearningRate 0.0028   Epoch: 16   Global Step: 94810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:15:29,106-Speed 3379.87 samples/sec   Loss 1.1616   LearningRate 0.0028   Epoch: 16   Global Step: 94820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:15:32,127-Speed 3389.98 samples/sec   Loss 1.0947   LearningRate 0.0028   Epoch: 16   Global Step: 94830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:15:35,151-Speed 3387.55 samples/sec   Loss 1.1761   LearningRate 0.0028   Epoch: 16   Global Step: 94840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:15:38,176-Speed 3385.00 samples/sec   Loss 1.1706   LearningRate 0.0028   Epoch: 16   Global Step: 94850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:15:41,203-Speed 3384.21 samples/sec   Loss 1.1937   LearningRate 0.0028   Epoch: 16   Global Step: 94860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:15:44,231-Speed 3382.09 samples/sec   Loss 1.0785   LearningRate 0.0027   Epoch: 16   Global Step: 94870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:15:47,252-Speed 3391.02 samples/sec   Loss 1.2193   LearningRate 0.0027   Epoch: 16   Global Step: 94880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:15:50,314-Speed 3344.72 samples/sec   Loss 1.1965   LearningRate 0.0027   Epoch: 16   Global Step: 94890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:15:53,383-Speed 3338.11 samples/sec   Loss 1.1593   LearningRate 0.0027   Epoch: 16   Global Step: 94900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:15:56,408-Speed 3385.68 samples/sec   Loss 1.1505   LearningRate 0.0027   Epoch: 16   Global Step: 94910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:15:59,444-Speed 3373.40 samples/sec   Loss 1.1225   LearningRate 0.0027   Epoch: 16   Global Step: 94920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:02,466-Speed 3389.66 samples/sec   Loss 1.1081   LearningRate 0.0027   Epoch: 16   Global Step: 94930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:05,501-Speed 3374.11 samples/sec   Loss 1.1669   LearningRate 0.0027   Epoch: 16   Global Step: 94940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:08,547-Speed 3362.43 samples/sec   Loss 1.1618   LearningRate 0.0027   Epoch: 16   Global Step: 94950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:11,576-Speed 3381.66 samples/sec   Loss 1.1357   LearningRate 0.0027   Epoch: 16   Global Step: 94960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:14,609-Speed 3376.95 samples/sec   Loss 1.1324   LearningRate 0.0027   Epoch: 16   Global Step: 94970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:16:17,617-Speed 3405.13 samples/sec   Loss 1.1562   LearningRate 0.0027   Epoch: 16   Global Step: 94980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:20,639-Speed 3388.96 samples/sec   Loss 1.1572   LearningRate 0.0027   Epoch: 16   Global Step: 94990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:23,665-Speed 3385.84 samples/sec   Loss 1.1280   LearningRate 0.0027   Epoch: 16   Global Step: 95000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:26,696-Speed 3378.80 samples/sec   Loss 1.1663   LearningRate 0.0027   Epoch: 16   Global Step: 95010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:29,719-Speed 3388.03 samples/sec   Loss 1.1092   LearningRate 0.0027   Epoch: 16   Global Step: 95020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:32,743-Speed 3386.59 samples/sec   Loss 1.1525   LearningRate 0.0027   Epoch: 16   Global Step: 95030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:35,767-Speed 3386.87 samples/sec   Loss 1.0696   LearningRate 0.0027   Epoch: 16   Global Step: 95040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:38,889-Speed 3280.77 samples/sec   Loss 1.1271   LearningRate 0.0027   Epoch: 16   Global Step: 95050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:41,964-Speed 3330.94 samples/sec   Loss 1.0765   LearningRate 0.0027   Epoch: 16   Global Step: 95060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:44,995-Speed 3380.12 samples/sec   Loss 1.2175   LearningRate 0.0027   Epoch: 16   Global Step: 95070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:48,004-Speed 3403.26 samples/sec   Loss 1.0925   LearningRate 0.0027   Epoch: 16   Global Step: 95080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:51,028-Speed 3387.99 samples/sec   Loss 1.2361   LearningRate 0.0027   Epoch: 16   Global Step: 95090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:54,049-Speed 3389.82 samples/sec   Loss 1.1223   LearningRate 0.0027   Epoch: 16   Global Step: 95100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:16:57,079-Speed 3380.05 samples/sec   Loss 1.0909   LearningRate 0.0027   Epoch: 16   Global Step: 95110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:17:00,107-Speed 3382.61 samples/sec   Loss 1.0441   LearningRate 0.0027   Epoch: 16   Global Step: 95120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:17:03,119-Speed 3401.53 samples/sec   Loss 1.1325   LearningRate 0.0027   Epoch: 16   Global Step: 95130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:06,144-Speed 3385.66 samples/sec   Loss 1.1655   LearningRate 0.0027   Epoch: 16   Global Step: 95140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:09,170-Speed 3384.39 samples/sec   Loss 1.2057   LearningRate 0.0027   Epoch: 16   Global Step: 95150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:12,196-Speed 3385.37 samples/sec   Loss 1.1540   LearningRate 0.0027   Epoch: 16   Global Step: 95160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:15,228-Speed 3378.35 samples/sec   Loss 1.0425   LearningRate 0.0027   Epoch: 16   Global Step: 95170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:18,253-Speed 3385.52 samples/sec   Loss 1.1939   LearningRate 0.0027   Epoch: 16   Global Step: 95180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:21,283-Speed 3380.64 samples/sec   Loss 1.2048   LearningRate 0.0027   Epoch: 16   Global Step: 95190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:24,307-Speed 3386.94 samples/sec   Loss 1.1918   LearningRate 0.0027   Epoch: 16   Global Step: 95200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:27,333-Speed 3386.10 samples/sec   Loss 1.1149   LearningRate 0.0026   Epoch: 16   Global Step: 95210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:30,358-Speed 3385.44 samples/sec   Loss 1.2532   LearningRate 0.0026   Epoch: 16   Global Step: 95220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:33,393-Speed 3374.69 samples/sec   Loss 1.1238   LearningRate 0.0026   Epoch: 16   Global Step: 95230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:17:36,403-Speed 3402.44 samples/sec   Loss 1.1176   LearningRate 0.0026   Epoch: 16   Global Step: 95240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:39,435-Speed 3378.86 samples/sec   Loss 1.2607   LearningRate 0.0026   Epoch: 16   Global Step: 95250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:42,459-Speed 3386.44 samples/sec   Loss 1.1576   LearningRate 0.0026   Epoch: 16   Global Step: 95260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:45,495-Speed 3373.86 samples/sec   Loss 1.1532   LearningRate 0.0026   Epoch: 16   Global Step: 95270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:48,524-Speed 3381.76 samples/sec   Loss 1.0530   LearningRate 0.0026   Epoch: 16   Global Step: 95280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:51,557-Speed 3377.09 samples/sec   Loss 1.1882   LearningRate 0.0026   Epoch: 16   Global Step: 95290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:54,587-Speed 3379.83 samples/sec   Loss 1.1207   LearningRate 0.0026   Epoch: 16   Global Step: 95300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:17:57,665-Speed 3327.89 samples/sec   Loss 1.1434   LearningRate 0.0026   Epoch: 16   Global Step: 95310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:18:00,692-Speed 3383.68 samples/sec   Loss 1.1472   LearningRate 0.0026   Epoch: 16   Global Step: 95320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:18:03,719-Speed 3382.90 samples/sec   Loss 1.1300   LearningRate 0.0026   Epoch: 16   Global Step: 95330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:18:06,766-Speed 3362.17 samples/sec   Loss 1.1681   LearningRate 0.0026   Epoch: 16   Global Step: 95340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:09,893-Speed 3275.65 samples/sec   Loss 1.1550   LearningRate 0.0026   Epoch: 16   Global Step: 95350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:12,946-Speed 3354.37 samples/sec   Loss 1.1003   LearningRate 0.0026   Epoch: 16   Global Step: 95360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:15,980-Speed 3376.33 samples/sec   Loss 1.1793   LearningRate 0.0026   Epoch: 16   Global Step: 95370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:19,008-Speed 3382.29 samples/sec   Loss 1.0572   LearningRate 0.0026   Epoch: 16   Global Step: 95380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:22,040-Speed 3377.65 samples/sec   Loss 1.1558   LearningRate 0.0026   Epoch: 16   Global Step: 95390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:25,078-Speed 3372.26 samples/sec   Loss 1.0856   LearningRate 0.0026   Epoch: 16   Global Step: 95400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:28,110-Speed 3378.40 samples/sec   Loss 1.1043   LearningRate 0.0026   Epoch: 16   Global Step: 95410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:31,133-Speed 3387.57 samples/sec   Loss 1.1950   LearningRate 0.0026   Epoch: 16   Global Step: 95420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:34,172-Speed 3370.11 samples/sec   Loss 1.1293   LearningRate 0.0026   Epoch: 16   Global Step: 95430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:37,185-Speed 3399.94 samples/sec   Loss 1.0737   LearningRate 0.0026   Epoch: 16   Global Step: 95440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:40,216-Speed 3379.64 samples/sec   Loss 1.2071   LearningRate 0.0026   Epoch: 16   Global Step: 95450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:43,242-Speed 3385.24 samples/sec   Loss 1.2554   LearningRate 0.0026   Epoch: 16   Global Step: 95460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:46,267-Speed 3385.78 samples/sec   Loss 1.1660   LearningRate 0.0026   Epoch: 16   Global Step: 95470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:49,292-Speed 3385.00 samples/sec   Loss 1.1161   LearningRate 0.0026   Epoch: 16   Global Step: 95480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:52,317-Speed 3385.59 samples/sec   Loss 1.0973   LearningRate 0.0026   Epoch: 16   Global Step: 95490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:55,350-Speed 3378.07 samples/sec   Loss 1.1405   LearningRate 0.0026   Epoch: 16   Global Step: 95500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:18:58,394-Speed 3364.70 samples/sec   Loss 1.1138   LearningRate 0.0026   Epoch: 16   Global Step: 95510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:01,432-Speed 3370.94 samples/sec   Loss 1.1622   LearningRate 0.0026   Epoch: 16   Global Step: 95520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:04,460-Speed 3382.86 samples/sec   Loss 1.2153   LearningRate 0.0026   Epoch: 16   Global Step: 95530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:07,485-Speed 3386.17 samples/sec   Loss 1.1577   LearningRate 0.0026   Epoch: 16   Global Step: 95540   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:19:10,498-Speed 3398.65 samples/sec   Loss 1.2353   LearningRate 0.0026   Epoch: 16   Global Step: 95550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:13,523-Speed 3386.98 samples/sec   Loss 1.1515   LearningRate 0.0026   Epoch: 16   Global Step: 95560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:16,565-Speed 3365.97 samples/sec   Loss 1.1498   LearningRate 0.0025   Epoch: 16   Global Step: 95570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:19,586-Speed 3390.45 samples/sec   Loss 1.1691   LearningRate 0.0025   Epoch: 16   Global Step: 95580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:22,617-Speed 3379.53 samples/sec   Loss 1.0668   LearningRate 0.0025   Epoch: 16   Global Step: 95590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:25,655-Speed 3371.03 samples/sec   Loss 1.1376   LearningRate 0.0025   Epoch: 16   Global Step: 95600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:28,713-Speed 3349.40 samples/sec   Loss 1.1850   LearningRate 0.0025   Epoch: 16   Global Step: 95610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:31,736-Speed 3389.43 samples/sec   Loss 1.1036   LearningRate 0.0025   Epoch: 16   Global Step: 95620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:34,764-Speed 3381.65 samples/sec   Loss 1.2407   LearningRate 0.0025   Epoch: 16   Global Step: 95630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:37,788-Speed 3387.71 samples/sec   Loss 1.0612   LearningRate 0.0025   Epoch: 16   Global Step: 95640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:40,808-Speed 3391.46 samples/sec   Loss 1.0820   LearningRate 0.0025   Epoch: 16   Global Step: 95650   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:19:43,828-Speed 3390.93 samples/sec   Loss 1.0818   LearningRate 0.0025   Epoch: 16   Global Step: 95660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:46,853-Speed 3385.94 samples/sec   Loss 1.1522   LearningRate 0.0025   Epoch: 16   Global Step: 95670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:49,876-Speed 3387.69 samples/sec   Loss 1.2030   LearningRate 0.0025   Epoch: 16   Global Step: 95680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:52,905-Speed 3382.27 samples/sec   Loss 1.1790   LearningRate 0.0025   Epoch: 16   Global Step: 95690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:55,924-Speed 3392.50 samples/sec   Loss 1.0790   LearningRate 0.0025   Epoch: 16   Global Step: 95700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:19:58,953-Speed 3381.48 samples/sec   Loss 1.1992   LearningRate 0.0025   Epoch: 16   Global Step: 95710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:01,984-Speed 3379.58 samples/sec   Loss 1.1750   LearningRate 0.0025   Epoch: 16   Global Step: 95720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:05,121-Speed 3264.84 samples/sec   Loss 1.1459   LearningRate 0.0025   Epoch: 16   Global Step: 95730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:08,153-Speed 3378.24 samples/sec   Loss 1.1201   LearningRate 0.0025   Epoch: 16   Global Step: 95740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:11,190-Speed 3372.12 samples/sec   Loss 1.1500   LearningRate 0.0025   Epoch: 16   Global Step: 95750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:14,219-Speed 3381.71 samples/sec   Loss 1.1437   LearningRate 0.0025   Epoch: 16   Global Step: 95760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:17,245-Speed 3384.80 samples/sec   Loss 1.2002   LearningRate 0.0025   Epoch: 16   Global Step: 95770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:20,278-Speed 3376.59 samples/sec   Loss 1.1507   LearningRate 0.0025   Epoch: 16   Global Step: 95780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:23,314-Speed 3374.12 samples/sec   Loss 1.1594   LearningRate 0.0025   Epoch: 16   Global Step: 95790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:26,344-Speed 3380.55 samples/sec   Loss 1.0676   LearningRate 0.0025   Epoch: 16   Global Step: 95800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:29,480-Speed 3265.33 samples/sec   Loss 1.2086   LearningRate 0.0025   Epoch: 16   Global Step: 95810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:32,508-Speed 3382.59 samples/sec   Loss 1.1265   LearningRate 0.0025   Epoch: 16   Global Step: 95820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:35,609-Speed 3302.73 samples/sec   Loss 1.2075   LearningRate 0.0025   Epoch: 16   Global Step: 95830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:38,653-Speed 3365.28 samples/sec   Loss 1.1239   LearningRate 0.0025   Epoch: 16   Global Step: 95840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:41,681-Speed 3382.66 samples/sec   Loss 1.2465   LearningRate 0.0025   Epoch: 16   Global Step: 95850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:44,695-Speed 3398.19 samples/sec   Loss 1.0938   LearningRate 0.0025   Epoch: 16   Global Step: 95860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:47,726-Speed 3379.18 samples/sec   Loss 1.1082   LearningRate 0.0025   Epoch: 16   Global Step: 95870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:50,758-Speed 3378.29 samples/sec   Loss 1.2203   LearningRate 0.0025   Epoch: 16   Global Step: 95880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:53,788-Speed 3380.34 samples/sec   Loss 1.1023   LearningRate 0.0025   Epoch: 16   Global Step: 95890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:56,815-Speed 3384.21 samples/sec   Loss 1.1488   LearningRate 0.0025   Epoch: 16   Global Step: 95900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:20:59,843-Speed 3381.81 samples/sec   Loss 1.1815   LearningRate 0.0025   Epoch: 16   Global Step: 95910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:21:02,879-Speed 3374.25 samples/sec   Loss 1.1002   LearningRate 0.0025   Epoch: 16   Global Step: 95920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:21:05,897-Speed 3393.68 samples/sec   Loss 1.2135   LearningRate 0.0024   Epoch: 16   Global Step: 95930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:21:08,944-Speed 3361.26 samples/sec   Loss 1.1731   LearningRate 0.0024   Epoch: 16   Global Step: 95940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:21:11,986-Speed 3366.57 samples/sec   Loss 1.1789   LearningRate 0.0024   Epoch: 16   Global Step: 95950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:21:15,029-Speed 3365.63 samples/sec   Loss 1.1989   LearningRate 0.0024   Epoch: 16   Global Step: 95960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:21:18,059-Speed 3380.54 samples/sec   Loss 1.1200   LearningRate 0.0024   Epoch: 16   Global Step: 95970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:21:21,088-Speed 3381.59 samples/sec   Loss 1.1181   LearningRate 0.0024   Epoch: 16   Global Step: 95980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:21:24,133-Speed 3363.75 samples/sec   Loss 1.0332   LearningRate 0.0024   Epoch: 16   Global Step: 95990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:21:27,165-Speed 3378.09 samples/sec   Loss 1.0504   LearningRate 0.0024   Epoch: 16   Global Step: 96000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:22:10,523-[lfw][96000]XNorm: 22.444819
Training: 2022-04-27 11:22:10,524-[lfw][96000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-27 11:22:10,524-[lfw][96000]Accuracy-Highest: 0.99817
Training: 2022-04-27 11:23:01,139-[cfp_fp][96000]XNorm: 21.908401
Training: 2022-04-27 11:23:01,140-[cfp_fp][96000]Accuracy-Flip: 0.98286+-0.00688
Training: 2022-04-27 11:23:01,140-[cfp_fp][96000]Accuracy-Highest: 0.98386
Training: 2022-04-27 11:23:44,485-[agedb_30][96000]XNorm: 22.459557
Training: 2022-04-27 11:23:44,486-[agedb_30][96000]Accuracy-Flip: 0.98233+-0.00779
Training: 2022-04-27 11:23:44,486-[agedb_30][96000]Accuracy-Highest: 0.98233
Training: 2022-04-27 11:23:47,507-Speed 72.97 samples/sec   Loss 1.1703   LearningRate 0.0024   Epoch: 16   Global Step: 96010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:23:50,533-Speed 3384.75 samples/sec   Loss 1.1087   LearningRate 0.0024   Epoch: 16   Global Step: 96020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:23:53,539-Speed 3408.11 samples/sec   Loss 1.1369   LearningRate 0.0024   Epoch: 16   Global Step: 96030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:23:56,547-Speed 3404.04 samples/sec   Loss 1.1461   LearningRate 0.0024   Epoch: 16   Global Step: 96040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:23:59,608-Speed 3346.57 samples/sec   Loss 1.1152   LearningRate 0.0024   Epoch: 16   Global Step: 96050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:02,624-Speed 3396.10 samples/sec   Loss 1.1299   LearningRate 0.0024   Epoch: 16   Global Step: 96060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:05,643-Speed 3393.03 samples/sec   Loss 1.1687   LearningRate 0.0024   Epoch: 16   Global Step: 96070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:08,659-Speed 3395.58 samples/sec   Loss 1.0954   LearningRate 0.0024   Epoch: 16   Global Step: 96080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:11,676-Speed 3395.06 samples/sec   Loss 1.1969   LearningRate 0.0024   Epoch: 16   Global Step: 96090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:14,699-Speed 3388.08 samples/sec   Loss 1.0688   LearningRate 0.0024   Epoch: 16   Global Step: 96100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:17,720-Speed 3390.04 samples/sec   Loss 1.1047   LearningRate 0.0024   Epoch: 16   Global Step: 96110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:20,740-Speed 3391.49 samples/sec   Loss 1.1751   LearningRate 0.0024   Epoch: 16   Global Step: 96120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:23,773-Speed 3376.75 samples/sec   Loss 1.1876   LearningRate 0.0024   Epoch: 16   Global Step: 96130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:24:26,774-Speed 3413.62 samples/sec   Loss 1.0949   LearningRate 0.0024   Epoch: 16   Global Step: 96140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:29,792-Speed 3393.80 samples/sec   Loss 1.0637   LearningRate 0.0024   Epoch: 16   Global Step: 96150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:32,808-Speed 3396.21 samples/sec   Loss 1.1670   LearningRate 0.0024   Epoch: 16   Global Step: 96160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:35,825-Speed 3394.59 samples/sec   Loss 1.1146   LearningRate 0.0024   Epoch: 16   Global Step: 96170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:38,840-Speed 3397.04 samples/sec   Loss 1.0924   LearningRate 0.0024   Epoch: 16   Global Step: 96180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:41,849-Speed 3404.18 samples/sec   Loss 1.1564   LearningRate 0.0024   Epoch: 16   Global Step: 96190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:44,861-Speed 3400.46 samples/sec   Loss 1.1612   LearningRate 0.0024   Epoch: 16   Global Step: 96200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:47,871-Speed 3402.25 samples/sec   Loss 1.1603   LearningRate 0.0024   Epoch: 16   Global Step: 96210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:50,887-Speed 3396.10 samples/sec   Loss 1.1539   LearningRate 0.0024   Epoch: 16   Global Step: 96220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:53,913-Speed 3385.20 samples/sec   Loss 1.1165   LearningRate 0.0024   Epoch: 16   Global Step: 96230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:24:56,934-Speed 3390.27 samples/sec   Loss 1.1168   LearningRate 0.0024   Epoch: 16   Global Step: 96240   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:24:59,924-Speed 3425.16 samples/sec   Loss 1.1487   LearningRate 0.0024   Epoch: 16   Global Step: 96250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:25:02,955-Speed 3379.98 samples/sec   Loss 1.1368   LearningRate 0.0024   Epoch: 16   Global Step: 96260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:25:05,953-Speed 3416.59 samples/sec   Loss 1.1639   LearningRate 0.0024   Epoch: 16   Global Step: 96270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:08,968-Speed 3396.65 samples/sec   Loss 1.0394   LearningRate 0.0024   Epoch: 16   Global Step: 96280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:11,985-Speed 3395.24 samples/sec   Loss 1.1419   LearningRate 0.0023   Epoch: 16   Global Step: 96290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:14,996-Speed 3401.79 samples/sec   Loss 1.2004   LearningRate 0.0023   Epoch: 16   Global Step: 96300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:18,010-Speed 3397.68 samples/sec   Loss 1.1062   LearningRate 0.0023   Epoch: 16   Global Step: 96310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:21,025-Speed 3397.27 samples/sec   Loss 1.1485   LearningRate 0.0023   Epoch: 16   Global Step: 96320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:24,046-Speed 3390.58 samples/sec   Loss 1.1507   LearningRate 0.0023   Epoch: 16   Global Step: 96330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:27,061-Speed 3396.90 samples/sec   Loss 1.1785   LearningRate 0.0023   Epoch: 16   Global Step: 96340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:30,072-Speed 3401.23 samples/sec   Loss 1.1854   LearningRate 0.0023   Epoch: 16   Global Step: 96350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:33,102-Speed 3381.20 samples/sec   Loss 1.1627   LearningRate 0.0023   Epoch: 16   Global Step: 96360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:36,101-Speed 3414.95 samples/sec   Loss 1.1423   LearningRate 0.0023   Epoch: 16   Global Step: 96370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:39,117-Speed 3396.05 samples/sec   Loss 1.1711   LearningRate 0.0023   Epoch: 16   Global Step: 96380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:42,136-Speed 3393.18 samples/sec   Loss 1.1325   LearningRate 0.0023   Epoch: 16   Global Step: 96390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:45,152-Speed 3395.54 samples/sec   Loss 1.1214   LearningRate 0.0023   Epoch: 16   Global Step: 96400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:48,162-Speed 3402.32 samples/sec   Loss 1.1313   LearningRate 0.0023   Epoch: 16   Global Step: 96410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:51,176-Speed 3398.28 samples/sec   Loss 1.1620   LearningRate 0.0023   Epoch: 16   Global Step: 96420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:54,189-Speed 3399.25 samples/sec   Loss 1.0531   LearningRate 0.0023   Epoch: 16   Global Step: 96430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:25:57,203-Speed 3399.30 samples/sec   Loss 1.1078   LearningRate 0.0023   Epoch: 16   Global Step: 96440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:26:00,241-Speed 3370.86 samples/sec   Loss 1.1697   LearningRate 0.0023   Epoch: 16   Global Step: 96450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:26:03,254-Speed 3400.10 samples/sec   Loss 1.2284   LearningRate 0.0023   Epoch: 16   Global Step: 96460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:26:06,265-Speed 3401.34 samples/sec   Loss 1.1204   LearningRate 0.0023   Epoch: 16   Global Step: 96470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:26:09,288-Speed 3388.33 samples/sec   Loss 1.1268   LearningRate 0.0023   Epoch: 16   Global Step: 96480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:26:12,417-Speed 3273.04 samples/sec   Loss 1.1115   LearningRate 0.0023   Epoch: 16   Global Step: 96490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:26:15,460-Speed 3366.09 samples/sec   Loss 1.1871   LearningRate 0.0023   Epoch: 16   Global Step: 96500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:26:18,476-Speed 3395.53 samples/sec   Loss 1.1425   LearningRate 0.0023   Epoch: 16   Global Step: 96510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:26:21,488-Speed 3400.55 samples/sec   Loss 1.1595   LearningRate 0.0023   Epoch: 16   Global Step: 96520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:26:24,501-Speed 3399.81 samples/sec   Loss 1.1843   LearningRate 0.0023   Epoch: 16   Global Step: 96530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:26:27,519-Speed 3393.87 samples/sec   Loss 1.1374   LearningRate 0.0023   Epoch: 16   Global Step: 96540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:26:30,535-Speed 3396.07 samples/sec   Loss 1.1351   LearningRate 0.0023   Epoch: 16   Global Step: 96550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:26:33,549-Speed 3398.47 samples/sec   Loss 1.1188   LearningRate 0.0023   Epoch: 16   Global Step: 96560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:26:36,565-Speed 3395.32 samples/sec   Loss 1.0857   LearningRate 0.0023   Epoch: 16   Global Step: 96570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:26:39,658-Speed 3311.87 samples/sec   Loss 1.1543   LearningRate 0.0023   Epoch: 16   Global Step: 96580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:26:42,665-Speed 3406.64 samples/sec   Loss 1.1386   LearningRate 0.0023   Epoch: 16   Global Step: 96590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:26:45,691-Speed 3384.05 samples/sec   Loss 1.1846   LearningRate 0.0023   Epoch: 16   Global Step: 96600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:26:48,733-Speed 3367.02 samples/sec   Loss 1.1319   LearningRate 0.0023   Epoch: 16   Global Step: 96610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:26:51,753-Speed 3391.38 samples/sec   Loss 1.0955   LearningRate 0.0023   Epoch: 16   Global Step: 96620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:26:54,774-Speed 3390.62 samples/sec   Loss 1.1120   LearningRate 0.0023   Epoch: 16   Global Step: 96630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:26:57,790-Speed 3396.32 samples/sec   Loss 1.1280   LearningRate 0.0023   Epoch: 16   Global Step: 96640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:27:00,806-Speed 3396.52 samples/sec   Loss 1.1298   LearningRate 0.0023   Epoch: 16   Global Step: 96650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:27:03,897-Speed 3313.21 samples/sec   Loss 1.0689   LearningRate 0.0023   Epoch: 16   Global Step: 96660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:27:17,151-Speed 772.66 samples/sec   Loss 0.8309   LearningRate 0.0022   Epoch: 17   Global Step: 96670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:27:20,161-Speed 3402.86 samples/sec   Loss 0.8191   LearningRate 0.0022   Epoch: 17   Global Step: 96680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:27:23,183-Speed 3389.94 samples/sec   Loss 0.7564   LearningRate 0.0022   Epoch: 17   Global Step: 96690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:27:26,225-Speed 3366.34 samples/sec   Loss 0.8170   LearningRate 0.0022   Epoch: 17   Global Step: 96700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:27:29,260-Speed 3374.78 samples/sec   Loss 0.7908   LearningRate 0.0022   Epoch: 17   Global Step: 96710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:27:32,291-Speed 3378.68 samples/sec   Loss 0.7564   LearningRate 0.0022   Epoch: 17   Global Step: 96720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:27:35,331-Speed 3369.55 samples/sec   Loss 0.7287   LearningRate 0.0022   Epoch: 17   Global Step: 96730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:27:38,373-Speed 3367.43 samples/sec   Loss 0.8058   LearningRate 0.0022   Epoch: 17   Global Step: 96740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:27:41,391-Speed 3394.34 samples/sec   Loss 0.7069   LearningRate 0.0022   Epoch: 17   Global Step: 96750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:27:44,418-Speed 3383.34 samples/sec   Loss 0.7949   LearningRate 0.0022   Epoch: 17   Global Step: 96760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:27:47,444-Speed 3384.13 samples/sec   Loss 0.7713   LearningRate 0.0022   Epoch: 17   Global Step: 96770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:27:50,479-Speed 3374.64 samples/sec   Loss 0.8263   LearningRate 0.0022   Epoch: 17   Global Step: 96780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:27:53,505-Speed 3385.31 samples/sec   Loss 0.8016   LearningRate 0.0022   Epoch: 17   Global Step: 96790   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:27:56,534-Speed 3381.26 samples/sec   Loss 0.7392   LearningRate 0.0022   Epoch: 17   Global Step: 96800   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:27:59,540-Speed 3407.49 samples/sec   Loss 0.8107   LearningRate 0.0022   Epoch: 17   Global Step: 96810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:28:02,562-Speed 3388.94 samples/sec   Loss 0.7245   LearningRate 0.0022   Epoch: 17   Global Step: 96820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:28:05,584-Speed 3389.92 samples/sec   Loss 0.8274   LearningRate 0.0022   Epoch: 17   Global Step: 96830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:28:08,608-Speed 3386.57 samples/sec   Loss 0.8283   LearningRate 0.0022   Epoch: 17   Global Step: 96840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:28:11,692-Speed 3321.27 samples/sec   Loss 0.7840   LearningRate 0.0022   Epoch: 17   Global Step: 96850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:28:14,707-Speed 3397.01 samples/sec   Loss 0.8065   LearningRate 0.0022   Epoch: 17   Global Step: 96860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:28:17,731-Speed 3387.19 samples/sec   Loss 0.7830   LearningRate 0.0022   Epoch: 17   Global Step: 96870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:28:20,735-Speed 3408.93 samples/sec   Loss 0.8186   LearningRate 0.0022   Epoch: 17   Global Step: 96880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:28:23,752-Speed 3395.24 samples/sec   Loss 0.7403   LearningRate 0.0022   Epoch: 17   Global Step: 96890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:28:26,770-Speed 3394.07 samples/sec   Loss 0.8370   LearningRate 0.0022   Epoch: 17   Global Step: 96900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:28:29,786-Speed 3396.44 samples/sec   Loss 0.8804   LearningRate 0.0022   Epoch: 17   Global Step: 96910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:28:32,812-Speed 3385.06 samples/sec   Loss 0.7808   LearningRate 0.0022   Epoch: 17   Global Step: 96920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:28:35,872-Speed 3346.31 samples/sec   Loss 0.7608   LearningRate 0.0022   Epoch: 17   Global Step: 96930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:28:38,900-Speed 3382.38 samples/sec   Loss 0.7514   LearningRate 0.0022   Epoch: 17   Global Step: 96940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:28:41,921-Speed 3390.39 samples/sec   Loss 0.8459   LearningRate 0.0022   Epoch: 17   Global Step: 96950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:28:44,964-Speed 3365.98 samples/sec   Loss 0.8356   LearningRate 0.0022   Epoch: 17   Global Step: 96960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:28:48,016-Speed 3356.07 samples/sec   Loss 0.7859   LearningRate 0.0022   Epoch: 17   Global Step: 96970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:28:51,062-Speed 3365.38 samples/sec   Loss 0.8722   LearningRate 0.0022   Epoch: 17   Global Step: 96980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:28:54,168-Speed 3297.46 samples/sec   Loss 0.7171   LearningRate 0.0022   Epoch: 17   Global Step: 96990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:28:57,196-Speed 3382.95 samples/sec   Loss 0.9039   LearningRate 0.0022   Epoch: 17   Global Step: 97000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:29:00,234-Speed 3371.86 samples/sec   Loss 0.8146   LearningRate 0.0022   Epoch: 17   Global Step: 97010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:29:03,268-Speed 3375.60 samples/sec   Loss 0.8004   LearningRate 0.0022   Epoch: 17   Global Step: 97020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:29:06,305-Speed 3372.64 samples/sec   Loss 0.8258   LearningRate 0.0022   Epoch: 17   Global Step: 97030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:29:09,324-Speed 3391.80 samples/sec   Loss 0.7963   LearningRate 0.0022   Epoch: 17   Global Step: 97040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:29:12,392-Speed 3338.47 samples/sec   Loss 0.7942   LearningRate 0.0021   Epoch: 17   Global Step: 97050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:29:15,421-Speed 3381.59 samples/sec   Loss 0.8383   LearningRate 0.0021   Epoch: 17   Global Step: 97060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:29:18,451-Speed 3380.49 samples/sec   Loss 0.7884   LearningRate 0.0021   Epoch: 17   Global Step: 97070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:29:21,478-Speed 3384.36 samples/sec   Loss 0.8048   LearningRate 0.0021   Epoch: 17   Global Step: 97080   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:29:24,482-Speed 3409.70 samples/sec   Loss 0.8359   LearningRate 0.0021   Epoch: 17   Global Step: 97090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:29:27,559-Speed 3328.91 samples/sec   Loss 0.7714   LearningRate 0.0021   Epoch: 17   Global Step: 97100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:29:30,638-Speed 3325.54 samples/sec   Loss 0.7600   LearningRate 0.0021   Epoch: 17   Global Step: 97110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:29:33,660-Speed 3389.51 samples/sec   Loss 0.7788   LearningRate 0.0021   Epoch: 17   Global Step: 97120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:29:36,682-Speed 3389.63 samples/sec   Loss 0.8414   LearningRate 0.0021   Epoch: 17   Global Step: 97130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:29:39,707-Speed 3385.34 samples/sec   Loss 0.8785   LearningRate 0.0021   Epoch: 17   Global Step: 97140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:29:42,719-Speed 3400.43 samples/sec   Loss 0.7712   LearningRate 0.0021   Epoch: 17   Global Step: 97150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:29:45,742-Speed 3388.69 samples/sec   Loss 0.8237   LearningRate 0.0021   Epoch: 17   Global Step: 97160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:29:48,818-Speed 3329.88 samples/sec   Loss 0.8051   LearningRate 0.0021   Epoch: 17   Global Step: 97170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:29:51,849-Speed 3378.72 samples/sec   Loss 0.8229   LearningRate 0.0021   Epoch: 17   Global Step: 97180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:29:54,880-Speed 3379.66 samples/sec   Loss 0.8337   LearningRate 0.0021   Epoch: 17   Global Step: 97190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:29:57,910-Speed 3380.45 samples/sec   Loss 0.7940   LearningRate 0.0021   Epoch: 17   Global Step: 97200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:30:00,933-Speed 3388.37 samples/sec   Loss 0.7311   LearningRate 0.0021   Epoch: 17   Global Step: 97210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:30:03,964-Speed 3378.84 samples/sec   Loss 0.8310   LearningRate 0.0021   Epoch: 17   Global Step: 97220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:30:06,998-Speed 3375.83 samples/sec   Loss 0.8356   LearningRate 0.0021   Epoch: 17   Global Step: 97230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:30:10,032-Speed 3376.01 samples/sec   Loss 0.7604   LearningRate 0.0021   Epoch: 17   Global Step: 97240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:30:13,069-Speed 3372.85 samples/sec   Loss 0.8463   LearningRate 0.0021   Epoch: 17   Global Step: 97250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:16,095-Speed 3384.01 samples/sec   Loss 0.7939   LearningRate 0.0021   Epoch: 17   Global Step: 97260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:19,118-Speed 3388.39 samples/sec   Loss 0.8148   LearningRate 0.0021   Epoch: 17   Global Step: 97270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:22,150-Speed 3378.45 samples/sec   Loss 0.8159   LearningRate 0.0021   Epoch: 17   Global Step: 97280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:25,174-Speed 3386.61 samples/sec   Loss 0.8591   LearningRate 0.0021   Epoch: 17   Global Step: 97290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:28,201-Speed 3383.38 samples/sec   Loss 0.7709   LearningRate 0.0021   Epoch: 17   Global Step: 97300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:31,228-Speed 3383.62 samples/sec   Loss 0.8051   LearningRate 0.0021   Epoch: 17   Global Step: 97310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:34,264-Speed 3374.16 samples/sec   Loss 0.7583   LearningRate 0.0021   Epoch: 17   Global Step: 97320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:37,393-Speed 3273.62 samples/sec   Loss 0.8117   LearningRate 0.0021   Epoch: 17   Global Step: 97330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:40,492-Speed 3305.23 samples/sec   Loss 0.8159   LearningRate 0.0021   Epoch: 17   Global Step: 97340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:43,498-Speed 3407.32 samples/sec   Loss 0.7016   LearningRate 0.0021   Epoch: 17   Global Step: 97350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:46,522-Speed 3386.57 samples/sec   Loss 0.8588   LearningRate 0.0021   Epoch: 17   Global Step: 97360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:49,561-Speed 3369.68 samples/sec   Loss 0.7942   LearningRate 0.0021   Epoch: 17   Global Step: 97370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:52,584-Speed 3388.57 samples/sec   Loss 0.8794   LearningRate 0.0021   Epoch: 17   Global Step: 97380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:55,606-Speed 3388.69 samples/sec   Loss 0.8018   LearningRate 0.0021   Epoch: 17   Global Step: 97390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:30:58,633-Speed 3383.68 samples/sec   Loss 0.8181   LearningRate 0.0021   Epoch: 17   Global Step: 97400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:31:01,645-Speed 3401.66 samples/sec   Loss 0.7939   LearningRate 0.0021   Epoch: 17   Global Step: 97410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:31:04,673-Speed 3382.29 samples/sec   Loss 0.7785   LearningRate 0.0021   Epoch: 17   Global Step: 97420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:31:07,698-Speed 3386.19 samples/sec   Loss 0.8327   LearningRate 0.0021   Epoch: 17   Global Step: 97430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:31:10,723-Speed 3385.84 samples/sec   Loss 0.8686   LearningRate 0.0020   Epoch: 17   Global Step: 97440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:31:13,755-Speed 3377.82 samples/sec   Loss 0.8489   LearningRate 0.0020   Epoch: 17   Global Step: 97450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:31:16,777-Speed 3388.86 samples/sec   Loss 0.8021   LearningRate 0.0020   Epoch: 17   Global Step: 97460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:31:19,805-Speed 3382.70 samples/sec   Loss 0.8079   LearningRate 0.0020   Epoch: 17   Global Step: 97470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:31:22,850-Speed 3363.65 samples/sec   Loss 0.8359   LearningRate 0.0020   Epoch: 17   Global Step: 97480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:31:25,927-Speed 3328.95 samples/sec   Loss 0.8265   LearningRate 0.0020   Epoch: 17   Global Step: 97490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:31:28,953-Speed 3384.36 samples/sec   Loss 0.7852   LearningRate 0.0020   Epoch: 17   Global Step: 97500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:31:32,019-Speed 3341.26 samples/sec   Loss 0.7613   LearningRate 0.0020   Epoch: 17   Global Step: 97510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:31:35,068-Speed 3358.84 samples/sec   Loss 0.7547   LearningRate 0.0020   Epoch: 17   Global Step: 97520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:31:38,094-Speed 3384.68 samples/sec   Loss 0.7852   LearningRate 0.0020   Epoch: 17   Global Step: 97530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:31:41,118-Speed 3387.03 samples/sec   Loss 0.8086   LearningRate 0.0020   Epoch: 17   Global Step: 97540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:31:44,144-Speed 3385.13 samples/sec   Loss 0.7879   LearningRate 0.0020   Epoch: 17   Global Step: 97550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:31:47,173-Speed 3381.41 samples/sec   Loss 0.8322   LearningRate 0.0020   Epoch: 17   Global Step: 97560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:31:50,204-Speed 3379.42 samples/sec   Loss 0.7877   LearningRate 0.0020   Epoch: 17   Global Step: 97570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:31:53,232-Speed 3382.02 samples/sec   Loss 0.8197   LearningRate 0.0020   Epoch: 17   Global Step: 97580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:31:56,257-Speed 3386.47 samples/sec   Loss 0.7897   LearningRate 0.0020   Epoch: 17   Global Step: 97590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:31:59,292-Speed 3374.71 samples/sec   Loss 0.8231   LearningRate 0.0020   Epoch: 17   Global Step: 97600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:02,312-Speed 3391.49 samples/sec   Loss 0.8300   LearningRate 0.0020   Epoch: 17   Global Step: 97610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:05,349-Speed 3372.05 samples/sec   Loss 0.7348   LearningRate 0.0020   Epoch: 17   Global Step: 97620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:08,371-Speed 3389.28 samples/sec   Loss 0.8104   LearningRate 0.0020   Epoch: 17   Global Step: 97630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:11,404-Speed 3376.87 samples/sec   Loss 0.8523   LearningRate 0.0020   Epoch: 17   Global Step: 97640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:14,428-Speed 3387.26 samples/sec   Loss 0.8280   LearningRate 0.0020   Epoch: 17   Global Step: 97650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:17,451-Speed 3388.22 samples/sec   Loss 0.7855   LearningRate 0.0020   Epoch: 17   Global Step: 97660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:20,490-Speed 3370.72 samples/sec   Loss 0.7757   LearningRate 0.0020   Epoch: 17   Global Step: 97670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:23,516-Speed 3384.91 samples/sec   Loss 0.8272   LearningRate 0.0020   Epoch: 17   Global Step: 97680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:26,543-Speed 3382.92 samples/sec   Loss 0.8387   LearningRate 0.0020   Epoch: 17   Global Step: 97690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:29,570-Speed 3384.29 samples/sec   Loss 0.7754   LearningRate 0.0020   Epoch: 17   Global Step: 97700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:32,598-Speed 3382.85 samples/sec   Loss 0.8228   LearningRate 0.0020   Epoch: 17   Global Step: 97710   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:32:35,605-Speed 3405.26 samples/sec   Loss 0.7825   LearningRate 0.0020   Epoch: 17   Global Step: 97720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:38,671-Speed 3340.83 samples/sec   Loss 0.7712   LearningRate 0.0020   Epoch: 17   Global Step: 97730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:41,702-Speed 3379.23 samples/sec   Loss 0.7272   LearningRate 0.0020   Epoch: 17   Global Step: 97740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:44,735-Speed 3376.67 samples/sec   Loss 0.8576   LearningRate 0.0020   Epoch: 17   Global Step: 97750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:47,761-Speed 3385.26 samples/sec   Loss 0.7548   LearningRate 0.0020   Epoch: 17   Global Step: 97760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:50,791-Speed 3380.50 samples/sec   Loss 0.8174   LearningRate 0.0020   Epoch: 17   Global Step: 97770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:53,824-Speed 3377.20 samples/sec   Loss 0.8804   LearningRate 0.0020   Epoch: 17   Global Step: 97780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:56,846-Speed 3389.40 samples/sec   Loss 0.8023   LearningRate 0.0020   Epoch: 17   Global Step: 97790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:32:59,881-Speed 3374.44 samples/sec   Loss 0.7718   LearningRate 0.0020   Epoch: 17   Global Step: 97800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:02,955-Speed 3331.54 samples/sec   Loss 0.8363   LearningRate 0.0020   Epoch: 17   Global Step: 97810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:05,971-Speed 3396.37 samples/sec   Loss 0.8267   LearningRate 0.0020   Epoch: 17   Global Step: 97820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:08,997-Speed 3384.23 samples/sec   Loss 0.7385   LearningRate 0.0020   Epoch: 17   Global Step: 97830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:12,032-Speed 3375.64 samples/sec   Loss 0.7919   LearningRate 0.0019   Epoch: 17   Global Step: 97840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:15,075-Speed 3365.77 samples/sec   Loss 0.8298   LearningRate 0.0019   Epoch: 17   Global Step: 97850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:18,104-Speed 3380.73 samples/sec   Loss 0.7695   LearningRate 0.0019   Epoch: 17   Global Step: 97860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:21,129-Speed 3386.86 samples/sec   Loss 0.8285   LearningRate 0.0019   Epoch: 17   Global Step: 97870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:24,164-Speed 3374.37 samples/sec   Loss 0.8209   LearningRate 0.0019   Epoch: 17   Global Step: 97880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:27,198-Speed 3375.90 samples/sec   Loss 0.7795   LearningRate 0.0019   Epoch: 17   Global Step: 97890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:30,225-Speed 3383.49 samples/sec   Loss 0.8110   LearningRate 0.0019   Epoch: 17   Global Step: 97900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:33,252-Speed 3383.71 samples/sec   Loss 0.8109   LearningRate 0.0019   Epoch: 17   Global Step: 97910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:36,262-Speed 3402.79 samples/sec   Loss 0.7961   LearningRate 0.0019   Epoch: 17   Global Step: 97920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:39,297-Speed 3374.87 samples/sec   Loss 0.7930   LearningRate 0.0019   Epoch: 17   Global Step: 97930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:42,322-Speed 3385.04 samples/sec   Loss 0.8051   LearningRate 0.0019   Epoch: 17   Global Step: 97940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:45,349-Speed 3384.27 samples/sec   Loss 0.7941   LearningRate 0.0019   Epoch: 17   Global Step: 97950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:48,385-Speed 3373.88 samples/sec   Loss 0.7763   LearningRate 0.0019   Epoch: 17   Global Step: 97960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:51,412-Speed 3384.24 samples/sec   Loss 0.8450   LearningRate 0.0019   Epoch: 17   Global Step: 97970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:54,437-Speed 3384.96 samples/sec   Loss 0.7952   LearningRate 0.0019   Epoch: 17   Global Step: 97980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:33:57,488-Speed 3357.68 samples/sec   Loss 0.8232   LearningRate 0.0019   Epoch: 17   Global Step: 97990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:34:00,591-Speed 3302.58 samples/sec   Loss 0.7851   LearningRate 0.0019   Epoch: 17   Global Step: 98000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:34:44,358-[lfw][98000]XNorm: 21.285736
Training: 2022-04-27 11:34:44,358-[lfw][98000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-27 11:34:44,359-[lfw][98000]Accuracy-Highest: 0.99817
Training: 2022-04-27 11:35:35,015-[cfp_fp][98000]XNorm: 21.325338
Training: 2022-04-27 11:35:35,016-[cfp_fp][98000]Accuracy-Flip: 0.98471+-0.00564
Training: 2022-04-27 11:35:35,016-[cfp_fp][98000]Accuracy-Highest: 0.98471
Training: 2022-04-27 11:36:18,550-[agedb_30][98000]XNorm: 22.087762
Training: 2022-04-27 11:36:18,551-[agedb_30][98000]Accuracy-Flip: 0.98100+-0.00923
Training: 2022-04-27 11:36:18,551-[agedb_30][98000]Accuracy-Highest: 0.98233
Training: 2022-04-27 11:36:21,586-Speed 72.63 samples/sec   Loss 0.7787   LearningRate 0.0019   Epoch: 17   Global Step: 98010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:36:24,581-Speed 3419.39 samples/sec   Loss 0.8522   LearningRate 0.0019   Epoch: 17   Global Step: 98020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:36:27,588-Speed 3407.37 samples/sec   Loss 0.8012   LearningRate 0.0019   Epoch: 17   Global Step: 98030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:36:30,597-Speed 3403.19 samples/sec   Loss 0.7725   LearningRate 0.0019   Epoch: 17   Global Step: 98040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:36:33,590-Speed 3422.53 samples/sec   Loss 0.8154   LearningRate 0.0019   Epoch: 17   Global Step: 98050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:36:36,679-Speed 3314.73 samples/sec   Loss 0.8335   LearningRate 0.0019   Epoch: 17   Global Step: 98060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:36:39,688-Speed 3403.79 samples/sec   Loss 0.8766   LearningRate 0.0019   Epoch: 17   Global Step: 98070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:36:42,696-Speed 3405.33 samples/sec   Loss 0.7831   LearningRate 0.0019   Epoch: 17   Global Step: 98080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:36:45,704-Speed 3404.75 samples/sec   Loss 0.8351   LearningRate 0.0019   Epoch: 17   Global Step: 98090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:36:48,715-Speed 3402.25 samples/sec   Loss 0.7524   LearningRate 0.0019   Epoch: 17   Global Step: 98100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:36:51,741-Speed 3385.03 samples/sec   Loss 0.8130   LearningRate 0.0019   Epoch: 17   Global Step: 98110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:36:54,769-Speed 3382.59 samples/sec   Loss 0.7786   LearningRate 0.0019   Epoch: 17   Global Step: 98120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:36:57,784-Speed 3396.46 samples/sec   Loss 0.8162   LearningRate 0.0019   Epoch: 17   Global Step: 98130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:37:00,803-Speed 3393.72 samples/sec   Loss 0.7367   LearningRate 0.0019   Epoch: 17   Global Step: 98140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:37:03,884-Speed 3323.86 samples/sec   Loss 0.7505   LearningRate 0.0019   Epoch: 17   Global Step: 98150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:06,925-Speed 3367.63 samples/sec   Loss 0.8464   LearningRate 0.0019   Epoch: 17   Global Step: 98160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:09,937-Speed 3400.35 samples/sec   Loss 0.8288   LearningRate 0.0019   Epoch: 17   Global Step: 98170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:12,964-Speed 3383.75 samples/sec   Loss 0.7568   LearningRate 0.0019   Epoch: 17   Global Step: 98180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:15,974-Speed 3402.78 samples/sec   Loss 0.8101   LearningRate 0.0019   Epoch: 17   Global Step: 98190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:18,991-Speed 3396.02 samples/sec   Loss 0.8236   LearningRate 0.0019   Epoch: 17   Global Step: 98200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:22,017-Speed 3383.96 samples/sec   Loss 0.8002   LearningRate 0.0019   Epoch: 17   Global Step: 98210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:25,043-Speed 3384.98 samples/sec   Loss 0.8382   LearningRate 0.0019   Epoch: 17   Global Step: 98220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:28,061-Speed 3393.87 samples/sec   Loss 0.7616   LearningRate 0.0019   Epoch: 17   Global Step: 98230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:31,079-Speed 3394.30 samples/sec   Loss 0.8188   LearningRate 0.0019   Epoch: 17   Global Step: 98240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:34,089-Speed 3401.94 samples/sec   Loss 0.8271   LearningRate 0.0019   Epoch: 17   Global Step: 98250   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:37:37,092-Speed 3410.49 samples/sec   Loss 0.8310   LearningRate 0.0018   Epoch: 17   Global Step: 98260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:40,112-Speed 3392.01 samples/sec   Loss 0.8261   LearningRate 0.0018   Epoch: 17   Global Step: 98270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:43,127-Speed 3397.07 samples/sec   Loss 0.7760   LearningRate 0.0018   Epoch: 17   Global Step: 98280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:46,138-Speed 3401.98 samples/sec   Loss 0.8620   LearningRate 0.0018   Epoch: 17   Global Step: 98290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:49,152-Speed 3398.41 samples/sec   Loss 0.7657   LearningRate 0.0018   Epoch: 17   Global Step: 98300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:52,167-Speed 3397.44 samples/sec   Loss 0.7681   LearningRate 0.0018   Epoch: 17   Global Step: 98310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:55,192-Speed 3385.41 samples/sec   Loss 0.8107   LearningRate 0.0018   Epoch: 17   Global Step: 98320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:37:58,211-Speed 3392.84 samples/sec   Loss 0.8378   LearningRate 0.0018   Epoch: 17   Global Step: 98330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:38:01,248-Speed 3372.32 samples/sec   Loss 0.8533   LearningRate 0.0018   Epoch: 17   Global Step: 98340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:38:04,263-Speed 3397.89 samples/sec   Loss 0.7978   LearningRate 0.0018   Epoch: 17   Global Step: 98350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:38:07,281-Speed 3394.21 samples/sec   Loss 0.9189   LearningRate 0.0018   Epoch: 17   Global Step: 98360   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:38:10,280-Speed 3415.06 samples/sec   Loss 0.8280   LearningRate 0.0018   Epoch: 17   Global Step: 98370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:38:13,292-Speed 3400.98 samples/sec   Loss 0.8180   LearningRate 0.0018   Epoch: 17   Global Step: 98380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:38:16,311-Speed 3392.37 samples/sec   Loss 0.8023   LearningRate 0.0018   Epoch: 17   Global Step: 98390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:38:19,330-Speed 3392.92 samples/sec   Loss 0.7937   LearningRate 0.0018   Epoch: 17   Global Step: 98400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:38:22,343-Speed 3398.86 samples/sec   Loss 0.8844   LearningRate 0.0018   Epoch: 17   Global Step: 98410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:38:25,361-Speed 3393.76 samples/sec   Loss 0.8103   LearningRate 0.0018   Epoch: 17   Global Step: 98420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:38:28,361-Speed 3414.21 samples/sec   Loss 0.7305   LearningRate 0.0018   Epoch: 17   Global Step: 98430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:38:31,372-Speed 3402.08 samples/sec   Loss 0.8654   LearningRate 0.0018   Epoch: 17   Global Step: 98440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:38:34,387-Speed 3396.46 samples/sec   Loss 0.7039   LearningRate 0.0018   Epoch: 17   Global Step: 98450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:38:37,402-Speed 3396.74 samples/sec   Loss 0.8918   LearningRate 0.0018   Epoch: 17   Global Step: 98460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:38:40,416-Speed 3398.80 samples/sec   Loss 0.7929   LearningRate 0.0018   Epoch: 17   Global Step: 98470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:38:43,863-Speed 3393.57 samples/sec   Loss 0.8379   LearningRate 0.0018   Epoch: 17   Global Step: 98480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:38:46,880-Speed 3395.90 samples/sec   Loss 0.8004   LearningRate 0.0018   Epoch: 17   Global Step: 98490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:38:49,896-Speed 3395.87 samples/sec   Loss 0.8051   LearningRate 0.0018   Epoch: 17   Global Step: 98500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:38:52,910-Speed 3398.29 samples/sec   Loss 0.8313   LearningRate 0.0018   Epoch: 17   Global Step: 98510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:38:55,920-Speed 3401.90 samples/sec   Loss 0.8362   LearningRate 0.0018   Epoch: 17   Global Step: 98520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:38:58,927-Speed 3406.53 samples/sec   Loss 0.8328   LearningRate 0.0018   Epoch: 17   Global Step: 98530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:39:01,951-Speed 3386.84 samples/sec   Loss 0.7415   LearningRate 0.0018   Epoch: 17   Global Step: 98540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:39:04,948-Speed 3417.95 samples/sec   Loss 0.7755   LearningRate 0.0018   Epoch: 17   Global Step: 98550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:07,964-Speed 3395.22 samples/sec   Loss 0.8355   LearningRate 0.0018   Epoch: 17   Global Step: 98560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:10,984-Speed 3392.47 samples/sec   Loss 0.9174   LearningRate 0.0018   Epoch: 17   Global Step: 98570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:13,998-Speed 3398.35 samples/sec   Loss 0.8423   LearningRate 0.0018   Epoch: 17   Global Step: 98580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:17,007-Speed 3403.49 samples/sec   Loss 0.8650   LearningRate 0.0018   Epoch: 17   Global Step: 98590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:20,018-Speed 3401.11 samples/sec   Loss 0.8118   LearningRate 0.0018   Epoch: 17   Global Step: 98600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:23,033-Speed 3397.89 samples/sec   Loss 0.8233   LearningRate 0.0018   Epoch: 17   Global Step: 98610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:26,046-Speed 3399.55 samples/sec   Loss 0.8221   LearningRate 0.0018   Epoch: 17   Global Step: 98620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:29,072-Speed 3384.51 samples/sec   Loss 0.8493   LearningRate 0.0018   Epoch: 17   Global Step: 98630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:32,084-Speed 3400.18 samples/sec   Loss 0.6967   LearningRate 0.0018   Epoch: 17   Global Step: 98640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:35,098-Speed 3398.90 samples/sec   Loss 0.7510   LearningRate 0.0018   Epoch: 17   Global Step: 98650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:39:38,101-Speed 3410.17 samples/sec   Loss 0.8314   LearningRate 0.0018   Epoch: 17   Global Step: 98660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:41,116-Speed 3397.22 samples/sec   Loss 0.7981   LearningRate 0.0018   Epoch: 17   Global Step: 98670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:44,132-Speed 3396.63 samples/sec   Loss 0.8682   LearningRate 0.0017   Epoch: 17   Global Step: 98680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:47,153-Speed 3390.17 samples/sec   Loss 0.8739   LearningRate 0.0017   Epoch: 17   Global Step: 98690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:50,168-Speed 3396.79 samples/sec   Loss 0.8338   LearningRate 0.0017   Epoch: 17   Global Step: 98700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:53,185-Speed 3394.57 samples/sec   Loss 0.8777   LearningRate 0.0017   Epoch: 17   Global Step: 98710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:56,217-Speed 3378.69 samples/sec   Loss 0.7962   LearningRate 0.0017   Epoch: 17   Global Step: 98720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:39:59,282-Speed 3340.85 samples/sec   Loss 0.8131   LearningRate 0.0017   Epoch: 17   Global Step: 98730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:40:02,302-Speed 3391.42 samples/sec   Loss 0.8077   LearningRate 0.0017   Epoch: 17   Global Step: 98740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:40:05,318-Speed 3396.49 samples/sec   Loss 0.7828   LearningRate 0.0017   Epoch: 17   Global Step: 98750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:40:08,334-Speed 3396.93 samples/sec   Loss 0.8342   LearningRate 0.0017   Epoch: 17   Global Step: 98760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:40:11,370-Speed 3373.35 samples/sec   Loss 0.8703   LearningRate 0.0017   Epoch: 17   Global Step: 98770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:40:14,446-Speed 3328.96 samples/sec   Loss 0.7781   LearningRate 0.0017   Epoch: 17   Global Step: 98780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:40:17,482-Speed 3373.80 samples/sec   Loss 0.8334   LearningRate 0.0017   Epoch: 17   Global Step: 98790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:40:20,497-Speed 3397.88 samples/sec   Loss 0.8074   LearningRate 0.0017   Epoch: 17   Global Step: 98800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:40:23,514-Speed 3394.74 samples/sec   Loss 0.8988   LearningRate 0.0017   Epoch: 17   Global Step: 98810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:40:26,536-Speed 3388.37 samples/sec   Loss 0.8271   LearningRate 0.0017   Epoch: 17   Global Step: 98820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:40:29,555-Speed 3392.92 samples/sec   Loss 0.8522   LearningRate 0.0017   Epoch: 17   Global Step: 98830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:40:32,557-Speed 3412.38 samples/sec   Loss 0.8143   LearningRate 0.0017   Epoch: 17   Global Step: 98840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:40:35,576-Speed 3393.21 samples/sec   Loss 0.8744   LearningRate 0.0017   Epoch: 17   Global Step: 98850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:40:38,603-Speed 3383.25 samples/sec   Loss 0.8181   LearningRate 0.0017   Epoch: 17   Global Step: 98860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:40:41,623-Speed 3390.96 samples/sec   Loss 0.8865   LearningRate 0.0017   Epoch: 17   Global Step: 98870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:40:44,643-Speed 3392.03 samples/sec   Loss 0.7637   LearningRate 0.0017   Epoch: 17   Global Step: 98880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:40:47,664-Speed 3389.83 samples/sec   Loss 0.7711   LearningRate 0.0017   Epoch: 17   Global Step: 98890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:40:50,685-Speed 3390.48 samples/sec   Loss 0.8365   LearningRate 0.0017   Epoch: 17   Global Step: 98900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:40:53,705-Speed 3391.88 samples/sec   Loss 0.8233   LearningRate 0.0017   Epoch: 17   Global Step: 98910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:40:56,728-Speed 3387.40 samples/sec   Loss 0.8421   LearningRate 0.0017   Epoch: 17   Global Step: 98920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:40:59,757-Speed 3381.59 samples/sec   Loss 0.8449   LearningRate 0.0017   Epoch: 17   Global Step: 98930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 11:41:02,829-Speed 3334.49 samples/sec   Loss 0.7784   LearningRate 0.0017   Epoch: 17   Global Step: 98940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:05,907-Speed 3328.08 samples/sec   Loss 0.7573   LearningRate 0.0017   Epoch: 17   Global Step: 98950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:08,924-Speed 3395.09 samples/sec   Loss 0.8482   LearningRate 0.0017   Epoch: 17   Global Step: 98960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:11,942-Speed 3393.57 samples/sec   Loss 0.8138   LearningRate 0.0017   Epoch: 17   Global Step: 98970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:14,967-Speed 3385.33 samples/sec   Loss 0.7575   LearningRate 0.0017   Epoch: 17   Global Step: 98980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:17,986-Speed 3393.12 samples/sec   Loss 0.8395   LearningRate 0.0017   Epoch: 17   Global Step: 98990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:21,001-Speed 3396.44 samples/sec   Loss 0.8014   LearningRate 0.0017   Epoch: 17   Global Step: 99000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:24,022-Speed 3390.53 samples/sec   Loss 0.8331   LearningRate 0.0017   Epoch: 17   Global Step: 99010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:27,053-Speed 3379.37 samples/sec   Loss 0.7519   LearningRate 0.0017   Epoch: 17   Global Step: 99020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:30,147-Speed 3310.59 samples/sec   Loss 0.8537   LearningRate 0.0017   Epoch: 17   Global Step: 99030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:33,180-Speed 3376.35 samples/sec   Loss 0.8041   LearningRate 0.0017   Epoch: 17   Global Step: 99040   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-27 11:41:36,184-Speed 3410.66 samples/sec   Loss 0.8082   LearningRate 0.0017   Epoch: 17   Global Step: 99050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:39,207-Speed 3387.42 samples/sec   Loss 0.7949   LearningRate 0.0017   Epoch: 17   Global Step: 99060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:42,225-Speed 3393.79 samples/sec   Loss 0.8227   LearningRate 0.0017   Epoch: 17   Global Step: 99070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:45,242-Speed 3394.37 samples/sec   Loss 0.7808   LearningRate 0.0017   Epoch: 17   Global Step: 99080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:48,258-Speed 3396.25 samples/sec   Loss 0.7792   LearningRate 0.0017   Epoch: 17   Global Step: 99090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:51,280-Speed 3389.30 samples/sec   Loss 0.7896   LearningRate 0.0017   Epoch: 17   Global Step: 99100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:54,301-Speed 3391.20 samples/sec   Loss 0.8153   LearningRate 0.0017   Epoch: 17   Global Step: 99110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:41:57,321-Speed 3390.74 samples/sec   Loss 0.8677   LearningRate 0.0016   Epoch: 17   Global Step: 99120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:42:00,346-Speed 3386.76 samples/sec   Loss 0.9056   LearningRate 0.0016   Epoch: 17   Global Step: 99130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:42:03,364-Speed 3393.31 samples/sec   Loss 0.8052   LearningRate 0.0016   Epoch: 17   Global Step: 99140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:42:06,364-Speed 3413.66 samples/sec   Loss 0.8314   LearningRate 0.0016   Epoch: 17   Global Step: 99150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:42:09,383-Speed 3392.76 samples/sec   Loss 0.8558   LearningRate 0.0016   Epoch: 17   Global Step: 99160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:42:12,414-Speed 3378.77 samples/sec   Loss 0.8401   LearningRate 0.0016   Epoch: 17   Global Step: 99170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:42:15,431-Speed 3395.60 samples/sec   Loss 0.7526   LearningRate 0.0016   Epoch: 17   Global Step: 99180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 11:42:18,457-Speed 3384.82 samples/sec   Loss 0.8001   LearningRate 0.0016   Epoch: 17   Global Step: 99190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:42:21,481-Speed 3388.11 samples/sec   Loss 0.8051   LearningRate 0.0016   Epoch: 17   Global Step: 99200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:42:24,549-Speed 3337.94 samples/sec   Loss 0.8187   LearningRate 0.0016   Epoch: 17   Global Step: 99210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:42:27,588-Speed 3370.75 samples/sec   Loss 0.7827   LearningRate 0.0016   Epoch: 17   Global Step: 99220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:42:30,613-Speed 3385.06 samples/sec   Loss 0.9193   LearningRate 0.0016   Epoch: 17   Global Step: 99230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:42:33,635-Speed 3389.24 samples/sec   Loss 0.8814   LearningRate 0.0016   Epoch: 17   Global Step: 99240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:42:36,652-Speed 3395.78 samples/sec   Loss 0.8269   LearningRate 0.0016   Epoch: 17   Global Step: 99250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:42:39,670-Speed 3393.10 samples/sec   Loss 0.8331   LearningRate 0.0016   Epoch: 17   Global Step: 99260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:42:42,692-Speed 3389.50 samples/sec   Loss 0.7834   LearningRate 0.0016   Epoch: 17   Global Step: 99270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:42:45,713-Speed 3390.01 samples/sec   Loss 0.8623   LearningRate 0.0016   Epoch: 17   Global Step: 99280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:42:48,744-Speed 3380.17 samples/sec   Loss 0.8657   LearningRate 0.0016   Epoch: 17   Global Step: 99290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:42:51,767-Speed 3387.58 samples/sec   Loss 0.8378   LearningRate 0.0016   Epoch: 17   Global Step: 99300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:42:54,784-Speed 3395.44 samples/sec   Loss 0.7932   LearningRate 0.0016   Epoch: 17   Global Step: 99310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:42:57,833-Speed 3359.50 samples/sec   Loss 0.8518   LearningRate 0.0016   Epoch: 17   Global Step: 99320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:43:00,867-Speed 3375.15 samples/sec   Loss 0.7850   LearningRate 0.0016   Epoch: 17   Global Step: 99330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:43:03,920-Speed 3354.39 samples/sec   Loss 0.8823   LearningRate 0.0016   Epoch: 17   Global Step: 99340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:43:06,946-Speed 3385.81 samples/sec   Loss 0.9058   LearningRate 0.0016   Epoch: 17   Global Step: 99350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:09,972-Speed 3384.04 samples/sec   Loss 0.7666   LearningRate 0.0016   Epoch: 17   Global Step: 99360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:13,005-Speed 3376.75 samples/sec   Loss 0.8661   LearningRate 0.0016   Epoch: 17   Global Step: 99370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:16,035-Speed 3380.55 samples/sec   Loss 0.7471   LearningRate 0.0016   Epoch: 17   Global Step: 99380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:19,060-Speed 3386.07 samples/sec   Loss 0.7727   LearningRate 0.0016   Epoch: 17   Global Step: 99390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:22,095-Speed 3374.97 samples/sec   Loss 0.8243   LearningRate 0.0016   Epoch: 17   Global Step: 99400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:25,129-Speed 3376.31 samples/sec   Loss 0.8707   LearningRate 0.0016   Epoch: 17   Global Step: 99410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:28,196-Speed 3339.29 samples/sec   Loss 0.8344   LearningRate 0.0016   Epoch: 17   Global Step: 99420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:31,217-Speed 3389.64 samples/sec   Loss 0.8733   LearningRate 0.0016   Epoch: 17   Global Step: 99430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:34,270-Speed 3355.15 samples/sec   Loss 0.8355   LearningRate 0.0016   Epoch: 17   Global Step: 99440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:37,415-Speed 3256.41 samples/sec   Loss 0.8387   LearningRate 0.0016   Epoch: 17   Global Step: 99450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:40,451-Speed 3374.06 samples/sec   Loss 0.8282   LearningRate 0.0016   Epoch: 17   Global Step: 99460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:43,482-Speed 3379.64 samples/sec   Loss 0.7353   LearningRate 0.0016   Epoch: 17   Global Step: 99470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:46,513-Speed 3379.13 samples/sec   Loss 0.8784   LearningRate 0.0016   Epoch: 17   Global Step: 99480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:49,535-Speed 3389.09 samples/sec   Loss 0.8289   LearningRate 0.0016   Epoch: 17   Global Step: 99490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:43:52,543-Speed 3404.79 samples/sec   Loss 0.7944   LearningRate 0.0016   Epoch: 17   Global Step: 99500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:43:55,565-Speed 3389.36 samples/sec   Loss 0.8282   LearningRate 0.0016   Epoch: 17   Global Step: 99510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:43:58,590-Speed 3385.91 samples/sec   Loss 0.8873   LearningRate 0.0016   Epoch: 17   Global Step: 99520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:44:01,617-Speed 3383.61 samples/sec   Loss 0.8050   LearningRate 0.0016   Epoch: 17   Global Step: 99530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:44:04,645-Speed 3382.89 samples/sec   Loss 0.8036   LearningRate 0.0016   Epoch: 17   Global Step: 99540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:44:07,674-Speed 3381.24 samples/sec   Loss 0.7798   LearningRate 0.0016   Epoch: 17   Global Step: 99550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:44:10,696-Speed 3389.24 samples/sec   Loss 0.7250   LearningRate 0.0016   Epoch: 17   Global Step: 99560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:44:13,773-Speed 3328.95 samples/sec   Loss 0.8364   LearningRate 0.0015   Epoch: 17   Global Step: 99570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:44:16,796-Speed 3387.68 samples/sec   Loss 0.8089   LearningRate 0.0015   Epoch: 17   Global Step: 99580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:44:19,820-Speed 3387.68 samples/sec   Loss 0.8137   LearningRate 0.0015   Epoch: 17   Global Step: 99590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:44:22,847-Speed 3383.75 samples/sec   Loss 0.8714   LearningRate 0.0015   Epoch: 17   Global Step: 99600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:44:25,899-Speed 3355.66 samples/sec   Loss 0.8478   LearningRate 0.0015   Epoch: 17   Global Step: 99610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:44:28,942-Speed 3366.05 samples/sec   Loss 0.7931   LearningRate 0.0015   Epoch: 17   Global Step: 99620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:44:31,969-Speed 3383.53 samples/sec   Loss 0.7688   LearningRate 0.0015   Epoch: 17   Global Step: 99630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:44:34,998-Speed 3380.71 samples/sec   Loss 0.8183   LearningRate 0.0015   Epoch: 17   Global Step: 99640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:44:38,027-Speed 3382.54 samples/sec   Loss 0.8743   LearningRate 0.0015   Epoch: 17   Global Step: 99650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:44:41,071-Speed 3364.78 samples/sec   Loss 0.7793   LearningRate 0.0015   Epoch: 17   Global Step: 99660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:44:44,107-Speed 3372.93 samples/sec   Loss 0.8914   LearningRate 0.0015   Epoch: 17   Global Step: 99670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:44:47,133-Speed 3384.98 samples/sec   Loss 0.8480   LearningRate 0.0015   Epoch: 17   Global Step: 99680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:44:50,268-Speed 3266.88 samples/sec   Loss 0.8167   LearningRate 0.0015   Epoch: 17   Global Step: 99690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:44:53,344-Speed 3330.18 samples/sec   Loss 0.8091   LearningRate 0.0015   Epoch: 17   Global Step: 99700   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 11:44:56,356-Speed 3399.71 samples/sec   Loss 0.9036   LearningRate 0.0015   Epoch: 17   Global Step: 99710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:44:59,379-Speed 3388.55 samples/sec   Loss 0.7759   LearningRate 0.0015   Epoch: 17   Global Step: 99720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:02,443-Speed 3343.29 samples/sec   Loss 0.7540   LearningRate 0.0015   Epoch: 17   Global Step: 99730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:05,542-Speed 3305.15 samples/sec   Loss 0.8012   LearningRate 0.0015   Epoch: 17   Global Step: 99740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:08,564-Speed 3388.57 samples/sec   Loss 0.8358   LearningRate 0.0015   Epoch: 17   Global Step: 99750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:11,583-Speed 3393.53 samples/sec   Loss 0.8736   LearningRate 0.0015   Epoch: 17   Global Step: 99760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:14,638-Speed 3351.98 samples/sec   Loss 0.7465   LearningRate 0.0015   Epoch: 17   Global Step: 99770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:17,662-Speed 3386.78 samples/sec   Loss 0.8649   LearningRate 0.0015   Epoch: 17   Global Step: 99780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:20,686-Speed 3387.14 samples/sec   Loss 0.7552   LearningRate 0.0015   Epoch: 17   Global Step: 99790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:23,800-Speed 3289.36 samples/sec   Loss 0.7903   LearningRate 0.0015   Epoch: 17   Global Step: 99800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:26,872-Speed 3333.46 samples/sec   Loss 0.9074   LearningRate 0.0015   Epoch: 17   Global Step: 99810   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 11:45:29,893-Speed 3391.32 samples/sec   Loss 0.8327   LearningRate 0.0015   Epoch: 17   Global Step: 99820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:32,938-Speed 3363.96 samples/sec   Loss 0.8823   LearningRate 0.0015   Epoch: 17   Global Step: 99830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:35,965-Speed 3383.16 samples/sec   Loss 0.8429   LearningRate 0.0015   Epoch: 17   Global Step: 99840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:38,992-Speed 3384.24 samples/sec   Loss 0.8276   LearningRate 0.0015   Epoch: 17   Global Step: 99850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:42,016-Speed 3386.86 samples/sec   Loss 0.7808   LearningRate 0.0015   Epoch: 17   Global Step: 99860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:45,039-Speed 3387.72 samples/sec   Loss 0.8565   LearningRate 0.0015   Epoch: 17   Global Step: 99870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:48,092-Speed 3354.32 samples/sec   Loss 0.8761   LearningRate 0.0015   Epoch: 17   Global Step: 99880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:51,137-Speed 3363.86 samples/sec   Loss 0.7561   LearningRate 0.0015   Epoch: 17   Global Step: 99890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:54,162-Speed 3386.40 samples/sec   Loss 0.8701   LearningRate 0.0015   Epoch: 17   Global Step: 99900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:45:57,186-Speed 3386.48 samples/sec   Loss 0.8443   LearningRate 0.0015   Epoch: 17   Global Step: 99910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:46:00,202-Speed 3397.16 samples/sec   Loss 0.8330   LearningRate 0.0015   Epoch: 17   Global Step: 99920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:46:03,224-Speed 3389.47 samples/sec   Loss 0.8380   LearningRate 0.0015   Epoch: 17   Global Step: 99930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:46:06,252-Speed 3381.74 samples/sec   Loss 0.7886   LearningRate 0.0015   Epoch: 17   Global Step: 99940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:46:09,274-Speed 3389.09 samples/sec   Loss 0.8672   LearningRate 0.0015   Epoch: 17   Global Step: 99950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:46:12,298-Speed 3387.27 samples/sec   Loss 0.8146   LearningRate 0.0015   Epoch: 17   Global Step: 99960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:46:15,329-Speed 3379.72 samples/sec   Loss 0.8554   LearningRate 0.0015   Epoch: 17   Global Step: 99970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:46:18,354-Speed 3384.91 samples/sec   Loss 0.8065   LearningRate 0.0015   Epoch: 17   Global Step: 99980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:46:21,378-Speed 3387.18 samples/sec   Loss 0.7915   LearningRate 0.0015   Epoch: 17   Global Step: 99990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:46:24,404-Speed 3384.98 samples/sec   Loss 0.7611   LearningRate 0.0015   Epoch: 17   Global Step: 100000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:47:07,682-[lfw][100000]XNorm: 21.826810
Training: 2022-04-27 11:47:07,682-[lfw][100000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-04-27 11:47:07,683-[lfw][100000]Accuracy-Highest: 0.99817
Training: 2022-04-27 11:47:58,371-[cfp_fp][100000]XNorm: 21.727874
Training: 2022-04-27 11:47:58,372-[cfp_fp][100000]Accuracy-Flip: 0.98614+-0.00593
Training: 2022-04-27 11:47:58,372-[cfp_fp][100000]Accuracy-Highest: 0.98614
Training: 2022-04-27 11:48:41,865-[agedb_30][100000]XNorm: 22.020445
Training: 2022-04-27 11:48:41,866-[agedb_30][100000]Accuracy-Flip: 0.97983+-0.00883
Training: 2022-04-27 11:48:41,866-[agedb_30][100000]Accuracy-Highest: 0.98233
Training: 2022-04-27 11:48:44,856-Speed 72.91 samples/sec   Loss 0.8884   LearningRate 0.0015   Epoch: 17   Global Step: 100010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:48:47,879-Speed 3388.39 samples/sec   Loss 0.7642   LearningRate 0.0015   Epoch: 17   Global Step: 100020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:48:50,882-Speed 3411.19 samples/sec   Loss 0.7448   LearningRate 0.0014   Epoch: 17   Global Step: 100030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:48:53,881-Speed 3415.48 samples/sec   Loss 0.8352   LearningRate 0.0014   Epoch: 17   Global Step: 100040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:48:56,884-Speed 3409.82 samples/sec   Loss 0.7856   LearningRate 0.0014   Epoch: 17   Global Step: 100050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:48:59,927-Speed 3365.95 samples/sec   Loss 0.8482   LearningRate 0.0014   Epoch: 17   Global Step: 100060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:49:02,938-Speed 3401.35 samples/sec   Loss 0.7606   LearningRate 0.0014   Epoch: 17   Global Step: 100070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:49:05,946-Speed 3405.54 samples/sec   Loss 0.8731   LearningRate 0.0014   Epoch: 17   Global Step: 100080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:49:08,959-Speed 3399.33 samples/sec   Loss 0.8663   LearningRate 0.0014   Epoch: 17   Global Step: 100090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:49:11,976-Speed 3396.05 samples/sec   Loss 0.8777   LearningRate 0.0014   Epoch: 17   Global Step: 100100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:49:14,995-Speed 3392.67 samples/sec   Loss 0.7943   LearningRate 0.0014   Epoch: 17   Global Step: 100110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:49:18,003-Speed 3404.22 samples/sec   Loss 0.8326   LearningRate 0.0014   Epoch: 17   Global Step: 100120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:49:21,014-Speed 3402.16 samples/sec   Loss 0.8284   LearningRate 0.0014   Epoch: 17   Global Step: 100130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:49:24,035-Speed 3389.96 samples/sec   Loss 0.8357   LearningRate 0.0014   Epoch: 17   Global Step: 100140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:49:27,059-Speed 3387.21 samples/sec   Loss 0.8420   LearningRate 0.0014   Epoch: 17   Global Step: 100150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:49:30,075-Speed 3396.12 samples/sec   Loss 0.8482   LearningRate 0.0014   Epoch: 17   Global Step: 100160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:49:33,096-Speed 3389.92 samples/sec   Loss 0.8337   LearningRate 0.0014   Epoch: 17   Global Step: 100170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:49:36,123-Speed 3384.35 samples/sec   Loss 0.7482   LearningRate 0.0014   Epoch: 17   Global Step: 100180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:49:39,150-Speed 3383.55 samples/sec   Loss 0.8048   LearningRate 0.0014   Epoch: 17   Global Step: 100190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:49:42,173-Speed 3387.87 samples/sec   Loss 0.8378   LearningRate 0.0014   Epoch: 17   Global Step: 100200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:49:45,172-Speed 3415.86 samples/sec   Loss 0.8195   LearningRate 0.0014   Epoch: 17   Global Step: 100210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:49:48,208-Speed 3373.65 samples/sec   Loss 0.8747   LearningRate 0.0014   Epoch: 17   Global Step: 100220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:49:51,223-Speed 3397.15 samples/sec   Loss 0.8567   LearningRate 0.0014   Epoch: 17   Global Step: 100230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:49:54,258-Speed 3374.74 samples/sec   Loss 0.7500   LearningRate 0.0014   Epoch: 17   Global Step: 100240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:49:57,292-Speed 3375.16 samples/sec   Loss 0.8279   LearningRate 0.0014   Epoch: 17   Global Step: 100250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:50:00,321-Speed 3381.01 samples/sec   Loss 0.7461   LearningRate 0.0014   Epoch: 17   Global Step: 100260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:50:03,357-Speed 3373.60 samples/sec   Loss 0.7825   LearningRate 0.0014   Epoch: 17   Global Step: 100270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:50:06,382-Speed 3386.82 samples/sec   Loss 0.7691   LearningRate 0.0014   Epoch: 17   Global Step: 100280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:50:09,410-Speed 3382.80 samples/sec   Loss 0.8660   LearningRate 0.0014   Epoch: 17   Global Step: 100290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:50:12,431-Speed 3389.87 samples/sec   Loss 0.8802   LearningRate 0.0014   Epoch: 17   Global Step: 100300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:50:15,468-Speed 3372.32 samples/sec   Loss 0.8422   LearningRate 0.0014   Epoch: 17   Global Step: 100310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:50:18,491-Speed 3387.96 samples/sec   Loss 0.8290   LearningRate 0.0014   Epoch: 17   Global Step: 100320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:50:21,506-Speed 3397.90 samples/sec   Loss 0.7841   LearningRate 0.0014   Epoch: 17   Global Step: 100330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:50:24,523-Speed 3394.68 samples/sec   Loss 0.8964   LearningRate 0.0014   Epoch: 17   Global Step: 100340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:50:27,531-Speed 3404.76 samples/sec   Loss 0.8165   LearningRate 0.0014   Epoch: 17   Global Step: 100350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:50:30,542-Speed 3401.19 samples/sec   Loss 0.8559   LearningRate 0.0014   Epoch: 17   Global Step: 100360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:50:33,563-Speed 3390.88 samples/sec   Loss 0.8668   LearningRate 0.0014   Epoch: 17   Global Step: 100370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:50:36,580-Speed 3394.79 samples/sec   Loss 0.8846   LearningRate 0.0014   Epoch: 17   Global Step: 100380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:50:39,598-Speed 3394.12 samples/sec   Loss 0.8907   LearningRate 0.0014   Epoch: 17   Global Step: 100390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:50:42,609-Speed 3402.11 samples/sec   Loss 0.8233   LearningRate 0.0014   Epoch: 17   Global Step: 100400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:50:45,633-Speed 3386.37 samples/sec   Loss 0.9233   LearningRate 0.0014   Epoch: 17   Global Step: 100410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:50:48,725-Speed 3312.30 samples/sec   Loss 0.8133   LearningRate 0.0014   Epoch: 17   Global Step: 100420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:50:51,749-Speed 3387.39 samples/sec   Loss 0.8182   LearningRate 0.0014   Epoch: 17   Global Step: 100430   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 11:50:54,748-Speed 3415.38 samples/sec   Loss 0.8318   LearningRate 0.0014   Epoch: 17   Global Step: 100440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:50:57,774-Speed 3384.30 samples/sec   Loss 0.8675   LearningRate 0.0014   Epoch: 17   Global Step: 100450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:00,787-Speed 3399.93 samples/sec   Loss 0.7902   LearningRate 0.0014   Epoch: 17   Global Step: 100460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:03,807-Speed 3391.34 samples/sec   Loss 0.8391   LearningRate 0.0014   Epoch: 17   Global Step: 100470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:06,822-Speed 3397.03 samples/sec   Loss 0.8260   LearningRate 0.0014   Epoch: 17   Global Step: 100480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:09,831-Speed 3404.76 samples/sec   Loss 0.8124   LearningRate 0.0014   Epoch: 17   Global Step: 100490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:12,858-Speed 3383.00 samples/sec   Loss 0.8456   LearningRate 0.0014   Epoch: 17   Global Step: 100500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:15,897-Speed 3370.13 samples/sec   Loss 0.7673   LearningRate 0.0013   Epoch: 17   Global Step: 100510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:18,914-Speed 3394.84 samples/sec   Loss 0.8450   LearningRate 0.0013   Epoch: 17   Global Step: 100520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:21,928-Speed 3398.82 samples/sec   Loss 0.8891   LearningRate 0.0013   Epoch: 17   Global Step: 100530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:24,946-Speed 3393.91 samples/sec   Loss 0.8323   LearningRate 0.0013   Epoch: 17   Global Step: 100540   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 11:51:27,956-Speed 3402.46 samples/sec   Loss 0.8043   LearningRate 0.0013   Epoch: 17   Global Step: 100550   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 11:51:30,961-Speed 3408.21 samples/sec   Loss 0.7945   LearningRate 0.0013   Epoch: 17   Global Step: 100560   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 11:51:33,950-Speed 3427.30 samples/sec   Loss 0.7975   LearningRate 0.0013   Epoch: 17   Global Step: 100570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:36,964-Speed 3397.66 samples/sec   Loss 0.8391   LearningRate 0.0013   Epoch: 17   Global Step: 100580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:39,979-Speed 3397.63 samples/sec   Loss 0.7407   LearningRate 0.0013   Epoch: 17   Global Step: 100590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:42,990-Speed 3401.66 samples/sec   Loss 0.7932   LearningRate 0.0013   Epoch: 17   Global Step: 100600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:46,022-Speed 3377.51 samples/sec   Loss 0.8654   LearningRate 0.0013   Epoch: 17   Global Step: 100610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:49,037-Speed 3396.89 samples/sec   Loss 0.8166   LearningRate 0.0013   Epoch: 17   Global Step: 100620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:52,057-Speed 3391.60 samples/sec   Loss 0.7692   LearningRate 0.0013   Epoch: 17   Global Step: 100630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:55,068-Speed 3402.59 samples/sec   Loss 0.8255   LearningRate 0.0013   Epoch: 17   Global Step: 100640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:51:58,079-Speed 3401.38 samples/sec   Loss 0.7992   LearningRate 0.0013   Epoch: 17   Global Step: 100650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:01,116-Speed 3372.13 samples/sec   Loss 0.8548   LearningRate 0.0013   Epoch: 17   Global Step: 100660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:04,129-Speed 3400.38 samples/sec   Loss 0.7973   LearningRate 0.0013   Epoch: 17   Global Step: 100670   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 11:52:07,120-Speed 3423.51 samples/sec   Loss 0.7402   LearningRate 0.0013   Epoch: 17   Global Step: 100680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:10,128-Speed 3405.99 samples/sec   Loss 0.7782   LearningRate 0.0013   Epoch: 17   Global Step: 100690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:13,155-Speed 3383.46 samples/sec   Loss 0.7576   LearningRate 0.0013   Epoch: 17   Global Step: 100700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:16,169-Speed 3398.12 samples/sec   Loss 0.8671   LearningRate 0.0013   Epoch: 17   Global Step: 100710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:19,177-Speed 3405.15 samples/sec   Loss 0.8331   LearningRate 0.0013   Epoch: 17   Global Step: 100720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:22,208-Speed 3378.80 samples/sec   Loss 0.8185   LearningRate 0.0013   Epoch: 17   Global Step: 100730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:25,219-Speed 3402.39 samples/sec   Loss 0.7798   LearningRate 0.0013   Epoch: 17   Global Step: 100740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:28,232-Speed 3399.29 samples/sec   Loss 0.8212   LearningRate 0.0013   Epoch: 17   Global Step: 100750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:31,244-Speed 3399.93 samples/sec   Loss 0.8982   LearningRate 0.0013   Epoch: 17   Global Step: 100760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:34,259-Speed 3396.92 samples/sec   Loss 0.8437   LearningRate 0.0013   Epoch: 17   Global Step: 100770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:37,288-Speed 3381.94 samples/sec   Loss 0.8422   LearningRate 0.0013   Epoch: 17   Global Step: 100780   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 11:52:40,303-Speed 3397.62 samples/sec   Loss 0.8949   LearningRate 0.0013   Epoch: 17   Global Step: 100790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:43,317-Speed 3398.39 samples/sec   Loss 0.7821   LearningRate 0.0013   Epoch: 17   Global Step: 100800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:46,333-Speed 3395.28 samples/sec   Loss 0.8334   LearningRate 0.0013   Epoch: 17   Global Step: 100810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:49,344-Speed 3401.97 samples/sec   Loss 0.8211   LearningRate 0.0013   Epoch: 17   Global Step: 100820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:52,359-Speed 3397.09 samples/sec   Loss 0.7736   LearningRate 0.0013   Epoch: 17   Global Step: 100830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:55,372-Speed 3399.46 samples/sec   Loss 0.8380   LearningRate 0.0013   Epoch: 17   Global Step: 100840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:52:58,384-Speed 3400.43 samples/sec   Loss 0.8379   LearningRate 0.0013   Epoch: 17   Global Step: 100850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:01,421-Speed 3373.34 samples/sec   Loss 0.8139   LearningRate 0.0013   Epoch: 17   Global Step: 100860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:04,440-Speed 3392.31 samples/sec   Loss 0.8167   LearningRate 0.0013   Epoch: 17   Global Step: 100870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:07,463-Speed 3388.23 samples/sec   Loss 0.8690   LearningRate 0.0013   Epoch: 17   Global Step: 100880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:10,475-Speed 3400.16 samples/sec   Loss 0.8178   LearningRate 0.0013   Epoch: 17   Global Step: 100890   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 11:53:13,490-Speed 3397.55 samples/sec   Loss 0.8319   LearningRate 0.0013   Epoch: 17   Global Step: 100900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:16,549-Speed 3348.34 samples/sec   Loss 0.8026   LearningRate 0.0013   Epoch: 17   Global Step: 100910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:19,565-Speed 3395.63 samples/sec   Loss 0.7908   LearningRate 0.0013   Epoch: 17   Global Step: 100920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:22,583-Speed 3394.10 samples/sec   Loss 0.8556   LearningRate 0.0013   Epoch: 17   Global Step: 100930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:25,597-Speed 3398.68 samples/sec   Loss 0.8375   LearningRate 0.0013   Epoch: 17   Global Step: 100940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:28,615-Speed 3393.45 samples/sec   Loss 0.8799   LearningRate 0.0013   Epoch: 17   Global Step: 100950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:31,631-Speed 3395.57 samples/sec   Loss 0.7904   LearningRate 0.0013   Epoch: 17   Global Step: 100960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:34,645-Speed 3398.43 samples/sec   Loss 0.8148   LearningRate 0.0013   Epoch: 17   Global Step: 100970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:37,665-Speed 3391.59 samples/sec   Loss 0.7602   LearningRate 0.0013   Epoch: 17   Global Step: 100980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:40,684-Speed 3392.62 samples/sec   Loss 0.8088   LearningRate 0.0013   Epoch: 17   Global Step: 100990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:43,689-Speed 3409.17 samples/sec   Loss 0.7536   LearningRate 0.0013   Epoch: 17   Global Step: 101000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:46,714-Speed 3385.31 samples/sec   Loss 0.8203   LearningRate 0.0012   Epoch: 17   Global Step: 101010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:49,730-Speed 3396.13 samples/sec   Loss 0.8146   LearningRate 0.0012   Epoch: 17   Global Step: 101020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:52,748-Speed 3393.20 samples/sec   Loss 0.7910   LearningRate 0.0012   Epoch: 17   Global Step: 101030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:55,765-Speed 3395.26 samples/sec   Loss 0.8152   LearningRate 0.0012   Epoch: 17   Global Step: 101040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:53:58,798-Speed 3377.28 samples/sec   Loss 0.8045   LearningRate 0.0012   Epoch: 17   Global Step: 101050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:54:01,834-Speed 3373.32 samples/sec   Loss 0.8115   LearningRate 0.0012   Epoch: 17   Global Step: 101060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:54:04,851-Speed 3394.86 samples/sec   Loss 0.8140   LearningRate 0.0012   Epoch: 17   Global Step: 101070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:54:07,881-Speed 3380.38 samples/sec   Loss 0.8001   LearningRate 0.0012   Epoch: 17   Global Step: 101080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:54:10,900-Speed 3392.85 samples/sec   Loss 0.8156   LearningRate 0.0012   Epoch: 17   Global Step: 101090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:54:13,906-Speed 3407.47 samples/sec   Loss 0.8108   LearningRate 0.0012   Epoch: 17   Global Step: 101100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:54:16,905-Speed 3415.64 samples/sec   Loss 0.8196   LearningRate 0.0012   Epoch: 17   Global Step: 101110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:54:19,920-Speed 3397.34 samples/sec   Loss 0.7964   LearningRate 0.0012   Epoch: 17   Global Step: 101120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:54:22,936-Speed 3395.60 samples/sec   Loss 0.8185   LearningRate 0.0012   Epoch: 17   Global Step: 101130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:54:25,957-Speed 3390.59 samples/sec   Loss 0.8283   LearningRate 0.0012   Epoch: 17   Global Step: 101140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:54:28,977-Speed 3390.97 samples/sec   Loss 0.8276   LearningRate 0.0012   Epoch: 17   Global Step: 101150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:54:31,995-Speed 3393.77 samples/sec   Loss 0.8258   LearningRate 0.0012   Epoch: 17   Global Step: 101160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:54:35,026-Speed 3380.21 samples/sec   Loss 0.7968   LearningRate 0.0012   Epoch: 17   Global Step: 101170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:54:38,127-Speed 3302.80 samples/sec   Loss 0.8414   LearningRate 0.0012   Epoch: 17   Global Step: 101180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:54:41,145-Speed 3393.35 samples/sec   Loss 0.8219   LearningRate 0.0012   Epoch: 17   Global Step: 101190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:54:44,173-Speed 3382.80 samples/sec   Loss 0.8260   LearningRate 0.0012   Epoch: 17   Global Step: 101200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:54:47,201-Speed 3382.80 samples/sec   Loss 0.7888   LearningRate 0.0012   Epoch: 17   Global Step: 101210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:54:50,219-Speed 3393.97 samples/sec   Loss 0.8290   LearningRate 0.0012   Epoch: 17   Global Step: 101220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:54:53,237-Speed 3393.29 samples/sec   Loss 0.7300   LearningRate 0.0012   Epoch: 17   Global Step: 101230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:54:56,260-Speed 3388.34 samples/sec   Loss 0.8057   LearningRate 0.0012   Epoch: 17   Global Step: 101240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:54:59,280-Speed 3391.91 samples/sec   Loss 0.8412   LearningRate 0.0012   Epoch: 17   Global Step: 101250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:55:02,370-Speed 3314.73 samples/sec   Loss 0.8483   LearningRate 0.0012   Epoch: 17   Global Step: 101260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:55:05,490-Speed 3282.55 samples/sec   Loss 0.7395   LearningRate 0.0012   Epoch: 17   Global Step: 101270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:55:08,515-Speed 3386.11 samples/sec   Loss 0.8123   LearningRate 0.0012   Epoch: 17   Global Step: 101280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:55:11,543-Speed 3381.98 samples/sec   Loss 0.8455   LearningRate 0.0012   Epoch: 17   Global Step: 101290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:55:14,557-Speed 3398.93 samples/sec   Loss 0.8428   LearningRate 0.0012   Epoch: 17   Global Step: 101300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:55:17,581-Speed 3386.06 samples/sec   Loss 0.8110   LearningRate 0.0012   Epoch: 17   Global Step: 101310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:55:20,600-Speed 3393.12 samples/sec   Loss 0.7896   LearningRate 0.0012   Epoch: 17   Global Step: 101320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:55:23,625-Speed 3386.30 samples/sec   Loss 0.8669   LearningRate 0.0012   Epoch: 17   Global Step: 101330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:55:26,665-Speed 3369.01 samples/sec   Loss 0.7545   LearningRate 0.0012   Epoch: 17   Global Step: 101340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:55:29,690-Speed 3386.35 samples/sec   Loss 0.8096   LearningRate 0.0012   Epoch: 17   Global Step: 101350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:55:32,703-Speed 3398.36 samples/sec   Loss 0.7605   LearningRate 0.0012   Epoch: 17   Global Step: 101360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:55:35,722-Speed 3393.26 samples/sec   Loss 0.7813   LearningRate 0.0012   Epoch: 17   Global Step: 101370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:55:38,750-Speed 3382.17 samples/sec   Loss 0.7517   LearningRate 0.0012   Epoch: 17   Global Step: 101380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:55:41,774-Speed 3387.66 samples/sec   Loss 0.8452   LearningRate 0.0012   Epoch: 17   Global Step: 101390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:55:44,793-Speed 3392.13 samples/sec   Loss 0.8597   LearningRate 0.0012   Epoch: 17   Global Step: 101400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:55:47,812-Speed 3392.52 samples/sec   Loss 0.7872   LearningRate 0.0012   Epoch: 17   Global Step: 101410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:55:50,812-Speed 3414.63 samples/sec   Loss 0.8731   LearningRate 0.0012   Epoch: 17   Global Step: 101420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:55:53,833-Speed 3389.82 samples/sec   Loss 0.7535   LearningRate 0.0012   Epoch: 17   Global Step: 101430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:55:56,859-Speed 3385.26 samples/sec   Loss 0.8005   LearningRate 0.0012   Epoch: 17   Global Step: 101440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:55:59,876-Speed 3395.26 samples/sec   Loss 0.8139   LearningRate 0.0012   Epoch: 17   Global Step: 101450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:56:02,900-Speed 3386.87 samples/sec   Loss 0.7925   LearningRate 0.0012   Epoch: 17   Global Step: 101460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:56:05,923-Speed 3387.45 samples/sec   Loss 0.8048   LearningRate 0.0012   Epoch: 17   Global Step: 101470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:56:08,940-Speed 3395.21 samples/sec   Loss 0.8440   LearningRate 0.0012   Epoch: 17   Global Step: 101480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:56:11,960-Speed 3390.94 samples/sec   Loss 0.8099   LearningRate 0.0012   Epoch: 17   Global Step: 101490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:56:15,023-Speed 3344.21 samples/sec   Loss 0.8146   LearningRate 0.0012   Epoch: 17   Global Step: 101500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:56:18,053-Speed 3380.06 samples/sec   Loss 0.8510   LearningRate 0.0012   Epoch: 17   Global Step: 101510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:56:21,072-Speed 3393.12 samples/sec   Loss 0.8318   LearningRate 0.0012   Epoch: 17   Global Step: 101520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:56:24,115-Speed 3366.10 samples/sec   Loss 0.8483   LearningRate 0.0011   Epoch: 17   Global Step: 101530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:56:27,138-Speed 3388.73 samples/sec   Loss 0.8318   LearningRate 0.0011   Epoch: 17   Global Step: 101540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:56:30,158-Speed 3391.54 samples/sec   Loss 0.7857   LearningRate 0.0011   Epoch: 17   Global Step: 101550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:56:33,181-Speed 3387.03 samples/sec   Loss 0.8432   LearningRate 0.0011   Epoch: 17   Global Step: 101560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:56:36,214-Speed 3377.87 samples/sec   Loss 0.6957   LearningRate 0.0011   Epoch: 17   Global Step: 101570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:56:39,237-Speed 3388.26 samples/sec   Loss 0.8131   LearningRate 0.0011   Epoch: 17   Global Step: 101580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:56:42,257-Speed 3390.75 samples/sec   Loss 0.8238   LearningRate 0.0011   Epoch: 17   Global Step: 101590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:56:45,274-Speed 3395.18 samples/sec   Loss 0.7793   LearningRate 0.0011   Epoch: 17   Global Step: 101600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:56:48,302-Speed 3382.06 samples/sec   Loss 0.7968   LearningRate 0.0011   Epoch: 17   Global Step: 101610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:56:51,318-Speed 3396.93 samples/sec   Loss 0.8872   LearningRate 0.0011   Epoch: 17   Global Step: 101620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:56:54,348-Speed 3380.11 samples/sec   Loss 0.8185   LearningRate 0.0011   Epoch: 17   Global Step: 101630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:56:57,373-Speed 3386.25 samples/sec   Loss 0.8429   LearningRate 0.0011   Epoch: 17   Global Step: 101640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:57:00,406-Speed 3376.94 samples/sec   Loss 0.8568   LearningRate 0.0011   Epoch: 17   Global Step: 101650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:57:03,473-Speed 3338.96 samples/sec   Loss 0.8154   LearningRate 0.0011   Epoch: 17   Global Step: 101660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:57:06,575-Speed 3302.20 samples/sec   Loss 0.8398   LearningRate 0.0011   Epoch: 17   Global Step: 101670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:57:09,594-Speed 3393.03 samples/sec   Loss 0.7585   LearningRate 0.0011   Epoch: 17   Global Step: 101680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:57:12,620-Speed 3384.44 samples/sec   Loss 0.7701   LearningRate 0.0011   Epoch: 17   Global Step: 101690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:57:15,646-Speed 3384.11 samples/sec   Loss 0.7594   LearningRate 0.0011   Epoch: 17   Global Step: 101700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:57:18,686-Speed 3369.83 samples/sec   Loss 0.8101   LearningRate 0.0011   Epoch: 17   Global Step: 101710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 11:57:21,710-Speed 3387.82 samples/sec   Loss 0.8366   LearningRate 0.0011   Epoch: 17   Global Step: 101720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:57:24,736-Speed 3384.60 samples/sec   Loss 0.8166   LearningRate 0.0011   Epoch: 17   Global Step: 101730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:57:27,767-Speed 3378.38 samples/sec   Loss 0.8379   LearningRate 0.0011   Epoch: 17   Global Step: 101740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:57:30,790-Speed 3387.83 samples/sec   Loss 0.7880   LearningRate 0.0011   Epoch: 17   Global Step: 101750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:57:33,813-Speed 3388.67 samples/sec   Loss 0.7359   LearningRate 0.0011   Epoch: 17   Global Step: 101760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:57:36,840-Speed 3383.98 samples/sec   Loss 0.8079   LearningRate 0.0011   Epoch: 17   Global Step: 101770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:57:39,867-Speed 3383.06 samples/sec   Loss 0.8959   LearningRate 0.0011   Epoch: 17   Global Step: 101780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:57:42,891-Speed 3387.66 samples/sec   Loss 0.8621   LearningRate 0.0011   Epoch: 17   Global Step: 101790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:57:45,916-Speed 3385.55 samples/sec   Loss 0.8097   LearningRate 0.0011   Epoch: 17   Global Step: 101800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:57:48,942-Speed 3384.90 samples/sec   Loss 0.7981   LearningRate 0.0011   Epoch: 17   Global Step: 101810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:57:51,967-Speed 3386.56 samples/sec   Loss 0.8330   LearningRate 0.0011   Epoch: 17   Global Step: 101820   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 11:57:54,989-Speed 3389.27 samples/sec   Loss 0.8612   LearningRate 0.0011   Epoch: 17   Global Step: 101830   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 11:57:57,998-Speed 3403.64 samples/sec   Loss 0.8519   LearningRate 0.0011   Epoch: 17   Global Step: 101840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:58:01,028-Speed 3380.05 samples/sec   Loss 0.8025   LearningRate 0.0011   Epoch: 17   Global Step: 101850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:58:04,061-Speed 3376.89 samples/sec   Loss 0.8107   LearningRate 0.0011   Epoch: 17   Global Step: 101860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:58:07,091-Speed 3380.37 samples/sec   Loss 0.7677   LearningRate 0.0011   Epoch: 17   Global Step: 101870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:58:10,120-Speed 3381.11 samples/sec   Loss 0.7251   LearningRate 0.0011   Epoch: 17   Global Step: 101880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:58:13,151-Speed 3378.94 samples/sec   Loss 0.7737   LearningRate 0.0011   Epoch: 17   Global Step: 101890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:58:16,214-Speed 3344.30 samples/sec   Loss 0.8188   LearningRate 0.0011   Epoch: 17   Global Step: 101900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:58:19,243-Speed 3381.62 samples/sec   Loss 0.8005   LearningRate 0.0011   Epoch: 17   Global Step: 101910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:58:22,268-Speed 3385.97 samples/sec   Loss 0.8383   LearningRate 0.0011   Epoch: 17   Global Step: 101920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:58:25,295-Speed 3384.11 samples/sec   Loss 0.7585   LearningRate 0.0011   Epoch: 17   Global Step: 101930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:58:28,324-Speed 3380.81 samples/sec   Loss 0.7511   LearningRate 0.0011   Epoch: 17   Global Step: 101940   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 11:58:31,363-Speed 3370.77 samples/sec   Loss 0.8269   LearningRate 0.0011   Epoch: 17   Global Step: 101950   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 11:58:34,396-Speed 3376.76 samples/sec   Loss 0.8369   LearningRate 0.0011   Epoch: 17   Global Step: 101960   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 11:58:37,410-Speed 3398.48 samples/sec   Loss 0.7689   LearningRate 0.0011   Epoch: 17   Global Step: 101970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:58:40,444-Speed 3375.64 samples/sec   Loss 0.8358   LearningRate 0.0011   Epoch: 17   Global Step: 101980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:58:43,470-Speed 3385.29 samples/sec   Loss 0.7846   LearningRate 0.0011   Epoch: 17   Global Step: 101990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:58:46,493-Speed 3388.06 samples/sec   Loss 0.8409   LearningRate 0.0011   Epoch: 17   Global Step: 102000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 11:59:29,952-[lfw][102000]XNorm: 21.864571
Training: 2022-04-27 11:59:29,952-[lfw][102000]Accuracy-Flip: 0.99733+-0.00318
Training: 2022-04-27 11:59:29,953-[lfw][102000]Accuracy-Highest: 0.99817
Training: 2022-04-27 12:00:20,388-[cfp_fp][102000]XNorm: 21.876868
Training: 2022-04-27 12:00:20,388-[cfp_fp][102000]Accuracy-Flip: 0.98586+-0.00445
Training: 2022-04-27 12:00:20,389-[cfp_fp][102000]Accuracy-Highest: 0.98614
Training: 2022-04-27 12:01:03,778-[agedb_30][102000]XNorm: 22.357217
Training: 2022-04-27 12:01:03,779-[agedb_30][102000]Accuracy-Flip: 0.98183+-0.00724
Training: 2022-04-27 12:01:03,779-[agedb_30][102000]Accuracy-Highest: 0.98233
Training: 2022-04-27 12:01:06,792-Speed 72.99 samples/sec   Loss 0.7178   LearningRate 0.0011   Epoch: 17   Global Step: 102010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:09,795-Speed 3411.82 samples/sec   Loss 0.7886   LearningRate 0.0011   Epoch: 17   Global Step: 102020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:12,802-Speed 3406.32 samples/sec   Loss 0.8263   LearningRate 0.0011   Epoch: 17   Global Step: 102030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:15,807-Speed 3407.62 samples/sec   Loss 0.8298   LearningRate 0.0011   Epoch: 17   Global Step: 102040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:18,814-Speed 3406.67 samples/sec   Loss 0.7546   LearningRate 0.0011   Epoch: 17   Global Step: 102050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:21,830-Speed 3395.65 samples/sec   Loss 0.8317   LearningRate 0.0011   Epoch: 17   Global Step: 102060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:24,843-Speed 3399.49 samples/sec   Loss 0.7555   LearningRate 0.0010   Epoch: 17   Global Step: 102070   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:01:27,868-Speed 3385.39 samples/sec   Loss 0.8746   LearningRate 0.0010   Epoch: 17   Global Step: 102080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:30,887-Speed 3392.87 samples/sec   Loss 0.9242   LearningRate 0.0010   Epoch: 17   Global Step: 102090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:33,906-Speed 3392.42 samples/sec   Loss 0.8185   LearningRate 0.0010   Epoch: 17   Global Step: 102100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:36,918-Speed 3401.22 samples/sec   Loss 0.7864   LearningRate 0.0010   Epoch: 17   Global Step: 102110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:39,929-Speed 3401.69 samples/sec   Loss 0.7417   LearningRate 0.0010   Epoch: 17   Global Step: 102120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:42,946-Speed 3394.47 samples/sec   Loss 0.8191   LearningRate 0.0010   Epoch: 17   Global Step: 102130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:45,961-Speed 3398.01 samples/sec   Loss 0.9015   LearningRate 0.0010   Epoch: 17   Global Step: 102140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:48,983-Speed 3388.81 samples/sec   Loss 0.8249   LearningRate 0.0010   Epoch: 17   Global Step: 102150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:51,997-Speed 3398.71 samples/sec   Loss 0.7549   LearningRate 0.0010   Epoch: 17   Global Step: 102160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:01:54,996-Speed 3414.79 samples/sec   Loss 0.7541   LearningRate 0.0010   Epoch: 17   Global Step: 102170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:01:58,009-Speed 3399.08 samples/sec   Loss 0.7012   LearningRate 0.0010   Epoch: 17   Global Step: 102180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:02:01,037-Speed 3382.94 samples/sec   Loss 0.8964   LearningRate 0.0010   Epoch: 17   Global Step: 102190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:02:04,057-Speed 3391.61 samples/sec   Loss 0.7932   LearningRate 0.0010   Epoch: 17   Global Step: 102200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:02:07,077-Speed 3391.55 samples/sec   Loss 0.7876   LearningRate 0.0010   Epoch: 17   Global Step: 102210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:02:10,103-Speed 3384.81 samples/sec   Loss 0.8208   LearningRate 0.0010   Epoch: 17   Global Step: 102220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:02:13,134-Speed 3378.90 samples/sec   Loss 0.7571   LearningRate 0.0010   Epoch: 17   Global Step: 102230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:02:16,183-Speed 3359.74 samples/sec   Loss 0.8648   LearningRate 0.0010   Epoch: 17   Global Step: 102240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:02:19,196-Speed 3398.84 samples/sec   Loss 0.8535   LearningRate 0.0010   Epoch: 17   Global Step: 102250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:02:22,214-Speed 3394.48 samples/sec   Loss 0.8247   LearningRate 0.0010   Epoch: 17   Global Step: 102260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:02:25,237-Speed 3388.11 samples/sec   Loss 0.8420   LearningRate 0.0010   Epoch: 17   Global Step: 102270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:02:28,252-Speed 3396.95 samples/sec   Loss 0.8066   LearningRate 0.0010   Epoch: 17   Global Step: 102280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:02:31,275-Speed 3388.69 samples/sec   Loss 0.8079   LearningRate 0.0010   Epoch: 17   Global Step: 102290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:02:34,296-Speed 3390.15 samples/sec   Loss 0.8466   LearningRate 0.0010   Epoch: 17   Global Step: 102300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:02:37,319-Speed 3387.70 samples/sec   Loss 0.7315   LearningRate 0.0010   Epoch: 17   Global Step: 102310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:02:40,380-Speed 3346.51 samples/sec   Loss 0.7711   LearningRate 0.0010   Epoch: 17   Global Step: 102320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:02:43,395-Speed 3397.25 samples/sec   Loss 0.8595   LearningRate 0.0010   Epoch: 17   Global Step: 102330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:02:46,493-Speed 3305.33 samples/sec   Loss 0.8394   LearningRate 0.0010   Epoch: 17   Global Step: 102340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:02:59,627-Speed 779.73 samples/sec   Loss 0.7455   LearningRate 0.0010   Epoch: 18   Global Step: 102350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:03:02,733-Speed 3298.62 samples/sec   Loss 0.6029   LearningRate 0.0010   Epoch: 18   Global Step: 102360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:03:05,789-Speed 3350.95 samples/sec   Loss 0.6102   LearningRate 0.0010   Epoch: 18   Global Step: 102370   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:03:08,812-Speed 3388.16 samples/sec   Loss 0.5861   LearningRate 0.0010   Epoch: 18   Global Step: 102380   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:03:11,814-Speed 3412.76 samples/sec   Loss 0.6070   LearningRate 0.0010   Epoch: 18   Global Step: 102390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:03:14,834-Speed 3391.61 samples/sec   Loss 0.6093   LearningRate 0.0010   Epoch: 18   Global Step: 102400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:03:17,855-Speed 3389.86 samples/sec   Loss 0.5643   LearningRate 0.0010   Epoch: 18   Global Step: 102410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:03:20,871-Speed 3396.23 samples/sec   Loss 0.5832   LearningRate 0.0010   Epoch: 18   Global Step: 102420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:03:23,888-Speed 3394.15 samples/sec   Loss 0.5994   LearningRate 0.0010   Epoch: 18   Global Step: 102430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:03:26,917-Speed 3382.08 samples/sec   Loss 0.5514   LearningRate 0.0010   Epoch: 18   Global Step: 102440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:03:29,940-Speed 3388.46 samples/sec   Loss 0.5307   LearningRate 0.0010   Epoch: 18   Global Step: 102450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:03:32,952-Speed 3399.85 samples/sec   Loss 0.6105   LearningRate 0.0010   Epoch: 18   Global Step: 102460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:03:35,953-Speed 3413.11 samples/sec   Loss 0.5696   LearningRate 0.0010   Epoch: 18   Global Step: 102470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:03:39,005-Speed 3355.60 samples/sec   Loss 0.6684   LearningRate 0.0010   Epoch: 18   Global Step: 102480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:03:42,024-Speed 3393.40 samples/sec   Loss 0.5501   LearningRate 0.0010   Epoch: 18   Global Step: 102490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:03:45,075-Speed 3357.24 samples/sec   Loss 0.5844   LearningRate 0.0010   Epoch: 18   Global Step: 102500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:03:48,092-Speed 3394.42 samples/sec   Loss 0.5839   LearningRate 0.0010   Epoch: 18   Global Step: 102510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:03:51,112-Speed 3391.43 samples/sec   Loss 0.6407   LearningRate 0.0010   Epoch: 18   Global Step: 102520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:03:54,132-Speed 3391.71 samples/sec   Loss 0.6625   LearningRate 0.0010   Epoch: 18   Global Step: 102530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:03:57,163-Speed 3379.21 samples/sec   Loss 0.5468   LearningRate 0.0010   Epoch: 18   Global Step: 102540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:04:00,190-Speed 3383.93 samples/sec   Loss 0.5838   LearningRate 0.0010   Epoch: 18   Global Step: 102550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:04:03,257-Speed 3339.17 samples/sec   Loss 0.6883   LearningRate 0.0010   Epoch: 18   Global Step: 102560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:04:06,276-Speed 3393.43 samples/sec   Loss 0.5760   LearningRate 0.0010   Epoch: 18   Global Step: 102570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:09,335-Speed 3348.12 samples/sec   Loss 0.6140   LearningRate 0.0010   Epoch: 18   Global Step: 102580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:12,357-Speed 3389.07 samples/sec   Loss 0.6207   LearningRate 0.0010   Epoch: 18   Global Step: 102590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:15,398-Speed 3367.55 samples/sec   Loss 0.5683   LearningRate 0.0010   Epoch: 18   Global Step: 102600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:18,423-Speed 3386.55 samples/sec   Loss 0.5398   LearningRate 0.0010   Epoch: 18   Global Step: 102610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:21,456-Speed 3376.87 samples/sec   Loss 0.5661   LearningRate 0.0010   Epoch: 18   Global Step: 102620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:24,483-Speed 3383.81 samples/sec   Loss 0.5837   LearningRate 0.0010   Epoch: 18   Global Step: 102630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:27,524-Speed 3368.18 samples/sec   Loss 0.5581   LearningRate 0.0009   Epoch: 18   Global Step: 102640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:30,549-Speed 3386.04 samples/sec   Loss 0.5799   LearningRate 0.0009   Epoch: 18   Global Step: 102650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:33,575-Speed 3385.25 samples/sec   Loss 0.5869   LearningRate 0.0009   Epoch: 18   Global Step: 102660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:36,602-Speed 3383.31 samples/sec   Loss 0.5870   LearningRate 0.0009   Epoch: 18   Global Step: 102670   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:04:39,613-Speed 3401.61 samples/sec   Loss 0.6253   LearningRate 0.0009   Epoch: 18   Global Step: 102680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:42,654-Speed 3368.60 samples/sec   Loss 0.5276   LearningRate 0.0009   Epoch: 18   Global Step: 102690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:45,716-Speed 3344.66 samples/sec   Loss 0.5981   LearningRate 0.0009   Epoch: 18   Global Step: 102700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:48,771-Speed 3352.90 samples/sec   Loss 0.5983   LearningRate 0.0009   Epoch: 18   Global Step: 102710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:51,810-Speed 3370.17 samples/sec   Loss 0.5757   LearningRate 0.0009   Epoch: 18   Global Step: 102720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:54,841-Speed 3379.48 samples/sec   Loss 0.6346   LearningRate 0.0009   Epoch: 18   Global Step: 102730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:04:57,865-Speed 3387.21 samples/sec   Loss 0.6059   LearningRate 0.0009   Epoch: 18   Global Step: 102740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:05:00,907-Speed 3367.89 samples/sec   Loss 0.5598   LearningRate 0.0009   Epoch: 18   Global Step: 102750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:05:03,932-Speed 3385.51 samples/sec   Loss 0.6182   LearningRate 0.0009   Epoch: 18   Global Step: 102760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:05:06,978-Speed 3362.79 samples/sec   Loss 0.5649   LearningRate 0.0009   Epoch: 18   Global Step: 102770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:05:09,982-Speed 3409.04 samples/sec   Loss 0.6232   LearningRate 0.0009   Epoch: 18   Global Step: 102780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:05:13,147-Speed 3236.14 samples/sec   Loss 0.5253   LearningRate 0.0009   Epoch: 18   Global Step: 102790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:05:16,191-Speed 3365.36 samples/sec   Loss 0.6410   LearningRate 0.0009   Epoch: 18   Global Step: 102800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:05:19,210-Speed 3392.58 samples/sec   Loss 0.6398   LearningRate 0.0009   Epoch: 18   Global Step: 102810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:05:22,223-Speed 3399.72 samples/sec   Loss 0.6238   LearningRate 0.0009   Epoch: 18   Global Step: 102820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:05:25,260-Speed 3372.76 samples/sec   Loss 0.6539   LearningRate 0.0009   Epoch: 18   Global Step: 102830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:05:28,335-Speed 3330.51 samples/sec   Loss 0.6683   LearningRate 0.0009   Epoch: 18   Global Step: 102840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:05:31,363-Speed 3381.89 samples/sec   Loss 0.5666   LearningRate 0.0009   Epoch: 18   Global Step: 102850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:05:34,468-Speed 3299.50 samples/sec   Loss 0.5715   LearningRate 0.0009   Epoch: 18   Global Step: 102860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:05:37,502-Speed 3375.19 samples/sec   Loss 0.5971   LearningRate 0.0009   Epoch: 18   Global Step: 102870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:05:40,535-Speed 3377.47 samples/sec   Loss 0.6394   LearningRate 0.0009   Epoch: 18   Global Step: 102880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:05:43,557-Speed 3388.97 samples/sec   Loss 0.6026   LearningRate 0.0009   Epoch: 18   Global Step: 102890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:05:46,602-Speed 3364.54 samples/sec   Loss 0.6492   LearningRate 0.0009   Epoch: 18   Global Step: 102900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:05:49,627-Speed 3385.68 samples/sec   Loss 0.5496   LearningRate 0.0009   Epoch: 18   Global Step: 102910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:05:52,655-Speed 3381.89 samples/sec   Loss 0.5817   LearningRate 0.0009   Epoch: 18   Global Step: 102920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:05:55,685-Speed 3380.05 samples/sec   Loss 0.5683   LearningRate 0.0009   Epoch: 18   Global Step: 102930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:05:58,713-Speed 3383.09 samples/sec   Loss 0.5470   LearningRate 0.0009   Epoch: 18   Global Step: 102940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:01,740-Speed 3383.48 samples/sec   Loss 0.5641   LearningRate 0.0009   Epoch: 18   Global Step: 102950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:04,768-Speed 3382.94 samples/sec   Loss 0.5889   LearningRate 0.0009   Epoch: 18   Global Step: 102960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:07,799-Speed 3378.53 samples/sec   Loss 0.5531   LearningRate 0.0009   Epoch: 18   Global Step: 102970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:10,824-Speed 3386.10 samples/sec   Loss 0.6519   LearningRate 0.0009   Epoch: 18   Global Step: 102980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:13,846-Speed 3389.11 samples/sec   Loss 0.6672   LearningRate 0.0009   Epoch: 18   Global Step: 102990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:16,870-Speed 3387.71 samples/sec   Loss 0.5950   LearningRate 0.0009   Epoch: 18   Global Step: 103000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:19,889-Speed 3392.56 samples/sec   Loss 0.5271   LearningRate 0.0009   Epoch: 18   Global Step: 103010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:22,918-Speed 3381.77 samples/sec   Loss 0.6224   LearningRate 0.0009   Epoch: 18   Global Step: 103020   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:06:25,925-Speed 3405.28 samples/sec   Loss 0.6218   LearningRate 0.0009   Epoch: 18   Global Step: 103030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:28,945-Speed 3391.53 samples/sec   Loss 0.5755   LearningRate 0.0009   Epoch: 18   Global Step: 103040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:31,967-Speed 3389.77 samples/sec   Loss 0.5425   LearningRate 0.0009   Epoch: 18   Global Step: 103050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:34,999-Speed 3377.90 samples/sec   Loss 0.5940   LearningRate 0.0009   Epoch: 18   Global Step: 103060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:38,028-Speed 3381.44 samples/sec   Loss 0.5869   LearningRate 0.0009   Epoch: 18   Global Step: 103070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:41,063-Speed 3375.34 samples/sec   Loss 0.5423   LearningRate 0.0009   Epoch: 18   Global Step: 103080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:44,091-Speed 3381.68 samples/sec   Loss 0.5114   LearningRate 0.0009   Epoch: 18   Global Step: 103090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:47,116-Speed 3385.94 samples/sec   Loss 0.6864   LearningRate 0.0009   Epoch: 18   Global Step: 103100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:50,200-Speed 3321.92 samples/sec   Loss 0.6472   LearningRate 0.0009   Epoch: 18   Global Step: 103110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:53,225-Speed 3386.03 samples/sec   Loss 0.5857   LearningRate 0.0009   Epoch: 18   Global Step: 103120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:56,228-Speed 3410.31 samples/sec   Loss 0.6540   LearningRate 0.0009   Epoch: 18   Global Step: 103130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:06:59,258-Speed 3379.68 samples/sec   Loss 0.6385   LearningRate 0.0009   Epoch: 18   Global Step: 103140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:07:02,287-Speed 3381.79 samples/sec   Loss 0.5794   LearningRate 0.0009   Epoch: 18   Global Step: 103150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:07:05,336-Speed 3359.72 samples/sec   Loss 0.6476   LearningRate 0.0009   Epoch: 18   Global Step: 103160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:07:08,364-Speed 3383.07 samples/sec   Loss 0.5998   LearningRate 0.0009   Epoch: 18   Global Step: 103170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:07:11,367-Speed 3409.62 samples/sec   Loss 0.6345   LearningRate 0.0009   Epoch: 18   Global Step: 103180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:07:14,391-Speed 3387.76 samples/sec   Loss 0.5966   LearningRate 0.0009   Epoch: 18   Global Step: 103190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:07:17,409-Speed 3393.05 samples/sec   Loss 0.5682   LearningRate 0.0009   Epoch: 18   Global Step: 103200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:07:20,433-Speed 3387.21 samples/sec   Loss 0.5549   LearningRate 0.0009   Epoch: 18   Global Step: 103210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:07:23,822-Speed 3022.07 samples/sec   Loss 0.5950   LearningRate 0.0009   Epoch: 18   Global Step: 103220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:07:26,854-Speed 3378.07 samples/sec   Loss 0.5975   LearningRate 0.0009   Epoch: 18   Global Step: 103230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:07:29,882-Speed 3383.08 samples/sec   Loss 0.5700   LearningRate 0.0008   Epoch: 18   Global Step: 103240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:07:32,918-Speed 3373.81 samples/sec   Loss 0.5957   LearningRate 0.0008   Epoch: 18   Global Step: 103250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:07:35,943-Speed 3385.59 samples/sec   Loss 0.5508   LearningRate 0.0008   Epoch: 18   Global Step: 103260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:07:38,969-Speed 3384.76 samples/sec   Loss 0.5868   LearningRate 0.0008   Epoch: 18   Global Step: 103270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:07:42,074-Speed 3298.81 samples/sec   Loss 0.6115   LearningRate 0.0008   Epoch: 18   Global Step: 103280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:07:45,112-Speed 3371.69 samples/sec   Loss 0.6520   LearningRate 0.0008   Epoch: 18   Global Step: 103290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:07:48,152-Speed 3368.33 samples/sec   Loss 0.6168   LearningRate 0.0008   Epoch: 18   Global Step: 103300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:07:51,254-Speed 3302.48 samples/sec   Loss 0.5960   LearningRate 0.0008   Epoch: 18   Global Step: 103310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:07:54,308-Speed 3353.80 samples/sec   Loss 0.5897   LearningRate 0.0008   Epoch: 18   Global Step: 103320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:07:57,337-Speed 3381.70 samples/sec   Loss 0.5849   LearningRate 0.0008   Epoch: 18   Global Step: 103330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:00,385-Speed 3359.88 samples/sec   Loss 0.6350   LearningRate 0.0008   Epoch: 18   Global Step: 103340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:03,473-Speed 3316.76 samples/sec   Loss 0.5957   LearningRate 0.0008   Epoch: 18   Global Step: 103350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:06,492-Speed 3393.31 samples/sec   Loss 0.5882   LearningRate 0.0008   Epoch: 18   Global Step: 103360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:09,518-Speed 3384.47 samples/sec   Loss 0.5781   LearningRate 0.0008   Epoch: 18   Global Step: 103370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:12,555-Speed 3372.13 samples/sec   Loss 0.6161   LearningRate 0.0008   Epoch: 18   Global Step: 103380   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:08:15,576-Speed 3390.76 samples/sec   Loss 0.6599   LearningRate 0.0008   Epoch: 18   Global Step: 103390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:18,605-Speed 3381.79 samples/sec   Loss 0.6011   LearningRate 0.0008   Epoch: 18   Global Step: 103400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:21,639-Speed 3375.55 samples/sec   Loss 0.5866   LearningRate 0.0008   Epoch: 18   Global Step: 103410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:24,666-Speed 3383.66 samples/sec   Loss 0.5823   LearningRate 0.0008   Epoch: 18   Global Step: 103420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:27,688-Speed 3389.85 samples/sec   Loss 0.5942   LearningRate 0.0008   Epoch: 18   Global Step: 103430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:30,717-Speed 3380.50 samples/sec   Loss 0.5627   LearningRate 0.0008   Epoch: 18   Global Step: 103440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:33,735-Speed 3394.16 samples/sec   Loss 0.6585   LearningRate 0.0008   Epoch: 18   Global Step: 103450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:36,764-Speed 3381.43 samples/sec   Loss 0.6116   LearningRate 0.0008   Epoch: 18   Global Step: 103460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:39,795-Speed 3379.13 samples/sec   Loss 0.5854   LearningRate 0.0008   Epoch: 18   Global Step: 103470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:42,826-Speed 3379.67 samples/sec   Loss 0.5463   LearningRate 0.0008   Epoch: 18   Global Step: 103480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:45,863-Speed 3372.18 samples/sec   Loss 0.5966   LearningRate 0.0008   Epoch: 18   Global Step: 103490   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:08:48,871-Speed 3404.95 samples/sec   Loss 0.5805   LearningRate 0.0008   Epoch: 18   Global Step: 103500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:51,894-Speed 3388.81 samples/sec   Loss 0.5996   LearningRate 0.0008   Epoch: 18   Global Step: 103510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:54,914-Speed 3391.69 samples/sec   Loss 0.6394   LearningRate 0.0008   Epoch: 18   Global Step: 103520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:08:57,931-Speed 3394.39 samples/sec   Loss 0.5758   LearningRate 0.0008   Epoch: 18   Global Step: 103530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:09:00,953-Speed 3389.05 samples/sec   Loss 0.6161   LearningRate 0.0008   Epoch: 18   Global Step: 103540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:09:03,954-Speed 3412.99 samples/sec   Loss 0.5935   LearningRate 0.0008   Epoch: 18   Global Step: 103550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:06,973-Speed 3392.70 samples/sec   Loss 0.5783   LearningRate 0.0008   Epoch: 18   Global Step: 103560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:09,996-Speed 3387.89 samples/sec   Loss 0.6053   LearningRate 0.0008   Epoch: 18   Global Step: 103570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:13,020-Speed 3387.39 samples/sec   Loss 0.5408   LearningRate 0.0008   Epoch: 18   Global Step: 103580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:16,048-Speed 3382.88 samples/sec   Loss 0.6119   LearningRate 0.0008   Epoch: 18   Global Step: 103590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:19,077-Speed 3381.17 samples/sec   Loss 0.5935   LearningRate 0.0008   Epoch: 18   Global Step: 103600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:22,113-Speed 3373.69 samples/sec   Loss 0.5507   LearningRate 0.0008   Epoch: 18   Global Step: 103610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:25,146-Speed 3377.60 samples/sec   Loss 0.5986   LearningRate 0.0008   Epoch: 18   Global Step: 103620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:28,210-Speed 3342.06 samples/sec   Loss 0.5157   LearningRate 0.0008   Epoch: 18   Global Step: 103630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:31,239-Speed 3381.23 samples/sec   Loss 0.6011   LearningRate 0.0008   Epoch: 18   Global Step: 103640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:34,258-Speed 3393.14 samples/sec   Loss 0.5937   LearningRate 0.0008   Epoch: 18   Global Step: 103650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:09:37,259-Speed 3412.95 samples/sec   Loss 0.6388   LearningRate 0.0008   Epoch: 18   Global Step: 103660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:40,280-Speed 3390.44 samples/sec   Loss 0.6274   LearningRate 0.0008   Epoch: 18   Global Step: 103670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:43,306-Speed 3384.40 samples/sec   Loss 0.5540   LearningRate 0.0008   Epoch: 18   Global Step: 103680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:46,325-Speed 3392.59 samples/sec   Loss 0.6037   LearningRate 0.0008   Epoch: 18   Global Step: 103690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:49,348-Speed 3388.92 samples/sec   Loss 0.6142   LearningRate 0.0008   Epoch: 18   Global Step: 103700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:52,373-Speed 3385.85 samples/sec   Loss 0.5735   LearningRate 0.0008   Epoch: 18   Global Step: 103710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:55,402-Speed 3381.56 samples/sec   Loss 0.5742   LearningRate 0.0008   Epoch: 18   Global Step: 103720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:09:58,422-Speed 3391.01 samples/sec   Loss 0.6581   LearningRate 0.0008   Epoch: 18   Global Step: 103730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:10:01,442-Speed 3391.60 samples/sec   Loss 0.6585   LearningRate 0.0008   Epoch: 18   Global Step: 103740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:10:04,466-Speed 3387.65 samples/sec   Loss 0.6121   LearningRate 0.0008   Epoch: 18   Global Step: 103750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:10:07,485-Speed 3392.12 samples/sec   Loss 0.5988   LearningRate 0.0008   Epoch: 18   Global Step: 103760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:10,510-Speed 3385.45 samples/sec   Loss 0.6008   LearningRate 0.0008   Epoch: 18   Global Step: 103770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:13,532-Speed 3389.06 samples/sec   Loss 0.6081   LearningRate 0.0008   Epoch: 18   Global Step: 103780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:16,559-Speed 3384.06 samples/sec   Loss 0.6549   LearningRate 0.0008   Epoch: 18   Global Step: 103790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:19,579-Speed 3392.25 samples/sec   Loss 0.5812   LearningRate 0.0008   Epoch: 18   Global Step: 103800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:22,606-Speed 3383.91 samples/sec   Loss 0.5618   LearningRate 0.0008   Epoch: 18   Global Step: 103810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:25,634-Speed 3381.94 samples/sec   Loss 0.6175   LearningRate 0.0008   Epoch: 18   Global Step: 103820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:28,654-Speed 3391.71 samples/sec   Loss 0.6474   LearningRate 0.0008   Epoch: 18   Global Step: 103830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:31,673-Speed 3392.38 samples/sec   Loss 0.5732   LearningRate 0.0008   Epoch: 18   Global Step: 103840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:34,692-Speed 3392.77 samples/sec   Loss 0.6113   LearningRate 0.0008   Epoch: 18   Global Step: 103850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:37,715-Speed 3388.26 samples/sec   Loss 0.6235   LearningRate 0.0008   Epoch: 18   Global Step: 103860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:40,737-Speed 3388.46 samples/sec   Loss 0.5494   LearningRate 0.0008   Epoch: 18   Global Step: 103870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:43,769-Speed 3378.33 samples/sec   Loss 0.5973   LearningRate 0.0007   Epoch: 18   Global Step: 103880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:46,786-Speed 3396.01 samples/sec   Loss 0.6015   LearningRate 0.0007   Epoch: 18   Global Step: 103890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:49,814-Speed 3381.72 samples/sec   Loss 0.6953   LearningRate 0.0007   Epoch: 18   Global Step: 103900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:52,830-Speed 3396.33 samples/sec   Loss 0.5865   LearningRate 0.0007   Epoch: 18   Global Step: 103910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:55,851-Speed 3390.95 samples/sec   Loss 0.5423   LearningRate 0.0007   Epoch: 18   Global Step: 103920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:10:58,882-Speed 3379.02 samples/sec   Loss 0.6611   LearningRate 0.0007   Epoch: 18   Global Step: 103930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:11:01,912-Speed 3380.11 samples/sec   Loss 0.5234   LearningRate 0.0007   Epoch: 18   Global Step: 103940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:11:04,936-Speed 3386.35 samples/sec   Loss 0.6275   LearningRate 0.0007   Epoch: 18   Global Step: 103950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:11:07,969-Speed 3376.84 samples/sec   Loss 0.5837   LearningRate 0.0007   Epoch: 18   Global Step: 103960   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:11:10,981-Speed 3401.26 samples/sec   Loss 0.5897   LearningRate 0.0007   Epoch: 18   Global Step: 103970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:11:13,998-Speed 3394.76 samples/sec   Loss 0.6858   LearningRate 0.0007   Epoch: 18   Global Step: 103980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:11:17,024-Speed 3385.34 samples/sec   Loss 0.6322   LearningRate 0.0007   Epoch: 18   Global Step: 103990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:11:20,042-Speed 3393.48 samples/sec   Loss 0.6417   LearningRate 0.0007   Epoch: 18   Global Step: 104000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:12:03,285-[lfw][104000]XNorm: 21.637584
Training: 2022-04-27 12:12:03,285-[lfw][104000]Accuracy-Flip: 0.99767+-0.00271
Training: 2022-04-27 12:12:03,286-[lfw][104000]Accuracy-Highest: 0.99817
Training: 2022-04-27 12:12:53,380-[cfp_fp][104000]XNorm: 21.790685
Training: 2022-04-27 12:12:53,381-[cfp_fp][104000]Accuracy-Flip: 0.98371+-0.00520
Training: 2022-04-27 12:12:53,381-[cfp_fp][104000]Accuracy-Highest: 0.98614
Training: 2022-04-27 12:13:36,544-[agedb_30][104000]XNorm: 22.020270
Training: 2022-04-27 12:13:36,545-[agedb_30][104000]Accuracy-Flip: 0.98183+-0.00851
Training: 2022-04-27 12:13:36,545-[agedb_30][104000]Accuracy-Highest: 0.98233
Training: 2022-04-27 12:13:39,562-Speed 73.39 samples/sec   Loss 0.5285   LearningRate 0.0007   Epoch: 18   Global Step: 104010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:13:42,572-Speed 3403.34 samples/sec   Loss 0.5770   LearningRate 0.0007   Epoch: 18   Global Step: 104020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:13:45,576-Speed 3409.15 samples/sec   Loss 0.5167   LearningRate 0.0007   Epoch: 18   Global Step: 104030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:13:48,587-Speed 3401.72 samples/sec   Loss 0.5947   LearningRate 0.0007   Epoch: 18   Global Step: 104040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:13:51,604-Speed 3394.33 samples/sec   Loss 0.6518   LearningRate 0.0007   Epoch: 18   Global Step: 104050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:13:54,635-Speed 3380.06 samples/sec   Loss 0.5671   LearningRate 0.0007   Epoch: 18   Global Step: 104060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:13:57,633-Speed 3415.78 samples/sec   Loss 0.5888   LearningRate 0.0007   Epoch: 18   Global Step: 104070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:00,653-Speed 3391.75 samples/sec   Loss 0.6008   LearningRate 0.0007   Epoch: 18   Global Step: 104080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:03,674-Speed 3390.69 samples/sec   Loss 0.5500   LearningRate 0.0007   Epoch: 18   Global Step: 104090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:06,694-Speed 3391.03 samples/sec   Loss 0.5561   LearningRate 0.0007   Epoch: 18   Global Step: 104100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:09,717-Speed 3388.28 samples/sec   Loss 0.6480   LearningRate 0.0007   Epoch: 18   Global Step: 104110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:12,757-Speed 3369.80 samples/sec   Loss 0.6303   LearningRate 0.0007   Epoch: 18   Global Step: 104120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:15,826-Speed 3336.39 samples/sec   Loss 0.5931   LearningRate 0.0007   Epoch: 18   Global Step: 104130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:18,841-Speed 3397.21 samples/sec   Loss 0.6069   LearningRate 0.0007   Epoch: 18   Global Step: 104140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:21,864-Speed 3388.98 samples/sec   Loss 0.5490   LearningRate 0.0007   Epoch: 18   Global Step: 104150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:24,879-Speed 3396.53 samples/sec   Loss 0.6045   LearningRate 0.0007   Epoch: 18   Global Step: 104160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:27,892-Speed 3399.48 samples/sec   Loss 0.5702   LearningRate 0.0007   Epoch: 18   Global Step: 104170   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:14:30,902-Speed 3402.33 samples/sec   Loss 0.6519   LearningRate 0.0007   Epoch: 18   Global Step: 104180   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:14:33,904-Speed 3412.58 samples/sec   Loss 0.5965   LearningRate 0.0007   Epoch: 18   Global Step: 104190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:36,919-Speed 3396.99 samples/sec   Loss 0.6643   LearningRate 0.0007   Epoch: 18   Global Step: 104200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:39,933-Speed 3398.76 samples/sec   Loss 0.5940   LearningRate 0.0007   Epoch: 18   Global Step: 104210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:42,946-Speed 3398.68 samples/sec   Loss 0.6140   LearningRate 0.0007   Epoch: 18   Global Step: 104220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:45,956-Speed 3402.58 samples/sec   Loss 0.5288   LearningRate 0.0007   Epoch: 18   Global Step: 104230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:48,967-Speed 3402.20 samples/sec   Loss 0.5850   LearningRate 0.0007   Epoch: 18   Global Step: 104240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:14:51,961-Speed 3420.80 samples/sec   Loss 0.5909   LearningRate 0.0007   Epoch: 18   Global Step: 104250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:14:54,972-Speed 3401.79 samples/sec   Loss 0.5831   LearningRate 0.0007   Epoch: 18   Global Step: 104260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:14:58,018-Speed 3361.86 samples/sec   Loss 0.6446   LearningRate 0.0007   Epoch: 18   Global Step: 104270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:01,042-Speed 3388.19 samples/sec   Loss 0.6298   LearningRate 0.0007   Epoch: 18   Global Step: 104280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:04,064-Speed 3389.10 samples/sec   Loss 0.6546   LearningRate 0.0007   Epoch: 18   Global Step: 104290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:07,074-Speed 3402.14 samples/sec   Loss 0.5733   LearningRate 0.0007   Epoch: 18   Global Step: 104300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:10,087-Speed 3400.18 samples/sec   Loss 0.6481   LearningRate 0.0007   Epoch: 18   Global Step: 104310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:13,099-Speed 3400.16 samples/sec   Loss 0.5793   LearningRate 0.0007   Epoch: 18   Global Step: 104320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:16,132-Speed 3376.36 samples/sec   Loss 0.5466   LearningRate 0.0007   Epoch: 18   Global Step: 104330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:19,146-Speed 3398.70 samples/sec   Loss 0.5688   LearningRate 0.0007   Epoch: 18   Global Step: 104340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:22,176-Speed 3380.57 samples/sec   Loss 0.6144   LearningRate 0.0007   Epoch: 18   Global Step: 104350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:15:25,248-Speed 3333.52 samples/sec   Loss 0.5972   LearningRate 0.0007   Epoch: 18   Global Step: 104360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:28,271-Speed 3388.23 samples/sec   Loss 0.5835   LearningRate 0.0007   Epoch: 18   Global Step: 104370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:31,281-Speed 3403.97 samples/sec   Loss 0.5512   LearningRate 0.0007   Epoch: 18   Global Step: 104380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:34,313-Speed 3377.61 samples/sec   Loss 0.5659   LearningRate 0.0007   Epoch: 18   Global Step: 104390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:37,334-Speed 3390.29 samples/sec   Loss 0.5927   LearningRate 0.0007   Epoch: 18   Global Step: 104400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:40,342-Speed 3404.24 samples/sec   Loss 0.5569   LearningRate 0.0007   Epoch: 18   Global Step: 104410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:43,352-Speed 3403.12 samples/sec   Loss 0.6885   LearningRate 0.0007   Epoch: 18   Global Step: 104420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:46,367-Speed 3397.34 samples/sec   Loss 0.6014   LearningRate 0.0007   Epoch: 18   Global Step: 104430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:49,382-Speed 3397.16 samples/sec   Loss 0.5873   LearningRate 0.0007   Epoch: 18   Global Step: 104440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:52,404-Speed 3389.49 samples/sec   Loss 0.5947   LearningRate 0.0007   Epoch: 18   Global Step: 104450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:15:55,424-Speed 3391.80 samples/sec   Loss 0.5872   LearningRate 0.0007   Epoch: 18   Global Step: 104460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:15:58,436-Speed 3400.78 samples/sec   Loss 0.5917   LearningRate 0.0007   Epoch: 18   Global Step: 104470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:16:01,451-Speed 3396.34 samples/sec   Loss 0.5986   LearningRate 0.0007   Epoch: 18   Global Step: 104480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:16:04,466-Speed 3397.19 samples/sec   Loss 0.6169   LearningRate 0.0007   Epoch: 18   Global Step: 104490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:16:07,481-Speed 3397.01 samples/sec   Loss 0.5708   LearningRate 0.0007   Epoch: 18   Global Step: 104500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:16:10,516-Speed 3374.61 samples/sec   Loss 0.6147   LearningRate 0.0007   Epoch: 18   Global Step: 104510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:16:13,532-Speed 3397.02 samples/sec   Loss 0.5725   LearningRate 0.0007   Epoch: 18   Global Step: 104520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:16:16,551-Speed 3392.11 samples/sec   Loss 0.5628   LearningRate 0.0007   Epoch: 18   Global Step: 104530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:16:19,547-Speed 3419.02 samples/sec   Loss 0.6504   LearningRate 0.0007   Epoch: 18   Global Step: 104540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:16:22,568-Speed 3390.10 samples/sec   Loss 0.6117   LearningRate 0.0007   Epoch: 18   Global Step: 104550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:16:25,583-Speed 3397.12 samples/sec   Loss 0.6500   LearningRate 0.0006   Epoch: 18   Global Step: 104560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:16:28,592-Speed 3404.21 samples/sec   Loss 0.6206   LearningRate 0.0006   Epoch: 18   Global Step: 104570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:16:31,606-Speed 3398.07 samples/sec   Loss 0.5598   LearningRate 0.0006   Epoch: 18   Global Step: 104580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:16:34,627-Speed 3390.99 samples/sec   Loss 0.5680   LearningRate 0.0006   Epoch: 18   Global Step: 104590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:16:37,744-Speed 3285.25 samples/sec   Loss 0.5806   LearningRate 0.0006   Epoch: 18   Global Step: 104600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:16:40,860-Speed 3286.76 samples/sec   Loss 0.6875   LearningRate 0.0006   Epoch: 18   Global Step: 104610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:16:43,872-Speed 3400.46 samples/sec   Loss 0.5978   LearningRate 0.0006   Epoch: 18   Global Step: 104620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:16:46,884-Speed 3401.33 samples/sec   Loss 0.5403   LearningRate 0.0006   Epoch: 18   Global Step: 104630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:16:49,900-Speed 3395.68 samples/sec   Loss 0.5939   LearningRate 0.0006   Epoch: 18   Global Step: 104640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:16:52,912-Speed 3400.92 samples/sec   Loss 0.5681   LearningRate 0.0006   Epoch: 18   Global Step: 104650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:16:55,938-Speed 3384.11 samples/sec   Loss 0.6553   LearningRate 0.0006   Epoch: 18   Global Step: 104660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:16:58,964-Speed 3385.44 samples/sec   Loss 0.6221   LearningRate 0.0006   Epoch: 18   Global Step: 104670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:17:01,986-Speed 3389.77 samples/sec   Loss 0.6069   LearningRate 0.0006   Epoch: 18   Global Step: 104680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:17:05,005-Speed 3392.41 samples/sec   Loss 0.6107   LearningRate 0.0006   Epoch: 18   Global Step: 104690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:17:08,016-Speed 3400.90 samples/sec   Loss 0.6002   LearningRate 0.0006   Epoch: 18   Global Step: 104700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:17:11,035-Speed 3393.19 samples/sec   Loss 0.6479   LearningRate 0.0006   Epoch: 18   Global Step: 104710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:17:14,047-Speed 3400.16 samples/sec   Loss 0.5805   LearningRate 0.0006   Epoch: 18   Global Step: 104720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:17:17,058-Speed 3401.37 samples/sec   Loss 0.6626   LearningRate 0.0006   Epoch: 18   Global Step: 104730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:17:20,052-Speed 3422.31 samples/sec   Loss 0.5504   LearningRate 0.0006   Epoch: 18   Global Step: 104740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:17:23,050-Speed 3416.79 samples/sec   Loss 0.6180   LearningRate 0.0006   Epoch: 18   Global Step: 104750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:17:26,083-Speed 3377.07 samples/sec   Loss 0.5961   LearningRate 0.0006   Epoch: 18   Global Step: 104760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:17:29,115-Speed 3377.55 samples/sec   Loss 0.6685   LearningRate 0.0006   Epoch: 18   Global Step: 104770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:17:32,129-Speed 3398.93 samples/sec   Loss 0.5190   LearningRate 0.0006   Epoch: 18   Global Step: 104780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:17:35,147-Speed 3393.44 samples/sec   Loss 0.5802   LearningRate 0.0006   Epoch: 18   Global Step: 104790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:17:38,157-Speed 3402.63 samples/sec   Loss 0.5733   LearningRate 0.0006   Epoch: 18   Global Step: 104800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:17:41,172-Speed 3396.58 samples/sec   Loss 0.6223   LearningRate 0.0006   Epoch: 18   Global Step: 104810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:17:44,189-Speed 3396.09 samples/sec   Loss 0.5617   LearningRate 0.0006   Epoch: 18   Global Step: 104820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:17:47,234-Speed 3363.45 samples/sec   Loss 0.5581   LearningRate 0.0006   Epoch: 18   Global Step: 104830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:17:50,329-Speed 3309.33 samples/sec   Loss 0.6080   LearningRate 0.0006   Epoch: 18   Global Step: 104840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:17:53,365-Speed 3373.55 samples/sec   Loss 0.6323   LearningRate 0.0006   Epoch: 18   Global Step: 104850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:17:56,377-Speed 3400.70 samples/sec   Loss 0.6567   LearningRate 0.0006   Epoch: 18   Global Step: 104860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:17:59,388-Speed 3401.15 samples/sec   Loss 0.6073   LearningRate 0.0006   Epoch: 18   Global Step: 104870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:02,471-Speed 3322.23 samples/sec   Loss 0.6809   LearningRate 0.0006   Epoch: 18   Global Step: 104880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:05,485-Speed 3398.10 samples/sec   Loss 0.6175   LearningRate 0.0006   Epoch: 18   Global Step: 104890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:08,505-Speed 3391.89 samples/sec   Loss 0.6053   LearningRate 0.0006   Epoch: 18   Global Step: 104900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:11,541-Speed 3374.06 samples/sec   Loss 0.5269   LearningRate 0.0006   Epoch: 18   Global Step: 104910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:14,569-Speed 3382.84 samples/sec   Loss 0.5614   LearningRate 0.0006   Epoch: 18   Global Step: 104920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:17,590-Speed 3389.79 samples/sec   Loss 0.5874   LearningRate 0.0006   Epoch: 18   Global Step: 104930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:20,617-Speed 3383.36 samples/sec   Loss 0.5827   LearningRate 0.0006   Epoch: 18   Global Step: 104940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:23,629-Speed 3400.53 samples/sec   Loss 0.5887   LearningRate 0.0006   Epoch: 18   Global Step: 104950   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:18:26,628-Speed 3415.47 samples/sec   Loss 0.5761   LearningRate 0.0006   Epoch: 18   Global Step: 104960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:29,643-Speed 3397.50 samples/sec   Loss 0.5480   LearningRate 0.0006   Epoch: 18   Global Step: 104970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:32,659-Speed 3395.96 samples/sec   Loss 0.5587   LearningRate 0.0006   Epoch: 18   Global Step: 104980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:35,670-Speed 3401.08 samples/sec   Loss 0.5609   LearningRate 0.0006   Epoch: 18   Global Step: 104990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:38,687-Speed 3394.91 samples/sec   Loss 0.5838   LearningRate 0.0006   Epoch: 18   Global Step: 105000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:41,705-Speed 3394.39 samples/sec   Loss 0.6025   LearningRate 0.0006   Epoch: 18   Global Step: 105010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:44,721-Speed 3396.08 samples/sec   Loss 0.6471   LearningRate 0.0006   Epoch: 18   Global Step: 105020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:47,757-Speed 3373.39 samples/sec   Loss 0.6121   LearningRate 0.0006   Epoch: 18   Global Step: 105030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:50,774-Speed 3394.85 samples/sec   Loss 0.6398   LearningRate 0.0006   Epoch: 18   Global Step: 105040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:53,789-Speed 3396.66 samples/sec   Loss 0.5585   LearningRate 0.0006   Epoch: 18   Global Step: 105050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:56,787-Speed 3417.29 samples/sec   Loss 0.5861   LearningRate 0.0006   Epoch: 18   Global Step: 105060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:18:59,818-Speed 3378.74 samples/sec   Loss 0.5848   LearningRate 0.0006   Epoch: 18   Global Step: 105070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:19:02,854-Speed 3373.82 samples/sec   Loss 0.6027   LearningRate 0.0006   Epoch: 18   Global Step: 105080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:19:05,871-Speed 3394.97 samples/sec   Loss 0.5582   LearningRate 0.0006   Epoch: 18   Global Step: 105090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:19:08,895-Speed 3386.74 samples/sec   Loss 0.6629   LearningRate 0.0006   Epoch: 18   Global Step: 105100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:19:11,911-Speed 3396.86 samples/sec   Loss 0.5780   LearningRate 0.0006   Epoch: 18   Global Step: 105110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:19:14,942-Speed 3379.44 samples/sec   Loss 0.6139   LearningRate 0.0006   Epoch: 18   Global Step: 105120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:19:17,958-Speed 3395.00 samples/sec   Loss 0.5909   LearningRate 0.0006   Epoch: 18   Global Step: 105130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:19:20,965-Speed 3407.03 samples/sec   Loss 0.6164   LearningRate 0.0006   Epoch: 18   Global Step: 105140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:19:23,990-Speed 3385.76 samples/sec   Loss 0.6747   LearningRate 0.0006   Epoch: 18   Global Step: 105150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:19:27,011-Speed 3389.91 samples/sec   Loss 0.6098   LearningRate 0.0006   Epoch: 18   Global Step: 105160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:19:30,027-Speed 3396.63 samples/sec   Loss 0.5900   LearningRate 0.0006   Epoch: 18   Global Step: 105170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:19:33,047-Speed 3391.70 samples/sec   Loss 0.5977   LearningRate 0.0006   Epoch: 18   Global Step: 105180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:19:36,075-Speed 3382.21 samples/sec   Loss 0.5474   LearningRate 0.0006   Epoch: 18   Global Step: 105190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:19:39,090-Speed 3397.55 samples/sec   Loss 0.5994   LearningRate 0.0006   Epoch: 18   Global Step: 105200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:19:42,105-Speed 3396.63 samples/sec   Loss 0.6339   LearningRate 0.0006   Epoch: 18   Global Step: 105210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:19:45,119-Speed 3398.58 samples/sec   Loss 0.6657   LearningRate 0.0006   Epoch: 18   Global Step: 105220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:19:48,149-Speed 3380.09 samples/sec   Loss 0.5962   LearningRate 0.0006   Epoch: 18   Global Step: 105230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:19:51,180-Speed 3379.11 samples/sec   Loss 0.5259   LearningRate 0.0006   Epoch: 18   Global Step: 105240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:19:54,210-Speed 3380.39 samples/sec   Loss 0.5721   LearningRate 0.0006   Epoch: 18   Global Step: 105250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:19:57,231-Speed 3389.67 samples/sec   Loss 0.6507   LearningRate 0.0006   Epoch: 18   Global Step: 105260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:00,254-Speed 3389.18 samples/sec   Loss 0.6270   LearningRate 0.0006   Epoch: 18   Global Step: 105270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:03,301-Speed 3361.13 samples/sec   Loss 0.6167   LearningRate 0.0006   Epoch: 18   Global Step: 105280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:06,328-Speed 3383.32 samples/sec   Loss 0.6212   LearningRate 0.0005   Epoch: 18   Global Step: 105290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:09,364-Speed 3373.95 samples/sec   Loss 0.5445   LearningRate 0.0005   Epoch: 18   Global Step: 105300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:12,382-Speed 3393.57 samples/sec   Loss 0.5250   LearningRate 0.0005   Epoch: 18   Global Step: 105310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:15,488-Speed 3297.12 samples/sec   Loss 0.5951   LearningRate 0.0005   Epoch: 18   Global Step: 105320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:18,579-Speed 3314.02 samples/sec   Loss 0.6182   LearningRate 0.0005   Epoch: 18   Global Step: 105330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:21,578-Speed 3414.56 samples/sec   Loss 0.5671   LearningRate 0.0005   Epoch: 18   Global Step: 105340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:24,607-Speed 3382.12 samples/sec   Loss 0.5920   LearningRate 0.0005   Epoch: 18   Global Step: 105350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:27,636-Speed 3382.04 samples/sec   Loss 0.5918   LearningRate 0.0005   Epoch: 18   Global Step: 105360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:30,656-Speed 3391.60 samples/sec   Loss 0.6157   LearningRate 0.0005   Epoch: 18   Global Step: 105370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:33,676-Speed 3391.41 samples/sec   Loss 0.5373   LearningRate 0.0005   Epoch: 18   Global Step: 105380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:36,697-Speed 3390.29 samples/sec   Loss 0.4980   LearningRate 0.0005   Epoch: 18   Global Step: 105390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:39,714-Speed 3394.04 samples/sec   Loss 0.6164   LearningRate 0.0005   Epoch: 18   Global Step: 105400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:42,730-Speed 3396.62 samples/sec   Loss 0.5121   LearningRate 0.0005   Epoch: 18   Global Step: 105410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:45,749-Speed 3392.18 samples/sec   Loss 0.5556   LearningRate 0.0005   Epoch: 18   Global Step: 105420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:48,771-Speed 3389.14 samples/sec   Loss 0.5991   LearningRate 0.0005   Epoch: 18   Global Step: 105430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:51,784-Speed 3399.18 samples/sec   Loss 0.5819   LearningRate 0.0005   Epoch: 18   Global Step: 105440   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:20:54,784-Speed 3414.61 samples/sec   Loss 0.5830   LearningRate 0.0005   Epoch: 18   Global Step: 105450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:20:57,803-Speed 3393.06 samples/sec   Loss 0.6139   LearningRate 0.0005   Epoch: 18   Global Step: 105460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:21:00,832-Speed 3381.40 samples/sec   Loss 0.6402   LearningRate 0.0005   Epoch: 18   Global Step: 105470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:21:03,851-Speed 3393.29 samples/sec   Loss 0.6183   LearningRate 0.0005   Epoch: 18   Global Step: 105480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:21:06,870-Speed 3391.88 samples/sec   Loss 0.6575   LearningRate 0.0005   Epoch: 18   Global Step: 105490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:21:09,892-Speed 3388.94 samples/sec   Loss 0.4956   LearningRate 0.0005   Epoch: 18   Global Step: 105500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:21:12,908-Speed 3396.41 samples/sec   Loss 0.6224   LearningRate 0.0005   Epoch: 18   Global Step: 105510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:21:15,907-Speed 3415.66 samples/sec   Loss 0.5893   LearningRate 0.0005   Epoch: 18   Global Step: 105520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:21:18,930-Speed 3387.29 samples/sec   Loss 0.6551   LearningRate 0.0005   Epoch: 18   Global Step: 105530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:21:21,945-Speed 3397.52 samples/sec   Loss 0.6160   LearningRate 0.0005   Epoch: 18   Global Step: 105540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:21:24,970-Speed 3386.15 samples/sec   Loss 0.6223   LearningRate 0.0005   Epoch: 18   Global Step: 105550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:21:27,989-Speed 3392.77 samples/sec   Loss 0.6046   LearningRate 0.0005   Epoch: 18   Global Step: 105560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:21:31,004-Speed 3396.88 samples/sec   Loss 0.5704   LearningRate 0.0005   Epoch: 18   Global Step: 105570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:21:34,025-Speed 3390.71 samples/sec   Loss 0.5035   LearningRate 0.0005   Epoch: 18   Global Step: 105580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:21:37,054-Speed 3381.56 samples/sec   Loss 0.5809   LearningRate 0.0005   Epoch: 18   Global Step: 105590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:21:40,070-Speed 3395.85 samples/sec   Loss 0.6013   LearningRate 0.0005   Epoch: 18   Global Step: 105600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:21:43,085-Speed 3397.44 samples/sec   Loss 0.6247   LearningRate 0.0005   Epoch: 18   Global Step: 105610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:21:46,102-Speed 3394.99 samples/sec   Loss 0.6018   LearningRate 0.0005   Epoch: 18   Global Step: 105620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:21:49,128-Speed 3384.07 samples/sec   Loss 0.5938   LearningRate 0.0005   Epoch: 18   Global Step: 105630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:21:52,152-Speed 3387.54 samples/sec   Loss 0.6241   LearningRate 0.0005   Epoch: 18   Global Step: 105640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:21:55,180-Speed 3383.01 samples/sec   Loss 0.5693   LearningRate 0.0005   Epoch: 18   Global Step: 105650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:21:58,197-Speed 3394.68 samples/sec   Loss 0.5480   LearningRate 0.0005   Epoch: 18   Global Step: 105660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:01,212-Speed 3397.50 samples/sec   Loss 0.5926   LearningRate 0.0005   Epoch: 18   Global Step: 105670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:04,229-Speed 3394.05 samples/sec   Loss 0.6216   LearningRate 0.0005   Epoch: 18   Global Step: 105680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:07,244-Speed 3397.56 samples/sec   Loss 0.6223   LearningRate 0.0005   Epoch: 18   Global Step: 105690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:10,262-Speed 3393.51 samples/sec   Loss 0.5817   LearningRate 0.0005   Epoch: 18   Global Step: 105700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:13,302-Speed 3369.47 samples/sec   Loss 0.5861   LearningRate 0.0005   Epoch: 18   Global Step: 105710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:16,300-Speed 3416.04 samples/sec   Loss 0.5405   LearningRate 0.0005   Epoch: 18   Global Step: 105720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:19,319-Speed 3392.64 samples/sec   Loss 0.5835   LearningRate 0.0005   Epoch: 18   Global Step: 105730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:22,334-Speed 3397.33 samples/sec   Loss 0.5642   LearningRate 0.0005   Epoch: 18   Global Step: 105740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:25,355-Speed 3390.71 samples/sec   Loss 0.6542   LearningRate 0.0005   Epoch: 18   Global Step: 105750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:28,370-Speed 3397.15 samples/sec   Loss 0.6514   LearningRate 0.0005   Epoch: 18   Global Step: 105760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:31,386-Speed 3395.68 samples/sec   Loss 0.5986   LearningRate 0.0005   Epoch: 18   Global Step: 105770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:34,401-Speed 3397.06 samples/sec   Loss 0.5923   LearningRate 0.0005   Epoch: 18   Global Step: 105780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:37,445-Speed 3365.18 samples/sec   Loss 0.5839   LearningRate 0.0005   Epoch: 18   Global Step: 105790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:40,603-Speed 3243.39 samples/sec   Loss 0.5943   LearningRate 0.0005   Epoch: 18   Global Step: 105800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:43,626-Speed 3388.54 samples/sec   Loss 0.5620   LearningRate 0.0005   Epoch: 18   Global Step: 105810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:22:46,683-Speed 3350.18 samples/sec   Loss 0.6675   LearningRate 0.0005   Epoch: 18   Global Step: 105820   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:22:49,702-Speed 3392.76 samples/sec   Loss 0.6673   LearningRate 0.0005   Epoch: 18   Global Step: 105830   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:22:52,727-Speed 3386.24 samples/sec   Loss 0.6141   LearningRate 0.0005   Epoch: 18   Global Step: 105840   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:22:55,744-Speed 3394.84 samples/sec   Loss 0.6596   LearningRate 0.0005   Epoch: 18   Global Step: 105850   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:22:58,742-Speed 3416.03 samples/sec   Loss 0.6068   LearningRate 0.0005   Epoch: 18   Global Step: 105860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:23:01,765-Speed 3388.15 samples/sec   Loss 0.5339   LearningRate 0.0005   Epoch: 18   Global Step: 105870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:23:04,787-Speed 3390.94 samples/sec   Loss 0.6013   LearningRate 0.0005   Epoch: 18   Global Step: 105880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:23:07,811-Speed 3386.31 samples/sec   Loss 0.5442   LearningRate 0.0005   Epoch: 18   Global Step: 105890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:23:10,832-Speed 3390.66 samples/sec   Loss 0.6665   LearningRate 0.0005   Epoch: 18   Global Step: 105900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:23:13,851-Speed 3392.93 samples/sec   Loss 0.6018   LearningRate 0.0005   Epoch: 18   Global Step: 105910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:23:16,880-Speed 3381.83 samples/sec   Loss 0.5510   LearningRate 0.0005   Epoch: 18   Global Step: 105920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:23:19,897-Speed 3394.36 samples/sec   Loss 0.5895   LearningRate 0.0005   Epoch: 18   Global Step: 105930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:23:22,916-Speed 3392.50 samples/sec   Loss 0.6196   LearningRate 0.0005   Epoch: 18   Global Step: 105940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:23:25,937-Speed 3390.51 samples/sec   Loss 0.6709   LearningRate 0.0005   Epoch: 18   Global Step: 105950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:23:28,945-Speed 3405.34 samples/sec   Loss 0.6533   LearningRate 0.0005   Epoch: 18   Global Step: 105960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:23:31,970-Speed 3385.19 samples/sec   Loss 0.6275   LearningRate 0.0005   Epoch: 18   Global Step: 105970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:23:34,987-Speed 3395.78 samples/sec   Loss 0.6161   LearningRate 0.0005   Epoch: 18   Global Step: 105980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:23:38,006-Speed 3393.01 samples/sec   Loss 0.6282   LearningRate 0.0005   Epoch: 18   Global Step: 105990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:23:41,032-Speed 3384.14 samples/sec   Loss 0.6342   LearningRate 0.0005   Epoch: 18   Global Step: 106000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:24:24,513-[lfw][106000]XNorm: 21.776894
Training: 2022-04-27 12:24:24,514-[lfw][106000]Accuracy-Flip: 0.99767+-0.00271
Training: 2022-04-27 12:24:24,514-[lfw][106000]Accuracy-Highest: 0.99817
Training: 2022-04-27 12:25:15,057-[cfp_fp][106000]XNorm: 22.000450
Training: 2022-04-27 12:25:15,057-[cfp_fp][106000]Accuracy-Flip: 0.98486+-0.00492
Training: 2022-04-27 12:25:15,058-[cfp_fp][106000]Accuracy-Highest: 0.98614
Training: 2022-04-27 12:25:58,465-[agedb_30][106000]XNorm: 22.311786
Training: 2022-04-27 12:25:58,466-[agedb_30][106000]Accuracy-Flip: 0.98167+-0.00734
Training: 2022-04-27 12:25:58,466-[agedb_30][106000]Accuracy-Highest: 0.98233
Training: 2022-04-27 12:26:01,470-Speed 72.92 samples/sec   Loss 0.5434   LearningRate 0.0005   Epoch: 18   Global Step: 106010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:26:04,495-Speed 3386.61 samples/sec   Loss 0.6617   LearningRate 0.0005   Epoch: 18   Global Step: 106020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:26:07,523-Speed 3382.44 samples/sec   Loss 0.6195   LearningRate 0.0005   Epoch: 18   Global Step: 106030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:26:10,539-Speed 3395.70 samples/sec   Loss 0.5708   LearningRate 0.0005   Epoch: 18   Global Step: 106040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:26:13,581-Speed 3366.77 samples/sec   Loss 0.5582   LearningRate 0.0005   Epoch: 18   Global Step: 106050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:26:16,882-Speed 3102.92 samples/sec   Loss 0.5975   LearningRate 0.0005   Epoch: 18   Global Step: 106060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:26:19,901-Speed 3392.39 samples/sec   Loss 0.6512   LearningRate 0.0005   Epoch: 18   Global Step: 106070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:26:22,943-Speed 3366.49 samples/sec   Loss 0.5753   LearningRate 0.0005   Epoch: 18   Global Step: 106080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:26:26,086-Speed 3259.08 samples/sec   Loss 0.6347   LearningRate 0.0005   Epoch: 18   Global Step: 106090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:26:29,103-Speed 3395.27 samples/sec   Loss 0.6185   LearningRate 0.0004   Epoch: 18   Global Step: 106100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:26:32,121-Speed 3394.56 samples/sec   Loss 0.5854   LearningRate 0.0004   Epoch: 18   Global Step: 106110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:26:35,138-Speed 3394.43 samples/sec   Loss 0.5694   LearningRate 0.0004   Epoch: 18   Global Step: 106120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:26:38,166-Speed 3382.54 samples/sec   Loss 0.6336   LearningRate 0.0004   Epoch: 18   Global Step: 106130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:26:41,210-Speed 3364.20 samples/sec   Loss 0.6458   LearningRate 0.0004   Epoch: 18   Global Step: 106140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:26:44,233-Speed 3388.60 samples/sec   Loss 0.5697   LearningRate 0.0004   Epoch: 18   Global Step: 106150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:26:47,260-Speed 3383.32 samples/sec   Loss 0.5703   LearningRate 0.0004   Epoch: 18   Global Step: 106160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:26:50,285-Speed 3386.17 samples/sec   Loss 0.5790   LearningRate 0.0004   Epoch: 18   Global Step: 106170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:26:53,323-Speed 3372.10 samples/sec   Loss 0.5234   LearningRate 0.0004   Epoch: 18   Global Step: 106180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:26:56,348-Speed 3385.79 samples/sec   Loss 0.6642   LearningRate 0.0004   Epoch: 18   Global Step: 106190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:26:59,369-Speed 3391.00 samples/sec   Loss 0.5631   LearningRate 0.0004   Epoch: 18   Global Step: 106200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:02,392-Speed 3387.73 samples/sec   Loss 0.5595   LearningRate 0.0004   Epoch: 18   Global Step: 106210   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:27:05,414-Speed 3388.60 samples/sec   Loss 0.6224   LearningRate 0.0004   Epoch: 18   Global Step: 106220   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:27:08,431-Speed 3395.56 samples/sec   Loss 0.6697   LearningRate 0.0004   Epoch: 18   Global Step: 106230   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:27:11,453-Speed 3388.49 samples/sec   Loss 0.6279   LearningRate 0.0004   Epoch: 18   Global Step: 106240   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:27:14,448-Speed 3419.94 samples/sec   Loss 0.5547   LearningRate 0.0004   Epoch: 18   Global Step: 106250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:17,472-Speed 3386.60 samples/sec   Loss 0.6187   LearningRate 0.0004   Epoch: 18   Global Step: 106260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:20,498-Speed 3385.98 samples/sec   Loss 0.6137   LearningRate 0.0004   Epoch: 18   Global Step: 106270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:23,516-Speed 3393.30 samples/sec   Loss 0.6186   LearningRate 0.0004   Epoch: 18   Global Step: 106280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:26,560-Speed 3364.91 samples/sec   Loss 0.6418   LearningRate 0.0004   Epoch: 18   Global Step: 106290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:29,570-Speed 3402.62 samples/sec   Loss 0.5466   LearningRate 0.0004   Epoch: 18   Global Step: 106300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:32,580-Speed 3403.32 samples/sec   Loss 0.5999   LearningRate 0.0004   Epoch: 18   Global Step: 106310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:35,595-Speed 3397.19 samples/sec   Loss 0.6134   LearningRate 0.0004   Epoch: 18   Global Step: 106320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:38,609-Speed 3398.17 samples/sec   Loss 0.5812   LearningRate 0.0004   Epoch: 18   Global Step: 106330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:41,626-Speed 3395.13 samples/sec   Loss 0.6422   LearningRate 0.0004   Epoch: 18   Global Step: 106340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:44,640-Speed 3397.32 samples/sec   Loss 0.5906   LearningRate 0.0004   Epoch: 18   Global Step: 106350   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:27:47,638-Speed 3416.37 samples/sec   Loss 0.6160   LearningRate 0.0004   Epoch: 18   Global Step: 106360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:50,661-Speed 3388.79 samples/sec   Loss 0.5023   LearningRate 0.0004   Epoch: 18   Global Step: 106370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:53,667-Speed 3407.34 samples/sec   Loss 0.6052   LearningRate 0.0004   Epoch: 18   Global Step: 106380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:56,681-Speed 3398.07 samples/sec   Loss 0.6352   LearningRate 0.0004   Epoch: 18   Global Step: 106390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:27:59,720-Speed 3370.34 samples/sec   Loss 0.5674   LearningRate 0.0004   Epoch: 18   Global Step: 106400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:02,742-Speed 3389.33 samples/sec   Loss 0.6180   LearningRate 0.0004   Epoch: 18   Global Step: 106410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:05,748-Speed 3407.01 samples/sec   Loss 0.6401   LearningRate 0.0004   Epoch: 18   Global Step: 106420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:08,756-Speed 3405.34 samples/sec   Loss 0.5945   LearningRate 0.0004   Epoch: 18   Global Step: 106430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:11,837-Speed 3324.33 samples/sec   Loss 0.6243   LearningRate 0.0004   Epoch: 18   Global Step: 106440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:14,863-Speed 3385.08 samples/sec   Loss 0.6127   LearningRate 0.0004   Epoch: 18   Global Step: 106450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:17,873-Speed 3402.92 samples/sec   Loss 0.6137   LearningRate 0.0004   Epoch: 18   Global Step: 106460   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:28:20,863-Speed 3425.54 samples/sec   Loss 0.6039   LearningRate 0.0004   Epoch: 18   Global Step: 106470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:23,916-Speed 3354.54 samples/sec   Loss 0.6219   LearningRate 0.0004   Epoch: 18   Global Step: 106480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:26,973-Speed 3350.65 samples/sec   Loss 0.6501   LearningRate 0.0004   Epoch: 18   Global Step: 106490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:29,980-Speed 3406.28 samples/sec   Loss 0.5812   LearningRate 0.0004   Epoch: 18   Global Step: 106500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:32,991-Speed 3401.87 samples/sec   Loss 0.5829   LearningRate 0.0004   Epoch: 18   Global Step: 106510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:36,016-Speed 3385.40 samples/sec   Loss 0.6255   LearningRate 0.0004   Epoch: 18   Global Step: 106520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:39,035-Speed 3392.95 samples/sec   Loss 0.5995   LearningRate 0.0004   Epoch: 18   Global Step: 106530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:42,046-Speed 3401.29 samples/sec   Loss 0.6136   LearningRate 0.0004   Epoch: 18   Global Step: 106540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:45,056-Speed 3402.71 samples/sec   Loss 0.5317   LearningRate 0.0004   Epoch: 18   Global Step: 106550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:48,067-Speed 3402.05 samples/sec   Loss 0.6079   LearningRate 0.0004   Epoch: 18   Global Step: 106560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:51,059-Speed 3423.72 samples/sec   Loss 0.6375   LearningRate 0.0004   Epoch: 18   Global Step: 106570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:54,068-Speed 3403.64 samples/sec   Loss 0.5648   LearningRate 0.0004   Epoch: 18   Global Step: 106580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:28:57,080-Speed 3400.56 samples/sec   Loss 0.5678   LearningRate 0.0004   Epoch: 18   Global Step: 106590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:00,105-Speed 3385.69 samples/sec   Loss 0.6671   LearningRate 0.0004   Epoch: 18   Global Step: 106600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:03,120-Speed 3397.54 samples/sec   Loss 0.5999   LearningRate 0.0004   Epoch: 18   Global Step: 106610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:06,134-Speed 3397.34 samples/sec   Loss 0.5884   LearningRate 0.0004   Epoch: 18   Global Step: 106620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:09,148-Speed 3398.17 samples/sec   Loss 0.6503   LearningRate 0.0004   Epoch: 18   Global Step: 106630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:12,189-Speed 3368.52 samples/sec   Loss 0.6299   LearningRate 0.0004   Epoch: 18   Global Step: 106640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:15,203-Speed 3399.58 samples/sec   Loss 0.5647   LearningRate 0.0004   Epoch: 18   Global Step: 106650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:18,216-Speed 3399.11 samples/sec   Loss 0.6323   LearningRate 0.0004   Epoch: 18   Global Step: 106660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:21,240-Speed 3387.29 samples/sec   Loss 0.6697   LearningRate 0.0004   Epoch: 18   Global Step: 106670   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:29:24,242-Speed 3411.60 samples/sec   Loss 0.6041   LearningRate 0.0004   Epoch: 18   Global Step: 106680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:27,264-Speed 3389.58 samples/sec   Loss 0.5961   LearningRate 0.0004   Epoch: 18   Global Step: 106690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:30,286-Speed 3388.56 samples/sec   Loss 0.5790   LearningRate 0.0004   Epoch: 18   Global Step: 106700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:33,296-Speed 3402.96 samples/sec   Loss 0.6024   LearningRate 0.0004   Epoch: 18   Global Step: 106710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:36,306-Speed 3402.60 samples/sec   Loss 0.6367   LearningRate 0.0004   Epoch: 18   Global Step: 106720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:39,327-Speed 3390.73 samples/sec   Loss 0.6058   LearningRate 0.0004   Epoch: 18   Global Step: 106730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:42,339-Speed 3401.35 samples/sec   Loss 0.6519   LearningRate 0.0004   Epoch: 18   Global Step: 106740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:45,348-Speed 3403.72 samples/sec   Loss 0.5740   LearningRate 0.0004   Epoch: 18   Global Step: 106750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:48,366-Speed 3394.22 samples/sec   Loss 0.6174   LearningRate 0.0004   Epoch: 18   Global Step: 106760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:51,380-Speed 3398.10 samples/sec   Loss 0.5494   LearningRate 0.0004   Epoch: 18   Global Step: 106770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:29:54,388-Speed 3404.04 samples/sec   Loss 0.6457   LearningRate 0.0004   Epoch: 18   Global Step: 106780   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:29:57,381-Speed 3422.81 samples/sec   Loss 0.5790   LearningRate 0.0004   Epoch: 18   Global Step: 106790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:30:00,399-Speed 3394.05 samples/sec   Loss 0.5904   LearningRate 0.0004   Epoch: 18   Global Step: 106800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:30:03,414-Speed 3396.50 samples/sec   Loss 0.5915   LearningRate 0.0004   Epoch: 18   Global Step: 106810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:30:06,434-Speed 3392.61 samples/sec   Loss 0.6136   LearningRate 0.0004   Epoch: 18   Global Step: 106820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:30:09,466-Speed 3377.49 samples/sec   Loss 0.6643   LearningRate 0.0004   Epoch: 18   Global Step: 106830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:30:12,484-Speed 3394.36 samples/sec   Loss 0.6016   LearningRate 0.0004   Epoch: 18   Global Step: 106840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:30:15,476-Speed 3422.85 samples/sec   Loss 0.6530   LearningRate 0.0004   Epoch: 18   Global Step: 106850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:30:18,489-Speed 3399.66 samples/sec   Loss 0.5486   LearningRate 0.0004   Epoch: 18   Global Step: 106860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:30:21,499-Speed 3401.79 samples/sec   Loss 0.5648   LearningRate 0.0004   Epoch: 18   Global Step: 106870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:30:24,509-Speed 3403.19 samples/sec   Loss 0.6523   LearningRate 0.0004   Epoch: 18   Global Step: 106880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:30:27,520-Speed 3401.66 samples/sec   Loss 0.5780   LearningRate 0.0004   Epoch: 18   Global Step: 106890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:30:30,534-Speed 3398.67 samples/sec   Loss 0.5621   LearningRate 0.0004   Epoch: 18   Global Step: 106900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:30:33,545-Speed 3401.48 samples/sec   Loss 0.5922   LearningRate 0.0004   Epoch: 18   Global Step: 106910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:30:36,555-Speed 3402.97 samples/sec   Loss 0.6578   LearningRate 0.0004   Epoch: 18   Global Step: 106920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:30:39,567-Speed 3400.90 samples/sec   Loss 0.6059   LearningRate 0.0004   Epoch: 18   Global Step: 106930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:30:42,574-Speed 3405.89 samples/sec   Loss 0.6606   LearningRate 0.0004   Epoch: 18   Global Step: 106940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:30:45,588-Speed 3398.09 samples/sec   Loss 0.6701   LearningRate 0.0004   Epoch: 18   Global Step: 106950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:30:48,595-Speed 3405.62 samples/sec   Loss 0.5547   LearningRate 0.0004   Epoch: 18   Global Step: 106960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:30:51,612-Speed 3395.27 samples/sec   Loss 0.6207   LearningRate 0.0004   Epoch: 18   Global Step: 106970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:30:54,620-Speed 3405.07 samples/sec   Loss 0.5507   LearningRate 0.0004   Epoch: 18   Global Step: 106980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:30:57,628-Speed 3404.92 samples/sec   Loss 0.6766   LearningRate 0.0004   Epoch: 18   Global Step: 106990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:31:00,658-Speed 3380.73 samples/sec   Loss 0.6180   LearningRate 0.0003   Epoch: 18   Global Step: 107000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:31:03,666-Speed 3404.82 samples/sec   Loss 0.5907   LearningRate 0.0003   Epoch: 18   Global Step: 107010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:31:06,676-Speed 3402.70 samples/sec   Loss 0.5963   LearningRate 0.0003   Epoch: 18   Global Step: 107020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:31:09,690-Speed 3398.93 samples/sec   Loss 0.6297   LearningRate 0.0003   Epoch: 18   Global Step: 107030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:31:12,698-Speed 3404.56 samples/sec   Loss 0.6261   LearningRate 0.0003   Epoch: 18   Global Step: 107040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:31:15,704-Speed 3407.11 samples/sec   Loss 0.5946   LearningRate 0.0003   Epoch: 18   Global Step: 107050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:31:18,720-Speed 3396.93 samples/sec   Loss 0.5774   LearningRate 0.0003   Epoch: 18   Global Step: 107060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:31:21,744-Speed 3386.01 samples/sec   Loss 0.5872   LearningRate 0.0003   Epoch: 18   Global Step: 107070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:31:24,765-Speed 3391.24 samples/sec   Loss 0.5807   LearningRate 0.0003   Epoch: 18   Global Step: 107080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:31:27,760-Speed 3420.29 samples/sec   Loss 0.5878   LearningRate 0.0003   Epoch: 18   Global Step: 107090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:31:30,768-Speed 3404.29 samples/sec   Loss 0.6088   LearningRate 0.0003   Epoch: 18   Global Step: 107100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:31:33,778-Speed 3403.48 samples/sec   Loss 0.5763   LearningRate 0.0003   Epoch: 18   Global Step: 107110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:31:36,787-Speed 3403.66 samples/sec   Loss 0.6347   LearningRate 0.0003   Epoch: 18   Global Step: 107120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:31:39,795-Speed 3404.48 samples/sec   Loss 0.6351   LearningRate 0.0003   Epoch: 18   Global Step: 107130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:31:42,808-Speed 3399.57 samples/sec   Loss 0.6031   LearningRate 0.0003   Epoch: 18   Global Step: 107140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:31:45,837-Speed 3381.76 samples/sec   Loss 0.5696   LearningRate 0.0003   Epoch: 18   Global Step: 107150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:31:48,849-Speed 3399.97 samples/sec   Loss 0.6700   LearningRate 0.0003   Epoch: 18   Global Step: 107160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:31:51,856-Speed 3406.19 samples/sec   Loss 0.5515   LearningRate 0.0003   Epoch: 18   Global Step: 107170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:31:54,867-Speed 3401.90 samples/sec   Loss 0.6163   LearningRate 0.0003   Epoch: 18   Global Step: 107180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:31:57,877-Speed 3403.02 samples/sec   Loss 0.5933   LearningRate 0.0003   Epoch: 18   Global Step: 107190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:32:00,867-Speed 3425.63 samples/sec   Loss 0.5821   LearningRate 0.0003   Epoch: 18   Global Step: 107200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:32:03,898-Speed 3379.77 samples/sec   Loss 0.6184   LearningRate 0.0003   Epoch: 18   Global Step: 107210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:32:06,914-Speed 3395.95 samples/sec   Loss 0.6247   LearningRate 0.0003   Epoch: 18   Global Step: 107220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:32:09,927-Speed 3399.21 samples/sec   Loss 0.5735   LearningRate 0.0003   Epoch: 18   Global Step: 107230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:32:12,957-Speed 3380.31 samples/sec   Loss 0.5668   LearningRate 0.0003   Epoch: 18   Global Step: 107240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:32:15,988-Speed 3378.89 samples/sec   Loss 0.6134   LearningRate 0.0003   Epoch: 18   Global Step: 107250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:32:19,014-Speed 3384.28 samples/sec   Loss 0.5809   LearningRate 0.0003   Epoch: 18   Global Step: 107260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:32:22,023-Speed 3404.58 samples/sec   Loss 0.5740   LearningRate 0.0003   Epoch: 18   Global Step: 107270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:32:25,038-Speed 3397.18 samples/sec   Loss 0.5818   LearningRate 0.0003   Epoch: 18   Global Step: 107280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:32:28,063-Speed 3386.33 samples/sec   Loss 0.5725   LearningRate 0.0003   Epoch: 18   Global Step: 107290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:32:31,079-Speed 3395.54 samples/sec   Loss 0.6012   LearningRate 0.0003   Epoch: 18   Global Step: 107300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:32:34,095-Speed 3395.97 samples/sec   Loss 0.5632   LearningRate 0.0003   Epoch: 18   Global Step: 107310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:32:37,107-Speed 3400.59 samples/sec   Loss 0.5798   LearningRate 0.0003   Epoch: 18   Global Step: 107320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:32:40,119-Speed 3400.42 samples/sec   Loss 0.5828   LearningRate 0.0003   Epoch: 18   Global Step: 107330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:32:43,134-Speed 3397.78 samples/sec   Loss 0.5884   LearningRate 0.0003   Epoch: 18   Global Step: 107340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:32:46,155-Speed 3390.41 samples/sec   Loss 0.5946   LearningRate 0.0003   Epoch: 18   Global Step: 107350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:32:49,187-Speed 3377.13 samples/sec   Loss 0.5344   LearningRate 0.0003   Epoch: 18   Global Step: 107360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:32:52,203-Speed 3396.79 samples/sec   Loss 0.5396   LearningRate 0.0003   Epoch: 18   Global Step: 107370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:32:55,214-Speed 3401.66 samples/sec   Loss 0.5806   LearningRate 0.0003   Epoch: 18   Global Step: 107380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:32:58,245-Speed 3378.90 samples/sec   Loss 0.5654   LearningRate 0.0003   Epoch: 18   Global Step: 107390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:01,257-Speed 3400.61 samples/sec   Loss 0.6277   LearningRate 0.0003   Epoch: 18   Global Step: 107400   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:33:04,286-Speed 3381.78 samples/sec   Loss 0.5329   LearningRate 0.0003   Epoch: 18   Global Step: 107410   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:33:07,287-Speed 3413.07 samples/sec   Loss 0.6411   LearningRate 0.0003   Epoch: 18   Global Step: 107420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:10,311-Speed 3387.01 samples/sec   Loss 0.6046   LearningRate 0.0003   Epoch: 18   Global Step: 107430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:13,338-Speed 3383.50 samples/sec   Loss 0.5413   LearningRate 0.0003   Epoch: 18   Global Step: 107440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:16,349-Speed 3401.52 samples/sec   Loss 0.5455   LearningRate 0.0003   Epoch: 18   Global Step: 107450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:19,374-Speed 3386.33 samples/sec   Loss 0.6245   LearningRate 0.0003   Epoch: 18   Global Step: 107460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:22,399-Speed 3386.45 samples/sec   Loss 0.5914   LearningRate 0.0003   Epoch: 18   Global Step: 107470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:25,424-Speed 3385.05 samples/sec   Loss 0.5589   LearningRate 0.0003   Epoch: 18   Global Step: 107480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:28,449-Speed 3386.60 samples/sec   Loss 0.5898   LearningRate 0.0003   Epoch: 18   Global Step: 107490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:31,461-Speed 3400.56 samples/sec   Loss 0.5764   LearningRate 0.0003   Epoch: 18   Global Step: 107500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:34,472-Speed 3400.94 samples/sec   Loss 0.6164   LearningRate 0.0003   Epoch: 18   Global Step: 107510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:37,497-Speed 3385.95 samples/sec   Loss 0.6677   LearningRate 0.0003   Epoch: 18   Global Step: 107520   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:33:40,499-Speed 3411.67 samples/sec   Loss 0.5746   LearningRate 0.0003   Epoch: 18   Global Step: 107530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:43,512-Speed 3399.99 samples/sec   Loss 0.6453   LearningRate 0.0003   Epoch: 18   Global Step: 107540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:46,527-Speed 3397.64 samples/sec   Loss 0.6408   LearningRate 0.0003   Epoch: 18   Global Step: 107550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:49,612-Speed 3319.23 samples/sec   Loss 0.6679   LearningRate 0.0003   Epoch: 18   Global Step: 107560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:52,631-Speed 3392.83 samples/sec   Loss 0.5962   LearningRate 0.0003   Epoch: 18   Global Step: 107570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:55,646-Speed 3398.00 samples/sec   Loss 0.6097   LearningRate 0.0003   Epoch: 18   Global Step: 107580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:33:58,671-Speed 3385.46 samples/sec   Loss 0.6279   LearningRate 0.0003   Epoch: 18   Global Step: 107590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:34:01,692-Speed 3390.40 samples/sec   Loss 0.5393   LearningRate 0.0003   Epoch: 18   Global Step: 107600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:34:04,688-Speed 3418.48 samples/sec   Loss 0.5760   LearningRate 0.0003   Epoch: 18   Global Step: 107610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:34:07,702-Speed 3398.64 samples/sec   Loss 0.6010   LearningRate 0.0003   Epoch: 18   Global Step: 107620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:34:10,714-Speed 3400.60 samples/sec   Loss 0.5606   LearningRate 0.0003   Epoch: 18   Global Step: 107630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:34:13,754-Speed 3369.55 samples/sec   Loss 0.5878   LearningRate 0.0003   Epoch: 18   Global Step: 107640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:34:16,821-Speed 3339.25 samples/sec   Loss 0.5504   LearningRate 0.0003   Epoch: 18   Global Step: 107650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:34:19,838-Speed 3395.28 samples/sec   Loss 0.6284   LearningRate 0.0003   Epoch: 18   Global Step: 107660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:34:22,849-Speed 3401.03 samples/sec   Loss 0.5674   LearningRate 0.0003   Epoch: 18   Global Step: 107670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:34:25,861-Speed 3401.62 samples/sec   Loss 0.6287   LearningRate 0.0003   Epoch: 18   Global Step: 107680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:34:28,884-Speed 3388.12 samples/sec   Loss 0.6172   LearningRate 0.0003   Epoch: 18   Global Step: 107690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:34:31,897-Speed 3399.06 samples/sec   Loss 0.5855   LearningRate 0.0003   Epoch: 18   Global Step: 107700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:34:34,912-Speed 3397.81 samples/sec   Loss 0.6072   LearningRate 0.0003   Epoch: 18   Global Step: 107710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:34:37,937-Speed 3386.64 samples/sec   Loss 0.5678   LearningRate 0.0003   Epoch: 18   Global Step: 107720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:34:40,950-Speed 3399.20 samples/sec   Loss 0.5613   LearningRate 0.0003   Epoch: 18   Global Step: 107730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:34:43,963-Speed 3399.11 samples/sec   Loss 0.6311   LearningRate 0.0003   Epoch: 18   Global Step: 107740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:34:46,981-Speed 3394.15 samples/sec   Loss 0.5471   LearningRate 0.0003   Epoch: 18   Global Step: 107750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:34:50,004-Speed 3387.48 samples/sec   Loss 0.5738   LearningRate 0.0003   Epoch: 18   Global Step: 107760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:34:53,019-Speed 3397.06 samples/sec   Loss 0.5878   LearningRate 0.0003   Epoch: 18   Global Step: 107770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:34:56,044-Speed 3386.57 samples/sec   Loss 0.6354   LearningRate 0.0003   Epoch: 18   Global Step: 107780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:34:59,063-Speed 3392.49 samples/sec   Loss 0.6211   LearningRate 0.0003   Epoch: 18   Global Step: 107790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:02,094-Speed 3379.44 samples/sec   Loss 0.5872   LearningRate 0.0003   Epoch: 18   Global Step: 107800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:05,113-Speed 3392.16 samples/sec   Loss 0.5458   LearningRate 0.0003   Epoch: 18   Global Step: 107810   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:35:08,108-Speed 3420.02 samples/sec   Loss 0.6214   LearningRate 0.0003   Epoch: 18   Global Step: 107820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:11,130-Speed 3390.38 samples/sec   Loss 0.6277   LearningRate 0.0003   Epoch: 18   Global Step: 107830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:14,143-Speed 3399.42 samples/sec   Loss 0.5888   LearningRate 0.0003   Epoch: 18   Global Step: 107840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:17,160-Speed 3395.07 samples/sec   Loss 0.6286   LearningRate 0.0003   Epoch: 18   Global Step: 107850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:20,173-Speed 3398.62 samples/sec   Loss 0.6234   LearningRate 0.0003   Epoch: 18   Global Step: 107860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:23,208-Speed 3375.43 samples/sec   Loss 0.6166   LearningRate 0.0003   Epoch: 18   Global Step: 107870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:26,236-Speed 3382.10 samples/sec   Loss 0.6123   LearningRate 0.0003   Epoch: 18   Global Step: 107880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:29,425-Speed 3211.55 samples/sec   Loss 0.5591   LearningRate 0.0003   Epoch: 18   Global Step: 107890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:32,440-Speed 3398.11 samples/sec   Loss 0.5812   LearningRate 0.0003   Epoch: 18   Global Step: 107900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:35,455-Speed 3397.20 samples/sec   Loss 0.6540   LearningRate 0.0003   Epoch: 18   Global Step: 107910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:38,473-Speed 3393.91 samples/sec   Loss 0.6088   LearningRate 0.0003   Epoch: 18   Global Step: 107920   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:35:41,468-Speed 3418.66 samples/sec   Loss 0.5477   LearningRate 0.0003   Epoch: 18   Global Step: 107930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:44,484-Speed 3396.63 samples/sec   Loss 0.5745   LearningRate 0.0003   Epoch: 18   Global Step: 107940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:47,503-Speed 3392.19 samples/sec   Loss 0.5935   LearningRate 0.0003   Epoch: 18   Global Step: 107950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:50,520-Speed 3395.23 samples/sec   Loss 0.6218   LearningRate 0.0003   Epoch: 18   Global Step: 107960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:53,546-Speed 3384.99 samples/sec   Loss 0.5347   LearningRate 0.0003   Epoch: 18   Global Step: 107970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:56,564-Speed 3393.77 samples/sec   Loss 0.6456   LearningRate 0.0003   Epoch: 18   Global Step: 107980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:35:59,578-Speed 3398.89 samples/sec   Loss 0.5836   LearningRate 0.0003   Epoch: 18   Global Step: 107990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:36:02,593-Speed 3396.57 samples/sec   Loss 0.5949   LearningRate 0.0003   Epoch: 18   Global Step: 108000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:36:45,708-[lfw][108000]XNorm: 21.752915
Training: 2022-04-27 12:36:45,709-[lfw][108000]Accuracy-Flip: 0.99750+-0.00300
Training: 2022-04-27 12:36:45,709-[lfw][108000]Accuracy-Highest: 0.99817
Training: 2022-04-27 12:37:36,252-[cfp_fp][108000]XNorm: 22.033458
Training: 2022-04-27 12:37:36,253-[cfp_fp][108000]Accuracy-Flip: 0.98586+-0.00517
Training: 2022-04-27 12:37:36,253-[cfp_fp][108000]Accuracy-Highest: 0.98614
Training: 2022-04-27 12:38:19,611-[agedb_30][108000]XNorm: 22.249081
Training: 2022-04-27 12:38:19,611-[agedb_30][108000]Accuracy-Flip: 0.98083+-0.00837
Training: 2022-04-27 12:38:19,612-[agedb_30][108000]Accuracy-Highest: 0.98233
Training: 2022-04-27 12:38:22,632-Speed 73.12 samples/sec   Loss 0.6393   LearningRate 0.0003   Epoch: 18   Global Step: 108010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:38:25,652-Speed 3391.94 samples/sec   Loss 0.5140   LearningRate 0.0003   Epoch: 18   Global Step: 108020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:38:28,844-Speed 3209.31 samples/sec   Loss 0.5625   LearningRate 0.0003   Epoch: 18   Global Step: 108030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:38:42,033-Speed 776.43 samples/sec   Loss 0.4842   LearningRate 0.0002   Epoch: 19   Global Step: 108040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:38:45,064-Speed 3379.68 samples/sec   Loss 0.4742   LearningRate 0.0002   Epoch: 19   Global Step: 108050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:38:48,156-Speed 3312.93 samples/sec   Loss 0.5031   LearningRate 0.0002   Epoch: 19   Global Step: 108060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:38:51,258-Speed 3302.35 samples/sec   Loss 0.4644   LearningRate 0.0002   Epoch: 19   Global Step: 108070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:38:54,280-Speed 3388.67 samples/sec   Loss 0.5099   LearningRate 0.0002   Epoch: 19   Global Step: 108080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:38:57,307-Speed 3384.58 samples/sec   Loss 0.4506   LearningRate 0.0002   Epoch: 19   Global Step: 108090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:00,413-Speed 3297.41 samples/sec   Loss 0.4759   LearningRate 0.0002   Epoch: 19   Global Step: 108100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:03,448-Speed 3374.60 samples/sec   Loss 0.5270   LearningRate 0.0002   Epoch: 19   Global Step: 108110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:06,477-Speed 3382.09 samples/sec   Loss 0.4726   LearningRate 0.0002   Epoch: 19   Global Step: 108120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:09,493-Speed 3395.14 samples/sec   Loss 0.5547   LearningRate 0.0002   Epoch: 19   Global Step: 108130   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:39:12,509-Speed 3396.77 samples/sec   Loss 0.4903   LearningRate 0.0002   Epoch: 19   Global Step: 108140   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:39:15,523-Speed 3398.38 samples/sec   Loss 0.4811   LearningRate 0.0002   Epoch: 19   Global Step: 108150   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:39:18,542-Speed 3392.70 samples/sec   Loss 0.5179   LearningRate 0.0002   Epoch: 19   Global Step: 108160   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:39:21,558-Speed 3396.01 samples/sec   Loss 0.4855   LearningRate 0.0002   Epoch: 19   Global Step: 108170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:24,608-Speed 3357.80 samples/sec   Loss 0.5058   LearningRate 0.0002   Epoch: 19   Global Step: 108180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:27,652-Speed 3365.08 samples/sec   Loss 0.4820   LearningRate 0.0002   Epoch: 19   Global Step: 108190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:30,681-Speed 3381.36 samples/sec   Loss 0.5246   LearningRate 0.0002   Epoch: 19   Global Step: 108200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:33,720-Speed 3370.58 samples/sec   Loss 0.5415   LearningRate 0.0002   Epoch: 19   Global Step: 108210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:36,745-Speed 3386.18 samples/sec   Loss 0.4780   LearningRate 0.0002   Epoch: 19   Global Step: 108220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:39,779-Speed 3376.30 samples/sec   Loss 0.4737   LearningRate 0.0002   Epoch: 19   Global Step: 108230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:42,810-Speed 3379.65 samples/sec   Loss 0.4860   LearningRate 0.0002   Epoch: 19   Global Step: 108240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:45,870-Speed 3346.55 samples/sec   Loss 0.4707   LearningRate 0.0002   Epoch: 19   Global Step: 108250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:48,906-Speed 3374.11 samples/sec   Loss 0.5058   LearningRate 0.0002   Epoch: 19   Global Step: 108260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:51,915-Speed 3404.95 samples/sec   Loss 0.5341   LearningRate 0.0002   Epoch: 19   Global Step: 108270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:54,938-Speed 3388.51 samples/sec   Loss 0.5408   LearningRate 0.0002   Epoch: 19   Global Step: 108280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:39:57,961-Speed 3388.29 samples/sec   Loss 0.5431   LearningRate 0.0002   Epoch: 19   Global Step: 108290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:01,005-Speed 3364.72 samples/sec   Loss 0.4884   LearningRate 0.0002   Epoch: 19   Global Step: 108300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:04,051-Speed 3362.69 samples/sec   Loss 0.4828   LearningRate 0.0002   Epoch: 19   Global Step: 108310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:07,076-Speed 3384.97 samples/sec   Loss 0.5295   LearningRate 0.0002   Epoch: 19   Global Step: 108320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:10,106-Speed 3380.75 samples/sec   Loss 0.5060   LearningRate 0.0002   Epoch: 19   Global Step: 108330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:13,129-Speed 3388.49 samples/sec   Loss 0.3917   LearningRate 0.0002   Epoch: 19   Global Step: 108340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:16,197-Speed 3337.86 samples/sec   Loss 0.4942   LearningRate 0.0002   Epoch: 19   Global Step: 108350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:19,228-Speed 3380.54 samples/sec   Loss 0.5013   LearningRate 0.0002   Epoch: 19   Global Step: 108360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:22,250-Speed 3388.13 samples/sec   Loss 0.4746   LearningRate 0.0002   Epoch: 19   Global Step: 108370   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:40:25,252-Speed 3411.95 samples/sec   Loss 0.4412   LearningRate 0.0002   Epoch: 19   Global Step: 108380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:28,271-Speed 3392.57 samples/sec   Loss 0.4859   LearningRate 0.0002   Epoch: 19   Global Step: 108390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:31,289-Speed 3394.31 samples/sec   Loss 0.5219   LearningRate 0.0002   Epoch: 19   Global Step: 108400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:34,331-Speed 3367.00 samples/sec   Loss 0.4790   LearningRate 0.0002   Epoch: 19   Global Step: 108410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:37,362-Speed 3378.90 samples/sec   Loss 0.5384   LearningRate 0.0002   Epoch: 19   Global Step: 108420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:40,384-Speed 3389.41 samples/sec   Loss 0.5091   LearningRate 0.0002   Epoch: 19   Global Step: 108430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:43,402-Speed 3393.37 samples/sec   Loss 0.4665   LearningRate 0.0002   Epoch: 19   Global Step: 108440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:46,428-Speed 3385.34 samples/sec   Loss 0.4816   LearningRate 0.0002   Epoch: 19   Global Step: 108450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:49,446-Speed 3393.43 samples/sec   Loss 0.4269   LearningRate 0.0002   Epoch: 19   Global Step: 108460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:52,465-Speed 3392.96 samples/sec   Loss 0.5007   LearningRate 0.0002   Epoch: 19   Global Step: 108470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:55,469-Speed 3409.14 samples/sec   Loss 0.5360   LearningRate 0.0002   Epoch: 19   Global Step: 108480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:40:58,504-Speed 3375.15 samples/sec   Loss 0.4842   LearningRate 0.0002   Epoch: 19   Global Step: 108490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:41:01,539-Speed 3375.00 samples/sec   Loss 0.4946   LearningRate 0.0002   Epoch: 19   Global Step: 108500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:41:04,562-Speed 3387.88 samples/sec   Loss 0.5333   LearningRate 0.0002   Epoch: 19   Global Step: 108510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:41:07,590-Speed 3382.98 samples/sec   Loss 0.4684   LearningRate 0.0002   Epoch: 19   Global Step: 108520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:41:10,610-Speed 3392.23 samples/sec   Loss 0.4896   LearningRate 0.0002   Epoch: 19   Global Step: 108530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:41:13,628-Speed 3394.32 samples/sec   Loss 0.4735   LearningRate 0.0002   Epoch: 19   Global Step: 108540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:41:16,643-Speed 3396.28 samples/sec   Loss 0.4904   LearningRate 0.0002   Epoch: 19   Global Step: 108550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:41:19,661-Speed 3394.79 samples/sec   Loss 0.4798   LearningRate 0.0002   Epoch: 19   Global Step: 108560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:41:22,685-Speed 3386.47 samples/sec   Loss 0.4698   LearningRate 0.0002   Epoch: 19   Global Step: 108570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:41:25,799-Speed 3289.16 samples/sec   Loss 0.4807   LearningRate 0.0002   Epoch: 19   Global Step: 108580   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-27 12:41:28,800-Speed 3412.09 samples/sec   Loss 0.5164   LearningRate 0.0002   Epoch: 19   Global Step: 108590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:41:31,817-Speed 3396.01 samples/sec   Loss 0.5243   LearningRate 0.0002   Epoch: 19   Global Step: 108600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:41:34,835-Speed 3393.43 samples/sec   Loss 0.5058   LearningRate 0.0002   Epoch: 19   Global Step: 108610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 12:41:37,832-Speed 3417.06 samples/sec   Loss 0.4543   LearningRate 0.0002   Epoch: 19   Global Step: 108620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:41:40,858-Speed 3385.38 samples/sec   Loss 0.4887   LearningRate 0.0002   Epoch: 19   Global Step: 108630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:41:43,881-Speed 3388.60 samples/sec   Loss 0.5257   LearningRate 0.0002   Epoch: 19   Global Step: 108640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:41:46,904-Speed 3387.31 samples/sec   Loss 0.4827   LearningRate 0.0002   Epoch: 19   Global Step: 108650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:41:49,932-Speed 3383.50 samples/sec   Loss 0.5586   LearningRate 0.0002   Epoch: 19   Global Step: 108660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:41:52,946-Speed 3397.27 samples/sec   Loss 0.5379   LearningRate 0.0002   Epoch: 19   Global Step: 108670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:41:55,964-Speed 3393.82 samples/sec   Loss 0.5505   LearningRate 0.0002   Epoch: 19   Global Step: 108680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:41:58,980-Speed 3396.13 samples/sec   Loss 0.5053   LearningRate 0.0002   Epoch: 19   Global Step: 108690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:42:01,993-Speed 3400.23 samples/sec   Loss 0.5146   LearningRate 0.0002   Epoch: 19   Global Step: 108700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 12:42:05,009-Speed 3395.08 samples/sec   Loss 0.4984   LearningRate 0.0002   Epoch: 19   Global Step: 108710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 12:42:08,027-Speed 3394.37 samples/sec   Loss 0.4366   LearningRate 0.0002   Epoch: 19   Global Step: 108720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 12:42:11,075-Speed 3359.67 samples/sec   Loss 0.4365   LearningRate 0.0002   Epoch: 19   Global Step: 108730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 12:42:14,101-Speed 3385.08 samples/sec   Loss 0.4985   LearningRate 0.0002   Epoch: 19   Global Step: 108740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 12:42:17,135-Speed 3376.16 samples/sec   Loss 0.4982   LearningRate 0.0002   Epoch: 19   Global Step: 108750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 12:42:20,163-Speed 3382.44 samples/sec   Loss 0.5130   LearningRate 0.0002   Epoch: 19   Global Step: 108760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 12:42:23,181-Speed 3393.89 samples/sec   Loss 0.4593   LearningRate 0.0002   Epoch: 19   Global Step: 108770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 12:42:26,199-Speed 3393.49 samples/sec   Loss 0.5629   LearningRate 0.0002   Epoch: 19   Global Step: 108780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 12:42:29,222-Speed 3388.50 samples/sec   Loss 0.4796   LearningRate 0.0002   Epoch: 19   Global Step: 108790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 12:42:32,241-Speed 3392.62 samples/sec   Loss 0.4621   LearningRate 0.0002   Epoch: 19   Global Step: 108800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:42:35,267-Speed 3385.24 samples/sec   Loss 0.5100   LearningRate 0.0002   Epoch: 19   Global Step: 108810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:42:38,290-Speed 3388.36 samples/sec   Loss 0.4740   LearningRate 0.0002   Epoch: 19   Global Step: 108820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:42:41,310-Speed 3391.51 samples/sec   Loss 0.5060   LearningRate 0.0002   Epoch: 19   Global Step: 108830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:42:44,344-Speed 3376.01 samples/sec   Loss 0.5251   LearningRate 0.0002   Epoch: 19   Global Step: 108840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:42:47,365-Speed 3390.65 samples/sec   Loss 0.5171   LearningRate 0.0002   Epoch: 19   Global Step: 108850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:42:50,382-Speed 3394.47 samples/sec   Loss 0.4384   LearningRate 0.0002   Epoch: 19   Global Step: 108860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:42:53,459-Speed 3328.83 samples/sec   Loss 0.4540   LearningRate 0.0002   Epoch: 19   Global Step: 108870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 12:42:56,511-Speed 3356.46 samples/sec   Loss 0.5274   LearningRate 0.0002   Epoch: 19   Global Step: 108880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:42:59,569-Speed 3349.24 samples/sec   Loss 0.5171   LearningRate 0.0002   Epoch: 19   Global Step: 108890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:43:02,609-Speed 3369.33 samples/sec   Loss 0.5589   LearningRate 0.0002   Epoch: 19   Global Step: 108900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:43:05,611-Speed 3411.67 samples/sec   Loss 0.5649   LearningRate 0.0002   Epoch: 19   Global Step: 108910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:43:08,642-Speed 3378.79 samples/sec   Loss 0.5174   LearningRate 0.0002   Epoch: 19   Global Step: 108920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:43:11,660-Speed 3393.35 samples/sec   Loss 0.5303   LearningRate 0.0002   Epoch: 19   Global Step: 108930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:43:14,682-Speed 3389.75 samples/sec   Loss 0.5200   LearningRate 0.0002   Epoch: 19   Global Step: 108940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:43:17,703-Speed 3390.15 samples/sec   Loss 0.5112   LearningRate 0.0002   Epoch: 19   Global Step: 108950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:43:20,725-Speed 3389.78 samples/sec   Loss 0.5703   LearningRate 0.0002   Epoch: 19   Global Step: 108960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:43:23,752-Speed 3383.98 samples/sec   Loss 0.5713   LearningRate 0.0002   Epoch: 19   Global Step: 108970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:43:26,776-Speed 3387.31 samples/sec   Loss 0.5036   LearningRate 0.0002   Epoch: 19   Global Step: 108980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:43:29,793-Speed 3394.55 samples/sec   Loss 0.4988   LearningRate 0.0002   Epoch: 19   Global Step: 108990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:43:32,819-Speed 3385.60 samples/sec   Loss 0.5347   LearningRate 0.0002   Epoch: 19   Global Step: 109000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:43:35,842-Speed 3388.03 samples/sec   Loss 0.4790   LearningRate 0.0002   Epoch: 19   Global Step: 109010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:43:38,868-Speed 3384.40 samples/sec   Loss 0.5218   LearningRate 0.0002   Epoch: 19   Global Step: 109020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:43:41,885-Speed 3394.10 samples/sec   Loss 0.5220   LearningRate 0.0002   Epoch: 19   Global Step: 109030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:43:44,917-Speed 3388.04 samples/sec   Loss 0.4875   LearningRate 0.0002   Epoch: 19   Global Step: 109040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:43:47,940-Speed 3388.45 samples/sec   Loss 0.4447   LearningRate 0.0002   Epoch: 19   Global Step: 109050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:43:50,962-Speed 3388.75 samples/sec   Loss 0.4852   LearningRate 0.0002   Epoch: 19   Global Step: 109060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:43:53,986-Speed 3386.63 samples/sec   Loss 0.4731   LearningRate 0.0002   Epoch: 19   Global Step: 109070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:43:57,007-Speed 3390.32 samples/sec   Loss 0.5853   LearningRate 0.0002   Epoch: 19   Global Step: 109080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:44:00,039-Speed 3377.95 samples/sec   Loss 0.5245   LearningRate 0.0002   Epoch: 19   Global Step: 109090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:44:03,060-Speed 3390.52 samples/sec   Loss 0.4868   LearningRate 0.0002   Epoch: 19   Global Step: 109100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:44:06,055-Speed 3420.30 samples/sec   Loss 0.4519   LearningRate 0.0002   Epoch: 19   Global Step: 109110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:44:09,075-Speed 3391.71 samples/sec   Loss 0.5110   LearningRate 0.0002   Epoch: 19   Global Step: 109120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:44:12,142-Speed 3339.01 samples/sec   Loss 0.5528   LearningRate 0.0002   Epoch: 19   Global Step: 109130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:44:15,225-Speed 3322.67 samples/sec   Loss 0.4937   LearningRate 0.0002   Epoch: 19   Global Step: 109140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:44:18,249-Speed 3387.13 samples/sec   Loss 0.4596   LearningRate 0.0002   Epoch: 19   Global Step: 109150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:44:21,256-Speed 3406.41 samples/sec   Loss 0.5127   LearningRate 0.0002   Epoch: 19   Global Step: 109160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:44:24,276-Speed 3390.70 samples/sec   Loss 0.4595   LearningRate 0.0002   Epoch: 19   Global Step: 109170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:44:27,299-Speed 3387.99 samples/sec   Loss 0.5248   LearningRate 0.0002   Epoch: 19   Global Step: 109180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:44:30,324-Speed 3386.09 samples/sec   Loss 0.4646   LearningRate 0.0002   Epoch: 19   Global Step: 109190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:44:33,344-Speed 3391.89 samples/sec   Loss 0.4434   LearningRate 0.0002   Epoch: 19   Global Step: 109200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:44:36,429-Speed 3319.36 samples/sec   Loss 0.4647   LearningRate 0.0002   Epoch: 19   Global Step: 109210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:44:39,500-Speed 3336.06 samples/sec   Loss 0.5236   LearningRate 0.0002   Epoch: 19   Global Step: 109220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:44:42,519-Speed 3392.68 samples/sec   Loss 0.5332   LearningRate 0.0002   Epoch: 19   Global Step: 109230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:44:45,558-Speed 3371.33 samples/sec   Loss 0.5206   LearningRate 0.0002   Epoch: 19   Global Step: 109240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:44:48,585-Speed 3384.74 samples/sec   Loss 0.4524   LearningRate 0.0002   Epoch: 19   Global Step: 109250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:44:51,593-Speed 3405.17 samples/sec   Loss 0.4540   LearningRate 0.0002   Epoch: 19   Global Step: 109260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:44:54,614-Speed 3390.65 samples/sec   Loss 0.4303   LearningRate 0.0002   Epoch: 19   Global Step: 109270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:44:57,735-Speed 3282.90 samples/sec   Loss 0.5467   LearningRate 0.0002   Epoch: 19   Global Step: 109280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:45:00,760-Speed 3386.23 samples/sec   Loss 0.4858   LearningRate 0.0002   Epoch: 19   Global Step: 109290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:45:03,824-Speed 3343.10 samples/sec   Loss 0.4715   LearningRate 0.0002   Epoch: 19   Global Step: 109300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:45:06,848-Speed 3386.12 samples/sec   Loss 0.4666   LearningRate 0.0002   Epoch: 19   Global Step: 109310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:45:09,872-Speed 3387.04 samples/sec   Loss 0.4950   LearningRate 0.0001   Epoch: 19   Global Step: 109320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:45:12,901-Speed 3383.80 samples/sec   Loss 0.4503   LearningRate 0.0001   Epoch: 19   Global Step: 109330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:45:15,923-Speed 3390.18 samples/sec   Loss 0.5369   LearningRate 0.0001   Epoch: 19   Global Step: 109340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:45:18,949-Speed 3384.56 samples/sec   Loss 0.5380   LearningRate 0.0001   Epoch: 19   Global Step: 109350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:45:21,976-Speed 3382.74 samples/sec   Loss 0.4940   LearningRate 0.0001   Epoch: 19   Global Step: 109360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:45:25,003-Speed 3384.47 samples/sec   Loss 0.5292   LearningRate 0.0001   Epoch: 19   Global Step: 109370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:45:28,034-Speed 3379.06 samples/sec   Loss 0.5454   LearningRate 0.0001   Epoch: 19   Global Step: 109380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:45:31,051-Speed 3394.55 samples/sec   Loss 0.4868   LearningRate 0.0001   Epoch: 19   Global Step: 109390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:45:34,079-Speed 3382.77 samples/sec   Loss 0.5200   LearningRate 0.0001   Epoch: 19   Global Step: 109400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:45:37,100-Speed 3390.93 samples/sec   Loss 0.5105   LearningRate 0.0001   Epoch: 19   Global Step: 109410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:45:40,119-Speed 3392.30 samples/sec   Loss 0.4729   LearningRate 0.0001   Epoch: 19   Global Step: 109420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:45:43,144-Speed 3386.33 samples/sec   Loss 0.4492   LearningRate 0.0001   Epoch: 19   Global Step: 109430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:45:46,166-Speed 3389.34 samples/sec   Loss 0.5427   LearningRate 0.0001   Epoch: 19   Global Step: 109440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:45:49,191-Speed 3385.86 samples/sec   Loss 0.4763   LearningRate 0.0001   Epoch: 19   Global Step: 109450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:45:52,196-Speed 3407.92 samples/sec   Loss 0.5077   LearningRate 0.0001   Epoch: 19   Global Step: 109460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:45:55,222-Speed 3384.71 samples/sec   Loss 0.5060   LearningRate 0.0001   Epoch: 19   Global Step: 109470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:45:58,241-Speed 3393.07 samples/sec   Loss 0.5214   LearningRate 0.0001   Epoch: 19   Global Step: 109480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:46:01,261-Speed 3391.85 samples/sec   Loss 0.4956   LearningRate 0.0001   Epoch: 19   Global Step: 109490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:46:04,269-Speed 3404.48 samples/sec   Loss 0.4741   LearningRate 0.0001   Epoch: 19   Global Step: 109500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:46:07,290-Speed 3391.51 samples/sec   Loss 0.4822   LearningRate 0.0001   Epoch: 19   Global Step: 109510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:46:10,336-Speed 3361.68 samples/sec   Loss 0.4802   LearningRate 0.0001   Epoch: 19   Global Step: 109520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:46:13,390-Speed 3354.58 samples/sec   Loss 0.4726   LearningRate 0.0001   Epoch: 19   Global Step: 109530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:46:16,422-Speed 3377.65 samples/sec   Loss 0.4852   LearningRate 0.0001   Epoch: 19   Global Step: 109540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:46:19,473-Speed 3357.09 samples/sec   Loss 0.4646   LearningRate 0.0001   Epoch: 19   Global Step: 109550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:46:22,509-Speed 3373.26 samples/sec   Loss 0.4884   LearningRate 0.0001   Epoch: 19   Global Step: 109560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:46:25,532-Speed 3387.74 samples/sec   Loss 0.4781   LearningRate 0.0001   Epoch: 19   Global Step: 109570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:46:28,557-Speed 3386.50 samples/sec   Loss 0.5155   LearningRate 0.0001   Epoch: 19   Global Step: 109580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:46:31,583-Speed 3384.88 samples/sec   Loss 0.5366   LearningRate 0.0001   Epoch: 19   Global Step: 109590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:46:34,609-Speed 3384.93 samples/sec   Loss 0.4590   LearningRate 0.0001   Epoch: 19   Global Step: 109600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:46:37,674-Speed 3341.47 samples/sec   Loss 0.5146   LearningRate 0.0001   Epoch: 19   Global Step: 109610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:46:40,713-Speed 3370.43 samples/sec   Loss 0.4973   LearningRate 0.0001   Epoch: 19   Global Step: 109620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:46:43,745-Speed 3378.88 samples/sec   Loss 0.4979   LearningRate 0.0001   Epoch: 19   Global Step: 109630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:46:46,769-Speed 3386.76 samples/sec   Loss 0.4895   LearningRate 0.0001   Epoch: 19   Global Step: 109640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:46:49,793-Speed 3386.61 samples/sec   Loss 0.4858   LearningRate 0.0001   Epoch: 19   Global Step: 109650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:46:52,813-Speed 3391.95 samples/sec   Loss 0.4718   LearningRate 0.0001   Epoch: 19   Global Step: 109660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:46:55,837-Speed 3386.93 samples/sec   Loss 0.5548   LearningRate 0.0001   Epoch: 19   Global Step: 109670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:46:58,867-Speed 3380.11 samples/sec   Loss 0.4902   LearningRate 0.0001   Epoch: 19   Global Step: 109680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:47:01,902-Speed 3374.81 samples/sec   Loss 0.4550   LearningRate 0.0001   Epoch: 19   Global Step: 109690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:47:04,962-Speed 3347.98 samples/sec   Loss 0.5120   LearningRate 0.0001   Epoch: 19   Global Step: 109700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:47:07,984-Speed 3388.94 samples/sec   Loss 0.5086   LearningRate 0.0001   Epoch: 19   Global Step: 109710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:47:11,010-Speed 3384.62 samples/sec   Loss 0.4855   LearningRate 0.0001   Epoch: 19   Global Step: 109720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:47:14,012-Speed 3412.12 samples/sec   Loss 0.5396   LearningRate 0.0001   Epoch: 19   Global Step: 109730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:47:17,038-Speed 3384.52 samples/sec   Loss 0.5630   LearningRate 0.0001   Epoch: 19   Global Step: 109740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:47:20,074-Speed 3374.11 samples/sec   Loss 0.4685   LearningRate 0.0001   Epoch: 19   Global Step: 109750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:47:23,106-Speed 3377.39 samples/sec   Loss 0.5152   LearningRate 0.0001   Epoch: 19   Global Step: 109760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:47:26,129-Speed 3388.15 samples/sec   Loss 0.5313   LearningRate 0.0001   Epoch: 19   Global Step: 109770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:47:29,156-Speed 3383.83 samples/sec   Loss 0.4748   LearningRate 0.0001   Epoch: 19   Global Step: 109780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:47:32,208-Speed 3357.09 samples/sec   Loss 0.4694   LearningRate 0.0001   Epoch: 19   Global Step: 109790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:47:35,233-Speed 3385.52 samples/sec   Loss 0.5061   LearningRate 0.0001   Epoch: 19   Global Step: 109800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:47:38,252-Speed 3392.61 samples/sec   Loss 0.4788   LearningRate 0.0001   Epoch: 19   Global Step: 109810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:47:41,328-Speed 3328.97 samples/sec   Loss 0.4560   LearningRate 0.0001   Epoch: 19   Global Step: 109820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:47:44,350-Speed 3389.86 samples/sec   Loss 0.4990   LearningRate 0.0001   Epoch: 19   Global Step: 109830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:47:47,372-Speed 3389.41 samples/sec   Loss 0.4680   LearningRate 0.0001   Epoch: 19   Global Step: 109840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:47:50,400-Speed 3382.21 samples/sec   Loss 0.5500   LearningRate 0.0001   Epoch: 19   Global Step: 109850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:47:53,475-Speed 3331.21 samples/sec   Loss 0.5249   LearningRate 0.0001   Epoch: 19   Global Step: 109860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:47:56,498-Speed 3388.66 samples/sec   Loss 0.4936   LearningRate 0.0001   Epoch: 19   Global Step: 109870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:47:59,521-Speed 3387.44 samples/sec   Loss 0.4559   LearningRate 0.0001   Epoch: 19   Global Step: 109880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:48:02,544-Speed 3388.94 samples/sec   Loss 0.5043   LearningRate 0.0001   Epoch: 19   Global Step: 109890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:48:05,566-Speed 3388.70 samples/sec   Loss 0.4812   LearningRate 0.0001   Epoch: 19   Global Step: 109900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:48:08,585-Speed 3392.47 samples/sec   Loss 0.4971   LearningRate 0.0001   Epoch: 19   Global Step: 109910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:48:11,609-Speed 3386.78 samples/sec   Loss 0.4502   LearningRate 0.0001   Epoch: 19   Global Step: 109920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:48:14,659-Speed 3358.71 samples/sec   Loss 0.5171   LearningRate 0.0001   Epoch: 19   Global Step: 109930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:48:17,684-Speed 3385.23 samples/sec   Loss 0.5336   LearningRate 0.0001   Epoch: 19   Global Step: 109940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:48:20,709-Speed 3386.48 samples/sec   Loss 0.5000   LearningRate 0.0001   Epoch: 19   Global Step: 109950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:48:23,732-Speed 3387.87 samples/sec   Loss 0.4773   LearningRate 0.0001   Epoch: 19   Global Step: 109960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:48:26,759-Speed 3384.61 samples/sec   Loss 0.4959   LearningRate 0.0001   Epoch: 19   Global Step: 109970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:48:29,785-Speed 3384.64 samples/sec   Loss 0.5141   LearningRate 0.0001   Epoch: 19   Global Step: 109980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:48:32,795-Speed 3402.82 samples/sec   Loss 0.5339   LearningRate 0.0001   Epoch: 19   Global Step: 109990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:48:35,872-Speed 3328.58 samples/sec   Loss 0.5863   LearningRate 0.0001   Epoch: 19   Global Step: 110000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:49:19,167-[lfw][110000]XNorm: 21.717442
Training: 2022-04-27 12:49:19,168-[lfw][110000]Accuracy-Flip: 0.99750+-0.00281
Training: 2022-04-27 12:49:19,168-[lfw][110000]Accuracy-Highest: 0.99817
Training: 2022-04-27 12:50:09,451-[cfp_fp][110000]XNorm: 22.112701
Training: 2022-04-27 12:50:09,452-[cfp_fp][110000]Accuracy-Flip: 0.98629+-0.00483
Training: 2022-04-27 12:50:09,452-[cfp_fp][110000]Accuracy-Highest: 0.98629
Training: 2022-04-27 12:50:52,847-[agedb_30][110000]XNorm: 22.248529
Training: 2022-04-27 12:50:52,848-[agedb_30][110000]Accuracy-Flip: 0.98250+-0.00786
Training: 2022-04-27 12:50:52,848-[agedb_30][110000]Accuracy-Highest: 0.98250
Training: 2022-04-27 12:50:55,859-Speed 73.15 samples/sec   Loss 0.4784   LearningRate 0.0001   Epoch: 19   Global Step: 110010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:50:58,915-Speed 3351.91 samples/sec   Loss 0.4128   LearningRate 0.0001   Epoch: 19   Global Step: 110020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:51:01,931-Speed 3395.84 samples/sec   Loss 0.4833   LearningRate 0.0001   Epoch: 19   Global Step: 110030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:51:04,957-Speed 3385.47 samples/sec   Loss 0.5313   LearningRate 0.0001   Epoch: 19   Global Step: 110040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:51:07,967-Speed 3402.10 samples/sec   Loss 0.5218   LearningRate 0.0001   Epoch: 19   Global Step: 110050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:51:10,987-Speed 3391.91 samples/sec   Loss 0.4865   LearningRate 0.0001   Epoch: 19   Global Step: 110060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:51:13,998-Speed 3401.01 samples/sec   Loss 0.4624   LearningRate 0.0001   Epoch: 19   Global Step: 110070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:51:17,020-Speed 3389.18 samples/sec   Loss 0.5127   LearningRate 0.0001   Epoch: 19   Global Step: 110080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:51:20,034-Speed 3398.67 samples/sec   Loss 0.5065   LearningRate 0.0001   Epoch: 19   Global Step: 110090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:51:23,075-Speed 3368.49 samples/sec   Loss 0.5087   LearningRate 0.0001   Epoch: 19   Global Step: 110100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:51:26,138-Speed 3343.75 samples/sec   Loss 0.5029   LearningRate 0.0001   Epoch: 19   Global Step: 110110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:51:29,150-Speed 3400.41 samples/sec   Loss 0.4988   LearningRate 0.0001   Epoch: 19   Global Step: 110120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:51:32,164-Speed 3398.52 samples/sec   Loss 0.5102   LearningRate 0.0001   Epoch: 19   Global Step: 110130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:51:35,159-Speed 3420.15 samples/sec   Loss 0.4808   LearningRate 0.0001   Epoch: 19   Global Step: 110140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:51:38,161-Speed 3411.05 samples/sec   Loss 0.4585   LearningRate 0.0001   Epoch: 19   Global Step: 110150   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 12:51:41,188-Speed 3384.25 samples/sec   Loss 0.4695   LearningRate 0.0001   Epoch: 19   Global Step: 110160   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 12:51:44,202-Speed 3397.85 samples/sec   Loss 0.4780   LearningRate 0.0001   Epoch: 19   Global Step: 110170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 12:51:47,236-Speed 3377.46 samples/sec   Loss 0.4889   LearningRate 0.0001   Epoch: 19   Global Step: 110180   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 12:51:50,261-Speed 3384.91 samples/sec   Loss 0.4613   LearningRate 0.0001   Epoch: 19   Global Step: 110190   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 12:51:53,280-Speed 3392.52 samples/sec   Loss 0.4618   LearningRate 0.0001   Epoch: 19   Global Step: 110200   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 12:51:56,292-Speed 3400.47 samples/sec   Loss 0.5747   LearningRate 0.0001   Epoch: 19   Global Step: 110210   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 12:51:59,312-Speed 3392.03 samples/sec   Loss 0.4793   LearningRate 0.0001   Epoch: 19   Global Step: 110220   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 12:52:02,328-Speed 3395.92 samples/sec   Loss 0.4823   LearningRate 0.0001   Epoch: 19   Global Step: 110230   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 12:52:05,345-Speed 3395.31 samples/sec   Loss 0.4783   LearningRate 0.0001   Epoch: 19   Global Step: 110240   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 12:52:08,374-Speed 3381.52 samples/sec   Loss 0.5310   LearningRate 0.0001   Epoch: 19   Global Step: 110250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:52:11,393-Speed 3392.69 samples/sec   Loss 0.4883   LearningRate 0.0001   Epoch: 19   Global Step: 110260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:52:14,421-Speed 3383.02 samples/sec   Loss 0.4603   LearningRate 0.0001   Epoch: 19   Global Step: 110270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:52:17,443-Speed 3389.15 samples/sec   Loss 0.4850   LearningRate 0.0001   Epoch: 19   Global Step: 110280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:52:20,463-Speed 3391.21 samples/sec   Loss 0.4969   LearningRate 0.0001   Epoch: 19   Global Step: 110290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:52:23,480-Speed 3395.02 samples/sec   Loss 0.5141   LearningRate 0.0001   Epoch: 19   Global Step: 110300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:52:26,496-Speed 3395.83 samples/sec   Loss 0.5374   LearningRate 0.0001   Epoch: 19   Global Step: 110310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:52:29,510-Speed 3398.24 samples/sec   Loss 0.4371   LearningRate 0.0001   Epoch: 19   Global Step: 110320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:52:32,530-Speed 3391.32 samples/sec   Loss 0.4708   LearningRate 0.0001   Epoch: 19   Global Step: 110330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:52:35,574-Speed 3364.83 samples/sec   Loss 0.5141   LearningRate 0.0001   Epoch: 19   Global Step: 110340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:52:38,609-Speed 3374.72 samples/sec   Loss 0.4971   LearningRate 0.0001   Epoch: 19   Global Step: 110350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:52:41,648-Speed 3370.73 samples/sec   Loss 0.4949   LearningRate 0.0001   Epoch: 19   Global Step: 110360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:52:44,664-Speed 3396.04 samples/sec   Loss 0.5710   LearningRate 0.0001   Epoch: 19   Global Step: 110370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:52:47,679-Speed 3397.59 samples/sec   Loss 0.4961   LearningRate 0.0001   Epoch: 19   Global Step: 110380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:52:50,692-Speed 3398.82 samples/sec   Loss 0.5692   LearningRate 0.0001   Epoch: 19   Global Step: 110390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:52:53,713-Speed 3390.55 samples/sec   Loss 0.4760   LearningRate 0.0001   Epoch: 19   Global Step: 110400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:52:56,729-Speed 3396.51 samples/sec   Loss 0.5217   LearningRate 0.0001   Epoch: 19   Global Step: 110410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:52:59,757-Speed 3381.94 samples/sec   Loss 0.5428   LearningRate 0.0001   Epoch: 19   Global Step: 110420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:02,778-Speed 3390.13 samples/sec   Loss 0.5053   LearningRate 0.0001   Epoch: 19   Global Step: 110430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:05,799-Speed 3390.80 samples/sec   Loss 0.4802   LearningRate 0.0001   Epoch: 19   Global Step: 110440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:08,812-Speed 3400.08 samples/sec   Loss 0.5224   LearningRate 0.0001   Epoch: 19   Global Step: 110450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:11,826-Speed 3397.79 samples/sec   Loss 0.5234   LearningRate 0.0001   Epoch: 19   Global Step: 110460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:14,836-Speed 3403.84 samples/sec   Loss 0.5488   LearningRate 0.0001   Epoch: 19   Global Step: 110470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:17,850-Speed 3398.33 samples/sec   Loss 0.4670   LearningRate 0.0001   Epoch: 19   Global Step: 110480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:20,866-Speed 3396.13 samples/sec   Loss 0.4612   LearningRate 0.0001   Epoch: 19   Global Step: 110490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:23,882-Speed 3395.65 samples/sec   Loss 0.4754   LearningRate 0.0001   Epoch: 19   Global Step: 110500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:26,896-Speed 3398.36 samples/sec   Loss 0.4549   LearningRate 0.0001   Epoch: 19   Global Step: 110510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:29,931-Speed 3374.98 samples/sec   Loss 0.4651   LearningRate 0.0001   Epoch: 19   Global Step: 110520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:32,945-Speed 3397.95 samples/sec   Loss 0.5477   LearningRate 0.0001   Epoch: 19   Global Step: 110530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:35,960-Speed 3397.82 samples/sec   Loss 0.4929   LearningRate 0.0001   Epoch: 19   Global Step: 110540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:38,985-Speed 3386.06 samples/sec   Loss 0.5105   LearningRate 0.0001   Epoch: 19   Global Step: 110550   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 12:53:41,999-Speed 3398.10 samples/sec   Loss 0.4809   LearningRate 0.0001   Epoch: 19   Global Step: 110560   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 12:53:45,012-Speed 3399.50 samples/sec   Loss 0.5257   LearningRate 0.0001   Epoch: 19   Global Step: 110570   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 12:53:48,057-Speed 3363.15 samples/sec   Loss 0.5260   LearningRate 0.0001   Epoch: 19   Global Step: 110580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:51,071-Speed 3398.14 samples/sec   Loss 0.4650   LearningRate 0.0001   Epoch: 19   Global Step: 110590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:54,088-Speed 3394.87 samples/sec   Loss 0.5345   LearningRate 0.0001   Epoch: 19   Global Step: 110600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:53:57,102-Speed 3398.39 samples/sec   Loss 0.4980   LearningRate 0.0001   Epoch: 19   Global Step: 110610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:54:00,103-Speed 3413.45 samples/sec   Loss 0.5402   LearningRate 0.0001   Epoch: 19   Global Step: 110620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:54:03,117-Speed 3397.44 samples/sec   Loss 0.5325   LearningRate 0.0001   Epoch: 19   Global Step: 110630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:54:06,135-Speed 3394.69 samples/sec   Loss 0.4897   LearningRate 0.0001   Epoch: 19   Global Step: 110640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:54:09,150-Speed 3396.93 samples/sec   Loss 0.5026   LearningRate 0.0001   Epoch: 19   Global Step: 110650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:54:12,169-Speed 3392.88 samples/sec   Loss 0.5302   LearningRate 0.0001   Epoch: 19   Global Step: 110660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:54:15,217-Speed 3360.24 samples/sec   Loss 0.4980   LearningRate 0.0001   Epoch: 19   Global Step: 110670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:54:18,236-Speed 3392.13 samples/sec   Loss 0.4609   LearningRate 0.0001   Epoch: 19   Global Step: 110680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:54:21,252-Speed 3396.30 samples/sec   Loss 0.4957   LearningRate 0.0001   Epoch: 19   Global Step: 110690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:54:24,295-Speed 3366.22 samples/sec   Loss 0.4855   LearningRate 0.0001   Epoch: 19   Global Step: 110700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:54:27,313-Speed 3393.27 samples/sec   Loss 0.4571   LearningRate 0.0001   Epoch: 19   Global Step: 110710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:54:30,341-Speed 3383.38 samples/sec   Loss 0.5657   LearningRate 0.0001   Epoch: 19   Global Step: 110720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:54:33,371-Speed 3379.68 samples/sec   Loss 0.4373   LearningRate 0.0001   Epoch: 19   Global Step: 110730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:54:36,404-Speed 3377.01 samples/sec   Loss 0.5074   LearningRate 0.0001   Epoch: 19   Global Step: 110740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:54:39,433-Speed 3381.58 samples/sec   Loss 0.4946   LearningRate 0.0001   Epoch: 19   Global Step: 110750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:54:42,462-Speed 3382.13 samples/sec   Loss 0.5168   LearningRate 0.0001   Epoch: 19   Global Step: 110760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:54:45,483-Speed 3389.53 samples/sec   Loss 0.4914   LearningRate 0.0001   Epoch: 19   Global Step: 110770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:54:48,501-Speed 3393.89 samples/sec   Loss 0.4733   LearningRate 0.0001   Epoch: 19   Global Step: 110780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:54:51,530-Speed 3381.31 samples/sec   Loss 0.5003   LearningRate 0.0001   Epoch: 19   Global Step: 110790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:54:54,560-Speed 3380.52 samples/sec   Loss 0.5239   LearningRate 0.0001   Epoch: 19   Global Step: 110800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:54:57,576-Speed 3396.46 samples/sec   Loss 0.5012   LearningRate 0.0001   Epoch: 19   Global Step: 110810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:00,624-Speed 3360.25 samples/sec   Loss 0.4875   LearningRate 0.0001   Epoch: 19   Global Step: 110820   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 12:55:03,750-Speed 3276.98 samples/sec   Loss 0.4907   LearningRate 0.0001   Epoch: 19   Global Step: 110830   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 12:55:06,757-Speed 3406.56 samples/sec   Loss 0.5084   LearningRate 0.0001   Epoch: 19   Global Step: 110840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:09,780-Speed 3388.20 samples/sec   Loss 0.5614   LearningRate 0.0001   Epoch: 19   Global Step: 110850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:12,799-Speed 3392.13 samples/sec   Loss 0.5098   LearningRate 0.0001   Epoch: 19   Global Step: 110860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:15,820-Speed 3389.85 samples/sec   Loss 0.4830   LearningRate 0.0001   Epoch: 19   Global Step: 110870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:18,840-Speed 3392.34 samples/sec   Loss 0.4930   LearningRate 0.0001   Epoch: 19   Global Step: 110880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:21,880-Speed 3368.89 samples/sec   Loss 0.4938   LearningRate 0.0001   Epoch: 19   Global Step: 110890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:24,918-Speed 3371.72 samples/sec   Loss 0.5408   LearningRate 0.0001   Epoch: 19   Global Step: 110900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:27,945-Speed 3383.09 samples/sec   Loss 0.5151   LearningRate 0.0001   Epoch: 19   Global Step: 110910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:30,971-Speed 3385.16 samples/sec   Loss 0.5152   LearningRate 0.0001   Epoch: 19   Global Step: 110920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:33,990-Speed 3392.14 samples/sec   Loss 0.5343   LearningRate 0.0001   Epoch: 19   Global Step: 110930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:37,012-Speed 3389.57 samples/sec   Loss 0.4520   LearningRate 0.0001   Epoch: 19   Global Step: 110940   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 12:55:40,017-Speed 3408.15 samples/sec   Loss 0.5144   LearningRate 0.0001   Epoch: 19   Global Step: 110950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:43,040-Speed 3388.50 samples/sec   Loss 0.4705   LearningRate 0.0001   Epoch: 19   Global Step: 110960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:46,088-Speed 3360.39 samples/sec   Loss 0.5399   LearningRate 0.0001   Epoch: 19   Global Step: 110970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:49,192-Speed 3299.59 samples/sec   Loss 0.5129   LearningRate 0.0001   Epoch: 19   Global Step: 110980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:52,213-Speed 3391.10 samples/sec   Loss 0.4904   LearningRate 0.0001   Epoch: 19   Global Step: 110990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:55,236-Speed 3388.58 samples/sec   Loss 0.5398   LearningRate 0.0001   Epoch: 19   Global Step: 111000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:55:58,260-Speed 3387.61 samples/sec   Loss 0.5182   LearningRate 0.0001   Epoch: 19   Global Step: 111010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:01,278-Speed 3392.90 samples/sec   Loss 0.4846   LearningRate 0.0001   Epoch: 19   Global Step: 111020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:04,301-Speed 3388.99 samples/sec   Loss 0.4613   LearningRate 0.0001   Epoch: 19   Global Step: 111030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:07,317-Speed 3395.41 samples/sec   Loss 0.5202   LearningRate 0.0001   Epoch: 19   Global Step: 111040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:10,321-Speed 3409.46 samples/sec   Loss 0.5069   LearningRate 0.0001   Epoch: 19   Global Step: 111050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:13,344-Speed 3388.64 samples/sec   Loss 0.5014   LearningRate 0.0001   Epoch: 19   Global Step: 111060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:16,394-Speed 3358.13 samples/sec   Loss 0.5301   LearningRate 0.0001   Epoch: 19   Global Step: 111070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:19,412-Speed 3393.51 samples/sec   Loss 0.5574   LearningRate 0.0001   Epoch: 19   Global Step: 111080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:22,436-Speed 3386.97 samples/sec   Loss 0.4919   LearningRate 0.0001   Epoch: 19   Global Step: 111090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:25,456-Speed 3391.90 samples/sec   Loss 0.5441   LearningRate 0.0001   Epoch: 19   Global Step: 111100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:28,477-Speed 3390.02 samples/sec   Loss 0.4871   LearningRate 0.0001   Epoch: 19   Global Step: 111110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:31,496-Speed 3393.06 samples/sec   Loss 0.5027   LearningRate 0.0001   Epoch: 19   Global Step: 111120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:34,513-Speed 3394.48 samples/sec   Loss 0.4739   LearningRate 0.0001   Epoch: 19   Global Step: 111130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:37,556-Speed 3365.80 samples/sec   Loss 0.4191   LearningRate 0.0001   Epoch: 19   Global Step: 111140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:40,601-Speed 3364.25 samples/sec   Loss 0.4679   LearningRate 0.0001   Epoch: 19   Global Step: 111150   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 12:56:43,604-Speed 3410.06 samples/sec   Loss 0.5062   LearningRate 0.0001   Epoch: 19   Global Step: 111160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:46,630-Speed 3385.49 samples/sec   Loss 0.4907   LearningRate 0.0001   Epoch: 19   Global Step: 111170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:49,696-Speed 3340.02 samples/sec   Loss 0.4902   LearningRate 0.0000   Epoch: 19   Global Step: 111180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:52,725-Speed 3382.55 samples/sec   Loss 0.5534   LearningRate 0.0000   Epoch: 19   Global Step: 111190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:55,761-Speed 3373.15 samples/sec   Loss 0.5144   LearningRate 0.0000   Epoch: 19   Global Step: 111200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:56:58,780-Speed 3391.88 samples/sec   Loss 0.5698   LearningRate 0.0000   Epoch: 19   Global Step: 111210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:57:01,836-Speed 3352.41 samples/sec   Loss 0.5045   LearningRate 0.0000   Epoch: 19   Global Step: 111220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:57:04,896-Speed 3347.14 samples/sec   Loss 0.4953   LearningRate 0.0000   Epoch: 19   Global Step: 111230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:57:07,912-Speed 3395.23 samples/sec   Loss 0.5246   LearningRate 0.0000   Epoch: 19   Global Step: 111240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:57:10,914-Speed 3412.60 samples/sec   Loss 0.4033   LearningRate 0.0000   Epoch: 19   Global Step: 111250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:57:13,931-Speed 3394.37 samples/sec   Loss 0.4670   LearningRate 0.0000   Epoch: 19   Global Step: 111260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:57:16,955-Speed 3387.84 samples/sec   Loss 0.4691   LearningRate 0.0000   Epoch: 19   Global Step: 111270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:57:19,974-Speed 3392.86 samples/sec   Loss 0.5240   LearningRate 0.0000   Epoch: 19   Global Step: 111280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:57:23,014-Speed 3368.71 samples/sec   Loss 0.4992   LearningRate 0.0000   Epoch: 19   Global Step: 111290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:57:26,079-Speed 3342.15 samples/sec   Loss 0.4791   LearningRate 0.0000   Epoch: 19   Global Step: 111300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:57:29,103-Speed 3386.18 samples/sec   Loss 0.5117   LearningRate 0.0000   Epoch: 19   Global Step: 111310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:57:32,119-Speed 3396.14 samples/sec   Loss 0.4740   LearningRate 0.0000   Epoch: 19   Global Step: 111320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:57:35,148-Speed 3381.65 samples/sec   Loss 0.4406   LearningRate 0.0000   Epoch: 19   Global Step: 111330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:57:38,173-Speed 3385.63 samples/sec   Loss 0.5700   LearningRate 0.0000   Epoch: 19   Global Step: 111340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:57:41,193-Speed 3391.69 samples/sec   Loss 0.5507   LearningRate 0.0000   Epoch: 19   Global Step: 111350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:57:44,220-Speed 3384.45 samples/sec   Loss 0.4852   LearningRate 0.0000   Epoch: 19   Global Step: 111360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:57:47,238-Speed 3393.54 samples/sec   Loss 0.4815   LearningRate 0.0000   Epoch: 19   Global Step: 111370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:57:50,322-Speed 3320.96 samples/sec   Loss 0.5156   LearningRate 0.0000   Epoch: 19   Global Step: 111380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:57:53,340-Speed 3394.39 samples/sec   Loss 0.5351   LearningRate 0.0000   Epoch: 19   Global Step: 111390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:57:56,378-Speed 3371.45 samples/sec   Loss 0.5280   LearningRate 0.0000   Epoch: 19   Global Step: 111400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:57:59,407-Speed 3380.69 samples/sec   Loss 0.4265   LearningRate 0.0000   Epoch: 19   Global Step: 111410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:58:02,425-Speed 3393.88 samples/sec   Loss 0.5175   LearningRate 0.0000   Epoch: 19   Global Step: 111420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:58:05,441-Speed 3396.29 samples/sec   Loss 0.4921   LearningRate 0.0000   Epoch: 19   Global Step: 111430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:58:08,515-Speed 3332.11 samples/sec   Loss 0.5164   LearningRate 0.0000   Epoch: 19   Global Step: 111440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:58:11,686-Speed 3230.18 samples/sec   Loss 0.5342   LearningRate 0.0000   Epoch: 19   Global Step: 111450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:58:14,707-Speed 3391.35 samples/sec   Loss 0.4904   LearningRate 0.0000   Epoch: 19   Global Step: 111460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:58:17,724-Speed 3393.92 samples/sec   Loss 0.5206   LearningRate 0.0000   Epoch: 19   Global Step: 111470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:58:20,741-Speed 3394.86 samples/sec   Loss 0.5216   LearningRate 0.0000   Epoch: 19   Global Step: 111480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:58:23,762-Speed 3391.36 samples/sec   Loss 0.4251   LearningRate 0.0000   Epoch: 19   Global Step: 111490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:58:26,780-Speed 3393.07 samples/sec   Loss 0.4223   LearningRate 0.0000   Epoch: 19   Global Step: 111500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:58:29,805-Speed 3386.04 samples/sec   Loss 0.5527   LearningRate 0.0000   Epoch: 19   Global Step: 111510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:58:32,827-Speed 3388.59 samples/sec   Loss 0.5306   LearningRate 0.0000   Epoch: 19   Global Step: 111520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:58:35,855-Speed 3385.84 samples/sec   Loss 0.4729   LearningRate 0.0000   Epoch: 19   Global Step: 111530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:58:38,886-Speed 3379.39 samples/sec   Loss 0.4870   LearningRate 0.0000   Epoch: 19   Global Step: 111540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:58:41,912-Speed 3385.53 samples/sec   Loss 0.5130   LearningRate 0.0000   Epoch: 19   Global Step: 111550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:58:44,928-Speed 3395.82 samples/sec   Loss 0.5202   LearningRate 0.0000   Epoch: 19   Global Step: 111560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:58:47,941-Speed 3398.80 samples/sec   Loss 0.5189   LearningRate 0.0000   Epoch: 19   Global Step: 111570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:58:50,957-Speed 3395.69 samples/sec   Loss 0.4531   LearningRate 0.0000   Epoch: 19   Global Step: 111580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:58:53,987-Speed 3380.32 samples/sec   Loss 0.4684   LearningRate 0.0000   Epoch: 19   Global Step: 111590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:58:57,016-Speed 3381.43 samples/sec   Loss 0.5266   LearningRate 0.0000   Epoch: 19   Global Step: 111600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:59:00,038-Speed 3389.85 samples/sec   Loss 0.4794   LearningRate 0.0000   Epoch: 19   Global Step: 111610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:59:03,065-Speed 3383.30 samples/sec   Loss 0.4340   LearningRate 0.0000   Epoch: 19   Global Step: 111620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:59:06,089-Speed 3387.01 samples/sec   Loss 0.5012   LearningRate 0.0000   Epoch: 19   Global Step: 111630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:59:09,115-Speed 3384.75 samples/sec   Loss 0.4467   LearningRate 0.0000   Epoch: 19   Global Step: 111640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:59:12,134-Speed 3393.34 samples/sec   Loss 0.5298   LearningRate 0.0000   Epoch: 19   Global Step: 111650   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 12:59:15,134-Speed 3414.04 samples/sec   Loss 0.4789   LearningRate 0.0000   Epoch: 19   Global Step: 111660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:59:18,156-Speed 3389.33 samples/sec   Loss 0.5303   LearningRate 0.0000   Epoch: 19   Global Step: 111670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:59:21,182-Speed 3384.69 samples/sec   Loss 0.4937   LearningRate 0.0000   Epoch: 19   Global Step: 111680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:59:24,202-Speed 3391.01 samples/sec   Loss 0.4729   LearningRate 0.0000   Epoch: 19   Global Step: 111690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:59:27,233-Speed 3379.34 samples/sec   Loss 0.5106   LearningRate 0.0000   Epoch: 19   Global Step: 111700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:59:30,271-Speed 3372.15 samples/sec   Loss 0.4731   LearningRate 0.0000   Epoch: 19   Global Step: 111710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:59:33,289-Speed 3393.78 samples/sec   Loss 0.4386   LearningRate 0.0000   Epoch: 19   Global Step: 111720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:59:36,314-Speed 3385.67 samples/sec   Loss 0.4863   LearningRate 0.0000   Epoch: 19   Global Step: 111730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:59:39,337-Speed 3388.82 samples/sec   Loss 0.5366   LearningRate 0.0000   Epoch: 19   Global Step: 111740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 12:59:42,351-Speed 3397.73 samples/sec   Loss 0.4884   LearningRate 0.0000   Epoch: 19   Global Step: 111750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:59:45,370-Speed 3393.00 samples/sec   Loss 0.5265   LearningRate 0.0000   Epoch: 19   Global Step: 111760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:59:48,401-Speed 3379.10 samples/sec   Loss 0.5461   LearningRate 0.0000   Epoch: 19   Global Step: 111770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:59:51,469-Speed 3338.22 samples/sec   Loss 0.4366   LearningRate 0.0000   Epoch: 19   Global Step: 111780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:59:54,523-Speed 3354.11 samples/sec   Loss 0.4759   LearningRate 0.0000   Epoch: 19   Global Step: 111790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 12:59:57,546-Speed 3388.37 samples/sec   Loss 0.5336   LearningRate 0.0000   Epoch: 19   Global Step: 111800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:00:00,569-Speed 3388.99 samples/sec   Loss 0.4419   LearningRate 0.0000   Epoch: 19   Global Step: 111810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:00:03,638-Speed 3337.22 samples/sec   Loss 0.4732   LearningRate 0.0000   Epoch: 19   Global Step: 111820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:00:06,661-Speed 3388.24 samples/sec   Loss 0.4435   LearningRate 0.0000   Epoch: 19   Global Step: 111830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:00:09,708-Speed 3361.21 samples/sec   Loss 0.5759   LearningRate 0.0000   Epoch: 19   Global Step: 111840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:00:12,745-Speed 3372.50 samples/sec   Loss 0.5213   LearningRate 0.0000   Epoch: 19   Global Step: 111850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:00:15,766-Speed 3390.17 samples/sec   Loss 0.4750   LearningRate 0.0000   Epoch: 19   Global Step: 111860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:00:18,789-Speed 3388.48 samples/sec   Loss 0.5077   LearningRate 0.0000   Epoch: 19   Global Step: 111870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:00:21,804-Speed 3397.24 samples/sec   Loss 0.4399   LearningRate 0.0000   Epoch: 19   Global Step: 111880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:00:24,819-Speed 3397.15 samples/sec   Loss 0.4788   LearningRate 0.0000   Epoch: 19   Global Step: 111890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:00:27,853-Speed 3375.17 samples/sec   Loss 0.5482   LearningRate 0.0000   Epoch: 19   Global Step: 111900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:00:30,876-Speed 3388.99 samples/sec   Loss 0.4931   LearningRate 0.0000   Epoch: 19   Global Step: 111910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:00:33,899-Speed 3387.89 samples/sec   Loss 0.4325   LearningRate 0.0000   Epoch: 19   Global Step: 111920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:00:36,920-Speed 3391.16 samples/sec   Loss 0.4626   LearningRate 0.0000   Epoch: 19   Global Step: 111930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:00:39,947-Speed 3382.62 samples/sec   Loss 0.4696   LearningRate 0.0000   Epoch: 19   Global Step: 111940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:00:42,971-Speed 3387.71 samples/sec   Loss 0.5706   LearningRate 0.0000   Epoch: 19   Global Step: 111950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:00:45,997-Speed 3384.11 samples/sec   Loss 0.5059   LearningRate 0.0000   Epoch: 19   Global Step: 111960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:00:49,014-Speed 3395.28 samples/sec   Loss 0.5090   LearningRate 0.0000   Epoch: 19   Global Step: 111970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:00:52,048-Speed 3375.21 samples/sec   Loss 0.4806   LearningRate 0.0000   Epoch: 19   Global Step: 111980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:00:55,065-Speed 3395.62 samples/sec   Loss 0.4926   LearningRate 0.0000   Epoch: 19   Global Step: 111990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:00:58,124-Speed 3347.78 samples/sec   Loss 0.5099   LearningRate 0.0000   Epoch: 19   Global Step: 112000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:01:41,424-[lfw][112000]XNorm: 21.779004
Training: 2022-04-27 13:01:41,424-[lfw][112000]Accuracy-Flip: 0.99750+-0.00281
Training: 2022-04-27 13:01:41,425-[lfw][112000]Accuracy-Highest: 0.99817
Training: 2022-04-27 13:02:31,736-[cfp_fp][112000]XNorm: 22.111145
Training: 2022-04-27 13:02:31,737-[cfp_fp][112000]Accuracy-Flip: 0.98529+-0.00542
Training: 2022-04-27 13:02:31,737-[cfp_fp][112000]Accuracy-Highest: 0.98629
Training: 2022-04-27 13:03:15,249-[agedb_30][112000]XNorm: 22.232803
Training: 2022-04-27 13:03:15,250-[agedb_30][112000]Accuracy-Flip: 0.98167+-0.00882
Training: 2022-04-27 13:03:15,250-[agedb_30][112000]Accuracy-Highest: 0.98250
Training: 2022-04-27 13:03:18,266-Speed 73.07 samples/sec   Loss 0.4502   LearningRate 0.0000   Epoch: 19   Global Step: 112010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:03:21,274-Speed 3405.36 samples/sec   Loss 0.5108   LearningRate 0.0000   Epoch: 19   Global Step: 112020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:03:24,281-Speed 3406.76 samples/sec   Loss 0.4599   LearningRate 0.0000   Epoch: 19   Global Step: 112030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:03:27,298-Speed 3394.45 samples/sec   Loss 0.4756   LearningRate 0.0000   Epoch: 19   Global Step: 112040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:03:30,309-Speed 3401.65 samples/sec   Loss 0.4858   LearningRate 0.0000   Epoch: 19   Global Step: 112050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:03:33,320-Speed 3401.16 samples/sec   Loss 0.4681   LearningRate 0.0000   Epoch: 19   Global Step: 112060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:03:36,334-Speed 3399.12 samples/sec   Loss 0.4237   LearningRate 0.0000   Epoch: 19   Global Step: 112070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:03:39,348-Speed 3398.27 samples/sec   Loss 0.5001   LearningRate 0.0000   Epoch: 19   Global Step: 112080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:03:42,365-Speed 3393.91 samples/sec   Loss 0.5034   LearningRate 0.0000   Epoch: 19   Global Step: 112090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:03:45,383-Speed 3394.42 samples/sec   Loss 0.4726   LearningRate 0.0000   Epoch: 19   Global Step: 112100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:03:48,488-Speed 3298.01 samples/sec   Loss 0.4948   LearningRate 0.0000   Epoch: 19   Global Step: 112110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:03:51,575-Speed 3318.37 samples/sec   Loss 0.4561   LearningRate 0.0000   Epoch: 19   Global Step: 112120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:03:54,598-Speed 3388.08 samples/sec   Loss 0.4937   LearningRate 0.0000   Epoch: 19   Global Step: 112130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:03:57,635-Speed 3372.44 samples/sec   Loss 0.5214   LearningRate 0.0000   Epoch: 19   Global Step: 112140   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:04:00,657-Speed 3389.27 samples/sec   Loss 0.5138   LearningRate 0.0000   Epoch: 19   Global Step: 112150   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:04:03,660-Speed 3410.53 samples/sec   Loss 0.5814   LearningRate 0.0000   Epoch: 19   Global Step: 112160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:06,677-Speed 3395.15 samples/sec   Loss 0.5404   LearningRate 0.0000   Epoch: 19   Global Step: 112170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:09,704-Speed 3383.94 samples/sec   Loss 0.5261   LearningRate 0.0000   Epoch: 19   Global Step: 112180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:12,726-Speed 3389.62 samples/sec   Loss 0.4456   LearningRate 0.0000   Epoch: 19   Global Step: 112190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:15,744-Speed 3393.44 samples/sec   Loss 0.5125   LearningRate 0.0000   Epoch: 19   Global Step: 112200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:18,758-Speed 3398.22 samples/sec   Loss 0.5533   LearningRate 0.0000   Epoch: 19   Global Step: 112210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:21,768-Speed 3403.50 samples/sec   Loss 0.4777   LearningRate 0.0000   Epoch: 19   Global Step: 112220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:24,783-Speed 3396.19 samples/sec   Loss 0.5504   LearningRate 0.0000   Epoch: 19   Global Step: 112230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:27,795-Speed 3401.70 samples/sec   Loss 0.4904   LearningRate 0.0000   Epoch: 19   Global Step: 112240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:30,805-Speed 3401.76 samples/sec   Loss 0.5899   LearningRate 0.0000   Epoch: 19   Global Step: 112250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:33,799-Speed 3421.38 samples/sec   Loss 0.4784   LearningRate 0.0000   Epoch: 19   Global Step: 112260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:36,815-Speed 3395.26 samples/sec   Loss 0.4487   LearningRate 0.0000   Epoch: 19   Global Step: 112270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:39,847-Speed 3378.63 samples/sec   Loss 0.5125   LearningRate 0.0000   Epoch: 19   Global Step: 112280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:42,866-Speed 3392.70 samples/sec   Loss 0.5163   LearningRate 0.0000   Epoch: 19   Global Step: 112290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:45,890-Speed 3386.94 samples/sec   Loss 0.4780   LearningRate 0.0000   Epoch: 19   Global Step: 112300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:48,900-Speed 3402.98 samples/sec   Loss 0.5244   LearningRate 0.0000   Epoch: 19   Global Step: 112310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:51,912-Speed 3401.12 samples/sec   Loss 0.5469   LearningRate 0.0000   Epoch: 19   Global Step: 112320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:54,922-Speed 3402.27 samples/sec   Loss 0.4883   LearningRate 0.0000   Epoch: 19   Global Step: 112330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:04:57,930-Speed 3404.84 samples/sec   Loss 0.5511   LearningRate 0.0000   Epoch: 19   Global Step: 112340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:05:00,944-Speed 3398.18 samples/sec   Loss 0.5141   LearningRate 0.0000   Epoch: 19   Global Step: 112350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:05:03,965-Speed 3390.26 samples/sec   Loss 0.5498   LearningRate 0.0000   Epoch: 19   Global Step: 112360   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:05:06,942-Speed 3440.85 samples/sec   Loss 0.4767   LearningRate 0.0000   Epoch: 19   Global Step: 112370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:05:09,958-Speed 3396.80 samples/sec   Loss 0.4909   LearningRate 0.0000   Epoch: 19   Global Step: 112380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:05:12,976-Speed 3393.74 samples/sec   Loss 0.4841   LearningRate 0.0000   Epoch: 19   Global Step: 112390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:05:15,991-Speed 3396.96 samples/sec   Loss 0.4926   LearningRate 0.0000   Epoch: 19   Global Step: 112400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:05:19,005-Speed 3397.89 samples/sec   Loss 0.4858   LearningRate 0.0000   Epoch: 19   Global Step: 112410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:05:22,029-Speed 3387.33 samples/sec   Loss 0.5290   LearningRate 0.0000   Epoch: 19   Global Step: 112420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:05:25,046-Speed 3394.98 samples/sec   Loss 0.4372   LearningRate 0.0000   Epoch: 19   Global Step: 112430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:05:28,054-Speed 3405.20 samples/sec   Loss 0.5088   LearningRate 0.0000   Epoch: 19   Global Step: 112440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:05:31,064-Speed 3402.02 samples/sec   Loss 0.5039   LearningRate 0.0000   Epoch: 19   Global Step: 112450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:05:34,075-Speed 3402.35 samples/sec   Loss 0.5022   LearningRate 0.0000   Epoch: 19   Global Step: 112460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:05:37,084-Speed 3403.59 samples/sec   Loss 0.5019   LearningRate 0.0000   Epoch: 19   Global Step: 112470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:05:40,100-Speed 3396.31 samples/sec   Loss 0.5266   LearningRate 0.0000   Epoch: 19   Global Step: 112480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:05:43,109-Speed 3403.70 samples/sec   Loss 0.4984   LearningRate 0.0000   Epoch: 19   Global Step: 112490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:05:46,121-Speed 3401.23 samples/sec   Loss 0.4802   LearningRate 0.0000   Epoch: 19   Global Step: 112500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:05:49,140-Speed 3392.04 samples/sec   Loss 0.4515   LearningRate 0.0000   Epoch: 19   Global Step: 112510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:05:52,158-Speed 3394.72 samples/sec   Loss 0.5168   LearningRate 0.0000   Epoch: 19   Global Step: 112520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:05:55,168-Speed 3402.23 samples/sec   Loss 0.5103   LearningRate 0.0000   Epoch: 19   Global Step: 112530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:05:58,204-Speed 3373.22 samples/sec   Loss 0.5572   LearningRate 0.0000   Epoch: 19   Global Step: 112540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:06:01,217-Speed 3400.35 samples/sec   Loss 0.5128   LearningRate 0.0000   Epoch: 19   Global Step: 112550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:06:04,210-Speed 3421.76 samples/sec   Loss 0.5285   LearningRate 0.0000   Epoch: 19   Global Step: 112560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:06:07,220-Speed 3402.72 samples/sec   Loss 0.5091   LearningRate 0.0000   Epoch: 19   Global Step: 112570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:06:10,255-Speed 3376.12 samples/sec   Loss 0.4617   LearningRate 0.0000   Epoch: 19   Global Step: 112580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:06:13,300-Speed 3363.73 samples/sec   Loss 0.5086   LearningRate 0.0000   Epoch: 19   Global Step: 112590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:06:16,315-Speed 3396.79 samples/sec   Loss 0.5198   LearningRate 0.0000   Epoch: 19   Global Step: 112600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:06:19,328-Speed 3398.66 samples/sec   Loss 0.5105   LearningRate 0.0000   Epoch: 19   Global Step: 112610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:06:22,338-Speed 3402.89 samples/sec   Loss 0.4613   LearningRate 0.0000   Epoch: 19   Global Step: 112620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:06:25,368-Speed 3380.33 samples/sec   Loss 0.4610   LearningRate 0.0000   Epoch: 19   Global Step: 112630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:06:28,381-Speed 3399.84 samples/sec   Loss 0.5206   LearningRate 0.0000   Epoch: 19   Global Step: 112640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:06:31,393-Speed 3401.13 samples/sec   Loss 0.5515   LearningRate 0.0000   Epoch: 19   Global Step: 112650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:06:34,406-Speed 3399.39 samples/sec   Loss 0.4820   LearningRate 0.0000   Epoch: 19   Global Step: 112660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:06:37,418-Speed 3399.97 samples/sec   Loss 0.4697   LearningRate 0.0000   Epoch: 19   Global Step: 112670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:06:40,440-Speed 3389.76 samples/sec   Loss 0.5368   LearningRate 0.0000   Epoch: 19   Global Step: 112680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:06:43,448-Speed 3405.07 samples/sec   Loss 0.4746   LearningRate 0.0000   Epoch: 19   Global Step: 112690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:06:46,460-Speed 3400.71 samples/sec   Loss 0.5521   LearningRate 0.0000   Epoch: 19   Global Step: 112700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:06:49,474-Speed 3397.87 samples/sec   Loss 0.4674   LearningRate 0.0000   Epoch: 19   Global Step: 112710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:06:52,485-Speed 3401.24 samples/sec   Loss 0.5480   LearningRate 0.0000   Epoch: 19   Global Step: 112720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:06:55,498-Speed 3400.11 samples/sec   Loss 0.5256   LearningRate 0.0000   Epoch: 19   Global Step: 112730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:06:58,526-Speed 3382.84 samples/sec   Loss 0.4880   LearningRate 0.0000   Epoch: 19   Global Step: 112740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:07:01,538-Speed 3400.06 samples/sec   Loss 0.4571   LearningRate 0.0000   Epoch: 19   Global Step: 112750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:07:04,551-Speed 3400.07 samples/sec   Loss 0.4666   LearningRate 0.0000   Epoch: 19   Global Step: 112760   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:07:07,543-Speed 3422.84 samples/sec   Loss 0.5273   LearningRate 0.0000   Epoch: 19   Global Step: 112770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:07:10,533-Speed 3425.73 samples/sec   Loss 0.4601   LearningRate 0.0000   Epoch: 19   Global Step: 112780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:07:13,545-Speed 3399.93 samples/sec   Loss 0.5046   LearningRate 0.0000   Epoch: 19   Global Step: 112790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:07:16,561-Speed 3396.41 samples/sec   Loss 0.4541   LearningRate 0.0000   Epoch: 19   Global Step: 112800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:07:19,575-Speed 3398.21 samples/sec   Loss 0.4385   LearningRate 0.0000   Epoch: 19   Global Step: 112810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:07:22,595-Speed 3391.44 samples/sec   Loss 0.4823   LearningRate 0.0000   Epoch: 19   Global Step: 112820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:07:25,619-Speed 3386.57 samples/sec   Loss 0.5323   LearningRate 0.0000   Epoch: 19   Global Step: 112830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:07:28,637-Speed 3395.15 samples/sec   Loss 0.4875   LearningRate 0.0000   Epoch: 19   Global Step: 112840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:07:31,678-Speed 3367.61 samples/sec   Loss 0.5219   LearningRate 0.0000   Epoch: 19   Global Step: 112850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:07:34,690-Speed 3400.40 samples/sec   Loss 0.5681   LearningRate 0.0000   Epoch: 19   Global Step: 112860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:07:37,715-Speed 3385.53 samples/sec   Loss 0.5367   LearningRate 0.0000   Epoch: 19   Global Step: 112870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:07:40,751-Speed 3374.19 samples/sec   Loss 0.5630   LearningRate 0.0000   Epoch: 19   Global Step: 112880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:07:43,767-Speed 3395.96 samples/sec   Loss 0.4742   LearningRate 0.0000   Epoch: 19   Global Step: 112890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:07:46,866-Speed 3305.20 samples/sec   Loss 0.4790   LearningRate 0.0000   Epoch: 19   Global Step: 112900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:07:49,889-Speed 3388.18 samples/sec   Loss 0.4941   LearningRate 0.0000   Epoch: 19   Global Step: 112910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:07:52,905-Speed 3395.79 samples/sec   Loss 0.4283   LearningRate 0.0000   Epoch: 19   Global Step: 112920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:07:55,918-Speed 3399.14 samples/sec   Loss 0.5187   LearningRate 0.0000   Epoch: 19   Global Step: 112930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:07:58,936-Speed 3393.91 samples/sec   Loss 0.5653   LearningRate 0.0000   Epoch: 19   Global Step: 112940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:08:01,949-Speed 3399.82 samples/sec   Loss 0.4886   LearningRate 0.0000   Epoch: 19   Global Step: 112950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:08:04,970-Speed 3390.34 samples/sec   Loss 0.4689   LearningRate 0.0000   Epoch: 19   Global Step: 112960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:08:07,986-Speed 3395.75 samples/sec   Loss 0.4902   LearningRate 0.0000   Epoch: 19   Global Step: 112970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:08:11,015-Speed 3381.19 samples/sec   Loss 0.5034   LearningRate 0.0000   Epoch: 19   Global Step: 112980   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:08:14,030-Speed 3397.77 samples/sec   Loss 0.4765   LearningRate 0.0000   Epoch: 19   Global Step: 112990   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:08:17,070-Speed 3369.25 samples/sec   Loss 0.5304   LearningRate 0.0000   Epoch: 19   Global Step: 113000   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:08:20,089-Speed 3392.35 samples/sec   Loss 0.5531   LearningRate 0.0000   Epoch: 19   Global Step: 113010   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:08:23,100-Speed 3401.51 samples/sec   Loss 0.5087   LearningRate 0.0000   Epoch: 19   Global Step: 113020   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:08:26,114-Speed 3398.21 samples/sec   Loss 0.5288   LearningRate 0.0000   Epoch: 19   Global Step: 113030   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:08:29,106-Speed 3423.80 samples/sec   Loss 0.4663   LearningRate 0.0000   Epoch: 19   Global Step: 113040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:08:32,123-Speed 3395.35 samples/sec   Loss 0.4987   LearningRate 0.0000   Epoch: 19   Global Step: 113050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:08:35,136-Speed 3398.97 samples/sec   Loss 0.4570   LearningRate 0.0000   Epoch: 19   Global Step: 113060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:08:38,156-Speed 3391.52 samples/sec   Loss 0.5141   LearningRate 0.0000   Epoch: 19   Global Step: 113070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:08:41,196-Speed 3369.68 samples/sec   Loss 0.5039   LearningRate 0.0000   Epoch: 19   Global Step: 113080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:08:44,210-Speed 3397.60 samples/sec   Loss 0.4947   LearningRate 0.0000   Epoch: 19   Global Step: 113090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:08:47,224-Speed 3398.60 samples/sec   Loss 0.4683   LearningRate 0.0000   Epoch: 19   Global Step: 113100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:08:50,237-Speed 3399.72 samples/sec   Loss 0.4888   LearningRate 0.0000   Epoch: 19   Global Step: 113110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:08:53,256-Speed 3392.47 samples/sec   Loss 0.4694   LearningRate 0.0000   Epoch: 19   Global Step: 113120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:08:56,269-Speed 3398.73 samples/sec   Loss 0.5267   LearningRate 0.0000   Epoch: 19   Global Step: 113130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:08:59,287-Speed 3394.58 samples/sec   Loss 0.4763   LearningRate 0.0000   Epoch: 19   Global Step: 113140   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:09:02,305-Speed 3393.57 samples/sec   Loss 0.4862   LearningRate 0.0000   Epoch: 19   Global Step: 113150   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:09:05,344-Speed 3370.52 samples/sec   Loss 0.4515   LearningRate 0.0000   Epoch: 19   Global Step: 113160   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:09:08,344-Speed 3414.08 samples/sec   Loss 0.4520   LearningRate 0.0000   Epoch: 19   Global Step: 113170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:11,371-Speed 3384.45 samples/sec   Loss 0.5087   LearningRate 0.0000   Epoch: 19   Global Step: 113180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:14,384-Speed 3398.77 samples/sec   Loss 0.4977   LearningRate 0.0000   Epoch: 19   Global Step: 113190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:17,397-Speed 3400.10 samples/sec   Loss 0.4423   LearningRate 0.0000   Epoch: 19   Global Step: 113200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:20,409-Speed 3401.07 samples/sec   Loss 0.5068   LearningRate 0.0000   Epoch: 19   Global Step: 113210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:23,424-Speed 3395.96 samples/sec   Loss 0.4507   LearningRate 0.0000   Epoch: 19   Global Step: 113220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:26,463-Speed 3370.76 samples/sec   Loss 0.5165   LearningRate 0.0000   Epoch: 19   Global Step: 113230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:29,481-Speed 3393.34 samples/sec   Loss 0.5161   LearningRate 0.0000   Epoch: 19   Global Step: 113240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:32,499-Speed 3393.75 samples/sec   Loss 0.5633   LearningRate 0.0000   Epoch: 19   Global Step: 113250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:35,526-Speed 3384.55 samples/sec   Loss 0.5298   LearningRate 0.0000   Epoch: 19   Global Step: 113260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:38,546-Speed 3390.81 samples/sec   Loss 0.4612   LearningRate 0.0000   Epoch: 19   Global Step: 113270   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:09:41,551-Speed 3409.43 samples/sec   Loss 0.5142   LearningRate 0.0000   Epoch: 19   Global Step: 113280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:44,565-Speed 3397.74 samples/sec   Loss 0.5273   LearningRate 0.0000   Epoch: 19   Global Step: 113290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:47,602-Speed 3373.04 samples/sec   Loss 0.4878   LearningRate 0.0000   Epoch: 19   Global Step: 113300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:50,635-Speed 3376.71 samples/sec   Loss 0.5184   LearningRate 0.0000   Epoch: 19   Global Step: 113310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:53,650-Speed 3396.91 samples/sec   Loss 0.4259   LearningRate 0.0000   Epoch: 19   Global Step: 113320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:56,664-Speed 3398.77 samples/sec   Loss 0.5359   LearningRate 0.0000   Epoch: 19   Global Step: 113330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:09:59,682-Speed 3393.23 samples/sec   Loss 0.5062   LearningRate 0.0000   Epoch: 19   Global Step: 113340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:10:02,698-Speed 3396.64 samples/sec   Loss 0.5403   LearningRate 0.0000   Epoch: 19   Global Step: 113350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:10:05,722-Speed 3386.63 samples/sec   Loss 0.5279   LearningRate 0.0000   Epoch: 19   Global Step: 113360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:10:08,737-Speed 3397.91 samples/sec   Loss 0.5114   LearningRate 0.0000   Epoch: 19   Global Step: 113370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:10:11,759-Speed 3388.68 samples/sec   Loss 0.4573   LearningRate 0.0000   Epoch: 19   Global Step: 113380   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:10:14,760-Speed 3413.03 samples/sec   Loss 0.5706   LearningRate 0.0000   Epoch: 19   Global Step: 113390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:10:17,779-Speed 3392.77 samples/sec   Loss 0.5397   LearningRate 0.0000   Epoch: 19   Global Step: 113400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:10:20,795-Speed 3395.92 samples/sec   Loss 0.5255   LearningRate 0.0000   Epoch: 19   Global Step: 113410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:10:23,822-Speed 3383.71 samples/sec   Loss 0.4858   LearningRate 0.0000   Epoch: 19   Global Step: 113420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:10:26,836-Speed 3398.16 samples/sec   Loss 0.4894   LearningRate 0.0000   Epoch: 19   Global Step: 113430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:10:29,850-Speed 3398.48 samples/sec   Loss 0.4380   LearningRate 0.0000   Epoch: 19   Global Step: 113440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:10:32,854-Speed 3408.85 samples/sec   Loss 0.5289   LearningRate 0.0000   Epoch: 19   Global Step: 113450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:10:35,892-Speed 3372.38 samples/sec   Loss 0.4786   LearningRate 0.0000   Epoch: 19   Global Step: 113460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:10:38,926-Speed 3375.46 samples/sec   Loss 0.5205   LearningRate 0.0000   Epoch: 19   Global Step: 113470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:10:41,943-Speed 3395.45 samples/sec   Loss 0.5217   LearningRate 0.0000   Epoch: 19   Global Step: 113480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:10:44,958-Speed 3396.67 samples/sec   Loss 0.4637   LearningRate 0.0000   Epoch: 19   Global Step: 113490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:10:47,972-Speed 3398.12 samples/sec   Loss 0.5180   LearningRate 0.0000   Epoch: 19   Global Step: 113500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:10:50,990-Speed 3394.29 samples/sec   Loss 0.4811   LearningRate 0.0000   Epoch: 19   Global Step: 113510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:10:54,027-Speed 3372.81 samples/sec   Loss 0.4937   LearningRate 0.0000   Epoch: 19   Global Step: 113520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:10:57,041-Speed 3397.42 samples/sec   Loss 0.4477   LearningRate 0.0000   Epoch: 19   Global Step: 113530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:11:00,066-Speed 3385.84 samples/sec   Loss 0.4777   LearningRate 0.0000   Epoch: 19   Global Step: 113540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 13:11:03,098-Speed 3379.03 samples/sec   Loss 0.5145   LearningRate 0.0000   Epoch: 19   Global Step: 113550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:06,192-Speed 3309.67 samples/sec   Loss 0.4934   LearningRate 0.0000   Epoch: 19   Global Step: 113560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:09,208-Speed 3397.18 samples/sec   Loss 0.4894   LearningRate 0.0000   Epoch: 19   Global Step: 113570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:12,282-Speed 3331.38 samples/sec   Loss 0.5240   LearningRate 0.0000   Epoch: 19   Global Step: 113580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:15,301-Speed 3391.87 samples/sec   Loss 0.5036   LearningRate 0.0000   Epoch: 19   Global Step: 113590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:18,321-Speed 3392.44 samples/sec   Loss 0.4426   LearningRate 0.0000   Epoch: 19   Global Step: 113600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:21,345-Speed 3386.30 samples/sec   Loss 0.5241   LearningRate 0.0000   Epoch: 19   Global Step: 113610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:24,398-Speed 3355.26 samples/sec   Loss 0.5259   LearningRate 0.0000   Epoch: 19   Global Step: 113620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:27,517-Speed 3283.48 samples/sec   Loss 0.4634   LearningRate 0.0000   Epoch: 19   Global Step: 113630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:30,540-Speed 3388.83 samples/sec   Loss 0.4484   LearningRate 0.0000   Epoch: 19   Global Step: 113640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:33,557-Speed 3395.37 samples/sec   Loss 0.5249   LearningRate 0.0000   Epoch: 19   Global Step: 113650   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-27 13:11:36,556-Speed 3415.35 samples/sec   Loss 0.4938   LearningRate 0.0000   Epoch: 19   Global Step: 113660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:39,574-Speed 3394.40 samples/sec   Loss 0.5447   LearningRate 0.0000   Epoch: 19   Global Step: 113670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:42,594-Speed 3391.35 samples/sec   Loss 0.4329   LearningRate 0.0000   Epoch: 19   Global Step: 113680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:45,615-Speed 3390.07 samples/sec   Loss 0.4785   LearningRate 0.0000   Epoch: 19   Global Step: 113690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:48,639-Speed 3386.47 samples/sec   Loss 0.5062   LearningRate 0.0000   Epoch: 19   Global Step: 113700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:51,741-Speed 3302.55 samples/sec   Loss 0.4741   LearningRate 0.0000   Epoch: 19   Global Step: 113710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-27 13:11:54,754-Speed 3398.57 samples/sec   Loss 0.4556   LearningRate 0.0000   Epoch: 19   Global Step: 113720   Fp16 Grad Scale: 65536   Required: -0 hours